Thanks for the excellent work!
In the paper, the text-to-image generation model is reported to be trained on 45M images. However, SAM-1B contains ~10M images, JourneyDB ~4M, and ImageNet-1K ~1M, which together come to roughly 15M.
Could you please clarify where the 45M figure comes from, or am I misunderstanding something?