Description
Hello Emu2 Team,
First and foremost, thank you for your incredible work on Emu2 and for open-sourcing this powerful model. It's a fantastic contribution to the multimodal research community.
I am currently studying your work in depth to better understand its foundations. For reproducibility and a thorough analysis of the model's properties, knowing the exact starting checkpoint is crucial.
In your paper (arXiv:2312.13286, Sec. 2.1), you state that the Multimodal Modeling component was initialized from a LLaMA-33B model. As far as I know, Meta's official LLaMA-1 release included 7B, 13B, 30B, and 65B models (the 30B checkpoint is often informally referred to as 33B, since it has roughly 32.5B parameters). This leaves me with a question:
Could you please clarify which specific pre-trained checkpoint was used for the LLaMA-33B initialization?
For instance, was it the LLaMA-1-30B checkpoint? Or perhaps a community-finetuned version like Vicuna-33B?
Knowing the precise origin of the base LLM would be immensely helpful for any researcher looking to build upon or reproduce aspects of your training methodology. I've looked through the repository configs but couldn't pinpoint this specific detail.
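In case it clarifies what I'm after, here is a rough sketch of the comparison I have in mind. The repo id `BAAI/Emu2` and the flat attribute names are my assumptions (the actual config may nest the LM settings under a sub-config), so please treat this as illustrative only:

```python
from transformers import AutoConfig

# Known LLaMA-1-30B ("33B") backbone dimensions from Meta's release.
LLAMA_30B_DIMS = {
    "hidden_size": 6656,
    "num_hidden_layers": 60,
    "num_attention_heads": 52,
}

# Assumption: the public checkpoint lives at "BAAI/Emu2" on the Hugging Face Hub
# and its remote-code config exposes the LM dimensions at the top level.
cfg = AutoConfig.from_pretrained("BAAI/Emu2", trust_remote_code=True)

for key, expected in LLAMA_30B_DIMS.items():
    actual = getattr(cfg, key, None)
    status = "match" if actual == expected else "differs/missing"
    print(f"{key}: Emu2={actual}, LLaMA-1-30B={expected} -> {status}")
```

Even if the dimensions line up, that only confirms a 30B-scale backbone; it cannot distinguish the base LLaMA-1-30B weights from a fine-tuned derivative such as Vicuna-33B, since they share the same architecture, which is why your confirmation would settle the question.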
Thank you for your time and for creating such an inspiring project!