How to enable the 'share' training recipe in the paper #150
Comments
The scripts for the 'share' training strategy are located in script/train/share.
Thank you very much for sharing!
So it looks like Stage 2 uses ShareGPT4V as the training data, but inherits the connector trained with LLaVA-1.5-558k in Stage 1. Is my understanding correct?
Yes, you are right.
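For concreteness, inheriting the Stage-1 connector could look roughly like the sketch below. This is only a minimal illustration, not the repo's actual loading code; the `connector` attribute and the checkpoint path are hypothetical names.

```python
import torch

def load_stage1_connector(model, ckpt_path="checkpoints/stage1/connector.bin"):
    # Hypothetical sketch: load only the connector (projector) weights produced
    # in Stage 1; all other modules keep their current weights.
    state_dict = torch.load(ckpt_path, map_location="cpu")
    missing, unexpected = model.connector.load_state_dict(state_dict, strict=False)
    # strict=False tolerates key-name differences between checkpoints.
    print("missing keys:", missing, "unexpected keys:", unexpected)
    return model
```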
In fact, the share training recipe keeps the vision tower frozen. The partial training of the vision tower mentioned in the paper corresponds to the partially-tune setting. We have identified some issues in the original code, so the scripts and results we provide for the share training recipe are based on keeping the vision tower frozen.
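To illustrate the difference, a minimal PyTorch-style sketch contrasting the two settings is shown below. Attribute names such as `blocks` and the number of unfrozen blocks are assumptions for illustration only, not the repo's actual implementation.

```python
def freeze_vision_tower(vision_tower):
    # Released 'share' recipe: the whole vision tower stays frozen.
    for p in vision_tower.parameters():
        p.requires_grad = False

def partially_tune_vision_tower(vision_tower, num_open_blocks=3):
    # 'Partially-tune' setting: freeze everything, then re-enable gradients
    # for only the last few transformer blocks (names/count are hypothetical).
    freeze_vision_tower(vision_tower)
    for block in vision_tower.blocks[-num_open_blocks:]:
        for p in block.parameters():
            p.requires_grad = True
```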
Thanks for the clarification. So the 'share' training recipe essentially just performs a warm-up of the projector using the ShareGPT4V data?
It also changes the training recipe of the LLM from frozen to full during pretraining. |
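To summarize the two pretraining recipes as discussed in this thread, the difference might look roughly like the sketch below; the keys and values are purely illustrative and do not reflect the repo's actual config schema.

```python
# Illustrative comparison of the pretraining recipes discussed above.
PRETRAIN_RECIPES = {
    "base": {
        "vision_tower": "frozen",
        "connector": "full",
        "llm": "frozen",
    },
    "share": {
        "vision_tower": "frozen",  # released scripts keep it frozen (see above)
        "connector": "full",       # initialized from the Stage-1 connector
        "llm": "full",             # changed from frozen to full during pretraining
    },
}
```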
Hi, first of all, many thanks for your awesome work!
However, I am a bit confused about how to correctly enable the 'share' training recipe from the paper. This setting looks like a three-stage training process (sketched after the list below):
Stage 1: Use the 'base' training recipe to train only the connector on the pre-training dataset (e.g., LLaVA-1.5-558k).
Stage 2: Inherit the connector from Stage 1, and pretrain part of the visual encoder, the connector, and the LLM on the same pre-training dataset (e.g., LLaVA-1.5-558k) for one epoch.
Stage 3: Train only the connector and the LLM on SFT data such as LLaVA-1.5-mix-665k.
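If my reading is right, the pipeline could be summarized roughly like this; the stage labels, dataset names, and module names are just how I understood the paper, not the repo's actual configuration.

```python
# My rough understanding of the three stages; names are illustrative only.
SHARE_PIPELINE = [
    {"stage": 1, "recipe": "base pretrain", "data": "LLaVA-1.5-558k",
     "trainable": ["connector"]},
    {"stage": 2, "recipe": "share pretrain", "data": "LLaVA-1.5-558k",
     "trainable": ["visual encoder (partial?)", "connector", "llm"]},
    {"stage": 3, "recipe": "finetune", "data": "LLaVA-1.5-mix-665k",
     "trainable": ["connector", "llm"]},
]
```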
If my understanding is correct, the pre-training data is actually used for two epochs (once in Stage 1 and once in Stage 2).
What hyperparameter configuration is used in Stage 1, and do we need to train on the full epoch of data in that stage?
Also, may I ask where to find the shell scripts for the 'share' training recipe in the codebase?