
how to enable the 'share' training recipe in the paper #150

Open
Danield21 opened this issue Dec 6, 2024 · 7 comments

@Danield21

Hi, many thanks for your awesome work first!

However, I am a bit confused about how to correctly enable the 'share' training recipe from the paper. This setting looks like three-stage training.

Stage 1: Use the 'base' training recipe to train only the connector on the pre-training dataset (e.g., LLaVA-1.5-558k).
Stage 2: Inherit the connector from Stage 1, and pretrain part of the visual encoder, the connector, and the LLM on the same pre-training dataset (e.g., LLaVA-1.5-558k) for one epoch.
Stage 3: Train only the connector and the LLM on SFT data such as LLaVA-1.5-mix-665k.

If my understanding is correct, the pre-training data is actually trained on for two epochs (once in Stage 1 and once in Stage 2).
But what hyperparameter configuration is used in Stage 1, and does it need to run over the full dataset for a whole epoch?

Also, may I ask where to find the shell scripts for the 'share' training recipe in the codebase?

@ZhangXJ199
Collaborator

The scripts for the 'share' training strategy are located in script/train/share.

@Danield21
Author

> The scripts for the 'share' training strategy are located in script/train/share.

Thank you very much for sharing!

DATA_PATH=/home/ai/data/llava/dataset/text_files/blip_laion_cc_sbu_558k.json
SHARE_PRETRAIN_DATA_PATH=/mnt/data/sata/ssd/dataset/text_files/really_cleaned_share-captioner_coco_lcs_sam_1246k_1107.json
SHARE_FINETUNE_DATA_PATH=/mnt/data/sata/ssd/dataset/text_files/cleaned_sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k.json
IMAGE_PATH=/home/ai/data/llava/dataset/llava/llava_pretrain/images
SHARE_PRETRAIN_IMAGE_PATH=/home/ai/data/llava/dataset
SHARE_FINETUNE_IMAGE_PATH=/home/ai/data/llava/dataset

LLM_VERSION=microsoft/phi-2
VT_VERSION=google/siglip-so400m-patch14-384
VT_VERSION2=""
CN_VERSION=mlp2x_gelu
CONV_VERSION=phi
VERSION=share
TRAIN_RECIPE=common
MODEL_MAX_LENGTH=3072



# Stage 1: base pretraining (trains the connector on the LLaVA-1.5-558k data)
bash scripts/train/pretrain.sh "$DATA_PATH" "$IMAGE_PATH" "$LLM_VERSION" "$VT_VERSION" "$VT_VERSION2" "$CN_VERSION" "$VERSION" "$TRAIN_RECIPE" "$MODEL_MAX_LENGTH"
# Stage 2: share pretraining on the ShareGPT4V captioner data
bash scripts/train/share/pretrain_share.sh "$SHARE_PRETRAIN_DATA_PATH" "$SHARE_PRETRAIN_IMAGE_PATH" "$LLM_VERSION" "$VT_VERSION" "$VT_VERSION2" "$CN_VERSION" "$VERSION" "$TRAIN_RECIPE" "$MODEL_MAX_LENGTH"
# Stage 3: supervised fine-tuning on the ShareGPT4V mix-665k data
bash scripts/train/share/finetune_share.sh "$SHARE_FINETUNE_DATA_PATH" "$SHARE_FINETUNE_IMAGE_PATH" "$LLM_VERSION" "$VT_VERSION" "$VT_VERSION2" "$CN_VERSION" "$CONV_VERSION" "$VERSION" "$TRAIN_RECIPE" "$MODEL_MAX_LENGTH"

So it looks like Stage 2 uses the ShareGPT4V data for training, but inherits the connector trained on LLaVA-1.5-558k in Stage 1. Is my understanding correct?

@ZhangXJ199
Collaborator

Yes, you are right.
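
In practice, "inheriting the connector from Stage 1" typically just means loading the connector weights saved after Stage 1 into the model before the Stage 2 share pretraining begins. A rough PyTorch sketch of that idea (not the repo's actual code; the checkpoint path, key prefix, and hidden sizes are assumptions for illustration):

# Rough sketch only, not the actual TinyLLaVA Factory implementation.
# Assumed: the Stage-1 connector was saved to "checkpoints/stage1/mm_projector.bin"
# and its keys may carry a "connector." prefix.
import torch
import torch.nn as nn

def build_connector(vision_dim: int = 1152, llm_dim: int = 2560) -> nn.Module:
    # mlp2x_gelu connector: two linear layers with a GELU in between,
    # mapping vision features to the LLM embedding size.
    return nn.Sequential(
        nn.Linear(vision_dim, llm_dim),
        nn.GELU(),
        nn.Linear(llm_dim, llm_dim),
    )

def load_stage1_connector(connector: nn.Module, ckpt_path: str) -> nn.Module:
    # Load only the connector weights produced by Stage 1 before starting
    # the Stage-2 share pretraining.
    state_dict = torch.load(ckpt_path, map_location="cpu")
    state_dict = {k.removeprefix("connector."): v for k, v in state_dict.items()}
    connector.load_state_dict(state_dict)
    return connector

connector = build_connector()
# connector = load_stage1_connector(connector, "checkpoints/stage1/mm_projector.bin")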

@Danield21
Author

Hi,

I found one point really confusing after reading the script `script/train/share/pretrain_share.sh`.

In the paper, it looks like the 'share' training recipe trains part of the parameters of the vision encoder (VE). But in the script, it looks like the whole VE is frozen, based on the argument `--tune_type_vision_tower frozen`.
[screenshot of the training arguments in pretrain_share.sh]

Then I checked the code in `TinyLLaVA_Factory/tinyllava/training_recipe/base.py`.

[screenshot of tinyllava/training_recipe/base.py]

It seems that `--tune_type_vision_tower frozen` means the whole VE is not trained at all, while `--tune_type_vision_tower partially-tune` is the setting that actually matches the 'share' training recipe.
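
A flag like `--tune_type_vision_tower` usually just controls which parameters of the vision tower have requires_grad enabled. A minimal sketch of how `frozen`, `full`, and `partially-tune` could be mapped to parameter freezing (names and module layout are assumptions for illustration; this is not the actual code in base.py):

import torch.nn as nn

def apply_vision_tower_tune_type(vision_tower: nn.Module,
                                 tune_type: str,
                                 num_trainable_layers: int = 2) -> None:
    if tune_type == "frozen":
        # The whole vision encoder is excluded from training.
        for p in vision_tower.parameters():
            p.requires_grad = False
    elif tune_type == "full":
        # Every vision encoder parameter is trained.
        for p in vision_tower.parameters():
            p.requires_grad = True
    elif tune_type == "partially-tune":
        # Freeze everything, then unfreeze only the last few transformer
        # blocks (the "partial VE training" described in the paper).
        for p in vision_tower.parameters():
            p.requires_grad = False
        blocks = list(vision_tower.encoder.layers)  # module layout assumed
        for block in blocks[-num_trainable_layers:]:
            for p in block.parameters():
                p.requires_grad = True
    else:
        raise ValueError(f"unknown tune_type: {tune_type}")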

So I just want to make sure: is the training of phi-2 really based on the 'share' training recipe described in the paper when running the script `TinyLLaVA_Factory/scripts/train/share/train_phi_share.sh` from the repo?

Danield21 reopened this Dec 7, 2024
@ZhangXJ199
Collaborator

In fact, the share training recipe keeps the vision tower frozen. The partial training of the vision tower mentioned in the paper corresponds to the partially-tune setting. We have identified some issues in the original code, so the scripts and results we provide for the share training recipe are based on keeping the vision tower frozen.

@Danield21
Author

Thanks for the clarification.

So the 'share' training recipe essentially just performs a warm-up of the projector using the ShareGPT4V data?

@ZhangXJ199
Collaborator

It also changes the training recipe of the LLM from frozen to full during pretraining.
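
In other words, during pretraining the base recipe gives gradients only to the connector, while the share recipe also unfreezes the LLM; the vision tower stays frozen in both. A small sketch of that difference (module and attribute names are assumed for illustration, not the repo's actual API):

import torch.nn as nn

def set_trainable(module: nn.Module, trainable: bool) -> None:
    for p in module.parameters():
        p.requires_grad = trainable

def configure_pretraining(model, recipe: str) -> None:
    # `model` is assumed to expose .vision_tower, .connector, and .llm submodules.
    set_trainable(model.vision_tower, False)   # frozen under both recipes
    set_trainable(model.connector, True)       # always trained during pretraining
    # base recipe: LLM frozen; share recipe: LLM fully trained during pretraining
    set_trainable(model.llm, recipe == "share")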
