
how to enable the 'share' training recipe in the paper #150

Open
Danield21 opened this issue Dec 6, 2024 · 7 comments

@Danield21

Hi, many thanks for your awesome work first!

However, I am a bit confused about how to correctly enable the 'share' training recipe from the paper. This setting looks like three-stage training.

Stage 1: Use the 'base' training recipe to train only the connector on the pre-training dataset (e.g., LLaVA-1.5-558k).
Stage 2: Inherit the connector from Stage 1, and pretrain part of the visual encoder, the connector, and the LLM on the same pre-training dataset (e.g., LLaVA-1.5-558k) for one epoch.
Stage 3: Train only the connector and the LLM on SFT data such as LLaVA-1.5-mix-665k.

If my understanding is correct, the pre-training data is actually trained on for two epochs (once in Stage 1 and once in Stage 2).
But what hyperparameter configuration is used in Stage 1, and does it need to run over the full dataset for a whole epoch?

Also, may I ask where to find the shell scripts for the 'share' training recipe in the codebase?

@ZhangXJ199
Collaborator

The scripts for the 'share' training strategy are located in script/train/share.

@Danield21
Author

> The scripts for the 'share' training strategy are located in script/train/share.

Thank you very much for sharing!

DATA_PATH=/home/ai/data/llava/dataset/text_files/blip_laion_cc_sbu_558k.json
SHARE_PRETRAIN_DATA_PATH=/mnt/data/sata/ssd/dataset/text_files/really_cleaned_share-captioner_coco_lcs_sam_1246k_1107.json
SHARE_FINETUNE_DATA_PATH=/mnt/data/sata/ssd/dataset/text_files/cleaned_sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k.json
IMAGE_PATH=/home/ai/data/llava/dataset/llava/llava_pretrain/images
SHARE_PRETRAIN_IMAGE_PATH=/home/ai/data/llava/dataset
SHARE_FINETUNE_IMAGE_PATH=/home/ai/data/llava/dataset

LLM_VERSION=microsoft/phi-2
VT_VERSION=google/siglip-so400m-patch14-384
VT_VERSION2=""
CN_VERSION=mlp2x_gelu
CONV_VERSION=phi
VERSION=share
TRAIN_RECIPE=common
MODEL_MAX_LENGTH=3072



# Stage 1: base pretraining (trains the connector on the LLaVA-1.5-558k data)
bash scripts/train/pretrain.sh "$DATA_PATH" "$IMAGE_PATH" "$LLM_VERSION" "$VT_VERSION" "$VT_VERSION2" "$CN_VERSION" "$VERSION" "$TRAIN_RECIPE" "$MODEL_MAX_LENGTH"
# Stage 2: share pretraining on the ShareGPT4V captioner data
bash scripts/train/share/pretrain_share.sh "$SHARE_PRETRAIN_DATA_PATH" "$SHARE_PRETRAIN_IMAGE_PATH" "$LLM_VERSION" "$VT_VERSION" "$VT_VERSION2" "$CN_VERSION" "$VERSION" "$TRAIN_RECIPE" "$MODEL_MAX_LENGTH"
# Stage 3: supervised fine-tuning on the ShareGPT4V mix-665k data
bash scripts/train/share/finetune_share.sh "$SHARE_FINETUNE_DATA_PATH" "$SHARE_FINETUNE_IMAGE_PATH" "$LLM_VERSION" "$VT_VERSION" "$VT_VERSION2" "$CN_VERSION" "$CONV_VERSION" "$VERSION" "$TRAIN_RECIPE" "$MODEL_MAX_LENGTH"

So it looks like Stage 2 uses the ShareGPT4V data for training, but inherits the connector trained on LLaVA-1.5-558k in Stage 1. Is my understanding correct?

@ZhangXJ199
Collaborator

Yes, you are right.
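
In practice, "inheriting the connector from Stage 1" typically just means loading the connector weights saved after Stage 1 into the model before the Stage 2 share pretraining begins. A rough PyTorch sketch of that idea (not the repo's actual code; the checkpoint path, key prefix, and hidden sizes are assumptions for illustration):

# Rough sketch only, not the actual TinyLLaVA Factory implementation.
# Assumed: the Stage-1 connector was saved to "checkpoints/stage1/mm_projector.bin"
# and its keys may carry a "connector." prefix.
import torch
import torch.nn as nn

def build_connector(vision_dim: int = 1152, llm_dim: int = 2560) -> nn.Module:
    # mlp2x_gelu connector: two linear layers with a GELU in between,
    # mapping vision features to the LLM embedding size.
    return nn.Sequential(
        nn.Linear(vision_dim, llm_dim),
        nn.GELU(),
        nn.Linear(llm_dim, llm_dim),
    )

def load_stage1_connector(connector: nn.Module, ckpt_path: str) -> nn.Module:
    # Load only the connector weights produced by Stage 1 before starting
    # the Stage-2 share pretraining.
    state_dict = torch.load(ckpt_path, map_location="cpu")
    state_dict = {k.removeprefix("connector."): v for k, v in state_dict.items()}
    connector.load_state_dict(state_dict)
    return connector

connector = build_connector()
# connector = load_stage1_connector(connector, "checkpoints/stage1/mm_projector.bin")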

@Danield21
Author

Hi,

I found one point really confusing after reading the script `script/train/share/pretrain_share.sh`.

In the paper, it looks like the 'share' training recipe trains part of the parameters of the vision encoder (VE). But in the script, it looks like the whole VE is frozen, based on the argument `--tune_type_vision_tower frozen`.
[screenshot of the training arguments in pretrain_share.sh]

Then I checked the code in `TinyLLaVA_Factory/tinyllava/training_recipe/base.py`.

[screenshot of tinyllava/training_recipe/base.py]

It seems that `--tune_type_vision_tower frozen` means the whole VE is not trained at all, while `--tune_type_vision_tower partially-tune` is the setting that actually matches the 'share' training recipe.
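
A flag like `--tune_type_vision_tower` usually just controls which parameters of the vision tower have requires_grad enabled. A minimal sketch of how `frozen`, `full`, and `partially-tune` could be mapped to parameter freezing (names and module layout are assumptions for illustration; this is not the actual code in base.py):

import torch.nn as nn

def apply_vision_tower_tune_type(vision_tower: nn.Module,
                                 tune_type: str,
                                 num_trainable_layers: int = 2) -> None:
    if tune_type == "frozen":
        # The whole vision encoder is excluded from training.
        for p in vision_tower.parameters():
            p.requires_grad = False
    elif tune_type == "full":
        # Every vision encoder parameter is trained.
        for p in vision_tower.parameters():
            p.requires_grad = True
    elif tune_type == "partially-tune":
        # Freeze everything, then unfreeze only the last few transformer
        # blocks (the "partial VE training" described in the paper).
        for p in vision_tower.parameters():
            p.requires_grad = False
        blocks = list(vision_tower.encoder.layers)  # module layout assumed
        for block in blocks[-num_trainable_layers:]:
            for p in block.parameters():
                p.requires_grad = True
    else:
        raise ValueError(f"unknown tune_type: {tune_type}")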

So I just want to make sure: is the training of phi-2 really based on the 'share' training recipe described in the paper when running the script `TinyLLaVA_Factory/scripts/train/share/train_phi_share.sh` from the repo?

Danield21 reopened this Dec 7, 2024
@ZhangXJ199
Collaborator

In fact, the share training recipe keeps the vision tower frozen. The partial training of the vision tower mentioned in the paper corresponds to the partially-tune setting. We have identified some issues in the original code, so the scripts and results we provide for the share training recipe are based on keeping the vision tower frozen.

@Danield21
Author

Thanks for the clarification.

So the 'share' training recipe essentially just performs a warm-up of the projector using the ShareGPT4V data?

@ZhangXJ199
Collaborator

It also changes the training recipe of the LLM from frozen to full during pretraining.
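
In other words, during pretraining the base recipe gives gradients only to the connector, while the share recipe also unfreezes the LLM; the vision tower stays frozen in both. A small sketch of that difference (module and attribute names are assumed for illustration, not the repo's actual API):

import torch.nn as nn

def set_trainable(module: nn.Module, trainable: bool) -> None:
    for p in module.parameters():
        p.requires_grad = trainable

def configure_pretraining(model, recipe: str) -> None:
    # `model` is assumed to expose .vision_tower, .connector, and .llm submodules.
    set_trainable(model.vision_tower, False)   # frozen under both recipes
    set_trainable(model.connector, True)       # always trained during pretraining
    # base recipe: LLM frozen; share recipe: LLM fully trained during pretraining
    set_trainable(model.llm, recipe == "share")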
