cd eval/understanding
Please follow the instructions in LLaVA to download the data for alignment pretraining and instruction finetuning. Then install the packages required for training:
pip install transformers==4.37.2 deepspeed==0.12.6 peft==0.13.2
pip install flash-attn --no-build-isolation
pip install sentencepiece==0.1.99
pip install accelerate==0.21.0
pip install scikit-learn==1.2.2
Alignment Pretraining
bash scripts/v1_5/pretrain.sh \
--vision_tower /path/to/unitok/ckpt \
--data_path /path/to/blip_laion_cc_sbu_558k.json \
--image_folder /path/to/blip_laion_cc_sbu_558k/imgs \
--custom_encoder True --quantize True
Instruction Finetuning
bash scripts/v1_5/finetune.sh \
--vision_tower /path/to/unitok/ckpt \
--data_path /path/to/llava_v1_5_mix665k.json \
--image_folder /path/to/llava_v1_5_mix665k/imgs \
--custom_encoder True --quantize True
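Both training commands above take a checkpoint path, an annotation file, and an image folder, and a wrong path typically only surfaces after the job has started. A minimal pre-flight sketch, assuming nothing about the training scripts themselves: `check_paths` is a hypothetical helper, not part of this repository, and the paths are the placeholders from the commands above.

```shell
# Hypothetical helper (not part of the repo): report every argument
# that does not exist on disk and return non-zero if any is missing.
check_paths() {
  missing=0
  for p in "$@"; do
    if [ ! -e "$p" ]; then
      echo "missing: $p" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Placeholder paths copied from the finetuning command; replace them
# with your real locations before relying on the result.
check_paths /path/to/unitok/ckpt \
            /path/to/llava_v1_5_mix665k.json \
            /path/to/llava_v1_5_mix665k/imgs \
  || echo "fix the paths above before launching training"
```

The same helper can be called with the pretraining paths before running pretrain.sh.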
Please follow the instructions in LLaVA to evaluate the model on various VQA benchmarks.
cd eval/generation
Download ImageNet for class-conditional image generation training. Then extract the VQ codes:
bash scripts/autoregressive/extract_codes_c2i.sh \
--vq-ckpt /path/to/unitok/ckpt \
--data-path /path/to/imagenet/train \
--code-path /path/to/save/imagenet_code_c2i_flip_ten_crop \
--ten-crop --crop-range 1.1 --image-size 256
Before running, please configure nnodes, nproc_per_node, node_rank, master_addr, and master_port in train_c2i.sh.
bash scripts/autoregressive/train_c2i.sh \
--cloud-save-path /path/to/cloud_disk \
--code-path /path/to/imagenet_code_c2i_flip_ten_crop \
--num-output-layer 4 --gpt-model GPT-XL \
--num-codebooks 8 --vocab-size 32768 --image-size 256
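The five launch settings named above share their names with torchrun's command-line options. A minimal single-node sketch, assuming train_c2i.sh forwards them to torchrun (an assumption, since the script's contents are not shown here); all values are illustrative placeholders, not recommended settings:

```shell
# Illustrative values for one machine with 8 GPUs (assumed, adjust to your setup).
nnodes=1               # total number of machines in the job
nproc_per_node=8       # processes (usually GPUs) per machine
node_rank=0            # index of this machine, 0 .. nnodes-1
master_addr=127.0.0.1  # address of the rank-0 machine
master_port=29500      # any free TCP port on the rank-0 machine

# How these values would map onto torchrun's options, if the script uses torchrun.
launch_args="--nnodes=$nnodes --nproc_per_node=$nproc_per_node --node_rank=$node_rank --master_addr=$master_addr --master_port=$master_port"

# Print the resulting launcher prefix instead of starting a job.
echo "torchrun $launch_args"
```

For a multi-node run, every machine would use the same nnodes, master_addr, and master_port, with node_rank differing per machine.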
Note: To realize the full potential of UniTok, we suggest using GPT-L or larger generators for LlamaGen training.
bash scripts/autoregressive/sample_c2i.sh \
--vq-ckpt /path/to/unitok/ckpt \
--gpt-ckpt /path/to/llamagen/ckpt \
--gpt-model GPT-XL --num-output-layer 4 \
--num-codebooks 8 --codebook-size 32768 \
--image-size 256 --cfg-scale 1.15
For FID evaluation, please follow the instructions from LlamaGen to install the required packages and download the reference images.