## Understanding Tasks

### Setup

```bash
cd eval/understanding
```

Please follow the instructions in LLaVA to download the data for alignment pretraining and instruction finetuning. Then install the packages required for training:

```bash
pip install transformers==4.37.2 deepspeed==0.12.6 peft==0.13.2
pip install flash-attn --no-build-isolation
pip install sentencepiece==0.1.99
pip install accelerate==0.21.0
pip install scikit-learn==1.2.2
```
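
These pins may conflict with an existing setup, so you may prefer to install them into a fresh environment first. A minimal sketch, assuming conda is available (the environment name `unitok-eval` is illustrative, not from the repo):

```bash
# Optional: isolate the pinned dependencies in a fresh environment.
# "unitok-eval" is an example name, not one used by the repo.
conda create -n unitok-eval python=3.10 -y
conda activate unitok-eval
```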

### LLaVA Training

#### Alignment Pretraining

```bash
bash scripts/v1_5/pretrain.sh \
    --vision_tower /path/to/unitok/ckpt \
    --data_path /path/to/blip_laion_cc_sbu_558k.json \
    --image_folder /path/to/blip_laion_cc_sbu_558k/imgs \
    --custom_encoder True --quantize True
```

#### Instruction Finetuning

```bash
bash scripts/v1_5/finetune.sh \
    --vision_tower /path/to/unitok/ckpt \
    --data_path /path/to/llava_v1_5_mix665k.json \
    --image_folder /path/to/llava_v1_5_mix665k/imgs \
    --custom_encoder True --quantize True
```

### LLaVA Evaluation

Please follow the instructions in LLaVA to evaluate the model on the various VQA benchmarks.
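
LLaVA ships one evaluation script per benchmark under `scripts/v1_5/eval/`; a typical single-GPU invocation looks like the sketch below (the benchmark script is one example, and the checkpoint path configured inside the script must point to your finetuned model):

```bash
# Example: run one VQA benchmark with LLaVA's per-benchmark eval script.
# Adjust the checkpoint path configured inside the script beforehand.
CUDA_VISIBLE_DEVICES=0 bash scripts/v1_5/eval/textvqa.sh
```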

## Generation Tasks

### Setup

```bash
cd eval/generation
```

Download ImageNet for class-conditional image generation training. Then extract the VQ codes:

```bash
bash scripts/autoregressive/extract_codes_c2i.sh \
    --vq-ckpt /path/to/unitok/ckpt \
    --data-path /path/to/imagenet/train \
    --code-path /path/to/save/imagenet_code_c2i_flip_ten_crop \
    --ten-crop --crop-range 1.1 --image-size 256
```
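
Extraction over the full ImageNet training set can take a while; a quick sanity check is to confirm that code files were actually written. This sketch assumes the script stores codes as `.npy` files, which may not match the actual output layout:

```bash
# Sanity check: count extracted code files (.npy layout is an assumption).
find /path/to/save/imagenet_code_c2i_flip_ten_crop -name '*.npy' | wc -l
```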

### LlamaGen Training

Before running, please configure `nnodes`, `nproc_per_node`, `node_rank`, `master_addr`, and `master_port` in `train_c2i.sh`.
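
For reference, a single-node run on 8 GPUs might use values like the following (illustrative only; adapt to your cluster):

```bash
# Illustrative distributed settings for a single node with 8 GPUs.
nnodes=1
nproc_per_node=8
node_rank=0
master_addr=127.0.0.1
master_port=29500
```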

```bash
bash scripts/autoregressive/train_c2i.sh \
    --cloud-save-path /path/to/cloud_disk \
    --code-path /path/to/imagenet_code_c2i_flip_ten_crop \
    --num-output-layer 4 --gpt-model GPT-XL \
    --num-codebooks 8 --vocab-size 32768 --image-size 256
```

Note: to realize the full potential of UniTok, we recommend using GPT-L or larger generators for LlamaGen training.

### LlamaGen Sampling

```bash
bash scripts/autoregressive/sample_c2i.sh \
    --vq-ckpt /path/to/unitok/ckpt \
    --gpt-ckpt /path/to/llamagen/ckpt \
    --gpt-model GPT-XL --num-output-layer 4 \
    --num-codebooks 8 --codebook-size 32768 \
    --image-size 256 --cfg-scale 1.15
```

For FID evaluation, please follow the instructions in LlamaGen to install the required packages and download the reference images.
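
As a rough sketch, LlamaGen computes FID with the ADM evaluation suite by comparing the sampled `.npz` file against the ImageNet-256 reference batch; the script path and file names below follow that convention but may differ in your checkout:

```bash
# Compare generated samples against the ImageNet-256 reference batch.
# Script location and file names are assumptions based on the
# ADM/LlamaGen evaluation setup.
python evaluations/c2i/evaluator.py \
    /path/to/VIRTUAL_imagenet256_labeled.npz \
    /path/to/generated_samples.npz
```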