Understanding Tasks

Setup

cd eval/understanding

Please follow the instructions in LLaVA to download the data for alignment pretraining and instruction finetuning. Then install the packages required for training:

 pip install transformers==4.37.2 deepspeed==0.12.6 peft==0.13.2
 pip install flash-attn --no-build-isolation
 pip install sentencepiece==0.1.99
 pip install accelerate==0.21.0
 pip install scikit-learn==1.2.2
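The pinned versions above can also be kept in a requirements file so the environment is easy to reproduce. The following is a minimal sketch, not part of the repository: the file name `requirements-understanding.txt` and the `RUN_INSTALL` switch are hypothetical names chosen for illustration. flash-attn is installed separately because it needs the `--no-build-isolation` flag.

```shell
# Hypothetical file name; override with REQ_FILE if you prefer another location.
req_file="${REQ_FILE:-requirements-understanding.txt}"

# Same pins as the pip commands above.
cat > "$req_file" <<'EOF'
transformers==4.37.2
deepspeed==0.12.6
peft==0.13.2
sentencepiece==0.1.99
accelerate==0.21.0
scikit-learn==1.2.2
EOF
echo "wrote $req_file"

# Hypothetical opt-in switch: set RUN_INSTALL=1 to actually install.
if [ "${RUN_INSTALL:-0}" = "1" ]; then
    pip install -r "$req_file"
    pip install flash-attn --no-build-isolation
fi
```

Running the block as is only writes the file; `RUN_INSTALL=1` performs the installation.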

LLaVA Training

Alignment Pretraining

bash scripts/v1_5/pretrain.sh \
    --vision_tower /path/to/unitok/ckpt \
    --data_path /path/to/blip_laion_cc_sbu_558k.json \
    --image_folder /path/to/blip_laion_cc_sbu_558k/imgs \
    --custom_encoder True --quantize True

Instruction Finetuning

bash scripts/v1_5/finetune.sh \
    --vision_tower /path/to/unitok/ckpt \
    --data_path /path/to/llava_v1_5_mix665k.json \
    --image_folder /path/to/llava_v1_5_mix665k/imgs \
    --custom_encoder True --quantize True
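Both training stages run for a long time, so it can help to confirm that the checkpoint, annotation file, and image folder exist before launching. The helper below is a minimal sketch; `check_paths` is a hypothetical function, not part of the repository, and the paths are the placeholders from the commands above.

```shell
# Hypothetical helper: report every missing path, return non-zero if any is absent.
check_paths() {
    missing=0
    for p in "$@"; do
        if [ ! -e "$p" ]; then
            echo "missing: $p" >&2
            missing=$((missing + 1))
        fi
    done
    [ "$missing" -eq 0 ]
}

# Placeholder paths; replace them with your own before use.
if check_paths /path/to/unitok/ckpt \
               /path/to/llava_v1_5_mix665k.json \
               /path/to/llava_v1_5_mix665k/imgs; then
    echo "all inputs found"
else
    echo "fix the paths listed above before launching finetune.sh"
fi
```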

LLaVA Evaluation

Please follow the instructions in LLaVA to evaluate the model on various VQA benchmarks.

Generation Tasks

Setup

cd eval/generation

Download ImageNet for class-conditional image generation training. Then extract the VQ codes:

bash scripts/autoregressive/extract_codes_c2i.sh \
    --vq-ckpt /path/to/unitok/ckpt \
    --data-path /path/to/imagenet/train \
    --code-path /path/to/save/imagenet_code_c2i_flip_ten_crop \
    --ten-crop --crop-range 1.1 --image-size 256

LlamaGen Training

Before running, please configure nnodes, nproc_per_node, node_rank, master_addr, master_port in train_c2i.sh.
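As an illustration, single-node values for these variables might look like the sketch below. It assumes that train_c2i.sh forwards them to `torchrun` options of the same name, which should be checked against the script itself; the GPU count, address, and port are hypothetical defaults, not values from the repository.

```shell
# Hypothetical single-node defaults; adjust to your cluster.
nnodes="${nnodes:-1}"                     # number of machines
nproc_per_node="${nproc_per_node:-8}"     # GPUs per machine
node_rank="${node_rank:-0}"               # index of this machine, 0 on the master
master_addr="${master_addr:-127.0.0.1}"   # address of the rank-0 machine
master_port="${master_port:-29500}"       # any free port on the rank-0 machine

# Assumed launcher line; printed here rather than executed.
launch_cmd="torchrun --nnodes=$nnodes --nproc_per_node=$nproc_per_node --node_rank=$node_rank --master_addr=$master_addr --master_port=$master_port"
echo "$launch_cmd"
```

For multi-node runs, every machine uses the same `master_addr` and `master_port`, and `node_rank` differs per machine.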

bash scripts/autoregressive/train_c2i.sh \
    --cloud-save-path /path/to/cloud_disk \
    --code-path /path/to/imagenet_code_c2i_flip_ten_crop \
    --num-output-layer 4 --gpt-model GPT-XL \
    --num-codebooks 8 --vocab-size 32768 --image-size 256

Note: To realize the full potential of UniTok, we suggest using GPT-L or a larger generator for LlamaGen training.

LlamaGen Sampling

bash scripts/autoregressive/sample_c2i.sh \
    --vq-ckpt /path/to/unitok/ckpt \
    --gpt-ckpt /path/to/llamagen/ckpt \
    --gpt-model GPT-XL --num-output-layer 4 \
    --num-codebooks 8 --codebook-size 32768 \
    --image-size 256 --cfg-scale 1.15

For FID evaluation, please follow the instructions in LlamaGen to install the required packages and download the reference images.
