
Commit 8dd4ac0: add (#58)
1 parent 84a7660
16 files changed, +11754 -96 lines

README.md

Lines changed: 62 additions & 28 deletions
@@ -27,6 +27,11 @@ The vision features (detr, resnet, clip, vit) are available at https://huggingfa
Alternatively, you may download the extracted vision features (detr, resnet, clip) from [vision_features](https://drive.google.com/file/d/13B0hc_F_45-UlqPLKSgRz-ALtFQ8kIJr/view?usp=share_link) and unzip the files under `vision_features`.

## Extract Features (optional)
+
+The processed vision features for ScienceQA are available at https://huggingface.co/cooelf/vision_features/tree/main.
+
+The following instructions show how we obtain those features.
+
Download the image files from [Google Drive](https://drive.google.com/drive/folders/1w8imCXWYn2LxajmGeGH_g5DaL2rabHev?usp=sharing) and unzip all the images (train, dev, test) into the same folder. The structure should be:

```
@@ -43,54 +48,83 @@ images
│ └── image.png
```

-Run ```extract_features.py --data_root images --output_dir vision_features --img_type detr```
+Run ```extract_features.py --data_root images --output_dir vision_features --img_type vit```

If you want to use your own images, please organize them as shown above, or modify the script ```extract_features.py```.
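To make the feature-extraction step concrete, here is a minimal sketch of pulling ViT features with Hugging Face Transformers. The checkpoint name, the per-split loop, and the output file name are assumptions for illustration, not necessarily what ```extract_features.py``` actually does.

```
# Illustrative sketch only; the backbone checkpoint and output layout are assumptions,
# not the actual behavior of extract_features.py.
import os
import numpy as np
import torch
from PIL import Image
from transformers import ViTImageProcessor, ViTModel

ckpt = "google/vit-base-patch16-224-in21k"  # assumed backbone
processor = ViTImageProcessor.from_pretrained(ckpt)
model = ViTModel.from_pretrained(ckpt).eval()

features = {}
for split in ("train", "dev", "test"):
    split_dir = os.path.join("images", split)
    for problem_id in sorted(os.listdir(split_dir)):
        image = Image.open(os.path.join(split_dir, problem_id, "image.png")).convert("RGB")
        inputs = processor(images=image, return_tensors="pt")
        with torch.no_grad():
            # last_hidden_state: (1, num_patches + 1, hidden_size)
            features[problem_id] = model(**inputs).last_hidden_state.squeeze(0).numpy()

os.makedirs("vision_features", exist_ok=True)
np.save(os.path.join("vision_features", "vit.npy"), features)  # assumed output name
```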

+## Extract Captions (optional)
+
+The processed captions for ScienceQA are available at ```data/instruct_captions.json```.
+
+The following instructions show how we obtain those captions.
+
+Install LAVIS and prepare the Vicuna weights to use InstructBLIP for caption extraction:
+
+https://github.com/salesforce/LAVIS/tree/f982acc73288408bceda2d35471a8fcf55aa04ca/projects/instructblip
+
+Assume that the images are stored in the ```images``` folder.
+
+```
+python extract_caption.py
+```
+
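For reference, a caption-extraction loop with LAVIS's InstructBLIP could look roughly like the sketch below. The model name and type follow the LAVIS InstructBLIP project page, while the prompt, directory layout, and output path are assumptions rather than the exact contents of ```extract_caption.py```.

```
# Hypothetical sketch of InstructBLIP captioning via LAVIS; not the repo's extract_caption.py.
import json
import os
import torch
from PIL import Image
from lavis.models import load_model_and_preprocess

device = "cuda" if torch.cuda.is_available() else "cpu"
# "blip2_vicuna_instruct" / "vicuna7b" follow the LAVIS InstructBLIP page;
# this assumes the Vicuna weights have already been prepared as LAVIS requires.
model, vis_processors, _ = load_model_and_preprocess(
    name="blip2_vicuna_instruct", model_type="vicuna7b", is_eval=True, device=device
)

captions = {}
for split in ("train", "dev", "test"):
    split_dir = os.path.join("images", split)
    for problem_id in sorted(os.listdir(split_dir)):
        raw = Image.open(os.path.join(split_dir, problem_id, "image.png")).convert("RGB")
        image = vis_processors["eval"](raw).unsqueeze(0).to(device)
        captions[problem_id] = model.generate({"image": image, "prompt": "Describe the image."})[0]

with open("data/instruct_captions.json", "w") as f:  # assumed output path
    json.dump(captions, f, indent=2)
```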
## Instructions

### Training

```
# rationale generation
-CUDA_VISIBLE_DEVICES=0,1 python main.py \
-    --model allenai/unifiedqa-t5-base \
-    --user_msg rationale --img_type detr \
-    --bs 8 --eval_bs 4 --eval_acc 10 --output_len 512 \
-    --final_eval --prompt_format QCM-LE
+CUDA_VISIBLE_DEVICES=0,1,2,3 python main.py \
+    --data_root data/ScienceQA/data \
+    --caption_file data/instruct_captions.json \
+    --model declare-lab/flan-alpaca-large \
+    --user_msg rationale --img_type vit \
+    --bs 2 --eval_bs 4 --epoch 50 --lr 5e-5 --output_len 512 \
+    --use_caption --use_generate --prompt_format QCM-E \
+    --output_dir experiments

# answer inference
-CUDA_VISIBLE_DEVICES=0,1 python main.py \
-    --model allenai/unifiedqa-t5-base \
-    --user_msg answer --img_type detr \
-    --bs 8 --eval_bs 4 --eval_acc 10 --output_len 64 \
-    --final_eval --prompt_format QCMG-A \
-    --eval_le experiments/rationale_allenai-unifiedqa-t5-base_detr_QCM-LE_lr5e-05_bs16_op512_ep20/predictions_ans_eval.json \
-    --test_le experiments/rationale_allenai-unifiedqa-t5-base_detr_QCM-LE_lr5e-05_bs16_op512_ep20/predictions_ans_test.json
+CUDA_VISIBLE_DEVICES=0,1,2,3 python main_central.py \
+    --data_root data/ScienceQA/data \
+    --caption_file data/instruct_captions.json \
+    --model declare-lab/flan-alpaca-large \
+    --user_msg answer --img_type vit \
+    --bs 4 --eval_bs 8 --epoch 50 --lr 5e-5 --output_len 64 \
+    --use_caption --use_generate --prompt_format QCMG-A \
+    --output_dir experiments \
+    --eval_le experiments/rationale_declare-lab-flan-alpaca-large_vit_QCM-E_lr5e-05_bs8_op512_ep50/predictions_ans_eval.json \
+    --test_le experiments/rationale_declare-lab-flan-alpaca-large_vit_QCM-E_lr5e-05_bs8_op512_ep50/predictions_ans_test.json
+
```
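Note that answer inference depends on the rationale stage: ```--eval_le``` and ```--test_le``` point at the rationale run's prediction files, so the two commands must run in sequence. A small wrapper along these lines could chain them; the experiment directory name is copied from the flags above and may differ for your run.

```
# Sketch: run the two training stages in order; paths and flags mirror the README commands above.
import subprocess

rationale_dir = "experiments/rationale_declare-lab-flan-alpaca-large_vit_QCM-E_lr5e-05_bs8_op512_ep50"

# Stage 1: rationale generation
subprocess.run(
    "CUDA_VISIBLE_DEVICES=0,1,2,3 python main.py "
    "--data_root data/ScienceQA/data --caption_file data/instruct_captions.json "
    "--model declare-lab/flan-alpaca-large --user_msg rationale --img_type vit "
    "--bs 2 --eval_bs 4 --epoch 50 --lr 5e-5 --output_len 512 "
    "--use_caption --use_generate --prompt_format QCM-E --output_dir experiments",
    shell=True, check=True,
)

# Stage 2: answer inference, consuming the rationale predictions
subprocess.run(
    "CUDA_VISIBLE_DEVICES=0,1,2,3 python main_central.py "
    "--data_root data/ScienceQA/data --caption_file data/instruct_captions.json "
    "--model declare-lab/flan-alpaca-large --user_msg answer --img_type vit "
    "--bs 4 --eval_bs 8 --epoch 50 --lr 5e-5 --output_len 64 "
    "--use_caption --use_generate --prompt_format QCMG-A --output_dir experiments "
    f"--eval_le {rationale_dir}/predictions_ans_eval.json "
    f"--test_le {rationale_dir}/predictions_ans_test.json",
    shell=True, check=True,
)
```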
### Inference

-Our trained models are available at [models](https://drive.google.com/file/d/1FtTYOJPHnWnFfCxNC6M3gar4RAX5E21b/view?usp=share_link). To use our trained models, please put them under the ```models``` folder.
+Our trained models are available at https://huggingface.co/cooelf/mm-cot/tree/main. To use our trained models, please put them under the ```models``` folder.
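If you would rather fetch the checkpoints from a script than through the browser, something like the following should work (a sketch using ```huggingface_hub```; the local directory name simply mirrors the ```models``` folder mentioned above):

```
# Sketch: download the released MM-CoT checkpoints into ./models; adjust local_dir as needed.
from huggingface_hub import snapshot_download

snapshot_download(repo_id="cooelf/mm-cot", local_dir="models")
```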

```
# rationale generation
-CUDA_VISIBLE_DEVICES=0,1 python main.py \
-    --model allenai/unifiedqa-t5-base \
-    --user_msg rationale --img_type detr \
-    --bs 8 --eval_bs 4 --eval_acc 10 --output_len 512 \
-    --final_eval --prompt_format QCM-LE \
-    --evaluate_dir models/MM-CoT-UnifiedQA-base-Rationale
+CUDA_VISIBLE_DEVICES=0,1,2,3 python main.py \
+    --data_root data/ScienceQA/data \
+    --caption_file data/instruct_captions.json \
+    --model declare-lab/flan-alpaca-large \
+    --user_msg rationale --img_type vit \
+    --bs 2 --eval_bs 4 --epoch 50 --lr 5e-5 --output_len 512 \
+    --use_caption --use_generate --prompt_format QCM-E \
+    --output_dir experiments \
+    --evaluate_dir models/mm-cot-large-rationale

# answer inference
-CUDA_VISIBLE_DEVICES=0,1 python main.py \
-    --model allenai/unifiedqa-t5-base \
-    --user_msg answer --img_type detr \
-    --bs 8 --eval_bs 4 --eval_acc 10 --output_len 64 \
-    --final_eval --prompt_format QCMG-A \
-    --eval_le models/rationale/predictions_ans_eval.json \
-    --test_le models/rationale/predictions_ans_test.json \
-    --evaluate_dir models/MM-CoT-UnifiedQA-base-Answer
+CUDA_VISIBLE_DEVICES=0,1,2,3 python main_central.py \
+    --data_root data/ScienceQA/data \
+    --caption_file data/instruct_captions.json \
+    --model declare-lab/flan-alpaca-large \
+    --user_msg answer --img_type vit \
+    --bs 4 --eval_bs 8 --epoch 50 --lr 5e-5 --output_len 64 \
+    --use_caption --use_generate --prompt_format QCMG-A \
+    --output_dir experiments \
+    --eval_le experiments/rationale_declare-lab-flan-alpaca-large_vit_QCM-E_lr5e-05_bs8_op512_ep50/predictions_ans_eval.json \
+    --test_le experiments/rationale_declare-lab-flan-alpaca-large_vit_QCM-E_lr5e-05_bs8_op512_ep50/predictions_ans_test.json \
+    --evaluate_dir models/mm-cot-large-answer
```

## Citing MM-CoT
Several binary files (including __pycache__/model.cpython-39.pyc) changed; binary contents not shown.
