Not using distributed mode
[18:17:54.955532] job dir: /home/23031212503/projects/Flipped-VQA
[18:17:54.955618] Namespace(batch_size=1,
epochs=5,
accum_iter=4,
llama_model_path='./pretrained/llama/',
model='7B',
adapter_layer=32,
adapter_len=10,
max_seq_len=650,
max_feats=10,
weight_decay=0.02,
lr=None,
blr=0.07,
min_lr=0.0,
warmup_epochs=2,
dataset='tvqa',
output_dir='./checkpoint/tvqa',
device='cuda',
seed=0,
resume='',
start_epoch=0,
num_workers=2,
pin_mem=True,
world_size=1,
local_rank=-1,
dist_on_itp=False,
dist_url='env://',
vaq=True,
qav=True,
bias=3.0,
tau=100.0,
sub=True,
distributed=False)
[18:18:16.740925] Num train data: 122039
[18:18:24.026051] Num val data: 15253
[18:18:24.039350] Using model: 7B
[18:18:24.041255] loading from pretrained/llama/7B/consolidated.00.pth
[18:19:13.553202] base lr: 7.00e-02
[18:19:13.553243] actual lr: 1.09e-03
[18:19:13.553254] accumulate grad iterations: 4
[18:19:13.553258] effective batch size: 4
[18:19:13.554187] AdamW (
Parameter Group 0
amsgrad: False
betas: (0.9, 0.95)
capturable: False
eps: 1e-08
foreach: None
lr: 0.00109375
maximize: False
weight_decay: 0.0
Parameter Group 1
amsgrad: False
betas: (0.9, 0.95)
capturable: False
eps: 1e-08
foreach: None
lr: 0.00109375
maximize: False
weight_decay: 0.02
)
[18:19:13.554305] Start training for 5 epochs
[18:19:17.576096] Epoch: [0] [ 0/122039] eta: 5 days, 16:15:56 lr: 0.000000 loss: 5.6871 (5.6871) vqa_loss: 1.4844 (1.4844) vaq_loss: 1.8125 (1.8125) qav_loss: 2.3903 (2.3903) time: 4.0197 data: 0.7782 max mem: 37679
[18:19:23.617162] Loss is nan, stopping training
But according to the printed log, the loss is not NaN.
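One possible explanation, offered as a guess rather than a diagnosis: the printed line is for iteration 0 (logged at 18:19:17), while the stop message appears about six seconds later, so it may refer to a later iteration whose loss was never printed. The sketch below assumes the training loop checks the loss with `math.isfinite` on every iteration; the helper `non_finite_terms` and the `step1` values are hypothetical and are not taken from the repository. It only shows how each loss term could be checked separately to see which one diverged.

```python
import math


def non_finite_terms(losses):
    """Return the names of loss terms that are NaN or infinite (hypothetical helper)."""
    return [name for name, value in losses.items() if not math.isfinite(value)]


# Values printed for iteration 0 in the log above: every term is finite.
step0 = {"vqa_loss": 1.4844, "vaq_loss": 1.8125, "qav_loss": 2.3903}

# Made-up values for a later iteration, purely for illustration.
step1 = {"vqa_loss": 1.5, "vaq_loss": float("nan"), "qav_loss": 2.4}

print(non_finite_terms(step0))
print(non_finite_terms(step1))
```

Logging the individual terms just before the stop check would show whether one of `vqa_loss`, `vaq_loss`, or `qav_loss` is the source.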
The command is the training command from the README, with the distributed-training arguments removed:
python train.py --model 7B --max_seq_len 650 --batch_size 1 --epochs 5 --warmup_epochs 2 --bias 3 --tau 100. --max_feats 10 --dataset tvqa --blr 7e-2 --weight_decay 0.02 --output_dir ./checkpoint/tvqa --dataset tvqa --accum_iter 4 --sub --vaq --qav
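For reference, the logged learning rate is consistent with these arguments. The sketch below assumes the script scales `--blr` by the effective batch size divided by 256; that rule and the function name `scaled_lr` are assumptions inferred from the logged values, not confirmed from the code.

```python
def scaled_lr(blr, batch_size, accum_iter, world_size=1, base=256):
    """Scale a base learning rate by the effective batch size (assumed rule)."""
    effective_batch_size = batch_size * accum_iter * world_size
    return blr * effective_batch_size / base


# Arguments from the command above: --blr 7e-2 --batch_size 1 --accum_iter 4
lr = scaled_lr(blr=7e-2, batch_size=1, accum_iter=4)
print(f"{lr:.8f}")  # 0.00109375, matching "lr: 0.00109375" in the optimizer printout
```

With the distributed arguments removed, the effective batch size drops to 4, so the actual learning rate is lower than it would be for the same `--blr` on a multi-GPU run.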