Conversation

@i-gao (Collaborator) commented Sep 21, 2023

Modular eval code

TODOs:

  • test an eval for each dataset

@liyongqi67 commented Oct 4, 2023

Has the evaluation code on this branch been tested? I ran it with the --eval_flickr30 flag and it reported this error:

  File "/home/share/yongqi/project/open_flamingo/open_flamingo/src/helpers.py", line 240, in forward
    assert (
AssertionError: current text cannot be longer than conditioned media locations

My script is:

CUDA_VISIBLE_DEVICES=3,4,6,7 torchrun --nnodes=1 --nproc_per_node=4 --master_port=1997 ./open_flamingo/eval/evaluate.py \
    --model_family flamingo \
    --vision_encoder_path ViT-L-14 \
    --vision_encoder_pretrained openai \
    --lm_path anas-awadalla/mpt-1b-redpajama-200b-hf-style \
    --tokenizer_path anas-awadalla/mpt-1b-redpajama-200b-hf-style \
    --cross_attn_every_n_layers 1 \
    --results_file results.json \
    --precision fp32 \
    --batch_size 1 \
    --eval_flickr30 \
    --shots 0

I printed the two corresponding lengths by adding print(x.shape[1], media_locations.shape[1]) in helpers.py just before the assertion on line 240, and got:

47 47   (printed 24 times)
48 47

On the last call, x.shape[1] (48) exceeds media_locations.shape[1] (47), which triggers the assertion.
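
For reference, here is a minimal sketch of the check that fires, reconstructed from the traceback message and the print above (the variable names are the ones printed; the exact condition in helpers.py may differ):

    # Hypothetical reconstruction of the assertion around helpers.py line 240
    # (not the verbatim source). x is the text-side tensor, media_locations
    # marks which text positions are conditioned on an image.
    assert (
        x.shape[1] <= media_locations.shape[1]
    ), "current text cannot be longer than conditioned media locations"
    # During generation the text grows by one token per step (47 -> 48) while
    # media_locations presumably stays at the prompt length (47), so the check fires.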

If I set batch_size=2, it reports a different error:

  File "/home/share/yongqi/project/open_flamingo/open_flamingo/src/helpers.py", line 273, in forward
    sim = sim.masked_fill(~text_to_media_mask, -torch.finfo(sim.dtype).max)
RuntimeError: The size of tensor a (2) must match the size of tensor b (6) at non-singleton dimension 0
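
For context, this second failure is a generic shape-broadcast error from masked_fill, suggesting the mask was built with a different leading (batch) dimension than sim. A minimal standalone reproduction of the same error class (dimensions chosen only to mirror the 2-vs-6 mismatch, not taken from the actual code):

    import torch

    # masked_fill broadcasts the mask against the tensor, so mismatched batch
    # dimensions (here 2 vs 6) raise the same RuntimeError as in the traceback.
    sim = torch.zeros(2, 4, 5)
    text_to_media_mask = torch.ones(6, 4, 5, dtype=torch.bool)
    sim = sim.masked_fill(~text_to_media_mask, -torch.finfo(sim.dtype).max)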

@liyongqi67 commented

In evaluate.py, line 747, the call should be changed from

        outputs = eval_model.get_outputs(
            batch_images=batch_images,
            batch_text=batch_text,
            min_generation_length=min_generation_length,
            max_generation_length=max_generation_length,
            num_beams=num_beams,
            length_penalty=length_penalty,
        )

to

        outputs = eval_model.get_outputs(
            batch_images=batch_images,
            batch_text=batch_text,
            min_new_tokens=min_generation_length,
            max_new_tokens=max_generation_length,
            num_beams=num_beams,
            length_penalty=length_penalty,
        )

This is because min_new_tokens and max_new_tokens are the keyword arguments that the LLM's generate() method accepts.
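
As a quick sanity check of the argument names, here is a minimal sketch of a Hugging Face generate() call using these keyword arguments (gpt2 is only a placeholder model, and this assumes a transformers version recent enough to support min_new_tokens):

    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Placeholder model; any causal LM exposing generate() takes the same kwargs.
    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    inputs = tokenizer("A photo of", return_tensors="pt")
    outputs = model.generate(
        **inputs,
        min_new_tokens=8,    # generate() expects *_new_tokens,
        max_new_tokens=20,   # not *_generation_length
        num_beams=3,
        length_penalty=1.0,
    )
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))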

@i-gao (Collaborator, Author) commented Oct 4, 2023

Hi @liyongqi67, thanks for pointing out these issues! Sorry, I have not finished cleaning up this gnarly merge yet -- will get to it in the next few days.

@liyongqi67 commented

> Hi @liyongqi67, thanks for pointing out these issues! Sorry, I have not finished cleaning up this gnarly merge yet -- will get to it in the next few days.

Many thanks for your effort.
