[Reproduction Issue] Low Accuracy on MSVD/MSRVTT, No Results on NextQA, and Runtime Error on EgoSchema #5

@xiongyuaay

Description

Hi @authors,

First of all, thank you for open-sourcing this impressive work! I've tried to reproduce the results from your paper, but I encountered the following issues:

Environment

  • Python: 3.10.14
  • PyTorch: 2.2.0
  • CUDA: 12.1
  • Code version: main branch (commit bea0f73)

Issue 1: Low Accuracy on MSVD and MSRVTT Datasets

  • Expected (accuracy / GPT score):
    • MSVD: 79.1 / 4.1 (Table 1 in the paper)
    • MSRVTT: 65.8 / 3.6
  • Actual:
    • MSVD: 61.6 / 3.4
    • MSRVTT: 46.0 / 2.83
  • Config Used: cfgs/slowfast_llava_7b-resize-slow_10frms_spatial_1d_max_pool_fast_4x4-50_frms.yaml (same file as in the commands below)
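
For reference, I'm reading each result pair as accuracy / average GPT score from the Video-ChatGPT-style evaluation. Below is a minimal sketch of how I aggregate my per-sample judgment files to get these two numbers; the results.json layout and key names are my own assumptions, not necessarily the repo's evaluation output format:

import json

# Hypothetical per-sample judgment file; each entry is assumed to look
# like {"pred": "yes"|"no", "score": 1-5}. The repo's actual evaluation
# output may be structured differently.
with open("results.json") as f:
    judgments = json.load(f)

correct = sum(1 for j in judgments if j["pred"].lower() == "yes")
accuracy = 100.0 * correct / len(judgments)                       # first number, e.g. 61.6
mean_score = sum(j["score"] for j in judgments) / len(judgments)  # second number, e.g. 3.4
print(f"Accuracy: {accuracy:.1f}  Score: {mean_score:.2f}")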

Issue 2: No Results on NextQA Dataset

Steps:

python run_inference.py --exp_config ./cfgs/slowfast_llava_7b-resize-slow_10frms_spatial_1d_max_pool_fast_4x4-50_frms.yaml

Output:

:::: Start Inference ::::
evaluating nextqa ...
Loading checkpoint shards: 100%|██████████| 3/3 [01:28<00:00, 29.42s/it]
100%|██████████| 4996/4996 [00:00<00:00, 16708.75it/s]
0it [00:00, ?it/s]

Logs:

[2025-03-06 23:34:10,273] [INFO] [real_accelerator.py:161:get_accelerator] Setting ds_accelerator to cuda (auto detect)
...

The checkpoint loads and all 4996 annotation entries are iterated almost instantly, but the inference loop itself runs for zero iterations (0it), so no predictions are produced.
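
To rule out a data-path mismatch on my side (which would explain every sample being filtered out before the inference loop starts), I ran a quick check that each annotated video resolves to a file on disk. The paths and key names below are placeholders for my local setup, not the repo's canonical layout:

import json
from pathlib import Path

# Placeholder paths and key names for my local setup; adjust to the
# actual NextQA annotation format used by the repo.
anno_path = Path("playground/data/nextqa/val.json")
video_dir = Path("playground/data/nextqa/videos")

with open(anno_path) as f:
    samples = json.load(f)

missing = [s for s in samples if not (video_dir / f"{s['video_id']}.mp4").exists()]
print(f"{len(missing)} / {len(samples)} annotated samples have no matching video file")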

Issue 3: Runtime Error on EgoSchema Dataset

Steps:

python run_inference.py --exp_config ./cfgs/slowfast_llava_7b-resize-slow_10frms_spatial_1d_max_pool_fast_4x4-50_frms.yaml

Error:

evaluating egoschema ...
Loading checkpoint shards: 100%|██████████| 3/3 [01:35<00:00, 31.73s/it]
  0%|          | 0/500 [00:00<?, ?it/s]
The `seq_len` argument is deprecated and unused. It will be removed in v4.39.
  0%|          | 2/500 [00:27<1:53:40, 13.70s/it]
Traceback (most recent call last):
  File "/root/private_data/ml-slowfast-llava/run_inference_multiple_choice_qa.py", line 182, in <module>
    run_inference(args)
  File "/root/private_data/ml-slowfast-llava/run_inference_multiple_choice_qa.py", line 133, in run_inference
    output = llava_inference(
  File "/root/private_data/ml-slowfast-llava/run_inference_multiple_choice_qa.py", line 54, in llava_inference
    output_ids = model.generate(
  File "/opt/conda/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/root/private_data/ml-slowfast-llava/slowfast_llava/llava/model/language_model/llava_llama.py", line 138, in generate
    return super().generate(
  File "/opt/conda/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/transformers/generation/utils.py", line 1544, in generate
    return self.greedy_search(
  File "/opt/conda/lib/python3.10/site-packages/transformers/generation/utils.py", line 2404, in greedy_search
    outputs = self(
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/private_data/ml-slowfast-llava/slowfast_llava/llava/model/language_model/llava_llama.py", line 91, in forward
    return super().forward(
  File "/opt/conda/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 1176, in forward
    outputs = self.model(
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 993, in forward
    causal_mask = self._update_causal_mask(attention_mask, inputs_embeds)
  File "/opt/conda/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 1079, in _update_causal_mask
    padding_mask = causal_mask[..., :mask_length].eq(0.0) * attention_mask[:, None, None, :].eq(0.0)
RuntimeError: The size of tensor a (4096) must match the size of tensor b (4097) at non-singleton dimension 3
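
From the shapes in the error (4096 vs 4097), this looks like the sequence hitting LLaMA's 4096-token context window mid-generation: with 50 fast frames the expanded visual tokens take up most of the budget, and the causal-mask update fails on the first decoding step past the limit. As a local workaround inside llava_inference (run_inference_multiple_choice_qa.py, which the traceback names), I tried bounding max_new_tokens so the sequence never crosses 4096. This is my own sketch, not the authors' fix, and the visual-token estimate is a placeholder:

# Workaround sketch (my assumption, not the authors' fix): keep
# prompt + generated tokens within the model's context window.
# `model` and `input_ids` are the locals already present in
# llava_inference before the generate() call.
context_limit = getattr(model.config, "max_position_embeddings", 4096)

# LLaVA expands the video placeholder into many visual tokens inside
# the model, so the effective prompt is much longer than input_ids;
# this count is a placeholder that depends on the slow/fast pooling.
est_visual_tokens = 3600
effective_prompt_len = input_ids.shape[1] + est_visual_tokens

budget = max(context_limit - effective_prompt_len - 1, 1)
output_ids = model.generate(
    input_ids,
    do_sample=False,                  # greedy, matching the traceback
    max_new_tokens=min(128, budget),  # never step past position 4096
)

With this cap the crash goes away for me, but answers for long prompts get truncated, which suggests the real issue is the input length rather than the generation settings.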

Questions

  • Are there any dataset-specific hyperparameters not mentioned in the repo?
  • Is any additional data preprocessing required for the NextQA or EgoSchema datasets?

Looking forward to your feedback!
