Hi @authors,
First of all, thank you for open-sourcing this impressive work! I've tried to reproduce the results from your paper, but I encountered the following issues:
Environment
- Python: 3.10.14
- PyTorch: 2.2.0
- CUDA: 12.1
- Code version: main branch (commit bea0f73)
Issue 1: Low Accuracy on MSVD and MSRVTT Datasets
- Expected (accuracy/score, Table 1 of the paper):
  - MSVD: 79.1/4.1
  - MSRVTT: 65.8/3.6
- Actual:
  - MSVD: 61.6/3.4
  - MSRVTT: 46.0/2.83
- Config used:
  cfgs/slowfast_llava_7b-resize-slow_10frms_spatial_1d_max_pool_fast_4x4-50_frms.yaml
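To rule out a silent config mismatch on my side, I dump the parsed YAML before launching inference. This is just a sanity-check sketch; it assumes nothing beyond PyYAML and the path above:

```python
# Sanity check: print the parsed config to confirm the frame-sampling and
# pooling settings that actually reach the inference script.
import yaml

cfg_path = "cfgs/slowfast_llava_7b-resize-slow_10frms_spatial_1d_max_pool_fast_4x4-50_frms.yaml"
with open(cfg_path) as f:
    cfg = yaml.safe_load(f)
print(yaml.dump(cfg, sort_keys=False))
```

If the dumped values differ from what the paper used, that could explain part of the gap; I'd be happy to compare against the exact settings behind Table 1.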
Issue 2: No Results on NextQA Dataset
Steps:
python run_inference.py --exp_config ./cfgs/slowfast_llava_7b-resize-slow_10frms_spatial_1d_max_pool_fast_4x4-50_frms.yaml
Output:
:::: Start Inference ::::
evaluating nextqa ...
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [01:28<00:00, 29.42s/it]
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4996/4996 [00:00<00:00, 16708.75it/s]
0it [00:00, ?it/s]
Logs:
[2025-03-06 23:34:10,273] [INFO] [real_accelerator.py:161:get_accelerator] Setting ds_accelerator to cuda (auto detect)
...
The output indicates that the model loaded and the 4996 NextQA questions were enumerated almost instantly, but the inference loop itself ran for 0 iterations and no result files were produced.
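Since the question list is read but the loop never runs, my guess is that the sample list comes back empty, e.g. because the question file and the video files don't line up. A quick check I'd run first (a sketch; both paths are hypothetical placeholders, substitute whatever your config points at):

```python
# Sketch: confirm the NextQA annotations and videos are where the config
# expects them. Both paths below are hypothetical placeholders.
import json
import os

anno_path = "playground/gt_qa_files/nextqa/val.json"  # placeholder path
video_dir = "playground/data/nextqa/videos"           # placeholder path

if os.path.isfile(anno_path):
    with open(anno_path) as f:
        print(f"{len(json.load(f))} annotation entries")
else:
    print(f"annotation file missing: {anno_path}")

n_videos = len(os.listdir(video_dir)) if os.path.isdir(video_dir) else 0
print(f"{n_videos} files in {video_dir}")
```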
Issue 3: Runtime Error on EgoSchema Dataset
Steps:
python run_inference.py --exp_config ./cfgs/slowfast_llava_7b-resize-slow_10frms_spatial_1d_max_pool_fast_4x4-50_frms.yaml
Error:
evaluating egoschema ...
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [01:35<00:00, 31.73s/it]
0%| | 0/500 [00:00<?, ?it/s]
The `seq_len` argument is deprecated and unused. It will be removed in v4.39.
0%|▌ | 2/500 [00:27<1:53:40, 13.70s/it]
Traceback (most recent call last):
File "/root/private_data/ml-slowfast-llava/run_inference_multiple_choice_qa.py", line 182, in <module>
run_inference(args)
File "/root/private_data/ml-slowfast-llava/run_inference_multiple_choice_qa.py", line 133, in run_inference
output = llava_inference(
File "/root/private_data/ml-slowfast-llava/run_inference_multiple_choice_qa.py", line 54, in llava_inference
output_ids = model.generate(
File "/opt/conda/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/root/private_data/ml-slowfast-llava/slowfast_llava/llava/model/language_model/llava_llama.py", line 138, in generate
return super().generate(
File "/opt/conda/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/transformers/generation/utils.py", line 1544, in generate
return self.greedy_search(
File "/opt/conda/lib/python3.10/site-packages/transformers/generation/utils.py", line 2404, in greedy_search
outputs = self(
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/root/private_data/ml-slowfast-llava/slowfast_llava/llava/model/language_model/llava_llama.py", line 91, in forward
return super().forward(
File "/opt/conda/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 1176, in forward
outputs = self.model(
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 993, in forward
causal_mask = self._update_causal_mask(attention_mask, inputs_embeds)
File "/opt/conda/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 1079, in _update_causal_mask
padding_mask = causal_mask[..., :mask_length].eq(0.0) * attention_mask[:, None, None, :].eq(0.0)
RuntimeError: The size of tensor a (4096) must match the size of tensor b (4097) at non-singleton dimension 3
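For what it's worth, the 4096-vs-4097 mismatch looks like the total sequence (visual tokens + prompt + newly generated tokens) crossing the model's 4096-token context window partway through greedy decoding; that diagnosis is my assumption, not confirmed. A sketch of a budget guard around `model.generate` (keep in mind LLaVA expands each image placeholder into many visual tokens inside `forward`, so `input_ids.shape[-1]` is only a lower bound):

```python
def generate_with_budget(model, input_ids, hard_cap=128, **gen_kwargs):
    """Sketch: cap max_new_tokens so the total sequence stays inside the
    context window. Assumes a Hugging Face-style model config; the overflow
    diagnosis itself is an assumption, not a confirmed fix."""
    max_ctx = model.config.max_position_embeddings  # 4096 for LLaMA-2 7B
    prompt_len = input_ids.shape[-1]  # lower bound: image tokens expand later
    budget = max_ctx - prompt_len
    if budget <= 0:
        raise ValueError(
            f"prompt is {prompt_len}+ tokens but the context window is "
            f"{max_ctx}; try fewer frames or coarser pooling"
        )
    return model.generate(input_ids, max_new_tokens=min(budget, hard_cap), **gen_kwargs)
```

If the lengths check out, the other thing I'd look at is the transformers version: the `seq_len` deprecation warning above points at 4.38.x, where `_update_causal_mask` was reworked.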
Questions
- Are there any dataset-specific hyperparameters not mentioned in the repo?
- Is any additional data preprocessing required for the NextQA or EgoSchema datasets?
Looking forward to your feedback!