This repo contains all the data and code related to the paper Q2E: Query-to-Event Decomposition for Zero-Shot Multilingual Text-to-Video Retrieval
- [14 Feb, 2025] Paper submitted to the ARR February cycle.
- [24 July, 2025] Paper will be presented at the MAGMaR workshop.
Caution
Installation was tested on CUDA 12.4 and an A100 GPU. If you see errors, please use the Docker instructions below.
To run the code in this project, first create a Python virtual environment using uv. To install uv, follow the UV Installation Guide.
uv venv --seed --python 3.10
uv sync
source .venv/bin/activate
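Before going further, a quick sanity check can save debugging time later. This is a minimal sketch; it assumes PyTorch is among the dependencies installed by `uv sync`:

```bash
# Confirm the interpreter version and that CUDA is visible (torch is assumed to be in the synced deps)
python --version
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
```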
gdown --fuzzy https://drive.google.com/file/d/1qcr9ZqHptibJKHOwyOrjjbwQTjcsp_Vk/view
tar -xzvf data.tgz
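You can verify that the extraction succeeded before moving on; the directory names below are assumptions based on the paths used in the rest of this README:

```bash
# List the extracted layout; expect MultiVENT and MSR-VTT-1kA subdirectories (assumed names)
ls data/
```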
Due to the redistribution policy, we cannot provide the videos directly. However, you can download the videos using the following instructions.
- Download the MultiVENT videos and save them in the `data/MultiVENT/videos` directory.
- Download the MSR-VTT videos and save them in the `data/MSR-VTT-1kA/videos` directory.
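The evaluation code expects exactly these paths, so create both directories up front:

```bash
# Create the video directories listed above
mkdir -p data/MultiVENT/videos data/MSR-VTT-1kA/videos
```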
mkdir -p data/models/MultiCLIP
wget -O data/models/MultiCLIP/open_clip_pytorch_model.bin https://huggingface.co/laion/CLIP-ViT-H-14-frozen-xlm-roberta-large-laion5B-s13B-b90k/resolve/main/open_clip_pytorch_model.bin
mkdir -p data/models/InternVideo2
Due to different licensing agreements, we cannot provide the InternVideo2 model directly. However, you can download the InternVideo2 model from here and save it as `data/models/InternVideo2/InternVideo2-stage2_1b-224p-f4.pt`.
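Once both checkpoints are downloaded, a quick check (a sketch using the paths above) confirms they are where the code expects them:

```bash
# Report the size of each checkpoint, or flag it as missing
for f in data/models/MultiCLIP/open_clip_pytorch_model.bin \
         data/models/InternVideo2/InternVideo2-stage2_1b-224p-f4.pt; do
  if [ -f "$f" ]; then du -h "$f"; else echo "missing: $f"; fi
done
```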
Data is already generated and populated in the `data` directory. To regenerate it, follow the instructions in the Data Generation section.
Evaluating MultiVENT
bash scripts/eval_multivent.sh
Evaluating MSR-VTT-1kA
bash scripts/eval_msrvtt.sh
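To run both evaluations back to back and keep the console output for later inspection (a convenience sketch; the `logs/` directory is an assumption, not part of the repo):

```bash
mkdir -p logs
bash scripts/eval_multivent.sh 2>&1 | tee logs/eval_multivent.log
bash scripts/eval_msrvtt.sh 2>&1 | tee logs/eval_msrvtt.log
```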
Data for the MSR-VTT-1kA and MultiVENT datasets can be generated using the scripts below, which transcribe the audio and generate the data for evaluation. Pre-generated data is available in the Download Pre-Generated Data section.
| Dataset | Audio | Script Location |
|---|---|---|
| MultiVENT | ✓ | `scripts/generate_multivent_asr.sh` |
| MultiVENT | - | `scripts/generate_multivent_noasr.sh` |
| MSR-VTT-1kA | ✓ | `scripts/generate_msrvtt_asr.sh` |
| MSR-VTT-1kA | - | `scripts/generate_msrvtt_noasr.sh` |
| ALL | ALL | `scripts/grid_search_data.py` |
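To regenerate all four dataset variants sequentially, a small loop over the script names from the table works (a sketch; the scripts are run one at a time):

```bash
# Run every per-dataset generation script from the table above
for s in generate_multivent_asr generate_multivent_noasr \
         generate_msrvtt_asr generate_msrvtt_noasr; do
  bash "scripts/${s}.sh"
done
```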
If you want to generate data using your own dataset, i.e., `{DATA_DIR}`, follow the instructions below.
- Download all videos to `{DATA_DIR}/videos`.
- Write a CSV file with the following columns: `query, video_id`, and save it as `{DATA_DIR}/dataset.csv` (a sample is sketched below).
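For illustration, a minimal `dataset.csv` could be written as follows; the queries and video IDs are hypothetical, and each `video_id` must correspond to a file under `{DATA_DIR}/videos`:

```bash
# Write a hypothetical two-row dataset.csv (set DATA_DIR to your dataset directory first)
cat > "${DATA_DIR}/dataset.csv" <<'EOF'
query,video_id
2023 Turkey earthquake rescue efforts,video_0001
Champions League final highlights,video_0002
EOF
```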
- Generate the data
echo "Transcribing videos"
python -m src.data.transcribe_audios \
--video_dir={DATA_DIR}/videos
echo "Processing raw data"
python -m src.data.query_decomp \
--data_dir={DATA_DIR} \
--video_dir={DATA_DIR}/videos \
--gen_max_model_len=2048
echo "Captioning frames"
python -m src.data.frame_caption \
--data_dir={DATA_DIR} \
--video_dir={DATA_DIR}/videos \
--gen_max_model_len=16384 \
--num_of_frames=16
echo "Captioning videos"
python -m src.data.frame2video_caption \
--data_dir={DATA_DIR} \
--video_dir={DATA_DIR}/videos \
--gen_max_model_len=16384 \
--num_of_frames=16
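If you prefer to run the whole generation pipeline in one shot, the four steps above can be chained in a fail-fast wrapper (a sketch; `data/my_dataset` is a hypothetical stand-in for `{DATA_DIR}`):

```bash
#!/usr/bin/env bash
set -euo pipefail                 # stop at the first failing step
DATA_DIR=data/my_dataset          # hypothetical; point this at your dataset

python -m src.data.transcribe_audios --video_dir="${DATA_DIR}/videos"
python -m src.data.query_decomp --data_dir="${DATA_DIR}" \
  --video_dir="${DATA_DIR}/videos" --gen_max_model_len=2048
python -m src.data.frame_caption --data_dir="${DATA_DIR}" \
  --video_dir="${DATA_DIR}/videos" --gen_max_model_len=16384 --num_of_frames=16
python -m src.data.frame2video_caption --data_dir="${DATA_DIR}" \
  --video_dir="${DATA_DIR}/videos" --gen_max_model_len=16384 --num_of_frames=16
```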
- Evaluate using MultiCLIP
echo "Without ASR"
python -m src.eval.MultiCLIP.infer \
--note=eval \
--dataset_dir={HFDatasetDIR} \
--aggregation_methods=inv_entropy
echo "With ASR"
python -m src.eval.MultiCLIP.infer \
--note=eval \
--dataset_dir={HFDatasetDIR} \
--aggregation_methods=inv_entropy
- Evaluate using InternVideo2
echo "Without ASR"
python -m src.eval.InternVideo2.infer \
--note=eval \
--dataset_dir={HFDatasetDIR} \
--aggregation_methods=inv_entropy
echo "With ASR"
python -m src.eval.InternVideo2.infer \
--note=eval \
--dataset_dir={HFDatasetDIR} \
--aggregation_methods=inv_entropy
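Both backbones take the same flags, so you can evaluate them in one loop (a sketch; `data/my_dataset_hf` is a hypothetical stand-in for `{HFDatasetDIR}`, and the With/Without ASR variants presumably differ only in which generated dataset directory is passed):

```bash
HF_DATASET_DIR=data/my_dataset_hf  # hypothetical; replace with your {HFDatasetDIR}
for model in MultiCLIP InternVideo2; do
  python -m "src.eval.${model}.infer" \
    --note=eval \
    --dataset_dir="${HF_DATASET_DIR}" \
    --aggregation_methods=inv_entropy
done
```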
Many of us don't have root permission on the server; this is where udocker comes to the rescue. To use the code in a Docker container, follow the instructions below.
# Install udocker
uv add udocker
# Create and run the container
udocker pull runpod/pytorch:2.4.0-py3.11-cuda12.4.1-devel-ubuntu22.04
udocker create --name="runpod" runpod/pytorch:2.4.0-py3.11-cuda12.4.1-devel-ubuntu22.04
udocker setup --nvidia runpod
udocker run --volume="/${PWD}:/workspace" --name="runpod" runpod bash
# Inside the container
## install uv
curl -LsSf https://astral.sh/uv/install.sh | sh
## install the dependencies
uv venv --seed --python=3.10
uv sync
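Inside the container, the same sanity check as above confirms the GPU passthrough worked (assumes torch is available from the base image or the synced venv):

```bash
# Verify CUDA is visible from inside the container
python -c "import torch; print(torch.cuda.is_available())"
```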
If you find this code useful for your research, please consider citing:
@article{dipta2025q2e,
title={Q2E: Query-to-Event Decomposition for Zero-Shot Multilingual Text-to-Video Retrieval},
author={Dipta, Shubhashis Roy and Ferraro, Francis},
journal={arXiv preprint arXiv:2506.10202},
year={2025}
}