
Q2E

This repo contains all the data and code related to the paper Q2E: Query-to-Event Decomposition for Zero-Shot Multilingual Text-to-Video Retrieval

🔥News:

  • [14 Feb, 2025] Paper submitted to the ARR February cycle.
  • [24 July, 2025] Paper will be presented at the MAGMaR workshop.

Outline

Installation

Caution

Installation was tested with CUDA 12.4 on an A100 GPU. If you see errors, please use the Docker instructions.

To run the code in this project, first create a Python virtual environment using uv. To install uv, follow the UV Installation Guide.

uv venv --seed --python 3.10
uv sync

Data

Pre-Generated Data

source .venv/bin/activate
gdown --fuzzy https://drive.google.com/file/d/1qcr9ZqHptibJKHOwyOrjjbwQTjcsp_Vk/view
tar -xzvf data.tgz

Videos

Due to the redistribution policy, we cannot provide the videos directly. However, you can download them using the instructions below (the expected directory layout is sketched after the list).

  1. Download the MultiVENT videos, and save them in the data/MultiVENT/videos directory.
  2. Download the MSR-VTT videos, and save them in the data/MSR-VTT-1kA/videos directory.
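
For reference, a minimal sketch of the expected layout after both downloads (the exact download procedure follows each dataset's own release instructions):

# Hypothetical layout check: the video directories named above should exist and contain the downloaded videos
mkdir -p data/MultiVENT/videos data/MSR-VTT-1kA/videos
ls data/MultiVENT/videos | head
ls data/MSR-VTT-1kA/videos | head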

Pre-Trained Models

mkdir -p data/models/MultiCLIP
wget -O data/models/MultiCLIP/open_clip_pytorch_model.bin https://huggingface.co/laion/CLIP-ViT-H-14-frozen-xlm-roberta-large-laion5B-s13B-b90k/resolve/main/open_clip_pytorch_model.bin

mkdir -p data/models/InternVideo2

Due to different licensing agreements, we cannot provide the InternVideo2 model directly. However, you can download the InternVideo2 model from here and save it as data/models/InternVideo2/InternVideo2-stage2_1b-224p-f4.pt.
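
Once both checkpoints are in place, a quick sanity check of the paths used above (just a sketch; it only confirms the files exist) is:

ls -lh data/models/MultiCLIP/open_clip_pytorch_model.bin
ls -lh data/models/InternVideo2/InternVideo2-stage2_1b-224p-f4.pt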

Evaluation

The pre-generated data is already populated in the data directory. To regenerate it yourself, follow the instructions in the Data Generation Scripts section.

Evaluating MultiVENT

bash scripts/eval_multivent.sh 

Evaluating MSR-VTT-1kA

bash scripts/eval_msrvtt.sh

Data Generation Scripts

Data for the MSR-VTT-1kA and MultiVENT datasets can be generated using the scripts below. The scripts transcribe the audio and generate the data for evaluation. Pre-generated data is available in the Pre-Generated Data section.

Dataset      Audio   Script Location
MultiVENT    ✓       scripts/generate_multivent_asr.sh
MultiVENT    -       scripts/generate_multivent_noasr.sh
MSR-VTT-1kA  ✓       scripts/generate_msrvtt_asr.sh
MSR-VTT-1kA  -       scripts/generate_msrvtt_noasr.sh
ALL          ALL     scripts/grid_search_data.py
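
For example, to regenerate the MultiVENT data with audio transcripts, one would run the following (a sketch, assuming the generation scripts are invoked the same way as the evaluation scripts):

source .venv/bin/activate
bash scripts/generate_multivent_asr.sh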

Use Your Own Data

If you want to generate data using your own dataset, i.e., {DATA_DIR}, follow the instructions below.

  1. Download all videos to {DATA_DIR}/videos
  2. Write a CSV file with the columns query, video_id and save it as {DATA_DIR}/dataset.csv (a minimal example is sketched after step 5)
  3. Generate the data
echo "Transcribing videos"
python -m src.data.transcribe_audios \
    --video_dir={DATA_DIR}/videos

echo "Processing raw data"
python -m src.data.query_decomp  \
    --data_dir={DATA_DIR} \
    --video_dir={DATA_DIR}/videos \
    --gen_max_model_len=2048

echo "Captioning frames"
python -m src.data.frame_caption \
    --data_dir={DATA_DIR} \
    --video_dir={DATA_DIR}/videos \
    --gen_max_model_len=16384 \
    --num_of_frames=16

echo "Captioning videos"
python -m src.data.frame2video_caption \
    --data_dir={DATA_DIR} \
    --video_dir={DATA_DIR}/videos \
    --gen_max_model_len=16384 \
    --num_of_frames=16
  4. Evaluate using MultiCLIP
echo "Without ASR"
python -m src.eval.MultiCLIP.infer \
    --note=eval \
    --dataset_dir={HFDatasetDIR} \
    --aggregation_methods=inv_entropy


echo "With ASR"
python -m src.eval.MultiCLIP.infer \
    --note=eval \
    --dataset_dir={HFDatasetDIR} \
    --aggregation_methods=inv_entropy
  5. Evaluate using InternVideo2
echo "Without ASR"
python -m src.eval.InternVideo2.infer \
    --note=eval \
    --dataset_dir={HFDatasetDIR} \
    --aggregation_methods=inv_entropy


echo "With ASR"
python -m src.eval.InternVideo2.infer \
    --note=eval \
    --dataset_dir={HFDatasetDIR} \
    --aggregation_methods=inv_entropy
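
As referenced in step 2, a minimal {DATA_DIR}/dataset.csv could look like the sketch below. The queries and video IDs are purely hypothetical placeholders; it is assumed that each video_id corresponds to a video file in {DATA_DIR}/videos.

# Hypothetical example of {DATA_DIR}/dataset.csv
cat > {DATA_DIR}/dataset.csv <<'EOF'
query,video_id
flood rescue operations in Kerala,video_0001
protest march in Berlin city center,video_0002
EOF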

Using Docker

Many of us don't have root permission on the server, which is where udocker comes to the rescue. To use the code in a Docker container, follow the instructions below.

# Install udocker
uv add udocker
# Create and run the container
udocker pull runpod/pytorch:2.4.0-py3.11-cuda12.4.1-devel-ubuntu22.04
udocker create --name="runpod" runpod/pytorch:2.4.0-py3.11-cuda12.4.1-devel-ubuntu22.04
udocker setup --nvidia runpod
udocker run --volume="/${PWD}:/workspace" --name="runpod" runpod bash

# Inside the container
## install uv
curl -LsSf https://astral.sh/uv/install.sh | sh

## install the dependencies
uv venv --seed --python=3.10
uv sync
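
Once the dependencies are installed inside the container, the rest of the workflow is the same as above, e.g. (assuming the repository was mounted at /workspace as in the run command):

cd /workspace
source .venv/bin/activate
bash scripts/eval_multivent.sh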

Citation

If you find this code useful for your research, please consider citing:

@article{dipta2025q2e,
  title={Q2E: Query-to-Event Decomposition for Zero-Shot Multilingual Text-to-Video Retrieval},
  author={Dipta, Shubhashis Roy and Ferraro, Francis},
  journal={arXiv preprint arXiv:2506.10202},
  year={2025}
}
