
To evaluate on NaVQA, follow the instructions below to download, preprocess, and evaluate on the data.

## Download and preprocess the CODa dataset

First, download the relevant subsets of the CODa dataset, which consists of 22 sequences.

We only need 7 of them: sequences 0, 3, 4, 6, 16, 21, and 22. These numbers will be referred to as sequence IDs. Each sequence ID has 30 questions associated with it.

Because of the number of videos, be sure to have a large amount of storage available. The processed dataset is ~335 GB, but since the preprocessing phase also downloads LiDAR and other outputs, we recommend having ~500 GB of extra storage.
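Before downloading, you can confirm the target drive has enough free space (the path below is a placeholder for wherever you plan to store the data):

```bash
# Check free space on the drive that will hold the CODa data.
df -h /path/to/storage
```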

Download the CODa devkit to a directory that is not inside ReMEmbR. Ideally, place it on a larger HDD with enough storage for all the data.

```bash
git clone https://github.com/ut-amrl/coda-devkit.git
cd coda-devkit && mkdir data
```

Next, set a few environment variables, filling them in with the appropriate paths. `REMEMBR_PATH` is the folder from which the `scripts` folder is accessible. We recommend adding these to your `~/.bashrc`.

```bash
export CODA_ROOT_DIR=/path/to/coda-devkit/data
export REMEMBR_PATH=/path/to/remembr
```
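If you want the variables to persist across shells, one way (assuming a bash shell, with the placeholder paths replaced by your actual locations) is:

```bash
# Append the exports to ~/.bashrc so new shells pick them up.
echo 'export CODA_ROOT_DIR=/path/to/coda-devkit/data' >> ~/.bashrc
echo 'export REMEMBR_PATH=/path/to/remembr' >> ~/.bashrc
source ~/.bashrc
```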

Next, make sure you can run CODa's scripts by installing their `coda` conda environment:

```bash
# while in the coda-devkit directory
conda env create -f environment.yml
```

Then, from the `remembr` directory, run the following command to preprocess the data into the appropriate format:

```bash
conda activate coda
cd remembr
bash scripts/bash_scripts/preprocess_coda_all.sh
```

## Caption the dataset offline

Ensure your preprocessed CODa data is located at `/path/to/remembr/coda_data`.
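If the preprocessed data lives elsewhere (for example, on the larger HDD from the previous section), one option is to symlink it into the expected location rather than copying it. This is a hypothetical convenience, assuming the preprocessed output ended up under `$CODA_ROOT_DIR`:

```bash
# Link the preprocessed data into the location the scripts expect.
ln -s "$CODA_ROOT_DIR" "$REMEMBR_PATH/coda_data"
```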

Given the dataset, run the following command for each sequence ID. The arguments are described below:

```bash
conda activate remembr
python scripts/preprocess_captions.py \
    --seq_id 0 \
    --seconds_per_caption 3 \
    --model-path Efficient-Large-Model/VILA1.5-13b \
    --captioner_name VILA1.5-13b \
    --out_path data/captions/0/captions
```
- `seq_id`: the sequence ID from the CODa dataset (one of the 7 listed in the previous section)
- `seconds_per_caption`: the number of seconds of frames aggregated to generate each caption
- `model-path`: the name of the specific VILA model, as described in their code
- `captioner_name`: the output file prefix, based on the captioner type
- `out_path`: the output location; the format must be `data/captions/{seq_id}/captions`

Be sure to set `captioner_name` correctly so that it matches the model used in `model-path`!

The captions for each frame should be put into a JSON file located in `data/captions/{seq_id}/captions`.

We provide an example that preprocesses all captions as above in `scripts/bash_scripts/preprocess_captions_all.sh`; a minimal sketch of such a loop is shown below.
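For reference, a loop over all 7 sequence IDs might look like the following (a sketch only; the actual script in the repo may differ in its settings):

```bash
# Caption every NaVQA sequence with the same settings.
# Assumes the remembr conda environment is active.
for seq_id in 0 3 4 6 16 21 22; do
    python scripts/preprocess_captions.py \
        --seq_id "$seq_id" \
        --seconds_per_caption 3 \
        --model-path Efficient-Large-Model/VILA1.5-13b \
        --captioner_name VILA1.5-13b \
        --out_path "data/captions/${seq_id}/captions"
done
```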

## Check the dataset and preprocess it

1. Ensure `data/navqa/data.csv` exists

This file contains the questions and answers that must be converted into the proper format.
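A quick sanity check from the `remembr` directory (a hypothetical one-liner):

```bash
# Verify the NaVQA question/answer file is in place.
test -f data/navqa/data.csv && echo "data.csv found" || echo "data.csv missing"
```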

2. Form the questions in the proper format

Run the following script, providing it the base caption file name from the captioning step you ran previously.

```bash
python scripts/question_scripts/form_question_jsons.py --caption_file captions_{{captioner_name}}_{{seconds_per_caption}}_secs
```

This step also aggregates the "optimal" context required to answer each question based on the captioner and seconds per caption, so you must set `captioner_name` and `seconds_per_caption` accordingly. We recommend using 3 seconds per caption. Here is an example continuing from above:

```bash
python scripts/question_scripts/form_question_jsons.py --caption_file captions_VILA1.5-13b_3_secs
```

After this step, a folder called `data/questions` should exist.

## Run the evaluation

To run the evaluation, you must first start the MilvusDB container. All evaluations create a MilvusDB collection per sequence ID.
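One common way to start a standalone Milvus instance is via the install script from the Milvus documentation (an assumption on our part; check the repo for the exact Milvus setup it expects):

```bash
# Download and start Milvus standalone in Docker (from the Milvus docs).
curl -sfL https://raw.githubusercontent.com/milvus-io/milvus/master/scripts/standalone_embed.sh -o standalone_embed.sh
bash standalone_embed.sh start
```

Once Milvus is up, run the evaluation: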

```bash
python scripts/eval.py \
    --model {{eval_method}} \
    --sequence_id {{seq_id}} \
    --caption_file captions_{{captioner_name}}_{{seconds_per_caption}}_secs \
    --postfix {{postfix}}
```

Because of how the code is written, if `seconds_per_caption` is changed, we recommend re-running `scripts/question_scripts/form_question_jsons.py`.

Continuing the example with sequence ID 0:

```bash
python scripts/eval.py \
    --model remembr+llama3.1:8b \
    --sequence_id 0 \
    --caption_file captions_VILA1.5-13b_3_secs \
    --postfix _0
```

For an example of running `eval.py` across multiple tries and across all sequences, see `scripts/bash_scripts/run_all_evals.sh`. A minimal sketch is shown below.
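A sketch of what such a loop might look like (the model name and number of tries are illustrative placeholders, not necessarily what the repo's script uses):

```bash
# Evaluate all sequences across multiple tries.
# Assumes Milvus is running and the question JSONs have been formed.
for try in 0 1 2; do
    for seq_id in 0 3 4 6 16 21 22; do
        python scripts/eval.py \
            --model remembr+llama3.1:8b \
            --sequence_id "$seq_id" \
            --caption_file captions_VILA1.5-13b_3_secs \
            --postfix "_${try}"
    done
done
```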