This is the official repository, which provides a baseline model for our proposed task: GazeVQA: A Video Question Answering Dataset for Multiview Eye-Gaze Task-Oriented Collaborations.
Model Architecture (see [Paper] for details):
(1) PyTorch. See https://pytorch.org/ for installation instructions. For example,
conda install pytorch torchvision torchtext cudatoolkit=11.3 -c pytorch
(2) PyTorch Lightning. See https://www.pytorchlightning.ai/ for installation instructions. For example,
python -m pip install lightning
The released dataset is available in this repository: [Dataset]
The processed data can be downloaded from: [processed_data]
Before starting, you should encode the instructional videos, scripts, and QAs.
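The exact feature extractors depend on your setup; as a minimal, framework-free sketch, the text side (scripts and QAs) can be mapped to integer token-ID sequences before being fed to an encoder. All names here are illustrative, not part of the released code:

```python
from collections import Counter

def build_vocab(texts, min_freq=1):
    """Map each whitespace token to an integer ID; 0 is reserved for <unk>."""
    counts = Counter(tok for t in texts for tok in t.lower().split())
    vocab = {"<unk>": 0}
    for tok, freq in counts.most_common():
        if freq >= min_freq:
            vocab[tok] = len(vocab)
    return vocab

def encode(text, vocab):
    """Encode one script line or QA string as a list of token IDs."""
    return [vocab.get(tok, vocab["<unk>"]) for tok in text.lower().split()]

# Toy corpus standing in for the dataset's scripts/QAs
qa_texts = ["What is the person looking at?", "He is looking at the screen."]
vocab = build_vocab(qa_texts)
ids = encode("is the person", vocab)   # known tokens map to their IDs
```

Video frames are handled analogously, typically by running each sampled frame through a frozen visual backbone and caching the resulting feature tensors to disk.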
Simply run the code on a single GPU; it will automatically handle both the training and evaluation processes.
python train.py
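Conceptually, train.py runs a training loop and then evaluates the resulting model. A schematic, framework-free sketch of that train-then-evaluate flow (the toy model, data, and names are illustrative only, not the actual implementation):

```python
import random

def train_and_evaluate(data, epochs=50, lr=0.05):
    """Fit y = w*x by plain gradient descent, then report held-out loss."""
    random.seed(0)
    w = random.random()                               # random init
    train, val = data[: len(data) // 2], data[len(data) // 2 :]
    for _ in range(epochs):                           # training process
        for x, y in train:
            grad = 2 * (w * x - y) * x                # d/dw of squared error
            w -= lr * grad
    # evaluation process: mean squared error on held-out pairs
    val_loss = sum((w * x - y) ** 2 for x, y in val) / len(val)
    return w, val_loss

data = [(x, 3.0 * x) for x in range(1, 9)]            # toy targets: y = 3x
w, loss = train_and_evaluate(data)
```

In the real pipeline the model, loss, and data loaders come from the repository and PyTorch Lightning orchestrates the loop, but the overall structure is the same.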
Feel free to contact us if you have any problems ([email protected]), or open an issue in this repo.