
# [EMNLP23 Poster] GazeVQA

This is the official repository for our proposed task, **GazeVQA: A Video Question Answering Dataset for Multiview Eye-Gaze Task-Oriented Collaborations**. It provides a baseline model for the task.

[Paper]

Model Architecture (see [Paper] for details):

*(architecture diagram)*

## Install

(1) PyTorch. See https://pytorch.org/ for installation instructions. For example:

```shell
conda install pytorch torchvision torchtext cudatoolkit=11.3 -c pytorch
```

(2) PyTorch Lightning. See https://www.pytorchlightning.ai/ for installation instructions. For example:

```shell
python -m pip install lightning
```

## Data

The released dataset is available under this repository: [Dataset]

The processed data can be downloaded from: [processed_data]

## Encoding

Before starting, you should encode the instructional videos, scripts, and QA pairs into feature files.
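The pre-encoding step can be organized as a simple precompute-and-cache loop. Below is a minimal sketch: `encode_clip` is a stand-in placeholder (in the real pipeline it would be a pretrained video/text backbone), and the output directory, file naming, and 512-dim feature size are illustrative assumptions, not this repo's actual format.

```python
import os
import tempfile
import numpy as np

def encode_clip(path: str, dim: int = 512) -> np.ndarray:
    """Placeholder encoder: emits deterministic pseudo-random features.

    In the real pipeline this would run a pretrained backbone over the
    video / script / QA text at `path`.
    """
    rng = np.random.default_rng(abs(hash(path)) % (2**32))
    return rng.standard_normal(dim).astype(np.float32)

def precompute_features(items, out_dir: str) -> None:
    """Encode each (item_id, path) pair once and cache it as an .npy file."""
    os.makedirs(out_dir, exist_ok=True)
    for item_id, path in items:
        np.save(os.path.join(out_dir, f"{item_id}.npy"), encode_clip(path))

out_dir = os.path.join(tempfile.gettempdir(), "gazevqa_features")
precompute_features([("vid_0001", "videos/vid_0001.mp4")], out_dir)
print(np.load(os.path.join(out_dir, "vid_0001.npy")).shape)  # -> (512,)
```

Caching features to disk this way means the expensive encoding pass runs once, and training only does cheap `np.load` calls.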

## Training & Evaluation

Run the code on a single GPU; training and evaluation are handled automatically:

```shell
python train.py
```

## Contact

Feel free to contact us at [email protected] if you have any problems, or open an issue in this repo.