[CVPR 2025] Enhancing 3D Gaze Estimation in the Wild using Weak Supervision with Gaze Following Labels
Note: The color of the gaze vector follows a gradient from blue (frontal gaze toward the camera) to green (gaze directed away from the camera). Red indicates a gaze perpendicular to the camera, appearing in the middle of the gradient.
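For reference, the direction-to-color mapping could be implemented roughly as in the sketch below. This is a minimal illustrative example, not the exact code used in demo.py: it assumes camera coordinates with the z-axis pointing away from the camera, a unit gaze vector, and BGR colors as used by OpenCV; `gaze_to_color` is a hypothetical helper name.

```python
import numpy as np

def gaze_to_color(gaze):
    # Hypothetical sketch: map a unit 3D gaze vector to a BGR color on a
    # blue -> red -> green gradient, assuming the camera z-axis points away
    # from the camera (gaze[2] ~ -1: toward camera, ~0: perpendicular, ~+1: away).
    gaze = np.asarray(gaze, dtype=float)
    gaze = gaze / (np.linalg.norm(gaze) + 1e-8)
    t = (gaze[2] + 1.0) / 2.0  # 0 = toward camera, 0.5 = perpendicular, 1 = away

    blue, red, green = np.array([255, 0, 0]), np.array([0, 0, 255]), np.array([0, 255, 0])
    if t < 0.5:
        color = (1 - 2 * t) * blue + 2 * t * red         # blend blue -> red
    else:
        color = (2 - 2 * t) * red + (2 * t - 1) * green  # blend red -> green
    return tuple(int(round(c)) for c in color)
```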
Authors: Pierre Vuillecard, Jean-Marc Odobez
This repository contains the code and checkpoints for our CVPR 2025 paper "Enhancing 3D Gaze Estimation in the Wild using Weak Supervision with Gaze Following Labels".
First, clone the repository:
git clone
cd <name_of_the_repo>
Next, create the conda environment, which installs the necessary packages, and activate it:
conda env create -f environment.yaml
conda activate gazeCVPR
Download the model for head detection:
bash setup.sh
The demo code is located in the demo.py file. Our Gaze Transformer model can perform inference on both images and videos. The demo first detects all the heads in the input using a head detector. Then, for each detected head, it predicts the gaze direction using our Gaze Transformer model. The output is drawn on the image or video and saved to the specified output directory.
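Conceptually, the image-inference pipeline looks like the sketch below. All names (`load_head_detector`, `GazeTransformer`, `draw_gaze`) are hypothetical placeholders used only to illustrate the flow; they are not the actual API of demo.py.

```python
import cv2

def run_image_demo(image_path, output_path, checkpoint):
    image = cv2.imread(image_path)

    detector = load_head_detector()                            # hypothetical: YOLOv5-based head detector
    model = GazeTransformer.load_from_checkpoint(checkpoint)   # hypothetical: our Gaze Transformer

    for box in detector(image):                                # one bounding box per detected head
        head_crop = image[box.top:box.bottom, box.left:box.right]
        gaze_3d = model.predict(head_crop)                     # 3D gaze direction for this head
        draw_gaze(image, box, gaze_3d)                         # hypothetical: overlay the gaze arrow

    cv2.imwrite(output_path, image)
```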
To run the demo on images (only image inference is available):
python demo.py --input-filename data/pexels-jopwell-2422290.jpg --output-dir output/ --modality image
To run the demo on videos, you can either run the model with image inference, where each frame is processed independently:
python demo.py --input-filename data/7149282-hd_1280_720_25fps.mp4 --output-dir output/ --modality image
Or you can run the model with video inference, where the model uses temporal information to process the video:
python demo.py --input-filename data/7149282-hd_1280_720_25fps.mp4 --output-dir output/ --modality video
Additional parameters can be set, such as the model checkpoint:
--checkpoint ./checkpoints/gat_stwsge_gaze360_gf.ckpt # best SOTA model trained on Gaze360 and Gazefollow
--checkpoint ./checkpoints/gat_gaze360.ckpt # best model trained on Gaze360 only
You can also run the demo on GPU by adding the `--device cuda` flag. To reduce inference time on videos, frames are processed in batches, so the `--batch-size` and `--num-workers` parameters can be set to speed up inference.
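For example, video inference on GPU combining these options might look like the following (the batch size and worker count are arbitrary example values):

python demo.py --input-filename data/7149282-hd_1280_720_25fps.mp4 --output-dir output/ --modality video --device cuda --batch-size 8 --num-workers 4 --checkpoint ./checkpoints/gat_stwsge_gaze360_gf.ckpt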
The dataset preprocessing for training will be released in the future.
- Gaze360 License: Research use only
- Gazefollow License: NonCommercial
- Our model is built upon Swin3D from Omnivore, which is used as a pretrained backbone. Code License: Attribution-NonCommercial 4.0 International
- Face bounding box tracking uses BoxMOT. Code License: AGPL-3.0
- The head detector code (License: GPL-3.0) is based on YOLOv5 (License: AGPL-3.0) and trained on the CrowdHuman dataset (data License: NonCommercial)
If you use this work, please cite the following paper:
@INPROCEEDINGS{vuillecard3DGAZEWILD2025,
author = {Vuillecard, Pierre and Odobez, Jean-Marc},
projects = {Idiap},
month = jun,
title = {Enhancing 3D Gaze Estimation in the Wild using Weak Supervision with Gaze Following Labels},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
year = {2025},
pdf = {https://publications.idiap.ch/attachments/papers/2025/Pierre_3DGAZEESTIMATIONINTHEWILD_2025.pdf}
}