RadGazeIntent is a deep learning framework that interprets the diagnostic intentions behind radiologists' eye movements during chest X-ray analysis. Unlike existing approaches that mimic radiologist behavior, our method decodes the *why* behind each fixation point, bridging visual search patterns with diagnostic reasoning.
🎉 Paper accepted at ACM MM 2025 - A top-tier international conference on multimedia research
```bash
# Remove existing environment (if any)
conda env remove --name radgazeintent

# Create new environment
conda create -n radgazeintent python=3.8 -y
conda activate radgazeintent

# Install PyTorch and dependencies
conda install pytorch==1.10.0 torchvision==0.11.0 cudatoolkit=11.3 -c pytorch -y
conda install mkl==2024.0
conda install -c conda-forge cudatoolkit-dev=11.3.1

# Install Detectron2
pip install 'git+https://github.com/facebookresearch/detectron2.git'

# Build custom CUDA operations
cd ./radgazeintent/pixel_decoder/ops
sh make.sh

# Install additional dependencies
pip install uv
uv pip install timm scipy opencv-python wget setuptools==59.5.0 einops protobuf==4.25.0
```
RadGazeIntent introduces three intention-labeled datasets derived from existing eye-tracking datasets (EGD and REFLACX):
📥 Download: All three datasets are available on 🤗 Hugging Face
- RadSeq: Models radiologists following a structured checklist, focusing on one finding at a time.
- RadExplore: Captures opportunistic visual search where radiologists consider all findings simultaneously.
- RadHybrid: Combines initial broad scanning with focused examination, representing real-world diagnostic behavior.
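The three paradigms above imply different per-fixation label structures. The sketch below illustrates one way such labels could be represented; the field names, findings vocabulary, and helper functions are illustrative assumptions, not the repository's actual schema:

```python
from dataclasses import dataclass
from typing import List

# Illustrative finding vocabulary -- an assumption, not the actual label set.
FINDINGS = ["atelectasis", "cardiomegaly", "consolidation", "edema", "pneumothorax"]

@dataclass
class Fixation:
    """One gaze fixation with a multi-hot intention label (hypothetical schema).

    `intent` marks which finding(s) the radiologist is presumed to be
    searching for at the moment of this fixation.
    """
    x: float            # normalized image coordinates
    y: float
    duration_ms: float
    intent: List[int]   # multi-hot vector over FINDINGS

def radseq_style_label(finding_idx: int) -> List[int]:
    """Checklist-style search: exactly one finding is targeted at a time."""
    return [1 if i == finding_idx else 0 for i in range(len(FINDINGS))]

def radexplore_style_label(present: List[int]) -> List[int]:
    """Opportunistic search: several candidate findings considered at once."""
    return [1 if i in present else 0 for i in range(len(FINDINGS))]

# A fixation made while checking only for cardiomegaly (index 1) ...
fix_seq = Fixation(0.45, 0.52, 310.0, radseq_style_label(1))
# ... versus one made while scanning for several findings simultaneously.
fix_exp = Fixation(0.45, 0.52, 310.0, radexplore_style_label([0, 1, 3]))
print(fix_seq.intent)  # -> [0, 1, 0, 0, 0]
print(fix_exp.intent)  # -> [1, 1, 0, 1, 0]
```

A hybrid-style sequence would mix the two: broad multi-finding labels during the initial scan, narrowing to single-finding labels during focused examination.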
```bash
# Train RadGazeIntent on RadSeq dataset
python train.py \
    --hparams configs/train_for_real_egd_s2.json \
    --dataset-root /path/to/your/dataset \
    --gpu-id 0

# Train on RadExplore dataset
python train.py \
    --hparams configs/train_for_real_reflacx_s2.json \
    --dataset-root /path/to/your/dataset \
    --gpu-id 0

# Run inference on all test datasets
python infer_all.py \
    --hparams configs/infer_egd_s2.json \
    --dataset-root /path/to/your/dataset \
    --gpu-id 0

# Quick inference with shell script
bash infer.sh
```
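Decoded intentions can be scored as per-fixation multi-label classification. As a hedged sketch (assuming predictions and ground truth are multi-hot vectors per fixation; this is not the repository's evaluation code), a micro-averaged F1 can be computed in pure Python:

```python
from typing import Sequence

def micro_f1(preds: Sequence[Sequence[int]], gts: Sequence[Sequence[int]]) -> float:
    """Micro-averaged F1 over per-fixation multi-hot intention labels."""
    tp = fp = fn = 0
    for pred, gt in zip(preds, gts):
        for p, g in zip(pred, gt):
            if p and g:
                tp += 1          # finding predicted and present
            elif p and not g:
                fp += 1          # finding predicted but absent
            elif g and not p:
                fn += 1          # finding present but missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Toy example with two fixations and three candidate findings.
preds = [[1, 0, 1], [0, 1, 0]]
gts   = [[1, 0, 0], [0, 1, 0]]
print(micro_f1(preds, gts))  # -> 0.8
```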
RadGazeIntent enables several downstream applications:
- 🤖 Intention-aware AI Assistants: Systems that understand what radiologists are looking for
- 📚 Medical Education: Training tools that analyze student gaze patterns
- 🔬 Cognitive Research: Understanding expert visual reasoning processes
If you find RadGazeIntent useful in your research, please consider citing:
```bibtex
@article{pham2025interpreting,
  title={Interpreting Radiologist's Intention from Eye Movements in Chest X-ray Diagnosis},
  author={Pham, Trong-Thang and Nguyen, Anh and Deng, Zhigang and Wu, Carol C and Van Nguyen, Hien and Le, Ngan},
  journal={arXiv preprint arXiv:2507.12461},
  year={2025}
}
```
⭐ Star this repository if you find it useful! ⭐
This project is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License - see the LICENSE file for details.
This material is based upon work supported by the National Science Foundation (NSF) under Award No. OIA-1946391 and Award No. 2223793 (EFRI BRAID), and by the National Institutes of Health (NIH) under Award No. 1R01CA277739-01.
- Datasets: Built upon EGD and REFLACX eye-tracking datasets
- Backbone: Uses Detectron2 for feature extraction
- Inspiration: Motivated by cognitive science research on expert visual reasoning
Primary Contact: Trong Thang Pham ([email protected])
For questions, feedback, or collaboration opportunities, feel free to reach out! I would love to hear from you if you have any thoughts or suggestions about this work.
Note: While we don't actively seek contributions to the codebase, we greatly appreciate and welcome feedback, discussions, and suggestions for improvements.
- Improve code structure and modularity for better maintainability
- Expand documentation with detailed tutorials and examples
- Add Docker support for containerized deployment