Quick Start • Weights • Inference • Benchmark • Tuning • Citation
This repository contains the official code for the paper "MIRAGE: A multimodal foundation model and benchmark for comprehensive retinal OCT image analysis", led by José Morano and Hrvoje Bogunović from the CD-AIR lab of the Medical University of Vienna. The paper has been accepted for publication in npj Digital Medicine.
[arXiv](https://arxiv.org/abs/2506.08900)
MIRAGE is a multimodal foundation model for comprehensive retinal OCT/SLO image analysis. Trained on a large-scale multimodal dataset, it is designed to perform a wide range of tasks, including disease staging, diagnosis, and layer and lesion segmentation. MIRAGE is based on the MultiMAE architecture and is pretrained using a multi-task learning strategy. The model, based on ViT, is available in two sizes: MIRAGE-Base and MIRAGE-Large.
> [!IMPORTANT]
> All scripts and code are intended to run on Linux systems.
Overview of the proposed model (MIRAGE) and other general (DINOv2) and domain-specific (MedSAM, RETFound) foundation models. In contrast to existing unimodal foundation models, our approach utilizes multimodal self-supervised learning to train a Vision Transformer on a large dataset of paired multimodal retinal images, including optical coherence tomography (OCT), scanning laser ophthalmoscopy (SLO), and automatically generated labels for retinal layers. We evaluated the model on a comprehensive benchmark consisting of 19 tasks from 14 publicly available datasets and two private datasets, covering both OCT and SLO classification and segmentation tasks. Statistical significance was calculated using the Wilcoxon signed-rank test across all datasets. Our foundation model, MIRAGE, significantly outperforms state-of-the-art foundation models across all task types.
## Quick Start

For a quick start, use the provided script `prepare_env.py` to create a new Python environment, install the required packages, and download the model weights and the datasets.
> [!IMPORTANT]
> The script will download the model weights and the datasets, which are large files. Make sure you have enough disk space and a stable internet connection.
>
> In addition, if the system Python version is not 3.10.*, the script will install Python 3.10.16 (from source) in the same directory. It will also install PyTorch 2.5.1 (CUDA 11.8).
```bash
./prepare_env.py
```
> [!TIP]
> Run the script with the `-h` or `--help` flag to see the available options.
The models can be easily used with the `hf/mirage_hf.py` code, loading the weights from Hugging Face 🤗. The only requirements are the `torch`, `einops`, `huggingface_hub`, and `safetensors` packages.
```python
from huggingface_hub import PyTorchModelHubMixin

from mirage_hf import MIRAGEWrapper


class MIRAGEhf(MIRAGEWrapper, PyTorchModelHubMixin):
    def __init__(
        self,
        input_size=512,
        patch_size=32,
        modalities='bscan-slo',
        size='base',
    ):
        super().__init__(
            input_size=input_size,
            patch_size=patch_size,
            modalities=modalities,
            size=size,
        )


# For the MIRAGE model based on ViT-Base
model = MIRAGEhf.from_pretrained("j-morano/MIRAGE-Base")
# For the MIRAGE model based on ViT-Large
model = MIRAGEhf.from_pretrained("j-morano/MIRAGE-Large")
```
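Once instantiated, the model behaves like a regular PyTorch module. Below is a minimal, hypothetical smoke test: the input format (a dict with one tensor per modality, keyed as in the `modalities` argument) is an assumption here, so check `hf/mirage_hf.py` for the actual forward signature.

```python
import torch

# Hypothetical usage sketch. We assume the wrapper accepts a dict with
# one tensor per modality ('bscan' and 'slo'), each of shape
# (batch, channels, input_size, input_size); verify against
# hf/mirage_hf.py before relying on this.
model.eval()
inputs = {
    "bscan": torch.randn(1, 1, 512, 512),
    "slo": torch.randn(1, 1, 512, 512),
}
with torch.no_grad():
    features = model(inputs)
```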
> [!NOTE]
> The code has been tested with PyTorch 2.5.1 (CUDA 11.8) and Python 3.10.10.
Create a new Python environment and activate it:

```bash
python -m venv venv  # if not already created
source venv/bin/activate
```
Install the required packages:

```bash
pip install -r requirements.txt
```
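As a quick sanity check after installation, you can verify that the installed PyTorch matches the tested setup and that a GPU is visible:

```python
import torch

# Tested setup: PyTorch 2.5.1 (CUDA 11.8); other versions may work
# but are untested.
print(torch.__version__)
print(torch.cuda.is_available())  # True if a compatible GPU is visible
```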
## Weights

The model weights are available in the Model weights release on GitHub.

| Model | Link |
|---|---|
| MIRAGE-Base | Weights-Base |
| MIRAGE-Large | Weights-Large |
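A minimal sketch for inspecting a downloaded checkpoint, assuming the release asset is a standard PyTorch checkpoint; the file name below is hypothetical, and the saved dictionary layout may differ:

```python
import torch

# Hypothetical asset name: replace with the file downloaded from the
# GitHub release.
checkpoint = torch.load("MIRAGE-Base.pth", map_location="cpu")
# Checkpoints are often nested (e.g. under a 'model' key); inspect the
# keys before loading the state dict into the model.
print(checkpoint.keys() if isinstance(checkpoint, dict) else type(checkpoint))
```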
## Inference

The script `mirage_wrapper.py` provides a simple pipeline to load the model and run inference on a single sample. The sample is already included in the repository (`_example_images/`) and consists of a triplet of OCT, SLO, and layer segmentation images.

To run the inference, simply execute the script:

```bash
python mirage_wrapper.py
```
Check the code for more details.
## Benchmark

We provide all the publicly available datasets used in the benchmark, together with the data splits. See `docs/segmentation_benchmark.md` for more details on the segmentation benchmark, and `docs/classification_benchmark.md` for the classification benchmark.
Although we do not provide the pretraining data due to privacy concerns, we provide the code to pretrain MIRAGE on a multimodal dataset. Please check `docs/pretraining.md` for more details.
## Tuning

We provide the code to fine-tune MIRAGE and other state-of-the-art foundation models for OCT segmentation tasks. Please check `docs/segmentation_tuning.md` for more details.

We also provide the code to fine-tune the models for OCT and SLO classification tasks. More information can be found in `docs/classification_tuning.md`.
If you have any questions or find problems with the code, please open an issue on GitHub.
## Citation

If you find this repository useful, please consider giving it a star ⭐ and a citation 📝:

```bibtex
@misc{morano2025mirage,
  title={{MIRAGE}: Multimodal foundation model and benchmark for comprehensive retinal {OCT} image analysis},
  author={José Morano and Botond Fazekas and Emese Sükei and Ronald Fecso and Taha Emre and Markus Gumpinger and Georg Faustmann and Marzieh Oghbaie and Ursula Schmidt-Erfurth and Hrvoje Bogunović},
  year={2025},
  eprint={2506.08900},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2506.08900},
}
```
The models and associated code are released under the CC-BY-NC-ND 4.0 license and may only be used for non-commercial, academic research purposes with proper attribution. See LICENSE for more details.
MIRAGE code is mainly based on MultiMAE, along with timm, DeiT, DINO, MoCo-v3, BEiT, MAE-priv, MAE, mmsegmentation, MONAI, and RETFound. We thank the authors for making their code available.
- https://github.com/EPFL-VILAB/MultiMAE
- https://github.com/rwightman/pytorch-image-models/tree/master/timm
- https://github.com/facebookresearch/deit
- https://github.com/facebookresearch/dino
- https://github.com/facebookresearch/moco-v3
- https://github.com/microsoft/unilm/tree/master/beit
- https://github.com/BUPT-PRIV/MAE-priv
- https://github.com/facebookresearch/mae
- https://github.com/open-mmlab/mmsegmentation
- https://github.com/Project-MONAI/MONAI
- https://github.com/rmaphoh/RETFound_MAE