This repository is the official implementation of *Vision-Language Model for Multitask Medical Text Generation*.
- Ubuntu 22.04
- NVIDIA GeForce RTX 3090 24GB
- CUDA 12.4
- Python 3.9.21
To install requirements:
```
git clone https://github.com/HongkunSun/MICCAI-FLARE-2025-Challenge-Task-5.git
cd MICCAI-FLARE-2025-Challenge-Task-5
conda env create -f environment.yml
conda activate FLARE_2025_Challenge_Task_5_MTYW
```
The pipeline expects the FLARE 2025 2D MLLM dataset to be organized in the following structure:
```
organized_dataset/
├── training/
│   ├── Retinography/
│   │   ├── retino/
│   │   │   ├── imagesTr/
│   │   │   └── retino_questions_train.json
│   │   └── fundus/
│   │       └── ...
│   └── ...
├── validation-hidden/
│   └── ...
└── validation-public/
    └── ...
```
We have organized the data according to task types, and the specific files are located in the all_learning_task_split directory. You need to replace the image paths for each data entry. We provide the processing script process_json.py for this purpose. You only need to replace organized_dataset with the actual path.
Run the data preprocessing script:

```
python process_json.py --folder ./all_learning_task_split --out_folder ./all_learning_task_split --old organized_dataset --new /path/to/your/organized_dataset
```

Next, download the pretrained language and vision backbones:

```python
from huggingface_hub import snapshot_download

snapshot_download(repo_id="meta-llama/Llama-2-7b-chat-hf", local_dir=local_dir_1)
snapshot_download(repo_id="google/medsiglip-448", local_dir=local_dir_2)
```

Then set line 8 of MICCAI-FLARE-2025-Challenge-Task-5/train_configs/medsiglip_llama2_7b_finetune.yaml to the local path of Llama-2-7b-chat-hf, and line 9 to the local path of medsiglip-448.
Before training, you also need to set the `ann_path` field in each task's dataset YAML to the path of the corresponding JSON file. These YAML files are located in the MICCAI-FLARE-2025-Challenge-Task-5/mtyw/configs/datasets directory.
To train the model in the paper, run this command:

```
python train.py
```
To run inference on the test cases, run this command:

```
python inference_flare2025.py --dataset_path <path_to_validation-hidden> --ckpt <path_to_trained_model_pth> --llama_model local_dir_1 --vit_model local_dir_2 --output_file <path_to_output_json_file>
```

Our method achieves the following performance on the MICCAI FLARE 2025 Task 5 Validation-Hidden set:
| Model name | classification | multi_label_classification | detection | regression |
|---|---|---|---|---|
| maiahmed | 0.74 | 0.57 | 0.82 | 11.84 |
| mtyw (our team) | 0.70 | 0.54 | 0.80 | 13.63 |
| lujiazho | 0.68 | 0.17 | 0.69 | 16.50 |
| phucnlt | 0.45 | 0.54 | 0.85 | 22.89 |
This project is licensed under the MIT License. We welcome contributions from the community! To contribute:
- Fork the repository.
- Open a Pull Request.
We thank the contributors of public datasets.