This repository is the official implementation of *Vision-Language Model for Multitask Medical Text Generation*.
- Ubuntu 22.04
- NVIDIA GeForce RTX 3090 24GB
- CUDA 12.4
- Python 3.9.21
To install requirements:
```
git clone https://github.com/HongkunSun/MICCAI-FLARE-2025-Challenge-Task-5.git
cd MICCAI-FLARE-2025-Challenge-Task-5
conda env create -f environment.yml
conda activate FLARE_2025_Challenge_Task_5_MTYW
```
The pipeline expects the FLARE 2025 2D MLLM dataset to be organized in the following structure:
```
organized_dataset/
├── training/
│   ├── Retinography/
│   │   ├── retino/
│   │   │   ├── imagesTr/
│   │   │   └── retino_questions_train.json
│   │   └── fundus/
│   │       └── ...
│   └── ...
├── validation-hidden/
│   └── ...
└── validation-public/
    └── ...
```
We have organized the data according to task types, and the specific files are located in the all_learning_task_split directory. You need to replace the image paths for each data entry. We provide the processing script process_json.py for this purpose. You only need to replace organized_dataset with the actual path.
Run the data preprocessing script:

```
python process_json.py --folder ./all_learning_task_split --out_folder ./all_learning_task_split --old organized_dataset --new /path/to/your/organized_dataset
```

Next, download the pretrained language and vision backbones:

```python
from huggingface_hub import snapshot_download

snapshot_download(repo_id="meta-llama/Llama-2-7b-chat-hf", local_dir=local_dir_1)
snapshot_download(repo_id="google/medsiglip-448", local_dir=local_dir_2)
```

Then set line 8 of MICCAI-FLARE-2025-Challenge-Task-5/train_configs/medsiglip_llama2_7b_finetune.yaml to the local path of Llama-2-7b-chat-hf, and line 9 to the local path of medsiglip-448.
Before training, you also need to set the `ann_path` field in each task's dataset YAML to the path of the corresponding JSON file. These YAML files are located in the MICCAI-FLARE-2025-Challenge-Task-5/mtyw/configs/datasets directory.
To train the model in the paper, run this command:

```
python train.py
```
To run inference on the test cases, run this command:

```
python inference_flare2025.py --dataset_path <path_to_validation-hidden> --ckpt <path_to_trained_model_pth> --llama_model local_dir_1 --vit_model local_dir_2 --output_file <path_to_output_json_file>
```

Our method achieves the following performance on the MICCAI FLARE 2025 Task 5 Validation-Hidden set:
| Model name | classification | multi_label_classification | detection | regression |
|---|---|---|---|---|
| maiahmed | 0.74 | 0.57 | 0.82 | 11.84 |
| mtyw (our team) | 0.70 | 0.54 | 0.80 | 13.63 |
| lujiazho | 0.68 | 0.17 | 0.69 | 16.50 |
| phucnlt | 0.45 | 0.54 | 0.85 | 22.89 |
This project is licensed under the MIT License. We welcome contributions from the community! To contribute:
- Fork the repository.
- Open a Pull Request.
We thank the contributors of public datasets.