Figure 1: Schematic overview of the safety mirage findings for safety fine-tuned VLMs.
This is the official code repository for the paper *Safety Mirage: How Spurious Correlations Undermine VLM Safety Fine-tuning*.
Our safety-unlearning framework is built on LLaVA-1.5, so the installation requirements can also be found there. Alternatively, follow these steps:

- Clone this repository and navigate to the VLM-Safety-MU folder:

```shell
git clone https://github.com/OPTML-Group/VLM-Safety-MU
cd VLM-Safety-MU
```

- Install the package:

```shell
conda create -n llava python=3.10 -y
conda activate llava
pip install --upgrade pip  # enable PEP 660 support
pip install -e .
```

- Install additional packages for training:

```shell
pip install -e ".[train]"
pip install flash-attn --no-build-isolation
```
Our base model, LLaVA-1.5, will be downloaded automatically when you run the provided training scripts; no further action is needed.
For full-parameter unlearning fine-tuning, run

```shell
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 bash scripts/v1_5/finetune_unlearn.sh
```

For LoRA unlearning fine-tuning, run

```shell
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 bash scripts/v1_5/finetune_unlearn_lora.sh
```
Some unlearning options to note:

- `--unlearn_type`: unlearning algorithm type, either `npo` or `rmu`.
- `--rmu_XXX`: specific hyperparameters for the RMU algorithm.
- `--rmu_llava_loss_weight`: weight for the LLaVA training loss on the retain data.
- `--rmu_retain_alpha`: weight for the RMU loss on the retain data.
- `--npo_beta`: balancing parameter for the NPO algorithm.
- `--npo_forget_alpha`: weight for the NPO loss on the forget data.
- `--npo_llava_loss_weight`: weight for the LLaVA training loss on the retain data.
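To illustrate how the NPO-related weights interact, here is a minimal sketch, not the repository's actual implementation: the function and variable names are hypothetical, and the NPO loss follows the standard Negative Preference Optimization formulation of penalizing forget examples that remain likely under the current model relative to a reference model.

```python
import math

def npo_forget_loss(logp_theta: float, logp_ref: float, beta: float) -> float:
    """Standard NPO loss on one forget example:
    (2 / beta) * log(1 + (pi_theta / pi_ref)^beta), computed in log space.
    The loss grows when the current model still assigns the forget example
    a higher likelihood than the reference model does."""
    return (2.0 / beta) * math.log(1.0 + math.exp(beta * (logp_theta - logp_ref)))

def total_unlearn_loss(logp_theta_forget: float, logp_ref_forget: float,
                       retain_llava_loss: float,
                       npo_beta: float = 0.1,
                       npo_forget_alpha: float = 1.0,
                       npo_llava_loss_weight: float = 1.0) -> float:
    """Hypothetical weighted sum mirroring the --npo_* flags above:
    forget-side NPO loss plus the standard LLaVA loss on retain data."""
    forget = npo_forget_loss(logp_theta_forget, logp_ref_forget, npo_beta)
    return npo_forget_alpha * forget + npo_llava_loss_weight * retain_llava_loss
```

In this sketch, raising `--npo_forget_alpha` pushes harder on forgetting, while `--npo_llava_loss_weight` preserves utility on the retain data; `--npo_beta` controls how sharply the forget-side penalty saturates.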
The data path and the output directory must also be specified.
If you find our code or paper helpful, please cite our work:
```bibtex
@article{chen2025safety,
  title={Safety Mirage: How Spurious Correlations Undermine VLM Safety Fine-tuning},
  author={Chen, Yiwei and Yao, Yuguang and Zhang, Yihua and Shen, Bingquan and Liu, Gaowen and Liu, Sijia},
  journal={arXiv preprint arXiv:2503.11832},
  year={2025}
}
```