Safety Mirage: How Spurious Correlations Undermine VLM Safety Fine-tuning

Figure 1: Schematic overview of the safety mirage findings for safety fine-tuned VLMs.

This is the official code repository for the paper Safety Mirage: How Spurious Correlations Undermine VLM Safety Fine-tuning.

Install

Our safety unlearning framework is built on LLaVA-1.5, so the installation requirements can also be found in the LLaVA-1.5 repository. Alternatively, you can follow the steps below:

  1. Clone this repository and navigate to the VLM-Safety-MU folder
git clone https://github.com/OPTML-Group/VLM-Safety-MU
cd VLM-Safety-MU
  2. Install the package
conda create -n llava python=3.10 -y
conda activate llava
pip install --upgrade pip  # enable PEP 660 support
pip install -e .
  3. Install additional packages needed for training
pip install -e ".[train]"
pip install flash-attn --no-build-isolation
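
As a quick sanity check (a minimal sketch; it assumes the editable install registers the upstream LLaVA package under the name llava and that PyTorch is installed), you can verify the environment from the command line:

# Should print the package location and whether a GPU is visible;
# if the import fails, re-run `pip install -e .` inside the activated environment.
conda activate llava
python -c "import llava, torch; print(llava.__file__, torch.cuda.is_available())"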

Unlearning Fine-tuning

Our base model, LLaVA-1.5, will be downloaded automatically when you run the provided training scripts; no manual download is needed.

For full-parameter unlearning fine-tuning, run:

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 bash scripts/v1_5/finetune_unlearn.sh

For LoRA unlearning fine-tuning, run:

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 bash scripts/v1_5/finetune_unlearn_lora.sh
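
Both scripts assume eight visible GPUs. As a sketch (an assumption, not part of the released scripts), you can shorten the device list on smaller machines, though the per-device batch size or gradient-accumulation settings inside the script may then need adjusting to keep the effective batch size:

# Example: run LoRA unlearning fine-tuning on two GPUs only.
CUDA_VISIBLE_DEVICES=0,1 bash scripts/v1_5/finetune_unlearn_lora.sh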

Some unlearning options to note:

  • --unlearn_type: the unlearning algorithm, either 'npo' or 'rmu'.
  • --rmu_XXX: hyperparameters specific to the RMU algorithm.
  • --rmu_llava_loss_weight: weight of the LLaVA training loss on the retain data.
  • --rmu_retain_alpha: weight of the RMU loss on the retain data.
  • --npo_beta: balancing parameter for the NPO algorithm.
  • --npo_forget_alpha: weight of the NPO loss on the forget data.
  • --npo_llava_loss_weight: weight of the LLaVA training loss on the retain data.

The data path and the output directory also need to be specified in the scripts; a hypothetical example is sketched below.
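
The excerpt below is a hypothetical sketch only: it assumes the upstream LLaVA-1.5 launch pattern (deepspeed llava/train/train_mem.py), and the paths and values are placeholders, not released defaults. Consult scripts/v1_5/finetune_unlearn.sh for the actual flag set.

# Hypothetical sketch -- launch command, paths, and values are placeholders.
deepspeed llava/train/train_mem.py \
    --unlearn_type rmu \
    --rmu_llava_loss_weight 1.0 \
    --rmu_retain_alpha 1.0 \
    --data_path /path/to/unlearn_data.json \
    --output_dir ./checkpoints/llava-v1.5-unlearn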

Cite This Work

If you find our code or paper helpful, please cite our work:

@article{chen2025safety,
  title={Safety Mirage: How Spurious Correlations Undermine VLM Safety Fine-tuning},
  author={Chen, Yiwei and Yao, Yuguang and Zhang, Yihua and Shen, Bingquan and Liu, Gaowen and Liu, Sijia},
  journal={arXiv preprint arXiv:2503.11832},
  year={2025}
}