MAGICS-LAB/DAPA


Decoupled Alignment for Robust Plug-and-Play Adaptation

This project implements a decoupled alignment approach for robust plug-and-play adaptation of language models. It provides tools for analyzing and modifying model behavior while maintaining alignment with desired objectives.

Project Structure

.
├── plugin_aligner/       # Core alignment implementation
│   ├── replace.py       # Main replacement logic
│   ├── analyze.py       # Analysis tools
│   ├── evulate.py       # Evaluation utilities
│   └── utils/          # Helper utilities
├── Dataset/             # Dataset directory
├── template_checker/    # Template verification tools
└── scripts/             # Shell entry points
    ├── run_test.sh     # Testing script
    ├── run_jailbreak.sh # Jailbreak testing
    └── run_ppl.sh      # Perplexity evaluation

Installation

  1. Create and activate a conda environment:
conda create --name jailbreak python=3.9.18
conda activate jailbreak
  2. Install required packages:
# PyTorch and related packages
pip3 install torch torchvision torchaudio

# Transformers and language model tools
pip install -U transformers
pip install openai pandas einops accelerate
pip install sentencepiece protobuf
pip install transformers_stream_generator tiktoken
pip install ai2-olmo autoawq auto-gptq
pip install sympy importlib-metadata
  3. Set up CUDA environment (if using GPU):
export CUDA_HOME=/usr/local/cuda
export LD_LIBRARY_PATH=/usr/local/cuda/lib64
export PATH=/usr/local/cuda/bin:$PATH
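After the three steps above, a quick sanity check can confirm the environment is wired up. This is a minimal sketch: the CUDA paths mirror the exports in step 3, and the checks degrade gracefully when no GPU toolkit is present.

```shell
# Sanity-check the setup. Paths follow step 3; adjust if your CUDA
# toolkit is installed elsewhere.
export CUDA_HOME=/usr/local/cuda
export LD_LIBRARY_PATH="$CUDA_HOME/lib64:$LD_LIBRARY_PATH"
export PATH="$CUDA_HOME/bin:$PATH"

# nvcc is only present when the full CUDA toolkit is installed.
if command -v nvcc >/dev/null 2>&1; then
    nvcc --version
fi

# On a working GPU setup, PyTorch should report True here.
python -c "import torch; print(torch.cuda.is_available())" || true
```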

Usage

Running Tests

To run tests with a specific model:

python plugin_aligner/replace.py --target_model meta-llama/Llama-2-7b-hf

Available Scripts

  • run_test.sh: Run comprehensive tests
  • run_jailbreak.sh: Evaluate model robustness
  • run_ppl.sh: Calculate perplexity metrics
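A typical session might invoke the scripts like this. The script names come from the repository layout above; any options they take internally (GPU IDs, model paths) are not documented here, so check each script's header before running.

```shell
# Run each helper script if it exists (names from the scripts/ directory;
# their internal settings may need editing for your GPU and model paths).
for script in scripts/run_test.sh scripts/run_jailbreak.sh scripts/run_ppl.sh; do
    if [ -f "$script" ]; then
        bash "$script"
    else
        echo "skipping $script (not found)"
    fi
done
```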

Configuration

  • Set HF_TOKEN environment variable for Hugging Face model access
  • Adjust GPU settings in scripts as needed
  • Modify dataset paths in configuration files
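For the first bullet, the token export looks like the following. The token value is a placeholder (get your own from your Hugging Face account settings), and the cache relocation is an optional extra with an example path.

```shell
# Hugging Face token for gated models such as meta-llama/Llama-2-7b-hf.
# "hf_xxx" is a placeholder; substitute your own token.
export HF_TOKEN="hf_xxx"

# Optional: relocate the model cache to a larger disk (example path).
export HF_HOME="$HOME/.cache/huggingface"
```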

License

See LICENSE file for details.

Citation

If you have any questions about our paper or code, please feel free to open an issue.

If you use DAPA in your work, please cite our paper:

DAPA

@misc{luo2024decoupledalignmentrobustplugandplay,
      title={Decoupled Alignment for Robust Plug-and-Play Adaptation}, 
      author={Haozheng Luo and Jiahao Yu and Wenxin Zhang and Jialong Li and Jerry Yao-Chieh Hu and Xinyu Xing and Han Liu},
      year={2024},
      eprint={2406.01514},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2406.01514}, 
}

Acknowledgement

We are grateful to the following GitHub repositories for their valuable code and efforts.
