GitHub - dahua966/self-deception-jailbreak

Introduction

This repository contains the implementation code for the paper "Self-Deception: Reverse Penetrating the Semantic Firewall of Large Language Models".

Install the dependencies:

pip install -r requirements.txt

Run the following command to execute attacks:

python src/main.py --target-model gpt-3.5-turbo --assist-model qwen2.5 --device cuda:0 --device cuda:1
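The command above passes `--device` twice. As a minimal sketch of how a Python CLI can accept a repeated flag like this, the example below uses `argparse` with `action="append"`. It is not taken from `src/main.py`: the function names `build_parser` and `parse_cli`, the help strings, and the `cpu` fallback are all assumptions made for illustration, and the real script may parse its options differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser: the real src/main.py may define its options differently.
    parser = argparse.ArgumentParser(description="Illustrative CLI sketch")
    parser.add_argument("--target-model", required=True, help="name of the target model")
    parser.add_argument("--assist-model", required=True, help="name of the assisting model")
    # action="append" collects every occurrence of --device into one list.
    # default=None avoids appending to a shared default list.
    parser.add_argument("--device", action="append", default=None,
                        help="device string such as cuda:0; may be repeated")
    return parser


def parse_cli(argv):
    """Return (target_model, assist_model, devices) for a list of CLI tokens."""
    args = build_parser().parse_args(argv)
    # Falling back to CPU when no device is given is an assumption of this sketch.
    devices = args.device or ["cpu"]
    return args.target_model, args.assist_model, devices


if __name__ == "__main__":
    print(parse_cli([
        "--target-model", "gpt-3.5-turbo",
        "--assist-model", "qwen2.5",
        "--device", "cuda:0",
        "--device", "cuda:1",
    ]))
```

With `action="append"`, the two `--device` values arrive as a list in the order given, so a script could assign one device per locally hosted model. Whether this repository does so is not stated in the README.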

The conversation content is saved in the ./conversation/attack_logs directory.
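To check what a run produced, a short script can list the files under the log directory. This is a minimal sketch that assumes only the directory path given above. The helper name `list_log_files` is hypothetical, and because the README does not describe the log file format, the sketch reports paths and sizes without parsing any contents.

```python
from pathlib import Path


def list_log_files(log_dir="./conversation/attack_logs"):
    """Return (relative path, size in bytes) for each file under log_dir, sorted by path."""
    root = Path(log_dir)
    if not root.is_dir():
        # Nothing has been written yet, or the path is wrong.
        return []
    files = [p for p in root.rglob("*") if p.is_file()]
    return sorted((p.relative_to(root).as_posix(), p.stat().st_size) for p in files)


if __name__ == "__main__":
    for name, size in list_log_files():
        print(f"{size:>10}  {name}")
```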

Repository contents:

.idea
data
src
.gitattributes
Readme.md
requirements.txt