Merge pull request #253 from The-Art-of-Hacking/bon
Update README.md
santosomar authored Dec 21, 2024
2 parents fbbdfbc + b65756e commit 16faf46
Showing 1 changed file with 3 additions and 0 deletions: ai_research/prompt_injection/README.md
@@ -34,6 +34,9 @@ There are many different techniques for prompt injection. The table below lists

These examples illustrate different methods to bypass prompt restrictions by altering the input in creative ways, such as using different formats, languages, or emotional appeals, to manipulate the AI's response.

### BoN Jailbreaking Technique from Anthropic
Anthropic published research on a new jailbreaking technique called [“Best-of-N (BoN) Jailbreaking”](https://becomingahacker.org/bon-jailbreaking-technique-from-anthropic-595ef0e43f35), a straightforward black-box algorithm that can bypass safety and security guardrails in large language models (LLMs) across multiple modalities, including text, vision, and audio. I wrote an article about this technique [here](https://becomingahacker.org/bon-jailbreaking-technique-from-anthropic-595ef0e43f35).
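
At its core, BoN is a simple sample-and-retry loop: apply random augmentations to a prompt (for text, character scrambling, random capitalization, and ASCII noising), query the target model, and stop as soon as a response slips past the guardrails. The Python sketch below is only a rough illustration of that loop under stated assumptions; the augmentation probabilities are illustrative, and `query_model` and `is_flagged` are hypothetical placeholders, not Anthropic's published code.

```python
import random
import string


def augment(prompt: str, p_scramble: float = 0.06,
            p_caps: float = 0.5, p_noise: float = 0.06) -> str:
    """Apply BoN-style text augmentations: swap neighboring characters,
    randomize capitalization, and inject ASCII noise.
    Probabilities here are illustrative, not the paper's tuned values."""
    chars = list(prompt)
    # Randomly swap adjacent characters (character scrambling).
    for i in range(len(chars) - 1):
        if random.random() < p_scramble:
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
    # Randomly flip character case (random capitalization).
    chars = [c.upper() if random.random() < p_caps else c.lower()
             for c in chars]
    # Randomly replace characters with printable ASCII (character noising).
    chars = [random.choice(string.printable.strip())
             if random.random() < p_noise else c
             for c in chars]
    return "".join(chars)


def bon_jailbreak(prompt: str, query_model, is_flagged, n: int = 10_000):
    """Best-of-N loop: resample augmented prompts until one elicits a
    response that the grader flags as a bypass, or the budget N runs out.
    `query_model` and `is_flagged` are hypothetical callables supplied by
    the evaluator (the target LLM API and a harmfulness classifier)."""
    for attempt in range(1, n + 1):
        candidate = augment(prompt)
        response = query_model(candidate)
        if is_flagged(response):
            return attempt, candidate, response
    return None  # no successful bypass within the budget
```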

### Additional References:
- https://github.com/The-Art-of-Hacking/h4cker/tree/master/ai_research/prompt_injection
- https://github.com/TakSec/Prompt-Injection-Everywhere