Commit b8690fe

Updating readme
1 parent 5d47d08 commit b8690fe

2 files changed: +26 / -6 lines


README.md

Lines changed: 26 additions & 6 deletions
@@ -1,7 +1,13 @@
 
-# LLM Attacker - AWS
+# MIPSEval
 
-LLM Attacker is a modular framework for simulating and evaluating the behavior of Large Language Models (LLMs) in adversarial or structured multi-turn conversational scenarios. It supports both OpenAI-hosted models and locally hosted models.
+Multi-turn Injection Planning System for LLM Evaluation
+
+MIPSEval is a modular framework for simulating and evaluating the behavior of Large Language Models (LLMs) in adversarial or structured multi-turn conversational scenarios. It supports both OpenAI-hosted models and locally hosted models.
+
+MIPSEval uses LLMs both to design a conversation strategy and to execute it, making the evaluation fully automated. The strategy can be further adapted by the LLM based on the ongoing conversation. Successful strategies are saved so they can be re-run automatically to check whether they are recurring pitfalls for the LLM being tested.
+
+![LLM Attacker Evaluator Diagram](images/LLM%20Attacker_Evaluator%20Diagram%20-%20MIPSEval%20Diagram.jpg)
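
The new overview describes a plan, execute, judge, adapt, save loop driven by LLMs. Below is a minimal Python sketch of that loop for orientation only; every function, field, and file name in it is a hypothetical placeholder, not MIPSEval's actual API.

```python
# Illustrative sketch of the loop described in the README overview.
# All names here (plan_strategy, run_turn, judge_success, adapt_strategy)
# are hypothetical placeholders, not MIPSEval's real functions.
import json

def plan_strategy(goal: str):
    # Hypothetical: ask the planner LLM for an ordered list of conversation steps.
    return [f"benign opener about {goal}",
            f"probing question about {goal}",
            f"malicious request: {goal}"]

def run_turn(prompt: str) -> str:
    # Hypothetical: send one prompt to the target LLM and return its reply.
    return f"target reply to: {prompt}"

def judge_success(history) -> bool:
    # Hypothetical: a judge LLM decides whether the target complied with the goal.
    return False

def adapt_strategy(strategy, history):
    # Hypothetical: let the planner LLM revise the remaining steps mid-conversation.
    return strategy

def evaluate(goal: str, log_path: str = "conversation_history.jsonl") -> bool:
    strategy, history = plan_strategy(goal), []
    while strategy:
        prompt = strategy.pop(0)
        history.append({"prompt": prompt, "reply": run_turn(prompt)})
        strategy = adapt_strategy(strategy, history)  # adapt based on the ongoing conversation
    success = judge_success(history)
    with open(log_path, "a") as f:  # JSONL logging of the interaction
        f.write(json.dumps({"goal": goal, "history": history, "success": success}) + "\n")
    if success:
        # keep strategies worth re-running against the same target later
        with open("victorious_strategies.jsonl", "a") as f:
            f.write(json.dumps({"goal": goal, "history": history}) + "\n")
    return success

if __name__ == "__main__":
    print(evaluate("example goal"))
```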

 ## Features

@@ -10,7 +16,16 @@ LLM Attacker is a modular framework for simulating and evaluating the behavior o
 - Configurable attack logic via YAML
 - Supports both OpenAI and local LLMs
 - JSONL logging of interaction history
-
+- Fully automated evaluation
+- Strategy planning and execution are performed by LLMs
+- Three prompt types: Benign, Probing, and Malicious
+- Strategies are updated based on the ongoing conversation
+- An LLM is used to judge success
+- Wide variety of malicious tasks and jailbreaks/prompt injections
+- Runs in explore or exploit mode
+- Evolution of successful strategies
+- Any LLM can be tested with MIPSEval
+- An extensible framework that allows evaluation of other aspects of LLMs

 ## Installation

@@ -32,15 +47,17 @@ OPENAI_API_KEY=your_openai_api_key
 Run the application using:

 ```bash
-python llm_attacker.py -e .env -c path/to/config.yaml -p openai
+python llm_attacker.py -e .env -c path/to/config.yaml -p openai [-j conversation_history.jsonl]
 ```

 For local model usage:

 ```bash
-python llm_attacker.py -e .env -c path/to/config.yaml -p local
+python llm_attacker.py -e .env -c path/to/config.yaml -p local [-j conversation_history.jsonl]
 ```

+The default OpenAI models used by MIPSEval are gpt-4o for the planner and gpt-4o-mini for the executioner. These can be changed in `setup.py` (executioner) and `llm_planner.py` (planner, in the get_step_for_evaluator function). Testing was done with the default models.
+
 ### Command-Line Arguments

 | Argument | Description | Required |
@@ -53,7 +70,10 @@ python llm_attacker.py -e .env -c path/to/config.yaml -p local

 ## Output

-Conversations are logged in JSONL format
+Conversations are logged in JSONL format. Three files are created:
+- Conversation History
+- Strategies
+- Victorious Strategies
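
Because the logs are JSONL (one JSON object per line), they can be inspected with a few lines of Python. The sketch below assumes the default conversation_history.jsonl name from the usage examples above; the record contents it prints are whatever the file holds, since the README does not document the schema.

```python
# Sketch: inspect a MIPSEval JSONL log (one JSON object per line).
import json
from pathlib import Path

path = Path("conversation_history.jsonl")  # default log name from the usage examples above
if path.exists():
    records = [json.loads(line) for line in path.read_text().splitlines() if line.strip()]
    print(f"{len(records)} logged entries")
    for record in records[:3]:
        print(json.dumps(record, indent=2)[:500])  # preview the first few entries
else:
    print("No conversation history yet - run llm_attacker.py first")
```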

 ## License

Second changed file: 468 KB (binary, content not rendered)
