Automatic prompt optimization using reasoning models and promptfoo.
PromptEvolver is a Python CLI tool that automatically optimizes prompts by:
- Running promptfoo tests to evaluate prompt performance
- Using a local Ollama reasoning model to analyze test failures
- Generating improved prompts based on the analysis
- Repeating until optimal performance is achieved
Key Innovation: Uses a lightweight local reasoning model (user-specified) for intelligent evaluation, while promptfoo tests use OpenAI models. This hybrid approach provides fast, private reasoning with high-quality test execution.
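To make the hybrid split concrete, here is a minimal sketch of how the local reasoning half might be invoked via Ollama's `/api/chat` endpoint. `build_analysis_prompt` and `analyze_failures` are hypothetical helper names for illustration, not PromptEvolver's actual code:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/chat"  # default local Ollama endpoint

def build_analysis_prompt(failures):
    """Turn a list of failing-test summaries into a single analysis request."""
    bullet_list = "\n".join(f"- {f}" for f in failures)
    return (
        "These promptfoo test cases failed:\n"
        f"{bullet_list}\n"
        "Suggest concrete, minimal changes to the prompt that would fix them."
    )

def analyze_failures(failures, model="qwen3:0.6b"):
    """Ask the local reasoning model for suggestions (requires `ollama serve`)."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": build_analysis_prompt(failures)}],
        "stream": False,  # get one complete JSON response instead of a stream
    }
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["message"]["content"]
```

Because only the failure summaries leave the process (and only to localhost), the analysis stays private even while the promptfoo tests themselves call OpenAI.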
1. Set up the environment:

   ```bash
   ./setup.sh
   ```

2. Add your API keys to `.env`. The setup script creates `.env` if it does not exist. Open it in your editor and set values such as `OPENAI_API_KEY=sk-...`.

3. Start Ollama (optional – skip if using `--use-openai-nano`):

   ```bash
   ollama serve
   # In another terminal:
   ollama pull qwen3:0.6b
   ```

4. Run optimization (uv handles the virtualenv automatically):

   ```bash
   uv run promptevolver.py  # add --use-openai-nano to rely solely on OpenAI reasoning
   ```

5. View results or compare iterations:

   ```bash
   uv run promptevolver.py --view-iteration 1
   uv run promptevolver.py --compare
   ```
When you execute `uv run promptevolver.py`, the CLI orchestrates the following steps:
1. Argument parsing & mode selection – determines whether to run optimization or open the promptfoo viewer, and notes if you requested the OpenAI reasoning path with `--use-openai-nano`.
2. Config + environment load – reads the promptfoo YAML, extracts the initial prompt list, and loads any keys from the `.env` file next to `promptevolver.py`. The original prompts are cached so the file can be restored when the run ends.
3. Reasoning model handshake – either confirms the Ollama model is available or verifies `OPENAI_API_KEY` before switching to OpenAI `gpt-5-nano`. If neither path works, the run continues but skips automated analysis and prompt rewriting.
4. Iteration loop – for each round (up to `--iterations`):
   - writes the current candidate prompt back into the config;
   - calls `npx promptfoo@latest eval -o output/latest.json` (passing the `.env` file) and parses the resulting JSON to compute pass rate and average score;
   - prints the metrics and, when tests fail, asks the reasoning model to summarize the first few failures and return actionable suggestions.
5. Prompt revision – if suggestions exist and the 95% pass-rate target is unmet, the reasoning model drafts a simpler replacement prompt. Guardrails ensure placeholders are kept, the text stays concise, and obviously bad candidates are discarded.
6. Wrap-up & artifacts – after the loop, the tool highlights the best-performing prompt, prints a summary, saves `evolution_results.json`, and snapshots each variant to `iterations/iteration_N.yaml` for later inspection or viewer runs.
7. Config reset – finally rewrites the promptfoo config with the original baseline prompt so repeat runs always start from the same state.
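The eval-and-score portion of the loop above can be sketched roughly as follows. The JSON field names (`results.results`, `success`, `score`) are assumptions about promptfoo's output schema, and `run_eval`/`compute_metrics` are hypothetical helpers, not the tool's actual code:

```python
import json
import subprocess

def run_eval(output_path="output/latest.json"):
    """Invoke promptfoo the same way the CLI does, writing JSON results."""
    subprocess.run(
        ["npx", "promptfoo@latest", "eval", "-o", output_path],
        check=True,  # raise if the eval command itself fails
    )
    with open(output_path) as f:
        return json.load(f)

def compute_metrics(eval_json):
    """Pass rate and average score, assuming each result carries
    a `success` boolean and a numeric `score`."""
    results = eval_json["results"]["results"]
    passes = sum(1 for r in results if r.get("success"))
    scores = [r.get("score", 0.0) for r in results]
    return passes / len(results), sum(scores) / len(scores)
```

A pass rate at or above the 95% target ends the loop early; otherwise the metrics and failure details are handed to the reasoning model.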
- 🧠 Local Reasoning: Uses a local Ollama reasoning model (e.g., qwen3:0.6b) for private, fast analysis (or OpenAI `gpt-5-nano` with `--use-openai-nano`)
- 📊 Visual Feedback: Clear progress indicators and result summaries
- 🔄 Smart Improvement: Automatically generates better prompts with simplification bias
- 📁 Iteration Tracking: Saves each optimization round for comparison
- 🎯 Target-driven: Stops when 95% pass rate is achieved
- 🔍 Built-in Viewer: Integration with promptfoo's visualization UI
- 🛡️ Anti-Complexity: Prevents prompt over-optimization with length limits
- 📈 Score Tracking: Identifies best performing prompts across iterations
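The anti-complexity guardrails could look roughly like this. The names (`placeholders`, `is_acceptable`) and the `MAX_PROMPT_CHARS` threshold are illustrative assumptions, not the tool's real implementation:

```python
import re

MAX_PROMPT_CHARS = 1200  # hypothetical hard cap on candidate length

def placeholders(prompt):
    """Extract {{variable}} placeholders used by promptfoo templates."""
    return set(re.findall(r"\{\{\s*(\w+)\s*\}\}", prompt))

def is_acceptable(candidate, baseline):
    """Reject candidates that drop placeholders, balloon in size,
    or are trivially short."""
    if placeholders(candidate) != placeholders(baseline):
        return False  # every template variable must survive the rewrite
    if len(candidate) > MAX_PROMPT_CHARS or len(candidate) > 2 * len(baseline):
        return False  # anti-complexity: no runaway growth
    if len(candidate.strip()) < 20:
        return False  # obviously degenerate output
    return True
```

Rejected candidates are simply discarded and the previous best prompt carries into the next iteration, which is what gives the optimizer its simplification bias.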
```bash
# Run optimization with default settings
uv run promptevolver.py

# Custom configuration / iteration count
uv run promptevolver.py --config examples/customer_support.yaml --iterations 5 --model o4-mini

# Use OpenAI gpt-5-nano for reasoning instead of Ollama
uv run promptevolver.py --use-openai-nano

# View or compare iterations in the promptfoo UI
uv run promptevolver.py --view-iteration 2
uv run promptevolver.py --compare
```

PromptEvolver uses promptfoo YAML configuration files. See `promptfooconfig.yaml` for an example, or check the `examples/` directory for more configurations.
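For orientation, a minimal config might look like the sketch below. The provider name, prompt text, and assertion are hypothetical examples, not this repo's shipped config; consult the promptfoo documentation for the full schema:

```yaml
# Hypothetical minimal promptfoo config
prompts:
  - "Summarize the following text in one sentence: {{text}}"

providers:
  - openai:gpt-4o-mini

tests:
  - vars:
      text: "PromptEvolver iterates on prompts until tests pass."
    assert:
      - type: contains
        value: "prompts"
```

PromptEvolver rewrites only the `prompts:` entry between iterations; the providers and tests stay fixed so every candidate is scored against the same benchmark.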
- Python 3.10+
- Node.js (for the promptfoo CLI)
- Ollama (for the local reasoning model, unless you use `--use-openai-nano`) – ollama.com
- OpenAI API key stored in `.env`
- uv package manager (installed by `setup.sh`)
MIT License - see LICENSE file for details.