Automatic prompt optimization using reasoning models and promptfoo.
PromptEvolver is a Python CLI tool that automatically optimizes prompts by:
- Running promptfoo tests to evaluate prompt performance
- Using a local Ollama reasoning model to analyze test failures
- Generating improved prompts based on the analysis
- Repeating until optimal performance is achieved
Key Innovation: Uses a lightweight local reasoning model (user-specified) for intelligent evaluation, while promptfoo tests use OpenAI models. This hybrid approach provides fast, private reasoning with high-quality test execution.
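To make the hybrid split concrete, here is a minimal sketch of how the local reasoning half might be invoked via Ollama's `/api/chat` endpoint. `build_analysis_prompt` and `analyze_failures` are hypothetical helper names for illustration, not PromptEvolver's actual code:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/chat"  # default local Ollama endpoint

def build_analysis_prompt(failures):
    """Turn a list of failing-test summaries into a single analysis request."""
    bullet_list = "\n".join(f"- {f}" for f in failures)
    return (
        "These promptfoo test cases failed:\n"
        f"{bullet_list}\n"
        "Suggest concrete, minimal changes to the prompt that would fix them."
    )

def analyze_failures(failures, model="qwen3:0.6b"):
    """Ask the local reasoning model for suggestions (requires `ollama serve`)."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": build_analysis_prompt(failures)}],
        "stream": False,  # get one complete JSON response instead of a stream
    }
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["message"]["content"]
```

Because only the failure summaries leave the process (and only to localhost), the analysis stays private even while the promptfoo tests themselves call OpenAI.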
1. Set up the environment:

   ```bash
   ./setup.sh
   ```

2. Add your API keys to `.env`. The setup script creates `.env` if it does not exist. Open it in your editor and set values such as `OPENAI_API_KEY=sk-...`.

3. Start Ollama (optional – skip if using `--use-openai-nano`):

   ```bash
   ollama serve
   # In another terminal:
   ollama pull qwen3:0.6b
   ```

4. Run optimization (uv handles the virtualenv automatically):

   ```bash
   uv run promptevolver.py  # add --use-openai-nano to rely solely on OpenAI reasoning
   ```

5. View results or compare iterations:

   ```bash
   uv run promptevolver.py --view-iteration 1
   uv run promptevolver.py --compare
   ```
When you execute `uv run promptevolver.py`, the CLI orchestrates the following steps:
1. Argument parsing & mode selection – determines whether to run optimization or open the promptfoo viewer, and notes if you requested the OpenAI reasoning path with `--use-openai-nano`.
2. Config + environment load – reads the promptfoo YAML, extracts the initial prompt list, and loads any keys from the `.env` file next to `promptevolver.py`. The original prompts are cached so the file can be restored when the run ends.
3. Reasoning model handshake – either confirms the Ollama model is available or verifies `OPENAI_API_KEY` before switching to OpenAI `gpt-5-nano`. If neither path works, the run continues but skips automated analysis and prompt rewriting.
4. Iteration loop – for each round (up to `--iterations`):
   - writes the current candidate prompt back into the config;
   - calls `npx promptfoo@latest eval -o output/latest.json` (passing the `.env` file) and parses the resulting JSON to compute pass rate and average score;
   - prints the metrics and, when tests fail, asks the reasoning model to summarize the first few failures and return actionable suggestions.
5. Prompt revision – if suggestions exist and the 95% pass-rate target is unmet, the reasoning model drafts a simpler replacement prompt. Guardrails ensure placeholders are kept, the text stays concise, and obviously bad candidates are discarded.
6. Wrap-up & artifacts – after the loop, the tool highlights the best-performing prompt, prints a summary, saves `evolution_results.json`, and snapshots each variant to `iterations/iteration_N.yaml` for later inspection or viewer runs.
7. Config reset – finally rewrites the promptfoo config with the original baseline prompt so repeat runs always start from the same state.
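The eval-and-score portion of the loop above can be sketched roughly as follows. The JSON field names (`results.results`, `success`, `score`) are assumptions about promptfoo's output schema, and `run_eval`/`compute_metrics` are hypothetical helpers, not the tool's actual code:

```python
import json
import subprocess

def run_eval(output_path="output/latest.json"):
    """Invoke promptfoo the same way the CLI does, writing JSON results."""
    subprocess.run(
        ["npx", "promptfoo@latest", "eval", "-o", output_path],
        check=True,  # raise if the eval command itself fails
    )
    with open(output_path) as f:
        return json.load(f)

def compute_metrics(eval_json):
    """Pass rate and average score, assuming each result carries
    a `success` boolean and a numeric `score`."""
    results = eval_json["results"]["results"]
    passes = sum(1 for r in results if r.get("success"))
    scores = [r.get("score", 0.0) for r in results]
    return passes / len(results), sum(scores) / len(scores)
```

A pass rate at or above the 95% target ends the loop early; otherwise the metrics and failure details are handed to the reasoning model.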
- 🧠 Local Reasoning: Uses a local Ollama reasoning model (e.g., qwen3:0.6b) for private, fast analysis (or OpenAI `gpt-5-nano` with `--use-openai-nano`)
- 📊 Visual Feedback: Clear progress indicators and result summaries
- 🔄 Smart Improvement: Automatically generates better prompts with simplification bias
- 📁 Iteration Tracking: Saves each optimization round for comparison
- 🎯 Target-driven: Stops when 95% pass rate is achieved
- 🔍 Built-in Viewer: Integration with promptfoo's visualization UI
- 🛡️ Anti-Complexity: Prevents prompt over-optimization with length limits
- 📈 Score Tracking: Identifies best performing prompts across iterations
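The anti-complexity guardrails could look roughly like this. The names (`placeholders`, `is_acceptable`) and the `MAX_PROMPT_CHARS` threshold are illustrative assumptions, not the tool's real implementation:

```python
import re

MAX_PROMPT_CHARS = 1200  # hypothetical hard cap on candidate length

def placeholders(prompt):
    """Extract {{variable}} placeholders used by promptfoo templates."""
    return set(re.findall(r"\{\{\s*(\w+)\s*\}\}", prompt))

def is_acceptable(candidate, baseline):
    """Reject candidates that drop placeholders, balloon in size,
    or are trivially short."""
    if placeholders(candidate) != placeholders(baseline):
        return False  # every template variable must survive the rewrite
    if len(candidate) > MAX_PROMPT_CHARS or len(candidate) > 2 * len(baseline):
        return False  # anti-complexity: no runaway growth
    if len(candidate.strip()) < 20:
        return False  # obviously degenerate output
    return True
```

Rejected candidates are simply discarded and the previous best prompt carries into the next iteration, which is what gives the optimizer its simplification bias.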
```bash
# Run optimization with default settings
uv run promptevolver.py

# Custom configuration / iteration count
uv run promptevolver.py --config examples/customer_support.yaml --iterations 5 --model o4-mini

# Use OpenAI gpt-5-nano for reasoning instead of Ollama
uv run promptevolver.py --use-openai-nano

# View or compare iterations in the promptfoo UI
uv run promptevolver.py --view-iteration 2
uv run promptevolver.py --compare
```

PromptEvolver uses promptfoo YAML configuration files. See `promptfooconfig.yaml` for an example, or check the `examples/` directory for more configurations.
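For orientation, a minimal config might look like the sketch below. The provider name, prompt text, and assertion are hypothetical examples, not this repo's shipped config; consult the promptfoo documentation for the full schema:

```yaml
# Hypothetical minimal promptfoo config
prompts:
  - "Summarize the following text in one sentence: {{text}}"

providers:
  - openai:gpt-4o-mini

tests:
  - vars:
      text: "PromptEvolver iterates on prompts until tests pass."
    assert:
      - type: contains
        value: "prompts"
```

PromptEvolver rewrites only the `prompts:` entry between iterations; the providers and tests stay fixed so every candidate is scored against the same benchmark.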
- Python 3.10+
- Node.js (for the promptfoo CLI)
- Ollama (for the local reasoning model, unless you use `--use-openai-nano`) – ollama.com
- OpenAI API key stored in `.env`
- uv package manager (installed by `setup.sh`)
MIT License - see LICENSE file for details.