This repo contains the source code for the paper "Evolution Strategies at Scale: LLM Fine-Tuning Beyond Reinforcement Learning" (https://arxiv.org/abs/2509.24372). Evolution strategies (ES) are used to directly optimize billions of parameters of large language models (LLMs).
Feel free to join the ES fine-tuning forum in Discussions.
10/27/2025: 🔥🔥🔥 An accelerated version with a 10x+ speed-up in running time has been added to the repo! See here. 🚀🚀🚀
Note: we are still actively adding more experimental code to this repo. Expect breaking changes to the accelerated implementations.
Create a virtual environment with Python version >= 3.10 and activate it:
python -m venv es
source es/bin/activate
From the root of the repository, run the following command to install all the relevant Python packages:
pip install -r requirement.txt
For running the main ES code on conciseness fine-tuning:
accelerate launch \
--num_processes 2 \
--num_machines 1 \
--machine_rank 0 \
es_fine-tuning_conciseness.py \
--gpu_threads=1 \
--model_name=Qwen/Qwen2.5-7B-Instruct
--num_processes specifies the number of GPUs to use and --gpu_threads specifies the number of threads inside each GPU. The total number of parallel evaluations is therefore equal to num_processes*gpu_threads.
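For example, to run 8 parallel evaluations on a machine with 4 GPUs (an illustrative configuration, not a tuned recommendation), combine 4 processes with 2 threads per GPU:
accelerate launch \
--num_processes 4 \
--num_machines 1 \
--machine_rank 0 \
es_fine-tuning_conciseness.py \
--gpu_threads=2 \
--model_name=Qwen/Qwen2.5-7B-Instruct
This yields num_processes*gpu_threads = 4*2 = 8 parallel evaluations per iteration.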
For running the main ES code on the Countdown task:
accelerate launch \
--num_processes 4 \
--num_machines 1 \
--machine_rank 0 \
countdown/es_fine-tuning_countdown.py \
--data_sample 200 \
--model_name Qwen/Qwen2.5-3B-Instruct \
--gpu_threads 1
--gpu_ids: specify which GPUs to use (CUDA device IDs); argument for accelerate launch
--model_name: HuggingFace model to fine-tune
--hf_cache_dir: directory for the HuggingFace cache
--precision: model precision, defaults to bf16
--verbose: enable detailed logging if this argument is present in the command line
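As an illustration of how the optional arguments fit together (the GPU ids, cache path, and precision value below are placeholders; adjust them to your setup):
accelerate launch \
--num_processes 2 \
--num_machines 1 \
--machine_rank 0 \
--gpu_ids 0,1 \
countdown/es_fine-tuning_countdown.py \
--data_sample 200 \
--model_name Qwen/Qwen2.5-3B-Instruct \
--hf_cache_dir ~/.cache/huggingface \
--precision bf16 \
--gpu_threads 1 \
--verbose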
Note: The original implementation uses partially correlated noise. To use fully i.i.d. noise, please use es_fine-tuning_conciseness_iid.py and countdown/es_fine-tuning_countdown_iid.py instead. See here for more details.
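For instance, assuming the i.i.d. variant accepts the same arguments as the script above, the conciseness run becomes:
accelerate launch \
--num_processes 2 \
--num_machines 1 \
--machine_rank 0 \
es_fine-tuning_conciseness_iid.py \
--gpu_threads=1 \
--model_name=Qwen/Qwen2.5-7B-Instruct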
If you are using the latest accelerated version es_fine-tuning_countdown_accl.py, please also install vLLM and TensorBoard:
pip install vllm==0.11.0
pip install tensorboard
For running the accelerated version on the Countdown task:
# Single-GPU quickstart
python es_fine-tuning_countdown_accl.py \
--model_name Qwen/Qwen2.5-3B-Instruct \
--cuda_devices 0 \
--num_engines 1 \
--population_size 30 \
--num_iterations 1000
# Multi-GPU run (one vLLM engine per GPU)
python es_fine-tuning_countdown_accl.py \
--model_name Qwen/Qwen2.5-3B-Instruct \
--cuda_devices 0,1,2,3 \
--num_engines 4 \
--population_size 30 \
--num_iterations 1000 \
--sigma 0.001 \
--alpha 0.0005 \
--experiment_dir es-ft-experiment
In a preliminary 4xH100 setting, the accelerated version achieves a ~10x speed-up with a similar convergence rate.
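Assuming the accelerated script writes TensorBoard event files under the directory passed via --experiment_dir (hence the tensorboard dependency above), you can monitor a run with:
tensorboard --logdir es-ft-experiment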
If you find this work helpful in your research, please cite:
@misc{qiu2025evolutionstrategiesscalellm,
title={Evolution Strategies at Scale: LLM Fine-Tuning Beyond Reinforcement Learning},
author={Xin Qiu and Yulu Gan and Conor F. Hayes and Qiyao Liang and Elliot Meyerson and Babak Hodjat and Risto Miikkulainen},
year={2025},
eprint={2509.24372},
archivePrefix={arXiv},
primaryClass={cs.LG},
url={https://arxiv.org/abs/2509.24372},
}