Genesys (Genetic discovery system) is a distributed evolutionary system that uses LLM agents to discover better LLMs. It covers the full workflow: ideation, implementation, checking, training, and evaluation. You can play with the demo at genesys.allen.ai (it may be slow on first build). Our results can be found on these pages:
- Evolution Statistics: Evolve - Evolution Statistics
- Discovered Designs: Viewer - Design Artifacts (you can download them here)
- Design Leaderboard: Viewer - Design Leaderboard
The GUI has many other features you can explore. Here is a short demo video that briefly shows some of them. There is also a DeepWiki generated by Cognition that may help you read this repo (it appears to be fully AI-generated, so it may contain mistakes).
- Clone the repo; the instructions below assume it is under your home directory (`~`).
- Create a virtual environment with PyTorch, move into the repo, and install the genesys CLI:

```shell
conda create -n genesys python=3.12 -y && \
conda activate genesys && \
cd ~/genesys && \
conda install pytorch==2.4.1 torchvision==0.19.1 torchaudio==2.4.1 pytorch-cuda=11.8 -c pytorch -c nvidia -y && \
pip install -e .
```
- Set up the environment:

```shell
export MY_OPENAI_KEY=YOURKEY
export TOGETHER_API_KEY=YOURKEY
export ANTHROPIC_API_KEY=YOURKEY
export HF_KEY=YOURKEY
export WANDB_API_KEY=YOURKEY
export S2_API_KEY=YOURKEY
export DATA_DIR=~/genesys/data # change to a directory you like
export CKPT_DIR=~/genesys/ckpt # change to a directory you like
export DB_KEY_PATH=~/genesys/secrets/db_key.json # provide yours, see the Firebase step below
export HF_DATASETS_TRUST_REMOTE_CODE=1
export PINECONE_API_KEY=YOURKEY
export COHERE_API_KEY=YOURKEY
export PERPLEXITY_API_KEY=YOURKEY
export MATHPIX_API_ID=YOURKEY # optional; provides a PDF-to-text service, useful e.g. for fetching a paper from an arXiv URL. Not used in the paper, but you may try it yourself.
```
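Before launching anything, it can help to confirm the variables are actually set in the current shell. The snippet below is a small illustrative helper, not part of the genesys CLI; the variable names are copied from the export list above, and which ones are strictly required depends on which providers you use.

```python
import os

# Names taken from the export list above; trim to the subset you actually use.
REQUIRED = [
    "MY_OPENAI_KEY", "TOGETHER_API_KEY", "ANTHROPIC_API_KEY",
    "DATA_DIR", "CKPT_DIR", "DB_KEY_PATH",
]

def missing_env_vars(required=REQUIRED):
    """Return the names in `required` that are unset or empty in the environment."""
    return [name for name in required if not os.environ.get(name)]

# Typical use before launching a node:
# missing = missing_env_vars()
# if missing:
#     raise SystemExit("Missing environment variables: " + ", ".join(missing))
```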
- Set up a Firebase backend and store the secret JSON at `DB_KEY_PATH`; this is required for the distributed evolution.
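As a quick sanity check of that step, you can verify the key file at `DB_KEY_PATH` parses and looks like a Firebase service-account key. This is an illustrative sketch (the checked field names are the standard service-account fields, an assumption about your key file, not something genesys itself requires you to run):

```python
import json

def load_service_account(path):
    """Load a Firebase service-account JSON and sanity-check common fields."""
    with open(path) as fh:
        key = json.load(fh)
    # These fields appear in standard Firebase service-account keys.
    for field in ("project_id", "private_key", "client_email"):
        if field not in key:
            raise ValueError(f"{path} is missing the '{field}' field")
    return key

# Typical use, once DB_KEY_PATH is exported:
# import os
# key = load_service_account(os.environ["DB_KEY_PATH"])
# The firebase-admin package would then consume the same file via
# credentials.Certificate(os.environ["DB_KEY_PATH"]).
```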
- Set up a Pinecone vector store (optional, if you want to use vector search over paper chunks). You need to store the chunks in your vector store; refer to the code in `search_utils.py`.
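For the actual ingestion pipeline, refer to `search_utils.py`; the sketch below only illustrates the general idea of chunking paper text before upserting. The chunk size, overlap, and index name are assumptions for illustration, not values from the repo.

```python
def chunk_text(text, size=800, overlap=100):
    """Split paper text into overlapping character chunks for embedding.

    size/overlap are illustrative defaults, not the repo's actual settings.
    """
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + size])
        start += size - overlap
    return chunks

# Upserting into Pinecone would then look roughly like (untested sketch,
# "paper-chunks" is a hypothetical index name, embed() a hypothetical embedder):
# from pinecone import Pinecone
# pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])
# index = pc.Index("paper-chunks")
# index.upsert([(f"paper1-{i}", embed(c), {"text": c})
#               for i, c in enumerate(chunk_text(paper_text))])
```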
- Set up the requirements:

```shell
genesys setup && \
pip install -r requirements_optional.txt # optional
```
- Test your setup by launching a node:

```shell
genesys node
```
- Launch the GUI:

```shell
genesys gui
```
These should already be set up if you followed the installation instructions above, but if not, here is how to set them up separately.
The training corpus is available in smollm-12.5-corpus. Evaluation is based on a customized lm_eval. You must export DATA_DIR first, then download the evaluation data into DATA_DIR, e.g.:

```
{DATA_DIR}/blimp_filtered/adjunct_island.jsonl
```
Download the babyLM evaluation data. Note that if you change your DATA_DIR, you may need to reinstall the data. Remember: DO NOT install peft, which may cause dependency conflicts. The supported tasks can be found in lm_eval tasks; in particular, the babyLM tasks are under "blimp_filtered" and "blimp_supplement".
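To confirm the download landed in the right place, you can count the `.jsonl` files under each task directory. This is an illustrative helper (the two directory names come from the task list above; the layout follows the example path shown earlier):

```python
from pathlib import Path

def check_eval_data(data_dir, tasks=("blimp_filtered", "blimp_supplement")):
    """Return a {task: number_of_jsonl_files} map for eval folders under DATA_DIR."""
    counts = {}
    for task in tasks:
        task_dir = Path(data_dir) / task
        counts[task] = len(list(task_dir.glob("*.jsonl"))) if task_dir.is_dir() else 0
    return counts

# Typical use:
# import os
# print(check_eval_data(os.environ["DATA_DIR"]))
```

A task mapping to 0 means its folder is missing or empty, i.e. the download did not complete.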
It is best to separate the design nodes from the verification nodes: design checkers need GPUs, which may cause conflicts. We recommend deploying a few design nodes and many verification nodes, since design nodes are mostly bound by CPU and API rate limits.
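As a rough illustration of that recommendation, here is a tiny helper that splits a machine pool between the two roles. The 20% design fraction is an assumption chosen for illustration, not a number from the paper:

```python
def split_nodes(total, design_fraction=0.2, min_design=1):
    """Suggest a (design, verification) node split for `total` machines.

    design_fraction=0.2 is an illustrative assumption: design nodes are mostly
    CPU/API-rate-limit bound, so most machines go to GPU-heavy verification.
    """
    if total < 2:
        raise ValueError("need at least one design and one verification node")
    design = max(min_design, int(total * design_fraction))
    return design, total - design
```

For example, `split_nodes(10)` suggests 2 design nodes and 8 verification nodes.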
```bibtex
@misc{cheng2025languagemodelinglanguagemodels,
  title={Language Modeling by Language Models},
  author={Junyan Cheng and Peter Clark and Kyle Richardson},
  year={2025},
  eprint={2506.20249},
  archivePrefix={arXiv},
  primaryClass={cs.AI},
  url={https://arxiv.org/abs/2506.20249},
}
```