premise-selection

This repository provides a cloud-based Lean premise selector, Lean.PremiseSelection.Cloud.premiseSelector. It sends the current goal state and any new user-defined premises (in current file or imported) to a cloud server, and returns the top k premises recommended by the server.

To use the selector:

import PremiseSelection

theorem add_comm_nat (a b : Nat) : a + b = b + a := Nat.add_comm ..

example (a b : Nat) : a + b = b + a := by
  premises  -- prints premises including `add_comm_nat` and `Nat.add_comm`

The premise selector extends the Lean.PremiseSelection API introduced in Lean 4 core.

It is developed as part of LeanHammer, which uses the cloud-based premise selector.

Overview

The premise selection server backend runs a selector model on from Mathlib, Batteries, and Lean core. It uses an encoder-only transformer to embed premises and the goal state, and retrieves the top-k premises by cosine similarity.

For performance reasons (see below), the number of new premises that can be uploaded has an upper limit set by the server (e.g. 8192). A warning will be issued if this limit is surpassed, and extra new premises are truncated. This truncation prioritizes the new premises in the current module, and then premises in more recently imported modules.

By default, the cloud premise selector Lean.PremiseSelection.Cloud.premiseSelector uses the backend API hosted by us at http://leanpremise.net. To use a custom backend (e.g. in heavy use cases, machine learning training, or for private premises that you do not wish to upload to the cloud service), you may set up your own server and then specify a different URL:

set_option premiseSelection.apiBaseUrl "http://my_api_url"

Run time

The first call to premise selection (by hammer or by premises) may be much slower than subsequent calls, due to caching. One should expect the first call in any file session (especially in a downstream project of Mathlib, such as FLT) to be up to 10–20 seconds. Unrelatedly, the first call in a downstream project after a server restart may also be much slower (e.g. 2 minutes) due to the time it takes to fill the server-side cache, but subsequent calls in the downstream repository (by any user) will be much faster.

To optimize for run time, the cloud premise selector has three distinct layers of cache:

The first layer is on the user side: since the server embedder only takes pretty-printed representations of new premises, the user side needs to pretty-print new premises, which is done once per file session, and can take up to 10–20 seconds. To make this time reasonably short, there is also an upper bound on the number of new premises allowed to be uploaded to the server. Since this is on the user side, this extra time is needed for every new file session.
The second layer is on the server: the server maintains an LRU cache of the embeddings of recent new premises uploaded by users. (This LRU cache is used only for performance purposes, and is not tied to user identity.) Since this is on the server side, the cache serves all users.
The third layer is during server initialization: the server pre-computes the embeddings of all premises in a tagged version of Lean core, batteries, and mathlib.

Combinators

This repository also provides premise selector combinators:

open Lean PremiseSelection

/-! `orElse` combinator -/

-- Tries the cloud premise selector. If it doesn't work (e.g. network error), use MePo instead.
set_premise_selector
  Cloud.premiseSelector
  <|> mepoSelector (useRarity := true)

/-! `interleave` combinator -/

-- Retrieves `k` premises from the cloud, `k` from MePo, interleaves them by rank,
-- and takes the top-`k` deduplicated premises.
-- This is inspired by Isabelle Sledgehammer's MeSh.
set_premise_selector interleave #[
  Cloud.premiseSelector,
  mepoSelector (useRarity := true) (p := mepoP) (c := mepoC)
]

Testing

To test the selectors:

lake test

This code, as of commit bdf852, is tested on Lean versions v4.18.0 to v4.20.0.

Citation and Resources

The premise selector is developed by Thomas Zhu, Joshua Clune, Jeremy Avigad, Albert Jiang, and Sean Welleck, and described in our paper Premise Selection for a Lean Hammer.

Links to open-source components of this project:

Server
Model training
- Model weights (subject to change)
Data extraction
- Extracted dataset
LeanHammer repo

Citation:

@article{zhu2025premise,
  title={Premise Selection for a Lean Hammer},
  author={Zhu, Thomas and Clune, Joshua and Avigad, Jeremy and Jiang, Albert Qiaochu and Welleck, Sean},
  journal={arXiv preprint arXiv:2506.07477},
  year={2025}
}

Name		Name	Last commit message	Last commit date
Latest commit History 95 Commits
.github/workflows		.github/workflows
PremiseSelection		PremiseSelection
.gitignore		.gitignore
LICENSE		LICENSE
PremiseSelection.lean		PremiseSelection.lean
PremiseSelectionTest.lean		PremiseSelectionTest.lean
README.md		README.md
lake-manifest.json		lake-manifest.json
lakefile.toml		lakefile.toml
lean-toolchain		lean-toolchain

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

premise-selection

Overview

Run time

Combinators

Testing

Citation and Resources

About

Uh oh!

Releases

Packages

Uh oh!

Languages

License

hanwenzhu/premise-selection

Folders and files

Latest commit

History

Repository files navigation

premise-selection

Overview

Run time

Combinators

Testing

Citation and Resources

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Languages

Packages