Skip to content

BigRocq is a utility, that takes a Rocq (former Coq) project as input and uses domain knowladge to increase a number of theorems in the dataset by a significant factor.

License

Notifications You must be signed in to change notification settings

JetBrains-Research/rocqstar-rag

Repository files navigation

JetBrains Research

RocqStar Proof Retrieval

RocqStar introduces a novel approach to premise selection and retrieval for Rocq (Coq) projects.
Due to the inherent data scarcity in interactive theorem proving, traditional large-scale language modeling is challenging. RocqStar bridges this gap by combining automated proof data extraction, learned theorem embeddings, and seamless integration with CoqPilot, providing intelligent context retrieval during proof generation.

Using the BigRocq extraction utility, we automatically mine intermediate proof states and generate augmented theorem–proof datasets. From this data, we train a self-attentive embedding model capable of predicting semantic similarity between statements -- effectively ranking relevant lemmas and theorems for a given proof goal.

Our approach improves context selection accuracy by 28% over baseline text-similarity rankers, demonstrating the power of embedding-based retrieval in the Rocq ecosystem.

All components, including the dataset, embedder training pipeline, and CoqPilot integration are open-sourced in this repository.

📦 Model checkpoint: JetBrains-Research/rocq-language-theorem-embeddings

Components

  • big-rocq
    Proof‐augmentation utility that expands a Rocq project by transforming and recombining proof subtrees into new theorems.

  • proof-embeddings
    Code for learning statement embeddings.

  • ranker-server
    Python server wrapping the trained embedder for real‐time statement similarity queries.

  • CoqPilot+RocqStar
    Patch and integration for the CoqPilot VSCode plugin to include RocqStar ranking support.

  • embedder-training-dataset
    Dataset used for training the proof embedder. It contains a collection of theorem–proof pairs, that we mined from the Coq projects.

About

BigRocq is a utility, that takes a Rocq (former Coq) project as input and uses domain knowladge to increase a number of theorems in the dataset by a significant factor.

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published