feat: add cairo-metrics benchmark harness + CI tracking #9401
Draft
giladchase wants to merge 1 commit into main
- Introduce `cairo-metrics`: a small CLI that currently checks for wall-clock regressions relative to main.
- Persist results in `repo_root/results.db` (git-ignored), currently a SQLite database keyed by run id (defaults to the current git SHA). This enables caching and tracking results over time locally: check out builds starting from this commit and have the tool populate the db. The tool then allows comparisons based on the stored results. Future work can extend this to a stateful DB machine (like RDS), since the DB is behind a trait.
- Add a GitHub Actions workflow that benchmarks baseline vs PR, reuses cached baseline results via artifacts (saves CI runtime when multiple PR runs share the same base branch), and posts a “Benchmark Comparison” PR comment. The comment is not blocking; the decision is left to the reviewer.
- Seed initial benchmark suites (corelib + OpenZeppelin): corelib uses the local src, and OpenZeppelin is bundled as a vendored release (not a submodule, for simplicity).
- The walltime engine is either a home-brewed timed comparison that uses the compiler library, or `hyperfine`, which uses the binary. It uses hyperfine by default if available (it needs to be installed, e.g. with `apt`), otherwise the builtin engine. hyperfine is useful locally to debug the builtin engine itself (results are similar since hyperfine cancels out shell overhead), and because it outputs useful statistical anomaly messages. If hyperfine is a pain to maintain, it can be removed.
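To make the "DB behind a trait, keyed by run id" idea concrete, here is a minimal sketch. It is not the code from this PR: `ResultStore`, `MemoryStore`, and `compare` are hypothetical names, and an in-memory map stands in for the SQLite backend. It only illustrates how comparisons can be written against the trait so that the backing store can be swapped later.

```rust
use std::collections::HashMap;

/// Benchmark name -> wall-clock time in milliseconds for one run.
type Measurements = HashMap<String, f64>;

/// Hypothetical storage abstraction; the actual trait in the PR may differ.
trait ResultStore {
    fn save(&mut self, run_id: &str, results: Measurements);
    fn load(&self, run_id: &str) -> Option<&Measurements>;
}

/// In-memory stand-in for the SQLite-backed store (assumption for the sketch).
#[derive(Default)]
struct MemoryStore {
    runs: HashMap<String, Measurements>,
}

impl ResultStore for MemoryStore {
    fn save(&mut self, run_id: &str, results: Measurements) {
        // The run id would default to the current git SHA in the real tool.
        self.runs.insert(run_id.to_string(), results);
    }

    fn load(&self, run_id: &str) -> Option<&Measurements> {
        self.runs.get(run_id)
    }
}

/// Percent change for every benchmark present in both runs, sorted by name.
/// Positive values mean the candidate is slower than the baseline.
/// Returns `None` if either run id is missing from the store.
fn compare(
    store: &dyn ResultStore,
    baseline: &str,
    candidate: &str,
) -> Option<Vec<(String, f64)>> {
    let base = store.load(baseline)?;
    let cand = store.load(candidate)?;
    let mut rows: Vec<(String, f64)> = base
        .iter()
        .filter_map(|(name, &b)| {
            let c = *cand.get(name)?;
            if b <= 0.0 {
                return None; // skip degenerate baselines
            }
            Some((name.clone(), (c - b) / b * 100.0))
        })
        .collect();
    rows.sort_by(|a, b| a.0.cmp(&b.0));
    Some(rows)
}
```

Because `compare` only takes `&dyn ResultStore`, a remote-database implementation could be added without touching the comparison logic.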
giladchase
commented
Jan 4, 2026
Reviewable status: 0 of 24 files reviewed, all discussions resolved (waiting on @orizi and @TomerStarkware).
crates/bin/cairo-metrics/src/engine.rs line 63 at r1 (raw file):
    // TODO(gilad): Incremental compilation requires cairo compiler support.
    // For now, skip incremental scenarios entirely.
    if is_incremental {
Next level in the stack adds the logic itself, but the short circuit will stay until we support incremental in the compiler.
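As a rough illustration of how a builtin wall-clock engine with this kind of short circuit could look, here is a sketch under stated assumptions. `Scenario` and `time_scenario` are hypothetical and not taken from `engine.rs`; using the median as the summary statistic is also an assumption.

```rust
use std::time::{Duration, Instant};

/// Hypothetical scenario descriptor; the real type in engine.rs may differ.
struct Scenario {
    is_incremental: bool,
}

/// Runs `work` `runs` times and returns the median wall-clock duration
/// (the upper middle sample when `runs` is even).
/// Returns `None` without running anything for incremental scenarios,
/// mirroring the short circuit, and also when `runs` is zero.
fn time_scenario<F: FnMut()>(
    scenario: &Scenario,
    runs: usize,
    mut work: F,
) -> Option<Duration> {
    if scenario.is_incremental || runs == 0 {
        return None;
    }
    let mut samples: Vec<Duration> = (0..runs)
        .map(|_| {
            let start = Instant::now();
            work();
            start.elapsed()
        })
        .collect();
    samples.sort();
    Some(samples[samples.len() / 2])
}
```

Returning `None` rather than a zero duration keeps skipped scenarios distinguishable from fast ones when results are stored or compared.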

Why is this change needed?
This change adds a benchmarking harness for testing and tracking performance regressions.