cross-domain-horizon

Estimate how the time horizons of AI models change over time across domains such as knowledge and vision.

Methodology

Only frontier models are used to fit the regression line.
Only models that score between 10% and 90% on a given benchmark are counted.
For each model, the best-performing agent is used.
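
The three rules above can be sketched roughly as follows. This is an illustrative sketch, not the repository's code: the `Result` record, the function names, the order in which the filters are applied, the definition of "frontier" as models that beat every earlier model's horizon, and the choice to regress log2(horizon) on release date are all assumptions made for the example.

```python
import math
from dataclasses import dataclass


@dataclass
class Result:
    # Hypothetical record; field names are assumptions for this sketch.
    model: str
    agent: str
    release_year: float     # release date as a fractional year
    score: float            # benchmark score in [0, 1]
    horizon_minutes: float  # estimated time horizon


def best_agent_per_model(results):
    """Keep, for each model, the result from its highest-scoring agent."""
    best = {}
    for r in results:
        if r.model not in best or r.score > best[r.model].score:
            best[r.model] = r
    return list(best.values())


def in_score_range(results, lo=0.10, hi=0.90):
    """Keep only results whose score lies between 10% and 90%."""
    return [r for r in results if lo <= r.score <= hi]


def frontier(results):
    """Keep results that exceed the horizon of every earlier model (assumed definition)."""
    kept, best_so_far = [], -math.inf
    for r in sorted(results, key=lambda r: r.release_year):
        if r.horizon_minutes > best_so_far:
            kept.append(r)
            best_so_far = r.horizon_minutes
    return kept


def fit_log_linear(points):
    """Ordinary least squares of log2(horizon) on release year; returns (slope, intercept)."""
    xs = [r.release_year for r in points]
    ys = [math.log2(r.horizon_minutes) for r in points]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return slope, my - slope * mx


# Made-up data, purely to exercise the functions.
example = [
    Result("a", "agent-1", 2020.0, 0.20, 1.0),
    Result("a", "agent-2", 2020.0, 0.30, 2.0),   # best agent for model "a"
    Result("b", "agent-1", 2021.0, 0.50, 4.0),
    Result("c", "agent-1", 2021.5, 0.40, 3.0),   # below the frontier set by "b"
    Result("d", "agent-1", 2022.0, 0.95, 16.0),  # outside the 10%-90% range
    Result("e", "agent-1", 2022.0, 0.80, 8.0),
]
selected = frontier(in_score_range(best_agent_per_model(example)))
slope, intercept = fit_log_linear(selected)  # slope is in doublings per year
```

With this made-up data the fit uses models a, b and e, and the slope comes out to one doubling per year.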

Usage

First, run:

pip install -r requirements.txt --no-deps

Run the benchmark-specific .py files to load scores for each dataset. Some of them also calculate horizons.
Run calculate_horizons.py to estimate time horizons for each dataset.
Run plots.py to make all plots.
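
The order of these steps can be captured in a small driver script. This is only a sketch: `build_pipeline` and `run_pipeline` are hypothetical helpers that do not exist in the repository, and it assumes each script runs from the repository root without command-line arguments.

```python
import subprocess
import sys


def build_pipeline(benchmark_scripts):
    """Return the commands in the order of the usage steps:
    benchmark-specific scripts first, then calculate_horizons.py, then plots.py."""
    steps = list(benchmark_scripts) + ["calculate_horizons.py", "plots.py"]
    # Use the current interpreter so the installed requirements are picked up.
    return [[sys.executable, script] for script in steps]


def run_pipeline(benchmark_scripts):
    """Run each step in order, stopping at the first script that fails."""
    for cmd in build_pipeline(benchmark_scripts):
        subprocess.run(cmd, check=True)
```

For example, `run_pipeline(["gpqa_diamond.py", "swe_bench_verified.py"])` would load those two benchmarks before estimating horizons and plotting. Which scripts count as benchmark-specific is left to the caller.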

To generate the combined plot, you'll need cairosvg. If you get import errors, see https://stackoverflow.com/questions/73637315/oserror-no-library-called-cairo-2-was-found-from-custom-widgets-import-proje


METR/cross-domain-horizon
