You can install cabench with pip:
pip install -e .
The CA-bench dataset is a collection of 70 CA problems which we use to evaluate the ML engineering capabilities of AI systems.
To download the CA problem datasets, run:
cabench download -d datasets
To download the baseline and human-designed results, run:
cabench download -d results
To generate workflows from a specific pipeline:
cabench generate -p <task_directory> -s <save_directory> -pl <pipeline_path> -n <rounds>
Example:
cabench generate -p tasks/node-level -s results/my_experiment -pl pipeline/zeroshot_pipeline.py -n 3
To run the generated workflows:
cabench run -p <task_directory> -s <save_directory> -n <rounds>
Example:
cabench run -p tasks/node-level -s results/my_experiment -n 3
To calculate scores for executed solutions:
cabench calculate -p <task_directory> -s <save_directory> -n <rounds>
Example:
cabench calculate -p tasks/node-level -s results/my_experiment -n 3
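The three steps above (generate, run, calculate) can also be scripted. The following is a minimal sketch, not part of cabench itself: the helper names `build_steps` and `run_steps` are hypothetical, and it assumes the `cabench` executable is on your PATH and accepts the flags exactly as shown in the commands above.

```python
import subprocess


def build_steps(task_dir, save_dir, pipeline_path, rounds=1):
    """Return the argument lists for the generate, run, and calculate steps.

    Hypothetical helper: it only assembles the commands documented above
    and does not call into cabench directly.
    """
    common = ["-p", task_dir, "-s", save_dir]
    n_rounds = ["-n", str(rounds)]
    return [
        ["cabench", "generate", *common, "-pl", pipeline_path, *n_rounds],
        ["cabench", "run", *common, *n_rounds],
        ["cabench", "calculate", *common, *n_rounds],
    ]


def run_steps(steps):
    """Run each step in order and stop at the first failing command."""
    for argv in steps:
        # check=True raises CalledProcessError on a non-zero exit code,
        # so a failed generation is never followed by a run.
        subprocess.run(argv, check=True)


if __name__ == "__main__":
    steps = build_steps(
        "tasks/node-level",
        "results/my_experiment",
        "pipeline/zeroshot_pipeline.py",
        rounds=3,
    )
    # Print the commands instead of executing them; call run_steps(steps)
    # once cabench is installed.
    for argv in steps:
        print(" ".join(argv))
```

Building the commands as lists rather than shell strings avoids quoting problems when a task or save directory contains spaces.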
To generate, run and calculate scores in a single command:
cabench generate -p <task_directory> -s <save_directory> -pl <pipeline_path> -n <rounds> --run-after --calculate-after
Example:
cabench generate -p tasks/node-level -s results/my_experiment -pl pipeline/zeroshot_pipeline.py -n 3 --run-after --calculate-after
-p, --path
: Path to task directory (multiple tasks supported)
-s, --save-dir
: Directory to save results (must be a subfolder of 'results/')
-pl, --pipeline_path
: Path to pipeline for generating solutions
-n, --rounds
: Number of rounds to run (default: 1)
--run-after
: Run workflows immediately after generation
--calculate-after
: Calculate scores after running (requires --run-after)
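Two constraints from the option list are easy to miss: the save directory must be a subfolder of 'results/', and --calculate-after requires --run-after. The sketch below checks both before assembling a `cabench generate` command. The function `build_generate_command` is a hypothetical wrapper, and the path check assumes a relative, POSIX-style save path; how cabench itself validates these options is not specified here.

```python
from pathlib import PurePosixPath


def build_generate_command(task_dir, save_dir, pipeline_path, rounds=1,
                           run_after=False, calculate_after=False):
    """Assemble a `cabench generate` command after basic option checks.

    Hypothetical helper; it mirrors the documented constraints only.
    """
    if calculate_after and not run_after:
        raise ValueError("--calculate-after requires --run-after")

    # Assumption: save_dir is relative to the repository root.
    parts = PurePosixPath(save_dir).parts
    if len(parts) < 2 or parts[0] != "results" or ".." in parts:
        raise ValueError("save directory must be a subfolder of 'results/'")

    if rounds < 1:
        raise ValueError("rounds must be at least 1")

    argv = ["cabench", "generate", "-p", task_dir, "-s", save_dir,
            "-pl", pipeline_path, "-n", str(rounds)]
    if run_after:
        argv.append("--run-after")
    if calculate_after:
        argv.append("--calculate-after")
    return argv
```

For example, `build_generate_command("tasks/node-level", "output/my_experiment", "pipeline/zeroshot_pipeline.py")` raises a `ValueError` because the save directory is not under 'results/'.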
cabench download --list