The FMBench Orchestrator automates LLM benchmarking. It is built with a modular design, so users can plug and play with any combination of datasets, models, serving stacks, and benchmark metrics.
Follow the steps below to get an infrastructure cost optimization strategy for hosting Llama3.1-8b in less than 30 minutes.
- IAM role: You need an active AWS account with an IAM role that has the necessary permissions to create, manage, and terminate EC2 instances. See this link for the permissions and trust policies that this IAM role needs to have. We refer to this IAM role as fmbench-orchestrator.
- Service quota: Your AWS account needs enough vCPU quota to launch the Amazon EC2 instances if your LLM serving stack is EC2. If you need to request a quota increase, please refer to this link.
- An orchestrator EC2 instance: It is recommended to run the orchestrator on an EC2 instance, preferably located in the same AWS region where you plan to host your LLM (although launching instances across regions is supported as well). See the example launch command after this list.
  - Use Ubuntu as the instance OS, specifically the ubuntu/images/hvm-ssd-gp3/ubuntu-noble-24.04-amd64-server-20240927 AMI.
  - Use t3.xlarge as the instance type, preferably with at least 100 GB of disk space.
  - Associate the fmbench-orchestrator IAM role with this instance.
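If you prefer the AWS CLI, here is a minimal sketch for launching such an instance. It assumes your default VPC and subnet, an existing key pair named my-key, and an instance profile named fmbench-orchestrator that wraps the IAM role above; adjust these to your environment.
# look up the Ubuntu AMI named above (099720109477 is Canonical's owner ID)
AMI_ID=$(aws ec2 describe-images --owners 099720109477 \
  --filters "Name=name,Values=ubuntu/images/hvm-ssd-gp3/ubuntu-noble-24.04-amd64-server-20240927" \
  --query 'Images[0].ImageId' --output text)
# launch a t3.xlarge with a 100 GB gp3 root volume and the fmbench-orchestrator role attached
# (key pair and instance profile names are assumptions, replace with your own)
aws ec2 run-instances \
  --image-id "$AMI_ID" \
  --instance-type t3.xlarge \
  --key-name my-key \
  --iam-instance-profile Name=fmbench-orchestrator \
  --block-device-mappings '[{"DeviceName":"/dev/sda1","Ebs":{"VolumeSize":100,"VolumeType":"gp3"}}]'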
- Clone the repository:
git clone https://github.com/awslabs/fmbench-orchestrator.git
cd fmbench-orchestrator
- Install uv:
curl -LsSf https://astral.sh/uv/install.sh | sh
exec bash
uv venv && source .venv/bin/activate && uv pip sync pyproject.toml
python -m ipykernel install --user --name=.venv --display-name="Python (uv env)"
- Hugging Face token: Please follow the instructions here to get a Hugging Face token, and make sure you have access to the models on Hugging Face. Most models and tokenizers are downloaded from Hugging Face; to enable this, place your Hugging Face token in /tmp/hf_token.txt.
# replace with your Hugging Face token
hf_token=your-hugging-face-token
echo $hf_token > /tmp/hf_token.txt
In this example, we compare the cost and performance of hosting Llama3.1-8b on EC2 g6e.2xlarge and g6e.4xlarge.
python main.py --config-file configs/ec2.yml
Here is a description of all the command-line parameters supported by the orchestrator (an example invocation follows the list):
- --config-file - required, path to the orchestrator configuration file.
- --ami-mapping-file - optional, default=ami_mapping.yml, path to a config file containing the region->instance type->AMI mapping
- --fmbench-config-file - optional, config file to use with FMBench; this is used if the orchestrator config file uses the "{{config_file}}" format for specifying the FMBench config file. If you are benchmarking on SageMaker or Bedrock, then this parameter does need to be specified.
- --infra-config-file - optional, default=infra.yml, config file to use with AWS infrastructure
- --write-bucket - optional, default=placeholder, Amazon S3 bucket to store model files; this parameter is only needed when benchmarking on SageMaker.
- --fmbench-latest - optional, default=no, downloads and installs the latest version of FMBench from the GitHub repo rather than the latest released version from PyPI.
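For example, a run that keeps the default AMI mapping and infrastructure config but installs the latest FMBench from GitHub might look like the following (flag values other than the config file path are illustrative):
python main.py --config-file configs/ec2.yml --fmbench-latest yes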
To generate analysis reports from the above experiments:
python analytics/analytics.py --results-dir results/llama3-8b-g6e --model-id llama3-8b --payload-file payload_en_3000-3840.jsonl --latency-threshold 2
The results are saved in fmbench-orchestrator/analytics/results/llama3-8b-g6e/ on your orchestrator EC2 instance. They include a summary of the results, a heatmap that helps you understand which instance type gives the best price performance at the desired scale (transactions/minute), and more.
Below is one of the output tables, showing a cost comparison.
The experiment configurations are specified in the instances section of the config YML file. The FMBench Orchestrator runs each experiment in parallel and then collects the results from each experiment onto the orchestrator EC2 instance. See the configuration guide for details on the orchestrator config file.
See configs/ec2.yml as an example for EC2 experiments. The instances section has two experiments, one using g6e.2xlarge and the other using g6e.4xlarge.
instances:
- instance_type: g6e.2xlarge
<<: *ec2_settings
fmbench_config:
- fmbench:llama3/8b/config-ec2-llama3-8b-g6e-2xlarge.yml
- instance_type: g6e.4xlarge
<<: *ec2_settings
fmbench_config:
- fmbench:llama3/8b/config-llama3-8b-g6e.4xl-tp-1-mc-max-djl-ec2.yml
Note that the fmbench:llama3/8b/config-ec2-llama3-8b-g6e-2xlarge.yml and fmbench:llama3/8b/config-llama3-8b-g6e.4xl-tp-1-mc-max-djl-ec2.yml files are default config files provided in the FMBench repo. The FMBench orchestrator uses these configs to launch the EC2 instances and deploy the experiments on them.
An example of using a customized FMBench config file is given in the Compare SageMaker against EC2 section below.
An LLM can be hosted on a SageMaker endpoint. This experiment requires that the SageMaker endpoint is already deployed.
You first need to write an FMBench config file for SageMaker. One option is to make a copy of config-llama3-8b-inf2-48xl-tp=8-bs=4-byoe.yml and modify the values in the experiments section, such as the endpoint_name, instance_type, and model_id (see the sketch below). Then upload the edited config to your orchestrator EC2 instance.
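As a rough sketch (not the full config), the fields to edit in the experiments section would look something like the following; the values shown are placeholders for your own deployment, and any other keys in the copied file should be left unchanged:
experiments:
  - model_id: meta-llama/Meta-Llama-3-8B-Instruct   # placeholder, the model behind your endpoint
    endpoint_name: my-existing-llama3-8b-endpoint   # placeholder, your deployed SageMaker endpoint name
    instance_type: ml.inf2.48xlarge                 # instance type backing the endpoint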
The orchestrator config YML file should have the following:
instances:
- instance_type: m7a.xlarge # SageMaker experiment
<<: *ec2_settings
fmbench_config:
- PATH/TO/YOUR/edited_config.yml
- instance_type: g6e.2xlarge # EC2 experiment
<<: *ec2_settings
fmbench_config:
- fmbench:llama3/8b/config-ec2-llama3-8b-g6e-2xlarge.yml
See configs/bedrock.yml as an example for Bedrock experiments.
instances:
- instance_type: m7a.xlarge # Bedrock experiment
<<: *ec2_settings
fmbench_config:
- fmbench:bedrock/config-bedrock-llama3-1.yml
- instance_type: m7a.xlarge # SageMaker experiment
<<: *ec2_settings
fmbench_config:
- ~/fmbench-orchestrator/configs/byoe/config-llama3-8b-inf2-48xl-tp=8-bs=4-byoe.yml
The FMBench config file for Bedrock is fmbench:bedrock/config-bedrock-llama3-1.yml. You can also customize this config and upload your .yml file to the orchestrator EC2 instance.
Please see ec2_custom_dataset.yml for an example config file. The custom data is uploaded to the ~/fmbench-orchestrator/byo_dataset folder on the orchestrator EC2 instance, as specified in the upload_files section.
instances:
- instance_type: g6e.2xlarge
<<: *ec2_settings
fmbench_config:
- /home/ubuntu/fmbench-orchestrator/byo_fmbench_configs/config-ec2-llama3-8b-g6e-2xlarge_eval.yml
upload_files:
- local: byo_dataset/custom.jsonl ## your custom dataset
remote: /tmp/fmbench-read/source_data/
- local: analytics/pricing.yml
remote: /tmp/fmbench-read/configs/
Please see the custom.jsonl file for a data format example. Note that language needs to be set to 'en' to be compatible with the default config files.
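For illustration only, a single line of such a dataset could look like the following; the field names match the column keys used in the accuracy measurement config shown below (input, context, answers), and the values are made up:
{"input": "What is the capital of France?", "context": "France is a country in Western Europe. Its capital and largest city is Paris.", "answers": "Paris", "language": "en"}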
The FMBench orchestrator supports evaluating candidate models using majority voting with a Panel of LLM Evaluators (PoLL). Before running the experiment, please enable model access in Amazon Bedrock for the judge models: Llama3-70b, Cohere Command R (command-r-v1), and Claude 3 Sonnet.
First, create a config file specifying accuracy-measurement-related information, such as ground_truth and question_col_key. You can copy config-llama3.1-8b-g5.2xl-g5.4xl-sm.yml as an example and modify it based on your experiment.
Here are the parameters to update in this config file:
run_steps:
0_setup.ipynb: yes
1_generate_data.ipynb: yes
2_deploy_model.ipynb: yes
3_run_inference.ipynb: yes
4_get_evaluations.ipynb: yes # Make sure to set this step to "yes".
5_model_metric_analysis.ipynb: yes
6_cleanup.ipynb: yes
datasets:
prompt_template_keys:
- input
- context
ground_truth_col_key: answers # The name of the answer field in your custom data
question_col_key: input # The name of the question field in your custom data
The instances section has an upload_files section for each instance, where we can provide a list of local files and remote directory paths to place any custom file on an EC2 instance. This could be a tokenizer.json file, a custom prompt file, or a custom dataset. The example below shows how to upload a custom pricing.yml and a custom dataset to an EC2 instance.
instances:
- instance_type: g6e.2xlarge
<<: *ec2_settings
fmbench_config:
- fmbench:llama3/8b/config-ec2-llama3-8b-g6e-2xlarge.yml
upload_files:
- local: byo_dataset/custom.jsonl
remote: /tmp/fmbench-read/source_data/
- local: analytics/pricing.yml
remote: /tmp/fmbench-read/configs/
See ec2_llama3.2-1b-cpu-byodataset.yml for an example config file. This file refers to the synthetic_data_large_prompts dataset and a custom prompt file, prompt_template_llama3_summarization.txt, for a summarization task. You can edit the dataset file and the prompt template as per your requirements.
Often we want to benchmark different combinations of parameters on the same EC2 instance. For example, we may want to test tensor parallelism degrees of 2, 4, and 8 for, say, the Llama3.1-8b model on the same EC2 machine, say a g6e.48xlarge. You can do that easily with the orchestrator by specifying a list of config files rather than a single config file, as shown in the following example:
fmbench_config:
- fmbench:llama3.1/8b/config-llama3.1-8b-g6e.48xl-tp-2-mc-max-djl.yml
- fmbench:llama3.1/8b/config-llama3.1-8b-g6e.48xl-tp-4-mc-max-djl.yml
- fmbench:llama3.1/8b/config-llama3.1-8b-g6e.48xl-tp-8-mc-max-djl.yml
In this case, the orchestrator first runs benchmarking for the first file in the list, then runs benchmarking for the second file on the same EC2 instance, and so on. The results folders and fmbench.log files for each of the runs are downloaded at the end, once all config files for that instance have been processed.
Below is the conceptual architecture of the FMBench Orchestrator.
See CONTRIBUTING for more information.
This project is licensed under the MIT-0 License - see the LICENSE file for details.
Contributions are welcome! Please fork the repository and submit a pull request with your changes. For major changes, please open an issue first to discuss what you would like to change.