I don't understand why my benchmarking attempts are failing.
Model used:
apiVersion: kubeai.org/v1
kind: Model
metadata:
  labels:
    argocd.argoproj.io/instance: kubeai-models-prod
    features.kubeai.org/TextGeneration: "true"
  name: llama-3.3-70b
  namespace: kubeai
spec:
  args:
    - --max-model-len=4092
    - --max-num-batched-tokens=8192
    - --gpu-memory-utilization=0.95
    - --enforce-eager
    - --disable-log-requests
    - --max-num-seqs=16
    - --quantization=bitsandbytes
    - --load-format=bitsandbytes
  engine: VLLM
  features:
    - TextGeneration
  loadBalancing:
    prefixHash:
      meanLoadFactor: 125
      prefixCharLength: 100
      replication: 256
    strategy: LeastLoad
  maxReplicas: 2
  minReplicas: 1
  owner: ""
  replicas: 1
  resourceProfile: nvidia-gpu-l40s-SHARED-large:1
  scaleDownDelaySeconds: 30
  targetRequests: 100
  url: hf://unsloth/Llama-3.3-70B-Instruct-bnb-4bit
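In case it's relevant, here is roughly how I check that the Model and its vLLM pod come up before running the benchmark (the label selector is a guess on my part and may need adjusting):

# List the KubeAI Model resources in the namespace
kubectl get models -n kubeai

# Check the vLLM server pod(s) backing this Model (label selector may differ)
kubectl get pods -n kubeai -l model=llama-3.3-70b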
The Job I used (sourced from your example repo):
apiVersion: batch/v1
kind: Job
metadata:
  name: benchmark-serving
spec:
  template:
    spec:
      containers:
        - name: benchmark-serving
          image: substratusai/benchmark_serving:v0.0.1
          args:
            - --base-url=http://kubeai/openai
            - --dataset-name=sharegpt
            - --dataset-path=/app/sharegpt_16_messages_or_more.json
            - --model=llama-3.1-8b-instruct-fp8-l4
            - --seed=12345
            - --tokenizer=neuralmagic/Meta-Llama-3.1-8B-Instruct-FP8
            - --request-rate=200
            - --max-concurrency=1600
            - --num-prompts=8000
            - --max-conversations=800
      restartPolicy: Never
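Before submitting the Job, I can sanity-check the base URL from inside the cluster. Something along these lines (a throwaway curl pod; the pod name and image are just illustrative) should list the model IDs the KubeAI OpenAI proxy actually serves:

# Temporary pod that lists the models exposed at the benchmark's --base-url
kubectl run curl-check -n kubeai --rm -it --restart=Never \
  --image=curlimages/curl --command -- \
  curl -s http://kubeai/openai/v1/models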
I get the following output:
k -n kubeai logs jobs/benchmark-serving benchmark-serving
None of PyTorch, TensorFlow >= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.
Namespace(backend='vllm', base_url='http://kubeai/openai', host='localhost', port=8000, endpoint='/v1/completions', dataset=None, dataset_name='sharegpt', dataset_path='/app/sharegpt_16_messages_or_more.json', max_concurrency=800, model='llama-3.1-8b-instruct-fp8-l4', tokenizer='neuralmagic/Meta-Llama-3.1-8B-Instruct-FP8', best_of=1, use_beam_search=False, num_prompts=8000, max_conversations=800, logprobs=None, request_rate=200.0, burstiness=1.0, seed=12345, trust_remote_code=False, disable_tqdm=False, profile=False, save_result=False, metadata=None, result_dir=None, result_filename=None, ignore_eos=False, percentile_metrics='ttft,tpot,itl', metric_percentiles='99', goodput=None, sonnet_input_len=550, sonnet_output_len=150, sonnet_prefix_len=200, sharegpt_output_len=None, random_input_len=1024, random_output_len=128, random_range_ratio=1.0, random_prefix_len=0, hf_subset=None, hf_split=None, hf_output_len=None, tokenizer_mode='auto', served_model_name=None, lora_modules=None)
Starting initial single prompt test run...
Traceback (most recent call last):
  File "/app/benchmark_serving.py", line 1317, in <module>
    main(args)
  File "/app/benchmark_serving.py", line 943, in main
    benchmark_result = asyncio.run(
  File "/usr/local/lib/python3.10/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/usr/local/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
    return future.result()
  File "/app/benchmark_serving.py", line 617, in benchmark
    raise ValueError(
ValueError: Initial test run failed - Please make sure benchmark arguments are correctly specified. Error: Not Found
I've already tried different --model and --tokenizer values in the Job spec, but I always get the same ValueError in the output. Any help would be much appreciated!
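For what it's worth, I can also try reproducing a single request by hand outside the benchmark script, something like this (illustrative only; the prompt and max_tokens are arbitrary):

# One-off completion request against the same endpoint the benchmark uses
kubectl run curl-check -n kubeai --rm -it --restart=Never \
  --image=curlimages/curl --command -- \
  curl -s http://kubeai/openai/v1/completions \
    -H 'Content-Type: application/json' \
    -d '{"model": "llama-3.3-70b", "prompt": "Hello", "max_tokens": 16}'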