
Benchmarking script always raises ValueError #529

@joelvdvoort

Description

I don't understand why my benchmarking attempts are failing.

Model used:

apiVersion: kubeai.org/v1
kind: Model
metadata:
  labels:
    argocd.argoproj.io/instance: kubeai-models-prod
    features.kubeai.org/TextGeneration: "true"
  name: llama-3.3-70b
  namespace: kubeai
spec:
  args:
  - --max-model-len=4092
  - --max-num-batched-tokens=8192
  - --gpu-memory-utilization=0.95
  - --enforce-eager
  - --disable-log-requests
  - --max-num-seqs=16
  - --quantization=bitsandbytes
  - --load-format=bitsandbytes
  engine: VLLM
  features:
  - TextGeneration
  loadBalancing:
    prefixHash:
      meanLoadFactor: 125
      prefixCharLength: 100
      replication: 256
    strategy: LeastLoad
  maxReplicas: 2
  minReplicas: 1
  owner: ""
  replicas: 1
  resourceProfile: nvidia-gpu-l40s-SHARED-large:1
  scaleDownDelaySeconds: 30
  targetRequests: 100
  url: hf://unsloth/Llama-3.3-70B-Instruct-bnb-4bit
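
For what it's worth, my assumption is that the benchmark's --model argument has to match a name the OpenAI-compatible endpoint actually serves (i.e. this Model's metadata.name, llama-3.3-70b). A quick way to see which names the gateway advertises, assuming KubeAI exposes the standard OpenAI model-listing route under /openai:

# list the models the kubeai service serves (run from inside the cluster)
curl -s http://kubeai/openai/v1/models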

The Job I used (sourced from your example repo):

apiVersion: batch/v1
kind: Job
metadata:
  name: benchmark-serving
spec:
  template:
    spec:
      containers:
        - name: benchmark-serving
          image: substratusai/benchmark_serving:v0.0.1
          args:
            - --base-url=http://kubeai/openai
            - --dataset-name=sharegpt
            - --dataset-path=/app/sharegpt_16_messages_or_more.json
            - --model=llama-3.1-8b-instruct-fp8-l4
            - --seed=12345
            - --tokenizer=neuralmagic/Meta-Llama-3.1-8B-Instruct-FP8
            - --request-rate=200
            - --max-concurrency=1600
            - --num-prompts=8000
            - --max-conversations=800
      restartPolicy: Never
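
To take the script out of the equation, the initial single-prompt test it performs can be reproduced by hand with a plain completions request (a sketch; URL and model name copied from the args above and the parsed namespace in the logs below):

# same endpoint the benchmark hits: <base_url><endpoint> = http://kubeai/openai/v1/completions
curl -s http://kubeai/openai/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "llama-3.1-8b-instruct-fp8-l4", "prompt": "Hello", "max_tokens": 5}'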

I get the following output:

k -n kubeai logs jobs/benchmark-serving benchmark-serving
None of PyTorch, TensorFlow >= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.
Namespace(backend='vllm', base_url='http://kubeai/openai', host='localhost', port=8000, endpoint='/v1/completions', dataset=None, dataset_name='sharegpt', dataset_path='/app/sharegpt_16_messages_or_more.json', max_concurrency=800, model='llama-3.1-8b-instruct-fp8-l4', tokenizer='neuralmagic/Meta-Llama-3.1-8B-Instruct-FP8', best_of=1, use_beam_search=False, num_prompts=8000, max_conversations=800, logprobs=None, request_rate=200.0, burstiness=1.0, seed=12345, trust_remote_code=False, disable_tqdm=False, profile=False, save_result=False, metadata=None, result_dir=None, result_filename=None, ignore_eos=False, percentile_metrics='ttft,tpot,itl', metric_percentiles='99', goodput=None, sonnet_input_len=550, sonnet_output_len=150, sonnet_prefix_len=200, sharegpt_output_len=None, random_input_len=1024, random_output_len=128, random_range_ratio=1.0, random_prefix_len=0, hf_subset=None, hf_split=None, hf_output_len=None, tokenizer_mode='auto', served_model_name=None, lora_modules=None)
Starting initial single prompt test run...
Traceback (most recent call last):
  File "/app/benchmark_serving.py", line 1317, in <module>
    main(args)
  File "/app/benchmark_serving.py", line 943, in main
    benchmark_result = asyncio.run(
  File "/usr/local/lib/python3.10/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/usr/local/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
    return future.result()
  File "/app/benchmark_serving.py", line 617, in benchmark
    raise ValueError(
ValueError: Initial test run failed - Please make sure benchmark arguments are correctly specified. Error: Not Found

I've already tried fiddling with different models in the job spec and with different tokenizer values; I always get the same ValueError in the output. Any help would be much appreciated!
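
For completeness, this is how I've been cross-checking which Model resources are actually deployed (their metadata.name is what I plug into --model):

kubectl get models.kubeai.org -n kubeai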
