Commit 296167a

committed
x
Signed-off-by: SumanthRH <[email protected]>
1 parent dbc89eb commit 296167a

File tree

1 file changed: +7 −6 lines


skythought/evals/README.md

Lines changed: 7 additions & 6 deletions
```diff
@@ -43,12 +43,13 @@ skythought evaluate --model Qwen/QwQ-32B-Preview --task aime --backend ray --bac
 
 By default, we make use of the configuration in [ray_configs/ray_config.yaml](./ray_configs/ray_config.yaml). You can also customize the following parameters for ray:
 
-- `tensor_parallel_size`: Tensor Parallel Size per replica. Defaults to 4.
-- `accelerator_type`: GPU accelerator type. For more information see the list of available types: https://docs.ray.io/en/latest/ray-core/accelerator-types.html. Defaults to None (uses any GPUs available in the ray cluster)
-- `num_replicas`: Number of model replicas to use for inference. Defaults to 2.
-- `batch_size`: Batch size per model replica for inference.
-- `gpu_memory_utilization`: The fraction of GPU memory to be used for vLLM's model executor. Defaults to 0.9
-- `dtype`: Data type for inference. (Defaults to "auto")
+- `tensor_parallel_size`: Tensor parallel size per replica. Defaults to 4.
+- `accelerator_type`: GPU accelerator type. See [the list of available types](https://docs.ray.io/en/latest/ray-core/accelerator-types.html) for more information. Defaults to None, which means any available GPUs in the Ray cluster will be used.
+- `num_replicas`: Number of model replicas to use for inference. Defaults to 2.
+- `batch_size`: Batch size per model replica for inference.
+- `gpu_memory_utilization`: Fraction of GPU memory allocated to the model executor in vLLM. Defaults to 0.9.
+- `dtype`: Data type used for inference. Defaults to "auto".
+
 
 ### Optimized settings for 32B and 7B models
 
```
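The parameter list above maps onto the Ray backend configuration in `ray_configs/ray_config.yaml`. As an illustrative sketch only — the key names come from the README's parameter list, but the exact layout of that YAML file is an assumption, not copied from the repository:

```yaml
# Hypothetical customized ray_config.yaml; key names from the README's
# parameter list, structure and values assumed for illustration.
tensor_parallel_size: 8        # tensor parallel size per replica (default: 4)
accelerator_type: A100         # see Ray's accelerator-types list; None = any available GPU
num_replicas: 4                # model replicas used for inference (default: 2)
batch_size: 64                 # per-replica inference batch size
gpu_memory_utilization: 0.9    # fraction of GPU memory for vLLM's model executor
dtype: auto                    # data type used for inference (default: "auto")
```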
