By default, we use the configuration in [ray_configs/ray_config.yaml](./ray_configs/ray_config.yaml). You can also customize the following Ray parameters:
- `tensor_parallel_size`: Tensor parallel size per replica. Defaults to 4.
- `accelerator_type`: GPU accelerator type. See [the list of available types](https://docs.ray.io/en/latest/ray-core/accelerator-types.html) for more information. Defaults to None, which means any available GPUs in the Ray cluster will be used.
- `num_replicas`: Number of model replicas to use for inference. Defaults to 2.
- `batch_size`: Batch size per model replica for inference.
- `gpu_memory_utilization`: Fraction of GPU memory allocated to the model executor in vLLM. Defaults to 0.9.
- `dtype`: Data type used for inference. Defaults to "auto".
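As a rough illustration, a config file setting these parameters might look like the sketch below. The flat key layout and the `batch_size` value are assumptions for illustration; refer to the actual `ray_configs/ray_config.yaml` in this repository for the authoritative structure.

```yaml
# Hypothetical sketch — the repository's ray_config.yaml is authoritative.
tensor_parallel_size: 4       # tensor parallel size per replica
accelerator_type: null        # e.g. "A100"; null uses any GPUs in the Ray cluster
num_replicas: 2               # number of model replicas for inference
batch_size: 64                # per-replica batch size (illustrative value)
gpu_memory_utilization: 0.9   # fraction of GPU memory for vLLM's model executor
dtype: auto                   # data type used for inference
```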