
Commit dbc89eb

Signed-off-by: SumanthRH <[email protected]>
1 parent 834dbf7 commit dbc89eb

File tree

2 files changed: +8 additions, -2 deletions


skythought/evals/README.md

Lines changed: 7 additions & 1 deletion
````diff
@@ -43,6 +43,12 @@ skythought evaluate --model Qwen/QwQ-32B-Preview --task aime --backend ray --bac
 
 By default, we make use of the configuration in [ray_configs/ray_config.yaml](./ray_configs/ray_config.yaml). You can also customize the following parameters for ray:
 
+- `tensor_parallel_size`: Tensor Parallel Size per replica. Defaults to 4.
+- `accelerator_type`: GPU accelerator type. For more information see the list of available types: https://docs.ray.io/en/latest/ray-core/accelerator-types.html. Defaults to None (uses any GPUs available in the ray cluster)
+- `num_replicas`: Number of model replicas to use for inference. Defaults to 2.
+- `batch_size`: Batch size per model replica for inference.
+- `gpu_memory_utilization`: The fraction of GPU memory to be used for vLLM's model executor. Defaults to 0.9
+- `dtype`: Data type for inference. (Defaults to "auto")
 
 ### Optimized settings for 32B and 7B models
 
@@ -54,7 +60,7 @@ For 32B models, we recommend using the default backend configuration for best pe
 skythought evaluate --model Qwen/QwQ-32B-Preview --task aime24 --backend ray --result-dir ./
 ```
 
-For 7B models, we recommend using `tensor_parallel_size=1` and `num_replicas=8` for best performance. FOr example, the previous command will change to:
+For 7B models, we recommend using `tensor_parallel_size=1` and `num_replicas=8` for best performance. For example, the previous command will change to:
 
 ```shell
 skythought evaluate --model Qwen/Qwen2-7B-Instruct --task math500 --backend ray --backend-args tensor_parallel_size=1,num_replicas=8 --result-dir ./
````
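The `--backend-args` flag in the commands above packs several of the documented ray parameters into a single comma-separated `key=value` string. As an illustration only, with `parse_backend_args` being a hypothetical helper rather than skythought's actual parser, splitting such a string into a dict of overrides might look like:

```python
def parse_backend_args(arg_string):
    """Split a comma-separated key=value string (the --backend-args
    format, e.g. "tensor_parallel_size=1,num_replicas=8") into a dict.
    Hypothetical helper for illustration; values stay as strings."""
    config = {}
    for pair in arg_string.split(","):
        key, _, value = pair.partition("=")
        config[key.strip()] = value.strip()
    return config

overrides = parse_backend_args("tensor_parallel_size=1,num_replicas=8")
print(overrides)  # {'tensor_parallel_size': '1', 'num_replicas': '8'}
```

Each key here corresponds to one of the parameters listed in the README excerpt above (`tensor_parallel_size`, `num_replicas`, and so on), which override the values from `ray_configs/ray_config.yaml`.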

skythought/evals/ray_configs/ray_config.yaml

Lines changed: 1 addition & 1 deletion
```diff
@@ -1,5 +1,5 @@
 llm_engine: vllm # currently only vllm supported
-accelerator_type: H100 # accelerator name as specified here: https://docs.ray.io/en/master/ray-core/accelerator-types.html#accelerator-types
+accelerator_type: null # accelerator name as specified here: https://docs.ray.io/en/master/ray-core/accelerator-types.html#accelerator-types
 engine_kwargs: # vllm engine kwargs
   tensor_parallel_size: 4
   gpu_memory_utilization: 0.9
```
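Combining the defaults listed in the README with the change above, a full `ray_config.yaml` might look like the following sketch. Only the lines shown in the diff are confirmed; the placement of the remaining keys and the `batch_size` value are assumptions for illustration.

```yaml
llm_engine: vllm          # currently only vllm supported
accelerator_type: null    # null = use any GPUs available in the Ray cluster
num_replicas: 2           # assumed top-level key; README default is 2
batch_size: 128           # illustrative value; the README lists no default
engine_kwargs:            # vllm engine kwargs
  tensor_parallel_size: 4
  gpu_memory_utilization: 0.9
  dtype: auto             # assumed to live under engine_kwargs
```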
