README.md: 1 addition & 1 deletion
@@ -115,7 +115,7 @@ Below is a summary of the available RunPod Worker images, categorized by image s
 |`BLOCK_SIZE`|`16`|`8`, `16`, `32`|Token block size for contiguous chunks of tokens.|
 |`SWAP_SPACE`|`4`|`int`|CPU swap space size (GiB) per GPU.|
 |`ENFORCE_EAGER`|`0`|boolean as `int`|Always use eager-mode PyTorch. If False (`0`), will use eager mode and CUDA graphs in hybrid for maximal performance and flexibility.|
-|`MAX_CONTEXT_LEN_TO_CAPTURE`|`8192`|`int`|Maximum context length covered by CUDA graphs. When a sequence has a context length larger than this, we fall back to eager mode.|
+|`MAX_SEQ_LEN_TO_CAPTURE`|`8192`|`int`|Maximum context length covered by CUDA graphs. When a sequence has a context length larger than this, we fall back to eager mode.|
 |`DISABLE_CUSTOM_ALL_REDUCE`|`0`|`int`|Enables or disables custom all reduce.|
 **Streaming Batch Size Settings**:
 |`DEFAULT_BATCH_SIZE`|`50`|`int`|Default and maximum batch size for token streaming to reduce HTTP calls.|
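To illustrate how settings like these typically flow from the container environment into vLLM, here is a minimal sketch of an env-var loader. The `engine_args_from_env` helper and its defaults are hypothetical (not the worker's actual loader), though the keyword names mirror real vLLM engine arguments such as `block_size` and `max_seq_len_to_capture`:

```python
import os

# Hypothetical helper: parse worker environment variables into a dict of
# vLLM-style engine keyword arguments. Defaults mirror the table above.
def engine_args_from_env(env=os.environ):
    return {
        "block_size": int(env.get("BLOCK_SIZE", "16")),
        "swap_space": int(env.get("SWAP_SPACE", "4")),  # GiB per GPU
        "enforce_eager": bool(int(env.get("ENFORCE_EAGER", "0"))),
        "max_seq_len_to_capture": int(env.get("MAX_SEQ_LEN_TO_CAPTURE", "8192")),
        "disable_custom_all_reduce": bool(int(env.get("DISABLE_CUSTOM_ALL_REDUCE", "0"))),
    }

# Example: override one flag, fall back to defaults for the rest.
args = engine_args_from_env({"ENFORCE_EAGER": "1"})
```

Integer-encoded booleans (`0`/`1`) are converted via `int()` first, so unset variables cleanly fall back to their documented defaults.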