Commit 0e1e383

Fix deprecated max_context_len_to_capture engine argument
1 parent c8458fe commit 0e1e383

2 files changed: +7 -2 lines changed

README.md

Lines changed: 1 addition & 1 deletion
@@ -115,7 +115,7 @@ Below is a summary of the available RunPod Worker images, categorized by image s
 | `BLOCK_SIZE` | `16` | `8`, `16`, `32` | Token block size for contiguous chunks of tokens. |
 | `SWAP_SPACE` | `4` | `int` | CPU swap space size (GiB) per GPU. |
 | `ENFORCE_EAGER` | `0` | boolean as `int` | Always use eager-mode PyTorch. If False (`0`), will use eager mode and CUDA graph in hybrid for maximal performance and flexibility. |
-| `MAX_CONTEXT_LEN_TO_CAPTURE` | `8192` | `int` | Maximum context length covered by CUDA graphs. When a sequence has context length larger than this, we fall back to eager mode. |
+| `MAX_SEQ_LEN_TO_CAPTURE` | `8192` | `int` | Maximum context length covered by CUDA graphs. When a sequence has context length larger than this, we fall back to eager mode. |
 | `DISABLE_CUSTOM_ALL_REDUCE` | `0` | `int` | Enables or disables custom all reduce. |
 **Streaming Batch Size Settings**:
 | `DEFAULT_BATCH_SIZE` | `50` | `int` | Default and maximum batch size for token streaming to reduce HTTP calls. |
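
Several of the variables above (`ENFORCE_EAGER`, `DISABLE_CUSTOM_ALL_REDUCE`) use the "boolean as `int`" convention, which `src/config.py` below reads through a `get_int_bool_env` helper. That helper's body is not part of this diff; the following is a minimal sketch of what it presumably does, treating `"1"` as true and falling back to a default when the variable is unset (the repo's actual implementation may differ):

```python
import os

def get_int_bool_env(name: str, default: bool) -> bool:
    """Hypothetical sketch of the helper referenced in src/config.py:
    read a "boolean as int" env var, where "1" means True and "0" means False.
    The repo's actual implementation may differ.
    """
    value = os.getenv(name)
    if value is None or value == "":
        return default
    return int(value) == 1
```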

src/config.py

Lines changed: 6 additions & 1 deletion
@@ -47,11 +47,16 @@ def _initialize_config(self):
             "kv_cache_dtype": os.getenv("KV_CACHE_DTYPE"),
             "block_size": int(os.getenv("BLOCK_SIZE")) if os.getenv("BLOCK_SIZE") else None,
             "swap_space": int(os.getenv("SWAP_SPACE")) if os.getenv("SWAP_SPACE") else None,
-            "max_context_len_to_capture": int(os.getenv("MAX_CONTEXT_LEN_TO_CAPTURE")) if os.getenv("MAX_CONTEXT_LEN_TO_CAPTURE") else None,
+            "max_seq_len_to_capture": int(os.getenv("MAX_SEQ_LEN_TO_CAPTURE")) if os.getenv("MAX_SEQ_LEN_TO_CAPTURE") else None,
             "disable_custom_all_reduce": get_int_bool_env("DISABLE_CUSTOM_ALL_REDUCE", False),
             "enforce_eager": get_int_bool_env("ENFORCE_EAGER", False)
         }
         if args["kv_cache_dtype"] == "fp8_e5m2":
             args["kv_cache_dtype"] = "fp8"
             logging.warning("Using fp8_e5m2 is deprecated. Please use fp8 instead.")
+        if os.getenv("MAX_CONTEXT_LEN_TO_CAPTURE"):
+            args["max_seq_len_to_capture"] = int(os.getenv("MAX_CONTEXT_LEN_TO_CAPTURE"))
+            logging.warning("Using MAX_CONTEXT_LEN_TO_CAPTURE is deprecated. Please use MAX_SEQ_LEN_TO_CAPTURE instead.")
+
+
         return {k: v for k, v in args.items() if v not in [None, ""]}
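
The added fallback keeps existing deployments working: if the deprecated variable is set, its value is copied into the new `max_seq_len_to_capture` engine argument and a deprecation warning is logged. A standalone sketch of the pattern (the function name here is illustrative, not from the repo):

```python
import logging
import os

def read_max_seq_len_to_capture() -> int | None:
    """Mirror of the commit's fallback: prefer MAX_SEQ_LEN_TO_CAPTURE,
    but honor the deprecated MAX_CONTEXT_LEN_TO_CAPTURE with a warning.
    Illustrative helper, not part of the repo.
    """
    value = int(os.getenv("MAX_SEQ_LEN_TO_CAPTURE")) if os.getenv("MAX_SEQ_LEN_TO_CAPTURE") else None
    if os.getenv("MAX_CONTEXT_LEN_TO_CAPTURE"):
        # As in the commit, the deprecated variable is applied last,
        # so it overrides the new name when both are set.
        value = int(os.getenv("MAX_CONTEXT_LEN_TO_CAPTURE"))
        logging.warning(
            "Using MAX_CONTEXT_LEN_TO_CAPTURE is deprecated. "
            "Please use MAX_SEQ_LEN_TO_CAPTURE instead."
        )
    return value
```

Because the deprecated check runs after the initial assignment, `MAX_CONTEXT_LEN_TO_CAPTURE` wins when both variables are set; reversing the order would give the new name precedence.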
