
[CB] Scheduling constraints regarding number of available blocks/pages #261


Merged
24 commits merged from ysc-max_blocks-scheduler-constraint into main on Jul 15, 2025

Conversation

@yannicks1 yannicks1 (Collaborator) commented Jun 24, 2025

[CB] Scheduling constraints regarding number of available blocks/pages

Changes:

  • moved the hard-coded BLOCK_SIZE (64) variable to the Platform class and import it where needed, instead of defining it in multiple places.
  • introduced scheduler constraints on the number of available blocks/pages in can_schedule() (the reserved block ids per request are now tracked in model_runner.reserved_blocks).
  • wrote a unit test for the new scheduler constraint, asserting n_reserved_blocks and n_used_blocks.
  • introduced the env variable VLLM_SPYRE_N_BLOCKS to set the number of available blocks for the unit tests (it needs to be set during initialization of all classes; if someone knows a better way, please tell me).
  • use math.ceil(n / d) instead of ((n + d - 1) // d) for better readability (see the small sketch below).
  • renamed model_runner.free_pages to model_runner.block_pool (matching the name used in upstream vLLM).

closes #260
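
For reference, a minimal sketch of the ceiling-division change (block_size matches the hard-coded BLOCK_SIZE of 64 mentioned above; num_tokens is an arbitrary example value, not taken from the PR):

import math

block_size = 64   # the hard-coded BLOCK_SIZE moved to the Platform class
num_tokens = 100  # arbitrary example request length

# Both forms compute ceil(num_tokens / block_size), i.e. the number of
# blocks/pages needed to hold num_tokens tokens.
blocks_old = (num_tokens + block_size - 1) // block_size
blocks_new = math.ceil(num_tokens / block_size)

assert blocks_old == blocks_new == 2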


👋 Hi! Thank you for contributing to vLLM support on Spyre.
Just a reminder: make sure your code passes all the linting checks, otherwise your PR can't be merged. To do so, first install the linting requirements, then run format.sh and commit the changes. This can be done with uv directly:

uv sync --frozen --group lint --active --inexact

Or this can be done with pip:

uv pip compile --group lint > requirements-lint.txt
pip install -r requirements-lint.txt
bash format.sh

Now you are good to go 🚀

Quoted diff context (test-case snippet):

    },
    {
        # Prefill sequence 0
        # total blocks in use: 1
@prashantgupta24 prashantgupta24 (Collaborator) commented Jun 26, 2025:

FYI

scheduler.n_free_blocks

should give us the number of free blocks at each step and should be assertable :)

We can technically get the exact blocks used by:

engine_core.model_executor.driver_worker.worker.model_runner.free_blocks

but that might be overkill here?
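
A minimal sketch of how those assertions could look in the unit test (the fixture names and the exact shape of free_blocks are assumptions, not the repo's actual API):

# Hypothetical helper for the scheduler unit test; `scheduler`, `engine_core`
# and the attribute shapes are placeholders for the real fixtures.
def check_block_accounting(scheduler, engine_core, expected_free_blocks: int) -> None:
    # Cheap check via the scheduler's own bookkeeping.
    assert scheduler.n_free_blocks == expected_free_blocks

    # Stronger (possibly overkill) check: reach into the model runner directly,
    # assuming free_blocks is a collection of free block ids.
    model_runner = engine_core.model_executor.driver_worker.worker.model_runner
    assert len(model_runner.free_blocks) == expected_free_blocks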

@yannicks1 (Collaborator, Author) replied:

no, I think this is a nice additional check! Will add this

@yannicks1 (Collaborator, Author) replied:

done: e5697bc

@yannicks1 yannicks1 marked this pull request as ready for review July 4, 2025 14:15
@sducouedic (Collaborator) commented:

LGTM

Comment on lines 1208 to 1209:
max_requested_blocks[req_id] = len(req_ids2blocks[req_id])
max_reserved_blocks[req_id] = reserved_blocks[req_id]
A collaborator commented:

I am confused why these two variables are prefixed with 'max'

@yannicks1 (Collaborator, Author) replied:

Good point, that was a relic of the past. I just updated the variable names!

Quoted diff context (env variables module):

@@ -74,6 +75,11 @@ def _backend_backwards_compat() -> str:
    "VLLM_SPYRE_USE_CB":
    lambda: bool(int(os.getenv("VLLM_SPYRE_USE_CB", "0"))),

    # If set, use the V1 continuous batching implementation. Otherwise, static
    # batching mode will be enabled.
A collaborator commented:

I don't think this comment is correct? From the code it looks like this is an override for the number of KV cache blocks, used in place of the simple max_model_len * max_num_seqs calculation.

I'm assuming this is for passing in known-good values for the available blocks given a tested model and card combo. Can we also consider using the existing KV cache override instead of setting up a new one? There's --num-gpu-blocks-override, which will set the available blocks in the scheduler config.
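
For illustration, a hedged sketch of how that existing upstream override is typically passed (the model name and value are placeholders, and whether vllm-spyre would pick this up unchanged is exactly what is being discussed here):

# CLI form (upstream vLLM):
#   vllm serve <model> --num-gpu-blocks-override 2048
#
# Programmatic form via EngineArgs (parameter name per upstream vLLM; sketch only):
from vllm import EngineArgs

engine_args = EngineArgs(
    model="some/model",            # placeholder model name
    num_gpu_blocks_override=2048,  # forces the number of KV cache blocks
)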

@yannicks1 (Collaborator, Author) replied:

Yes, the comment comes from copy-pasting the code for VLLM_SPYRE_USE_CB 😆
Good pointer with --num-gpu-blocks-override, I will look into that!
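
For context, the new entry would presumably mirror the VLLM_SPYRE_USE_CB pattern quoted above; a hedged sketch (not the actual diff) of what a corrected entry and comment could look like:

import os

# Sketch only: mirrors the lambda-per-variable pattern shown above; the real
# env module wiring may differ.
environment_variables = {
    # If set to a positive value, overrides the number of available KV cache
    # blocks/pages (used by the unit tests for the new scheduler constraints).
    "VLLM_SPYRE_N_BLOCKS":
    lambda: int(os.getenv("VLLM_SPYRE_N_BLOCKS", "0")),
}

print(environment_variables["VLLM_SPYRE_N_BLOCKS"]())  # 0 unless the variable is set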

Quoted diff context (worker code under review):

# overwrite n_blocks_avail for testing scheduler constraints
if envs_spyre.VLLM_SPYRE_N_BLOCKS > 0:
    n_blocks_avail = envs_spyre.VLLM_SPYRE_N_BLOCKS
model_runner._set_blocks(num_blocks=n_blocks_avail)
@joerunde (Collaborator) commented:

What I'm seeing here is that:

  • self._get_num_blocks_available() only uses info from self.model_runner
  • The resulting n_blocks_avail is only used to modify the model runner, and the model runner's model

So I think that _get_num_blocks_available needs to move to the model runner class, and it should be responsible for finalizing itself after the warmup is complete

@yannicks1 (Collaborator, Author) replied:

Although this was not part of this PR, you are absolutely right, and I moved the function 😇

@joerunde joerunde (Collaborator) commented Jul 9, 2025:

Ah, what I meant here is that the model runner should encapsulate this entirely. This whole block can be replaced with model_runner.finish_warmup(), and there should be no access to the model runner's private methods, or direct access to the model, here in the worker.
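
A rough sketch of the suggested encapsulation (only finish_warmup(), _set_blocks(), and the VLLM_SPYRE_N_BLOCKS override come from this conversation; the class name and everything else are illustrative):

import os


class SpyreModelRunnerSketch:
    """Illustrative stand-in for the real model runner class."""

    def __init__(self) -> None:
        self.n_blocks = 0

    def _get_num_blocks_available(self) -> int:
        # Placeholder value; the real method computes the number of blocks.
        return 2080

    def _set_blocks(self, num_blocks: int) -> None:
        self.n_blocks = num_blocks

    def finish_warmup(self) -> None:
        # Finalize block accounting after warmup, honoring the test override,
        # so the worker only has to call model_runner.finish_warmup().
        n_blocks_avail = self._get_num_blocks_available()
        override = int(os.getenv("VLLM_SPYRE_N_BLOCKS", "0"))
        if override > 0:
            n_blocks_avail = override
        self._set_blocks(num_blocks=n_blocks_avail)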

"""Function returns the number of available blocks/pages.
Will eventually contain a function in torch_sendnn which reads
the actual value provided by the compiler for backend sendnn"""

A collaborator commented:

can we consolidate some of the if envs_spyre.VLLM_SPYRE_N_BLOCKS > 0: checks into here and return the override from this method?
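
A hedged sketch of what that consolidation might look like (the function body is illustrative; only the method's purpose, the env variable, and the max_model_len * max_num_seqs sizing are taken from this conversation):

import math
import os


def get_num_blocks_available(max_model_len: int, max_num_seqs: int,
                             block_size: int = 64) -> int:
    # Return the override directly, so callers no longer need their own
    # "if envs_spyre.VLLM_SPYRE_N_BLOCKS > 0:" checks.
    override = int(os.getenv("VLLM_SPYRE_N_BLOCKS", "0"))
    if override > 0:
        return override
    # Placeholder based on the simple max_model_len * max_num_seqs sizing
    # mentioned earlier; the sendnn backend is expected to eventually report
    # the real value from the compiler.
    return max_num_seqs * math.ceil(max_model_len / block_size)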

@yannicks1 yannicks1 merged commit 009c7a5 into main Jul 15, 2025
17 of 18 checks passed
@yannicks1 yannicks1 deleted the ysc-max_blocks-scheduler-constraint branch July 15, 2025 15:39