[CB] use used block ids for dummy batch size 2 #259

Merged
merged 6 commits into from
Jul 1, 2025

Conversation

yannicks1
Collaborator

[CB] use used block ids for dummy batch size 2

solves #258

Signed-off-by: Yannick Schnider <[email protected]>

👋 Hi! Thank you for contributing to vLLM support on Spyre.
Just a reminder: Make sure that your code passes all the linting checks, otherwise your PR won't be able to be merged. To do so, first install the linting requirements, then run format.sh and commit the changes. This can be done with uv directly:

uv sync --frozen --group lint --active --inexact

Or this can be done with pip:

uv pip compile --group lint > requirements-lint.txt
pip install -r requirements-lint.txt
bash format.sh

Now you are good to go 🚀

@yannicks1 yannicks1 self-assigned this Jun 24, 2025
@yannicks1
Collaborator Author

Status: To be tested on AIU Spyre

@yannicks1
Collaborator Author

bot:test
TEST_FILE=tests/e2e/test_spyre_cb.py MARKERS="spyre" VLLM_SPYRE_TEST_MODEL_LIST="tiny-granite-3.2-8b"

@yannicks1
Collaborator Author

bot:test
TEST_FILE=tests/e2e/test_spyre_cb.py MARKERS="spyre" VLLM_SPYRE_TEST_MODEL_LIST='tiny-granite-3.2-8b'

@yannicks1
Collaborator Author

bot:test
TEST_FILE=tests/e2e/test_spyre_cb.py MARKERS="spyre"

@yannicks1
Collaborator Author

bot:test
TEST_FILE=tests/e2e/test_spyre_cb.py MARKERS="spyre"

@yannicks1
Collaborator Author

Good news: I spun up a pod and tested this, and it actually works 🎉
I ran test_spyre_cb.py and cb_spyre_inference.py and validated with aiu-smi that we are actually running on Spyre.

ready to review!

@yannicks1 yannicks1 marked this pull request as ready for review June 30, 2025 16:18
Collaborator

Wondering if we can add a test for this somehow 🤔

Collaborator

Worth adding a test where we only send 1 request and assert a bunch of stuff? 🤷

Collaborator

@gmarinho2 had something for this in his PR (https://github.com/vllm-project/vllm-spyre/pull/252/files), but he was assuming we used dummy_req_ids2blocks, which we don't anymore

Collaborator Author

What about an assert on the shape dimension that corresponds to the batch size for the decode path? It might be overkill to write a test for this. Plus, an assert is safer...
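
A minimal sketch of the in-code assert suggested here, assuming the decode path pads a single request to a batch of two by reusing block ids that are already in use (as the PR title suggests). The names `pad_decode_batch` and `MIN_DECODE_BATCH`, and the list-of-lists block table, are hypothetical and not the vllm-spyre API.

```python
# Hypothetical sketch: not the actual vllm-spyre code.
MIN_DECODE_BATCH = 2  # assumption: the decode path needs at least two rows


def pad_decode_batch(block_table: list[list[int]]) -> list[list[int]]:
    """Pad a decode batch to MIN_DECODE_BATCH rows by repeating used block ids."""
    if not block_table:
        raise ValueError("decode batch must contain at least one request")
    padded = [list(row) for row in block_table]
    while len(padded) < MIN_DECODE_BATCH:
        # Reuse the block ids of the first real request instead of
        # reserving separate dummy blocks.
        padded.append(list(block_table[0]))
    # The assert discussed in the thread: check the batch dimension.
    assert len(padded) >= MIN_DECODE_BATCH, f"unexpected batch size {len(padded)}"
    return padded
```

With a single request, `pad_decode_batch([[3, 7]])` yields two rows that share the same block ids; batches that already hold two or more requests pass through unchanged.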

Collaborator

Adding only 1 request and asserting scheduler's tkv could be a good start?

Collaborator Author

I think that should be sufficient. I'm reluctant to add even more tests, as they already take a long time and this can easily be asserted in the code

Collaborator

Agree, I think the base tests already cover this case.

Collaborator

@prashantgupta24 prashantgupta24 Jul 1, 2025

Ah sorry, I worded it incorrectly. Yeah, we already have test cases that cover the scenario; what I meant was that we could add asserts in the tests themselves to cover this PR's specific scenario, where len(cached_requests) == 1. On the other hand, since we only have access to the model_runner obj during testing, the only thing we could have asserted was model_runner.model.indices.size, which is already covered by the assert in the code, so I guess it's okay 🤷
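
The test-side check described here could look roughly like the following sketch. `FakeModel`, `FakeModelRunner`, and `check_single_request_batch` are hypothetical stand-ins; only `len(cached_requests) == 1` and `model_runner.model.indices` come from the thread, and `indices` is modeled as a plain list of booleans rather than a tensor.

```python
from dataclasses import dataclass, field


@dataclass
class FakeModel:
    # Stand-in for model_runner.model; `indices` marks which rows are real.
    indices: list[bool] = field(default_factory=list)


@dataclass
class FakeModelRunner:
    # Stand-in for the model_runner object available during testing.
    model: FakeModel


def check_single_request_batch(model_runner, cached_requests) -> int:
    """Return the decode batch size, asserting it was padded to two rows."""
    batch_size = len(model_runner.model.indices)
    if len(cached_requests) == 1:
        # Assumption: a lone request is padded with one dummy row.
        assert batch_size == 2, f"expected padded batch of 2, got {batch_size}"
    return batch_size
```

For example, a runner built as `FakeModelRunner(FakeModel(indices=[True, False]))` passes the check for a single cached request, which mirrors what the assert in the code already guarantees.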

Collaborator

@wallashss wallashss left a comment

LGTM

@yannicks1
Collaborator Author

bot:test
TEST_FILE=tests/e2e/test_spyre_cb.py MARKERS="spyre"

@yannicks1
Collaborator Author

ran the only failing test manually on the card and it succeeded. Thus merging this PR!

@yannicks1 yannicks1 merged commit b324b42 into main Jul 1, 2025
16 of 18 checks passed
@yannicks1 yannicks1 deleted the ysc-remove-dummy-sequence branch July 1, 2025 14:35