-
Notifications
You must be signed in to change notification settings - Fork 18
[CB] use used block ids for dummy batch size 2 #259
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Signed-off-by: Yannick Schnider <[email protected]>
Signed-off-by: Yannick Schnider <[email protected]>
👋 Hi! Thank you for contributing to vLLM support on Spyre.
Or this can be done with
Now you are good to go 🚀 |
Status: To be tested on AIU Spyre |
bot:test |
bot:test |
Signed-off-by: Yannick Schnider <[email protected]>
Signed-off-by: Yannick Schnider <[email protected]>
bot:test |
1 similar comment
bot:test |
Good news, I spun up a pod and tested this, and it actually works 🎉 ready to review! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wondering if we can add a test for this somehow 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Worth adding a test where we only send 1 request and assert a bunch of stuff? 🤷
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gmarinho2 had something in his PR here - https://github.com/vllm-project/vllm-spyre/pull/252/files but he was assuming we used dummy_req_ids2blocks
which we don't anymore
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what about an assert on the shape dimension that corresponds to the batch size for the decode path? might be an overkill to write test for this. Plus an assert is saver...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding only 1 request and asserting scheduler's tkv could be a good start?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that should be sufficient. Reluctant to add even more tests as they are already taking long and this can be easily asserted in the code
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree, I think the base tests already cover this case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah sorry I worded it incorrectly, yeah we already have test cases that cover the scenario, what I meant was we could add asserts in the tests themselves to cover this specific PR scenario where len(cached_requests) == 1
. On the other hand, since we only have access to the model_runner
obj during testing, the only thing we could have asserted was model_runner.model.indices.size
which is already covered by the assert in the code, so I guess it's okay 🤷
Signed-off-by: Yannick Schnider <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
bot:test |
ran the only failing test manually on the card and it succeeded. Thus merging this PR! |
[CB] use used block ids for dummy batch size 2
solves #258