-
Notifications
You must be signed in to change notification settings - Fork 18
Duplicate the SamplingMetadata class #278
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
We were previously reusing the GPU SamplingMetadata class but there have been incompatible changes upstream (PR vllm-project/vllm#16728) Since it's not clear for now whether we want, should or can reuse the LogitsProcessor implementation as is, I'm making a copy of the old version of the class for the spyre backend. This won't affect any features for now since the vllm change was an internal refactoring without UX impact. Signed-off-by: Max de Bayser <[email protected]>
👋 Hi! Thank you for contributing to vLLM support on Spyre.
Or this can be done with
Now you are good to go 🚀 |
Signed-off-by: Max de Bayser <[email protected]>
Signed-off-by: Max de Bayser <[email protected]>
Signed-off-by: Max de Bayser <[email protected]>
Signed-off-by: Max de Bayser <[email protected]>
Signed-off-by: Max de Bayser <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this is the preferred way to tackle these breaking changes upstream, then it LGTM.
vllm_spyre/v1/sample/metadata.py
Outdated
# This is a copy of the vLLM vllm file prior to PR | ||
# https://github.com/vllm-project/vllm/pull/16728 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
duplicate lines 3-4
I don't think that class is gpu-specific though, right? It's the sampling metadata used by all in-tree platforms supported by vllm |
No, for example, in the TPU it's a different class: vllm/v1/sample/tpu/metadata.py |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any chance we can rebase this to get merged to vllm-main-updates
instead of main
?
Sorry, I'm not sure I understand. These commits here are already in |
You're right, I said that just in case we wanted to merge any new changes |
Signed-off-by: Max de Bayser <[email protected]>
This reverts commit 785a5d5.
We were previously reusing the GPU SamplingMetadata class but there have been incompatible changes upstream (PR vllm-project/vllm#16728)
Since it's not clear for now whether we want, should or can reuse the LogitsProcessor implementation as is, I'm making temporarily making a copy of the old versions of the files that we need for the spyre backend.
This won't affect any features for now since the vllm change was an internal refactoring without UX impact.
Follow-up issue to give a definitive solution: https://github.ibm.com/ai-foundation/aiu-app-sw-tracker/issues/804