✨ Use shared CachedRequestData as vllm:main #273


Closed
wants to merge 11 commits

Conversation

prashantgupta24
Collaborator

@prashantgupta24 commented Jul 1, 2025

Description

This is more complicated than I originally thought :)

Alright, all tests are now passing locally. There seems to be another breaking change in vllm:main that will have to be addressed to make the main tests pass; the default tests fail because this code is not yet backward compatible 😅

Related Issues

Fixes #271
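
For context, below is a minimal, self-contained sketch of the shape change this PR adapts to: vllm:main now hands the model runner one shared, batched `CachedRequestData` whose per-request fields are parallel lists (so `req_ids` is a list), rather than one cached-request object per request. The stand-in dataclass and its field names are assumptions for illustration only; the authoritative definition lives in vLLM's v1 scheduler output module and may have changed since this PR.

```python
from dataclasses import dataclass, field


@dataclass
class CachedRequestData:  # stand-in for the upstream vLLM class (fields assumed)
    req_ids: list[str] = field(default_factory=list)
    resumed_from_preemption: list[bool] = field(default_factory=list)
    num_computed_tokens: list[int] = field(default_factory=list)


def update_states(requests: dict[str, dict], cached: CachedRequestData) -> None:
    # req_ids is now a list; the per-request fields are parallel lists indexed
    # the same way, instead of one cached-request object per request.
    for idx, req_id in enumerate(cached.req_ids):
        state = requests[req_id]
        state["num_computed_tokens"] = cached.num_computed_tokens[idx]
        state["resumed_from_preemption"] = cached.resumed_from_preemption[idx]


if __name__ == "__main__":
    requests = {"req-0": {}, "req-1": {}}
    cached = CachedRequestData(
        req_ids=["req-0", "req-1"],
        resumed_from_preemption=[False, True],
        num_computed_tokens=[7, 3],
    )
    update_states(requests, cached)
    print(requests)
```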

Signed-off-by: Prashant Gupta <[email protected]>

github-actions bot commented Jul 1, 2025

👋 Hi! Thank you for contributing to vLLM support on Spyre.
Just a reminder: make sure your code passes all the linting checks, otherwise your PR can't be merged. To do so, first install the linting requirements, then run `format.sh` and commit the changes. This can be done with uv directly:

uv sync --frozen --group lint --active --inexact

Or this can be done with pip:

uv pip compile --group lint > requirements-lint.txt
pip install -r requirements-lint.txt
bash format.sh

Now you are good to go 🚀

Signed-off-by: Prashant Gupta <[email protected]>
Signed-off-by: Prashant Gupta <[email protected]>
@prashantgupta24 changed the title from "🐛 req_ids is now a list in vllm:main" to "🐛 Use shared CachedRequestData as vllm:main" on Jul 1, 2025
Signed-off-by: Prashant Gupta <[email protected]>
Signed-off-by: Prashant Gupta <[email protected]>
@prashantgupta24 changed the title from "🐛 Use shared CachedRequestData as vllm:main" to "✨ Use shared CachedRequestData as vllm:main" on Jul 2, 2025
Signed-off-by: Prashant Gupta <[email protected]>
Signed-off-by: Prashant Gupta <[email protected]>
Signed-off-by: Prashant Gupta <[email protected]>
@maxdebayser
Collaborator

@prashantgupta24, I think the other breaking change you mentioned is the sampling metadata one, right? I've opened a hacky PR to temporarily fix it: #278

@prashantgupta24
Collaborator Author

> @prashantgupta24, I think the other breaking change you mentioned is the sampling metadata one, right? I've opened a hacky PR to temporarily fix it: #278

Yep, thanks!

Signed-off-by: Prashant Gupta <[email protected]>
@prashantgupta24 mentioned this pull request Jul 4, 2025
@prashantgupta24
Collaborator Author

Closing in favor of #283.

joerunde pushed a commit that referenced this pull request Jul 8, 2025
# Description

This branch has fixes for:
- Caching the token_ids: the new tokens are now cached in `execute_model` instead of `update_states`, because of vllm-project/vllm#20291 (see the sketch after this list).
- Changes from the shared `CachedRequestData` (#273).
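
Below is an illustrative-only sketch of the token-caching move described in the first bullet: after vllm-project/vllm#20291 the newly sampled token ids are no longer fed back to `update_states` through the scheduler's cached-request data, so the model runner appends them to its own per-request state right after sampling inside `execute_model`. All names here (`RequestState`, `cache_sampled_tokens`) are hypothetical and do not mirror the actual vllm-spyre implementation.

```python
from dataclasses import dataclass, field


@dataclass
class RequestState:  # hypothetical per-request state kept by the model runner
    output_token_ids: list[int] = field(default_factory=list)


def cache_sampled_tokens(
    requests: dict[str, RequestState],
    batch_req_ids: list[str],
    sampled_token_ids: list[list[int]],  # newly sampled ids, one entry per request
) -> None:
    # Append the new tokens at sampling time (inside execute_model), instead of
    # waiting for update_states to receive them from the scheduler next step.
    for req_id, new_ids in zip(batch_req_ids, sampled_token_ids):
        requests[req_id].output_token_ids.extend(new_ids)


if __name__ == "__main__":
    requests = {"req-0": RequestState(), "req-1": RequestState()}
    cache_sampled_tokens(requests, ["req-0", "req-1"], [[42], [7]])
    print({rid: st.output_token_ids for rid, st in requests.items()})
```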

## Related Issues

Fix for #271

---------

Signed-off-by: Prashant Gupta <[email protected]>
Signed-off-by: Max de Bayser <[email protected]>
Co-authored-by: Max de Bayser <[email protected]>