Skip to content

Integrate upstream logits processors #290

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Jul 14, 2025
Merged

Integrate upstream logits processors #290

merged 4 commits into from
Jul 14, 2025

Conversation

maxdebayser
Copy link
Collaborator

@maxdebayser maxdebayser commented Jul 8, 2025

At first it wasn't obvious if it would be easy to integrate the changes of PR vllm-project/vllm#16728 so initially I added PR that copies the sampler files previous to that PR in vllm-spyre. But actually it's easier than I thought because the sampler code is not compiled to the AIU, only the model forward is.

Currently in the MinP processor there is a tensor for the cpu and for the device. Since only the model forward runs on the AIU, both tensors end up on the CPU, which means that there is an unnecessary copy from one to the other, but the result is still correct.

There is a future upstream PR that will generalize the Logits processor to other sampling parameters:

vllm-project/vllm#19912

Copy link

github-actions bot commented Jul 8, 2025

👋 Hi! Thank you for contributing to vLLM support on Spyre.
Just a reminder: Make sure that your code passes all the linting checks, otherwise your PR won't be able to be merged. To do so, first install the linting requirements, then run format.sh and commit the changes. This can be done with uv directly:

uv sync --frozen --group lint --active --inexact

Or this can be done with pip:

uv pip compile --group lint > requirements-lint.txt
pip install -r requirements-lint.txt
bash format.sh

Now you are good to go 🚀

The changes introduced by PR
vllm-project/vllm#16728
to the sampler architecture were incompatible with
our spyre model runner.

Initially, as a stopgap solution. I copied the old sampling
classes into our vllm_spyre tree just so that we can keep
working on the latest changes from main.

Now this commit reverts that and makes the same logits
processor logic work for the spyre input batch and model
runner classes.  The difference with the gpu model runner is
that in spyre we don't condense the batch but have a boolean
mask that is used to calculate "dense" request indices. These
indices must be used for the BatchUpdateBuilder because they
are the right ones to slice the `logits` tensor that is passed
to the Sampler.

Signed-off-by: Max de Bayser <[email protected]>
@joerunde
Copy link
Collaborator

joerunde commented Jul 9, 2025

bot:test

@prashantgupta24
Copy link
Collaborator

bot:test
MARKERS="cb and spyre"

@maxdebayser
Copy link
Collaborator Author

@joerunde, I've opened an issue for the follow-up PR that will add tests: #295

@prashantgupta24
Copy link
Collaborator

bot:test
MARKERS="cb and spyre"

@prashantgupta24
Copy link
Collaborator

bot:test

@prashantgupta24
Copy link
Collaborator

bot:test
MARKERS="cb and spyre"

@prashantgupta24
Copy link
Collaborator

bot:test

@joerunde
Copy link
Collaborator

joerunde commented Jul 9, 2025

bot:test

@prashantgupta24
Copy link
Collaborator

bot:test
MARKERS="cb and spyre"

@maxdebayser
Copy link
Collaborator Author

bot:test

@joerunde joerunde merged commit 5975e98 into main Jul 14, 2025
14 of 18 checks passed
@joerunde joerunde deleted the logits_processors branch July 14, 2025 20:05
@maxdebayser
Copy link
Collaborator Author

maxdebayser commented Jul 14, 2025

For reference: the test run 129 successfully executed all tests

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants