-
Notifications
You must be signed in to change notification settings - Fork 18
Integrate upstream logits processors #290
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
👋 Hi! Thank you for contributing to vLLM support on Spyre.
Or this can be done with
Now you are good to go 🚀 |
The changes introduced by PR vllm-project/vllm#16728 to the sampler architecture were incompatible with our spyre model runner. Initially, as a stopgap solution. I copied the old sampling classes into our vllm_spyre tree just so that we can keep working on the latest changes from main. Now this commit reverts that and makes the same logits processor logic work for the spyre input batch and model runner classes. The difference with the gpu model runner is that in spyre we don't condense the batch but have a boolean mask that is used to calculate "dense" request indices. These indices must be used for the BatchUpdateBuilder because they are the right ones to slice the `logits` tensor that is passed to the Sampler. Signed-off-by: Max de Bayser <[email protected]>
ca0c89b
to
052b28d
Compare
bot:test |
bot:test |
bot:test |
bot:test |
bot:test |
bot:test |
bot:test |
bot:test |
bot:test |
For reference: the test run 129 successfully executed all tests |
At first it wasn't obvious if it would be easy to integrate the changes of PR vllm-project/vllm#16728 so initially I added PR that copies the sampler files previous to that PR in vllm-spyre. But actually it's easier than I thought because the sampler code is not compiled to the AIU, only the model forward is.
Currently in the MinP processor there is a tensor for the cpu and for the device. Since only the model forward runs on the AIU, both tensors end up on the CPU, which means that there is an unnecessary copy from one to the other, but the result is still correct.
There is a future upstream PR that will generalize the Logits processor to other sampling parameters:
vllm-project/vllm#19912