Use hugging face as baseline to test CB output #240

Merged
merged 5 commits into from
Jun 23, 2025

Conversation

sducouedic
Collaborator

@sducouedic sducouedic commented Jun 18, 2025

Closes issue #168

@joerunde About the llama issue: maybe we can wait until an e2e image gets published with CB enabled before removing it from the list of models for CB? It won't be used in CI/CD in the meantime anyway, and this lets us test it on CPU for now. What do you think?

Signed-off-by: Sophie du Couédic <[email protected]>
@sducouedic sducouedic marked this pull request as ready for review June 18, 2025 08:09

👋 Hi! Thank you for contributing to vLLM support on Spyre.
Just a reminder: Make sure that your code passes all the linting checks, otherwise your PR won't be able to be merged. To do so, first install the linting requirements, then run format.sh and commit the changes. This can be done with uv directly:

uv sync --frozen --group lint --active --inexact

Or this can be done with pip:

uv pip compile --group lint > requirements-lint.txt
pip install -r requirements-lint.txt
bash format.sh

Now you are good to go 🚀

@sducouedic sducouedic marked this pull request as draft June 18, 2025 08:10
…nerate_cb_spyre_vllm_output

Signed-off-by: Sophie du Couédic <[email protected]>
@sducouedic sducouedic marked this pull request as ready for review June 18, 2025 14:36
Signed-off-by: Sophie du Couédic <[email protected]>
@sducouedic sducouedic force-pushed the test_cb_using_hf_ref branch from 7390fb0 to fa7234e Compare June 19, 2025 13:22
Signed-off-by: Sophie du Couédic <[email protected]>
@joerunde
Collaborator

About the llama issue: maybe we can wait until an e2e image gets published with CB enabled before removing it from the list of models for CB? It won't be used in CI/CD in the meantime anyway, and this lets us test it on CPU for now. What do you think?

I think we can leave it in the list; it sounds like we should be fine to test llama models on CPU going forward. We can update the test configs on the Spyre hardware to swap to granite models.

Collaborator

@prashantgupta24 prashantgupta24 left a comment

I'm going to request changes just so that we can pause further changes to CB. Right now there's a ton of uncertainty surrounding CB, and we don't want to make any changes until we have more clarity.

yannicks1 added a commit that referenced this pull request Jun 20, 2025
…#245)

Temporary hack until the parameter makes it to a new release version.
Needs to be merged first for the tests on the other PRs to pass.

(PS: this was actually the error after fixing the merge conflict in PR
#240, which had nothing to do with the conflict)

---------

Signed-off-by: Sophie du Couédic <[email protected]>
Co-authored-by: Yannick Schnider <[email protected]>
@sducouedic
Collaborator Author

I'm going to request changes just so that we can pause further changes to CB. Right now there's a ton of uncertainty surrounding CB, and we don't want to make any changes until we have more clarity.

@prashantgupta24 Hmm, since the code change has already been done and doesn't fundamentally change the logic of the tests (just a bit of refactoring and better output comparison), I would still prefer to merge this branch instead of keeping it aside for a long time.

@prashantgupta24
Collaborator

prashantgupta24 commented Jun 20, 2025

The changes that you have here are amazing! But there's still some uncertainty around the output that continuous batching is generating on the actual Spyre hardware. Ideally, we want to get to the point where this PR works as is, but clearly both llama and granite models are currently not generating output on Spyre that matches what HF produces.

It's probably worth spending a bit of time understanding why, and seeing whether we want to push a "working" implementation that doesn't produce good results in the meantime (for example, one where the tests only make sure that there is some output). In that case, we will not be able to merge this PR as is, because the outputs will never match HF. That's the only reason I recommended a pause for a little while, until the investigation gives us more insight.
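
To make the two options concrete, here is a minimal sketch of a strict check (outputs must equal the reference) next to a relaxed check (there only has to be some output). The function names are hypothetical and are not taken from this PR or from the repository's test utilities; the sketch assumes outputs are available as plain strings.

```python
from typing import List


def matches_reference(generated: List[str], reference: List[str]) -> bool:
    """Strict check: every generated text equals its reference text."""
    return len(generated) == len(reference) and all(
        g == r for g, r in zip(generated, reference)
    )


def has_some_output(generated: List[str]) -> bool:
    """Relaxed check: at least one request, and none produced empty text."""
    return bool(generated) and all(text.strip() != "" for text in generated)
```

With the relaxed check, a run whose text differs from the reference still passes as long as every request produced something, which is the trade-off discussed above.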

@sducouedic
Collaborator Author

@prashantgupta24 Ah, I think there is some confusion: the tests were not passing because of a breaking change in vLLM upstream (a new parameter introduced again), fixed in PR #245. I just synced with main and all the tests are passing now (at least on CPU).

@prashantgupta24
Collaborator

@prashantgupta24 Ah, I think there is some confusion: the tests were not passing because of a breaking change in vLLM upstream (a new parameter introduced again), fixed in PR #245. I just synced with main and all the tests are passing now (at least on CPU).

I am talking about the tests failing on the actual spyre hardware for CB with the latest changes.

@sducouedic
Collaborator Author

Ah OK, I see. Fundamentally, it makes little difference whether we compare against HF output or a hardcoded ground truth. But yes, let's freeze the code if that simplifies debugging on Spyre 👍
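
A rough sketch of why the reference source is interchangeable: if the comparison only depends on a function that maps prompts to expected texts, a hardcoded table and an HF-generated baseline can be swapped without touching the comparison itself. All names below (`ReferenceFn`, `make_hardcoded_reference`, `find_mismatches`) are hypothetical and do not come from this PR; an HF-backed reference would be another `ReferenceFn` and is left out here because it needs a model download.

```python
from typing import Callable, Dict, List, Tuple

# Any callable mapping prompts to expected texts can serve as the reference.
ReferenceFn = Callable[[List[str]], List[str]]


def make_hardcoded_reference(table: Dict[str, str]) -> ReferenceFn:
    """Build a reference that looks up expected texts in a fixed table."""
    def reference(prompts: List[str]) -> List[str]:
        return [table[p] for p in prompts]
    return reference


def find_mismatches(
    prompts: List[str], generated: List[str], reference_fn: ReferenceFn
) -> List[Tuple[int, str, str]]:
    """Return (index, generated, expected) for every request that differs."""
    expected = reference_fn(prompts)
    if len(expected) != len(generated):
        raise ValueError("generated and expected outputs differ in length")
    return [
        (i, g, e)
        for i, (g, e) in enumerate(zip(generated, expected))
        if g != e
    ]
```

Returning the mismatching entries rather than a single boolean keeps the failing request index visible, which may help when outputs diverge on only some prompts.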

Collaborator

@yannicks1 yannicks1 left a comment

LGTM

@prashantgupta24 prashantgupta24 enabled auto-merge (squash) June 23, 2025 19:48
@github-actions github-actions bot added the ready label Jun 23, 2025
Collaborator

@prashantgupta24 prashantgupta24 left a comment

LGTM! We have a plan for how to handle testing on Spyre.

@prashantgupta24 prashantgupta24 merged commit 632a4aa into main Jun 23, 2025
18 checks passed
@prashantgupta24 prashantgupta24 deleted the test_cb_using_hf_ref branch June 23, 2025 19:49