Use hugging face as baseline to test CB output #240
Conversation
Signed-off-by: Sophie du Couédic <[email protected]>
👋 Hi! Thank you for contributing to vLLM support on Spyre.
…nerate_cb_spyre_vllm_output Signed-off-by: Sophie du Couédic <[email protected]>
Force-pushed from 7390fb0 to fa7234e
Signed-off-by: Sophie du Couédic <[email protected]>
Force-pushed from fa7234e to 4a86784
I think we can leave it in the list; it sounds like we should be fine to test llama models on CPU going forward. We can update the test configs on the Spyre hardware to swap to granite models.
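Swapping the model list per backend, as described above, could be sketched like this (the env var name `VLLM_SPYRE_TEST_BACKEND` and the model ids are hypothetical placeholders, not this repo's actual test config):

```python
import os
from typing import List

# Hypothetical model lists -- real ids would come from the test configs.
CPU_MODELS: List[str] = ["llama-placeholder"]      # llama model tested on CPU
SPYRE_MODELS: List[str] = ["granite-placeholder"]  # granite model for Spyre HW

def models_under_test() -> List[str]:
    """Pick the model list based on a (hypothetical) backend env var."""
    backend = os.getenv("VLLM_SPYRE_TEST_BACKEND", "cpu")
    return SPYRE_MODELS if backend == "spyre" else CPU_MODELS
```

A test suite could then parametrize over `models_under_test()` so the same tests run with llama on CPU and granite on the Spyre hardware.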
I'm going to request changes just so that we can pause further changes for CB. Right now there's a lot of uncertainty surrounding CB, and we don't want to make any changes until we have more clarity.
…#245)

Temporary hack until the parameter makes it to a new release version. Needs to be merged first for the tests on the other PRs to pass. (PS: this was actually the error after fixing the merge conflict in PR #240, which had nothing to do with the conflict.)

Signed-off-by: Sophie du Couédic <[email protected]>
Co-authored-by: Yannick Schnider <[email protected]>
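A "temporary hack" for a parameter that only exists in newer upstream releases is commonly a signature check before passing the new argument; a minimal sketch of that pattern (the function and parameter names here are made up, not the actual vLLM API):

```python
import inspect

def supports_kwarg(fn, name: str) -> bool:
    """True if the callable accepts the given keyword argument."""
    return name in inspect.signature(fn).parameters

# Made-up stand-in for an upstream function that gained a new parameter.
def upstream_fn(prompt, new_param=None):
    return (prompt, new_param)

# Only pass the new argument when the installed version accepts it.
kwargs = {"new_param": 1} if supports_kwarg(upstream_fn, "new_param") else {}
result = upstream_fn("hi", **kwargs)
```

Once the parameter lands in a released version, the guard can be deleted and the argument passed unconditionally.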
@prashantgupta24 hmm, since the code change has already been done and doesn't fundamentally change the logic of the tests (just a bit of refactoring and better output comparison), I would prefer to still merge this branch instead of keeping it aside for a long time.
The changes that you have here are amazing! But there's still some uncertainty around the output that continuous batching is generating on the actual Spyre hardware. Probably worth spending a bit of time understanding why, and deciding whether we want to push a "working" implementation that doesn't produce good results in the meantime (which could mean we only verify that there is some output). In that case, we will not be able to merge this PR as is, because the outputs will never match.
@prashantgupta24 ah, I think there is some confusion: the tests were not passing because of some breaking changes in vLLM upstream (a new parameter was introduced again), which is fixed in PR #245. I just synced with main and all the tests are passing now (at least on CPU).
I am talking about the tests failing on the actual Spyre hardware.
Ah ok, I see. Fundamentally it doesn't make much difference whether we compare with HF output or with a hardcoded ground truth. But yes, let's freeze the code if that can simplify debugging for Spyre 👍
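Either way, the comparison step looks the same; a minimal sketch of checking vLLM texts against an HF baseline (the helper name is illustrative, not the PR's actual test code):

```python
from typing import List, Tuple

def compare_generations(
    vllm_texts: List[str],
    hf_texts: List[str],
) -> List[Tuple[int, str, str]]:
    """Return (prompt index, vllm text, hf text) for every mismatch."""
    assert len(vllm_texts) == len(hf_texts), "expected one output per prompt"
    return [
        (i, v, h)
        for i, (v, h) in enumerate(zip(vllm_texts, hf_texts))
        if v != h
    ]
```

With an HF baseline, `hf_texts` comes from running the same prompts through `transformers` greedy decoding; with a hardcoded ground truth, it is a checked-in list. The mismatch report is identical in both cases.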
LGTM
LGTM! We have a plan on how to handle testing with spyre
Closes issue #168
@joerunde About the llama issue: maybe we can wait until an e2e image gets published with CB enabled before removing it from the list of models for CB? It won't be used in the CI/CD in the meantime anyway, and this allows us to test it on CPU for now. What do you think?