Use hugging face as baseline to test CB output #240
Conversation
Signed-off-by: Sophie du Couédic <[email protected]>
👋 Hi! Thank you for contributing to vLLM support on Spyre.
…nerate_cb_spyre_vllm_output Signed-off-by: Sophie du Couédic <[email protected]>
Force-pushed from 7390fb0 to fa7234e
Signed-off-by: Sophie du Couédic <[email protected]>
Force-pushed from fa7234e to 4a86784
I think we can leave it in the list; it sounds like we should be fine to test llama models on CPU going forward. We can update the test configs on the Spyre hardware to swap to granite models.
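Swapping the model list per backend, as described above, could be sketched like this (the env var name `VLLM_SPYRE_TEST_BACKEND` and the model ids are hypothetical placeholders, not this repo's actual test config):

```python
import os
from typing import List

# Hypothetical model lists -- real ids would come from the test configs.
CPU_MODELS: List[str] = ["llama-placeholder"]      # llama model tested on CPU
SPYRE_MODELS: List[str] = ["granite-placeholder"]  # granite model for Spyre HW

def models_under_test() -> List[str]:
    """Pick the model list based on a (hypothetical) backend env var."""
    backend = os.getenv("VLLM_SPYRE_TEST_BACKEND", "cpu")
    return SPYRE_MODELS if backend == "spyre" else CPU_MODELS
```

A test suite could then parametrize over `models_under_test()` so the same tests run with llama on CPU and granite on the Spyre hardware.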
I'm going to request changes just so that we can pause further changes for CB. Right now there's a lot of uncertainty surrounding CB, and we don't want to make any changes until we have more clarity.
…#245)

Temporary hack until the parameter makes it to a new release version. Needs to be merged first for the tests on the other PRs to pass. (PS: this was actually the error after fixing the merge conflict in PR #240, which had nothing to do with the conflict.)

Signed-off-by: Sophie du Couédic <[email protected]>
Co-authored-by: Yannick Schnider <[email protected]>
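A "temporary hack" for a parameter that only exists in newer upstream releases is commonly a signature check before passing the new argument; a minimal sketch of that pattern (the function and parameter names here are made up, not the actual vLLM API):

```python
import inspect

def supports_kwarg(fn, name: str) -> bool:
    """True if the callable accepts the given keyword argument."""
    return name in inspect.signature(fn).parameters

# Made-up stand-in for an upstream function that gained a new parameter.
def upstream_fn(prompt, new_param=None):
    return (prompt, new_param)

# Only pass the new argument when the installed version accepts it.
kwargs = {"new_param": 1} if supports_kwarg(upstream_fn, "new_param") else {}
result = upstream_fn("hi", **kwargs)
```

Once the parameter lands in a released version, the guard can be deleted and the argument passed unconditionally.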
@prashantgupta24 hmm, since the code change has already been done and doesn't fundamentally change the logic of the tests (just a bit of refactoring and better output comparison), I would prefer to still merge this branch instead of keeping it aside for a long time.
The changes that you have here are amazing! But there's still some uncertainty around the output that continuous batching is generating on the actual Spyre hardware. Probably worth spending a bit of time understanding why, and deciding whether we want to push a "working" implementation that doesn't produce good results in the meantime (which could mean we only verify that there is some output). In that case, we will not be able to merge this PR as is, because the outputs will never match.
@prashantgupta24 ah, I think there is some confusion: the tests were not passing because of some breaking changes in vLLM upstream (a new parameter was introduced again), which is fixed in PR #245. I just synced with main and all the tests are passing now (at least on CPU).
I am talking about the tests failing on the actual Spyre hardware.
Ah ok, I see. Fundamentally it doesn't make much difference whether we compare with HF output or with a hardcoded ground truth. But yes, let's freeze the code if that can simplify debugging for Spyre 👍
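Either way, the comparison step looks the same; a minimal sketch of checking vLLM texts against an HF baseline (the helper name is illustrative, not the PR's actual test code):

```python
from typing import List, Tuple

def compare_generations(
    vllm_texts: List[str],
    hf_texts: List[str],
) -> List[Tuple[int, str, str]]:
    """Return (prompt index, vllm text, hf text) for every mismatch."""
    assert len(vllm_texts) == len(hf_texts), "expected one output per prompt"
    return [
        (i, v, h)
        for i, (v, h) in enumerate(zip(vllm_texts, hf_texts))
        if v != h
    ]
```

With an HF baseline, `hf_texts` comes from running the same prompts through `transformers` greedy decoding; with a hardcoded ground truth, it is a checked-in list. The mismatch report is identical in both cases.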
LGTM
LGTM! We have a plan on how to handle testing with spyre
Closes issue #168
@joerunde About the llama issue: maybe we can wait until an e2e image gets published with CB enabled before removing it from the list of models for CB? It won't be used in the CI/CD in the meantime anyway, and this allows us to test it on CPU for now. What do you think?