🍱 Swap tests to tiny granite #264
Conversation
👋 Hi! Thank you for contributing to vLLM support on Spyre.
@joerunde do we know how to fix the failing tests? looks like something with the tokenizer is off?
@yannicks1 yeah I think I have this almost working, just not sure if the cache is correct based on how slow it's going right now
hmmm, maybe the abort test is just too slow now for static batching? 🤔
I wonder if we can unset the HF_HUB_OFFLINE variable everywhere with this model
Trying it out
Works! Not sure if it's a good idea though; will HF try to download a new version on each test run?
probably!
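
For context, a minimal pytest sketch of the idea discussed above: unsetting HF_HUB_OFFLINE for a single test via `monkeypatch`. The fixture and test names here are hypothetical, not from this PR; `HF_HUB_OFFLINE` is the standard Hugging Face Hub offline-mode switch.

```python
import pytest
from transformers import AutoTokenizer

@pytest.fixture
def allow_hub_downloads(monkeypatch):
    # Unset HF_HUB_OFFLINE for this test only; while the variable is set,
    # huggingface_hub refuses all network access.
    monkeypatch.delenv("HF_HUB_OFFLINE", raising=False)

def test_tokenizer_downloads(allow_hub_downloads):
    # With the flag unset, the tokenizer can be fetched from (and cached by) the Hub.
    tok = AutoTokenizer.from_pretrained("ibm-ai-platform/micro-g3.3-8b-instruct-1b")
    assert tok is not None
```

Cached files are reused across runs, but the Hub may still be checked for a newer revision on each call, which is the concern raised above; pinning a `revision=` in `from_pretrained` would avoid silently picking up a new version.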
bot:test
Description
This PR swaps our default test decoder model from llama-160m to the micro granite 3.3 model: https://huggingface.co/ibm-ai-platform/micro-g3.3-8b-instruct-1b
Static batching tests run too slowly on CPU with the granite model though, so we've overridden them to continue using the llama model for CPU tests on GitHub. 🤞 the static batching code and tests will be removed shortly in an upcoming release.
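
A minimal sketch of what this override could look like in a shared test helper. The helper function and the CPU-CI flag are hypothetical, and the llama-160m repo id is an assumption rather than something stated in this PR; only the granite model id comes from the description above.

```python
import os

GRANITE_MICRO = "ibm-ai-platform/micro-g3.3-8b-instruct-1b"  # new default (from this PR)
LLAMA_160M = "JackFram/llama-160m"  # assumed repo id for the previous default

def pick_decoder_model(static_batching_test: bool) -> str:
    """Pick the test decoder model, keeping llama-160m for static batching on CPU CI."""
    on_cpu_ci = os.getenv("CPU_CI") == "1"  # hypothetical flag set by the GitHub CPU runners
    if static_batching_test and on_cpu_ci:
        # Granite is too slow for the static batching tests on CPU runners.
        return LLAMA_160M
    return GRANITE_MICRO
```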
Related Issues