🍱 Swap tests to tiny granite #264


Merged
merged 18 commits into from
Jun 27, 2025

Conversation

joerunde
Collaborator

@joerunde joerunde commented Jun 27, 2025

Description

This PR swaps our default test decoder model from llama-160m to the micro granite 3.3 model: https://huggingface.co/ibm-ai-platform/micro-g3.3-8b-instruct-1b

Static batching tests run too slowly on cpu with the granite model though, so we've overridden them to continue using the llama model for cpu tests on github. 🤞 the static batching code + tests will be removed shortly in an upcoming release.
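The per-platform override described above could look something like the sketch below. This is illustrative only, not the actual vllm-spyre test code; the `pick_test_model` helper, the `JackFram/llama-160m` model ID, and the `backend` parameter are assumptions for the example.

```python
# Hypothetical sketch: choose the decoder model for a test run.
# Default to the micro granite 3.3 model, but fall back to the small
# llama model for static batching tests on CPU, where granite is too slow.
GRANITE = "ibm-ai-platform/micro-g3.3-8b-instruct-1b"
LLAMA = "JackFram/llama-160m"  # assumed ID for the llama-160m test model

def pick_test_model(static_batching: bool, backend: str) -> str:
    """Return the model ID to use for a test run (illustrative only)."""
    if static_batching and backend == "cpu":
        return LLAMA
    return GRANITE
```

A fixture along these lines would let the granite swap land everywhere while the slow CPU static-batching path keeps the old model until that code is removed.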

Related Issues


👋 Hi! Thank you for contributing to vLLM support on Spyre.
Just a reminder: make sure your code passes all the linting checks, otherwise your PR cannot be merged. To do so, first install the linting requirements, then run format.sh and commit the changes. This can be done with uv directly:

uv sync --frozen --group lint --active --inexact

Or this can be done with pip:

uv pip compile --group lint > requirements-lint.txt
pip install -r requirements-lint.txt
bash format.sh

Now you are good to go 🚀

joerunde added 2 commits June 26, 2025 18:24
Signed-off-by: Joe Runde <[email protected]>
Signed-off-by: Joe Runde <[email protected]>
@yannicks1
Collaborator

@joerunde do we know how to fix the failing tests? looks like something with the tokenizer is off?

@joerunde
Collaborator Author

@yannicks1 yeah I think I have this almost working, just not sure if the cache is correct based on how slow it's going right now

@joerunde
Collaborator Author

hmmm, maybe the abort test is just too slow now for static batching? 🤔

Signed-off-by: Joe Runde <[email protected]>
Collaborator

I wonder if we can unset the HF_HUB_OFFLINE variable everywhere with this model

Collaborator

Trying it out

Collaborator

Works! Not sure if it's a good idea though, won't HF try to download a new version on each test run?

Collaborator Author

probably!
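For context on the trade-off being discussed: when HF_HUB_OFFLINE is set to a truthy value, huggingface_hub serves models only from the local cache and never hits the network; unsetting it lets the client check the Hub for a newer revision on every run. A minimal sketch of how that variable is interpreted (the `hub_is_offline` helper is illustrative, not vllm-spyre or huggingface_hub code, though the truthy values shown match huggingface_hub's behavior to the best of my knowledge):

```python
# Illustrative helper: does this environment ask huggingface_hub to stay offline?
# huggingface_hub treats "1", "ON", "YES", and "TRUE" (case-insensitive)
# as truthy for environment flags like HF_HUB_OFFLINE.
def hub_is_offline(env: dict) -> bool:
    return env.get("HF_HUB_OFFLINE", "").upper() in {"1", "ON", "YES", "TRUE"}

print(hub_is_offline({"HF_HUB_OFFLINE": "1"}))  # True: cache only, no network
print(hub_is_offline({}))                       # False: Hub is checked for new revisions
```

With the variable unset, each test run may make a revision-check request (and a download if the model was updated), which is the cost the reviewers are weighing here.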

@joerunde
Collaborator Author

bot:test

@joerunde joerunde changed the title ⚗️ try to swap to tiny granite 🍱 Swap tests to tiny granite Jun 27, 2025
@joerunde joerunde merged commit 94cee66 into main Jun 27, 2025
17 checks passed
@joerunde joerunde deleted the swap-to-granite branch June 27, 2025 19:26