Add block quantization e2e test #1867
Signed-off-by: shanjiaz <[email protected]>
👋 Hi! Thank you for contributing to llm-compressor. Please add the ready label when the PR is ready for review. Note: This is required to complete the testing suite. Please only add the label once the PR is code complete and local testing has been performed.
FYI, this may fail in vLLM, as I think Tyler reverted the PR to support this: vllm-project/vllm#25607
Ah! That makes sense; it was failing for me locally. I was planning on trying to serve the model in vLLM directly.
I was wrong. Should now work with this PR: vllm-project/vllm#25219
@shanjiaz can we get this in soon?
Signed-off-by: shanjiaz <[email protected]>
cool cool cool cool
You don’t need a recipe, as the tiny llama model has a standard decoder structure with no gating layers to ignore. The lm_head is ignored by the test.
Consider reading through the testing functionality: https://github.com/vllm-project/llm-compressor/blob/main/tests/e2e/e2e_utils.py
Signed-off-by: shanjiaz <[email protected]>
👍
SUMMARY:
Added e2e testing for block quantization.

TEST PLAN:
Tested locally with the following command:
```
python -m pytest tests/e2e/vLLM/test_vllm.py -vv -s
```
log:
```
================= vLLM GENERATION =================
PROMPT: The capital of France is
GENERATED TEXT: Paris, which is located in the Île-de-France region. The
PROMPT: The president of the US is
GENERATED TEXT: paying for the protests against him. The White House has reportedly cut
PROMPT: My name is
GENERATED TEXT: [insert name], and I am a [insert job title]. I am excited
PASSED
===================================================================================================================== 1 passed in 130.10s (0:02:10) =====================================================================================================================
```
---------
Signed-off-by: shanjiaz <[email protected]>
Signed-off-by: LeiZhang <[email protected]>
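For context on what the test exercises, here is a rough plain-Python sketch of the general idea behind block quantization: the weight matrix is split into tiles, and each tile gets its own scale. This is not the llm-compressor or vLLM implementation. The function names, the tiny block size, and the int8-style range (`qmax=127`) are all assumptions chosen only for illustration.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def block_quantize(
    weights: Matrix, block_rows: int, block_cols: int, qmax: int = 127
) -> Tuple[List[List[int]], Matrix]:
    """Symmetric quantization with one scale per (block_rows x block_cols) tile.

    Hypothetical helper for illustration only; not a library API.
    """
    rows, cols = len(weights), len(weights[0])
    quantized = [[0] * cols for _ in range(rows)]
    scales: Matrix = []
    for r0 in range(0, rows, block_rows):
        r1 = min(r0 + block_rows, rows)
        scale_row = []
        for c0 in range(0, cols, block_cols):
            c1 = min(c0 + block_cols, cols)
            # The scale maps the largest magnitude in this tile onto qmax.
            absmax = max(
                abs(weights[r][c]) for r in range(r0, r1) for c in range(c0, c1)
            )
            scale = absmax / qmax if absmax > 0 else 1.0
            for r in range(r0, r1):
                for c in range(c0, c1):
                    q = round(weights[r][c] / scale)
                    quantized[r][c] = max(-qmax, min(qmax, q))
            scale_row.append(scale)
        scales.append(scale_row)
    return quantized, scales


def block_dequantize(
    quantized: List[List[int]], scales: Matrix, block_rows: int, block_cols: int
) -> Matrix:
    """Recover approximate weights by multiplying each value by its tile's scale."""
    return [
        [q * scales[r // block_rows][c // block_cols] for c, q in enumerate(row)]
        for r, row in enumerate(quantized)
    ]


if __name__ == "__main__":
    # Two 2x2 tiles with very different magnitudes: each keeps its own scale,
    # so the small values in the left tile are not crushed by the large ones.
    w = [[1.0, -2.0, 100.0, 50.0], [0.5, 0.25, -25.0, 10.0]]
    q, s = block_quantize(w, block_rows=2, block_cols=2)
    print(q)
    print(s)
    print(block_dequantize(q, s, block_rows=2, block_cols=2))
```

Per-tile scales keep the rounding error of each element bounded by half of its own tile's scale, which is the motivation for quantizing in blocks rather than with a single per-tensor scale.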