@shanjiaz shanjiaz commented Sep 25, 2025

SUMMARY:
Added e2e testing for block quantization.

TEST PLAN:
Tested locally with the following command:
```
python -m pytest tests/e2e/vLLM/test_vllm.py -vv -s
```

log:
```
================= vLLM GENERATION =================

PROMPT:
The capital of France is
GENERATED TEXT:
 Paris, which is located in the Île-de-France region. The

PROMPT:
The president of the US is
GENERATED TEXT:
 paying for the protests against him. The White House has reportedly cut

PROMPT:
My name is
GENERATED TEXT:
 [insert name], and I am a [insert job title]. I am excited

PASSED

=============== 1 passed in 130.10s (0:02:10) ===============
```
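For context, block quantization computes a separate scale per fixed-size block of a weight tensor, rather than one scale per tensor or per channel. A minimal pure-Python sketch of the idea, using a symmetric int8 scheme on a flat weight list (block size, function names, and the scheme are illustrative assumptions, not the llm-compressor implementation):

```python
# Illustrative sketch of block-wise symmetric int8 quantization.
# Block size and scheme are assumptions, not llm-compressor's implementation.


def quantize_blockwise(weights, block_size=4, n_bits=8):
    """Quantize a flat list of floats with one scale per block."""
    qmax = 2 ** (n_bits - 1) - 1  # 127 for int8
    q_weights, scales = [], []
    for start in range(0, len(weights), block_size):
        block = weights[start:start + block_size]
        amax = max(abs(w) for w in block)
        scale = amax / qmax if amax > 0 else 1.0
        scales.append(scale)
        # round-to-nearest, clamped to the signed int8 range
        q_weights.extend(
            max(-qmax - 1, min(qmax, round(w / scale))) for w in block
        )
    return q_weights, scales


def dequantize_blockwise(q_weights, scales, block_size=4):
    """Reconstruct approximate floats using each block's scale."""
    return [q * scales[i // block_size] for i, q in enumerate(q_weights)]


if __name__ == "__main__":
    w = [0.1, -0.5, 0.25, 0.05, 2.0, -1.5, 0.75, 0.0]
    q, s = quantize_blockwise(w)
    w_hat = dequantize_blockwise(q, s)
    max_err = max(abs(a - b) for a, b in zip(w, w_hat))
    print(f"blocks: {len(s)}, max reconstruction error: {max_err:.4f}")
```

Because each block gets its own scale, an outlier in one block (like the 2.0 above) does not inflate the quantization error of the other blocks, which is the main advantage over per-tensor scaling.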

👋 Hi! Thank you for contributing to llm-compressor. Please add the ready label when the PR is ready for review.

Note: this is required to complete the testing suite; please only add the label once the PR is code complete and local testing has been performed.

@dsikka dsikka left a comment

FYI - this may fail in vLLM, as I think Tyler reverted the PR to support vllm-project/vllm#25607

@shanjiaz
Collaborator Author

> FYI - this may fail in vLLM, as I think Tyler reverted the PR to support vllm-project/vllm#25607

Ah! That explains why it was failing for me locally. I was planning on trying to serve the model in vLLM directly.

@dsikka dsikka left a comment

I was wrong. Should now work with this PR: vllm-project/vllm#25219

dsikka commented Oct 1, 2025

@shanjiaz can we get this in soon?

@shanjiaz shanjiaz added the ready When a PR is ready for review label Oct 10, 2025
@shanjiaz shanjiaz marked this pull request as ready for review October 10, 2025 18:31
Signed-off-by: shanjiaz <[email protected]>
@shanjiaz shanjiaz requested a review from dsikka October 10, 2025 18:47
@brian-dellabetta brian-dellabetta left a comment

cool cool cool cool

@dsikka dsikka left a comment

You don’t need a recipe, as the TinyLlama model has a standard decoder structure with no gating layers to ignore; the lm_head is ignored by the test.

Consider reading through the testing functionality: https://github.com/vllm-project/llm-compressor/blob/main/tests/e2e/e2e_utils.py

Signed-off-by: shanjiaz <[email protected]>
Signed-off-by: shanjiaz <[email protected]>
@brian-dellabetta brian-dellabetta left a comment

👍

@kylesayrs kylesayrs merged commit b3c345f into main Oct 16, 2025
9 checks passed
@kylesayrs kylesayrs deleted the hz-add-e2e-block branch October 16, 2025 16:28
cajeonrh pushed a commit to cajeonrh/llm-compressor that referenced this pull request Oct 16, 2025
zhanglei1172 pushed a commit to zhanglei1172/llm-compressor that referenced this pull request Oct 17, 2025