@shanjiaz shanjiaz commented Sep 25, 2025

SUMMARY:
Added e2e testing for block quantization.

TEST PLAN:
Tested locally with the following command:
```
python -m pytest tests/e2e/vLLM/test_vllm.py -vv -s
```

log:
```
================= vLLM GENERATION =================

PROMPT:
The capital of France is
GENERATED TEXT:
 Paris, which is located in the Île-de-France region. The

PROMPT:
The president of the US is
GENERATED TEXT:
 paying for the protests against him. The White House has reportedly cut

PROMPT:
My name is
GENERATED TEXT:
 [insert name], and I am a [insert job title]. I am excited

PASSED

=============== 1 passed in 130.10s (0:02:10) ===============
```
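For context, block quantization computes a separate scale per fixed-size block of a weight tensor, rather than one scale per tensor or per channel. A minimal pure-Python sketch of the idea, using a symmetric int8 scheme on a flat weight list (block size, function names, and the scheme are illustrative assumptions, not the llm-compressor implementation):

```python
# Illustrative sketch of block-wise symmetric int8 quantization.
# Block size and scheme are assumptions, not llm-compressor's implementation.


def quantize_blockwise(weights, block_size=4, n_bits=8):
    """Quantize a flat list of floats with one scale per block."""
    qmax = 2 ** (n_bits - 1) - 1  # 127 for int8
    q_weights, scales = [], []
    for start in range(0, len(weights), block_size):
        block = weights[start:start + block_size]
        amax = max(abs(w) for w in block)
        scale = amax / qmax if amax > 0 else 1.0
        scales.append(scale)
        # round-to-nearest, clamped to the signed int8 range
        q_weights.extend(
            max(-qmax - 1, min(qmax, round(w / scale))) for w in block
        )
    return q_weights, scales


def dequantize_blockwise(q_weights, scales, block_size=4):
    """Reconstruct approximate floats using each block's scale."""
    return [q * scales[i // block_size] for i, q in enumerate(q_weights)]


if __name__ == "__main__":
    w = [0.1, -0.5, 0.25, 0.05, 2.0, -1.5, 0.75, 0.0]
    q, s = quantize_blockwise(w)
    w_hat = dequantize_blockwise(q, s)
    max_err = max(abs(a - b) for a, b in zip(w, w_hat))
    print(f"blocks: {len(s)}, max reconstruction error: {max_err:.4f}")
```

Because each block gets its own scale, an outlier in one block (like the 2.0 above) does not inflate the quantization error of the other blocks, which is the main advantage over per-tensor scaling.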

👋 Hi! Thank you for contributing to llm-compressor. Please add the ready label when the PR is ready for review.

Note: this is required to complete the testing suite; please only add the label once the PR is code complete and local testing has been performed.

@dsikka dsikka left a comment

FYI - this may fail in vLLM, as I think Tyler reverted the PR to support vllm-project/vllm#25607

@shanjiaz
Collaborator Author

> FYI - this may fail in vLLM, as I think Tyler reverted the PR to support vllm-project/vllm#25607

Ah! That explains why it was failing for me locally. I was planning on trying to serve the model in vLLM directly.

@dsikka dsikka left a comment

I was wrong. Should now work with this PR: vllm-project/vllm#25219

dsikka commented Oct 1, 2025

@shanjiaz can we get this in soon?

@shanjiaz shanjiaz added the ready When a PR is ready for review label Oct 10, 2025
@shanjiaz shanjiaz marked this pull request as ready for review October 10, 2025 18:31
Signed-off-by: shanjiaz <[email protected]>
@shanjiaz shanjiaz requested a review from dsikka October 10, 2025 18:47
@brian-dellabetta brian-dellabetta left a comment

cool cool cool cool

@dsikka dsikka left a comment

You don’t need a recipe, as the TinyLlama model has a standard decoder structure with no gating layers to ignore; the lm_head is ignored by the test.

Consider reading through the testing functionality: https://github.com/vllm-project/llm-compressor/blob/main/tests/e2e/e2e_utils.py

Signed-off-by: shanjiaz <[email protected]>
Signed-off-by: shanjiaz <[email protected]>
@brian-dellabetta brian-dellabetta left a comment

👍

@kylesayrs kylesayrs merged commit b3c345f into main Oct 16, 2025
9 checks passed
@kylesayrs kylesayrs deleted the hz-add-e2e-block branch October 16, 2025 16:28
cajeonrh pushed a commit to cajeonrh/llm-compressor that referenced this pull request Oct 16, 2025
zhanglei1172 pushed a commit to zhanglei1172/llm-compressor that referenced this pull request Oct 17, 2025