[V1] APC + prompt logprobs unsupported (PR 2/N for v1 sample and prompt logprobs support) #11910

Status: Open

wants to merge 345 commits into base: main
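The user-visible behavior this PR pins down: with automatic prefix caching (APC) enabled on the V1 engine, a request that sets prompt_logprobs is refused with a ValueError instead of silently returning incomplete logprobs. Below is a hedged end-user sketch of hitting that refusal; the engine args and call pattern mirror the new test added in this PR's diff, while the request ID and prompt are purely illustrative.

import asyncio
import os

from vllm import SamplingParams
from vllm.engine.arg_utils import AsyncEngineArgs
from vllm.v1.engine.async_llm import AsyncLLM

# The new test monkeypatches this env var; set it here for the same effect.
os.environ["VLLM_USE_V1"] = "1"


async def main() -> None:
    # APC enabled, as in the test's engine args.
    engine = AsyncLLM.from_engine_args(
        AsyncEngineArgs(model="facebook/opt-125m",
                        enable_prefix_caching=True))
    try:
        async for _ in engine.generate(
                request_id="demo-0",  # illustrative request ID
                prompt="Hello my name is Robert and",
                sampling_params=SamplingParams(max_tokens=10,
                                               temperature=0,
                                               prompt_logprobs=5)):
            pass  # never reached: the request is refused up front
    except ValueError as err:
        print(err)  # the STR_ASYNC_LLM_PROMPT_LP_APC_UNSUPPORTED message
    finally:
        engine.shutdown()


asyncio.run(main())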
Commits (345, changes shown from all commits)
4e53851
[Bugfix][Mamba] Fix Multistep on Mamba-like models (#10705)
mzusman Nov 27, 2024
8239c6f
[Bugfix] Ignore `lm_head` when loading embedding models (#10719)
DarkLight1337 Nov 27, 2024
5a3a0eb
[Frontend] don't block event loop in tokenization (preprocess) in Ope…
tomeras91 Nov 27, 2024
b22e27c
[misc] upgrade filelock version (#10731)
youkaichao Nov 28, 2024
b5864e2
[Model] support bitsandbytes quantization with minicpm3 model (#10682)
zixuanzhang226 Nov 28, 2024
b9cabc9
[Doc] Update model in arch_overview.rst to match comment (#10701)
spacewander Nov 28, 2024
d61d661
[Bug][CLI] Allow users to disable prefix caching explicitly (#10724)
rickyyx Nov 28, 2024
39f4494
[V1] Do not allocate beyond the max_model_len (#10730)
WoosukKwon Nov 28, 2024
dcdf2f3
[Kernel] Update vllm-flash-attn version (#10736)
WoosukKwon Nov 28, 2024
ea6ed6b
[TPU] Update requirements-tpu (#10726)
richardsliu Nov 28, 2024
ac0b495
[Model] Added GLM-4 series hf format model support vllm==0.6.4 (#10561)
sixsixcoder Nov 28, 2024
1362dac
[Kernel] Update vllm-flash-attn version to reduce CPU overheads (#10742)
WoosukKwon Nov 28, 2024
bc6637c
[V1] Optimize the CPU overheads in FlashAttention custom op (#10733)
WoosukKwon Nov 28, 2024
3733796
[Model] Add Internlm2 LoRA support (#5064)
Isotr0py Nov 28, 2024
170a30c
[Model] Clean up MiniCPMV (#10751)
DarkLight1337 Nov 29, 2024
8d83244
[Misc] typo find in sampling_metadata.py (#10740)
noooop Nov 29, 2024
d8499c0
[Bugfix] Fix Idefics3 bug (#10778)
jeejeelee Nov 29, 2024
3c8ced2
[platform] Add verify_quantization in platform. (#10757)
wangxiyuan Nov 29, 2024
5146352
[Bugfix] Fix OpenVino/Neuron `driver_worker` init (#10779)
NickLucche Nov 30, 2024
d95da87
[Model] Refactor Molmo weights loading to use AutoWeightsLoader (#10771)
Isotr0py Nov 30, 2024
7831672
[Interleaved ATTN] Support for Mistral-8B (#10591)
patrickvonplaten Nov 30, 2024
a877540
[doc] format fix (#10789)
wangxiyuan Nov 30, 2024
cbf1489
[Model] Replace embedding models with pooling adapter (#10769)
DarkLight1337 Dec 1, 2024
db1ca39
[Misc] Improve type annotations for `support_torch_compile` (#10763)
DarkLight1337 Dec 1, 2024
d198e8f
[Misc] Rename embedding classes to pooling (#10801)
DarkLight1337 Dec 1, 2024
cf04e11
[doc] add warning about comparing hf and vllm outputs (#10805)
youkaichao Dec 1, 2024
b58062b
[Misc] Adding `MMMU-Pro` vision dataset to serving benchmark (#10804)
ywang96 Dec 1, 2024
bcdb5b8
removed fast tests from pipeline
afeldman-nm Dec 2, 2024
88f7f57
[Core] Implement disagg prefill by StatelessProcessGroup (#10502)
KuntaiDu Dec 2, 2024
02eb179
[Model] Add BNB support to Llava and Pixtral-HF (#10795)
Isotr0py Dec 2, 2024
8d5035d
[core] Avoid metrics log noise when idle - include speculative decodi…
cduk Dec 2, 2024
ab21a28
[Kernel] Use `out` arg in flash_attn_varlen_func (#10811)
WoosukKwon Dec 2, 2024
6643bf2
Fill TorchSDPAAttentionMetadata seq_lens_field for prefill (#10799)
maxdebayser Dec 2, 2024
9464931
[misc] remove xverse modeling file (#10814)
youkaichao Dec 2, 2024
777bb76
[doc]Update config docstring (#10732)
wangxiyuan Dec 2, 2024
221ee79
[Model]: add some tests for aria model (#10770)
xffxff Dec 2, 2024
39cd324
Update vllm/outputs.py
afeldman-nm Dec 2, 2024
5757476
small fixes
afeldman-nm Dec 2, 2024
3d1373c
moved output processing commands into processor
afeldman-nm Dec 2, 2024
05f39a9
[CI/Build] Update `mistral_common` version for tests and docs (#10825)
DarkLight1337 Dec 2, 2024
74274c2
added explanatory comment to EngineCore.update_from_output()
afeldman-nm Dec 2, 2024
c9a7b3f
[misc] use out argument for flash attention (#10822)
youkaichao Dec 2, 2024
7ea421d
Merge branch 'afeldman-nm/v1_logprobs' of https://github.com/neuralma…
afeldman-nm Dec 2, 2024
f22facd
constructing dummy logprobs
afeldman-nm Dec 2, 2024
b16dd79
dummy logprobs with decodes
afeldman-nm Dec 2, 2024
0054ece
passing some detokenizer tests
afeldman-nm Dec 2, 2024
59853d5
fixing error during debug
afeldman-nm Dec 2, 2024
193e60c
existing detokenizer test checks are unbroken; need to add logprobs c…
afeldman-nm Dec 2, 2024
a078f89
Merge branch 'main' into v1_logprobs_merge
afeldman-nm Dec 2, 2024
15f9825
merge
afeldman-nm Dec 3, 2024
26b165e
Merge branch 'main' into v1_logprobs_merge
afeldman-nm Dec 4, 2024
30ea722
Merge branch 'main' into v1_logprobs_merge
afeldman-nm Dec 4, 2024
4fefd62
Merge branch 'main' into v1_logprobs_merge
afeldman-nm Dec 4, 2024
603f2b5
model runner returns logprobs as np arrays
afeldman-nm Dec 4, 2024
ac602d8
new request types
afeldman-nm Dec 4, 2024
2a9ef8c
first pass at only using numpy in engine core
afeldman-nm Dec 4, 2024
2fe9147
tested removal of pythonization from engine core
afeldman-nm Dec 4, 2024
1283010
Merge branch 'main' into v1_logprobs_merge
afeldman-nm Dec 4, 2024
fee1e8e
Merge branch 'v1_logprobs' into move_pyth
afeldman-nm Dec 4, 2024
a46a8e5
wip detokenizer updates
afeldman-nm Dec 4, 2024
0c04576
wip
afeldman-nm Dec 5, 2024
0f04d6e
wip
afeldman-nm Dec 5, 2024
c6831ca
first pass at pythonization moved out of engine
afeldman-nm Dec 5, 2024
238bc46
Merge branch 'main' into v1_logprobs_merge
afeldman-nm Dec 5, 2024
86b18aa
Merge branch 'v1_logprobs' into move_pyth
afeldman-nm Dec 5, 2024
ae7e10c
incremental/non-incremental detokenized text comparison
afeldman-nm Dec 5, 2024
3cffca3
implemented the sample logprobs N+1 scenario in the front end
afeldman-nm Dec 5, 2024
73e4c12
fixed prompt logprob count bug
afeldman-nm Dec 5, 2024
5b49d36
passing one test!
afeldman-nm Dec 5, 2024
66fe6bc
Merge branch 'main' into v1_logprobs
afeldman-nm Dec 5, 2024
0cf2c79
successfully failing cumulative logprobs test
afeldman-nm Dec 5, 2024
49e0b33
cumulative logprob works
afeldman-nm Dec 5, 2024
6558b37
Merge branch 'main' into v1_logprobs_merge
afeldman-nm Dec 6, 2024
5d36dcc
Merge branch 'v1_logprobs_merge' into v1_logprobs
afeldman-nm Dec 6, 2024
e8bd247
wip
afeldman-nm Dec 6, 2024
9f39817
progress toward detok stop token test
afeldman-nm Dec 7, 2024
867bb71
Merge branch 'main' into v1_logprobs_merge
afeldman-nm Dec 7, 2024
58bcc5a
detokenizer stop tokens test passing; some slight engine fixes for th…
afeldman-nm Dec 7, 2024
696401e
Merge branch 'main' into v1_logprobs_merge
afeldman-nm Dec 7, 2024
d8361d3
Merge branch 'main' into v1_logprobs
afeldman-nm Dec 7, 2024
85e58c9
Merge branch 'v1_logprobs_merge' into v1_logprobs
afeldman-nm Dec 7, 2024
6320868
refactored detokenizer
afeldman-nm Dec 7, 2024
54abd99
wip
afeldman-nm Dec 7, 2024
7852bb2
incremental detokenization test now also checks logprobs
afeldman-nm Dec 7, 2024
8d82049
Merge branch 'main' into v1_logprobs_merge
afeldman-nm Dec 7, 2024
f6d4329
woosuk code structure suggestion
afeldman-nm Dec 7, 2024
aa15b75
Merge branch 'main' into v1_logprobs_merge
afeldman-nm Dec 7, 2024
a4eb6bc
detokenizer tests refactor
afeldman-nm Dec 7, 2024
06185d0
refactor
afeldman-nm Dec 7, 2024
90ed53d
refactoring
afeldman-nm Dec 7, 2024
48f4671
refactor
afeldman-nm Dec 7, 2024
7121739
refactoring to make logprobs var names clearer, touched a lot of file…
afeldman-nm Dec 7, 2024
cef5ddb
Merge branch 'main' into v1_logprobs
afeldman-nm Dec 7, 2024
bed24db
Merge branch 'main' into v1_logprobs_merge
afeldman-nm Dec 7, 2024
5ce8128
move
afeldman-nm Dec 7, 2024
14c7e56
merge
afeldman-nm Dec 9, 2024
bdd0abf
removed VLLM_USE_V1 checks
afeldman-nm Dec 9, 2024
1fc981e
revert logprobs name changes
afeldman-nm Dec 9, 2024
dc63ac1
removing some unnecessary changes'
afeldman-nm Dec 9, 2024
4f30408
removed fast checks
afeldman-nm Dec 9, 2024
d8e9885
Merge branch 'main' into v1_logprobs_merge
afeldman-nm Dec 10, 2024
77488cb
wip test_completion
afeldman-nm Dec 12, 2024
f1a689c
toward completion tests
afeldman-nm Dec 12, 2024
e962aa7
serialization fix
afeldman-nm Dec 12, 2024
05f982f
tried merge, not quite working
afeldman-nm Dec 16, 2024
b22c5e7
formatted vllm/v1/engine/core.py
afeldman-nm Dec 16, 2024
5bc7039
wip merge
afeldman-nm Dec 16, 2024
4d53751
formatting
afeldman-nm Dec 16, 2024
ba3967f
Merge branch 'main' into v1_logprobs_merge
afeldman-nm Dec 16, 2024
fc8340d
woops, didn't pull in latest v1_logprobs changes
afeldman-nm Dec 17, 2024
e084ad0
merge
afeldman-nm Dec 17, 2024
697fc15
cleanup
afeldman-nm Dec 17, 2024
f61d822
remove calling max_logprobs from engine
afeldman-nm Dec 17, 2024
b77c1af
remove change in hpu
afeldman-nm Dec 17, 2024
20b8af1
Merge branch 'main' into v1_logprobs
afeldman-nm Dec 17, 2024
3193659
merge
afeldman-nm Dec 18, 2024
f0c1ba7
deferring v1 test_completion.py to later PR
afeldman-nm Dec 18, 2024
a9df520
Merge branch 'main' into v1_logprobs_merge
afeldman-nm Dec 18, 2024
efce9ca
merge
afeldman-nm Dec 19, 2024
15654c4
simplify changes to scheduler
robertgshaw2-neuralmagic Dec 29, 2024
5857f87
small assert
robertgshaw2-neuralmagic Dec 29, 2024
a49d4c1
nit
robertgshaw2-neuralmagic Dec 29, 2024
dc7d27c
revert moving update from output file
robertgshaw2-neuralmagic Dec 29, 2024
72eed99
updated
robertgshaw2-neuralmagic Jan 1, 2025
7d6eb22
stahs
robertgshaw2-neuralmagic Jan 1, 2025
7c4c231
stash
robertgshaw2-neuralmagic Jan 1, 2025
eab5ceb
updated
robertgshaw2-neuralmagic Jan 1, 2025
970e030
format
robertgshaw2-neuralmagic Jan 2, 2025
54d6f17
single compute_logits, consider switching to N compute_logits
robertgshaw2-neuralmagic Jan 2, 2025
8d4723e
format
robertgshaw2-neuralmagic Jan 2, 2025
fc20e5e
revert update from output changes
robertgshaw2-neuralmagic Jan 2, 2025
fca2dae
update partial reqs to be a list
robertgshaw2-neuralmagic Jan 2, 2025
317ee1e
update
robertgshaw2-neuralmagic Jan 2, 2025
74fc264
updated
robertgshaw2-neuralmagic Jan 2, 2025
db999da
remove unrelated changes
robertgshaw2-neuralmagic Jan 2, 2025
9b430d8
updated
robertgshaw2-neuralmagic Jan 2, 2025
d470e23
nit
robertgshaw2-neuralmagic Jan 2, 2025
ecaa68a
update ModelRunnerOutput
robertgshaw2-neuralmagic Jan 2, 2025
c32b6eb
updated
robertgshaw2-neuralmagic Jan 2, 2025
09d7592
updated
robertgshaw2-neuralmagic Jan 2, 2025
f092bef
cleanup
robertgshaw2-neuralmagic Jan 2, 2025
555861e
remove spurious change
robertgshaw2-neuralmagic Jan 2, 2025
5b7d629
updated
robertgshaw2-neuralmagic Jan 2, 2025
2694b75
less spurious changes
robertgshaw2-neuralmagic Jan 2, 2025
3d651fc
updated
robertgshaw2-neuralmagic Jan 2, 2025
cbe8275
updated to include the sampled logprob
robertgshaw2-neuralmagic Jan 2, 2025
531eeb7
fix logprobs
robertgshaw2-neuralmagic Jan 2, 2025
c4ed7ba
add utility class
robertgshaw2-neuralmagic Jan 2, 2025
a7cb691
format
robertgshaw2-neuralmagic Jan 2, 2025
d001a05
remove cruft
robertgshaw2-neuralmagic Jan 2, 2025
3a257b8
update comment
robertgshaw2-neuralmagic Jan 2, 2025
bd38a24
nit
robertgshaw2-neuralmagic Jan 2, 2025
531c007
stash
robertgshaw2-neuralmagic Jan 2, 2025
0497bf9
update
robertgshaw2-neuralmagic Jan 2, 2025
25041f6
stash
robertgshaw2-neuralmagic Jan 2, 2025
062d0a7
stash
robertgshaw2-neuralmagic Jan 2, 2025
94d9b38
updated
robertgshaw2-neuralmagic Jan 2, 2025
f2cdb61
updated
robertgshaw2-neuralmagic Jan 2, 2025
3c4d9c1
updated
robertgshaw2-neuralmagic Jan 2, 2025
1a36c3b
updated
robertgshaw2-neuralmagic Jan 2, 2025
9e9ec2b
cleanup diff
robertgshaw2-neuralmagic Jan 2, 2025
b99d9cd
clean up diff
robertgshaw2-neuralmagic Jan 2, 2025
2f85118
clean up diff
robertgshaw2-neuralmagic Jan 2, 2025
cb8c87c
more clean
robertgshaw2-neuralmagic Jan 2, 2025
983f2a7
stash
robertgshaw2-neuralmagic Jan 2, 2025
16a8caa
passing mypy
robertgshaw2-neuralmagic Jan 2, 2025
868e653
updated
robertgshaw2-neuralmagic Jan 2, 2025
62b8360
update
robertgshaw2-neuralmagic Jan 2, 2025
7fe4d85
update
robertgshaw2-neuralmagic Jan 2, 2025
92a27aa
updated
robertgshaw2-neuralmagic Jan 2, 2025
e279409
update indexing
robertgshaw2-neuralmagic Jan 2, 2025
bc3942c
reduce changeg
robertgshaw2-neuralmagic Jan 2, 2025
b5647c3
reduce cruft
robertgshaw2-neuralmagic Jan 2, 2025
0db5db0
reduce cruft
robertgshaw2-neuralmagic Jan 2, 2025
ff7d7d2
updated
robertgshaw2-neuralmagic Jan 2, 2025
8aa8baa
update comment
robertgshaw2-neuralmagic Jan 2, 2025
527228d
format
robertgshaw2-neuralmagic Jan 2, 2025
f5d0b57
reduce length of comments
robertgshaw2-neuralmagic Jan 2, 2025
711ff13
updated
robertgshaw2-neuralmagic Jan 2, 2025
3a99615
reduce assets
robertgshaw2-neuralmagic Jan 2, 2025
6bb6d34
updated
robertgshaw2-neuralmagic Jan 2, 2025
d73010d
updated
robertgshaw2-neuralmagic Jan 2, 2025
b8f40df
updated
robertgshaw2-neuralmagic Jan 2, 2025
e806678
clean
robertgshaw2-neuralmagic Jan 2, 2025
afef932
reduce cruft
robertgshaw2-neuralmagic Jan 2, 2025
71580ae
revert crruft
robertgshaw2-neuralmagic Jan 2, 2025
1d52a37
updated
robertgshaw2-neuralmagic Jan 3, 2025
c8eef87
cleanup
robertgshaw2-neuralmagic Jan 3, 2025
b501aed
updated
robertgshaw2-neuralmagic Jan 3, 2025
ac070f8
updated
robertgshaw2-neuralmagic Jan 3, 2025
9a28ddf
updated
robertgshaw2-neuralmagic Jan 3, 2025
d1a956d
update comment
robertgshaw2-neuralmagic Jan 3, 2025
5fd0060
updated
robertgshaw2-neuralmagic Jan 3, 2025
433b93c
merge
robertgshaw2-neuralmagic Jan 3, 2025
0d2f7c8
stash
robertgshaw2-neuralmagic Jan 3, 2025
06b9aba
cleanup
robertgshaw2-neuralmagic Jan 3, 2025
035e2c2
updated
robertgshaw2-neuralmagic Jan 3, 2025
17e41c8
remove
robertgshaw2-neuralmagic Jan 3, 2025
2cb4832
finish cleaning sampler.py
robertgshaw2-neuralmagic Jan 3, 2025
92595a4
updated
robertgshaw2-neuralmagic Jan 3, 2025
c82fc85
updated comment
robertgshaw2-neuralmagic Jan 3, 2025
c3c4f9c
passing mypy!
robertgshaw2-neuralmagic Jan 3, 2025
fec3d15
comment
robertgshaw2-neuralmagic Jan 3, 2025
d002d67
todo -> fixme
robertgshaw2-neuralmagic Jan 3, 2025
3157e8b
updated
robertgshaw2-neuralmagic Jan 3, 2025
60125e3
fixed sampler bug
afeldman-nm Jan 4, 2025
5908cb1
fixed some sampler bugs
afeldman-nm Jan 5, 2025
c5f9565
merge
afeldman-nm Jan 5, 2025
fc52031
wip fixing detokenizer test
afeldman-nm Jan 5, 2025
7dc2756
Merge branch 'main' into v1_logprobs_merge
afeldman-nm Jan 6, 2025
6e57de4
wip
afeldman-nm Jan 6, 2025
599aae8
temporary hack to use pickling
afeldman-nm Jan 6, 2025
2aa1007
Merge branch 'main' into v1_logprobs_merge
afeldman-nm Jan 6, 2025
ae1e1b7
wip detokenizer test
afeldman-nm Jan 6, 2025
ae00145
Merge branch 'main' into v1_logprobs_merge
afeldman-nm Jan 6, 2025
a1c5b2e
fix: logprobs not being wrapped in an array
afeldman-nm Jan 6, 2025
7288370
sample logprobs work
afeldman-nm Jan 6, 2025
85e57d9
Merge branch 'main' into v1_logprobs_merge
afeldman-nm Jan 6, 2025
0e90ccb
detokenizer test passing for sample logprobs
afeldman-nm Jan 6, 2025
c2f48fb
detokenizer tests passing
afeldman-nm Jan 6, 2025
7993d08
Merge branch 'main' into v1_logprobs_merge
afeldman-nm Jan 6, 2025
13177d4
prompt logprobs with chunked prefill!
afeldman-nm Jan 6, 2025
05536f5
cleanup
afeldman-nm Jan 6, 2025
fa64529
Merge branch 'main' into v1_logprobs_merge
afeldman-nm Jan 6, 2025
0d17df8
light refactor
afeldman-nm Jan 6, 2025
f707191
torch serialization with msgpack via enc_/ext_hooks
afeldman-nm Jan 6, 2025
637c45c
Merge branch 'main' into v1_logprobs_merge
afeldman-nm Jan 7, 2025
cd5e7c6
Merge branch 'main' into v1_logprobs_merge
afeldman-nm Jan 8, 2025
8b1b995
Merge branch 'main' into v1_logprobs_merge
afeldman-nm Jan 8, 2025
3d00348
Merge branch 'main' into v1_logprobs_merge
afeldman-nm Jan 8, 2025
ce4f081
Merge branch 'main' into v1_logprobs_merge
afeldman-nm Jan 8, 2025
62d648a
Merge branch 'main' into v1_logprobs_merge
afeldman-nm Jan 9, 2025
3546639
wip
afeldman-nm Jan 9, 2025
a8c0167
Merge branch 'main' into v1_logprobs_merge
afeldman-nm Jan 9, 2025
0bba8f9
Merge branch 'v1_logprobs' into v1_logprobs_prompt
afeldman-nm Jan 9, 2025
69218ab
GPU returns num_prompt_logprobs + 1 prompt logprobs
afeldman-nm Jan 9, 2025
2505244
now prompt logprobs include prompt token
afeldman-nm Jan 9, 2025
e1058ac
wip making prompt logprobs line up with tok ids
afeldman-nm Jan 9, 2025
5f33902
partial req peek token
afeldman-nm Jan 9, 2025
199a834
refactoring
afeldman-nm Jan 9, 2025
879fc44
refactoring; non-blocking cpu->gpu transfer
afeldman-nm Jan 9, 2025
0f425fe
wip detokenizer tests
afeldman-nm Jan 9, 2025
1089127
detok test fix
afeldman-nm Jan 9, 2025
d2742d8
passing detok tests
afeldman-nm Jan 9, 2025
cf28c9b
Merge branch 'main' into v1_logprobs
afeldman-nm Jan 9, 2025
749be5a
Merge branch 'main' into v1_logprobs_merge
afeldman-nm Jan 10, 2025
a55e679
LLMEngine test working, wip AsyncLLM test
afeldman-nm Jan 10, 2025
b2c0c95
reverted unwanted changes
afeldman-nm Jan 10, 2025
9a40c5f
success
afeldman-nm Jan 10, 2025
ca94fd4
Merge branch 'main' into v1_logprobs_apc_merge
afeldman-nm Jan 10, 2025
67 changes: 62 additions & 5 deletions tests/v1/engine/test_async_llm.py
@@ -1,12 +1,13 @@
 import asyncio
-from typing import Tuple
+from typing import Optional, Tuple

 import pytest

 from vllm import SamplingParams
 from vllm.engine.arg_utils import AsyncEngineArgs
 from vllm.platforms import current_platform
 from vllm.v1.engine.async_llm import AsyncLLM
+from vllm.v1.engine.utils import STR_ASYNC_LLM_PROMPT_LP_APC_UNSUPPORTED

 if not current_platform.is_cuda():
     pytest.skip(reason="V1 currently only supported on CUDA.",
@@ -16,20 +17,76 @@
                 disable_log_requests=True)


-async def generate(engine: AsyncLLM, request_id: str,
-                   max_tokens: int) -> Tuple[int, str]:
+async def generate(
+    engine: AsyncLLM,
+    request_id: str,
+    max_tokens: Optional[int] = None,
+    sampling_params: Optional[SamplingParams] = None,
+) -> Tuple[int, str]:
+    """Wrapper for `AsyncLLM` generation.
+
+    At least one of `max_tokens` and `sampling_params` must
+    not be `None`. If `sampling_params` is `None`, `max_tokens`
+    is used to create a `SamplingParams` instance. If
+    `sampling_params` is provided, `max_tokens` is not used.
+
+    Args:
+      engine: AsyncLLM instance
+      request_id: AsyncLLM request ID
+      max_tokens: (optional) max number of tokens to generate
+      sampling_params: (optional) request sampling params
+
+    Returns:
+      count: number of returns from engine.generate()
+      request_id
+    """
+    assert not (max_tokens is None and sampling_params is None), (
+        "At least one of max_tokens and sampling_params"
+        " must not be None.")
+    if sampling_params is None:
+        sampling_params = SamplingParams(max_tokens=max_tokens, temperature=0)
     count = 0
     async for _ in engine.generate(request_id=request_id,
                                    prompt="Hello my name is Robert and",
-                                   sampling_params=SamplingParams(
-                                       max_tokens=max_tokens, temperature=0)):
+                                   sampling_params=sampling_params):

         count += 1
         await asyncio.sleep(0.)

     return count, request_id


+@pytest.mark.asyncio
+async def test_async_llm_refuses_prompt_logprobs_with_apc(monkeypatch):
+    """Test passes if AsyncLLM raises an exception when it is configured
+    for automatic prefix caching and it receives a request with
+    prompt_logprobs enabled, which is incompatible."""
+    # TODO(rickyx): Remove monkeypatch VLLM_USE_V1 setting once we have a
+    # better way to test V1 so that in the future when we switch, we don't
+    # have to change all the tests.
+    monkeypatch.setenv("VLLM_USE_V1", "1")
+    # Create AsyncLLM engine with APC
+    apc_engine_args = AsyncEngineArgs(model="facebook/opt-125m",
+                                      enable_prefix_caching=True,
+                                      gpu_memory_utilization=0.8,
+                                      disable_log_requests=True)
+    engine = AsyncLLM.from_engine_args(apc_engine_args)
+    try:
+        with pytest.raises(ValueError) as excinfo:
+            # Issue a request with prompt logprobs enabled, which should fail
+            await asyncio.create_task(
+                generate(engine,
+                         "request-0",
+                         sampling_params=SamplingParams(max_tokens=10,
+                                                        temperature=0,
+                                                        prompt_logprobs=5)))
+        # Validate exception string is correct
+        assert str(excinfo.value) == STR_ASYNC_LLM_PROMPT_LP_APC_UNSUPPORTED
+    finally:
+        # Shut down engine
+        engine.shutdown()
+
+
 @pytest.mark.asyncio
 async def test_load(monkeypatch):
     # TODO(rickyx): Remove monkeypatch once we have a better way to test V1
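For context on what the new test exercises: the V1 front end validates requests before they reach the scheduler, refusing prompt logprobs whenever APC is enabled, since prefill can be skipped for cached prefixes and the prompt-position logits would then never be computed. Below is a minimal sketch of that guard under stated assumptions: the free-function name and its arguments are hypothetical stand-ins for wherever the check actually lives in the V1 engine front end, while the ValueError type and the STR_ASYNC_LLM_PROMPT_LP_APC_UNSUPPORTED constant are exactly what the test asserts.

from vllm import SamplingParams
from vllm.v1.engine.utils import STR_ASYNC_LLM_PROMPT_LP_APC_UNSUPPORTED


def validate_prompt_logprobs_with_apc(params: SamplingParams,
                                      enable_prefix_caching: bool) -> None:
    # Hypothetical guard: prompt logprobs need logits at every prompt
    # position, but APC may skip prefill for cached prefixes, so those
    # logits are never computed. Refuse the combination up front.
    if enable_prefix_caching and params.prompt_logprobs is not None:
        raise ValueError(STR_ASYNC_LLM_PROMPT_LP_APC_UNSUPPORTED)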