
[Bugfix] Multi-sequence broken #11898

Merged (3 commits) · Jan 21, 2025

Conversation

andylolu2 (Contributor) commented Jan 9, 2025

Fixes the bugs introduced in #9569

  • SequenceGroup does not necessarily contain only one sequence (e.g. when n > 1), so many of the optimisations don't make sense.
  • Currently the seed is duplicated across all completions, so when n > 1 is used with a seed set, all completions give the same output (see the reproduction sketch after this list).
  • Currently only the first sequence in a ParallelSampleSequenceGroup yields responses, but once the first sequence finishes it won't receive new chunks. This means responses from other sequences are not sent if the first sequence finishes before the others.
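
Below is a minimal reproduction sketch of the seed issue (not part of the PR; it assumes a local vLLM install, and the model name and prompt are placeholders):

from vllm import LLM, SamplingParams

# Placeholder model and prompt; any small model works for this check.
llm = LLM(model="facebook/opt-125m")
params = SamplingParams(n=4, seed=1234, temperature=1.0, max_tokens=32)
outputs = llm.generate(["The capital of France is"], params)

# Before this fix, the seed was duplicated across all n completions, so the
# texts below could all be identical; after the fix they should differ.
for completion in outputs[0].outputs:
    print(completion.text)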

Andy@MistralAI


github-actions bot commented Jan 9, 2025

👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs will not trigger a full CI run by default. Instead, only the fastcheck CI will run, which covers a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can do one of these:

  • Add the ready label to the PR
  • Enable auto-merge.

🚀

andylolu2 (Contributor Author)

@youkaichao

vllm/sequence.py Outdated
Comment on lines 821 to 820
n = self.sampling_params.n
assert isinstance(n, int)
if n > self.num_seqs():
    # At prompt stage, the sequence group is not yet filled up
    # and only have one sequence running. However, in the
    # generation stage, we will have `n` sequences
    # running.
    return n
# At sampling stages, return the number of actual sequences
# that are not finished yet.
return self.num_seqs() - self.num_finished_seqs()
Member

when will we hit this? I think the engine will only see single-sequence requests

Contributor Author

When constructing the output with n > 1, you access the "master group".

andylolu2 (Contributor Author) commented Jan 10, 2025

For example, you construct the RequestOutput with multiple sequences here:

return cls.from_seq_group(assembled_seq_group, use_cache,

Then call master_seq_group.is_finished() here:

finished = seq_group.is_finished()

Which currently becomes True as soon as the first sequence terminates (regardless of whether the other sequences have terminated).
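
A toy sketch of the intended group-level semantics (illustrative only; these classes are simplified stand-ins, not the actual vLLM SequenceGroup):

from dataclasses import dataclass, field
from typing import List

@dataclass
class Seq:
    finished: bool = False

@dataclass
class SeqGroup:
    seqs: List[Seq] = field(default_factory=list)

    def is_finished(self) -> bool:
        # The group should only report finished once *every* sequence is done,
        # not as soon as the first one terminates.
        return all(seq.finished for seq in self.seqs)

group = SeqGroup(seqs=[Seq(finished=True), Seq(finished=False)])
assert not group.is_finished()  # a second sequence is still running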

Comment on lines +1422 to +1414
params = copy.deepcopy(original_params)
params.n = 1
if params.seed is not None:
    params.seed += i
Member

this part makes sense to me.
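
For context, here is a standalone sketch of how the per-child params could be derived in the parallel-sampling fan-out (the Params class is a stand-in for illustration; only the copy/seed logic mirrors the snippet above):

import copy
from dataclasses import dataclass
from typing import Optional

@dataclass
class Params:
    n: int = 1
    seed: Optional[int] = None

original_params = Params(n=4, seed=1234)

child_params = []
for i in range(original_params.n):
    params = copy.deepcopy(original_params)
    params.n = 1                  # each child sequence produces one completion
    if params.seed is not None:
        params.seed += i          # offset so children do not repeat each other
    child_params.append(params)

# All child seeds are distinct.
assert len({p.seed for p in child_params}) == original_params.n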

youkaichao (Member)

@andylolu2 thanks for the fix! Can you add a test case for n > 1 with a seed set, to make sure the completions are different?

@andylolu2 force-pushed the main branch 3 times, most recently from 7c31b9c to debff7f on January 12, 2025 at 22:00
andylolu2 (Contributor Author) commented Jan 12, 2025

@youkaichao I added new assertions to the existing tests to ensure that each sample in the same parallel-sampling group gives a different result.

youkaichao (Member)

@andylolu2 please fix the format.

andylolu2 (Contributor Author)

Whoops, thanks for the reminder.

andylolu2 (Contributor Author)

@youkaichao Should be good now

hewr2010

Cases with n > 1 work perfectly for me now.

youkaichao (Member) left a comment

thanks for fixing it!

Comment on lines +78 to +82

# verify generations within the same parallel sampling group differ
for output in outputs:
    for sub_output_a, sub_output_b in combinations(output, 2):
        assert sub_output_a != sub_output_b
Member

Would this test be flaky? E.g., when we generate with n=10, two sequences might happen to generate the same answer ...
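
One way such a check could be made less strict (a sketch of an alternative, not what the PR does) is to assert only that the completions are not all identical, rather than requiring every pair to differ:

# `outputs` is assumed to be a list of parallel-sampling groups, where each
# group is a sequence of generated texts (hypothetical shape, for illustration).
for output in outputs:
    assert len(set(output)) > 1, "all completions in the group are identical"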

@ywang96 added the ready label (ONLY add when PR is ready to merge/full CI is needed) on Jan 20, 2025
ywang96 (Member) commented Jan 20, 2025

@andylolu2 We have switched to using pre-commit, can you please update this PR with the latest changes from main? Thanks!

@simon-mo simon-mo merged commit 18fd4a8 into vllm-project:main Jan 21, 2025
59 of 64 checks passed
abmfy pushed a commit to abmfy/vllm-flashinfer that referenced this pull request Jan 24, 2025
andylolu2 (Contributor Author)

Sorry about that, I was caught up in other matters. Glad it got merged :)
