Skip to content

[CB] additional prefill in warmup to fix TTFT #270

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Jul 2, 2025
Merged

Conversation

yannicks1
Copy link
Collaborator

[CB] additional prefill in warmup to fix TTFT

TTFT for the first prefill when running continuous batching was way bigger than for second prefill.
Doing an additional prefill during warmup outside the warmup_context solves this issue.

@yannicks1 yannicks1 self-assigned this Jul 1, 2025
@yannicks1 yannicks1 added the bug Something isn't working label Jul 1, 2025
Copy link

github-actions bot commented Jul 1, 2025

👋 Hi! Thank you for contributing to vLLM support on Spyre.
Just a reminder: Make sure that your code passes all the linting checks, otherwise your PR won't be able to be merged. To do so, first install the linting requirements, then run format.sh and commit the changes. This can be done with uv directly:

uv sync --frozen --group lint --active --inexact

Or this can be done with pip:

uv pip compile --group lint > requirements-lint.txt
pip install -r requirements-lint.txt
bash format.sh

Now you are good to go 🚀

@yannicks1
Copy link
Collaborator Author

bot:test
TEST_FILE=tests/e2e/test_spyre_cb.py MARKERS="spyre"

@yannicks1
Copy link
Collaborator Author

All continuous batching tests have passed no Spyre 🎉

@yannicks1 yannicks1 force-pushed the ysc-fix-warmup-TTFT branch from 21dbb4d to 02ac0d4 Compare July 2, 2025 08:07
@yannicks1 yannicks1 marked this pull request as ready for review July 2, 2025 15:02
@yannicks1 yannicks1 changed the title [do not merge][CB] additional prefill in warmup to fix TTFT [CB] additional prefill in warmup to fix TTFT Jul 2, 2025
]
add_dummy_request = dummy_requests.pop(-1)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

extra_dummy_request maybe? add is a verb so this reads to me like a boolean flag

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you are right, but also too fast:)

warmup_end_t = time.time()
warmup_total_t = warmup_end_t - warmup_start_t
logger.info("Warmup finished.")
logger.info("Warmup took %.3fs", warmup_total_t)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice, it was bad having this timing inside the context manager before 👍

@joerunde joerunde merged commit 7b3545f into main Jul 2, 2025
16 of 18 checks passed
@joerunde joerunde deleted the ysc-fix-warmup-TTFT branch July 2, 2025 15:16
@aluu317
Copy link

aluu317 commented Jul 2, 2025

Might want to change the PR title to include "temp fix" or "additional prefill in warmup to temporarily reduce TTFT" just so we don't get confused that this a permanent fix

joerunde pushed a commit that referenced this pull request Jul 8, 2025
# Description

This came out of a follow up to
#270 to determine why an
extra Prefill was necessary after using `warmup_mode`. I learned that
the extra Prefill is required to deploy the compiled graph to the Spyre
device. This PR does not change any functionality, but updates logging
and documentation around warmup to make this clearer.

---------

Signed-off-by: Travis Johnson <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants