[CB] additional prefill in warmup to fix TTFT #270
Signed-off-by: Yannick Schnider <[email protected]>
👋 Hi! Thank you for contributing to vLLM support on Spyre.
bot:test
All continuous batching tests have passed on Spyre 🎉
21dbb4d to 02ac0d4
]
add_dummy_request = dummy_requests.pop(-1)
`extra_dummy_request` maybe? `add` is a verb so this reads to me like a boolean flag
you are right, but also too fast:)
warmup_end_t = time.time()
warmup_total_t = warmup_end_t - warmup_start_t
logger.info("Warmup finished.")
logger.info("Warmup took %.3fs", warmup_total_t)
nice, it was bad having this timing inside the context manager before 👍
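A minimal sketch of why the placement of the end timestamp can matter: if it is taken inside the `with` block, any work done when the context exits is left out of the measurement. `FakeClock` and `slow_exit_context` are hypothetical stand-ins made up for illustration, not vllm-spyre code, and the durations are arbitrary.

```python
from contextlib import contextmanager


class FakeClock:
    """Deterministic stand-in for time.time() (hypothetical, for illustration)."""

    def __init__(self):
        self.now = 0.0

    def time(self):
        return self.now

    def advance(self, seconds):
        self.now += seconds


@contextmanager
def slow_exit_context(clock):
    # Hypothetical context whose teardown costs 2 "seconds".
    try:
        yield
    finally:
        clock.advance(2.0)


def timed_inside(clock):
    start = clock.time()
    with slow_exit_context(clock):
        clock.advance(1.0)  # work done inside the context
        end = clock.time()  # taken before the context exits
    return end - start


def timed_outside(clock):
    start = clock.time()
    with slow_exit_context(clock):
        clock.advance(1.0)  # work done inside the context
    end = clock.time()  # taken after the context has exited
    return end - start
```

With these made-up durations, `timed_inside` only sees the work inside the block, while `timed_outside` also counts the exit cost.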
Might want to change the PR title to include "temp fix" or "additional prefill in warmup to temporarily reduce TTFT" just so we don't get confused that this is a permanent fix
# Description
This came out of a follow up to #270 to determine why an extra Prefill was necessary after using `warmup_mode`. I learned that the extra Prefill is required to deploy the compiled graph to the Spyre device. This PR does not change any functionality, but updates logging and documentation around warmup to make this clearer.
---------
Signed-off-by: Travis Johnson <[email protected]>
[CB] additional prefill in warmup to fix TTFT
TTFT for the first prefill when running continuous batching was much larger than for the second prefill.
Doing an additional prefill during warmup, outside the warmup_context, solves this issue.
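
The flow described above can be sketched roughly as follows. This is a toy model under stated assumptions, not the actual vllm-spyre implementation: `FakeModel`, its `prefill` method, and this `warmup_context` are hypothetical stand-ins, and the assumption that the first prefill outside the context is the one that deploys the compiled graph is taken from the follow-up discussion of this PR.

```python
import time
from contextlib import contextmanager


class FakeModel:
    """Hypothetical stand-in for the model runner; not the vllm-spyre API."""

    def __init__(self):
        self.in_warmup = False
        self.deployed = False
        self.log = []

    def prefill(self, request):
        if self.in_warmup:
            # Assumption: inside the warmup context, prefill only compiles.
            self.log.append(("compile", request))
        elif not self.deployed:
            # Assumption: the first prefill outside the context pays the
            # one-time cost of deploying the compiled graph to the device.
            self.deployed = True
            self.log.append(("deploy+prefill", request))
        else:
            self.log.append(("prefill", request))


@contextmanager
def warmup_context(model):
    # Hypothetical context manager standing in for the real warmup context.
    model.in_warmup = True
    try:
        yield
    finally:
        model.in_warmup = False


def warmup(model, dummy_requests):
    warmup_start_t = time.time()
    dummy_requests = list(dummy_requests)
    # Hold one dummy request back for the extra prefill.
    extra_dummy_request = dummy_requests.pop(-1)
    with warmup_context(model):
        for request in dummy_requests:
            model.prefill(request)
    # Extra prefill outside the context, so the first real request
    # does not pay the deployment cost.
    model.prefill(extra_dummy_request)
    # Timing ends after the context has exited and the extra prefill ran.
    return time.time() - warmup_start_t
```

In this toy model, the first real request after `warmup` takes the plain `prefill` path, which is the effect on TTFT that the description reports.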