Add optional token-level progress bar to `LLM.beam_search` using `tqdm` #19301
base: main
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can either: Add 🚀 |
Hello @NekoMimiUnagi, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
Summary of Changes
Hello! Gemini here, providing a summary of this pull request. This PR, authored by NekoMimiUnagi, aims to improve the usability of the `LLM.beam_search` method by adding an optional token-level progress bar using the `tqdm` library. The core motivation is to give users better visibility into the generation process, addressing existing feature requests. The changes involve modifying the `beam_search` method signature to accept a new `use_tqdm` boolean parameter, adding documentation for it, and conditionally wrapping the main token generation loop with `tqdm` when the option is enabled. A warning is also included to clarify that the progress bar represents token steps and is an upper bound, not necessarily reflecting instance-level progress.
Highlights
- Add Optional Progress Bar: Introduces an optional token-level progress bar to the `LLM.beam_search` method using the `tqdm` library.
- New Parameter `use_tqdm`: Adds a boolean parameter `use_tqdm` (defaulting to `False`) to the `beam_search` method to control the visibility of the progress bar, maintaining backward compatibility.
- Progress Bar Warning: Includes a warning message when the progress bar is enabled, informing users that it tracks token steps (an upper bound) and not instance-level progress.
Changelog
- vllm/entrypoints/llm.py
  - Added the `use_tqdm: bool = False` parameter to the `beam_search` method signature (around line 534).
  - Added a docstring entry for the new `use_tqdm` parameter (around line 544).
  - Modified the main token generation loop (`for _ in range(max_tokens):`) to conditionally wrap the iteration with `tqdm` if `use_tqdm` is True (around line 607-619).
  - Added a `warnings.warn` call within the `use_tqdm` conditional block to explain the nature of the progress bar (around line 610-617).
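The conditional wrapping described in the changelog can be illustrated with a minimal standalone sketch. It is not the code from `vllm/entrypoints/llm.py`: the helpers `token_steps` and `run_loop`, the warning text, and the `desc`/`unit` values are assumptions made for illustration, and the loop body is a stand-in for the real beam search step.

```python
import warnings

try:
    from tqdm import tqdm
except ImportError:  # keep the sketch runnable when tqdm is not installed
    tqdm = None


def token_steps(max_tokens: int, use_tqdm: bool = False):
    """Return the iterable that drives the token loop (hypothetical helper)."""
    steps = range(max_tokens)
    if use_tqdm and tqdm is not None:
        # The bar counts token steps, so its total is only an upper bound.
        warnings.warn(
            "The progress bar counts token steps (an upper bound); "
            "generation may stop before reaching max_tokens.",
            stacklevel=2,
        )
        # Display options belong to tqdm, not to warnings.warn.
        steps = tqdm(steps, desc="Beam search", unit="token", unit_scale=False)
    return steps


def run_loop(max_tokens: int, stop_after: int, use_tqdm: bool = False) -> int:
    """Stand-in for the generation loop; stops early after `stop_after` steps."""
    done = 0
    for _ in token_steps(max_tokens, use_tqdm):
        done += 1
        if done >= stop_after:  # mimics an early stopping condition
            break
    return done
```

With `use_tqdm=False` the loop iterates over a plain `range`, so existing callers see no change in behavior. When the loop exits early, the bar stops short of its total, which is why the warning describes the total as an upper bound.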
Code Review
This pull request introduces a very useful feature: an optional token-level progress bar for `LLM.beam_search` using `tqdm`. This significantly improves the user experience by providing visibility into the generation process. The implementation is clear, and the inclusion of the `use_tqdm` flag ensures backward compatibility. The PR description, test plan, and test results are also well-documented.
I've found one high-severity issue related to the parameters passed to `warnings.warn` and the `tqdm` initialization that needs to be addressed. There are also a couple of minor stylistic points mentioned in the summary.
Summary of Findings
- Incorrect parameters for `warnings.warn` and `tqdm` configuration: The `warnings.warn` function was called with `unit` and `unit_scale` parameters, which are invalid for it. These parameters, along with a minor adjustment to the description string, should be moved to the `tqdm` call to ensure correct progress bar display and to prevent potential errors. This was commented on directly.
- Minor: Extra newline: There's an unnecessary extra blank line at line 618 in `vllm/entrypoints/llm.py`. This is a minor formatting issue and was not commented on directly due to review settings (severity: low).
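To see why the first finding is rated high severity, the following sketch reproduces the failure mode in isolation. The function names, message text, and keyword values are hypothetical and are not taken from the PR diff; only the fact that `warnings.warn` does not accept `unit` or `unit_scale` comes from the review.

```python
import warnings

# Hypothetical display options; these are tqdm keyword arguments.
TQDM_KWARGS = {"desc": "Beam search", "unit": "token", "unit_scale": False}

MESSAGE = "The progress bar tracks token steps, an upper bound."


def warn_with_tqdm_kwargs() -> None:
    """Mirror the reported mistake: tqdm options passed to warnings.warn."""
    # warnings.warn has no `unit` or `unit_scale` parameter, so this call
    # raises TypeError before any warning is emitted.
    warnings.warn(MESSAGE, unit="token", unit_scale=False)


def warn_correctly() -> None:
    """Emit the warning with only the arguments warnings.warn accepts."""
    warnings.warn(MESSAGE, UserWarning, stacklevel=2)
```

Because the error is raised inside the `use_tqdm` branch, a call with the flag enabled would fail before the loop starts, while the default path would be unaffected. Keeping the display options in one place (here the `TQDM_KWARGS` dictionary) and passing them only to `tqdm(...)` avoids the mix-up.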
Merge Readiness
The addition of the progress bar is a valuable enhancement. However, there's a high-severity issue with the `warnings.warn` call that needs to be fixed before this PR can be merged. Once that's addressed, this should be good to go. I am unable to approve the pull request, so please have others review and approve this code before merging.
Adds an optional token-level progress bar to the `LLM.beam_search()` method using `tqdm`. This improves visibility for long-running inference by allowing users to estimate progress and remaining time.

The progress bar is enabled via a new `use_tqdm` boolean argument (default: False), and it wraps the `range(max_tokens)` loop. Also includes a logger warning when the bar is enabled to clarify that the progress shown is a token-level upper bound and may terminate early due to stopping conditions. The tqdm bar is labeled "Beam search" with units shown as "tokens".

Example: `outputs = llm.beam_search(prompts, sampling_params, use_tqdm=True)`

This change improves developer experience and aligns `beam_search` closer to `generate` and `chat`, which provide better runtime feedback.

Signed-off-by: Ruosen Li <[email protected]>
Essential Elements of an Effective PR Description Checklist
Purpose
This PR adds an optional token-level progress bar to the `LLM.beam_search()` method using `tqdm`. It addresses a recurring usability issue: when running beam search inference, users have no visibility into how long generation will take or how far along the process is.
This PR resolves the feature request described in #19300 (comment).
This feature:
- Wraps the `range(max_tokens)` loop with `tqdm` if `use_tqdm=True`,
- Adds a `use_tqdm` argument (default: False) to preserve backward compatibility.

This proposal was also previously requested by users in #11835, and this PR revives the functionality in a clean, optional, and informative manner.
Test Plan
Enable the progress bar in a script or notebook:
Observe the terminal for:
`Beam Searching: 39%|█ | 25/64 tokens`
Test Result