
Add optional token-level progress bar to LLM.beam_search using tqdm #19301


Open · wants to merge 4 commits into base: main

Conversation


@NekoMimiUnagi commented on Jun 6, 2025

Essential Elements of an Effective PR Description Checklist

  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing a test command.
  • The test results, such as pasting a before/after comparison of results, or e2e results.

Purpose

This PR adds an optional token-level progress bar to the LLM.beam_search() method using tqdm. It addresses a recurring usability issue: when running beam search inference, users have no visibility into how long generation will take or how far along the process is.

This PR resolves the feature request described in #19300 (comment).

This feature:

  • Wraps the range(max_tokens) loop with tqdm if use_tqdm=True.
  • Adds a use_tqdm argument (default: False) to preserve backward compatibility.
  • Emits a warning explaining that the progress bar shows an upper bound on token steps (not per-instance progress).
  • Labels the tqdm output with "Beam search" and shows progress in tokens (e.g., 40/128 tokens).

This feature was also previously requested in #11835; this PR revives the functionality in a clean, optional, and informative manner.
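
In rough form, the change has the shape sketched below. This is a simplified illustration, not the exact diff: beam_search_steps is a hypothetical stand-in for the token loop inside LLM.beam_search, and the real code may emit the warning through vLLM's logger rather than the warnings module.

import warnings

from tqdm import tqdm

def beam_search_steps(max_tokens: int, use_tqdm: bool = False) -> None:
    # Hypothetical stand-in for the token loop inside LLM.beam_search.
    token_iter = range(max_tokens)
    if use_tqdm:
        warnings.warn(
            "The progress bar shows the upper bound on token steps and may "
            "finish early due to stopping conditions. It does not reflect "
            "instance-level progress.",
            stacklevel=2)
        # desc and unit produce output like: Beam search: 40/128 tokens
        token_iter = tqdm(token_iter, desc="Beam search", unit="tokens")
    for _ in token_iter:
        pass  # one decoding step per iteration; may break early on stop conditions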

Test Plan

Enable the progress bar in a script or notebook:

outputs = llm.beam_search(
    prompts=["Translate this sentence."],
    sampling_params=sampling_params,
    use_tqdm=True,  # Enables progress bar
)

Observe the terminal for:

  • Tqdm bar output such as Beam search: 39%|█ | 25/64 tokens,
  • The log warning about token-level estimation.

Test Result

WARNING vllm: The progress bar shows the upper bound on token steps and may finish early due to stopping conditions. It does not reflect instance-level progress.
Beam search:  36%|███████████████████▏                  | 36/100 tokens
  • Progress bar is visible, responsive, and terminates early if generation stops.
  • No side effects when use_tqdm=False or when tqdm is not available.

@NekoMimiUnagi requested a review from aarnphm as a code owner on Jun 6, 2025

github-actions bot commented Jun 6, 2025

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, covering a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

@gemini-code-assist bot left a comment

Hello @NekoMimiUnagi, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

Summary of Changes

Hello! Gemini here, providing a summary of this pull request. This PR, authored by NekoMimiUnagi, aims to improve the usability of the LLM.beam_search method by adding an optional token-level progress bar using the tqdm library. The core motivation is to give users better visibility into the generation process, addressing existing feature requests. The changes involve modifying the beam_search method signature to accept a new use_tqdm boolean parameter, adding documentation for it, and conditionally wrapping the main token generation loop with tqdm when the option is enabled. A warning is also included to clarify that the progress bar represents token steps and is an upper bound, not necessarily reflecting instance-level progress.

Highlights

  • Add Optional Progress Bar: Introduces an optional token-level progress bar to the LLM.beam_search method using the tqdm library.
  • New Parameter use_tqdm: Adds a boolean parameter use_tqdm (defaulting to False) to the beam_search method to control the visibility of the progress bar, maintaining backward compatibility.
  • Progress Bar Warning: Includes a warning message when the progress bar is enabled, informing users that it tracks token steps (an upper bound) and not instance-level progress.

Changelog

  • vllm/entrypoints/llm.py
    • Added the use_tqdm: bool = False parameter to the beam_search method signature (around line 534).
    • Added a docstring entry for the new use_tqdm parameter (around line 544).
    • Modified the main token generation loop (for _ in range(max_tokens):) to conditionally wrap the iteration with tqdm if use_tqdm is True (around line 607-619).
    • Added a warnings.warn call within the use_tqdm conditional block to explain the nature of the progress bar (around line 610-617).
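
For orientation, a hedged approximation of the signature and docstring change follows; parameter names other than use_tqdm are illustrative, and the method body (including the loop change sketched earlier) is elided.

def beam_search(
        self,
        prompts,
        params,
        use_tqdm: bool = False,  # new parameter; default keeps existing behavior
):
    """Generate sequences using beam search.

    Args:
        prompts: The prompts to generate from.
        params: The beam search parameters.
        use_tqdm: If True, display a token-level tqdm progress bar over
            at most max_tokens decoding steps. Defaults to False.
    """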

@mergify bot added the frontend label on Jun 6, 2025
@gemini-code-assist bot left a comment

Code Review

This pull request introduces a very useful feature: an optional token-level progress bar for LLM.beam_search using tqdm. This significantly improves the user experience by providing visibility into the generation process. The implementation is clear, and the inclusion of the use_tqdm flag ensures backward compatibility. The PR description, test plan, and test results are also well-documented.

I've found one high-severity issue related to the parameters passed to warnings.warn and the tqdm initialization that needs to be addressed. There are also a couple of minor stylistic points mentioned in the summary.

Summary of Findings

  • Incorrect parameters for warnings.warn and tqdm configuration: The warnings.warn function was called with unit and unit_scale parameters, which are invalid for it. These parameters, along with a minor adjustment to the description string, should be moved to the tqdm call to ensure correct progress bar display and to prevent potential errors. This was commented on directly; see the sketch after this list.
  • Minor: Extra newline: There's an unnecessary extra blank line at line 618 in vllm/entrypoints/llm.py. This is a minor formatting issue and was not commented on directly due to review settings (severity: low).
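
To illustrate the first finding, here is a hedged reconstruction of the broken and corrected shapes (simplified; not the exact code from the PR):

import warnings

from tqdm import tqdm

max_tokens = 100

# Broken shape: warnings.warn() accepts only message, category, stacklevel,
# and source, so passing tqdm display options raises a TypeError:
#     warnings.warn("progress is an upper bound", unit="tokens", unit_scale=True)

# Corrected shape: unit (and unit_scale, if desired) belong on the tqdm call.
warnings.warn(
    "The progress bar shows the upper bound on token steps and may finish "
    "early due to stopping conditions.",
    stacklevel=2)
for _ in tqdm(range(max_tokens), desc="Beam search", unit="tokens"):
    pass  # one decoding step per iteration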

Merge Readiness

The addition of the progress bar is a valuable enhancement. However, there's a high-severity issue with the warnings.warn call that needs to be fixed before this PR can be merged. Once that's addressed, this should be good to go. I am unable to approve the pull request, so please have others review and approve this code before merging.

Adds an optional token-level progress bar to the `LLM.beam_search()` method using `tqdm`. This improves visibility for long-running inference by allowing users to estimate progress and remaining time.

The progress bar is enabled via a new `use_tqdm` boolean argument (default: False), and it wraps the `range(max_tokens)` loop.

Also includes a logger warning when the bar is enabled to clarify that the progress shown is a token-level upper bound and may terminate early due to stopping conditions. The tqdm bar is labeled "Beam search" with units shown as "tokens".

Example:
    outputs = llm.beam_search(prompts, sampling_params, use_tqdm=True)

This change improves developer experience and aligns `beam_search` closer to `generate` and `chat`, which provide better runtime feedback.

Signed-off-by: Ruosen Li <[email protected]>