
Add optional token-level progress bar to LLM.beam_search using tqdm #19301


Open · wants to merge 4 commits into base: main

Conversation


@NekoMimiUnagi commented on Jun 6, 2025

Essential Elements of an Effective PR Description Checklist

  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing a test command.
  • The test results, such as pasting a before/after comparison of results, or e2e results.

Purpose

This PR adds an optional token-level progress bar to the LLM.beam_search() method using tqdm. It addresses a recurring usability issue: when running beam search inference, users have no visibility into how long generation will take or how far along the process is.

This PR resolves the feature request described in #19300 (comment).

This feature:

  • Wraps the range(max_tokens) loop with tqdm if use_tqdm=True.
  • Adds a use_tqdm argument (default: False) to preserve backward compatibility.
  • Emits a warning explaining that the progress bar shows an upper bound on token steps (not per-instance progress).
  • Labels the tqdm output with "Beam search" and shows progress in tokens (e.g., 40/128 tokens).

This feature was also previously requested in #11835; this PR revives the functionality in a clean, optional, and informative manner.
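
In rough form, the change has the shape sketched below. This is a simplified illustration, not the exact diff: beam_search_steps is a hypothetical stand-in for the token loop inside LLM.beam_search, and the real code may emit the warning through vLLM's logger rather than the warnings module.

import warnings

from tqdm import tqdm

def beam_search_steps(max_tokens: int, use_tqdm: bool = False) -> None:
    # Hypothetical stand-in for the token loop inside LLM.beam_search.
    token_iter = range(max_tokens)
    if use_tqdm:
        warnings.warn(
            "The progress bar shows the upper bound on token steps and may "
            "finish early due to stopping conditions. It does not reflect "
            "instance-level progress.",
            stacklevel=2)
        # desc and unit produce output like: Beam search: 40/128 tokens
        token_iter = tqdm(token_iter, desc="Beam search", unit="tokens")
    for _ in token_iter:
        pass  # one decoding step per iteration; may break early on stop conditions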

Test Plan

Enable the progress bar in a script or notebook:

outputs = llm.beam_search(
    prompts=["Translate this sentence."],
    sampling_params=sampling_params,
    use_tqdm=True,  # Enables progress bar
)

Observe the terminal for:

  • Tqdm bar output such as Beam search: 39%|█ | 25/64 tokens,
  • The log warning about token-level estimation.

Test Result

WARNING vllm: The progress bar shows the upper bound on token steps and may finish early due to stopping conditions. It does not reflect instance-level progress.
Beam search:  36%|███████████████████▏                  | 36/100 tokens
  • Progress bar is visible, responsive, and terminates early if generation stops.
  • No side effects when use_tqdm=False or when tqdm is not available.

@NekoMimiUnagi requested a review from aarnphm as a code owner on Jun 6, 2025

github-actions bot commented Jun 6, 2025

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, covering a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

@gemini-code-assist bot left a comment

Hello @NekoMimiUnagi, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

Summary of Changes

Hello! Gemini here, providing a summary of this pull request. This PR, authored by NekoMimiUnagi, aims to improve the usability of the LLM.beam_search method by adding an optional token-level progress bar using the tqdm library. The core motivation is to give users better visibility into the generation process, addressing existing feature requests. The changes involve modifying the beam_search method signature to accept a new use_tqdm boolean parameter, adding documentation for it, and conditionally wrapping the main token generation loop with tqdm when the option is enabled. A warning is also included to clarify that the progress bar represents token steps and is an upper bound, not necessarily reflecting instance-level progress.

Highlights

  • Add Optional Progress Bar: Introduces an optional token-level progress bar to the LLM.beam_search method using the tqdm library.
  • New Parameter use_tqdm: Adds a boolean parameter use_tqdm (defaulting to False) to the beam_search method to control the visibility of the progress bar, maintaining backward compatibility.
  • Progress Bar Warning: Includes a warning message when the progress bar is enabled, informing users that it tracks token steps (an upper bound) and not instance-level progress.

Changelog

  • vllm/entrypoints/llm.py
    • Added the use_tqdm: bool = False parameter to the beam_search method signature (around line 534).
    • Added a docstring entry for the new use_tqdm parameter (around line 544).
    • Modified the main token generation loop (for _ in range(max_tokens):) to conditionally wrap the iteration with tqdm if use_tqdm is True (around line 607-619).
    • Added a warnings.warn call within the use_tqdm conditional block to explain the nature of the progress bar (around line 610-617).
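
For orientation, a hedged approximation of the signature and docstring change follows; parameter names other than use_tqdm are illustrative, and the method body (including the loop change sketched earlier) is elided.

def beam_search(
        self,
        prompts,
        params,
        use_tqdm: bool = False,  # new parameter; default keeps existing behavior
):
    """Generate sequences using beam search.

    Args:
        prompts: The prompts to generate from.
        params: The beam search parameters.
        use_tqdm: If True, display a token-level tqdm progress bar over
            at most max_tokens decoding steps. Defaults to False.
    """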

@mergify bot added the frontend label on Jun 6, 2025
@gemini-code-assist bot left a comment

Code Review

This pull request introduces a very useful feature: an optional token-level progress bar for LLM.beam_search using tqdm. This significantly improves the user experience by providing visibility into the generation process. The implementation is clear, and the inclusion of the use_tqdm flag ensures backward compatibility. The PR description, test plan, and test results are also well-documented.

I've found one high-severity issue related to the parameters passed to warnings.warn and the tqdm initialization that needs to be addressed. There are also a couple of minor stylistic points mentioned in the summary.

Summary of Findings

  • Incorrect parameters for warnings.warn and tqdm configuration: The warnings.warn function was called with unit and unit_scale parameters, which are invalid for it. These parameters, along with a minor adjustment to the description string, should be moved to the tqdm call to ensure correct progress bar display and to prevent potential errors. This was commented on directly; see the sketch after this list.
  • Minor: Extra newline: There's an unnecessary extra blank line at line 618 in vllm/entrypoints/llm.py. This is a minor formatting issue and was not commented on directly due to review settings (severity: low).
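
To illustrate the first finding, here is a hedged reconstruction of the broken and corrected shapes (simplified; not the exact code from the PR):

import warnings

from tqdm import tqdm

max_tokens = 100

# Broken shape: warnings.warn() accepts only message, category, stacklevel,
# and source, so passing tqdm display options raises a TypeError:
#     warnings.warn("progress is an upper bound", unit="tokens", unit_scale=True)

# Corrected shape: unit (and unit_scale, if desired) belong on the tqdm call.
warnings.warn(
    "The progress bar shows the upper bound on token steps and may finish "
    "early due to stopping conditions.",
    stacklevel=2)
for _ in tqdm(range(max_tokens), desc="Beam search", unit="tokens"):
    pass  # one decoding step per iteration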

Merge Readiness

The addition of the progress bar is a valuable enhancement. However, there's a high-severity issue with the warnings.warn call that needs to be fixed before this PR can be merged. Once that's addressed, this should be good to go. I am unable to approve the pull request, so please have others review and approve this code before merging.

Adds an optional token-level progress bar to the `LLM.beam_search()` method using `tqdm`. This improves visibility for long-running inference by allowing users to estimate progress and remaining time.

The progress bar is enabled via a new `use_tqdm` boolean argument (default: False), and it wraps the `range(max_tokens)` loop.

Also includes a logger warning when the bar is enabled to clarify that the progress shown is a token-level upper bound and may terminate early due to stopping conditions. The tqdm bar is labeled "Beam search" with units shown as "tokens".

Example:
    outputs = llm.beam_search(prompts, sampling_params, use_tqdm=True)

This change improves developer experience and aligns `beam_search` closer to `generate` and `chat`, which provide better runtime feedback.

Signed-off-by: Ruosen Li <[email protected]>