[CI] Update FlashInfer to 0.2.6 #19297
base: main
Conversation
Signed-off-by: mgoin <[email protected]>
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default; only a reduced subset of checks runs automatically. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can either: Add 🚀 …
Hello @mgoin, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
Summary of Changes
Hello team, gemini-code-assist here to provide a summary of this pull request. This PR, authored by @mgoin, aims to update the FlashInfer library dependency to version 0.2.6. According to the description, this version is significant as it's the first stable release with Blackwell support. The changes primarily involve modifying the Dockerfile used for CI builds to fetch and build FlashInfer from the v0.2.6
tag on GitHub, moving away from a conditional installation that previously attempted to use a pre-built wheel for CUDA 12.8. The PR also updates the supported CUDA architecture list in the build process.
Highlights
- FlashInfer Version Update: Updates the FlashInfer dependency to the latest stable release, v0.2.6, which includes initial support for Blackwell GPUs.
- CI Build Logic Change: Modifies `docker/Dockerfile` to consistently build FlashInfer from source using the `v0.2.6` tag from the GitHub repository, removing the previous conditional logic that would attempt to use a pre-built wheel for CUDA 12.8.
- CUDA Architecture Support: Updates the `TORCH_CUDA_ARCH_LIST` environment variable in the Dockerfile (line 264) to include `10.0+PTX`, likely to enable support for newer architectures like Blackwell (SM 10.0).
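As background on the third highlight, the `+PTX` suffix in a `TORCH_CUDA_ARCH_LIST` entry asks the build to embed PTX in addition to SASS, so the binary can JIT-compile for GPUs newer than those listed. The sketch below is a simplified illustration of how such an entry maps to nvcc `-gencode` flags; it mirrors, but is not, PyTorch's actual build logic:

```shell
#!/usr/bin/env bash
# Simplified illustration of how a TORCH_CUDA_ARCH_LIST entry such as
# '10.0+PTX' maps to nvcc -gencode flags. Not PyTorch's real parser.
arch_to_gencode() {
    local entry="$1" num ptx=""
    [[ "$entry" == *+PTX ]] && ptx=1
    num="${entry%+PTX}"          # strip the '+PTX' suffix if present
    num="${num/./}"              # '10.0' -> '100'
    echo "-gencode=arch=compute_${num},code=sm_${num}"
    # '+PTX' additionally embeds PTX so newer GPUs can JIT the kernels.
    if [[ -n "$ptx" ]]; then
        echo "-gencode=arch=compute_${num},code=compute_${num}"
    fi
}

arch_to_gencode "10.0+PTX"
```

For `10.0+PTX` this emits both an SM 10.0 SASS target and a `compute_100` PTX target, whereas a plain `9.0` entry would emit only the `sm_90` SASS target.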
Changelog
- docker/Dockerfile
  - Commented out the conditional block (lines 261-263, 270) that checked for CUDA 12.8 and attempted to install a specific pre-built FlashInfer wheel.
  - Updated the `TORCH_CUDA_ARCH_LIST` environment variable (line 264) to include `10.0+PTX`.
  - Modified the `uv pip install` command (line 269) to fetch FlashInfer directly from the `v0.2.6` tag on GitHub instead of a previous commit hash.
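Taken together, the changelog entries above correspond to a Dockerfile step along these lines. This is a reconstruction from the description, not the exact diff; the architecture entries other than `10.0+PTX` are assumptions for illustration:

```dockerfile
# Reconstruction of the updated install step described above -- not the exact diff.
# '10.0+PTX' is added so kernels target Blackwell (SM 10.0); the other entries
# in this list are illustrative assumptions.
ENV TORCH_CUDA_ARCH_LIST='7.5 8.0 8.9 9.0 10.0+PTX'

# Build FlashInfer from the v0.2.6 tag, since no pre-built wheels for this
# version are available yet.
RUN uv pip install --system --no-build-isolation \
    "git+https://github.com/flashinfer-ai/flashinfer@v0.2.6"
```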
Using Gemini Code Assist
The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.
Invoking Gemini
You can request assistance from Gemini at any point in your pull request by creating an issue comment (i.e. a comment on the pull request page) using either `/gemini <command>` or `@gemini-code-assist <command>`. Below is a summary of the supported commands.
| Feature | Command | Description |
|---|---|---|
| Code Review | `/gemini review` | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | `/gemini summary` | Provides a summary of the current pull request in its current state. |
| Comment | `@gemini-code-assist` | Responds in comments when explicitly tagged, both in issue comments and review comments. |
| Help | `/gemini help` | Displays a list of available commands. |
Customization
To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a `.gemini/` folder in the base of the repository. Detailed instructions can be found here.
Limitations & Feedback
Gemini Code Assist is currently in preview and may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments to provide feedback.
You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.
Footnotes
1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.
Code Review
This pull request updates FlashInfer to version 0.2.6, which is an important step for Blackwell support. The changes in the Dockerfile correctly target the new version and update the `TORCH_CUDA_ARCH_LIST` to include compute capability 10.0.
The PR description clearly explains the rationale for building from source (lack of pre-built wheels for v0.2.6 currently).
I have a couple of suggestions to improve the clarity and maintainability of the Dockerfile.
Summary of Findings
- Outdated/Misleading Comment: A comment on line 260 regarding pre-built wheels for FlashInfer is outdated given the change to build v0.2.6 from source.
- Clarity of Commented-Out Code: The block of previously active code for FlashInfer installation (lines 261-263, 270) is now commented out. Replacing it with a concise explanatory comment or removing it would improve Dockerfile readability.
Merge Readiness
The core change to update FlashInfer to v0.2.6 and enable Blackwell support is well-implemented. However, there are a few medium severity issues related to comments and commented-out code that affect the Dockerfile's clarity and maintainability.
I recommend addressing these suggestions to improve the codebase. As I am an AI, I am not authorized to approve pull requests. Please ensure these changes are reviewed and approved by a human maintainer before merging.
docker/Dockerfile
Outdated
fi; \
uv pip install --system --no-build-isolation "git+https://github.com/flashinfer-ai/flashinfer@21ea1d2545f74782b91eb8c08fd503ac4c0743fc" ; \
fi \
# FlashInfer already has a wheel for PyTorch 2.7.0 and CUDA 12.8. This is enough for CI use
The comment `# FlashInfer already has a wheel for PyTorch 2.7.0 and CUDA 12.8. This is enough for CI use` now precedes logic that unconditionally builds FlashInfer v0.2.6 from source (for non-arm64 platforms). This comment seems to refer to the previous state with FlashInfer v0.2.5, where a pre-built wheel was used for CUDA 12.8.
Could we update this comment to more accurately reflect the current strategy of building v0.2.6 from source? This would prevent potential confusion for future readers.
# Building FlashInfer v0.2.6 from source as pre-built wheels for this version are not yet available.
docker/Dockerfile
Outdated
# if [[ "$CUDA_VERSION" == 12.8* ]]; then \
# uv pip install --system https://download.pytorch.org/whl/cu128/flashinfer/flashinfer_python-0.2.5%2Bcu128torch2.7-cp38-abi3-linux_x86_64.whl; \
# else \
This block of commented-out code previously handled the conditional installation of FlashInfer v0.2.5 (either via a pre-built wheel for CUDA 12.8 or by building from a specific commit for other versions). The PR description clarifies that v0.2.6 wheels are not yet available, hence the shift to building from the v0.2.6
tag.
To improve Dockerfile readability and reduce clutter, would it be better to replace these commented-out lines with a single, more concise comment explaining the current situation or a TODO for future wheel availability? For example:
- # if [[ "$CUDA_VERSION" == 12.8* ]]; then \
- # uv pip install --system https://download.pytorch.org/whl/cu128/flashinfer/flashinfer_python-0.2.5%2Bcu128torch2.7-cp38-abi3-linux_x86_64.whl; \
- # else \
+ # TODO: Re-evaluate using pre-built wheels for FlashInfer v0.2.6 if/when they become available.
This would make the Dockerfile's intent clearer.
# TODO: Re-evaluate using pre-built wheels for FlashInfer v0.2.6 if/when they become available.
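For context, the conditional being removed relies on bash glob matching inside `[[ ... ]]`, where `12.8*` matches any version string beginning with `12.8`. A minimal standalone sketch of that version check follows; the `CUDA_VERSION` value here is illustrative, whereas in the real Dockerfile it comes from the base image:

```shell
#!/usr/bin/env bash
# Illustrative sketch of how the old conditional chose the install path.
# CUDA_VERSION is hard-coded here for demonstration only.
CUDA_VERSION="12.8.1"

if [[ "$CUDA_VERSION" == 12.8* ]]; then
    flashinfer_source="wheel"   # old path: install the pre-built cu128 wheel
else
    flashinfer_source="source"  # old path: build from a pinned commit
fi
echo "$flashinfer_source"
```

With `CUDA_VERSION=12.8.1` this selects the wheel path; any other version (e.g. `12.6.0`) falls through to building from source.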
docker/Dockerfile
Outdated
export FLASHINFER_ENABLE_SM90=0; \
fi; \
uv pip install --system --no-build-isolation "git+https://github.com/flashinfer-ai/flashinfer@v0.2.6" ; \
# fi \
Yes, I can help build and publish that wheel on download.pytorch.org
Purpose
Update to the latest stable release of FlashInfer. This is the first stable release with Blackwell support, so it is fairly important to standardize on. However, there are no pre-built wheels yet. We can wait to see if wheels will be published, or build our own. @huydhn could you help me with this?
Test Plan
Test Result