
[CI] Update FlashInfer to 0.2.6 #19297


Open: wants to merge 2 commits into base: main

Conversation

@mgoin (Member) commented Jun 6, 2025

Purpose

Update to the latest stable release of FlashInfer. This is the first stable release with Blackwell support, so it is fairly important to standardize on. However, there are no pre-built wheels yet. We can wait to see whether wheels will be published, or build our own. @huydhn, could you help me with this?

Test Plan

Test Result


github-actions bot commented Jun 6, 2025

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which executes a small, essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

@gemini-code-assist bot (Contributor) left a comment

Hello @mgoin, I'm Gemini Code Assist[1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

Summary of Changes

Hello team, gemini-code-assist here to provide a summary of this pull request. This PR, authored by @mgoin, aims to update the FlashInfer library dependency to version 0.2.6. According to the description, this version is significant as it's the first stable release with Blackwell support. The changes primarily involve modifying the Dockerfile used for CI builds to fetch and build FlashInfer from the v0.2.6 tag on GitHub, moving away from a conditional installation that previously attempted to use a pre-built wheel for CUDA 12.8. The PR also updates the supported CUDA architecture list in the build process.

Highlights

  • FlashInfer Version Update: Updates the FlashInfer dependency to the latest stable release, v0.2.6, which includes initial support for Blackwell GPUs.
  • CI Build Logic Change: Modifies the docker/Dockerfile to consistently build FlashInfer from source using the v0.2.6 tag from the GitHub repository, removing the previous conditional logic that would attempt to use a pre-built wheel for CUDA 12.8.
  • CUDA Architecture Support: Updates the TORCH_CUDA_ARCH_LIST environment variable in the Dockerfile (line 264) to include 10.0+PTX, likely to enable support for newer architectures like Blackwell (SM 10.0).

Changelog

  • docker/Dockerfile
    • Commented out the conditional block (lines 261-263, 270) that checked for CUDA 12.8 and attempted to install a specific pre-built FlashInfer wheel.
    • Updated the TORCH_CUDA_ARCH_LIST environment variable (line 264) to include 10.0+PTX.
    • Modified the uv pip install command (line 269) to fetch FlashInfer directly from the v0.2.6 tag on GitHub instead of a previous commit hash.
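
Taken together, the changelog items above imply a Dockerfile section roughly like the following sketch. This is an approximation for orientation only: the ARG name, the exact arch list, and the TARGETPLATFORM guard are assumptions, not the literal diff.

```dockerfile
# Sketch of the described docker/Dockerfile change (not the exact diff).
# ARG name, arch list contents, and the platform guard are assumptions.
ARG torch_cuda_arch_list='7.5 8.0 8.9 9.0+PTX 10.0+PTX'
ENV TORCH_CUDA_ARCH_LIST=${torch_cuda_arch_list}

# No pre-built FlashInfer v0.2.6 wheels yet, so build from the release tag.
RUN if [ "$TARGETPLATFORM" != "linux/arm64" ]; then \
        uv pip install --system --no-build-isolation \
            "git+https://github.com/flashinfer-ai/flashinfer@v0.2.6"; \
    fi
```

Building from a tag instead of a commit hash (as the changelog notes) makes the pinned version self-describing, at the cost of the tag being mutable in principle.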
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point in your pull request via creating an issue comment (i.e. comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.

  • Code Review: /gemini review performs a code review for the current pull request in its current state.
  • Pull Request Summary: /gemini summary provides a summary of the current pull request in its current state.
  • Comment: @gemini-code-assist responds in comments when explicitly tagged, both in issue comments and review comments.
  • Help: /gemini help displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist is currently in preview and may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments to provide feedback.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@mergify mergify bot added the ci/build label Jun 6, 2025
@gemini-code-assist bot (Contributor) left a comment

Code Review

This pull request updates FlashInfer to version 0.2.6, which is an important step for Blackwell support. The changes in the Dockerfile correctly target the new version and update the TORCH_CUDA_ARCH_LIST to include compute capability 10.0.

The PR description clearly explains the rationale for building from source (lack of pre-built wheels for v0.2.6 currently).

I have a couple of suggestions to improve the clarity and maintainability of the Dockerfile.

Summary of Findings

  • Outdated/Misleading Comment: A comment on line 260 regarding pre-built wheels for FlashInfer is outdated given the change to build v0.2.6 from source.
  • Clarity of Commented-Out Code: The block of previously active code for FlashInfer installation (lines 261-263, 270) is now commented out. Replacing it with a concise explanatory comment or removing it would improve Dockerfile readability.

Merge Readiness

The core change to update FlashInfer to v0.2.6 and enable Blackwell support is well-implemented. However, there are a few medium severity issues related to comments and commented-out code that affect the Dockerfile's clarity and maintainability.

I recommend addressing these suggestions to improve the codebase. As I am an AI, I am not authorized to approve pull requests. Please ensure these changes are reviewed and approved by a human maintainer before merging.

fi; \
uv pip install --system --no-build-isolation "git+https://github.com/flashinfer-ai/flashinfer@21ea1d2545f74782b91eb8c08fd503ac4c0743fc" ; \
fi \
# FlashInfer already has a wheel for PyTorch 2.7.0 and CUDA 12.8. This is enough for CI use
medium

The comment # FlashInfer already has a wheel for PyTorch 2.7.0 and CUDA 12.8. This is enough for CI use now precedes logic that unconditionally builds FlashInfer v0.2.6 from source (for non-arm64 platforms). This comment seems to refer to the previous state with FlashInfer v0.2.5, where a pre-built wheel was used for CUDA 12.8.

Could we update this comment to more accurately reflect the current strategy of building v0.2.6 from source? This would prevent potential confusion for future readers.

    # Building FlashInfer v0.2.6 from source as pre-built wheels for this version are not yet available.

Comment on lines 261 to 263
# if [[ "$CUDA_VERSION" == 12.8* ]]; then \
# uv pip install --system https://download.pytorch.org/whl/cu128/flashinfer/flashinfer_python-0.2.5%2Bcu128torch2.7-cp38-abi3-linux_x86_64.whl; \
# else \
medium

This block of commented-out code previously handled the conditional installation of FlashInfer v0.2.5 (either via a pre-built wheel for CUDA 12.8 or by building from a specific commit for other versions). The PR description clarifies that v0.2.6 wheels are not yet available, hence the shift to building from the v0.2.6 tag.

To improve Dockerfile readability and reduce clutter, would it be better to replace these commented-out lines with a single, more concise comment explaining the current situation or a TODO for future wheel availability? For example:

-    # if [[ "$CUDA_VERSION" == 12.8* ]]; then \
-    #     uv pip install --system https://download.pytorch.org/whl/cu128/flashinfer/flashinfer_python-0.2.5%2Bcu128torch2.7-cp38-abi3-linux_x86_64.whl; \
-    # else \
+    # TODO: Re-evaluate using pre-built wheels for FlashInfer v0.2.6 if/when they become available.

This would make the Dockerfile's intent clearer.

    # TODO: Re-evaluate using pre-built wheels for FlashInfer v0.2.6 if/when they become available.

export FLASHINFER_ENABLE_SM90=0; \
fi; \
uv pip install --system --no-build-isolation "git+https://github.com/flashinfer-ai/flashinfer@v0.2.6" ; \
# fi \
medium

This commented-out # fi corresponds to the if block (lines 261-263) that is also now commented out.

If the preceding commented block (lines 261-263) is removed or replaced by a more concise comment as suggested, this line should also be removed to maintain consistency and clarity in the Dockerfile.
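
For context, the export FLASHINFER_ENABLE_SM90=0 visible in the hunk above sits inside a conditional. The gating below is a hypothetical reconstruction in plain shell, keyed on TORCH_CUDA_ARCH_LIST; the Dockerfile's literal test may differ.

```shell
# Hypothetical sketch of gating FlashInfer's Hopper (SM90) kernels on the
# requested CUDA arch list; the condition and trigger value are assumptions.
TORCH_CUDA_ARCH_LIST='7.5 8.0 8.9 9.0+PTX 10.0+PTX'

case "$TORCH_CUDA_ARCH_LIST" in
  *9.0*) export FLASHINFER_ENABLE_SM90=1 ;;  # Hopper targeted: build SM90 kernels
  *)     export FLASHINFER_ENABLE_SM90=0 ;;  # no Hopper target: skip them
esac

echo "FLASHINFER_ENABLE_SM90=$FLASHINFER_ENABLE_SM90"
```

Skipping SM90 kernels when no 9.0 target is requested presumably shortens the from-source build, which matters now that the Dockerfile compiles FlashInfer instead of installing a wheel.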

@huydhn (Contributor) commented Jun 7, 2025

Yes, I can help build and publish that wheel on download.pytorch.org

Labels: ci/build