[CI] Update FlashInfer to 0.2.6 #19297
base: main
Conversation
Signed-off-by: mgoin <[email protected]>
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default; only a reduced subset of checks runs automatically. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can either: Add 🚀 …
Hello @mgoin, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
Summary of Changes
Hello team, gemini-code-assist here to provide a summary of this pull request. This PR, authored by @mgoin, aims to update the FlashInfer library dependency to version 0.2.6. According to the description, this version is significant as it's the first stable release with Blackwell support. The changes primarily involve modifying the Dockerfile used for CI builds to fetch and build FlashInfer from the v0.2.6
tag on GitHub, moving away from a conditional installation that previously attempted to use a pre-built wheel for CUDA 12.8. The PR also updates the supported CUDA architecture list in the build process.
Highlights
- FlashInfer Version Update: Updates the FlashInfer dependency to the latest stable release, v0.2.6, which includes initial support for Blackwell GPUs.
- CI Build Logic Change: Modifies `docker/Dockerfile` to consistently build FlashInfer from source using the `v0.2.6` tag from the GitHub repository, removing the previous conditional logic that would attempt to use a pre-built wheel for CUDA 12.8.
- CUDA Architecture Support: Updates the `TORCH_CUDA_ARCH_LIST` environment variable in the Dockerfile (line 264) to include `10.0+PTX`, likely to enable support for newer architectures like Blackwell (SM 10.0).
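As background on the third highlight, the `+PTX` suffix in a `TORCH_CUDA_ARCH_LIST` entry asks the build to embed PTX in addition to SASS, so the binary can JIT-compile for GPUs newer than those listed. The sketch below is a simplified illustration of how such an entry maps to nvcc `-gencode` flags; it mirrors, but is not, PyTorch's actual build logic:

```shell
#!/usr/bin/env bash
# Simplified illustration of how a TORCH_CUDA_ARCH_LIST entry such as
# '10.0+PTX' maps to nvcc -gencode flags. Not PyTorch's real parser.
arch_to_gencode() {
    local entry="$1" num ptx=""
    [[ "$entry" == *+PTX ]] && ptx=1
    num="${entry%+PTX}"          # strip the '+PTX' suffix if present
    num="${num/./}"              # '10.0' -> '100'
    echo "-gencode=arch=compute_${num},code=sm_${num}"
    # '+PTX' additionally embeds PTX so newer GPUs can JIT the kernels.
    if [[ -n "$ptx" ]]; then
        echo "-gencode=arch=compute_${num},code=compute_${num}"
    fi
}

arch_to_gencode "10.0+PTX"
```

For `10.0+PTX` this emits both an SM 10.0 SASS target and a `compute_100` PTX target, whereas a plain `9.0` entry would emit only the `sm_90` SASS target.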
Changelog
- docker/Dockerfile
  - Commented out the conditional block (lines 261-263, 270) that checked for CUDA 12.8 and attempted to install a specific pre-built FlashInfer wheel.
  - Updated the `TORCH_CUDA_ARCH_LIST` environment variable (line 264) to include `10.0+PTX`.
  - Modified the `uv pip install` command (line 269) to fetch FlashInfer directly from the `v0.2.6` tag on GitHub instead of a previous commit hash.
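Taken together, the changelog entries above correspond to a Dockerfile step along these lines. This is a reconstruction from the description, not the exact diff; the architecture entries other than `10.0+PTX` are assumptions for illustration:

```dockerfile
# Reconstruction of the updated install step described above -- not the exact diff.
# '10.0+PTX' is added so kernels target Blackwell (SM 10.0); the other entries
# in this list are illustrative assumptions.
ENV TORCH_CUDA_ARCH_LIST='7.5 8.0 8.9 9.0 10.0+PTX'

# Build FlashInfer from the v0.2.6 tag, since no pre-built wheels for this
# version are available yet.
RUN uv pip install --system --no-build-isolation \
    "git+https://github.com/flashinfer-ai/flashinfer@v0.2.6"
```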
Using Gemini Code Assist
The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.
Invoking Gemini
You can request assistance from Gemini at any point in your pull request by creating an issue comment (i.e. a comment on the pull request page) using either `/gemini <command>` or `@gemini-code-assist <command>`. Below is a summary of the supported commands.
| Feature | Command | Description |
|---|---|---|
| Code Review | `/gemini review` | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | `/gemini summary` | Provides a summary of the current pull request in its current state. |
| Comment | `@gemini-code-assist` | Responds in comments when explicitly tagged, both in issue comments and review comments. |
| Help | `/gemini help` | Displays a list of available commands. |
Customization
To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a `.gemini/` folder in the base of the repository. Detailed instructions can be found here.
Limitations & Feedback
Gemini Code Assist is currently in preview and may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments to provide feedback.
You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.
Footnotes
1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.
Code Review
This pull request updates FlashInfer to version 0.2.6, which is an important step for Blackwell support. The changes in the Dockerfile correctly target the new version and update the `TORCH_CUDA_ARCH_LIST` to include compute capability 10.0.
The PR description clearly explains the rationale for building from source (lack of pre-built wheels for v0.2.6 currently).
I have a couple of suggestions to improve the clarity and maintainability of the Dockerfile.
Summary of Findings
- Outdated/Misleading Comment: A comment on line 260 regarding pre-built wheels for FlashInfer is outdated given the change to build v0.2.6 from source.
- Clarity of Commented-Out Code: The block of previously active code for FlashInfer installation (lines 261-263, 270) is now commented out. Replacing it with a concise explanatory comment or removing it would improve Dockerfile readability.
Merge Readiness
The core change to update FlashInfer to v0.2.6 and enable Blackwell support is well-implemented. However, there are a few medium severity issues related to comments and commented-out code that affect the Dockerfile's clarity and maintainability.
I recommend addressing these suggestions to improve the codebase. As I am an AI, I am not authorized to approve pull requests. Please ensure these changes are reviewed and approved by a human maintainer before merging.
docker/Dockerfile
Outdated
fi; \
uv pip install --system --no-build-isolation "git+https://github.com/flashinfer-ai/flashinfer@21ea1d2545f74782b91eb8c08fd503ac4c0743fc" ; \
fi \
# FlashInfer already has a wheel for PyTorch 2.7.0 and CUDA 12.8. This is enough for CI use
The comment `# FlashInfer already has a wheel for PyTorch 2.7.0 and CUDA 12.8. This is enough for CI use` now precedes logic that unconditionally builds FlashInfer v0.2.6 from source (for non-arm64 platforms). This comment seems to refer to the previous state with FlashInfer v0.2.5, where a pre-built wheel was used for CUDA 12.8.
Could we update this comment to more accurately reflect the current strategy of building v0.2.6 from source? This would prevent potential confusion for future readers.
# Building FlashInfer v0.2.6 from source as pre-built wheels for this version are not yet available.
docker/Dockerfile
Outdated
# if [[ "$CUDA_VERSION" == 12.8* ]]; then \
# uv pip install --system https://download.pytorch.org/whl/cu128/flashinfer/flashinfer_python-0.2.5%2Bcu128torch2.7-cp38-abi3-linux_x86_64.whl; \
# else \
This block of commented-out code previously handled the conditional installation of FlashInfer v0.2.5 (either via a pre-built wheel for CUDA 12.8 or by building from a specific commit for other versions). The PR description clarifies that v0.2.6 wheels are not yet available, hence the shift to building from the v0.2.6
tag.
To improve Dockerfile readability and reduce clutter, would it be better to replace these commented-out lines with a single, more concise comment explaining the current situation or a TODO for future wheel availability? For example:
- # if [[ "$CUDA_VERSION" == 12.8* ]]; then \
- # uv pip install --system https://download.pytorch.org/whl/cu128/flashinfer/flashinfer_python-0.2.5%2Bcu128torch2.7-cp38-abi3-linux_x86_64.whl; \
- # else \
+ # TODO: Re-evaluate using pre-built wheels for FlashInfer v0.2.6 if/when they become available.
This would make the Dockerfile's intent clearer.
# TODO: Re-evaluate using pre-built wheels for FlashInfer v0.2.6 if/when they become available.
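For context, the conditional being removed relies on bash glob matching inside `[[ ... ]]`, where `12.8*` matches any version string beginning with `12.8`. A minimal standalone sketch of that version check follows; the `CUDA_VERSION` value here is illustrative, whereas in the real Dockerfile it comes from the base image:

```shell
#!/usr/bin/env bash
# Illustrative sketch of how the old conditional chose the install path.
# CUDA_VERSION is hard-coded here for demonstration only.
CUDA_VERSION="12.8.1"

if [[ "$CUDA_VERSION" == 12.8* ]]; then
    flashinfer_source="wheel"   # old path: install the pre-built cu128 wheel
else
    flashinfer_source="source"  # old path: build from a pinned commit
fi
echo "$flashinfer_source"
```

With `CUDA_VERSION=12.8.1` this selects the wheel path; any other version (e.g. `12.6.0`) falls through to building from source.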
docker/Dockerfile
Outdated
export FLASHINFER_ENABLE_SM90=0; \
fi; \
uv pip install --system --no-build-isolation "git+https://github.com/flashinfer-ai/flashinfer@v0.2.6" ; \
# fi \
Yes, I can help build and publish that wheel on download.pytorch.org
Purpose
Update to the latest stable release of FlashInfer. This is the first stable release with Blackwell support, so it is fairly important to standardize on. However, there are no pre-built wheels yet. We can wait to see if wheels will be published, or build our own. @huydhn could you help me with this?
Test Plan
Test Result