
FlashInfer+ROCm: An AMD ROCm port of FlashInfer

FlashInfer+ROCm is a port of the FlashInfer library that adds support for AMD Instinct GPUs. The project is under active development, with the current focus on porting attention kernels to ROCm.

Versioning: Release tags follow the format <upstream_version>+amd.<n>, tying each FlashInfer+ROCm release to its corresponding upstream tag (e.g., 0.2.5+amd.2 is based on upstream v0.2.5).

Table of Contents

  • Feature Support Matrix
  • GPU and ROCm Support
  • Torch Version Support
  • Getting Started
  • Trying the Examples
  • For Developers

Feature Support Matrix

Kernel Type         | FP16 / BF16 | FP8 (E4M3, E5M2) | Notes
------------------- | ----------- | ---------------- | ------------------------------------
Decode Attention    | ✅          | ✅               | Supports MHA, GQA, and MQA
Prefill Attention   | ✅          | WIP              | Supports MHA, GQA, and MQA
Cascade             | WIP         | WIP              | Not Yet Ported
MLA                 | TBD         | TBD              | Not Yet Ported
POD                 | TBD         | TBD              | Not Yet Ported
Positional Encoding | TBD         | TBD              | Not Yet Ported
Sampling            | TBD         | TBD              | Top-K/Top-P Sampling Not Yet Ported
Normalization       | TBD         | TBD              | RMS-Norm/Layer-Norm Not Yet Ported

GPU and ROCm Support

Supported GPU: gfx942 (CDNA3 architecture)

Supported ROCm versions: 6.3.2, 6.4.1, 7.0.2, 7.1.1
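
To confirm that your GPU reports the supported architecture, you can query it with rocminfo, which ships with ROCm:

# List the architecture names of the visible GPU agents
rocminfo | grep -i "gfx"

On a supported system this should print lines containing gfx942.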

Torch Version Support

Torch+ROCm: 2.7.1, 2.8.0

Note: Other versions may work but have not been tested. Refer to https://repo.radeon.com/rocm/manylinux/rocm-rel-{rocm-version}/ (replacing {rocm-version} with the desired ROCm version, e.g., 6.4.1) for available versions.

Getting Started

Option 1: Get a Pre-built Docker Image

Pre-built Docker images are available at https://hub.docker.com/r/rocm/flashinfer.

Docker Image                                                           | ROCm  | FlashInfer | PyTorch
---------------------------------------------------------------------- | ----- | ---------- | -------
rocm/flashinfer:flashinfer-0.2.5_rocm6.4_ubuntu24.04_py3.12_pytorch2.7 | 6.4.1 | 0.2.5      | 2.7.1

Start a container:

docker run -it --privileged --network=host --device=/dev/kfd --device=/dev/dri \
  --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined \
  --ipc=host --shm-size 128G --name=<container-name> <docker-image-tag>
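
Once inside the container, rocm-smi (included with ROCm) is a quick way to confirm the GPUs are visible:

# Show the GPUs exposed to the container
rocm-smi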

Activate the environment and verify:

# Activate micromamba environment (name varies by image)
micromamba activate flashinfer-py3.12-torch2.7.1-rocm6.4.1

# Verify installation
python -c "import flashinfer; print(flashinfer.__version__)"

Expected output: 0.2.5+rocm.1 (with a possible JIT backend message)

Option 2: Install from a Wheel Package

Install from AMD's package repository:

pip install amd-flashinfer --index-url https://pypi.amd.com/simple/

Install a ROCm-enabled torch package from https://repo.radeon.com:

pip install torch==2.7.1 -f https://repo.radeon.com/rocm/manylinux/rocm-rel-6.4.1

NOTE: The torch version must exactly match one available on repo.radeon.com; otherwise, a non-ROCm torch build will be installed from PyPI.
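
To confirm that a ROCm build of torch was installed rather than the default CUDA build from PyPI, check torch.version.hip, which is populated only in ROCm builds:

# A ROCm build prints a HIP version; a non-ROCm build prints None
python -c "import torch; print(torch.__version__, torch.version.hip)"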

Trying the Examples

Download and run example scripts from the repository:

# Download a single example
wget https://raw.githubusercontent.com/ROCm/flashinfer/amd-integration/examples/single_prefill_example.py
python single_prefill_example.py

# Download all examples
for example in single_prefill_example.py batch_prefill_example.py batch_decode_example.py; do
  wget https://raw.githubusercontent.com/ROCm/flashinfer/amd-integration/examples/$example
done

Available examples:

  • single_prefill_example.py - Single-sequence prefill attention
  • batch_prefill_example.py - Batched prefill attention
  • batch_decode_example.py - Batched decode attention
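
If you want a self-contained snippet before downloading the full scripts, the sketch below shows the kind of call the decode example makes. It is illustrative only: it assumes the upstream flashinfer.single_decode_with_kv_cache API and a ROCm-visible GPU (exposed to PyTorch as a cuda device); the tensor shapes here are arbitrary.

import torch
import flashinfer

# Illustrative shapes: GQA with 32 query heads sharing 8 KV heads.
num_qo_heads, num_kv_heads, head_dim, kv_len = 32, 8, 128, 2048
q = torch.randn(num_qo_heads, head_dim, dtype=torch.float16, device="cuda")
k = torch.randn(kv_len, num_kv_heads, head_dim, dtype=torch.float16, device="cuda")
v = torch.randn(kv_len, num_kv_heads, head_dim, dtype=torch.float16, device="cuda")

# Decode attention for a single new token against the KV cache.
o = flashinfer.single_decode_with_kv_cache(q, k, v)
print(o.shape)  # torch.Size([32, 128])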

For Developers

Setting up a Development Environment

Build the development Docker image with the repository's Dockerfile:

docker build \
  --build-arg ROCM_VERSION=6.4.1 \
  --build-arg PY_VERSION=3.12 \
  --build-arg TORCH_VERSION=2.7.1 \
  --build-arg USERNAME=$USER \
  --build-arg USER_UID=$(id -u) \
  --build-arg USER_GID=$(id -g) \
  -t flashinfer-0.2.5_rocm6.4_ubuntu24.04_py3.12_pytorch2.7 \
  -f .devcontainer/rocm/Dockerfile .
Build argument descriptions
  • ROCM_VERSION: ROCm version (default: 7.0.2)
  • PY_VERSION: Python version (default: 3.12)
  • TORCH_VERSION: PyTorch version (default: 2.7.1)
  • USERNAME: Username inside container (default: devuser)
  • USER_UID: User ID for matching host permissions
  • USER_GID: Group ID for matching host permissions

Run the development container:

docker run -it \
  --cap-add=SYS_PTRACE --security-opt seccomp=unconfined \
  --ipc=host --privileged --shm-size=128G --network=host \
  --device=/dev/kfd --device=/dev/dri \
  --group-add video --group-add render \
  -v $PWD:/workspace \
  --name flashinfer-dev-container \
  flashinfer-0.2.5_rocm6.4_ubuntu24.04_py3.12_pytorch2.7
Docker run argument descriptions
  • --cap-add=SYS_PTRACE: Enables debugging
  • --security-opt seccomp=unconfined: Relaxes security for development
  • --ipc=host: Shares host IPC for better performance
  • --privileged: Required for GPU access
  • --shm-size=128G: Shared memory size (adjust as needed)
  • --network=host: Uses host networking
  • --device=/dev/kfd --device=/dev/dri: Exposes AMD GPU devices
  • --group-add video --group-add render: GPU access groups
  • -v <host-path>:<container-path>: Mounts source code

Activate the micromamba environment:

micromamba activate flashinfer-py3.12-torch2.7.1-rocm6.4.1

Note: Environment name varies based on Python, PyTorch, and ROCm versions.
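
If you are unsure of the exact name, micromamba can list the environments available in the image:

micromamba env list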

Building and Installing a Wheel Package

Build with AOT (Ahead-of-Time) compiled kernels:

FLASHINFER_HIP_ARCHITECTURES=gfx942 FLASHINFER_AOT_TORCH_EXTS=ON \
  python -m pip wheel . --wheel-dir=./dist/ --no-deps --no-build-isolation -v
cd dist && pip install flashinfer-*.whl

Build with JIT (Just-in-Time) compilation only:

FLASHINFER_HIP_ARCHITECTURES=gfx942 \
  python -m pip wheel . --wheel-dir=./dist/ --no-deps --no-build-isolation -v
cd dist && pip install flashinfer-*.whl

Editable install for development:

FLASHINFER_HIP_ARCHITECTURES=gfx942 python -m pip install --no-build-isolation -ve.

Note: The --no-deps flag assumes dependencies are pre-installed. Omit it to download dependencies during build. AOT builds take longer and use more disk space but avoid JIT compilation at runtime.
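
Whichever variant you build, the same import check used in Getting Started verifies the installed wheel:

python -c "import flashinfer; print(flashinfer.__version__)"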

Running Tests

The Python test suite can be run with pytest:

# Run default tests (configured in pyproject.toml)
pytest

# Run specific test file
pytest tests/test_decode_kernels_hip.py

# Run with pattern matching
pytest -k "test_decode_kernels_hip"

# Verbose output
pytest -v

The default test configuration is specified in pyproject.toml under the testpaths setting.
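
To see which tests the default configuration selects without running them, use pytest's collection-only mode:

# List collected tests without executing them
pytest --collect-only -q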
