Issues: flashinfer-ai/flashinfer
#682 · Deprecation Notice: Python 3.8 Wheel Support to End in future... · opened Dec 18, 2024 by yzh119 · Open
#1034 · Can flashinfer's CutlassSegmentGEMMSM90Run function be used for LoRA computing on H20? · opened Apr 23, 2025 by chenhongyu2048
#1027 · flashinfer.decode.single_decode_with_kv_cache: Floating point exception (core dumped) · opened Apr 20, 2025 by MenHimChan
#1023 · [Bug] FP8 scaling factors (k_scale/v_scale) not taking effect in BatchPrefillWithPagedKVCacheWrapper · opened Apr 17, 2025 by cscyuge
#1022 · Low performance of POD Attention compared to BatchPrefillWithPagedKVCache · opened Apr 17, 2025 by Edenzzzz
#978 · top_k_top_p_sampling_from_logits incompatible with torch.compile + CUDAGraph · opened Mar 28, 2025 by sharvil
#938 · CUDA error: too many blocks in cooperative launch(72) when use multi GPU · opened Mar 13, 2025 by Zzzer0o
#920 · RMSNorm failed with error code no kernel image is available for execution on the device · opened Mar 7, 2025 by nanmi
#919 · [Bugfix] CUDAGraph Compatibility in AppendPagedKVCacheKernel for Variable-Length Inputs · opened Mar 7, 2025 by SungBalance
#916 · ValueError: Invalid mode: forward_mode=<ForwardMode.TARGET_VERIFY: 6> · opened Mar 6, 2025 by tchaton
#915 · flashinfer.prefill.single_prefill_with_kv_cache meets error when running on A100 · opened Mar 6, 2025 by FengzhuoZhang