Pull requests: vllm-project/vllm-ascend
Update README.zh.md to fix typo
documentation
Improvements or additions to documentation
#1758
opened Jul 12, 2025 by
Yikun
[bugfix] fix deepseek bug when tp_size == 1
module:ops
#1755
opened Jul 11, 2025 by
zzzzwwjj
[0.9.1][Perf] Port MLA multistream optimization and prefetch to v0.9.1
module:core
#1750
opened Jul 11, 2025 by
whx-sjtu
[V0.9.1] torchair_graph bugfix when chunked_prefill is true
#1748
opened Jul 11, 2025 by
fems14
[V0.9.1] Add support for flashcomm_v1 in Qwen2.5
module:core
module:ops
#1745
opened Jul 11, 2025 by
rjg-lyh
flashcomm3 multi stream of moe layer
merge-conflicts
module:core
module:ops
module:quantization
#1744
opened Jul 11, 2025 by
wyhhyw123
[Platform] Add support for Atlas A3 series
ci/build
module:core
#1740
opened Jul 11, 2025 by
wxsIcey
Optimization of TP4 Parallelism in DeepSeek MLP Dense Layers
#1738
opened Jul 11, 2025 by
zhanghw0354
[Doc] Add model customization doc
documentation
#1737
opened Jul 11, 2025 by
shen-shanshan
[2/N] Enable shellcheck and pymarkdown for lint system
documentation
module:tests
module:tools
#1735
opened Jul 11, 2025 by
Potabk
[Test] Remove VLLM_USE_V1 in example and tests
module:tests
#1733
opened Jul 11, 2025 by
wangxiyuan
[Perf] Reduce memory usage by splitting tokens in fused_experts
documentation
module:core
module:ops
module:quantization
module:tests
ready
ready for review
#1729
opened Jul 10, 2025 by
ApsarasX
[V0.9.1] add support for flashcomm2 in qwen3
module:core
#1726
opened Jul 10, 2025 by
David9857
[BUGFIX] [v0.9.1-dev] Obtain the NPU ID of non-consecutive NPU cards
#1724
opened Jul 10, 2025 by
yangqinghao-cmss
[V0.9.1] optimize rope in qwen3
module:ops
module:tests
#1719
opened Jul 10, 2025 by
David9857
[v0.9.1]add rot_pos_emb()/get_window_index()/_process_image_input() to qwen2.5_vl_without_padding
#1705
opened Jul 9, 2025 by
zheliuyu
[V0.9.1] Replace FA interface with FA_V2 to optimize perf in SelfAttention
#1701
opened Jul 9, 2025 by
rjg-lyh
[0.9.1]feat: Qwen3-dense model support dual-batch overlap(dbo)
#1699
opened Jul 9, 2025 by
ZhaoJiangJiang
[WIP] dynamic eplb
module:core
module:ops
module:quantization
#1697
opened Jul 9, 2025 by
wanghanqingLYT
support fa3 quant for v0.9.1-dev
merge-conflicts
module:quantization
module:tests
#1695
opened Jul 9, 2025 by
22dimensions
[0.9.1][PD] Added support for delay-free blocks in prefill nodes
#1691
opened Jul 9, 2025 by
underfituu
[WIP][Prefill Performance] Parallel Strategy Optimizations (VRAM-for-Speed Tradeoff)
merge-conflicts
module:ops
module:quantization
#1687
opened Jul 9, 2025 by
SlightwindSec
[V0 Deprecation] Remove V0 prompt adapter
merge-conflicts
#1683
opened Jul 9, 2025 by
shen-shanshan