feat(cuda): Implement custom TVM schedule for fused QKV-split and RoPE #3397

Chandan-Sugreevu · 2025-12-11T20:11:02Z

Introduces a custom, performance-oriented TVM schedule for the fused QKV-split and Rotary Position Embedding (RoPE) operation.

The schedule keeps the entire computation in a single fused CUDA kernel instead of separate split and RoPE kernels, which reduces kernel launch overhead and improves memory access patterns for Llama-style models on GPU.
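For reviewers who want a reference for what the fused operation computes, here is a minimal pure-Python sketch for a single token. It is not the TVM schedule from this PR. The function name `fused_qkv_split_rope`, the packed layout `[Q heads | K heads | V heads]`, the rotate-half pairing convention, and the base `theta=10000.0` are all assumptions made for illustration.

```python
import math


def fused_qkv_split_rope(qkv, pos, num_q_heads, num_kv_heads, head_dim, theta=10000.0):
    """Split one packed QKV row and apply RoPE to Q and K in a single pass.

    Assumed layout (hypothetical): [Q heads | K heads | V heads], each head
    holding head_dim contiguous values. Assumed RoPE convention: rotate-half,
    where element d is paired with element d + head_dim // 2.
    """
    half = head_dim // 2
    total_heads = num_q_heads + 2 * num_kv_heads
    assert head_dim % 2 == 0
    assert len(qkv) == total_heads * head_dim

    q = [[0.0] * head_dim for _ in range(num_q_heads)]
    k = [[0.0] * head_dim for _ in range(num_kv_heads)]
    v = [[0.0] * head_dim for _ in range(num_kv_heads)]

    # One loop over the packed row: each output element is written exactly
    # once, which is the property a fused kernel relies on.
    for h in range(total_heads):
        base = h * head_dim
        for d in range(head_dim):
            x = qkv[base + d]
            if h >= num_q_heads + num_kv_heads:
                # V is copied through without rotation.
                v[h - num_q_heads - num_kv_heads][d] = x
                continue
            # Both halves of a pair share the same rotation frequency.
            freq = theta ** (-2.0 * (d % half) / head_dim)
            angle = pos * freq
            # rotate_half(x)[d] is -x[d + half] in the first half, x[d - half] in the second.
            partner = -qkv[base + d + half] if d < half else qkv[base + d - half]
            out = x * math.cos(angle) + partner * math.sin(angle)
            if h < num_q_heads:
                q[h][d] = out
            else:
                k[h - num_q_heads][d] = out
    return q, k, v
```

A reference like this can serve as an oracle when checking the kernel's output: at position 0 the rotation is the identity, and at any position the norm of each Q and K head is unchanged.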

feat(cuda): Implement custom TVM schedule for fused QKV-split and RoPE

35193c3
