Commit 00ba0ae

timzsu and Zhengyuan authored
bugfix: Align KV chunk size binary search with actual KV chunk splitting. (#728)
Close #726. Aligns the KV chunk size binary search with the actual KV chunk splitting strategy, so that the resulting `kv_chunk_size` yields the correct `new_batch_size`. Co-authored-by: Zhengyuan <[email protected]>
1 parent 13de896 commit 00ba0ae

1 file changed: +2 -1 lines changed

include/flashinfer/attention/scheduler.cuh

@@ -96,12 +96,13 @@ inline auto PrefillBinarySearchKVChunkSize(const bool enable_cuda_graph,
   int64_t low = min_kv_chunk_size;
   int64_t high = max_kv_len;
+  constexpr int64_t min_kv_len = 1;
   while (low < high) {
     const int64_t mid = (low + high) / 2;
     int64_t new_batch_size = 0;
     for (uint32_t i = 0; i < batch_size; ++i) {
       new_batch_size +=
-          ceil_div(packed_qo_len_arr[i], qo_chunk_size) * ceil_div(kv_len_arr[i], mid);
+          ceil_div(packed_qo_len_arr[i], qo_chunk_size) * ceil_div(std::max(kv_len_arr[i], min_kv_len), mid);
     }
     if (new_batch_size > max_batch_size_if_split) {
       low = mid + 1;
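
The effect of the clamp can be illustrated with a small standalone sketch. It is not the FlashInfer implementation: `predicted_batch_size` and `search_kv_chunk_size` are hypothetical helpers, the `high = mid` branch and the way `max_kv_len` is computed are assumptions (the diff above is cut off before that point), and it assumes, as the commit message implies, that the actual splitting step emits at least one KV chunk per request even when `kv_len` is 0.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Ceiling division for non-negative x and positive y.
inline int64_t ceil_div(int64_t x, int64_t y) {
  assert(y > 0);
  return (x + y - 1) / y;
}

// Hypothetical helper: number of work items predicted for a given KV chunk size.
// clamp_kv_len == false reproduces the pre-fix formula, true the post-fix one.
inline int64_t predicted_batch_size(const std::vector<int64_t>& packed_qo_len,
                                    const std::vector<int64_t>& kv_len,
                                    int64_t qo_chunk_size, int64_t kv_chunk_size,
                                    bool clamp_kv_len) {
  constexpr int64_t min_kv_len = 1;
  int64_t total = 0;
  for (std::size_t i = 0; i < kv_len.size(); ++i) {
    const int64_t effective_kv_len =
        clamp_kv_len ? std::max(kv_len[i], min_kv_len) : kv_len[i];
    total += ceil_div(packed_qo_len[i], qo_chunk_size) *
             ceil_div(effective_kv_len, kv_chunk_size);
  }
  return total;
}

// Hypothetical helper: smallest KV chunk size whose predicted batch size fits
// within max_batch_size_if_split. The else branch (high = mid) is an assumption.
inline int64_t search_kv_chunk_size(const std::vector<int64_t>& packed_qo_len,
                                    const std::vector<int64_t>& kv_len,
                                    int64_t qo_chunk_size, int64_t min_kv_chunk_size,
                                    int64_t max_batch_size_if_split, bool clamp_kv_len) {
  int64_t max_kv_len = 0;  // assumed to be the largest KV length in the batch
  for (int64_t len : kv_len) max_kv_len = std::max(max_kv_len, len);

  int64_t low = min_kv_chunk_size;
  int64_t high = max_kv_len;
  while (low < high) {
    const int64_t mid = (low + high) / 2;
    const int64_t new_batch_size =
        predicted_batch_size(packed_qo_len, kv_len, qo_chunk_size, mid, clamp_kv_len);
    if (new_batch_size > max_batch_size_if_split) {
      low = mid + 1;  // too many work items: chunks must be larger
    } else {
      high = mid;     // fits: try a smaller chunk size
    }
  }
  return low;
}
```

For a batch with `packed_qo_len = {4, 4}`, `kv_len = {0, 8}`, `qo_chunk_size = 4`, and a budget of 2 work items, the unclamped search settles on a chunk size of 4, because the empty request contributes `ceil_div(0, mid) = 0`. Under the one-chunk-minimum assumption, that chunk size actually produces 3 work items, which exceeds the budget. With the clamp, the search returns 8 and the predicted count of 2 matches.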
