[CPU] Change kvcache default type of PagedAttention to u8 for CPU plugin #1206

luo-cheng2021 · 2024-11-13T12:03:37Z

Change the default KV cache type of PagedAttention to u8 for the CPU plugin, to align with SDPA behaviour.

ilya-lavrenov · 2024-11-13T21:34:10Z

src/cpp/src/device_config.hpp

            // if user sets ov::kv_cache_precision hint
            const auto kv_cache_precision_it = plugin_config.find(ov::hint::kv_cache_precision.name());
            if (kv_cache_precision_it != plugin_config.end()) {
                const auto kv_cache_precision = kv_cache_precision_it->second.as<ov::element::Type>();
                m_kv_cache_type = kv_cache_precision;
+            } else {
+                // x86 and arm have different default kv cache type


What if plugin_config contains an accuracy execution hint or an inference precision?

Updated. For 'EXECUTION_MODE_HINT': 'ACCURACY', the plugin forces f32 precision for the KV cache. If the inference precision is set to f32, the KV cache still uses the default u8 precision.
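
The behaviour described in this reply can be sketched as a small resolution function. This is only an illustration and not the code in device_config.hpp: `Config` and `resolve_kv_cache_precision` are hypothetical names, plain strings stand in for OpenVINO properties and element types, and the ordering (an explicit KV cache hint is checked before the execution mode) is an assumption based on the diff above. The platform default is passed in as a parameter because the thread only says that x86 and ARM defaults differ.

```cpp
#include <map>
#include <string>

// Hypothetical stand-in for a plugin config map: string keys, string values.
using Config = std::map<std::string, std::string>;

// Sketch of how the KV cache precision could be resolved.
// `platform_default` is supplied by the caller (e.g. "u8" on x86 per this PR).
std::string resolve_kv_cache_precision(const Config& config,
                                       const std::string& platform_default) {
    // 1. An explicit KV cache precision hint from the user is used as-is.
    const auto kv_it = config.find("KV_CACHE_PRECISION");
    if (kv_it != config.end()) {
        return kv_it->second;
    }

    // 2. ACCURACY execution mode forces f32 for the KV cache.
    const auto mode_it = config.find("EXECUTION_MODE_HINT");
    if (mode_it != config.end() && mode_it->second == "ACCURACY") {
        return "f32";
    }

    // 3. Otherwise fall back to the platform default. Note that
    //    INFERENCE_PRECISION_HINT is deliberately not consulted here.
    return platform_default;
}
```

Under these assumptions, a config that sets only the inference precision to f32 still resolves to the platform default, while a config with the ACCURACY execution mode resolves to f32.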

ilya-lavrenov · 2024-11-14T08:05:26Z

@luo-cheng2021 could you please also include a revert of #1212?
I hardcoded an OpenVINO commit from before the u8 KV cache migration on CPU to unblock GenAI development.

change kvcache default type to u8 for cpu plugin

ac18dd4

github-actions bot added the labels "category: continuous batching" (Continuous batching) and "category: sampling" (Sampling / Decoding algorithms) on Nov 13, 2024

ilya-lavrenov reviewed Nov 13, 2024

use f32 for hint: EXECUTION_MODE_HINT:ACCURACY

ffef13e

ilya-lavrenov added this to the 2025.0 milestone Nov 14, 2024

luo-cheng2021 added 2 commits November 14, 2024 16:10

Merge remote-tracking branch 'upstream/master' into luocheng/pa_kv_default_u8

b9a05f0

Revert "[GHA]: hardcode OpenVINO commit (openvinotoolkit#1212)"

9efab9d

This reverts commit 9243a8f.

github-actions bot added the label "category: GHA" (CI based on GitHub Actions) on Nov 14, 2024

ilya-lavrenov removed the label "category: sampling" (Sampling / Decoding algorithms) on Nov 20, 2024
