[CPU] Change kvcache default type of PagedAttention to u8 for CPU plugin #1206

Draft: wants to merge 4 commits into master


luo-cheng2021 (Contributor)

Change the default KV cache type of PagedAttention to u8 in the CPU plugin, to align with SDPA behaviour.

github-actions bot added the "category: continuous batching" and "category: sampling" labels Nov 13, 2024

```cpp
// if user sets ov::kv_cache_precision hint
const auto kv_cache_precision_it = plugin_config.find(ov::hint::kv_cache_precision.name());
if (kv_cache_precision_it != plugin_config.end()) {
    const auto kv_cache_precision = kv_cache_precision_it->second.as<ov::element::Type>();
    m_kv_cache_type = kv_cache_precision;
} else {
    // x86 and arm have different default kv cache type
```

ilya-lavrenov (Contributor), Nov 13, 2024


What if plugin_config contains the ACCURACY execution mode hint or an inference precision hint?

luo-cheng2021 (Contributor, Author), Nov 14, 2024


Updated. When 'EXECUTION_MODE_HINT' is set to 'ACCURACY', the plugin forces f32 precision for the KV cache. If the inference precision is set to f32, the KV cache still uses the default u8 precision.
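The precedence described in this exchange can be summarised in a small sketch. It is a hypothetical, self-contained helper (`resolve_kv_cache_precision` is an illustrative name, not the plugin's code): the plugin config is modelled as a plain string map, the key strings `KV_CACHE_PRECISION`, `EXECUTION_MODE_HINT` and `INFERENCE_PRECISION_HINT` are assumed to mirror the OpenVINO property names, and, following the order of the reviewed hunk, an explicit KV cache precision hint is assumed to be checked first. The platform default is passed in as a parameter because the thread only says that x86 and ARM defaults differ.

```cpp
#include <map>
#include <string>

// Hypothetical stand-in for the plugin config: property name -> value string.
// Key names are assumed to mirror OpenVINO property names.
using PluginConfig = std::map<std::string, std::string>;

// Hypothetical helper sketching the precedence discussed in this thread.
std::string resolve_kv_cache_precision(const PluginConfig& config,
                                       const std::string& platform_default) {
    // 1. An explicit KV cache precision hint takes priority (assumed order).
    const auto kv_it = config.find("KV_CACHE_PRECISION");
    if (kv_it != config.end()) {
        return kv_it->second;
    }
    // 2. The ACCURACY execution mode forces an f32 KV cache.
    const auto mode_it = config.find("EXECUTION_MODE_HINT");
    if (mode_it != config.end() && mode_it->second == "ACCURACY") {
        return "f32";
    }
    // 3. INFERENCE_PRECISION_HINT is intentionally not consulted: even with
    //    f32 inference, the KV cache keeps the platform default.
    return platform_default;
}
```

For example, a config containing only `INFERENCE_PRECISION_HINT=f32` still resolves to the platform default (u8, the new default this PR proposes), while `EXECUTION_MODE_HINT=ACCURACY` resolves to f32.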

ilya-lavrenov added this to the 2025.0 milestone Nov 14, 2024
ilya-lavrenov (Contributor)

@luo-cheng2021 could you please also include a revert of #1212?
I hardcoded an OpenVINO commit from before the u8 KV cache migration on CPU to unblock GenAI development.

github-actions bot added the "category: GHA" label Nov 14, 2024
ilya-lavrenov removed the "category: sampling" label Nov 20, 2024
Labels: category: continuous batching, category: GHA

2 participants