Open
Changes from 77 commits
213 commits
3c6e10f
[ADD] Paged Attention Reference impl
PiotrKrzem Jan 21, 2025
e6faae8
[ADD] Testing suite, add missing params to PagedAttn, fix KV caching
PiotrKrzem Feb 4, 2025
6d25c4a
[FIX] Merge conflict
PiotrKrzem Feb 4, 2025
52d4b63
Merge branch 'master' into feature/paged_reference
PiotrKrzem Feb 4, 2025
fbc22b0
[ADD] Update reference to correctly compute both outputs with RoPE ro…
PiotrKrzem Feb 27, 2025
69d4796
[FIX] Remove staged artifacts
PiotrKrzem Feb 27, 2025
d3b31fc
Merge branch 'master' into feature/paged_reference
PiotrKrzem Feb 28, 2025
b8c9742
[FIX] Add RoPE to testing suite
PiotrKrzem Feb 28, 2025
687d6aa
[FIX] Add missing test cases:
PiotrKrzem Feb 28, 2025
0986e6d
[FIX] Test build
PiotrKrzem Feb 28, 2025
dfe28b0
[FIX] Single op graph
PiotrKrzem Feb 28, 2025
c582c82
[FIX] Visitor test
PiotrKrzem Feb 28, 2025
abbea22
[FIX] Use reference_tests::Tensor in tests
PiotrKrzem Feb 28, 2025
6ccf30d
[FIX] test case name:
PiotrKrzem Feb 28, 2025
a9ebf4f
[FIX] Separate extension, update dependencies
PiotrKrzem Feb 28, 2025
7c2c449
[FIX] Compilation errrors
PiotrKrzem Feb 28, 2025
fc38fcb
[FIX] Refactor for unit testing
PiotrKrzem Mar 4, 2025
3e41603
[FIX] Re-add 40 tests with computed RoPE
PiotrKrzem Mar 4, 2025
4cbb5fd
[FIX] Remove from ops16, apply review comments
PiotrKrzem Mar 4, 2025
ad45c38
[FIX] Build errors from refactor
PiotrKrzem Mar 4, 2025
b0d4c0c
[FIX] Inline funcs to supress warning
PiotrKrzem Mar 4, 2025
ddd129e
[FIX] Clang
PiotrKrzem Mar 4, 2025
5c21ba1
[FIX] Comparison dtype error
PiotrKrzem Mar 4, 2025
1c12a5d
[FIX] Add unit funcs to named namespace
PiotrKrzem Mar 4, 2025
8091215
[FIX] Rename namespace
PiotrKrzem Mar 4, 2025
e1ff55d
[FIX] clang
PiotrKrzem Mar 4, 2025
7bf188a
[FIX] template func lookup
PiotrKrzem Mar 4, 2025
7f0acd1
Merge branch 'master' into feature/paged_reference
PiotrKrzem Mar 4, 2025
6a22c16
[FIX] Add func to headers to avoid unused warn
PiotrKrzem Mar 5, 2025
2d460e0
[FIX] GPU build namespace err
PiotrKrzem Mar 5, 2025
4aa34e5
Merge branch 'master' into feature/paged_reference
mlukasze Mar 5, 2025
1626784
[FIX] Explicitly call ref func
PiotrKrzem Mar 5, 2025
c90519c
[FIX] Clang
PiotrKrzem Mar 5, 2025
80cb9a9
[FIX] Param list err
PiotrKrzem Mar 5, 2025
08c7a1f
[FIX] Params err pt2
PiotrKrzem Mar 5, 2025
64a0fd2
[FIX] Review comments exc internal namespace
PiotrKrzem Mar 5, 2025
8225f28
[FIX] Remove internal namespace
PiotrKrzem Mar 5, 2025
d6ee9ef
[FIX] Remove PagedAttn from internal opset
PiotrKrzem Mar 5, 2025
cf3e05a
[FIX] Remove from internal namespace pt2
PiotrKrzem Mar 5, 2025
933c596
[FIX} Cleanup of v16 and remaining artifacts
PiotrKrzem Mar 5, 2025
9d6c9a1
Merge branch 'master' into feature/paged_reference
mlukasze Mar 6, 2025
145fbd3
Merge branch 'master' into feature/paged_reference
PiotrKrzem Mar 6, 2025
4d1cf62
[FIX] Tests
PiotrKrzem Mar 6, 2025
b679e8f
Merge branch 'feature/paged_reference' of https://github.com/PiotrKrz…
PiotrKrzem Mar 6, 2025
4087530
Update src/core/reference/include/openvino/reference/paged_attention.hpp
mmikolajcz Mar 6, 2025
4184529
Merge branch 'openvinotoolkit:master' into feature/paged_reference
PiotrKrzem Mar 9, 2025
97033ac
Update src/core/src/op/paged_attention.cpp
PiotrKrzem Mar 9, 2025
b6015fd
Update src/core/src/op/paged_attention.cpp
PiotrKrzem Mar 9, 2025
8709d3a
[FIX] Minor fixes to tests and ref
PiotrKrzem Mar 9, 2025
2818126
[FIX] Build testing suite
PiotrKrzem Mar 9, 2025
c70fe1d
[FIX] Multiplies initial value
PiotrKrzem Mar 9, 2025
a437d1d
Fix some issues with reference test cases. Some issues are still ther…
mmikolajcz Mar 10, 2025
0b5aeaf
Initial draft of functional shared single layer tests for PagedAttention
mmikolajcz Mar 20, 2025
b018c5a
Improve PagedAttention test structure and test case naming
mmikolajcz Mar 21, 2025
1624be2
Apply changes to reference impl
mmikolajcz Mar 21, 2025
0e649d4
Apply requested changes
mmikolajcz Mar 21, 2025
eb85af4
Add scale func tests
mmikolajcz Mar 21, 2025
4dbb868
Merge branch 'master' into feature/paged_reference
PiotrKrzem Mar 25, 2025
0de5177
[FIX] k,v heads, fixed alibi formula, cache copy, minor review fixes
PiotrKrzem Mar 25, 2025
fa43f64
[FIX] Struct for params, ultimate code purify
PiotrKrzem Mar 30, 2025
a627646
[FIX] int32 build errors
PiotrKrzem Mar 31, 2025
b132ee2
[FIX] size_t iont32_t mismatch fix
PiotrKrzem Mar 31, 2025
f64ee26
[FIX] Key int32_t error
PiotrKrzem Mar 31, 2025
f948bfd
[FIX] Tests compilation vec2str
PiotrKrzem Mar 31, 2025
2a360e3
Split k and v head size
mmikolajcz Apr 15, 2025
03dcf55
[ADD] Cache manager simulation, eviction, 2 new inputs, 2 new outputs…
PiotrKrzem May 5, 2025
fdf61ca
Merge branch 'master' into feature/paged_reference
PiotrKrzem May 5, 2025
d33e037
[FIX] Build bugfix
PiotrKrzem May 5, 2025
9375a33
[FIX] sliding_window unused
PiotrKrzem May 5, 2025
6484555
[FIX] 5th output build errors
PiotrKrzem May 5, 2025
d25e073
[FIX] Type prop tests with new inputs
PiotrKrzem May 5, 2025
fb44591
[FIX] Compatibility rank checks, minor code fixes, tests classes fixes
PiotrKrzem May 12, 2025
38fcda4
Merge branch 'master' into feature/paged_reference
PiotrKrzem May 12, 2025
b94f552
[FIX] Type prop tests after merge
PiotrKrzem May 12, 2025
16825a3
[FIX] Namespace error
PiotrKrzem May 12, 2025
9094308
[FIX] Namespace error
PiotrKrzem May 12, 2025
ae2c84c
[FIX] Clang
PiotrKrzem May 12, 2025
14f9c31
[REVERT] Revert 5 outputs, review comments
PiotrKrzem May 30, 2025
100321f
[ADD] Debug prints, new requested test cases
PiotrKrzem May 30, 2025
dc31bb9
[FIX] Commented tests for clarity:
PiotrKrzem Jun 18, 2025
fcfd3e3
Merge branch 'master' into feature/paged_reference
PiotrKrzem Jul 21, 2025
c8b8010
[WIP] Add cache manager on apr with genai
PiotrKrzem Jul 21, 2025
1cea130
Merge branch 'master' into feature/paged_reference
mlukasze Jul 23, 2025
f682490
[FIX] Compilation errors
PiotrKrzem Jul 30, 2025
5af811d
[FIX] Cache eviction with working block logic
PiotrKrzem Aug 1, 2025
49b0316
[ADD] Inserter of cache into models, replace all key_cache. and value…
PiotrKrzem Aug 5, 2025
ec0d3d6
[FIX] Rewire and improve compiled model and sync infer request for ca…
PiotrKrzem Aug 7, 2025
b1a04bb
[FIX] Clean code, minor logic fixes
PiotrKrzem Aug 8, 2025
657d401
[FIX] CompiledModel dependency
PiotrKrzem Aug 8, 2025
0998633
[FIX] Build errors
PiotrKrzem Aug 8, 2025
adb2d81
[FIX] Clang
PiotrKrzem Aug 10, 2025
ba6de21
[FIX] Remove relocation artifact
PiotrKrzem Aug 11, 2025
c26b31a
Merge branch 'master' into feature/paged_reference
PiotrKrzem Aug 11, 2025
5a4d8db
[FIX] set_out
PiotrKrzem Aug 11, 2025
7dbfc79
Merge branch 'master' into feature/paged_reference
PiotrKrzem Aug 11, 2025
36752fe
[FIX] Shape inference
PiotrKrzem Aug 13, 2025
826f550
Merge branch 'feature/paged_reference' of https://github.com/PiotrKrz…
PiotrKrzem Aug 18, 2025
10a68e2
Merge branch 'master' into feature/paged_reference
PiotrKrzem Aug 18, 2025
5d30421
Merge branch 'master' into feature/paged_reference
PiotrKrzem Aug 26, 2025
6123418
[FIX] Name to iName to index insertion of cache
PiotrKrzem Aug 26, 2025
155373b
[ADD/FIX] Tests for CM, fix building errors, clang
PiotrKrzem Aug 27, 2025
e2a7678
[FIX] Gods of Cmake please let this work
PiotrKrzem Aug 28, 2025
52607db
[FIX] Clang
PiotrKrzem Aug 28, 2025
5fab303
[FIX] Android build error
PiotrKrzem Aug 28, 2025
4fe3c45
[FIX] Android build pt2
PiotrKrzem Aug 28, 2025
c02e65d
[FIX] Cmake pt 2
PiotrKrzem Aug 28, 2025
b8961af
Merge branch 'master' into feature/paged_reference
PiotrKrzem Aug 29, 2025
4875de5
[FIX] Android clang error pt3
PiotrKrzem Sep 1, 2025
e75886c
git pushMerge branch 'feature/paged_reference' of https://github.com/…
PiotrKrzem Sep 1, 2025
bdaad83
Merge branch 'master' into feature/paged_reference
PiotrKrzem Sep 2, 2025
5cfef8c
Merge branch 'master' into feature/paged_reference
PiotrKrzem Sep 3, 2025
a0d1257
Merge branch 'master' into feature/paged_reference
PiotrKrzem Sep 9, 2025
fda736b
Merge branch 'master' into feature/paged_reference
mlukasze Sep 16, 2025
1a5ec8d
[WIP][ADD] CacheManager version 2 with globally managed single memory…
PiotrKrzem Sep 26, 2025
b2135ef
Merge branch 'master' into feature/paged_reference
PiotrKrzem Sep 29, 2025
0fd4677
[FIX] Remove redundane cache
PiotrKrzem Sep 30, 2025
7552cce
[FIX] Move reference cache to core
PiotrKrzem Sep 30, 2025
f40f1f0
[FIX] PagedCache build errors pt1
PiotrKrzem Sep 30, 2025
0578a3d
Merge branch 'feature/paged_reference' of https://github.com/PiotrKrz…
PiotrKrzem Sep 30, 2025
41e6a39
Merge branch 'master' into feature/paged_reference
PiotrKrzem Sep 30, 2025
c2db68a
Merge branch 'master' into feature/paged_reference
mlukasze Oct 3, 2025
086f358
[FIX] Use node as the key ID, fix memory access error, fix build erro…
PiotrKrzem Oct 4, 2025
cc14b54
Merge branch 'feature/paged_reference' of https://github.com/PiotrKrz…
PiotrKrzem Oct 4, 2025
33fa48d
[FIX] Build fixes pt 3
PiotrKrzem Oct 4, 2025
e06a603
[FIX] Review comments, remove Ref tests for CPU tests, fix liner errors
PiotrKrzem Oct 7, 2025
3abb9f5
[ADD] Ref vs CPU test
PiotrKrzem Oct 7, 2025
7f0eff3
Merge branch 'master' into feature/paged_reference
PiotrKrzem Oct 8, 2025
23fc025
[FIX] Linker errors from circular dependencies, cache uninitialized e…
PiotrKrzem Oct 10, 2025
2ef1e13
git pushMerge branch 'feature/paged_reference' of https://github.com/…
PiotrKrzem Oct 10, 2025
4eea05a
[FIX] Compile node error
PiotrKrzem Oct 10, 2025
f66f366
[FIX] Build error with new ID
PiotrKrzem Oct 12, 2025
e141585
[FIX] Link PCM to OV_API
PiotrKrzem Oct 12, 2025
894881c
[FIX] Shape infer with new inputs
PiotrKrzem Oct 13, 2025
b9c0938
[FIX] Arguemnts list error
PiotrKrzem Oct 13, 2025
3d87b42
[ADD] Debug message for C++ shape infer
PiotrKrzem Oct 13, 2025
d032535
[FIX] Macos whitespace
PiotrKrzem Oct 14, 2025
50374ba
[ADD] Debug flags for PA inputs
PiotrKrzem Oct 14, 2025
af5d423
[FIX] Debug prints
PiotrKrzem Oct 14, 2025
5a6aa18
[FIX] More debug prints
PiotrKrzem Oct 14, 2025
08a5e27
[FIX] Even more debug prints
PiotrKrzem Oct 14, 2025
bd00bbe
Merge branch 'master' into feature/paged_reference
PiotrKrzem Oct 14, 2025
e27262c
[FIX] 21 inputs shape infer error
PiotrKrzem Oct 14, 2025
dc87a3d
[FIX] Tensor accessor
PiotrKrzem Oct 14, 2025
c6c4183
[FIX] Remove check for static shape for past lens
PiotrKrzem Oct 14, 2025
287b160
[FIX] Allow 2-5 rank cache, limit to 4 rank for ref
PiotrKrzem Oct 14, 2025
4f4d435
Merge branch 'master' into feature/paged_reference
PiotrKrzem Oct 20, 2025
94e1653
Update attach_cache_manager_to_paged_attention.cpp
PiotrKrzem Oct 28, 2025
8f7faf4
Update attach_cache_manager_to_paged_attention.hpp
PiotrKrzem Oct 28, 2025
5bdd66c
Update paged_attention.hpp
PiotrKrzem Oct 28, 2025
7b3fd7d
Update paged_attention.hpp
PiotrKrzem Oct 28, 2025
4b4e302
Update paged_cache_manager.cpp
PiotrKrzem Oct 28, 2025
537ea9b
Update paged_attention.hpp
PiotrKrzem Oct 28, 2025
41d69c2
Merge branch 'master' into feature/paged_reference
PiotrKrzem Oct 28, 2025
e4a143c
Merge branch 'master' into feature/paged_reference
PiotrKrzem Oct 28, 2025
1e380f8
Merge branch 'master' into feature/paged_reference
PiotrKrzem Oct 30, 2025
fb4b9f7
[FIX] Force undo changes, fix without namespace
PiotrKrzem Nov 3, 2025
c884ca2
[FIX] Clang
PiotrKrzem Nov 3, 2025
144d549
Update simplify_shape_of_sub_graph.hpp
PiotrKrzem Nov 3, 2025
89df626
Merge branch 'master' into feature/paged_reference
PiotrKrzem Nov 3, 2025
12d874c
[DEBUG] Temporary revert of changes to check conditional compilation CI
PiotrKrzem Nov 4, 2025
3258736
Merge branch 'master' into feature/paged_reference
PiotrKrzem Nov 4, 2025
c51c607
[FIX] Double down by style aligning to other common opt
PiotrKrzem Nov 5, 2025
beb5c11
Merge branch 'feature/paged_reference' of https://github.com/PiotrKrz…
PiotrKrzem Nov 5, 2025
6675ee3
[FIX] Namespace change fix for CM
PiotrKrzem Nov 5, 2025
5aafcbe
[FIX] Style
PiotrKrzem Nov 5, 2025
03d2cfe
[FIX] Ref uses util PCM
PiotrKrzem Nov 5, 2025
1a95889
Try fix CC build 1
praasz Nov 20, 2025
6f92adb
Fix CC build 2
praasz Nov 20, 2025
8e6cbaa
Try fix CC build 3
praasz Nov 20, 2025
0325f02
[WIP][FIX] Review suggestions pt1
PiotrKrzem Nov 24, 2025
a905c13
[WIP][FIX] Review suggestions pt2
PiotrKrzem Nov 24, 2025
3a17578
[WIP][FIX] Review suggestions pt3
PiotrKrzem Nov 24, 2025
762f8e8
[WIP][FIX] Clang
PiotrKrzem Nov 24, 2025
c4cf062
[WIP][FIX] Convert fix, ov alignedbuffer introduction
PiotrKrzem Dec 1, 2025
04dea11
[WIP][FIX] Clang
PiotrKrzem Dec 1, 2025
f382256
Merge branch 'master' into feature/paged_reference
PiotrKrzem Dec 2, 2025
5a785a0
Merge branch 'master' into feature/paged_reference
PiotrKrzem Dec 3, 2025
f8c21be
Merge branch 'master' into feature/paged_reference
mlukasze Dec 3, 2025
f69e0e2
Merge branch 'master' into feature/paged_reference
PiotrKrzem Dec 3, 2025
e1f1d46
Update src/tests/functional/base_func_tests/src/base/utils/generate_i…
PiotrKrzem Dec 4, 2025
275824d
[FIX][WIP] Resolve remaining majority of issues, blocked by relocatio…
PiotrKrzem Dec 4, 2025
f40b062
[FIX] Style
PiotrKrzem Dec 4, 2025
df9940d
Merge branch 'feature/paged_reference' of https://github.com/PiotrKrz…
PiotrKrzem Dec 4, 2025
198b0fb
[FIX][WIP] Opaque ptr, conversions, minor fixes
PiotrKrzem Dec 9, 2025
2962170
[FIX] Clang
PiotrKrzem Dec 9, 2025
5cd5b22
Merge branch 'master' into feature/paged_reference
PiotrKrzem Dec 9, 2025
b64dda2
Merge branch 'master' into feature/paged_reference
PiotrKrzem Dec 18, 2025
b978c8c
[WIP][FIX] Build
PiotrKrzem Dec 18, 2025
6face9e
[FIX] Clang
PiotrKrzem Dec 18, 2025
f833c4b
Merge branch 'master' into feature/paged_reference
PiotrKrzem Dec 18, 2025
652f72c
Merge branch 'master' into feature/paged_reference
PiotrKrzem Jan 15, 2026
9fc5ec3
[FIX] CPU Ref tests XAttn
PiotrKrzem Jan 15, 2026
97cb81b
[FIX] Shape inference of 3rd shape, CPU tests
PiotrKrzem Jan 20, 2026
b5d173e
Merge branch 'master' into feature/paged_reference
PiotrKrzem Jan 28, 2026
f6f20ae
[FIX] C4273 build error
PiotrKrzem Jan 28, 2026
942696c
[FIX] Clang
PiotrKrzem Jan 28, 2026
d9fa322
[FIX] Clang pt2
PiotrKrzem Jan 28, 2026
0732b5e
[FIX] Clang3
PiotrKrzem Jan 28, 2026
e15e1f2
[FIX][WIP] CPU reference tests
PiotrKrzem Jan 30, 2026
152fb68
[FIX][WIP] Fix for cache management pt2 for CPU func tests
PiotrKrzem Feb 3, 2026
9e2576c
[FIX][WIP] Disable SDPA transformation for Ref comparison
PiotrKrzem Feb 3, 2026
89f8ec1
[FIX][WIP] Fix shape building and simplify code for PA CPU tests
PiotrKrzem Feb 3, 2026
ac05f85
[FIX] Shape inference critical error for dynamic evictable sizes
PiotrKrzem Feb 4, 2026
6481a3a
Merge branch 'master' into feature/paged_reference
PiotrKrzem Feb 4, 2026
1320ec6
[FIX] Rewrite tests for clear comparison
PiotrKrzem Feb 4, 2026
c680446
Merge branch 'feature/paged_reference' of https://github.com/PiotrKrz…
PiotrKrzem Feb 4, 2026
c89e8ea
[FIX] Revert old test class to master
PiotrKrzem Feb 4, 2026
956751c
[FIX] C4273 fix for Windows build pt2
PiotrKrzem Feb 4, 2026
0a21789
[FIX] Provide void handle definiton inclass
PiotrKrzem Feb 4, 2026
f24bdda
[FIX] Clang, test fixture includes
PiotrKrzem Feb 4, 2026
85ada87
[FIX] Neutralize quantization transformation for KV cache
PiotrKrzem Feb 4, 2026
5385ca9
[FIX] Test only statis data types:
PiotrKrzem Feb 4, 2026
2e94de1
Merge branch 'openvinotoolkit:master' into feature/paged_reference
PiotrKrzem Feb 4, 2026
@@ -10,3 +10,4 @@
_OPENVINO_OP_REG(AUGRUCell, ov::op::internal)
_OPENVINO_OP_REG(AUGRUSequence, ov::op::internal)
_OPENVINO_OP_REG(RMS, ov::op::internal)
_OPENVINO_OP_REG(PagedAttentionExtension, ov::op)
113 changes: 110 additions & 3 deletions src/core/dev_api/openvino/op/paged_attention.hpp
@@ -8,20 +8,127 @@
namespace ov {
namespace op {

// This is an experimental operation that is implemented in the plugins.
// Do not use in user applications, backward compatibility is not guaranteed in future releases.
/// \brief PagedAttentionExtension operation implements paged attention for memory-efficient sequence processing.
///
/// \ingroup ov_ops_cpp_api
///
/// This operation computes attention using a paged memory model, allowing efficient handling of long sequences.
class OPENVINO_API PagedAttentionExtension : public ov::op::Op {
public:
    OPENVINO_OP("PagedAttentionExtension");

    PagedAttentionExtension() = default;

    /// \brief Constructs a PagedAttentionExtension operation.
    ///
    /// \param args Input arguments vector containing:
    /// - query
    /// - key
    /// - value
    /// - key_cache
    /// - value_cache
    /// - past_lens
    /// - subsequence_begins
    /// - block_indices
    /// - block_indices_begins
    /// - (optional) scale
    /// - (optional) sliding_window
    /// - (optional) alibi_slopes
    /// - max_context_len
    /// - (optional) rotated_block_indices
    /// - (optional) rotation_deltas
    /// - (optional) rotation_trig_lut
    /// - free_block_indices
    /// - max_blocks
    PagedAttentionExtension(const ov::OutputVector& args);

    /// \brief Constructs a PagedAttentionExtension operation. (15-parameter constructor)
    ///
    /// \param query Query tensor.
    /// \param key Key tensor.
    /// \param value Value tensor.
    /// \param key_cache Cached key tensor.
    /// \param value_cache Cached value tensor.
    /// \param past_lens Lengths of past sequences.
    /// \param subsequence_begins Subsequence start indices.
    /// \param block_indices Indices of memory blocks.
    /// \param block_indices_begins Start indices for block indexing.
    /// \param scale (Optional) Scaling factor for attention scores.
    /// \param sliding_window (Optional) Sliding window size for local attention.
    /// \param alibi_slopes (Optional) ALiBi slopes for biasing attention.
    /// \param max_context_len Maximum context length.
    /// \param free_block_indices Free blocks in cache.
    /// \param max_blocks Per-sequence max occupied blocks.
    PagedAttentionExtension(const Output<Node>& query,
                            const Output<Node>& key,
                            const Output<Node>& value,
                            const Output<Node>& key_cache,
                            const Output<Node>& value_cache,
                            const Output<Node>& past_lens,
                            const Output<Node>& subsequence_begins,
                            const Output<Node>& block_indices,
                            const Output<Node>& block_indices_begins,
                            const Output<Node>& scale,
                            const Output<Node>& sliding_window,
                            const Output<Node>& alibi_slopes,
                            const Output<Node>& max_context_len,
                            const Output<Node>& free_block_indices,
                            const Output<Node>& max_blocks);

    /// \brief Constructs a PagedAttentionExtension operation with rotation support. (18-parameter constructor)
    ///
    /// \param query Query tensor.
    /// \param key Key tensor.
    /// \param value Value tensor.
    /// \param key_cache Cached key tensor.
    /// \param value_cache Cached value tensor.
    /// \param past_lens Lengths of past sequences.
    /// \param subsequence_begins Subsequence start indices.
    /// \param block_indices Indices of memory blocks.
    /// \param block_indices_begins Start indices for block indexing.
    /// \param scale (Optional) Scaling factor for attention scores.
    /// \param sliding_window (Optional) Sliding window size for local attention.
    /// \param alibi_slopes (Optional) ALiBi slopes for biasing attention.
    /// \param max_context_len Maximum context length.
    /// \param rotated_block_indices (Optional) Rotated block indices.
    /// \param rotation_deltas (Optional) Rotation deltas.
    /// \param rotation_trig_lut (Optional) Rotation trig lookup table.
    /// \param free_block_indices Free blocks in cache.
    /// \param max_blocks Per-sequence max occupied blocks.
    PagedAttentionExtension(const Output<Node>& query,
                            const Output<Node>& key,
                            const Output<Node>& value,
                            const Output<Node>& key_cache,
                            const Output<Node>& value_cache,
                            const Output<Node>& past_lens,
                            const Output<Node>& subsequence_begins,
                            const Output<Node>& block_indices,
                            const Output<Node>& block_indices_begins,
                            const Output<Node>& scale,
                            const Output<Node>& sliding_window,
                            const Output<Node>& alibi_slopes,
                            const Output<Node>& max_context_len,
                            const Output<Node>& rotated_block_indices,
                            const Output<Node>& rotation_deltas,
                            const Output<Node>& rotation_trig_lut,
                            const Output<Node>& free_block_indices,
                            const Output<Node>& max_blocks);

    void validate_and_infer_types() override;
    std::shared_ptr<ov::Node> clone_with_new_inputs(const ov::OutputVector& new_args) const override;

    /// \brief Gets the output element type at the specified index.
    const ov::element::Type get_out_type(int index) const;
Contributor
Why is this required?
Shouldn't getters/setters be for attributes?
Isn't this the same as Node::get_output_element_type()?


    /// \brief Sets the output element type at the specified index.
    void set_out_type(int index, const ov::element::Type& output_type);

protected:
-    std::vector<ov::element::Type> m_output_type = {ov::element::dynamic, ov::element::dynamic};
+    std::vector<ov::element::Type> m_output_type = {ov::element::dynamic,
+                                                    ov::element::dynamic,
+                                                    ov::element::i32,
+                                                    ov::element::i32,
+                                                    ov::element::i32};
};

} // namespace op