Conversation

@ruro ruro commented Nov 21, 2025

Changes

  • added ONNXAttentionMetatype for the opset 23 Attention ONNX node
  • fixed scaled_dot_product_attention quantization in torch2 for the case where Q, K and V arrive as parallel edges from the same input node

Reason for changes

See #3750

Related tickets

Fixes #3750

Tests

  • tests/onnx/quantization/test_graphs.py::test_synthetic_models_graph[AttentionModel] (reference graph: attention_model.dot)

  • tests/torch2/function_hook/quantization/test_quantized_graphs.py::test_quantized_graphs[unbind_scaled_dot_product_attention_model] (reference graph: unbind_scaled_dot_product_attention_model.dot)

@ruro ruro force-pushed the fix_onnx_attention_torch_sdpa_handling branch from 734ba64 to 791f962 on November 21, 2025 09:26
@github-actions github-actions bot added the NNCF ONNX Pull requests that updates NNCF ONNX label Nov 21, 2025
@ruro ruro marked this pull request as ready for review November 21, 2025 09:36
@ruro ruro requested a review from a team as a code owner November 21, 2025 09:36
@ruro (Author) commented Nov 21, 2025

Hm, IIRC ONNX added support for opset 23 in version 1.18.0, so the new test is currently failing in CI due to:

```
onnx==1.17.0; python_version < '3.13'
onnx==1.18.0; python_version >= '3.13'
```

Do you have a preference for whether I should mark this test as

```python
import onnx
import pytest
from packaging import version

@pytest.mark.skipif(
    version.parse(onnx.__version__) < version.parse("1.18.0"),
    reason="Opset 23 was added in onnx 1.18.0",
)
```

or bump the version or something else?

@andrey-churkin andrey-churkin self-assigned this Nov 25, 2025
@andrey-churkin (Contributor) commented:
> Hm. iirc onnx added support for opset 23 in version 1.18.0. So the new test is currently failing in CI due to
>
> ```
> onnx==1.17.0; python_version < '3.13'
> onnx==1.18.0; python_version >= '3.13'
> ```
>
> Do you have any preferences if I should mark this test as
>
> ```python
> @pytest.mark.skipif(
>     version.parse(onnx.__version__) < version.parse("1.18.0"),
>     reason="Opset 23 was added in onnx 1.18.0",
> )
> ```
>
> or bump the version or something else?

Hi @ruro, thanks for your contribution. We currently support multiple versions of ONNX, and the Attention operator was added in opset 23, which corresponds to ONNX 1.18.0. I believe we should run this test only for ONNX versions >= 1.18.0.

@ruro ruro force-pushed the fix_onnx_attention_torch_sdpa_handling branch from 791f962 to 9af3bfb on November 27, 2025 11:41
Comment on lines +239 to +242
```python
input_port_ids = [input_edge.input_port_id] + input_edge.parallel_input_port_ids
node_name = nncf_node.node_name
for input_port_id in input_port_ids:
    allowed_pre_hook_insertion_points.append(PreHookInsertionPoint(node_name, input_port_id))
```
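The loop in this diff can be illustrated with a self-contained sketch; `Edge` and the helper below are simplified stand-ins for the real NNCF types, not the actual implementation:

```python
from dataclasses import dataclass, field

# Hypothetical, simplified stand-ins for the NNCF types in the diff above:
# an edge can carry extra parallel input ports when the same producer node
# feeds several inputs of one consumer (e.g. unbind -> sdpa).
@dataclass
class Edge:
    input_port_id: int
    parallel_input_port_ids: list[int] = field(default_factory=list)

@dataclass(frozen=True)
class PreHookInsertionPoint:
    node_name: str
    input_port_id: int

def pre_hook_points(node_name: str, input_edge: Edge) -> list[PreHookInsertionPoint]:
    # One insertion point per port, including the parallel ports that a
    # non-multigraph representation would otherwise hide.
    ports = [input_edge.input_port_id] + input_edge.parallel_input_port_ids
    return [PreHookInsertionPoint(node_name, p) for p in ports]

# unbind -> sdpa: Q on port 0, with K and V as parallel edges on ports 1 and 2
edge = Edge(input_port_id=0, parallel_input_port_ids=[1, 2])
print([p.input_port_id for p in pre_hook_points("sdpa", edge)])  # [0, 1, 2]
```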
Contributor:
@ruro Could you please briefly explain why these changes are necessary?

@daniil-lyakhov Please take a look

@ruro (Author) commented Nov 28, 2025:

I've outlined my reasoning in the last two comments in #3750. The short version is that parallel edges aren't directly represented in the PTNNCFGraph (because it isn't a multigraph and doesn't allow repeated edges); they are instead stored in the parallel_input_port_ids property.

In this case, unbind has 3 outputs that are passed as the q, k and v inputs of the sdpa node. Each of these 3 edges should be considered separately for quantizer insertion/propagation, but the previous logic only added insertion points for "real" edges, ignoring any extra parallel edges.

Let me know if anything is unclear.
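The pattern being discussed can be reproduced with a minimal sketch (the module name `UnbindSDPA` and the shapes are illustrative, not the actual test model):

```python
import torch
import torch.nn.functional as F

class UnbindSDPA(torch.nn.Module):
    # Hypothetical minimal module for the pattern under discussion: a single
    # stacked tensor is unbound into Q, K and V, so all three SDPA inputs are
    # parallel edges coming from the same producer node in the traced graph.
    def forward(self, qkv: torch.Tensor) -> torch.Tensor:
        q, k, v = qkv.unbind(dim=0)
        return F.scaled_dot_product_attention(q, k, v)

qkv = torch.randn(3, 2, 4, 8)  # stacked Q/K/V: (3, batch, seq_len, head_dim)
out = UnbindSDPA()(qkv)
print(out.shape)  # torch.Size([2, 4, 8])
```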

@daniil-lyakhov (Collaborator) commented Nov 28, 2025:

Great contribution! Could you please share a Netron/NNCF graph visualization of the newly supported subgraph? (NNCF graph visualization API: https://github.com/openvinotoolkit/nncf/blob/develop/src/nncf/common/graph/graph.py#L611-L613)

This part of the code is the core logic of NNCF; we need to figure out all possible side effects of this change.

@ruro (Author) replied:

I am not sure what you mean by "netron/nncf graph". The second image in the PR body is the expected graph for unbind+sdpa after applying quantization. Does that work?

@ruro (Author) commented Nov 28, 2025:

Also, here are the before and after graphs, obtained by performing a torch.onnx.export (with opset_version=23) of a quantized timm.layers.attention.Attention module:

[before / after graph screenshots]

(The edge without the q/dq nodes is the V input of Attention as expected)


Linked issue (may be closed by merging): MULTIHEAD_ATTENTION_OUTPUT ignored patterns don't match "proper" SDPA / Attention