Add CPUID for AvxVnniInt8 and AvxVnniInt16 #113956

Open
wants to merge 44 commits into
base: main

@khushal1996 khushal1996 commented Mar 27, 2025

This PR adds CPUID detection support for the AVX-VNNI-INT8 and AVX-VNNI-INT16 ISAs.

Design

image
image

The changes enable the two ISAs when either of the following holds:

  1. AVX10.2 is enabled, or
  2. CPUID reports support for both ISAs.

This follows the discussion in API proposal #112586.
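
As a rough sketch of the enablement rule above (not the code in this PR): the helper name `ShouldEnableAvxVnniInt` and its parameters are hypothetical. The bit positions are, to my understanding, the ones Intel documents for CPUID leaf 7, sub-leaf 1, register EDX, and should be checked against the SDM before relying on them.

```cpp
#include <cstdint>

// Assumed bit positions in CPUID.(EAX=07H, ECX=1):EDX.
// Verify against the Intel SDM; they are not taken from this PR.
constexpr uint32_t kAvxVnniInt8Bit  = 1u << 4;
constexpr uint32_t kAvxVnniInt16Bit = 1u << 10;
constexpr uint32_t kBothVnniIntBits = kAvxVnniInt8Bit | kAvxVnniInt16Bit;

// Hypothetical helper: the two ISAs are treated as a pair and are enabled
// when AVX10.2 is enabled, or when CPUID reports both feature bits.
constexpr bool ShouldEnableAvxVnniInt(bool avx10v2Enabled, uint32_t leaf7Sub1Edx)
{
    return avx10v2Enabled || ((leaf7Sub1Edx & kBothVnniIntBits) == kBothVnniIntBits);
}
```

Under this sketch, a machine that reports only one of the two bits and lacks AVX10.2 gets neither ISA, which matches the "both ISAs" wording of condition 2.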

Testing

Note 1: Emitter unit tests were not run, since they were added and verified as part of the AVX10.2 PR #111209.

Note 2: SuperPMI results are not accurate, because adding a new CPUID flag leads to a new JIT-EE version GUID. Even after changing the JIT-EE version manually, the SuperPMI run reports errors and failures that stem from the old .mch files; these can be ignored.

Run JIT subtree with AVXVNNIINT* enabled / disabled


AVXVNNIINT* Enabled
image

AVXVNNIINT* Disabled
image

@khushal1996
Member Author

@tannergooding This is the first of the two PRs needed for the AVX VNNI INT* API introduction (#112586).

@khushal1996 khushal1996 force-pushed the kcm-avxvnniint8-cpuid branch from 141d643 to 98fc970 Compare April 14, 2025 21:40
@khushal1996
Member Author

@tannergooding @saucecontrol I have added the CPUID, API surface, JIT handling and template tests here.

@tannergooding tannergooding self-requested a review April 14, 2025 21:44
@tannergooding tannergooding self-assigned this Apr 14, 2025
//
// Return Value:
// The 64-bit only InstructionSet associated with isa
static CORINFO_InstructionSet X64VersionOfIsa(CORINFO_InstructionSet isa)

nit: All of these could have stayed in the respective hwintrinsicxarch.cpp and hwintrinsicarm64.cpp files

We already implement other Compiler::* methods in such files, so we should've been able to keep the diffs minimal by just changing lookupInstructionSet to Compiler::lookupInstructionSet and similar for lookupIsa

@@ -849,6 +849,17 @@ void CodeGen::genHWIntrinsic(GenTreeHWIntrinsic* node)
break;
}

case NI_AVXVNNI_MultiplyWideningAndAdd:

nit: This could be moved back to minimize the diff/code churn

@tannergooding tannergooding left a comment

LGTM, minus a couple nits on ways to simplify the diffs.

CC. @dotnet/jit-contrib for secondary review

@EgorBo
Member

EgorBo commented Jul 1, 2025

@khushal1996 could you please resolve the merge conflict?

@@ -278,6 +289,23 @@ bool emitter::IsVexEncodableInstruction(instruction ins) const
return emitComp->compSupportsHWIntrinsic(InstructionSet_AVXVNNI);
}

case INS_vpdpwsud:

The merge conflict is notably highlighting that we missed adding these instructions to the perfScore handling here in emitxarch.

With the new setup, we can just add the latency/throughput info directly as part of the instruction table instead, which makes the process more streamlined for the typical case.

If we don't have exact timings for these yet, then I'd mirror the values we used for the AVX-VNNI instructions instead.
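
A minimal sketch of the "mirror the AVX-VNNI values" idea, assuming a table that carries latency and throughput per instruction. The type `DemoPerfInfo`, the `DEMO_*` enumerators, and the numbers are hypothetical placeholders; they are not the JIT's real instruction table layout or measured timings.

```cpp
// Hypothetical per-instruction perf data; not the JIT's actual table format.
struct DemoPerfInfo
{
    int latency;    // cycles (placeholder unit)
    int throughput; // reciprocal throughput (placeholder unit)
};

// Illustrative instruction ids only.
enum DemoIns
{
    DEMO_vpdpwssd, // existing AVX-VNNI instruction
    DEMO_vpdpwsud, // new AVX-VNNI-INT16 instruction
    DEMO_count
};

// PLACEHOLDER numbers standing in for whatever AVX-VNNI already uses.
constexpr DemoPerfInfo kAvxVnniPerf = {4, 1};

// The new entry reuses the AVX-VNNI values until exact timings are known,
// so both rows stay in sync if the shared constant is updated.
constexpr DemoPerfInfo kDemoPerfTable[DEMO_count] = {
    kAvxVnniPerf, // DEMO_vpdpwssd
    kAvxVnniPerf, // DEMO_vpdpwsud (mirrors AVX-VNNI)
};

constexpr DemoPerfInfo GetDemoPerfInfo(DemoIns ins)
{
    return kDemoPerfTable[ins];
}
```

Keeping the data next to the instruction entry means a newly added instruction cannot be missed by a separate perfScore switch, which is the gap the merge conflict exposed.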

Labels
area-System.Runtime.Intrinsics, community-contribution, linkable-framework
5 participants