Tweak KleidiAI's FP16 matmul algorithm #4416

Nicoshev · 2025-06-29T00:28:59Z

Summary:
Hoist memory loads out of the outer loop.

The intent is to keep these loads from displacing cache lines that may hold matrix data.
The loads are also likely to incur cache misses after the first iteration, because executing the inner loop will probably have filled the cache with matrix data.
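
For readers unfamiliar with the pattern, here is a minimal C++ sketch of hoisting loads out of an outer loop. It is not the KleidiAI kernel, which is written in assembly. The `KernelArgs` struct, its clamp fields, and both function names are hypothetical, chosen only to illustrate the idea, and `volatile` is used only so that the reloads in the first version stay where they are written, as they would in hand-written assembly.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical argument block: stands in for values a kernel reads from
// memory that do not change between outer-loop iterations.
struct KernelArgs {
    float clamp_min;
    float clamp_max;
};

// Before: the bounds are reloaded from memory on every outer iteration.
// After the inner loops have streamed matrix data through the cache, these
// reloads may miss, and may evict a line holding matrix data.
void matmul_clamp_reload(const float* a, const float* b, float* c,
                         std::size_t m, std::size_t n, std::size_t k,
                         const volatile KernelArgs* args) {
    assert(args != nullptr);
    for (std::size_t i = 0; i < m; ++i) {
        const float lo = args->clamp_min;  // loaded once per row
        const float hi = args->clamp_max;  // loaded once per row
        for (std::size_t j = 0; j < n; ++j) {
            float acc = 0.0f;
            for (std::size_t p = 0; p < k; ++p) {
                acc += a[i * k + p] * b[p * n + j];
            }
            c[i * n + j] = std::min(std::max(acc, lo), hi);
        }
    }
}

// After: the bounds are loaded once, before the outer loop, and kept in
// local variables (registers, in the assembly case) for the whole call.
void matmul_clamp_hoisted(const float* a, const float* b, float* c,
                          std::size_t m, std::size_t n, std::size_t k,
                          const volatile KernelArgs* args) {
    assert(args != nullptr);
    const float lo = args->clamp_min;  // loaded once per call
    const float hi = args->clamp_max;  // loaded once per call
    for (std::size_t i = 0; i < m; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            float acc = 0.0f;
            for (std::size_t p = 0; p < k; ++p) {
                acc += a[i * k + p] * b[p * n + j];
            }
            c[i * n + j] = std::min(std::max(acc, lo), hi);
        }
    }
}
```

Both functions compute the same result. The only difference is how often the argument block is read, which is the property the change targets.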

Benchmarks repeatedly show a throughput improvement of around 1%.

before:
P1854747253

after:
P1854747141

Differential Revision: D77459967


netlify · 2025-06-29T00:29:03Z

✅ Deploy Preview for pytorch-fbgemm-docs ready!

Name	Link
🔨 Latest commit	`473be92`
🔍 Latest deploy log	https://app.netlify.com/projects/pytorch-fbgemm-docs/deploys/686088cd4a91b90008d4abfa
😎 Deploy Preview	https://deploy-preview-4416--pytorch-fbgemm-docs.netlify.app

facebook-github-bot · 2025-06-29T00:29:08Z

This pull request was exported from Phabricator. Differential Revision: D77459967

facebook-github-bot · 2025-06-30T15:21:48Z

This pull request has been merged in db4f7a3.

facebook-github-bot added the cla signed label Jun 29, 2025

facebook-github-bot added the fb-exported label Jun 29, 2025

facebook-github-bot closed this in db4f7a3 Jun 30, 2025

facebook-github-bot added the Merged label Jun 30, 2025

gchalump added category:improvement feature:gemm labels Jun 30, 2025
