Tweak KleidiAI's FP16 matmul algorithm #4416

Closed
wants to merge 1 commit into from

Conversation

Nicoshev
Contributor

Summary:
Hoist memory loads out of the outer loop.

The intent is to keep these loads from evicting cache lines that may hold matrix data.
Conversely, when the loads stay inside the loop they are likely to incur cache misses after the first iteration, because executing the inner loop will probably have filled the cache with matrix data.
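
As a rough illustration of the transformation (not the actual patch), the sketch below contrasts reading loop-invariant values inside the outer loop with reading them once before it. `KernelParams`, `clamp_min`, `clamp_max`, and both function names are hypothetical, and plain `float` stands in for FP16 to keep the example portable. An optimizing compiler may perform this hoisting on its own for plain C++; in hand-tuned kernels it usually has to be done explicitly.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical kernel parameters; names are illustrative, not from the patch.
struct KernelParams {
    float clamp_min;
    float clamp_max;
};

// Before: the bounds are read through the pointer on every outer iteration.
std::vector<float> matmul_clamp_reload(const std::vector<float>& a,
                                       const std::vector<float>& b,
                                       std::size_t m, std::size_t n, std::size_t k,
                                       const KernelParams* params) {
    assert(a.size() == m * k && b.size() == k * n);
    std::vector<float> c(m * n);
    for (std::size_t i = 0; i < m; ++i) {           // outer loop over rows of A
        const float lo = params->clamp_min;         // loaded each iteration
        const float hi = params->clamp_max;
        for (std::size_t j = 0; j < n; ++j) {
            float acc = 0.0f;
            for (std::size_t p = 0; p < k; ++p) {   // inner loop streams matrix data
                acc += a[i * k + p] * b[p * n + j];
            }
            c[i * n + j] = std::min(std::max(acc, lo), hi);
        }
    }
    return c;
}

// After: the loop-invariant loads are hoisted above the outer loop, so they
// happen once and no longer compete with matrix data for cache lines.
std::vector<float> matmul_clamp_hoisted(const std::vector<float>& a,
                                        const std::vector<float>& b,
                                        std::size_t m, std::size_t n, std::size_t k,
                                        const KernelParams* params) {
    assert(a.size() == m * k && b.size() == k * n);
    const float lo = params->clamp_min;             // loaded once
    const float hi = params->clamp_max;
    std::vector<float> c(m * n);
    for (std::size_t i = 0; i < m; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            float acc = 0.0f;
            for (std::size_t p = 0; p < k; ++p) {
                acc += a[i * k + p] * b[p * n + j];
            }
            c[i * n + j] = std::min(std::max(acc, lo), hi);
        }
    }
    return c;
}
```

Both functions compute the same result; only the position of the parameter loads differs, which is why any gain from such a change tends to be small and has to be confirmed by benchmarking.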

Benchmarks repeatedly show a throughput improvement of around 1%.

before:
P1854747253

after:
P1854747141

Differential Revision: D77459967


netlify bot commented Jun 29, 2025

Deploy Preview for pytorch-fbgemm-docs ready!

Name Link
🔨 Latest commit 473be92
🔍 Latest deploy log https://app.netlify.com/projects/pytorch-fbgemm-docs/deploys/686088cd4a91b90008d4abfa
😎 Deploy Preview https://deploy-preview-4416--pytorch-fbgemm-docs.netlify.app

@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D77459967

@facebook-github-bot
Contributor

This pull request has been merged in db4f7a3.
