[ET-VK] Allow int4 linear to execute without 8bit buffer support #10030

SS-JIA · 2025-04-09T22:33:42Z

Stack from ghstack (oldest at bottom):

[ET-VK] Use performant tiled algorithm for 4 bit weight only quantized linear #10205
[ET-VK] Add co-op algorithm for 4 bit weight only quantized linear #10204
-> [ET-VK] Allow int4 linear to execute without 8bit buffer support #10030
[ET-VK][ez] Add support for buffer backed qparams in int4 linear + add checks for physical limits when allocating #9974

Context

Some Vulkan devices do not support 8-bit buffers. Executing the int4 linear compute shader currently requires that support, because the weight prepacking shader reads the weight data as 8-bit values.

This diff bypasses that restriction by introducing a variant of the prepacking shader that does not need 8-bit buffers.

Changes

Introduce a variant of the int4 weight prepacking shader that interprets the tensor data as an array of uint instead of uint8_t, where each uint holds 4 consecutive uint8_t values.
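As an illustration only, the indexing arithmetic such a variant needs can be sketched in Python: byte `i` of the original `uint8_t` array lives in word `i / 4` of the `uint` view, at bit offset `(i % 4) * 8`. This is not the shader from this PR. It assumes little-endian byte order within each 32-bit word and zero padding of a trailing partial word, and the helper names `pack_bytes_as_uints` and `load_uint8` are hypothetical.

```python
import struct


def pack_bytes_as_uints(data: bytes) -> list[int]:
    """Hypothetical helper: view a uint8 buffer as 32-bit words.

    Assumes little-endian byte order inside each word and pads a
    trailing partial word with zero bytes.
    """
    padded = data + b"\x00" * (-len(data) % 4)
    return list(struct.unpack(f"<{len(padded) // 4}I", padded))


def load_uint8(words: list[int], byte_idx: int) -> int:
    """Hypothetical helper: read element byte_idx of the original uint8
    array using only 32-bit word accesses."""
    word = words[byte_idx // 4]      # the uint that contains this byte
    shift = (byte_idx % 4) * 8       # bit offset of the byte within that uint
    return (word >> shift) & 0xFF    # isolate the 8 bits of interest


if __name__ == "__main__":
    data = bytes([0x12, 0x34, 0x56, 0x78, 0x9A])
    words = pack_bytes_as_uints(data)
    print([hex(w) for w in words])
    print([hex(load_uint8(words, i)) for i in range(len(data))])
```

Each 32-bit load yields four bytes at once, so reading weights this way only requires 32-bit buffer access; recovering an individual byte costs one shift and one mask.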

Differential Revision: D72750897


pytorch-bot · 2025-04-09T22:33:46Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/10030

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 92f11a3 with merge base 6d1caca:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.


facebook-github-bot · 2025-04-09T22:34:03Z

This pull request was exported from Phabricator. Differential Revision: D72750897


facebook-github-bot · 2025-04-09T22:39:53Z

This pull request was exported from Phabricator. Differential Revision: D72750897


facebook-github-bot · 2025-04-15T16:56:34Z

This pull request was exported from Phabricator. Differential Revision: D72750897

SS-JIA mentioned this pull request Apr 9, 2025

[ET-VK][ez] Add support for buffer backed qparams in int4 linear + add checks for physical limits when allocating #9974

Merged

facebook-github-bot added the CLA Signed label Apr 9, 2025

facebook-github-bot added the fb-exported label Apr 9, 2025

This was referenced Apr 15, 2025

[ET-VK] Add co-op algorithm for 4 bit weight only quantized linear #10204

Merged

[ET-VK] Use performant tiled algorithm for 4 bit weight only quantized linear #10205

Merged

trivedivivek added the topic: not user facing label Apr 15, 2025

trivedivivek approved these changes Apr 15, 2025


facebook-github-bot merged commit 3aa73e9 into gh/SS-JIA/210/base Apr 16, 2025
84 of 85 checks passed

facebook-github-bot deleted the gh/SS-JIA/210/head branch April 16, 2025 17:42

facebook-github-bot temporarily deployed to cherry-pick-bot April 16, 2025 17:42 with GitHub Actions (inactive)

pytorchbot mentioned this pull request Apr 16, 2025

[ET-VK] Allow int4 linear to execute without 8bit buffer support #10234

Merged
