Enhancing Performance of INT4 Data Transformation #3608

djeong20 · 2025-12-05T04:33:16Z

This pull request speeds up the transformation of int4 data in the osv32_isv2 layout to block_q4_0x4 layers by using ARM NEON and OpenMP. Transforming a 3072x8192 matrix took about 7 to 8 milliseconds with the previous version; with this patch it takes 2 to 4 milliseconds.
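As a rough illustration of the kind of work involved, the sketch below unpacks packed 4-bit values row by row and splits the rows across threads with OpenMP. It is not the code from this patch: the function name `unpack_int4_rows`, the low-nibble-first packing order, and the plain row-major input are all assumptions made for the example, and the actual osv32_isv2 and block_q4_0x4 layouts are not reproduced here. The inner loop is scalar; it is the part a NEON implementation would typically vectorize.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of the patch): unpack a row-major matrix of
// packed 4-bit values into one byte per value.
// Assumptions: two values per input byte, low nibble first, and an even
// number of columns.
std::vector<std::uint8_t> unpack_int4_rows(const std::vector<std::uint8_t> &packed,
                                           std::size_t rows, std::size_t cols) {
  const std::size_t packed_cols = cols / 2;
  std::vector<std::uint8_t> out(rows * cols);

  // Rows are independent, so they can be distributed across threads.
  // Without -fopenmp the pragma is ignored and the loop runs serially.
#pragma omp parallel for
  for (std::ptrdiff_t r = 0; r < static_cast<std::ptrdiff_t>(rows); ++r) {
    const std::size_t row = static_cast<std::size_t>(r);
    const std::uint8_t *src = packed.data() + row * packed_cols;
    std::uint8_t *dst = out.data() + row * cols;

    // Scalar inner loop: split each byte into its two nibbles.
    for (std::size_t c = 0; c < packed_cols; ++c) {
      dst[2 * c] = src[c] & 0x0F;     // low nibble
      dst[2 * c + 1] = src[c] >> 4;   // high nibble
    }
  }
  return out;
}
```

Parallelizing over rows keeps each thread writing to a disjoint output range, so no synchronization is needed inside the loop.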

Self-evaluation:

Build test: [X]Passed [ ]Failed [ ]Skipped
Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: Donghyeon Jeong <[email protected]>

djeong20 requested review from DonghakPark, EunjuYang, SeoHyungjun, again4you, anyj0527, baek2sm, dkjung, gichan-jang, haehun, jaeyun-jung, jihochu, jijoongmoon, leemgs, lhs8928, myungjoo, skykongkong8, songgot and wooksong as code owners December 5, 2025 04:33

github-actions bot added the Need Review label Dec 5, 2025
