@djeong20 djeong20 commented Dec 5, 2025

This pull request speeds up the transformation of int4 data from the osv32_isv2 layout to the block_q4_0x4 layout by using ARM NEON and OpenMP. Transforming a 3072x8192 matrix took about 7 to 8 milliseconds with the previous version; with this patch it takes 2 to 4 milliseconds.
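For readers unfamiliar with the combination, here is a minimal sketch of how NEON and OpenMP are typically paired for int4 repacking. It is not the code in this patch and does not implement the osv32_isv2 to block_q4_0x4 mapping. The function `split_nibbles` and its parameters are hypothetical; it only splits packed bytes into low and high nibbles, assuming a row-major buffer with two int4 values per byte.

```cpp
#include <cstddef>
#include <cstdint>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

// Hypothetical helper (not from the patch): writes the low nibble of every
// packed byte to `lo` and the high nibble to `hi`. All three buffers hold
// rows * bytes_per_row bytes.
void split_nibbles(const uint8_t *src, uint8_t *lo, uint8_t *hi,
                   std::size_t rows, std::size_t bytes_per_row) {
  // Rows are independent, so OpenMP can hand them to different threads.
  // Without -fopenmp the pragma is ignored and the loop runs serially.
#pragma omp parallel for schedule(static)
  for (std::ptrdiff_t r = 0; r < static_cast<std::ptrdiff_t>(rows); ++r) {
    const std::size_t offset = static_cast<std::size_t>(r) * bytes_per_row;
    const uint8_t *s = src + offset;
    uint8_t *l = lo + offset;
    uint8_t *h = hi + offset;
    std::size_t i = 0;
#if defined(__ARM_NEON)
    // NEON path: handle 16 packed bytes (32 int4 values) per iteration.
    const uint8x16_t mask = vdupq_n_u8(0x0F);
    for (; i + 16 <= bytes_per_row; i += 16) {
      const uint8x16_t v = vld1q_u8(s + i);
      vst1q_u8(l + i, vandq_u8(v, mask));  // keep the low 4 bits
      vst1q_u8(h + i, vshrq_n_u8(v, 4));   // shift the high 4 bits down
    }
#endif
    // Scalar tail; also the full path on targets without NEON.
    for (; i < bytes_per_row; ++i) {
      l[i] = s[i] & 0x0F;
      h[i] = static_cast<uint8_t>(s[i] >> 4);
    }
  }
}
```

The scalar loop doubles as a fallback, so the sketch builds on non-ARM hosts and gives the same result with or without `-fopenmp`.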

Self-evaluation:

  1. Build test: [X]Passed [ ]Failed [ ]Skipped
  2. Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: Donghyeon Jeong <[email protected]>