[Draft][ONNX] E2E inferencing of single token Qwen 3 1.7B model on nntrainer (x86) #3571
Draft
sachin-nntrainer wants to merge 19 commits into nnstreamer:main from sachin-nntrainer:qwen3_single_token_inference
Conversation
Added cast layer (type casting).
- There is no model unit test yet.
- In some cases, optimization may be considered (such as fusing only one layer in front), but for now only the basic implementation is included; optimization will be applied later through the ONNX graph optimization work.

**Self evaluation:**
1. Build test: [*]Passed [ ]Failed [ ]Skipped
2. Run test: [ ]Passed [ ]Failed [*]Skipped

Signed-off-by: Seungbaek Hong <[email protected]>
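A cast layer only converts the element type of a tensor; the shape and layout are unchanged. As a minimal sketch of that semantics (NumPy illustration, not the nntrainer implementation):

```python
import numpy as np

def cast_forward(x, target_dtype):
    # Type casting only: element values are converted, shape is unchanged.
    return x.astype(target_dtype)

fp32 = np.array([1.5, -2.25, 3.0], dtype=np.float32)
fp16 = cast_forward(fp32, np.float16)   # float32 -> float16
back = cast_forward(fp16, np.float32)   # lossless here: these values fit fp16
```

Casting float32 to float16 and back is lossless only when the values are exactly representable in fp16, as in this example; in general the downcast rounds.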
For mapping operation units to ONNX, I reverted the weight layer to a structure that holds only one weight.

**Self evaluation:**
1. Build test: [*]Passed [ ]Failed [ ]Skipped
2. Run test: [*]Passed [ ]Failed [ ]Skipped

Signed-off-by: Seungbaek Hong <[email protected]>
- Added the "How to run ONNX Model with NNTrainer" document (how-to-run-onnx-model.md).
- Since operations other than the add operation have not yet been merged into the main branch, the list of supported and planned operations has not been documented. The documentation will be updated once several additional operations are added to the main branch.

Signed-off-by: Seungbaek Hong <[email protected]>
- Refactor ONNX interpreter operator registration to use a handler
structure instead of an if-else chain
- Add logic to read and convert layer properties
- Add simplified Attention Block example (removing rope, gqa, etc.)
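The handler-based registration can be sketched as a lookup table keyed by operator type. This is a hypothetical illustration in Python (the actual interpreter is C++, and the handler names here are invented):

```python
# Hypothetical sketch of handler-based operator registration, replacing
# a long if-else chain with a per-op-type lookup table.
OP_HANDLERS = {}

def register_op(op_type):
    def wrap(fn):
        OP_HANDLERS[op_type] = fn
        return fn
    return wrap

@register_op("MatMul")
def handle_matmul(node):
    return f"matmul layer for {node['name']}"

@register_op("Softmax")
def handle_softmax(node):
    return f"activation(softmax) layer for {node['name']}"

def interpret(node):
    # Dispatch by op type; unknown ops fail loudly instead of falling
    # through a chain of if-else branches.
    handler = OP_HANDLERS.get(node["op_type"])
    if handler is None:
        raise NotImplementedError(node["op_type"])
    return handler(node)
```

Adding support for a new operator then means registering one new handler, without touching the dispatch logic.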
The execution result of the example app is as follows:
```
================================================================================
Layer name            Layer type    Output dimension    Input layer
================================================================================
input                 input         1:1:1:8
--------------------------------------------------------------------------------
input/generated_out   multiout      1:1:1:8             input
--------------------------------------------------------------------------------
onnx__matmul_88       weight        1:1:8:8
--------------------------------------------------------------------------------
onnx__matmul_83       weight        1:1:8:8
--------------------------------------------------------------------------------
v_proj_matmul         matmul        1:1:1:8             input/generated_out
                                                        onnx__matmul_83
--------------------------------------------------------------------------------
reshape_2             reshape       1:1:1:8             v_proj_matmul
--------------------------------------------------------------------------------
transpose_1           permute       1:1:1:8             reshape_2
--------------------------------------------------------------------------------
onnx__matmul_82       weight        1:1:8:8
--------------------------------------------------------------------------------
k_proj_matmul         matmul        1:1:1:8             input/generated_out
                                                        onnx__matmul_82
--------------------------------------------------------------------------------
reshape_1             reshape       1:1:1:8             k_proj_matmul
--------------------------------------------------------------------------------
transpose_2           permute       1:1:8:1             reshape_1
--------------------------------------------------------------------------------
onnx__matmul_66       weight        1:1:8:8
--------------------------------------------------------------------------------
q_proj_matmul         matmul        1:1:1:8             input/generated_out
                                                        onnx__matmul_66
--------------------------------------------------------------------------------
reshape               reshape       1:1:1:8             q_proj_matmul
--------------------------------------------------------------------------------
transpose             permute       1:1:1:8             reshape
--------------------------------------------------------------------------------
matmul                matmul        1:1:1:1             transpose
                                                        transpose_2
--------------------------------------------------------------------------------
softmax               activation    1:1:1:1             matmul
--------------------------------------------------------------------------------
cast                  cast          1:1:1:1             softmax
--------------------------------------------------------------------------------
matmul_1              matmul        1:1:1:8             cast
                                                        transpose_1
--------------------------------------------------------------------------------
transpose_3           permute       1:1:1:8             matmul_1
--------------------------------------------------------------------------------
reshape_3             reshape       1:1:1:8             transpose_3
--------------------------------------------------------------------------------
o_proj_matmul         matmul        1:1:1:8             reshape_3
                                                        onnx__matmul_88
================================================================================
```
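The graph above is a simplified single-head attention block: Q/K/V projections, a Q·Kᵀ score, softmax, a cast, and an output projection. The same dataflow can be sketched in NumPy (weights here are random placeholders standing in for the `onnx__matmul_*` weight layers; shapes follow the 1:1:1:8 dimensions above):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8  # hidden size, matching the 1:1:1:8 dims in the summary

x = rng.standard_normal((1, d)).astype(np.float32)    # input (seq len 1)
w_q = rng.standard_normal((d, d)).astype(np.float32)  # stands in for onnx__matmul_66
w_k = rng.standard_normal((d, d)).astype(np.float32)  # stands in for onnx__matmul_82
w_v = rng.standard_normal((d, d)).astype(np.float32)  # stands in for onnx__matmul_83
w_o = rng.standard_normal((d, d)).astype(np.float32)  # stands in for onnx__matmul_88

q = x @ w_q                        # q_proj_matmul
k = x @ w_k                        # k_proj_matmul
v = x @ w_v                        # v_proj_matmul
scores = q @ k.T                   # matmul (transpose_2 supplies k.T)
attn = np.exp(scores - scores.max())
attn /= attn.sum()                 # softmax
attn = attn.astype(np.float32)     # cast
out = (attn @ v) @ w_o             # matmul_1, then o_proj_matmul
```

With a single token the score matrix is 1x1, so the softmax trivially yields 1.0 and the output reduces to the value projection followed by the output projection.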
**Self evaluation:**
1. Build test: [*]Passed [ ]Failed [ ]Skipped
2. Run test: [*]Passed [ ]Failed [ ]Skipped
Signed-off-by: Seungbaek Hong <[email protected]>
Added "gather, slice, negative" layers for supporting ONNX models. These layer implementations were added to create graph connections during development of the ONNX interpreter, and will soon be replaced by actual layer implementations, along with unit tests, in other PRs.

**Self evaluation:**
1. Build test: [*]Passed [ ]Failed [ ]Skipped
2. Run test: [*]Passed [ ]Failed [ ]Skipped

Signed-off-by: Seungbaek Hong <[email protected]>
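The semantics these three layers need to reproduce map to the ONNX Gather, Slice, and Neg operators. A minimal NumPy sketch of each (illustration only, not the nntrainer code):

```python
import numpy as np

x = np.array([[10., 20., 30., 40.]], dtype=np.float32)

# gather: pick entries along an axis by index (ONNX Gather)
gathered = np.take(x, [2, 0], axis=1)   # [[30., 10.]]

# slice: a contiguous sub-range along an axis (ONNX Slice)
sliced = x[:, 1:3]                      # [[20., 30.]]

# negative: elementwise sign flip (ONNX Neg)
negated = -x                            # [[-10., -20., -30., -40.]]
```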
- This is a draft PR; I'll modify the commit message.
The execution result of the example app is as follows (llama 1b):
```
================================================================================
Layer name            Layer type    Output dimension    Input layer
================================================================================
onnx__add_6           input         1:1:1:1
--------------------------------------------------------------------------------
onnx__add_6/generat   multiout      1:1:1:1             onnx__add_6
--------------------------------------------------------------------------------
sin                   input         1:1:1:64
--------------------------------------------------------------------------------
sin/generated_out_0   multiout      1:1:1:64            sin
--------------------------------------------------------------------------------
...
--------------------------------------------------------------------------------
model_norm_cast_1     cast          1:1:1:2048          model_norm_mul
--------------------------------------------------------------------------------
model_norm_mul_1      multiply      1:1:1:2048          model_norm_weight
                                                        model_norm_cast_1
--------------------------------------------------------------------------------
lm_head_matmul        matmul        1:1:1:50304         model_norm_mul_1
                                                        onnx__matmul_3531
================================================================================
```
(Approximately 8,800 lines omitted; the total number of layers is estimated at around 4,000.)
**Self evaluation:**
1. Build test: [*]Passed [ ]Failed [ ]Skipped
2. Run test: [*]Passed [ ]Failed [ ]Skipped
Signed-off-by: Seungbaek Hong <[email protected]>
The input layer was identified during compilation based on the number of input connections (or a property). However, the weight layer also has no input connections, which caused some issues; this commit resolves them.

**Self evaluation:**
1. Build test: [*]Passed [ ]Failed [ ]Skipped
2. Run test: [*]Passed [ ]Failed [ ]Skipped

Signed-off-by: Seungbaek Hong <[email protected]>
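The ambiguity is that both graph inputs and weight layers have zero incoming connections, so connection count alone cannot identify inputs. A hypothetical sketch of disambiguating by also checking the layer type (invented helper, not nntrainer's actual compilation code):

```python
# Hypothetical sketch: classify connection-less nodes as graph inputs
# vs. weight layers, since connection count alone is ambiguous.
def classify_sources(nodes):
    """nodes: list of dicts with 'name', 'type', and 'inputs' keys."""
    inputs, weights = [], []
    for node in nodes:
        if node["inputs"]:
            continue  # has incoming connections: an ordinary layer
        # Zero connections: fall back on the layer type to disambiguate.
        (weights if node["type"] == "weight" else inputs).append(node["name"])
    return inputs, weights

graph = [
    {"name": "input", "type": "input", "inputs": []},
    {"name": "onnx__matmul_88", "type": "weight", "inputs": []},
    {"name": "o_proj_matmul", "type": "matmul",
     "inputs": ["input", "onnx__matmul_88"]},
]
ins, ws = classify_sources(graph)
```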
Signed-off-by: Sumon Nath <[email protected]>
baek2sm
reviewed
Nov 20, 2025
```cpp
                        std::numeric_limits<size_t>::max(), true, model_file_fd);
} else {
  // futures.emplace_back(std::async(std::launch::async, [&, node] {
  if (1 || !MMAP_READ) {
```
always true?
E2E inferencing of single token Qwen 3 1.7B model on nntrainer (x86)
Steps:
1. Generate the ONNX model, its reference result, and the weights using the scripts in Applications/onnx/python/qwen3.
2. After building nntrainer, run the binary built from Applications/onnx/jni/main.cpp. This generates the result in bin format (nntrainer_logits.bin).
3. Use compare.py in Applications/onnx/python to compare modelling_logits.bin and nntrainer_logits.bin.
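The contents of compare.py are not shown in this PR page; a minimal sketch of what such a comparison might do, assuming both .bin files store raw float32 logits in the same order:

```python
import numpy as np

def compare_logits(ref_path, test_path):
    # Assumes both files hold raw float32 logits written in the same order.
    ref = np.fromfile(ref_path, dtype=np.float32)
    test = np.fromfile(test_path, dtype=np.float32)
    assert ref.shape == test.shape, "logit counts differ"
    max_err = float(np.abs(ref - test).max())
    # For single-token generation the key check is that the argmax
    # (the predicted next token) agrees between the two runs.
    same_argmax = bool(ref.argmax() == test.argmax())
    return max_err, same_argmax
```

A small max absolute error with matching argmax indicates the nntrainer run reproduces the reference logits within numerical tolerance.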
Self evaluation:
Signed-off-by: Sachin Singh [email protected]