[Draft][ONNX] E2E inferencing of single token Qwen 3 1.7B model on nntrainer (x86)

Steps:

To generate the ONNX model, its reference result, and its weights, go to Applications/onnx/python/qwen3:

  1. Run onnx_exporter.py. This creates the model, its weights, and the model's reference output in bin format (modelling_logits.bin).
  2. Run create_bin.py. This converts the ONNX layer weights into bin files, which NNTrainer loads for inference (see the sketch below).
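
For reference, a minimal sketch of what a weight-dump step like create_bin.py might do (the model file name and the float32 layout are assumptions, not the actual script):

```python
# Hypothetical sketch: dump each ONNX initializer (weight) to a raw
# float32 .bin file that a C++ runtime can read back directly.
import onnx
from onnx import numpy_helper

model = onnx.load("qwen3.onnx")  # assumed model file name
for init in model.graph.initializer:
    tensor = numpy_helper.to_array(init)
    # Sanitize the tensor name so it is usable as a file name.
    fname = init.name.replace("/", "_").replace("::", "_") + ".bin"
    tensor.astype("float32").tofile(fname)
```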

After building nntrainer, run the binary built from Applications/onnx/jni/main.cpp. This generates the NNTrainer result in bin format (nntrainer_logits.bin).

Use compare.py in Applications/onnx/python to compare modelling_logits.bin and nntrainer_logits.bin.
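
A minimal sketch of the comparison step (assuming both files hold raw float32 logits; this is not the actual compare.py):

```python
# Hypothetical sketch: compare the reference logits from the ONNX
# export against the logits produced by the NNTrainer binary.
import numpy as np

ref = np.fromfile("modelling_logits.bin", dtype=np.float32)
out = np.fromfile("nntrainer_logits.bin", dtype=np.float32)

assert ref.shape == out.shape, "logit counts differ"
max_abs_diff = np.abs(ref - out).max()
print(f"max abs diff: {max_abs_diff:.6e}")
print("match within tolerance:", np.allclose(ref, out, atol=1e-4))
```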

**Self evaluation:**

  1. Build test: [ ]Passed [ ]Failed [X]Skipped
  2. Run test: [ ]Passed [ ]Failed [X]Skipped

Signed-off-by: Sachin Singh [email protected]

Added a cast layer (type casting); a minimal sketch of the Cast semantics follows the notes below.

- There is no model unit test yet.
- In some cases, optimization may be considered (such as fusing only one
layer in front), but for now only the basic implementation is included;
optimization will be applied later through the ONNX graph optimization
work.
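
For context, the ONNX Cast operator simply converts a tensor's element type; a numpy sketch of its semantics (illustrative only, not the NNTrainer implementation):

```python
# Hypothetical sketch of ONNX Cast semantics: convert element dtype.
import numpy as np

x = np.array([0.25, 1.5, -2.0], dtype=np.float16)
y = x.astype(np.float32)  # Cast(to=FLOAT)
print(y.dtype)  # float32
```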

**Self evaluation:**
1. Build test: [*]Passed [ ]Failed [ ]Skipped
2. Run test: [ ]Passed [ ]Failed [*]Skipped

Signed-off-by: Seungbaek Hong <[email protected]>
To map NNTrainer operation units onto ONNX operators, I reverted the
weight layer to a structure that holds only one weight.

**Self evaluation:**
1. Build test: [*]Passed [ ]Failed [ ]Skipped
2. Run test: [*]Passed [ ]Failed [ ]Skipped

Signed-off-by: Seungbaek Hong <[email protected]>
- added "How to run ONNX Model with NNTrainer" document
(how-to-run-onnx-model.md)
- Since operations other than the add operation have not yet been merged
into the main branch, the list of supported and planned operations has
not been documented. The documentation will be updated once several
additional operations are added to the main branch

Signed-off-by: Seungbaek Hong <[email protected]>
- Refactor ONNX interpreter operator registration to use a handler
structure instead of if-else chains (see the sketch below)
- Add logic to read and convert layer properties
- Add a simplified attention block example (removing RoPE, GQA, etc.)
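
A minimal sketch of the handler-table pattern (in Python for brevity; the actual interpreter is C++, and all names here are hypothetical):

```python
# Hypothetical sketch: map ONNX op types to handler functions instead
# of dispatching through a long if-else chain.
def handle_matmul(node):
    return {"type": "matmul", "name": node["name"]}

def handle_softmax(node):
    return {"type": "activation", "activation": "softmax",
            "name": node["name"]}

OP_HANDLERS = {
    "MatMul": handle_matmul,
    "Softmax": handle_softmax,
    # A new operator only needs a new entry here.
}

def interpret(node):
    handler = OP_HANDLERS.get(node["op_type"])
    if handler is None:
        raise NotImplementedError(f"unsupported op: {node['op_type']}")
    return handler(node)
```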

The execution result of the example app is as follows:
```
================================================================================
          Layer name          Layer type    Output dimension         Input layer
================================================================================
               input               input             1:1:1:8
--------------------------------------------------------------------------------
 input/generated_out            multiout             1:1:1:8               input
--------------------------------------------------------------------------------
     onnx__matmul_88              weight             1:1:8:8
--------------------------------------------------------------------------------
     onnx__matmul_83              weight             1:1:8:8
--------------------------------------------------------------------------------
       v_proj_matmul              matmul             1:1:1:8 input/generated_out
                                                                 onnx__matmul_83
--------------------------------------------------------------------------------
           reshape_2             reshape             1:1:1:8       v_proj_matmul
--------------------------------------------------------------------------------
         transpose_1             permute             1:1:1:8           reshape_2
--------------------------------------------------------------------------------
     onnx__matmul_82              weight             1:1:8:8
--------------------------------------------------------------------------------
       k_proj_matmul              matmul             1:1:1:8 input/generated_out
                                                                 onnx__matmul_82
--------------------------------------------------------------------------------
           reshape_1             reshape             1:1:1:8       k_proj_matmul
--------------------------------------------------------------------------------
         transpose_2             permute             1:1:8:1           reshape_1
--------------------------------------------------------------------------------
     onnx__matmul_66              weight             1:1:8:8
--------------------------------------------------------------------------------
       q_proj_matmul              matmul             1:1:1:8 input/generated_out
                                                                 onnx__matmul_66
--------------------------------------------------------------------------------
             reshape             reshape             1:1:1:8       q_proj_matmul
--------------------------------------------------------------------------------
           transpose             permute             1:1:1:8             reshape
--------------------------------------------------------------------------------
              matmul              matmul             1:1:1:1           transpose
                                                                     transpose_2
--------------------------------------------------------------------------------
             softmax          activation             1:1:1:1              matmul
--------------------------------------------------------------------------------
                cast                cast             1:1:1:1             softmax
--------------------------------------------------------------------------------
            matmul_1              matmul             1:1:1:8                cast
                                                                     transpose_1
--------------------------------------------------------------------------------
         transpose_3             permute             1:1:1:8            matmul_1
--------------------------------------------------------------------------------
           reshape_3             reshape             1:1:1:8         transpose_3
--------------------------------------------------------------------------------
       o_proj_matmul              matmul             1:1:1:8           reshape_3
                                                                 onnx__matmul_88
================================================================================
```

**Self evaluation:**
1. Build test: [*]Passed [ ]Failed [ ]Skipped
2. Run test: [*]Passed [ ]Failed [ ]Skipped

Signed-off-by: Seungbaek Hong <[email protected]>
Added "gather, slice, negative" layers for supporting onnx model

These layer implementations were added to create graph connections
during development of the ONNX interpreter; they will soon be replaced
by actual layer implementations, along with unit tests, in other PRs. A
sketch of their ONNX semantics is shown below.
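
For reference, a numpy sketch of the ONNX operators these placeholder layers correspond to (illustrative only; not the NNTrainer code):

```python
# Hypothetical sketch of the ONNX ops behind the placeholder layers.
import numpy as np

x = np.arange(12, dtype=np.float32).reshape(3, 4)

gathered = np.take(x, [0, 2], axis=0)  # Gather: pick rows 0 and 2
sliced = x[:, 1:3]                     # Slice: columns [1, 3)
negated = -x                           # Neg: elementwise negation
```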

**Self evaluation:**
1. Build test: [*]Passed [ ]Failed [ ]Skipped
2. Run test: [*]Passed [ ]Failed [ ]Skipped

Signed-off-by: Seungbaek Hong <[email protected]>
- This is a draft PR; I'll revise the commit message later.

The execution result of the example app is as follows (Llama 1B):
```
================================================================================
          Layer name          Layer type    Output dimension         Input layer
================================================================================
         onnx__add_6               input             1:1:1:1
--------------------------------------------------------------------------------
 onnx__add_6/generat            multiout             1:1:1:1         onnx__add_6
--------------------------------------------------------------------------------
                 sin               input            1:1:1:64
--------------------------------------------------------------------------------
 sin/generated_out_0            multiout            1:1:1:64                 sin
--------------------------------------------------------------------------------

...

--------------------------------------------------------------------------------
   model_norm_cast_1                cast          1:1:1:2048      model_norm_mul
--------------------------------------------------------------------------------
    model_norm_mul_1            multiply          1:1:1:2048   model_norm_weight
                                                              model_norm_cast_1
--------------------------------------------------------------------------------
      lm_head_matmul              matmul         1:1:1:50304    model_norm_mul_1
                                                               onnx__matmul_3531
================================================================================
```
(Approximately 8,800 lines of output omitted; the total number of layers is estimated at around 4,000.)

**Self evaluation:**
1. Build test: [*]Passed [ ]Failed [ ]Skipped
2. Run test: [*]Passed [ ]Failed [ ]Skipped

Signed-off-by: Seungbaek Hong <[email protected]>
Input layers were identified during compilation based on the number of
input connections (or the input property). However, a weight layer also
has no input connection, which caused some issues; this commit resolves
them (see the sketch below).
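
A sketch of the ambiguity (hypothetical names; the actual check lives in the C++ graph compiler): a node with zero input connections is not necessarily an input layer, so the layer type must also be consulted.

```python
# Hypothetical sketch: both input and weight layers have no incoming
# connections, so connection count alone cannot identify inputs.
def is_input_layer(node) -> bool:
    has_no_inputs = len(node["input_connections"]) == 0
    return has_no_inputs and node["layer_type"] != "weight"
```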

**Self evaluation:**
1. Build test: [*]Passed [ ]Failed [ ]Skipped
2. Run test: [*]Passed [ ]Failed [ ]Skipped

Signed-off-by: Seungbaek Hong <[email protected]>
Signed-off-by: Sumon Nath <[email protected]>
sachin-nntrainer changed the title from "E2E inferencing of single token Qwen 3 1.7B model on nntrainer (x86)" to "[Draft][ONNX] E2E inferencing of single token Qwen 3 1.7B model on nntrainer (x86)" on Nov 19, 2025.
A reviewer (Contributor) left an inline comment on this snippet:

```cpp
    std::numeric_limits<size_t>::max(), true, model_file_fd);
} else {
  // futures.emplace_back(std::async(std::launch::async, [&, node] {
  if (1 || !MMAP_READ) {
```

always true? (`1 || !MMAP_READ` always evaluates to true, so the `!MMAP_READ` check is dead and this branch is taken unconditionally.)
