[Draft][ONNX] E2E inferencing of single token Qwen 3 1.7B model on nntrainer (x86)

Steps:

To generate the ONNX model, its reference result, and its weights, go to Applications/onnx/python/qwen3:

  1. Run onnx_exporter.py. This creates the model, its weights, and the model's reference output in bin format (modelling_logits.bin).
  2. Run create_bin.py. This converts the ONNX layer weights into bin files, which NNTrainer loads for inference (see the sketch below).
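
For reference, a minimal sketch of what a weight-dump step like create_bin.py might do (the model file name and the float32 layout are assumptions, not the actual script):

```python
# Hypothetical sketch: dump each ONNX initializer (weight) to a raw
# float32 .bin file that a C++ runtime can read back directly.
import onnx
from onnx import numpy_helper

model = onnx.load("qwen3.onnx")  # assumed model file name
for init in model.graph.initializer:
    tensor = numpy_helper.to_array(init)
    # Sanitize the tensor name so it is usable as a file name.
    fname = init.name.replace("/", "_").replace("::", "_") + ".bin"
    tensor.astype("float32").tofile(fname)
```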

After building nntrainer, run the binary built from Applications/onnx/jni/main.cpp. This generates the NNTrainer result in bin format (nntrainer_logits.bin).

Use compare.py in Applications/onnx/python to compare modelling_logits.bin and nntrainer_logits.bin.
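
A minimal sketch of the comparison step (assuming both files hold raw float32 logits; this is not the actual compare.py):

```python
# Hypothetical sketch: compare the reference logits from the ONNX
# export against the logits produced by the NNTrainer binary.
import numpy as np

ref = np.fromfile("modelling_logits.bin", dtype=np.float32)
out = np.fromfile("nntrainer_logits.bin", dtype=np.float32)

assert ref.shape == out.shape, "logit counts differ"
max_abs_diff = np.abs(ref - out).max()
print(f"max abs diff: {max_abs_diff:.6e}")
print("match within tolerance:", np.allclose(ref, out, atol=1e-4))
```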

**Self evaluation:**

  1. Build test: [ ]Passed [ ]Failed [X]Skipped
  2. Run test: [ ]Passed [ ]Failed [X]Skipped

Signed-off-by: Sachin Singh [email protected]

Added a cast layer (type casting); a minimal sketch of the Cast semantics follows the notes below.

- There is no model unit test yet.
- In some cases, optimization may be considered (such as fusing only one
layer in front), but for now only the basic implementation is included;
optimization will be applied later through the ONNX graph optimization
work.
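
For context, the ONNX Cast operator simply converts a tensor's element type; a numpy sketch of its semantics (illustrative only, not the NNTrainer implementation):

```python
# Hypothetical sketch of ONNX Cast semantics: convert element dtype.
import numpy as np

x = np.array([0.25, 1.5, -2.0], dtype=np.float16)
y = x.astype(np.float32)  # Cast(to=FLOAT)
print(y.dtype)  # float32
```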

**Self evaluation:**
1. Build test: [*]Passed [ ]Failed [ ]Skipped
2. Run test: [ ]Passed [ ]Failed [*]Skipped

Signed-off-by: Seungbaek Hong <[email protected]>
To map NNTrainer operation units onto ONNX operators, I reverted the
weight layer to a structure that holds only one weight.

**Self evaluation:**
1. Build test: [*]Passed [ ]Failed [ ]Skipped
2. Run test: [*]Passed [ ]Failed [ ]Skipped

Signed-off-by: Seungbaek Hong <[email protected]>
- added "How to run ONNX Model with NNTrainer" document
(how-to-run-onnx-model.md)
- Since operations other than the add operation have not yet been merged
into the main branch, the list of supported and planned operations has
not been documented. The documentation will be updated once several
additional operations are added to the main branch

Signed-off-by: Seungbaek Hong <[email protected]>
- Refactor ONNX interpreter operator registration to use a handler
structure instead of if-else chains (see the sketch below)
- Add logic to read and convert layer properties
- Add a simplified attention block example (removing RoPE, GQA, etc.)
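
A minimal sketch of the handler-table pattern (in Python for brevity; the actual interpreter is C++, and all names here are hypothetical):

```python
# Hypothetical sketch: map ONNX op types to handler functions instead
# of dispatching through a long if-else chain.
def handle_matmul(node):
    return {"type": "matmul", "name": node["name"]}

def handle_softmax(node):
    return {"type": "activation", "activation": "softmax",
            "name": node["name"]}

OP_HANDLERS = {
    "MatMul": handle_matmul,
    "Softmax": handle_softmax,
    # A new operator only needs a new entry here.
}

def interpret(node):
    handler = OP_HANDLERS.get(node["op_type"])
    if handler is None:
        raise NotImplementedError(f"unsupported op: {node['op_type']}")
    return handler(node)
```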

The execution result of the example app is as follows:
```
================================================================================
          Layer name          Layer type    Output dimension         Input layer
================================================================================
               input               input             1:1:1:8
--------------------------------------------------------------------------------
 input/generated_out            multiout             1:1:1:8               input
--------------------------------------------------------------------------------
     onnx__matmul_88              weight             1:1:8:8
--------------------------------------------------------------------------------
     onnx__matmul_83              weight             1:1:8:8
--------------------------------------------------------------------------------
       v_proj_matmul              matmul             1:1:1:8 input/generated_out
                                                                 onnx__matmul_83
--------------------------------------------------------------------------------
           reshape_2             reshape             1:1:1:8       v_proj_matmul
--------------------------------------------------------------------------------
         transpose_1             permute             1:1:1:8           reshape_2
--------------------------------------------------------------------------------
     onnx__matmul_82              weight             1:1:8:8
--------------------------------------------------------------------------------
       k_proj_matmul              matmul             1:1:1:8 input/generated_out
                                                                 onnx__matmul_82
--------------------------------------------------------------------------------
           reshape_1             reshape             1:1:1:8       k_proj_matmul
--------------------------------------------------------------------------------
         transpose_2             permute             1:1:8:1           reshape_1
--------------------------------------------------------------------------------
     onnx__matmul_66              weight             1:1:8:8
--------------------------------------------------------------------------------
       q_proj_matmul              matmul             1:1:1:8 input/generated_out
                                                                 onnx__matmul_66
--------------------------------------------------------------------------------
             reshape             reshape             1:1:1:8       q_proj_matmul
--------------------------------------------------------------------------------
           transpose             permute             1:1:1:8             reshape
--------------------------------------------------------------------------------
              matmul              matmul             1:1:1:1           transpose
                                                                     transpose_2
--------------------------------------------------------------------------------
             softmax          activation             1:1:1:1              matmul
--------------------------------------------------------------------------------
                cast                cast             1:1:1:1             softmax
--------------------------------------------------------------------------------
            matmul_1              matmul             1:1:1:8                cast
                                                                     transpose_1
--------------------------------------------------------------------------------
         transpose_3             permute             1:1:1:8            matmul_1
--------------------------------------------------------------------------------
           reshape_3             reshape             1:1:1:8         transpose_3
--------------------------------------------------------------------------------
       o_proj_matmul              matmul             1:1:1:8           reshape_3
                                                                 onnx__matmul_88
================================================================================
```

**Self evaluation:**
1. Build test: [*]Passed [ ]Failed [ ]Skipped
2. Run test: [*]Passed [ ]Failed [ ]Skipped

Signed-off-by: Seungbaek Hong <[email protected]>
Added "gather, slice, negative" layers for supporting onnx model

These layer implementations were added to create graph connections
during development of the ONNX interpreter; they will soon be replaced
by actual layer implementations, along with unit tests, in other PRs. A
sketch of their ONNX semantics is shown below.
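
For reference, a numpy sketch of the ONNX operators these placeholder layers correspond to (illustrative only; not the NNTrainer code):

```python
# Hypothetical sketch of the ONNX ops behind the placeholder layers.
import numpy as np

x = np.arange(12, dtype=np.float32).reshape(3, 4)

gathered = np.take(x, [0, 2], axis=0)  # Gather: pick rows 0 and 2
sliced = x[:, 1:3]                     # Slice: columns [1, 3)
negated = -x                           # Neg: elementwise negation
```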

**Self evaluation:**
1. Build test: [*]Passed [ ]Failed [ ]Skipped
2. Run test: [*]Passed [ ]Failed [ ]Skipped

Signed-off-by: Seungbaek Hong <[email protected]>
- This is a draft PR; I'll revise the commit message later.

The execution result of the example app is as follows (Llama 1B):
```
================================================================================
          Layer name          Layer type    Output dimension         Input layer
================================================================================
         onnx__add_6               input             1:1:1:1
--------------------------------------------------------------------------------
 onnx__add_6/generat            multiout             1:1:1:1         onnx__add_6
--------------------------------------------------------------------------------
                 sin               input            1:1:1:64
--------------------------------------------------------------------------------
 sin/generated_out_0            multiout            1:1:1:64                 sin
--------------------------------------------------------------------------------

...

--------------------------------------------------------------------------------
   model_norm_cast_1                cast          1:1:1:2048      model_norm_mul
--------------------------------------------------------------------------------
    model_norm_mul_1            multiply          1:1:1:2048   model_norm_weight
                                                              model_norm_cast_1
--------------------------------------------------------------------------------
      lm_head_matmul              matmul         1:1:1:50304    model_norm_mul_1
                                                               onnx__matmul_3531
================================================================================
```
(Approximately 8,800 lines of output omitted; the total number of layers is estimated at around 4,000.)

**Self evaluation:**
1. Build test: [*]Passed [ ]Failed [ ]Skipped
2. Run test: [*]Passed [ ]Failed [ ]Skipped

Signed-off-by: Seungbaek Hong <[email protected]>
Input layers were identified during compilation based on the number of
input connections (or the input property). However, a weight layer also
has no input connection, which caused some issues; this commit resolves
them (see the sketch below).
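
A sketch of the ambiguity (hypothetical names; the actual check lives in the C++ graph compiler): a node with zero input connections is not necessarily an input layer, so the layer type must also be consulted.

```python
# Hypothetical sketch: both input and weight layers have no incoming
# connections, so connection count alone cannot identify inputs.
def is_input_layer(node) -> bool:
    has_no_inputs = len(node["input_connections"]) == 0
    return has_no_inputs and node["layer_type"] != "weight"
```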

**Self evaluation:**
1. Build test: [*]Passed [ ]Failed [ ]Skipped
2. Run test: [*]Passed [ ]Failed [ ]Skipped

Signed-off-by: Seungbaek Hong <[email protected]>
Signed-off-by: Sumon Nath <[email protected]>
sachin-nntrainer changed the title from "E2E inferencing of single token Qwen 3 1.7B model on nntrainer (x86)" to "[Draft][ONNX] E2E inferencing of single token Qwen 3 1.7B model on nntrainer (x86)" on Nov 19, 2025.
A reviewer (Contributor) left an inline comment on this snippet:

```cpp
    std::numeric_limits<size_t>::max(), true, model_file_fd);
} else {
  // futures.emplace_back(std::async(std::launch::async, [&, node] {
  if (1 || !MMAP_READ) {
```

always true? (`1 || !MMAP_READ` always evaluates to true, so the `!MMAP_READ` check is dead and this branch is taken unconditionally.)
