Conversation

@niket-agarwal
Contributor
This PR enables Android build support for ONNX models by integrating the protobuf and abseil-cpp libraries with architecture-specific compilation using the Android NDK. It implements the ONNX Qwen3 model inference pipeline with the proper JNI layer configuration and Android.mk setup for running models on device.

Key Changes:

  • Android-specific protobuf and abseil-cpp library compilation using Android NDK
  • ONNX interpreter integration with JNI layer for Android
  • Architecture-specific build configurations (arm64-v8a)

Current Status:

Inference currently runs for Qwen3 models on Android devices, but we are investigating accuracy issues. The core Android build infrastructure and ONNX model loading are functional.

Self evaluation:

Build test: [ ]Passed [ ]Failed [X]Skipped
Run test: [ ]Passed [ ]Failed [X]Skipped
Signed-off-by: Niket Agarwal [email protected]

sachin-nntrainer and others added 20 commits October 14, 2025 17:53
Added a cast layer (type casting).

- There is no model unit test yet.
- In some cases, optimization may be considered (such as fusing with the
  single preceding layer), but for now only the basic implementation is
  included; optimization will be applied later through the ONNX graph
  optimization work.
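The basic behavior described above amounts to an element-wise type conversion that leaves the tensor shape unchanged. A minimal sketch, assuming a simple vector-based representation; `castForward` is a hypothetical helper, not the actual nntrainer layer API:

```cpp
// Sketch of a cast layer forward pass: per-element type conversion,
// no fusion or graph optimization (as noted in the commit message).
#include <cstddef>
#include <cstdint>
#include <vector>

// castForward: hypothetical helper, not the actual nntrainer API.
template <typename From, typename To>
std::vector<To> castForward(const std::vector<From> &in) {
  std::vector<To> out(in.size()); // output has the same shape as the input
  for (std::size_t i = 0; i < in.size(); ++i)
    out[i] = static_cast<To>(in[i]); // element-wise static_cast
  return out;
}
```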

**Self evaluation:**
1. Build test: [*]Passed [ ]Failed [ ]Skipped
2. Run test: [ ]Passed [ ]Failed [*]Skipped

Signed-off-by: Seungbaek Hong <[email protected]>
To map operation units one-to-one with ONNX, I reverted the weight layer to a
structure that holds only one weight.

**Self evaluation:**
1. Build test: [*]Passed [ ]Failed [ ]Skipped
2. Run test: [*]Passed [ ]Failed [ ]Skipped

Signed-off-by: Seungbaek Hong <[email protected]>
- Added a "How to run ONNX Model with NNTrainer" document
(how-to-run-onnx-model.md)
- Since operations other than the add operation have not yet been merged
into the main branch, the list of supported and planned operations has
not been documented. The documentation will be updated once several
additional operations land in the main branch.

Signed-off-by: Seungbaek Hong <[email protected]>
- Refactor ONNX interpreter operator registration to use a handler
structure instead of if-else statements
- Add logic to read and convert layer properties
- Add a simplified Attention Block example (removing RoPE, GQA, etc.)
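The handler-structure refactor in the first bullet can be sketched as table-driven dispatch: each ONNX op type maps to a callable, so adding an operator means adding a map entry rather than another if-else branch. The class and function names below are illustrative, not the actual interpreter API:

```cpp
// Sketch of handler-based operator registration replacing an if-else chain.
#include <functional>
#include <stdexcept>
#include <string>
#include <unordered_map>

// OpHandler: illustrative signature; a real handler would build a layer
// from the ONNX node rather than return a string.
using OpHandler = std::function<std::string(const std::string &node_name)>;

class OnnxOpRegistry {
public:
  void registerOp(const std::string &op_type, OpHandler h) {
    handlers[op_type] = std::move(h);
  }
  std::string dispatch(const std::string &op_type,
                       const std::string &node_name) const {
    auto it = handlers.find(op_type); // single lookup replaces the if-else chain
    if (it == handlers.end())
      throw std::runtime_error("unsupported ONNX op: " + op_type);
    return it->second(node_name);
  }

private:
  std::unordered_map<std::string, OpHandler> handlers;
};
```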

The execution result of the example app is as follows:
```
================================================================================
          Layer name          Layer type    Output dimension         Input layer
================================================================================
               input               input             1:1:1:8
--------------------------------------------------------------------------------
 input/generated_out            multiout             1:1:1:8               input
--------------------------------------------------------------------------------
     onnx__matmul_88              weight             1:1:8:8
--------------------------------------------------------------------------------
     onnx__matmul_83              weight             1:1:8:8
--------------------------------------------------------------------------------
       v_proj_matmul              matmul             1:1:1:8 input/generated_out
                                                                 onnx__matmul_83
--------------------------------------------------------------------------------
           reshape_2             reshape             1:1:1:8       v_proj_matmul
--------------------------------------------------------------------------------
         transpose_1             permute             1:1:1:8           reshape_2
--------------------------------------------------------------------------------
     onnx__matmul_82              weight             1:1:8:8
--------------------------------------------------------------------------------
       k_proj_matmul              matmul             1:1:1:8 input/generated_out
                                                                 onnx__matmul_82
--------------------------------------------------------------------------------
           reshape_1             reshape             1:1:1:8       k_proj_matmul
--------------------------------------------------------------------------------
         transpose_2             permute             1:1:8:1           reshape_1
--------------------------------------------------------------------------------
     onnx__matmul_66              weight             1:1:8:8
--------------------------------------------------------------------------------
       q_proj_matmul              matmul             1:1:1:8 input/generated_out
                                                                 onnx__matmul_66
--------------------------------------------------------------------------------
             reshape             reshape             1:1:1:8       q_proj_matmul
--------------------------------------------------------------------------------
           transpose             permute             1:1:1:8             reshape
--------------------------------------------------------------------------------
              matmul              matmul             1:1:1:1           transpose
                                                                     transpose_2
--------------------------------------------------------------------------------
             softmax          activation             1:1:1:1              matmul
--------------------------------------------------------------------------------
                cast                cast             1:1:1:1             softmax
--------------------------------------------------------------------------------
            matmul_1              matmul             1:1:1:8                cast
                                                                     transpose_1
--------------------------------------------------------------------------------
         transpose_3             permute             1:1:1:8            matmul_1
--------------------------------------------------------------------------------
           reshape_3             reshape             1:1:1:8         transpose_3
--------------------------------------------------------------------------------
       o_proj_matmul              matmul             1:1:1:8           reshape_3
                                                                 onnx__matmul_88
================================================================================
```

**Self evaluation:**
1. Build test: [*]Passed [ ]Failed [ ]Skipped
2. Run test: [*]Passed [ ]Failed [ ]Skipped

Signed-off-by: Seungbaek Hong <[email protected]>
Added "gather, slice, negative" layers to support ONNX models.

These layer implementations were added to create graph connections
during development of the ONNX interpreter and will soon be replaced
by actual layer implementations, along with unit tests, in other PRs.

**Self evaluation:**
1. Build test: [*]Passed [ ]Failed [ ]Skipped
2. Run test: [*]Passed [ ]Failed [ ]Skipped

Signed-off-by: Seungbaek Hong <[email protected]>
- This is a draft PR; the commit message will be revised.

The execution result of the example app is as follows (Llama 1B):
```
================================================================================
          Layer name          Layer type    Output dimension         Input layer
================================================================================
         onnx__add_6               input             1:1:1:1
--------------------------------------------------------------------------------
 onnx__add_6/generat            multiout             1:1:1:1         onnx__add_6
--------------------------------------------------------------------------------
                 sin               input            1:1:1:64
--------------------------------------------------------------------------------
 sin/generated_out_0            multiout            1:1:1:64                 sin
--------------------------------------------------------------------------------

...

--------------------------------------------------------------------------------
   model_norm_cast_1                cast          1:1:1:2048      model_norm_mul
--------------------------------------------------------------------------------
    model_norm_mul_1            multiply          1:1:1:2048   model_norm_weight
                                                              model_norm_cast_1
--------------------------------------------------------------------------------
      lm_head_matmul              matmul         1:1:1:50304    model_norm_mul_1
                                                               onnx__matmul_3531
================================================================================
```
(Approximately 8,800 lines omitted; the total number of layers is estimated at around 4,000.)

**Self evaluation:**
1. Build test: [*]Passed [ ]Failed [ ]Skipped
2. Run test: [*]Passed [ ]Failed [ ]Skipped

Signed-off-by: Seungbaek Hong <[email protected]>
The input layer was identified during compilation based on the number of
input connections (or a property). However, the weight layer also has no
input connections, which caused some issues. This commit resolves them.
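The fix described above can be sketched as classifying by layer type as well as by connection count, so that connectionless weight layers are no longer mistaken for model inputs. The struct and helper below are illustrative, not the actual compiler code:

```cpp
// Sketch: a layer with zero input connections is only treated as a model
// input if it is not a weight (constant) layer.
#include <string>

struct LayerInfo {
  std::string type;         // e.g. "input", "weight", "matmul"
  unsigned num_input_conns; // number of incoming graph connections
};

// isModelInput: hypothetical helper, not the actual nntrainer code.
bool isModelInput(const LayerInfo &l) {
  return l.num_input_conns == 0 && l.type != "weight";
}
```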

**Self evaluation:**
1. Build test: [*]Passed [ ]Failed [ ]Skipped
2. Run test: [*]Passed [ ]Failed [ ]Skipped

Signed-off-by: Seungbaek Hong <[email protected]>
Signed-off-by: Sumon Nath <[email protected]>
…uf and abseil-cpp libraries with architecture-specific compilation using Android NDK.

- Implemented ONNX Qwen model inference pipeline with proper JNI layer configuration and Android.mk setup for running models on device.

Signed-off-by: Niket Agarwal <[email protected]>
```
namespace nntrainer {

void SliceLayer::finalize(InitLayerContext &context) {
  unsigned int axis = std::get<props::Axis>(slice_props).get();
```

You are accessing the local variable axis and not updating or accessing this->axis. Don't you need to update this->axis?

Also, shadowing isn't good: please do not shadow class members with local variables. It will soon be prohibited.

```
  break;
case 2:
  outDeriv.addValue(b, i, j, selected, inDerivValue, 1);
default:
```

What if axis == 3?

Also, the default branch should handle error/corner cases.
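A minimal sketch of the pattern the reviewer is asking for: every case ends with a break (so no case falls through into default), and the default branch rejects unexpected axis values instead of silently doing nothing. The function and its effect are illustrative, not the actual gather layer code:

```cpp
// Sketch: per-axis switch with explicit breaks and an error-handling default.
#include <stdexcept>

// applyAxis: hypothetical helper standing in for the per-axis work.
void applyAxis(int axis, int &counter) {
  switch (axis) {
  case 1:
    counter += 1;
    break; // each case ends with break...
  case 2:
    counter += 2;
    break;
  case 3:
    counter += 3;
    break; // ...including the last non-default case
  default:
    // default handles the error/corner case rather than falling through
    throw std::invalid_argument("axis must be between 1 and 3");
  }
}
```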

```
  break;
case 3:
  output.setValue(b, i, j, k, input.getValue(b, i, j, selected));
default:
```

Is it OK to ignore axis = 0?


That exception case is handled in the finalize function:

```
if (axis < 1 || axis > 3) {
  throw std::invalid_argument(
    "The axis property of GatherLayer should be between 1 and 3.");
}
```

@myungjoo left a comment

Do not add generated files (onnx.pb.cc/h).

```
case onnx::TensorProto::FLOAT16:
  return "FP16";
case onnx::TensorProto::INT64:
  return "FP32";
```

Is INT64 --> FP32 OK? (8 bytes --> 4 bytes)
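The concern can be checked directly: FP32 has a 24-bit significand, so int64 values with magnitude above 2^24 are generally not representable exactly and do not survive the round trip. A small illustrative check, not project code:

```cpp
// Sketch: demonstrate precision loss when narrowing int64 to float (FP32).
#include <cstdint>

// roundTripsExactly: illustrative helper, not part of the PR.
bool roundTripsExactly(int64_t v) {
  float f = static_cast<float>(v);     // narrowing: 8 bytes -> 4 bytes
  return static_cast<int64_t>(f) == v; // exact only for small magnitudes
}
```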

github-actions bot commented Dec 5, 2025

This PR is stale because it has been open 14 days with no activity. Remove stale label or comment or this will be closed in 3 days.

@github-actions github-actions bot added the Stale label Dec 5, 2025