I have experimented with multiple models using Arm NN on a Cortex-A53 (mostly int8-quantized models with latency < 200 ms), and I found that XNNPACK generally gives better latency than Arm NN. So I am trying to understand what kind of model performs better with Arm NN.
./benchmark_model --graph=./mobilenet_v2_1.0_224_INT8.tflite --external_delegate_path=./libarmnnDelegate.so --external_delegate_options="backends:CpuAcc;disable-tflite-runtime-fallback:true;number-of-threads:1"
Log parameter values verbosely: [0]
Graph: [./mobilenet_v2_1.0_224_INT8.tflite]
External delegate path: [./libarmnnDelegate.so]
External delegate options: [backends:CpuAcc,CpuRef;disable-tflite-runtime-fallback:true;number-of-threads:1]
Loaded model ./mobilenet_v2_1.0_224_INT8.tflite
INFO: Initialized TensorFlow Lite runtime.
Couldn't find any of the following OpenCL library: libOpenCL.so libGLES_mali.so libmali.so
INFO: TfLiteArmnnDelegate: Added backend CpuAcc
INFO: TfLiteArmnnDelegate: Created TfLite ArmNN delegate.
EXTERNAL delegate created.
VERBOSE: Replacing 66 node(s) with delegate (TfLiteArmNnDelegate) node, yielding 1 partitions for the whole graph.
Explicitly applied EXTERNAL delegate, and the model graph will be completely executed by the delegate.
The input model file size (MB): 4.02094
Initialized session in 287.252ms.
Running benchmark for at least 1 iterations and at least 0.5 seconds but terminate if exceeding 150 seconds.
count=2 first=468655 curr=159104 min=159104 max=468655 avg=313880 std=154775
Running benchmark for at least 50 iterations and at least 1 seconds but terminate if exceeding 150 seconds.
count=50 first=177554 curr=131598 min=131528 max=177554 avg=134539 std=6398
Inference timings in us: Init: 287252, First inference: 468655, Warmup (avg): 313880, Inference (avg): 134539
Note: as the benchmark tool itself affects memory footprint, the following is only APPROXIMATE to the actual memory footprint of the model at runtime. Take the information at your discretion.
Memory footprint delta from the start of the tool (MB): init=67.3633 overall=77.9492
./benchmark_model --graph=./mobilenet_v2_1.0_224_INT8.tflite --num_threads=1
INFO: Initialized TensorFlow Lite runtime.
INFO: Applying 1 TensorFlow Lite delegate(s) lazily.
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
VERBOSE: Replacing 64 node(s) with delegate (TfLiteXNNPackDelegate) node, yielding 4 partitions for the whole graph.
INFO: Successfully applied the default TensorFlow Lite delegate indexed at 0.
Num threads: [1]
Graph: [./mobilenet_v2_1.0_224_INT8.tflite]
Enable op profiling: [0]
#threads used for CPU inference: [1]
Loaded model mobilenet_v2_1.0_224_INT8.tflite
The input model file size (MB): 4.02094
Initialized session in 108.149ms.
Running benchmark for at least 1 iterations and at least 0.5 seconds but terminate if exceeding 150 seconds.
count=4 first=158233 curr=138142 min=138142 max=158233 avg=148234 std=7143
Running benchmark for at least 50 iterations and at least 1 seconds but terminate if exceeding 150 seconds.
count=50 first=120254 curr=119630 min=119404 max=123512 avg=119935 std=722
Inference timings in us: Init: 108149, First inference: 158233, Warmup (avg): 148234, Inference (avg): 119935
Note: as the benchmark tool itself affects memory footprint, the following is only APPROXIMATE to the actual memory footprint of the model at runtime. Take the information at your discretion.
Memory footprint delta from the start of the tool (MB): init=9.45312 overall=13.9961
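The two "Inference timings in us" summary lines above can be compared programmatically. A minimal sketch (the `parse_timings` helper is hypothetical; field names are taken from the benchmark_model output above):

```python
import re

def parse_timings(line):
    """Parse a benchmark_model 'Inference timings in us:' line into a dict of microsecond values."""
    fields = re.findall(r"([A-Za-z ()]+): (\d+)", line.split("us:", 1)[1])
    return {name.strip(): int(value) for name, value in fields}

# Summary lines copied from the two runs above (Arm NN delegate vs. XNNPACK).
armnn = parse_timings("Inference timings in us: Init: 287252, First inference: 468655, "
                      "Warmup (avg): 313880, Inference (avg): 134539")
xnnpack = parse_timings("Inference timings in us: Init: 108149, First inference: 158233, "
                        "Warmup (avg): 148234, Inference (avg): 119935")

# Steady-state ratio: XNNPACK is roughly 12% faster on this model/core.
speedup = armnn["Inference (avg)"] / xnnpack["Inference (avg)"]
```

This makes the gap in the logs concrete: 134539 us vs. 119935 us steady-state, and an even larger difference in first-inference and memory-footprint numbers.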
There are many factors that affect execution time, and there will be cases where Arm NN does not provide improved performance. Can I suggest you try the evaluate_network.sh script in armnn/tests/ExecuteNetwork/? It tries different parameters with ExecuteNetwork and the TfLite delegate to help you choose a combination that might improve performance.
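As a sketch of that workflow (the paths below match the Arm NN source tree named above, but the script's exact arguments are an assumption; check the header of evaluate_network.sh or its usage message before running):

```shell
# Hypothetical invocation -- adjust paths for your checkout and build.
# evaluate_network.sh sweeps ExecuteNetwork / TfLite-delegate parameter
# combinations (backends, thread counts, etc.) and reports timings for
# each, so you can pick the fastest configuration for a given model.
cd armnn/tests/ExecuteNetwork
./evaluate_network.sh ./mobilenet_v2_1.0_224_INT8.tflite
```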
For example, I compared the results using the MobileNet model downloaded from the Arm ML-Zoo: https://github.com/ARM-software/ML-zoo/tree/master/models/image_classification/mobilenet_v2_1.0_224/tflite_int8