
Enabling XNNPACK with Raspberry Pi Zero/W #177

Open

gaikwadrahul8 opened this issue Nov 28, 2024 · 2 comments

Comments

@gaikwadrahul8
Contributor


Issue Type

Build/Install

Have you reproduced the bug with TF nightly?

No

Source

source

Tensorflow Version

2.12.0

Custom Code

No

OS Platform and Distribution

Linux, Raspberry Pi OS 32-bit (Debian bullseye)

Mobile device

Raspberry Pi Zero W

Python version

3.9.2

Bazel version

cmake 3.18.4

GCC/Compiler version

GNU c++ (Raspbian 10.2.1-6+rpi1) 10.2.1 20210110

CUDA/cuDNN version

NA

GPU model and memory

NA

Current Behaviour?

The tf-lite build instructions for Raspberry Pi Zero/Zero W state that the following should be part
of the CFLAGS/CXXFLAGS:
-march=armv6 -mfpu=vfp -mfloat-abi=hard -funsafe-math-optimizations

As per the README for XNNPACK, XNNPACK supports running on Armv6 with VFP, which is exactly the Raspberry Pi Zero W. However, all build instructions for the Raspberry Pi Zero request explicitly disabling XNNPACK. Given the documented XNNPACK support for the rpi0, I tried to build tf-lite with XNNPACK enabled.

When the XNNPACK sub-build is enabled, the following conflicting CFLAGS are appended to its compiler invocations:
-marm -march=armv8.2-a+dotprod -mfpu=neon-fp-armv8

Please document/extend the cmake and build instructions to allow tf-lite to build correctly with xnnpack enabled for the Raspberry Pi Zero/Zero W.
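One possible workaround, untested here, would be to switch off the XNNPACK microkernel families that require ISA extensions Armv6 lacks. The `XNNPACK_ENABLE_ARM_*` option names below come from XNNPACK's own CMakeLists.txt (they map to the `XNN_ENABLE_ARM_*` defines visible in the log); whether the tf-lite CMake super-build forwards them to the sub-build is an assumption that needs verifying:

```shell
# Hypothetical sketch: configure tf-lite with XNNPACK enabled but with
# the dotprod/BF16/FP16 microkernels disabled, so only kernels that
# Armv6+VFP can assemble are compiled. Option forwarding to the
# XNNPACK sub-build is assumed, not confirmed.
cmake \
   -DCMAKE_C_FLAGS='-march=armv6 -mfpu=vfp -mfloat-abi=hard -funsafe-math-optimizations' \
   -DCMAKE_CXX_FLAGS='-march=armv6 -mfpu=vfp -mfloat-abi=hard -funsafe-math-optimizations' \
   -DCMAKE_SYSTEM_NAME=Linux \
   -DCMAKE_SYSTEM_PROCESSOR=armv6 \
   -DTFLITE_ENABLE_XNNPACK=ON \
   -DXNNPACK_ENABLE_ARM_DOTPROD=OFF \
   -DXNNPACK_ENABLE_ARM_BF16=OFF \
   -DXNNPACK_ENABLE_ARM_FP16_SCALAR=OFF \
   -DXNNPACK_ENABLE_ARM_FP16_VECTOR=OFF \
   ../tensorflow/lite
```

If the options are not forwarded, the same switches could be applied to the XNNPACK sub-build's cache directly after the first configure step.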

Standalone code to reproduce the issue

cmake \
   -DCMAKE_C_FLAGS='-march=armv6 -mfpu=vfp -mfloat-abi=hard -funsafe-math-optimizations -I/usr/include/python3.9 -I/usr/lib/python3/dist-packages/pybind11/include -I/usr/lib/python3/dist-packages/numpy/core/include' \
   -DCMAKE_CXX_FLAGS='-march=armv6 -mfpu=vfp -mfloat-abi=hard -funsafe-math-optimizations -I/usr/include/python3.9 -I/usr/lib/python3/dist-packages/pybind11/include -I/usr/lib/python3/dist-packages/numpy/core/include'  \
   -DCMAKE_VERBOSE_MAKEFILE:BOOL=ON \
   -DCMAKE_SYSTEM_NAME=Linux \
   -DCMAKE_SYSTEM_PROCESSOR=armv6  \
   -DTFLITE_ENABLE_XNNPACK=ON \
   /home/samveen/tensorflow/build/../tensorflow/lite
...
cmake --build . --verbose -t _pywrap_tensorflow_interpreter_wrapper

Relevant log output

/usr/bin/gmake  -f _deps/xnnpack-build/CMakeFiles/microkernels-prod.dir/build.make _deps/xnnpack-build/CMakeFiles/microkernels-prod.dir/build
gmake[3]: Entering directory '/home/samveen/tensorflow/build/gen/tflite_pip/python3/cmake_build'
[ 47%] Building C object _deps/xnnpack-build/CMakeFiles/microkernels-prod.dir/src/amalgam/neondot.c.o
cd /home/samveen/tensorflow/build/gen/tflite_pip/python3/cmake_build/_deps/xnnpack-build && /usr/bin/cc -DEIGEN_MPL2_ONLY -DFXDIV_USE_INLINE_ASSEMBLY=0 -DNOMINMAX=1 -DPTHREADPOOL_NO_DEPRECATED_API=1 -DXNN_ENABLE_ARM_BF16=1 -DXNN_ENABLE_ARM_DOTPROD=1 -DXNN_ENABLE_ARM_FP16_SCALAR=1 -DXNN_ENABLE_ARM_FP16_VECTOR=1 -DXNN_ENABLE_ASSEMBLY=1 -DXNN_ENABLE_DWCONV_MULTIPASS=0 -DXNN_ENABLE_GEMM_M_SPECIALIZATION=1 -DXNN_ENABLE_JIT=0 -DXNN_ENABLE_MEMOPT=1 -DXNN_ENABLE_RISCV_VECTOR=1 -DXNN_ENABLE_SPARSE=1 -I/home/samveen/tensorflow/build/gen/tflite_pip/python3/cmake_build/xnnpack/src -I/home/samveen/tensorflow/build/gen/tflite_pip/python3/cmake_build/pthreadpool-source/include -I/home/samveen/tensorflow/build/gen/tflite_pip/python3/cmake_build/FXdiv-source/include -I/home/samveen/tensorflow/build/gen/tflite_pip/python3/cmake_build/FP16-source/include -march=armv6 -mfpu=vfp -mfloat-abi=hard -funsafe-math-optimizations -I/usr/include/python3.9 -I/usr/lib/python3/dist-packages/pybind11/include -I/usr/lib/python3/dist-packages/numpy/core/include -O3 -DNDEBUG -fPIC -Wno-psabi -O2 -pthread -std=c99  -marm  -march=armv8.2-a+dotprod -mfpu=neon-fp-armv8  -o CMakeFiles/microkernels-prod.dir/src/amalgam/neondot.c.o -c /home/samveen/tensorflow/build/gen/tflite_pip/python3/cmake_build/xnnpack/src/amalgam/neondot.c
/tmp/ccotLSur.s: Assembler messages:
/tmp/ccotLSur.s:63: Error: selected processor does not support `vsdot.s8 q8,q12,d7[0]' in ARM mode
/tmp/ccotLSur.s:65: Error: selected processor does not support `vsdot.s8 q9,q10,d7[0]' in ARM mode
/tmp/ccotLSur.s:68: Error: selected processor does not support `vsdot.s8 q11,q10,d7[0]' in ARM mode
/tmp/ccotLSur.s:71: Error: selected processor does not support `vsdot.s8 q14,q10,d7[0]' in ARM mode
/tmp/ccotLSur.s:74: Error: selected processor does not support `vsdot.s8 q8,q10,d7[1]' in ARM mode
/tmp/ccotLSur.s:77: Error: selected processor does not support `vsdot.s8 q9,q10,d7[1]' in ARM mode
/tmp/ccotLSur.s:80: Error: selected processor does not support `vsdot.s8 q11,q10,d7[1]' in ARM mode
...
@gaikwadrahul8
Contributor Author

This issue, originally reported by @samveen, has been moved to this dedicated repository for LiteRT to enhance issue tracking and prioritization. To ensure continuity, we have created this new issue on your behalf.

We appreciate your understanding and look forward to your continued involvement.

@samveen

samveen commented Dec 2, 2024

@gaikwadrahul8 Thank you. It's great to see some traction on this. The project I was trying to implement is still pending on this 😅
