Enabling XNNPACK with Raspberry Pi Zero/W #60282

Closed
samveen opened this issue Apr 9, 2023 · 8 comments
Assignees
Labels
comp:lite (TF Lite related issues)
comp:lite-xnnpack (TensorFlow Lite XNNPack related issues)
stat:awaiting tensorflower (Status - Awaiting response from tensorflower)
TF 2.12 (For issues related to Tensorflow 2.12)
type:build/install (Build and install issues)

Comments

@samveen

samveen commented Apr 9, 2023

Issue Type

Build/Install

Have you reproduced the bug with TF nightly?

No

Source

source

Tensorflow Version

2.12.0

Custom Code

No

OS Platform and Distribution

Linux Raspberrypi OS 32-bit (Debian bullseye)

Mobile device

Raspberry Pi Zero W

Python version

3.9.2

Bazel version

cmake 3.18.4

GCC/Compiler version

GNU c++ (Raspbian 10.2.1-6+rpi1) 10.2.1 20210110

CUDA/cuDNN version

NA

GPU model and memory

NA

Current Behaviour?

The tf-lite build instructions for Raspberry Pi Zero/Zero W state that the following should be part
of the CFLAGS/CXXFLAGS:
-march=armv6 -mfpu=vfp -mfloat-abi=hard -funsafe-math-optimizations

As per the XNNPACK README, XNNPACK supports ARMv6 with VFP, which is what the Raspberry Pi Zero W provides. However, all build instructions for the Raspberry Pi Zero explicitly request disabling XNNPACK. Given the support for the RPi0 in the XNNPACK documentation, I tried to build tf-lite with XNNPACK enabled.

When the xnnpack sub-build is enabled, the following conflicting CFLAGS are added to the compiler invocation during the xnnpack sub-build:
-marm -march=armv8.2-a+dotprod -mfpu=neon-fp-armv8
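
To see which target a compile line like this ends up requesting, it can help to pull out the last -march and -mfpu values, since GCC generally lets a later occurrence of such an option override an earlier one. The following is a minimal sketch and not part of the TFLite or XNNPACK build. `last_flag` is a hypothetical helper, and the flag string is abridged from the neondot.c compile command in the relevant log output.

```shell
# Print the last value given for an option prefix (e.g. "-march=")
# among the remaining arguments. Prints an empty line if it never appears.
last_flag() (
    prefix=$1
    shift
    result=
    for word in "$@"; do
        case $word in
            "$prefix"*) result=${word#"$prefix"} ;;
        esac
    done
    printf '%s\n' "$result"
)

# Flags abridged from the failing neondot.c compile command.
flags='-march=armv6 -mfpu=vfp -mfloat-abi=hard -O3 -marm -march=armv8.2-a+dotprod -mfpu=neon-fp-armv8'

# $flags is left unquoted on purpose so that it splits into words.
echo "last -march: $(last_flag -march= $flags)"
echo "last -mfpu:  $(last_flag -mfpu= $flags)"
```

For the flags above, the values reported are the ARMv8.2 ones appended by the XNNPACK sub-build, not the user-supplied armv6/vfp pair.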

Please document/extend the cmake and build instructions to allow tf-lite to build correctly with xnnpack enabled for the Raspberry Pi Zero/Zero W.

Standalone code to reproduce the issue

cmake \
   -DCMAKE_C_FLAGS='-march=armv6 -mfpu=vfp -mfloat-abi=hard -funsafe-math-optimizations -I/usr/include/python3.9 -I/usr/lib/python3/dist-packages/pybind11/include -I/usr/lib/python3/dist-packages/numpy/core/include' \
   -DCMAKE_CXX_FLAGS='-march=armv6 -mfpu=vfp -mfloat-abi=hard -funsafe-math-optimizations -I/usr/include/python3.9 -I/usr/lib/python3/dist-packages/pybind11/include -I/usr/lib/python3/dist-packages/numpy/core/include'  \
   -DCMAKE_VERBOSE_MAKEFILE:BOOL=ON \
   -DCMAKE_SYSTEM_NAME=Linux \
   -DCMAKE_SYSTEM_PROCESSOR=armv6  \
   -DTFLITE_ENABLE_XNNPACK=ON \
   /home/samveen/tensorflow/build/../tensorflow/lite
...
cmake --build . --verbose -t _pywrap_tensorflow_interpreter_wrapper

Relevant log output

/usr/bin/gmake  -f _deps/xnnpack-build/CMakeFiles/microkernels-prod.dir/build.make _deps/xnnpack-build/CMakeFiles/microkernels-prod.dir/build
gmake[3]: Entering directory '/home/samveen/tensorflow/build/gen/tflite_pip/python3/cmake_build'
[ 47%] Building C object _deps/xnnpack-build/CMakeFiles/microkernels-prod.dir/src/amalgam/neondot.c.o
cd /home/samveen/tensorflow/build/gen/tflite_pip/python3/cmake_build/_deps/xnnpack-build && /usr/bin/cc -DEIGEN_MPL2_ONLY -DFXDIV_USE_INLINE_ASSEMBLY=0 -DNOMINMAX=1 -DPTHREADPOOL_NO_DEPRECATED_API=1 -DXNN_ENABLE_ARM_BF16=1 -DXNN_ENABLE_ARM_DOTPROD=1 -DXNN_ENABLE_ARM_FP16_SCALAR=1 -DXNN_ENABLE_ARM_FP16_VECTOR=1 -DXNN_ENABLE_ASSEMBLY=1 -DXNN_ENABLE_DWCONV_MULTIPASS=0 -DXNN_ENABLE_GEMM_M_SPECIALIZATION=1 -DXNN_ENABLE_JIT=0 -DXNN_ENABLE_MEMOPT=1 -DXNN_ENABLE_RISCV_VECTOR=1 -DXNN_ENABLE_SPARSE=1 -I/home/samveen/tensorflow/build/gen/tflite_pip/python3/cmake_build/xnnpack/src -I/home/samveen/tensorflow/build/gen/tflite_pip/python3/cmake_build/pthreadpool-source/include -I/home/samveen/tensorflow/build/gen/tflite_pip/python3/cmake_build/FXdiv-source/include -I/home/samveen/tensorflow/build/gen/tflite_pip/python3/cmake_build/FP16-source/include -march=armv6 -mfpu=vfp -mfloat-abi=hard -funsafe-math-optimizations -I/usr/include/python3.9 -I/usr/lib/python3/dist-packages/pybind11/include -I/usr/lib/python3/dist-packages/numpy/core/include -O3 -DNDEBUG -fPIC -Wno-psabi -O2 -pthread -std=c99  -marm  -march=armv8.2-a+dotprod -mfpu=neon-fp-armv8  -o CMakeFiles/microkernels-prod.dir/src/amalgam/neondot.c.o -c /home/samveen/tensorflow/build/gen/tflite_pip/python3/cmake_build/xnnpack/src/amalgam/neondot.c
/tmp/ccotLSur.s: Assembler messages:
/tmp/ccotLSur.s:63: Error: selected processor does not support `vsdot.s8 q8,q12,d7[0]' in ARM mode
/tmp/ccotLSur.s:65: Error: selected processor does not support `vsdot.s8 q9,q10,d7[0]' in ARM mode
/tmp/ccotLSur.s:68: Error: selected processor does not support `vsdot.s8 q11,q10,d7[0]' in ARM mode
/tmp/ccotLSur.s:71: Error: selected processor does not support `vsdot.s8 q14,q10,d7[0]' in ARM mode
/tmp/ccotLSur.s:74: Error: selected processor does not support `vsdot.s8 q8,q10,d7[1]' in ARM mode
/tmp/ccotLSur.s:77: Error: selected processor does not support `vsdot.s8 q9,q10,d7[1]' in ARM mode
/tmp/ccotLSur.s:80: Error: selected processor does not support `vsdot.s8 q11,q10,d7[1]' in ARM mode
...
@google-ml-butler google-ml-butler bot added the type:build/install Build and install issues label Apr 9, 2023
@tiruk007 tiruk007 added comp:lite TF Lite related issues comp:lite-xnnpack TensorFlow Lite XNNPack related issues TF 2.12 For issues related to Tensorflow 2.12 labels Apr 10, 2023
@tiruk007 tiruk007 assigned pjpratik and unassigned tiruk007 Apr 10, 2023
@pjpratik
Contributor

Hi @samveen

As per the TFLite documentation, the support for XNNPACK is disabled for ARMv6 since there is no NEON support.
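
One way to confirm on the device whether NEON is available is to look at the Features line in /proc/cpuinfo. The sketch below assumes that file format. `has_cpu_feature` is a hypothetical helper, and the sample line is illustrative, not captured from the Pi Zero W in this issue.

```shell
# Succeed if a /proc/cpuinfo-style "Features" line lists the given feature.
has_cpu_feature() (
    feature=$1
    features_line=$2
    # Drop the "Features :" label, then compare whole words only.
    for word in ${features_line#*:}; do
        [ "$word" = "$feature" ] && exit 0
    done
    exit 1
)

# Illustrative input (an assumption, not taken from the issue): the kind
# of line an ARMv6 board without NEON might report.
sample='Features : half thumb fastmult vfp edsp java tls'

if has_cpu_feature neon "$sample"; then
    echo "NEON reported"
else
    echo "no NEON reported"
fi

# On the device itself, the real line could be passed instead:
#   has_cpu_feature neon "$(grep -m1 '^Features' /proc/cpuinfo)"
```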

Have you observed the same behaviour with -mfpu=vfpv2, as suggested by the XNNPACK README?

Thanks.

@pjpratik pjpratik added the stat:awaiting response Status - Awaiting response from author label Apr 13, 2023
@samveen
Author

samveen commented Apr 16, 2023

As can be seen in the log output, no matter what values of -march and -mfpu the user supplies, the tflite build integration with XNNPACK adds -march=armv8.2-a+dotprod -mfpu=neon-fp-armv8 to the build flags for XNNPACK.

From the XNNPACK README, it's clear that there is a subset of XNNPACK that is usable on the Raspberry Pi Zero/Zero W. What is not clear is whether tflite can use just that subset of XNNPACK when it's being built for the Zero/Zero W.

I'll try and build XNNPACK on the Pi Zero as per the instructions and get back to you on the build state.

@google-ml-butler google-ml-butler bot removed the stat:awaiting response Status - Awaiting response from author label Apr 16, 2023
@samveen
Author

samveen commented Apr 21, 2023

@pjpratik

  • I've opened an issue against XNNPACK asking for more details (CMAKE Parameters for armv6+vfp build google/XNNPACK#4636), but there's been no response there.
  • I've managed to natively build a version of XNNPACK on the Zero W (upwards of 24 hours to build) with the following:
    bash ./scripts/build-local.sh -DXNNPACK_ENABLE_ARM_DOTPROD:BOOL=OFF
  • One of the DOTPROD kernels has a NEON assembly instruction which causes the build to fail, thus the need for XNNPACK_ENABLE_ARM_DOTPROD:BOOL=OFF.
  • The script runs tests against the build, but it hasn't completed the 3rd test, for FP32 MobileNet v3 Large, even after 24 hours of running. This leads me to believe that it might be stuck rather than just slow (the expected time on the Zero/Zero W is less than that for FP32 MobileNet v2 1.0X, which completes in less than 3 seconds).
  • In light of the above sticking point, I'm hoping to get some response from the XNNPACK team, even a 'no longer possible' one, which should give some clarity.
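
When a test may be stuck rather than slow, running it under a time limit avoids an open-ended wait. The sketch below assumes the coreutils `timeout` command is installed. `run_with_limit` is a hypothetical wrapper, and the commands it runs here are stand-ins, not the XNNPACK test binaries.

```shell
# Run a command under a time limit and report how it ended.
# coreutils `timeout` exits with status 124 when the limit is reached.
run_with_limit() {
    limit=$1
    shift
    status=0
    timeout "$limit" "$@" || status=$?
    if [ "$status" -eq 124 ]; then
        echo "timed out after $limit"
    else
        echo "finished with status $status"
    fi
    return "$status"
}

# Stand-in commands so the sketch runs anywhere. On the Pi, the command
# would be whichever test binary the build script appears to hang on.
run_with_limit 5s true
run_with_limit 1s sleep 3 || true
```

A limit well above the expected runtime, but far below 24 hours, would separate a hung test from a slow one.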

@pjpratik
Contributor

@samveen Thanks for the information.

@sachinprasadhs Could you please look into this issue?

Thanks.

@pjpratik pjpratik assigned sachinprasadhs and unassigned pjpratik Apr 21, 2023
@samveen
Author

samveen commented Apr 23, 2023

@pjpratik @sachinprasadhs I've created an issue against XNNPACK with a lot more detail on RPi0 builds (google/XNNPACK#4701). It covers the issues I've faced, the build process I followed, and the details of my native build environment. Hopefully that gives more insight into the underlying issues.

@sachinprasadhs sachinprasadhs added the stat:awaiting tensorflower Status - Awaiting response from tensorflower label Apr 25, 2023
@pkgoogle pkgoogle assigned pkgoogle and unassigned sachinprasadhs Aug 5, 2024
@gaikwadrahul8
Contributor

Hi, @samveen

Thanks for raising this issue. Are you aware of the migration to LiteRT? This transition is aimed at enhancing our project's capabilities and providing improved support and focus for our users. As we believe this issue is still relevant to LiteRT we are moving your issue there. Please follow progress here: google-ai-edge/LiteRT#177

Let us know if you have any questions. Thanks.

@samveen
Author

samveen commented Dec 2, 2024

Closing this as the issue is now being tracked at google-ai-edge/LiteRT#177

@samveen samveen closed this as completed Dec 2, 2024

7 participants