This issue, originally reported by @adamp87, has been moved to this dedicated repository for ai-edge-torch to improve issue tracking and prioritization. To ensure continuity, we have created this new issue on your behalf.
We appreciate your understanding and look forward to your continued involvement.
Issue type
Feature Request
Have you reproduced the bug with TensorFlow Nightly?
No
Source
binary
TensorFlow version
v2.13.0-17-gf841394b1b7
Custom code
No
OS platform and distribution
No response
Mobile device
No response
Python version
3.10.13
Bazel version
No response
GCC/compiler version
No response
CUDA/cuDNN version
No response
GPU model and memory
No response
Current behavior?
Quantizing models to full integer works as expected, but because some of the final operations run in INT8, a large accuracy drop can be observed in some models.

This ticket is a feature request for the ability to exclude specific operations from quantization and execute them in FP32. OpenVINO supports this via the ignored_scope parameter during quantization (see the OpenVINO quantizer documentation and the sketch below). Given how the Edge TPU works, the solution should be a way to set where quantization stops and execute the remaining ops in FP32 on the CPU.

Let's take yolov8n as an example and convert the PyTorch model to TensorFlow using onnx2tf. Let's compare the main branch with full INT8 quantization against a dirty hack that detaches the last operations and executes the model as INT8 + FP32. As a note, Edge TPU-compiled models with inputs larger than 192 pixels already execute the head on the CPU, because some Transpose operations are too large for the TPU.
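For reference, a minimal sketch of how OpenVINO's NNCF exposes this today; the model path, input name, calibration data, and node names below are placeholders, not taken from this issue:

```python
import numpy as np
import nncf
import openvino as ov

# Load the FP32 OpenVINO model (path is a placeholder).
model = ov.Core().read_model("yolov8n.xml")

# NNCF wraps any iterable plus a transform function as a calibration dataset.
def transform_fn(item):
    return {"images": item}  # input name is an assumption

calibration_data = [np.random.rand(1, 3, 640, 640).astype(np.float32) for _ in range(32)]
calibration_dataset = nncf.Dataset(calibration_data, transform_fn)

# ignored_scope keeps the listed nodes (e.g. the detection head) in FP32
# while the rest of the graph is quantized to INT8.
quantized_model = nncf.quantize(
    model,
    calibration_dataset,
    ignored_scope=nncf.IgnoredScope(
        names=["/model.22/Concat_3"],   # hypothetical node name
        patterns=[".*/model\\.22/.*"],  # or a regex over the head subgraph
    ),
)
```

Something equivalent on the TFLite/ai-edge-torch side is exactly what this feature request asks for.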
Standalone code to reproduce the issue
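The original report did not include a standalone snippet; the following is a hedged sketch of the flow described above: full-integer quantization of the onnx2tf-exported SavedModel, then dequantizing the INT8 outputs on the host so the head could run in FP32 (the "INT8 + FP32" hack). The SavedModel path, input size, and calibration data are assumptions.

```python
import numpy as np
import tensorflow as tf

# Full-integer quantization of the SavedModel produced by onnx2tf
# (directory name is an assumption).
converter = tf.lite.TFLiteConverter.from_saved_model("yolov8n_saved_model")
converter.optimizations = [tf.lite.Optimize.DEFAULT]

def representative_dataset():
    # Real calibration images should be used here; random data is a stand-in.
    for _ in range(100):
        yield [np.random.rand(1, 640, 640, 3).astype(np.float32)]

converter.representative_dataset = representative_dataset
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8
tflite_model = converter.convert()

# Run the fully quantized model, then hand the dequantized outputs to
# FP32 post-processing on the CPU.
interpreter = tf.lite.Interpreter(model_content=tflite_model)
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]

image = np.random.rand(1, 640, 640, 3).astype(np.float32)  # placeholder input
in_scale, in_zero = inp["quantization"]
q_image = np.clip(np.round(image / in_scale + in_zero), -128, 127).astype(np.int8)

interpreter.set_tensor(inp["index"], q_image)
interpreter.invoke()

q_out = interpreter.get_tensor(out["index"])
out_scale, out_zero = out["quantization"]
fp32_out = (q_out.astype(np.float32) - out_zero) * out_scale
# fp32_out would then feed the detached detection head / NMS in FP32.
```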
Relevant log output
No response