TFLite Converter, add possibility to ignore some OPs from quantization #387

Open
gaikwadrahul8 opened this issue Nov 27, 2024 · 1 comment

@gaikwadrahul8

Issue type

Feature Request

Have you reproduced the bug with TensorFlow Nightly?

No

Source

binary

TensorFlow version

v2.13.0-17-gf841394b1b7

Custom code

No

OS platform and distribution

No response

Mobile device

No response

Python version

3.10.13

Bazel version

No response

GCC/compiler version

No response

CUDA/cuDNN version

No response

GPU model and memory

No response

Current behavior?

Quantizing models to integer precision works as expected, but because the final operations also execute in INT8, a large accuracy drop can be observed in some models.

This ticket is a feature request for the ability to exclude specific operations from quantization and execute them in FP32. OpenVINO supports this via the `ignored_scope` parameter during quantization (see the OpenVINO quantizer documentation). Considering how the Edge TPU works, the solution should allow setting where quantization stops and executing the remaining OPs in FP32 on the CPU.
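For reference, OpenVINO's NNCF exposes this roughly as follows; `model` and `calibration_dataset` are placeholders for a loaded model and an `nncf.Dataset` built from representative inputs, and the excluded node name is hypothetical:

```python
import nncf

# Post-training quantization that skips selected ops; excluded ops remain FP32.
quantized_model = nncf.quantize(
    model,
    calibration_dataset,
    ignored_scope=nncf.IgnoredScope(
        names=["/model.22/dfl/conv/Conv"],  # hypothetical node name, e.g. a head op
        types=["Transpose"],                # or exclude by operation type
    ),
)
```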

Let's take yolov8n as an example and convert the PyTorch model to TF using onnx2tf. Let's compare the main branch in full INT8 quantization with a dirty hack that detaches the last operations and executes the model as INT8 + FP32. As a note, Edge TPU-compiled models with inputs larger than 192 pixels execute the head on the CPU, as some Transpose operations are too large for the TPU. A conversion sketch and the measured results follow.
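Today's converter has no per-op exclusion knob; full-integer conversion is all-or-nothing, roughly as in this minimal sketch (the SavedModel path, input shape, and random calibration data are placeholders):

```python
import numpy as np
import tensorflow as tf

def rep_dataset():
    # Calibration data for the quantizer; random tensors as a placeholder,
    # representative images should be used in practice.
    for _ in range(100):
        yield [np.random.rand(1, 640, 640, 3).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_saved_model("yolov8n_saved_model")  # placeholder path
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = rep_dataset
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8
tflite_model = converter.convert()
```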

| Model yolov8n      | mAP50 | mAP50-95 | Note        | Speed on Intel CPU |
|--------------------|-------|----------|-------------|--------------------|
| Baseline FP32      | 52.6  | 37.4     | Main branch | N/A                |
| TFLite Full INT8   | 48.8  | 32.9     | per-tensor  | 162.2 ms           |
| TFLite INT8 + FP32 | 50.3  | 35.2     | per-tensor  | 166.0 ms           |
| TFLite Full INT8   | 49.8  | 33.9     | per-channel | N/A                |
| TFLite INT8 + FP32 | 51.4  | 36.3     | per-channel | N/A                |
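A minimal sketch of the INT8 + FP32 split used for the rows above, assuming the head has been detached into a separate FP32 function; the model path and `fp32_head` are hypothetical:

```python
import numpy as np
import tensorflow as tf

# Run the quantized body, then execute the detached head ops in FP32 on the CPU.
interpreter = tf.lite.Interpreter(model_path="yolov8n_body_int8.tflite")  # hypothetical file
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]

def run_split(image_fp32, fp32_head):
    # Quantize the FP32 input with the model's input scale/zero-point.
    scale, zero_point = inp["quantization"]
    x_int8 = np.clip(np.round(image_fp32 / scale + zero_point), -128, 127).astype(np.int8)
    interpreter.set_tensor(inp["index"], x_int8)
    interpreter.invoke()
    # Dequantize the body output back to FP32 ...
    o_scale, o_zero = out["quantization"]
    y_fp32 = (interpreter.get_tensor(out["index"]).astype(np.float32) - o_zero) * o_scale
    # ... and finish with the detached head in FP32.
    return fp32_head(y_fp32)
```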

Standalone code to reproduce the issue

https://github.com/adamp87/ultralytics/blob/tflite_detach_dirty/yolo8_full_int8_nohead_test.ipynb

Relevant log output

No response

@gaikwadrahul8

This issue, originally reported by @adamp87, has been moved to this dedicated repository for ai-edge-torch to enhance issue tracking and prioritization. To ensure continuity, we have created this new issue on your behalf.

We appreciate your understanding and look forward to your continued involvement.
