
Not able to convert OpenAI Hugging Face TF Whisper model to int8 model #394

Open
pkgoogle opened this issue Nov 27, 2024 · 8 comments
Assignees
Labels
status:awaiting ai-edge-developer type:feature For feature requests type:quantization For issues related to quantization

Comments

@pkgoogle
Contributor

Description of the bug:

Original Issue: tensorflow/tensorflow#58451
Opening on behalf of @nyadla-sys

1. System information

  • OS: Linux Ubuntu 16.04

2. Code

Provide code to help us reproduce your issues using one of the following options:

Option A: Reference colab notebooks

Reference [TensorFlow Lite Model Colab]

Option B: Paste your code here or provide a link to a custom end-to-end colab

https://colab.research.google.com/drive/1rApSDy3KMoMMaK3SIQwvu21yPas2VFjx?usp=sharing

3. Failure after conversion

  • Model produces correct results with the hybrid model.
  • Colab session crashes with the int8 model.

Actual vs expected behavior:

No response

Any other information you'd like to share?

No response

@gaikwadrahul8

This issue, originally reported by @nyadla-sys, has been moved to this dedicated repository for ai-edge-torch to enhance issue tracking and prioritization. To ensure continuity, we have created this new issue on your behalf.

We appreciate your understanding and look forward to your continued involvement.

@pkgoogle
Contributor Author

pkgoogle commented Dec 18, 2024

Hi @nyadla-sys, I was able to accomplish this with this library as follows:

import torch
import whisper

import ai_edge_torch
import tensorflow as tf


model = whisper.load_model("turbo")

# Whisper "turbo" expects a (1, 128, 3000) log-mel spectrogram and up to 448 decoder tokens.
mel_shape = (1, 128, 3000)
tokens_shape = (1, 448)

sample_input = (torch.randn(mel_shape), torch.randint(low=0, high=51865, size=tokens_shape))

# Random calibration data for post-training int8 quantization.
def representative_data_gen():
  for _ in range(100):
    yield [torch.randn(mel_shape).numpy(), torch.randint(0, 51865, tokens_shape).numpy()]

tfl_converter_flags = {
  'optimizations': [tf.lite.Optimize.DEFAULT],
  'representative_dataset': representative_data_gen,
  'target_spec.supported_ops': [tf.lite.OpsSet.TFLITE_BUILTINS_INT8],
  'inference_input_type': tf.uint8,
  'inference_output_type': tf.uint8,
}

edge_model = ai_edge_torch.convert(model.eval(), sample_input, _ai_edge_converter_flags=tfl_converter_flags)
edge_model.export("whisper.tflite")

You may wish to sample your real dataset to produce the representative dataset. Let me know if that works for you. I should note this also took me about 6 hours on 64 cores, so you may need to wait a while (or use a smaller model).
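To make the "sample your real dataset" suggestion concrete, here is a minimal sketch of the shape/dtype contract such a generator has to satisfy for the conversion script above. The mel arrays here are random NumPy placeholders; in practice each one would be a log-mel spectrogram computed from a real audio clip. The shapes and the 51865-token vocabulary bound are taken from the script above; everything else is an assumption for illustration.

```python
import numpy as np

MEL_SHAPE = (1, 128, 3000)   # (batch, n_mels, n_frames) for the "turbo" model
TOKENS_SHAPE = (1, 448)      # decoder token ids
VOCAB_SIZE = 51865

# Placeholder: in practice these would be log-mel spectrograms computed from
# real audio (e.g. with whisper's log_mel_spectrogram), not random noise.
precomputed_mels = [np.random.randn(*MEL_SHAPE).astype(np.float32) for _ in range(5)]

def representative_data_gen():
    """Yield [mel, tokens] pairs matching the converter's two expected inputs."""
    for mel in precomputed_mels:
        tokens = np.random.randint(0, VOCAB_SIZE, size=TOKENS_SHAPE).astype(np.int64)
        yield [mel, tokens]

sample = next(representative_data_gen())
```

The key point is that each yielded list must have one entry per traced model input, in order, with the same shapes and dtypes as `sample_input`.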

@pkgoogle pkgoogle added status:awaiting user response When awaiting user response type:support For use-related issues labels Dec 18, 2024
@pkgoogle pkgoogle added the type:quantization For issues related to quantization label Dec 18, 2024
@nyadla-sys

For the tiny model, use the script below. It is still at the experiment stage; I will update as soon as I have results.

!pip install git+https://github.com/google-ai-edge/ai-edge-torch.git
!pip install git+https://github.com/openai/whisper.git
import torch
import whisper

import ai_edge_torch
import tensorflow as tf


model = whisper.load_model("tiny.en")

# "tiny.en" uses an 80-bin log-mel spectrogram and up to 448 decoder tokens.
mel_shape = (1, 80, 3000)
tokens_shape = (1, 448)

sample_input = (torch.randn(mel_shape), torch.randint(low=0, high=51865, size=tokens_shape))

# Random calibration data; see below for a version based on real audio.
def representative_data_gen():
  for _ in range(100):
    yield [torch.randn(mel_shape).numpy(), torch.randint(0, 51865, tokens_shape).numpy()]

tfl_converter_flags = {
  'optimizations': [tf.lite.Optimize.DEFAULT],
  'representative_dataset': representative_data_gen,
  'target_spec.supported_ops': [tf.lite.OpsSet.TFLITE_BUILTINS_INT8],
  'inference_input_type': tf.uint8,
  'inference_output_type': tf.uint8,
}

edge_model = ai_edge_torch.convert(model.eval(), sample_input, _ai_edge_converter_flags=tfl_converter_flags)
edge_model.export("whisper.tflite")

Please replace representative_data_gen() with the implementation below:

from whisper.audio import N_FRAMES, log_mel_spectrogram, pad_or_trim

def representative_dataset():
    # Change the range to 100 and provide 100 different audio files from a
    # known dataset such as LibriSpeech.
    for _ in range(1):
      mel_from_file = log_mel_spectrogram('/content/whisper/tests/jfk.flac')
      segment = pad_or_trim(mel_from_file, N_FRAMES)
      segment = tf.expand_dims(segment, 0)
      print(segment.shape)
      yield [segment]
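One thing to watch out for: the snippet above yields only the mel segment, but the model was traced with two inputs (mel and tokens), so the representative generator likely needs to yield both or the converter will reject the calibration data. A hedged sketch of a two-input version follows; `load_mel` is a hypothetical placeholder standing in for the `log_mel_spectrogram` + `pad_or_trim` pipeline (random data here so the sketch runs without audio files), and the token ids are random as in the scripts above.

```python
import numpy as np

N_MELS, N_FRAMES = 80, 3000   # "tiny.en" mel dimensions from the script above
TOKENS_SHAPE = (1, 448)
VOCAB_SIZE = 51865

def load_mel(path):
    """Placeholder for log_mel_spectrogram(path) + pad_or_trim(..., N_FRAMES).

    Returns random data so the sketch runs without audio files; in practice
    this would compute a real log-mel segment from the file at `path`.
    """
    return np.random.randn(1, N_MELS, N_FRAMES).astype(np.float32)

def representative_dataset(paths):
    """Yield one [mel, tokens] pair per audio file path."""
    for path in paths:
        mel = load_mel(path)
        tokens = np.random.randint(0, VOCAB_SIZE, size=TOKENS_SHAPE).astype(np.int64)
        yield [mel, tokens]

batch = next(representative_dataset(['jfk.flac']))
```

With real audio, `paths` would list the ~100 calibration clips, and ideally the token ids would come from actual transcripts rather than random draws.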

@nyadla-sys

nyadla-sys commented Dec 18, 2024

@pkgoogle
While running inference with the newly generated INT8 model using the script mentioned above, I encountered the following error:

./minimal /home/nyadla/whisper.tflite/whisper.tflite ../samples/jfk.wav
n_vocab:50256
mel.n_len:3000
mel.n_mel:80
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
ERROR: /home/nyadla/whisper.tflite/whisper_native/tensorflow_src/tensorflow/lite/kernels/embedding_lookup.cc:77 output->type == kTfLiteFloat32 was not true.
ERROR: Node number 242 (EMBEDDING_LOOKUP) failed to prepare.
ERROR: Failed to apply the default TensorFlow Lite delegate indexed at 0.
Error at /home/nyadla/whisper.tflite/whisper_native/tensorflow_src/tensorflow/lite/examples/minimal/minimal.cc:176

@nyadla-sys

@pkgoogle Can you please share the working Whisper TFLite model (the int8 model)?

@pkgoogle
Contributor Author

Hi @nyadla-sys, it's too large (634 MB) even when compressed. Can you share your exact .wav file? I think I can reproduce it on my end.

@pkgoogle
Contributor Author

pkgoogle commented Dec 19, 2024

Thanks @nyadla-sys, I was actually able to reproduce it just by running the minimal program:

./minimal whisper.tflite
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
ERROR: xxxxxxxx/git/tensorflow/tensorflow/lite/kernels/embedding_lookup.cc:77 output->type == kTfLiteFloat32 was not true.
ERROR: Node number 1698 (EMBEDDING_LOOKUP) failed to prepare.
ERROR: Failed to apply the default TensorFlow Lite delegate indexed at 0.
Error at xxxxxxxx/git/tensorflow/tensorflow/lite/examples/minimal/minimal.cc:62

https://github.com/tensorflow/tensorflow/blob/master/tensorflow/lite/kernels/embedding_lookup.cc#L77

It seems this op only supports float32 output for now.
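Given that `EMBEDDING_LOOKUP` currently only prepares with a float32 output, one possible workaround (untested here, and only a sketch) is to stop forcing uint8 model I/O and let the inputs and outputs default to float32, while still quantizing weights and internal activations to int8 via the representative dataset. The flags below are the same as in the scripts above minus the two uint8 entries; the helper name is hypothetical.

```python
import tensorflow as tf

def make_int8_flags(representative_data_gen):
    """Converter flags for int8 quantization with float32 model I/O.

    With float32 inputs/outputs, EMBEDDING_LOOKUP can keep its float32
    output while weights and internal activations are still int8.
    """
    return {
        'optimizations': [tf.lite.Optimize.DEFAULT],
        'representative_dataset': representative_data_gen,
        'target_spec.supported_ops': [tf.lite.OpsSet.TFLITE_BUILTINS_INT8],
        # 'inference_input_type' / 'inference_output_type' omitted:
        # model inputs and outputs then default to float32.
    }
```

Whether this avoids the prepare failure depends on whether the converter leaves the embedding lookup's output in float32 under these flags, which would need to be verified by re-running the conversion.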

@pkgoogle pkgoogle added type:bug Bug status:awaiting ai-edge-developer type:feature For feature requests and removed status:awaiting user response When awaiting user response type:support For use-related issues type:bug Bug labels Dec 19, 2024