
Llama3.2-1B .task export does not work in mediapipe #411

Open
mikel-brostrom opened this issue Dec 10, 2024 · 7 comments

mikel-brostrom commented Dec 10, 2024

Description of the bug:

I exported my Llama3.2-1B to .task using:

# pip installs
pip install --upgrade pip setuptools wheel
pip install ai-edge-torch-nightly
pip install ai-edge-quantizer-nightly
pip install transformers
pip install tensorflow-cpu

# tflite & task export
curl -o verify.py https://raw.githubusercontent.com/google-ai-edge/ai-edge-torch/refs/heads/main/ai_edge_torch/generative/examples/llama/verify.py
curl -o convert_to_tflite.py https://raw.githubusercontent.com/google-ai-edge/ai-edge-torch/refs/heads/main/ai_edge_torch/generative/examples/llama/convert_to_tflite.py
curl -o tokenizer_to_sentencepiece.py https://raw.githubusercontent.com/google-ai-edge/ai-edge-torch/main/ai_edge_torch/generative/tools/tokenizer_to_sentencepiece.py
python3 verify.py
python3 tokenizer_to_sentencepiece.py --checkpoint=meta-llama/Llama-3.2-1B-Instruct --output_path=/tmp/llama3.spm.model
python3 convert_to_tflite.py --checkpoint_path '/root/.cache/huggingface/hub/models--meta-llama--Llama-3.2-1B-Instruct/snapshots/9213176726f574b556790deb65791e0c5aa438b6'
python3 -c "from mediapipe.tasks.python.genai import bundler; \
config = bundler.BundleConfig(tflite_model='/tmp/llama_1b_q8_ekv1280.tflite', \
                      tokenizer_model='/tmp/llama3.spm.model', \
                      start_token='<|begin_of_text|>', \
                      stop_tokens=['<|end_of_text|>'], \
                      output_filename='/tmp/llama_1b_q8_ekv1280.task', \
                      enable_bytes_to_unicode_mapping=False); \
print('Configuration created:', config); \
bundler.create_bundle(config)"

When loading the .task model into the mediapipe app I get:

Error -
internal: Failed to initialize session: %sINTERNAL: CalculatorGraph::Run() failed: Calculator::Open() for node "odml.infra.TfLitePrefillDecodeRunnerCalculator" failed; RET_CHECK failure (external/odml/odml/infra/genai/inference/utils/tflite_utils/tflite_llm_utils.cc:59) std::find_if(signature_keys.begin(), signature_keys.end(), [&](const std::string* key) { return *key == required_key; }) != signature_keys.end()

Am I missing something?
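
A quick sanity check worth running before bundling, assuming tensorflow(-cpu) is installed: the RET_CHECK above fires when a required signature key is missing from the TF Lite model wrapped in the .task file, so listing the signatures of the exported .tflite shows whether they survived conversion. This is a hypothetical verification snippet, not part of the export scripts above.

# check_signatures.py - hypothetical helper, not part of the official tooling
import tensorflow as tf

# Load the converted model and print its signature keys and their input/output names.
interpreter = tf.lite.Interpreter(model_path='/tmp/llama_1b_q8_ekv1280.tflite')
print(interpreter.get_signature_list())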

Actual vs expected behavior:

No response

Any other information you'd like to share?

No response

@pkgoogle pkgoogle self-assigned this Dec 10, 2024
@pkgoogle
Contributor

Hi @mikel-brostrom, can you ensure you followed the steps at https://ai.google.dev/edge/mediapipe/solutions/genai/llm_inference#pytorch-models? Let me know if that works for you.

@pkgoogle pkgoogle added status:awaiting user response, status:more data needed, type:support and removed type:bug labels Dec 10, 2024
@chienhuikuo

I have the same error log. I’ve already used the task bundler to convert the TFLite model to a .task file.
I also referred to this ai-edge-torch issue and cloned the latest repository to run the convert_to_tflite.py script, but the error message still persists as follows:

E0000 00:00:1733896874.996735   21469 calculator_graph.cc:898] INTERNAL: CalculatorGraph::Run() failed: 
Calculator::Open() for node "odml.infra.TfLitePrefillDecodeRunnerCalculator" failed: ; RET_CHECK failure (external/odml/odml/infra/genai/inference/utils/tflite_utils/tflite_llm_utils.cc:59) std::find_if(signature_keys.begin(), signature_keys.end(), [&](const std::string* key) { return *key == required_key; }) != signature_keys.end()

@mikel-brostrom
Author

mikel-brostrom commented Dec 11, 2024

I updated my comment with the full script I am running, @pkgoogle @chienhuikuo. It clarifies my export workflow and hopefully makes this reproducible.

@mikel-brostrom mikel-brostrom changed the title Llama3.2-1B TFLite export does not work in mediapipe Llama3.2-1B .task export does not work in mediapipe Dec 11, 2024
@talumbau
Contributor

I haven't been able to reproduce yet. The error message seems to indicate that the resulting TF Lite file (which is wrapped in the .task file) does not have a signature called decode. This would be strange if it were the case, because the conversion code explicitly creates a decode signature. Still investigating.

@mikel-brostrom
Author

mikel-brostrom commented Dec 12, 2024

My understanding is that the signature stuff happens here:

python3 convert_to_tflite.py
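
For context, this is roughly how named signatures end up in a converted model with ai_edge_torch; an illustrative sketch with a toy module, not the actual contents of convert_to_tflite.py (which builds the real Llama model along with its KV-cache and quantization setup):

# multi_signature_sketch.py - toy example only, assumes ai-edge-torch is installed
import torch
import ai_edge_torch

class TinyModel(torch.nn.Module):
    # Stand-in for the real decoder-only LM used in the example script.
    def forward(self, x):
        return x * 2.0

model = TinyModel().eval()
prefill_args = (torch.zeros((1, 64)),)  # stand-in for prefill inputs
decode_args = (torch.zeros((1, 1)),)    # stand-in for decode inputs

# Each signature() call registers a named entry point; both names become
# signature keys in the exported flatbuffer.
edge_model = (
    ai_edge_torch.signature('prefill', model, prefill_args)
    .signature('decode', model, decode_args)
    .convert()
)
edge_model.export('/tmp/two_signature_example.tflite')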

@krishna1870

krishna1870 commented Dec 12, 2024


Hey mikel, I am trying to run the script you mentioned here in Google Colab. I am getting this error:

File "/content/verify.py", line 23, in <module>
    from ai_edge_torch.generative.examples.llama import llama
ModuleNotFoundError: No module named 'ai_edge_torch.generative.examples.llama'

How are you able to run this file? I am getting this error even though I have done pip install ai_edge_torch.

@mikel-brostrom
Author

mikel-brostrom commented Dec 12, 2024

Added the pip installs I am running to the original comment, @krishna1870 @talumbau.

@pkgoogle pkgoogle added status:awaiting ai-edge-developer and removed status:awaiting user response, status:more data needed labels Dec 12, 2024