All functionality has been consolidated into a single file for CLI/UI/checkpointing; added a fix for issue 702 with the corresponding code, and added instructions in local_inference/README.md as well #757
base: main
Conversation
…ctions in LLM_finetuning_overview.md as well
@himanshushukla12 is our community legend! Thanks for another PR! :)
@init27 Thank you for the recognition😊
Thank you so much for another PR; this is super helpful for our users who want to run their fine-tuned LoRA checkpoints for inference. I noticed that for the vision model we already have 2 inference scripts: multi_modal_infer.py and multi_modal_infer_gradio_UI.py (thanks to your help!). I wonder if it would be better to add the LoRA ability on top of them instead of creating a new script; having 3 inference scripts for the vision model may be a little confusing for new users. Maybe later, when I have time, we can work together to merge all the inference scripts under local_inference into one script that can handle both the text model and the vision model, with options for --gradio_ui and --lora_adaptor. Ideally it will be much easier for users to learn just one script that can handle everything. Let me know if you have any suggestions! Thank you again for this great PR.
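To make that proposal concrete, a rough skeleton of such a unified entry point might look like the sketch below. The --gradio_ui and --lora_adaptor flag names come from the comment above; the run_cli and launch_ui helpers are hypothetical placeholders, not existing code in the repo.

# Rough skeleton of a single unified inference entry point, as suggested above.
# run_cli/launch_ui are hypothetical placeholders for the real implementations.
import argparse

def run_cli(args):
    # Placeholder: the real implementation would load the model and generate once.
    print(f"CLI inference with {args.model_name} (LoRA: {args.lora_adaptor})")

def launch_ui(args):
    # Placeholder: the real implementation would build and launch the Gradio app.
    print(f"Launching Gradio UI for {args.model_name} (LoRA: {args.lora_adaptor})")

def main():
    parser = argparse.ArgumentParser(description="Unified text/vision local inference")
    parser.add_argument("--model_name", required=True)
    parser.add_argument("--image_path", default=None, help="If given, run in vision mode")
    parser.add_argument("--prompt_text", default=None)
    parser.add_argument("--gradio_ui", action="store_true", help="Serve a Gradio UI instead of one-shot CLI")
    parser.add_argument("--lora_adaptor", default=None, help="Optional path to fine-tuned LoRA weights")
    args = parser.parse_args()

    if args.gradio_ui:
        launch_ui(args)
    else:
        run_cli(args)

if __name__ == "__main__":
    main()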
…ng_overview.md to local_inference/README.md
…nferencing, 3. checkpoint inferencing
…2. gradio inferencing, 3. checkpoint inferencing in UI/CLI
@wukaixingxp All changes have been successfully implemented and are described below.
Model Overview
Key Features
All functionality has been consolidated into a single file with three main modes:
1. CLI inference:
python multi_modal_infer.py \
  --image_path "path/to/image.jpg" \
  --prompt_text "Describe this image" \
  --model_name "meta-llama/Llama-3.2-11B-Vision-Instruct" \
  --hf_token "your_token"
2. Gradio UI inference:
python multi_modal_infer.py \
  --model_name "meta-llama/Llama-3.2-11B-Vision-Instruct" \
  --hf_token "your_token" \
  --gradio_ui
3. LoRA checkpoint inference:
python multi_modal_infer.py \
  --image_path "path/to/image.jpg" \
  --prompt_text "Describe this image" \
  --model_name "meta-llama/Llama-3.2-11B-Vision-Instruct" \
  --hf_token "your_token" \
  --finetuning_path "path/to/lora/weights"
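For reference, a minimal sketch of what the CLI mode could do under the hood, using the standard Hugging Face APIs for Llama 3.2 Vision. This is an illustration only, with an assumed max_new_tokens value, not the exact code in multi_modal_infer.py.

# Illustrative sketch of CLI-mode generation for a Llama 3.2 vision model
# (not the exact implementation in multi_modal_infer.py).
import torch
from PIL import Image
from transformers import MllamaForConditionalGeneration, AutoProcessor

model_id = "meta-llama/Llama-3.2-11B-Vision-Instruct"
model = MllamaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

image = Image.open("path/to/image.jpg")
messages = [
    {"role": "user", "content": [
        {"type": "image"},
        {"type": "text", "text": "Describe this image"},
    ]}
]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(image, prompt, add_special_tokens=False, return_tensors="pt").to(model.device)

# max_new_tokens is an assumed value for this sketch
output = model.generate(**inputs, max_new_tokens=256)
print(processor.decode(output[0], skip_special_tokens=True))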
Key Improvements

Kindly let me know if there is anything left; we can figure it out.
Tests of all three modes
1. CLI inference:
python multi_modal_infer.py \
  --image_path "path/to/image.jpg" \
  --prompt_text "Describe this image" \
  --model_name "meta-llama/Llama-3.2-11B-Vision-Instruct" \
  --hf_token "your_token"

Output:
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:03<00:00, 1.30it/s]
Generated Text: end_header_id|>
The image presents a complex network diagram comprising 10 nodes, each represented by a distinct colored square... (continued)
2. Gradio UI inference:
python multi_modal_infer.py \
  --model_name "meta-llama/Llama-3.2-11B-Vision-Instruct" \
  --hf_token "your_token" \
  --gradio_ui

Output:
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:03<00:00, 1.32it/s]
/home/llama-recipes/venvLlamaRecipes/lib/python3.10/site-packages/gradio/components/chatbot.py:222: UserWarning: You have not specified a value for the `type` parameter. Defaulting to the 'tuples' format for chatbot messages, but this is deprecated and will be removed in a future version of Gradio. Please set type='messages' instead, which uses openai-style 'role' and 'content' keys.
warnings.warn(
* Running on local URL: http://127.0.0.1:7860
To create a public link, set `share=True` in `launch()`.
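As a side note, the Gradio UserWarning above can be avoided by constructing the chatbot with type="messages". A minimal sketch with a placeholder respond() handler (not the actual UI code in this PR):

# Minimal Gradio chat UI using the non-deprecated "messages" history format,
# which avoids the UserWarning shown above. respond() is a stand-in for the
# real inference call.
import gradio as gr

with gr.Blocks() as demo:
    chatbot = gr.Chatbot(type="messages")  # openai-style {"role", "content"} dicts
    msg = gr.Textbox(label="Prompt")

    def respond(message, history):
        # Placeholder: the real handler would run the vision model here.
        history = history + [
            {"role": "user", "content": message},
            {"role": "assistant", "content": "model output goes here"},
        ]
        return "", history

    msg.submit(respond, [msg, chatbot], [msg, chatbot])

demo.launch()  # pass share=True for a public link, as noted in the log above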
3. LoRA checkpoint inference:
python multi_modal_infer.py \
  --image_path "path/to/image.jpg" \
  --prompt_text "Describe this image" \
  --model_name "meta-llama/Llama-3.2-11B-Vision-Instruct" \
  --hf_token "your_token" \
  --finetuning_path "path/to/lora/weights"

Output:
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:03<00:00, 1.31it/s]
Loading adapter from '/home/llama-recipes/PATH2/to/save/PEFT/model/'...
Adapter merged successfully
Generated Text: end_header_id|>
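For readers following along, the adapter loading and merging reflected in the log above can be done with peft roughly as follows. This is an illustrative sketch, not the exact code in the script; the function name is hypothetical.

# Illustrative sketch of loading a LoRA/PEFT adapter and merging it into the
# base model, matching the "Adapter merged successfully" log above.
from peft import PeftModel

def load_and_merge_adapter(base_model, adapter_path):
    # Wrap the base model with the fine-tuned adapter weights...
    model = PeftModel.from_pretrained(base_model, adapter_path)
    # ...then fold the adapter into the base weights so plain inference works.
    model = model.merge_and_unload()
    print("Adapter merged successfully")
    return model

# Example usage (hypothetical):
# model = load_and_merge_adapter(model, "path/to/lora/weights")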
@wukaixingxp All changes are done; could you please check?
What does this PR do?
This PR adds detailed instructions for using the multi_modal_infer.py script to generate text from images after fine-tuning the Llama 3.2 vision model. The script supports merging PEFT adapter weights from a specified path. The changes include instructions added to the LLM_finetuning_overview.md file under the "Inference" heading.
Fixes # (issue)
Feature/Issue validation/testing
Please describe the tests that you ran to verify your changes and relevant result summary. Provide instructions so it can be reproduced.
Please also list any relevant details for your test configuration.
The code-merge-inference.py script runs successfully with the provided example command.
Logs for Test A:
Output:
Before submitting
Thanks for contributing 🎉!