
All functionality has been consolidated into a single file for CLI/UI/Checkpointing and Added fix for issue 702 and added code for that as well, added instructions in local_inference/README.md as well #757

Open · wants to merge 12 commits into main

Conversation

@himanshushukla12 (Contributor) commented Oct 29, 2024

What does this PR do?

This PR adds detailed instructions for using the multi_modal_infer.py script to generate text from images after fine-tuning the Llama 3.2 vision model. The script supports merging PEFT adapter weights from a specified path; a sketch of that step follows the list below. The changes include:

  • Adding a new section in the LLM_finetuning_overview.md file under the "Inference" heading.
  • Providing a usage example for running the inference script with the necessary parameters.
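
For reference, here is a minimal sketch of what the adapter-merging step described above can look like, assuming the Hugging Face transformers and peft libraries; the paths are placeholders mirroring the script's --finetuning_path argument, not the PR's exact code:

import torch
from transformers import MllamaForConditionalGeneration, AutoProcessor
from peft import PeftModel

model_id = "meta-llama/Llama-3.2-11B-Vision-Instruct"
finetuning_path = "path/to/your/finetuned/model"  # placeholder, mirrors --finetuning_path

# Load the base vision-instruct model and its processor.
model = MllamaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

# Attach the PEFT (LoRA) adapter saved during fine-tuning and fold its
# weights into the base model, so inference no longer needs the adapter files.
model = PeftModel.from_pretrained(model, finetuning_path)
model = model.merge_and_unload()
model.eval()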

Fixes #702

Feature/Issue validation/testing

Please describe the tests that you ran to verify your changes and summarize the relevant results. Provide instructions so they can be reproduced.
Please also list any relevant details of your test configuration.

  • Test A: Verified that the code-merge-inference.py script runs successfully with the provided example command.
    Logs for Test A:
    python multi_modal_infer.py \
        --image_path "path/to/your/image.png" \
        --prompt_text "Your prompt text here" \
        --temperature 1 \
        --top_p 0.5 \
        --model_name "meta-llama/Llama-3.2-11B-Vision-Instruct" \
        --hf_token "your_hugging_face_token" \
        --finetuning_path "path/to/your/finetuned/model"

Output:

Loading checkpoint shards: 100%|██████████████████| 5/5 [00:03<00:00,  1.40it/s]
Loading adapter from 'PATH/to/save/PEFT/model'...
Adapter merged successfully with the pre-trained model.

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline, Pull Request section?
  • Was this discussed/approved via a Github issue? Please add a link to it if that's the case.
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

Thanks for contributing 🎉!

@init27 (Contributor) commented Oct 29, 2024

@himanshushukla12 is our community legend! Thanks for another PR! :)

@himanshushukla12 (Contributor, Author) commented:

@init27 Thank you for the recognition 😊

@wukaixingxp self-requested a review on October 30, 2024 18:33
@wukaixingxp (Contributor) left a comment


Thank you so much for another PR that is super helpful for our users to run inference with their fine-tuned LoRA checkpoints. I noticed that for the vision model we already have two inference scripts: multi_modal_infer.py and multi_modal_infer_gradio_UI.py (thanks to your help!). I wonder if it would be better to add the LoRA ability on top of them instead of creating a new script; having three inference scripts for the vision model may be a little confusing for new users. Maybe later, when I have time, we can work together to merge all the inference scripts under local_inference into one script that can handle both the text model and the vision model, with options for --gradio_ui and --lora_adaptor. Ideally, it will be much easier for users to learn one script that can handle everything. Let me know if you have any suggestions! Thank you again for this great PR.

Review comments on recipes/quickstart/finetuning/LLM_finetuning_overview.md and recipes/quickstart/finetuning/code-merge-inference.py (outdated, resolved).
@himanshushukla12 (Contributor, Author) commented:

@wukaixingxp All changes have been implemented and are described below.

Model Overview

  • Base model: meta-llama/Llama-3.2-11B-Vision-Instruct
  • Uses PEFT library (v0.13.1) for efficient fine-tuning
  • Supports vision-language tasks with instruction capabilities

Key Features in multi_modal_infer.py

All functionality has been consolidated into a single file with three main modes (a sketch of the single-file dispatch follows the command examples below):

  1. Basic Inference
python multi_modal_infer.py \
    --image_path "path/to/image.jpg" \
    --prompt_text "Describe this image" \
    --model_name "meta-llama/Llama-3.2-11B-Vision-Instruct" \
    --hf_token "your_token"
  2. Gradio UI Mode
python multi_modal_infer.py \
    --model_name "meta-llama/Llama-3.2-11B-Vision-Instruct" \
    --hf_token "your_token" \
    --gradio_ui
  3. LoRA Fine-tuning Integration
python multi_modal_infer.py \
    --image_path "path/to/image.jpg" \
    --prompt_text "Describe this image" \
    --model_name "meta-llama/Llama-3.2-11B-Vision-Instruct" \
    --hf_token "your_token" \
    --finetuning_path "path/to/lora/weights"
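
As referenced above, here is a hedged sketch of how a single script can dispatch between these three modes with argparse. The argument names follow the commands above; load_model, run_inference, and launch_gradio_ui are hypothetical placeholders rather than the PR's actual functions:

import argparse

def parse_args():
    parser = argparse.ArgumentParser(description="Unified Llama 3.2 vision inference")
    parser.add_argument("--model_name", default="meta-llama/Llama-3.2-11B-Vision-Instruct")
    parser.add_argument("--image_path", help="Input image (CLI mode)")
    parser.add_argument("--prompt_text", help="Prompt to pair with the image (CLI mode)")
    parser.add_argument("--temperature", type=float, default=1.0)
    parser.add_argument("--top_p", type=float, default=0.5)
    parser.add_argument("--hf_token", help="Hugging Face access token")
    parser.add_argument("--finetuning_path", help="Optional path to LoRA/PEFT adapter weights")
    parser.add_argument("--gradio_ui", action="store_true", help="Launch the interactive web UI")
    return parser.parse_args()

def load_model(model_name, hf_token, finetuning_path=None):
    # Placeholder: load the base model/processor and merge the adapter if a path is given.
    raise NotImplementedError

def run_inference(model, processor, image_path, prompt_text, temperature, top_p):
    # Placeholder: single image-plus-prompt generation for CLI mode.
    raise NotImplementedError

def launch_gradio_ui(model, processor):
    # Placeholder: build and launch the interactive web UI.
    raise NotImplementedError

def main():
    args = parse_args()
    # Load the (optionally adapter-merged) model once, then route by mode.
    model, processor = load_model(args.model_name, args.hf_token, args.finetuning_path)
    if args.gradio_ui:
        launch_gradio_ui(model, processor)
    else:
        print(run_inference(model, processor, args.image_path, args.prompt_text,
                            args.temperature, args.top_p))

if __name__ == "__main__":
    main()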

Key Improvements

  • Single file implementation instead of multiple scripts
  • Dynamic LoRA loading through UI toggle (see the sketch after this list)
  • Integrated model state management
  • Unified command line interface
  • Interactive web UI with parameter controls
  • Support for both CLI and UI-based workflows
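
As referenced in the list above, here is a minimal Gradio sketch of the UI-toggle idea, assuming the gradio package. The describe function is a stand-in for the actual model call, not the PR's implementation:

import gradio as gr

def describe(image, prompt, temperature, top_p, use_lora):
    # Placeholder: the real script would run the (optionally LoRA-merged)
    # vision model here and return the generated description.
    adapter = "LoRA adapter" if use_lora else "base model"
    return f"[sketch] would generate with the {adapter} (temperature={temperature}, top_p={top_p})"

with gr.Blocks() as demo:
    gr.Markdown("Llama 3.2 Vision inference (sketch)")
    with gr.Row():
        image = gr.Image(type="pil", label="Image")
        with gr.Column():
            prompt = gr.Textbox(label="Prompt", value="Describe this image")
            temperature = gr.Slider(0.1, 2.0, value=1.0, label="Temperature")
            top_p = gr.Slider(0.1, 1.0, value=0.5, label="Top-p")
            use_lora = gr.Checkbox(label="Use fine-tuned LoRA adapter")
            run = gr.Button("Generate")
    output = gr.Textbox(label="Generated text")
    run.click(describe, inputs=[image, prompt, temperature, top_p, use_lora], outputs=output)

demo.launch()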

Kindly let me know if there is something left. We can figure it out.

@himanshushukla12 changed the title from "Added fix for issue 702 and added code for that as well, added instructions in LLM_finetuning_overview.md as well" to "All functionality has been consolidated into a single file and Added fix for issue 702 and added code for that as well, added instructions in LLM_finetuning_overview.md as well" on Nov 2, 2024
@himanshushukla12 changed the title from "All functionality has been consolidated into a single file and Added fix for issue 702 and added code for that as well, added instructions in LLM_finetuning_overview.md as well" to "All functionality has been consolidated into a single file for CLI/UI/Checkpointing and Added fix for issue 702 and added code for that as well, added instructions in local_inference/README.md as well" on Nov 2, 2024
@himanshushukla12 marked this pull request as draft on November 2, 2024 21:25
@himanshushukla12 (Contributor, Author) commented:

Tests of all three modes

  1. Basic Inference
python multi_modal_infer.py \
    --image_path "path/to/image.jpg" \
    --prompt_text "Describe this image" \
    --model_name "meta-llama/Llama-3.2-11B-Vision-Instruct" \
    --hf_token "your_token"
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:03<00:00,  1.30it/s]
Generated Text: end_header_id|>

The image presents a complex network diagram comprising 10 nodes, each represented by a distinct colored square... (continued)
  2. Gradio UI Mode
python multi_modal_infer.py \
    --model_name "meta-llama/Llama-3.2-11B-Vision-Instruct" \
    --hf_token "your_token" \
    --gradio_ui

Output:

Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:03<00:00,  1.32it/s]
/home/llama-recipes/venvLlamaRecipes/lib/python3.10/site-packages/gradio/components/chatbot.py:222: UserWarning: You have not specified a value for the `type` parameter. Defaulting to the 'tuples' format for chatbot messages, but this is deprecated and will be removed in a future version of Gradio. Please set type='messages' instead, which uses openai-style 'role' and 'content' keys.
  warnings.warn(
* Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
  3. LoRA Fine-tuning Integration
python multi_modal_infer.py \
    --image_path "path/to/image.jpg" \
    --prompt_text "Describe this image" \
    --model_name "meta-llama/Llama-3.2-11B-Vision-Instruct" \
    --hf_token "your_token" \
    --finetuning_path "path/to/lora/weights"

Output:

Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:03<00:00,  1.31it/s]
Loading adapter from '/home/llama-recipes/PATH2/to/save/PEFT/model/'...
Adapter merged successfully
Generated Text: end_header_id|>
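
For context on the logs above, where the decoded output begins with a leftover chat-template fragment ("end_header_id|>"), here is a minimal sketch of the generation step for this model using the Hugging Face transformers API, decoding only the newly generated tokens so template fragments are not echoed. This is an assumption-laden sketch with placeholder paths, not the PR's exact code:

import torch
from PIL import Image
from transformers import MllamaForConditionalGeneration, AutoProcessor

model_id = "meta-llama/Llama-3.2-11B-Vision-Instruct"
model = MllamaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

image = Image.open("path/to/image.jpg")  # placeholder path
messages = [{"role": "user",
             "content": [{"type": "image"},
                         {"type": "text", "text": "Describe this image"}]}]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(image, prompt, return_tensors="pt").to(model.device)

output = model.generate(**inputs, max_new_tokens=256, do_sample=True,
                        temperature=1.0, top_p=0.5)
# Slice off the prompt tokens before decoding so template fragments
# such as "end_header_id|>" are not echoed in the printed result.
generated = output[0][inputs["input_ids"].shape[-1]:]
print(processor.decode(generated, skip_special_tokens=True))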

@himanshushukla12 marked this pull request as ready for review on November 2, 2024 22:10
@himanshushukla12 (Contributor, Author) commented:

Thank you so much for another PR that is super helpful for our users to run inference with their fine-tuned LoRA checkpoints. I noticed that for the vision model we already have two inference scripts: multi_modal_infer.py and multi_modal_infer_gradio_UI.py (thanks to your help!). I wonder if it would be better to add the LoRA ability on top of them instead of creating a new script; having three inference scripts for the vision model may be a little confusing for new users. Maybe later, when I have time, we can work together to merge all the inference scripts under local_inference into one script that can handle both the text model and the vision model, with options for --gradio_ui and --lora_adaptor. Ideally, it will be much easier for users to learn one script that can handle everything. Let me know if you have any suggestions! Thank you again for this great PR.

@wukaixingxp All changes are done; can you please check?
