
All functionality has been consolidated into a single file for CLI/UI/Checkpointing and Added fix for issue 702 and added code for that as well, added instructions in local_inference/README.md as well #757

Open · wants to merge 12 commits into main

Conversation

@himanshushukla12 (Contributor) commented Oct 29, 2024

What does this PR do?

This PR adds detailed instructions for using the multi_modal_infer.py script to generate text from images after fine-tuning the Llama 3.2 vision model. The script supports merging PEFT adapter weights from a specified path; a sketch of that step follows the list below. The changes include:

  • Adding a new section in the LLM_finetuning_overview.md file under the "Inference" heading.
  • Providing a usage example for running the inference script with the necessary parameters.
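
For reference, here is a minimal sketch of what the adapter-merging step described above can look like, assuming the Hugging Face transformers and peft libraries; the paths are placeholders mirroring the script's --finetuning_path argument, not the PR's exact code:

import torch
from transformers import MllamaForConditionalGeneration, AutoProcessor
from peft import PeftModel

model_id = "meta-llama/Llama-3.2-11B-Vision-Instruct"
finetuning_path = "path/to/your/finetuned/model"  # placeholder, mirrors --finetuning_path

# Load the base vision-instruct model and its processor.
model = MllamaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

# Attach the PEFT (LoRA) adapter saved during fine-tuning and fold its
# weights into the base model, so inference no longer needs the adapter files.
model = PeftModel.from_pretrained(model, finetuning_path)
model = model.merge_and_unload()
model.eval()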

Fixes #702

Feature/Issue validation/testing

Please describe the tests that you ran to verify your changes and summarize the relevant results. Provide instructions so they can be reproduced.
Please also list any relevant details of your test configuration.

  • Test A: Verified that the code-merge-inference.py script runs successfully with the provided example command.
    Logs for Test A:
    python multi_modal_infer.py \
        --image_path "path/to/your/image.png" \
        --prompt_text "Your prompt text here" \
        --temperature 1 \
        --top_p 0.5 \
        --model_name "meta-llama/Llama-3.2-11B-Vision-Instruct" \
        --hf_token "your_hugging_face_token" \
        --finetuning_path "path/to/your/finetuned/model"

Output:

Loading checkpoint shards: 100%|██████████████████| 5/5 [00:03<00:00,  1.40it/s]
Loading adapter from 'PATH/to/save/PEFT/model'...
Adapter merged successfully with the pre-trained model.

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline, Pull Request section?
  • Was this discussed/approved via a Github issue? Please add a link to it if that's the case.
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

Thanks for contributing 🎉!

@init27 (Contributor) commented Oct 29, 2024

@himanshushukla12 is our community legend! Thanks for another PR! :)

@himanshushukla12 (Contributor, Author) commented:

@init27 Thank you for the recognition 😊

@wukaixingxp self-requested a review on October 30, 2024 18:33
@wukaixingxp (Contributor) left a comment


Thank you so much for another PR that is super helpful for our users to run inference with their fine-tuned LoRA checkpoints. I noticed that for the vision model we already have two inference scripts: multi_modal_infer.py and multi_modal_infer_gradio_UI.py (thanks to your help!). I wonder if it would be better to add the LoRA ability on top of them instead of creating a new script; having three inference scripts for the vision model may be a little confusing for new users. Maybe later, when I have time, we can work together to merge all the inference scripts under local_inference into one script that can handle both the text model and the vision model, with options for --gradio_ui and --lora_adaptor. Ideally, it will be much easier for users to learn one script that can handle everything. Let me know if you have any suggestions! Thank you again for this great PR.

Review comments on recipes/quickstart/finetuning/LLM_finetuning_overview.md and recipes/quickstart/finetuning/code-merge-inference.py (outdated, resolved).
@himanshushukla12 (Contributor, Author) commented:

@wukaixingxp All changes have been implemented and are described below.

Model Overview

  • Base model: meta-llama/Llama-3.2-11B-Vision-Instruct
  • Uses PEFT library (v0.13.1) for efficient fine-tuning
  • Supports vision-language tasks with instruction capabilities

Key Features in multi_modal_infer.py

All functionality has been consolidated into a single file with three main modes (a sketch of the single-file dispatch follows the command examples below):

  1. Basic Inference
python multi_modal_infer.py \
    --image_path "path/to/image.jpg" \
    --prompt_text "Describe this image" \
    --model_name "meta-llama/Llama-3.2-11B-Vision-Instruct" \
    --hf_token "your_token"
  2. Gradio UI Mode
python multi_modal_infer.py \
    --model_name "meta-llama/Llama-3.2-11B-Vision-Instruct" \
    --hf_token "your_token" \
    --gradio_ui
  3. LoRA Fine-tuning Integration
python multi_modal_infer.py \
    --image_path "path/to/image.jpg" \
    --prompt_text "Describe this image" \
    --model_name "meta-llama/Llama-3.2-11B-Vision-Instruct" \
    --hf_token "your_token" \
    --finetuning_path "path/to/lora/weights"
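
As referenced above, here is a hedged sketch of how a single script can dispatch between these three modes with argparse. The argument names follow the commands above; load_model, run_inference, and launch_gradio_ui are hypothetical placeholders rather than the PR's actual functions:

import argparse

def parse_args():
    parser = argparse.ArgumentParser(description="Unified Llama 3.2 vision inference")
    parser.add_argument("--model_name", default="meta-llama/Llama-3.2-11B-Vision-Instruct")
    parser.add_argument("--image_path", help="Input image (CLI mode)")
    parser.add_argument("--prompt_text", help="Prompt to pair with the image (CLI mode)")
    parser.add_argument("--temperature", type=float, default=1.0)
    parser.add_argument("--top_p", type=float, default=0.5)
    parser.add_argument("--hf_token", help="Hugging Face access token")
    parser.add_argument("--finetuning_path", help="Optional path to LoRA/PEFT adapter weights")
    parser.add_argument("--gradio_ui", action="store_true", help="Launch the interactive web UI")
    return parser.parse_args()

def load_model(model_name, hf_token, finetuning_path=None):
    # Placeholder: load the base model/processor and merge the adapter if a path is given.
    raise NotImplementedError

def run_inference(model, processor, image_path, prompt_text, temperature, top_p):
    # Placeholder: single image-plus-prompt generation for CLI mode.
    raise NotImplementedError

def launch_gradio_ui(model, processor):
    # Placeholder: build and launch the interactive web UI.
    raise NotImplementedError

def main():
    args = parse_args()
    # Load the (optionally adapter-merged) model once, then route by mode.
    model, processor = load_model(args.model_name, args.hf_token, args.finetuning_path)
    if args.gradio_ui:
        launch_gradio_ui(model, processor)
    else:
        print(run_inference(model, processor, args.image_path, args.prompt_text,
                            args.temperature, args.top_p))

if __name__ == "__main__":
    main()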

Key Improvements

  • Single file implementation instead of multiple scripts
  • Dynamic LoRA loading through UI toggle (see the sketch after this list)
  • Integrated model state management
  • Unified command line interface
  • Interactive web UI with parameter controls
  • Support for both CLI and UI-based workflows
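
As referenced in the list above, here is a minimal Gradio sketch of the UI-toggle idea, assuming the gradio package. The describe function is a stand-in for the actual model call, not the PR's implementation:

import gradio as gr

def describe(image, prompt, temperature, top_p, use_lora):
    # Placeholder: the real script would run the (optionally LoRA-merged)
    # vision model here and return the generated description.
    adapter = "LoRA adapter" if use_lora else "base model"
    return f"[sketch] would generate with the {adapter} (temperature={temperature}, top_p={top_p})"

with gr.Blocks() as demo:
    gr.Markdown("Llama 3.2 Vision inference (sketch)")
    with gr.Row():
        image = gr.Image(type="pil", label="Image")
        with gr.Column():
            prompt = gr.Textbox(label="Prompt", value="Describe this image")
            temperature = gr.Slider(0.1, 2.0, value=1.0, label="Temperature")
            top_p = gr.Slider(0.1, 1.0, value=0.5, label="Top-p")
            use_lora = gr.Checkbox(label="Use fine-tuned LoRA adapter")
            run = gr.Button("Generate")
    output = gr.Textbox(label="Generated text")
    run.click(describe, inputs=[image, prompt, temperature, top_p, use_lora], outputs=output)

demo.launch()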

Kindly let me know if there is something left. We can figure it out.

@himanshushukla12 changed the title from "Added fix for issue 702 and added code for that as well, added instructions in LLM_finetuning_overview.md as well" to "All functionality has been consolidated into a single file and Added fix for issue 702 and added code for that as well, added instructions in LLM_finetuning_overview.md as well" on Nov 2, 2024
@himanshushukla12 changed the title from "All functionality has been consolidated into a single file and Added fix for issue 702 and added code for that as well, added instructions in LLM_finetuning_overview.md as well" to "All functionality has been consolidated into a single file for CLI/UI/Checkpointing and Added fix for issue 702 and added code for that as well, added instructions in local_inference/README.md as well" on Nov 2, 2024
@himanshushukla12 marked this pull request as draft on November 2, 2024 21:25
@himanshushukla12 (Contributor, Author) commented:

Tests of all three modes

  1. Basic Inference
python multi_modal_infer.py \
    --image_path "path/to/image.jpg" \
    --prompt_text "Describe this image" \
    --model_name "meta-llama/Llama-3.2-11B-Vision-Instruct" \
    --hf_token "your_token"
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:03<00:00,  1.30it/s]
Generated Text: end_header_id|>

The image presents a complex network diagram comprising 10 nodes, each represented by a distinct colored square... (continued)
  2. Gradio UI Mode
python multi_modal_infer.py \
    --model_name "meta-llama/Llama-3.2-11B-Vision-Instruct" \
    --hf_token "your_token" \
    --gradio_ui

Output:

Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:03<00:00,  1.32it/s]
/home/llama-recipes/venvLlamaRecipes/lib/python3.10/site-packages/gradio/components/chatbot.py:222: UserWarning: You have not specified a value for the `type` parameter. Defaulting to the 'tuples' format for chatbot messages, but this is deprecated and will be removed in a future version of Gradio. Please set type='messages' instead, which uses openai-style 'role' and 'content' keys.
  warnings.warn(
* Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
  3. LoRA Fine-tuning Integration
python multi_modal_infer.py \
    --image_path "path/to/image.jpg" \
    --prompt_text "Describe this image" \
    --model_name "meta-llama/Llama-3.2-11B-Vision-Instruct" \
    --hf_token "your_token" \
    --finetuning_path "path/to/lora/weights"

Output:

Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:03<00:00,  1.31it/s]
Loading adapter from '/home/llama-recipes/PATH2/to/save/PEFT/model/'...
Adapter merged successfully
Generated Text: end_header_id|>
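
For context on the logs above, where the decoded output begins with a leftover chat-template fragment ("end_header_id|>"), here is a minimal sketch of the generation step for this model using the Hugging Face transformers API, decoding only the newly generated tokens so template fragments are not echoed. This is an assumption-laden sketch with placeholder paths, not the PR's exact code:

import torch
from PIL import Image
from transformers import MllamaForConditionalGeneration, AutoProcessor

model_id = "meta-llama/Llama-3.2-11B-Vision-Instruct"
model = MllamaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

image = Image.open("path/to/image.jpg")  # placeholder path
messages = [{"role": "user",
             "content": [{"type": "image"},
                         {"type": "text", "text": "Describe this image"}]}]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(image, prompt, return_tensors="pt").to(model.device)

output = model.generate(**inputs, max_new_tokens=256, do_sample=True,
                        temperature=1.0, top_p=0.5)
# Slice off the prompt tokens before decoding so template fragments
# such as "end_header_id|>" are not echoed in the printed result.
generated = output[0][inputs["input_ids"].shape[-1]:]
print(processor.decode(generated, skip_special_tokens=True))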

@himanshushukla12 marked this pull request as ready for review on November 2, 2024 22:10
@himanshushukla12 (Contributor, Author) commented:

Thank you so much for another PR that is super helpful for our users to run inference with their fine-tuned LoRA checkpoints. I noticed that for the vision model we already have two inference scripts: multi_modal_infer.py and multi_modal_infer_gradio_UI.py (thanks to your help!). I wonder if it would be better to add the LoRA ability on top of them instead of creating a new script; having three inference scripts for the vision model may be a little confusing for new users. Maybe later, when I have time, we can work together to merge all the inference scripts under local_inference into one script that can handle both the text model and the vision model, with options for --gradio_ui and --lora_adaptor. Ideally, it will be much easier for users to learn one script that can handle everything. Let me know if you have any suggestions! Thank you again for this great PR.

@wukaixingxp All changes are done; can you please check?
