
[WIP] [LoRA] support omi hidream lora. #11660


Draft: wants to merge 2 commits into base: main


@sayakpaul (Member) commented Jun 5, 2025

What does this PR do?

Check #11653.

This PR isn't ready at all, but I'm opening it to discuss some doubts. Currently, it only aims to support the transformer components of the LoRA state dict (other components will be iterated on in this PR itself).

I tried with the following code on top of this PR:

import torch
from transformers import AutoTokenizer, LlamaForCausalLM
from diffusers import HiDreamImagePipeline


text_encoder_4 = LlamaForCausalLM.from_pretrained(
    "terminusresearch/hidream-i1-llama-3.1-8b-instruct",
    subfolder="text_encoder_4",
    output_hidden_states=True,
    output_attentions=True,
    torch_dtype=torch.bfloat16,
).to("cuda", dtype=torch.bfloat16)
tokenizer_4 = AutoTokenizer.from_pretrained(
    "terminusresearch/hidream-i1-llama-3.1-8b-instruct",
    subfolder="tokenizer_4",
)
pipe = HiDreamImagePipeline.from_pretrained(
    "HiDream-ai/HiDream-I1-Dev",
    text_encoder_4=text_encoder_4,
    tokenizer_4=tokenizer_4,
    torch_dtype=torch.bfloat16,
).to("cuda")
pipe.load_lora_weights("RhaegarKhan/OMI_LORA")
image = pipe(
    'A cat holding a sign that says "Hi-Dreams.ai".',
    height=1024,
    width=1024,
    guidance_scale=5.0,
    num_inference_steps=50,
    generator=torch.Generator("cuda").manual_seed(0),
).images[0]
image.save("output.png")

However, it currently leads to this problem, and I am not sure what those params correspond to or how they should be handled in the first place.

Additionally, the LoRA has: https://pastebin.com/diwEwtsS


Could you share some details, @ali-afridi26?
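
For anyone following along, one way to get an overview of what a LoRA checkpoint contains is to count its keys by top-level prefix. The snippet below is only a minimal sketch: the helper name `summarize_key_prefixes` and the example keys are hypothetical and are not taken from the actual checkpoint (the real keys are in the pastebin linked above).

```python
from collections import Counter


def summarize_key_prefixes(keys, depth=1):
    """Count state-dict keys by their first `depth` dot-separated components."""
    counts = Counter()
    for key in keys:
        prefix = ".".join(key.split(".")[:depth])
        counts[prefix] += 1
    return dict(counts)


# Hypothetical key names, for illustration only.
example_keys = [
    "transformer.blocks.0.attn.to_q.lora_down.weight",
    "transformer.blocks.0.attn.to_q.lora_up.weight",
    "text_encoder.layers.0.q_proj.lora_down.weight",
    "embedding.trigger_word",
]

for prefix, count in summarize_key_prefixes(example_keys).items():
    print(f"{prefix}: {count} keys")
```

With a real file, the keys could come from `safetensors.torch.load_file(path).keys()`, assuming the checkpoint is a local `.safetensors` file.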

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@ali-afridi26 commented Jun 5, 2025

Hi @sayakpaul, thank you for this PR. The "Judy Hopps" params seem to be pivotal tuning embeddings for the trigger word "Judy Hopps" itself. For now you may ignore the pivotal embeds, but a future addition could include a helper to extract them and properly add them as embeddings with weightings.
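
As a rough illustration of ignoring the pivotal embeds for now, the sketch below splits a state dict into LoRA-looking entries and everything else. The marker substrings, the helper name `split_lora_and_extras`, and the example keys are all assumptions; the checkpoint's actual naming may differ.

```python
# Assumed substrings that identify LoRA weights; not confirmed for this checkpoint.
LORA_MARKERS = ("lora_down", "lora_up", "lora_A", "lora_B")


def split_lora_and_extras(state_dict, markers=LORA_MARKERS):
    """Return (lora, extras): LoRA-looking entries and all remaining entries."""
    lora, extras = {}, {}
    for key, value in state_dict.items():
        looks_like_lora = key.endswith(".alpha") or any(m in key for m in markers)
        (lora if looks_like_lora else extras)[key] = value
    return lora, extras


# Hypothetical keys; the values stand in for tensors.
example_state_dict = {
    "transformer.x.lora_down.weight": 1,
    "transformer.x.lora_up.weight": 2,
    "transformer.x.alpha": 3,
    "judy_hopps.embedding": 4,
}

lora, extras = split_lora_and_extras(example_state_dict)
print(sorted(lora))    # entries that would be passed on to the LoRA loader
print(sorted(extras))  # entries set aside, e.g. pivotal embeddings
```

Since `load_lora_weights` accepts a state dict as well as a repo id, the `lora` part could in principle be passed to it directly; whether that is enough for this checkpoint has not been verified here.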

@sayakpaul (Member, Author) commented

@ali-afridi26 thanks. I had guessed that to be the case, but what about the others shown in https://pastebin.com/diwEwtsS?


3 participants