
[Reporting a Bug] Error when prompt contains "!" #94

Open · sk-uma opened this issue Jun 21, 2024 · 3 comments

sk-uma commented Jun 21, 2024

When using SDXL, an error occurs if a prompt contains too many "!" characters.

The minimal code that reproduces the problem is below.

from diffusers import StableDiffusionXLPipeline
import torch
from compel import Compel, ReturnedEmbeddingsType


prompt = "!!!!!!!!!!!!!!!! !!!!!!!!!!!!!!!! !!!!!!!!!!!!!!!! !!!!!!!!!!!!!!!! !!!!!!!!!!!!!!!!"

pipeline = StableDiffusionXLPipeline.from_pretrained(
    "cagliostrolab/animagine-xl-3.1",
    torch_dtype=torch.float16, 
    use_safetensors=True, 
).to("cuda")

compel = Compel(
    tokenizer=[pipeline.tokenizer, pipeline.tokenizer_2],
    text_encoder=[pipeline.text_encoder, pipeline.text_encoder_2],
    returned_embeddings_type=ReturnedEmbeddingsType.PENULTIMATE_HIDDEN_STATES_NON_NORMALIZED,
    requires_pooled=[False, True],
    truncate_long_prompts=False,
)

compel.build_conditioning_tensor(prompt)

Running this code produces the following error:

Traceback (most recent call last):
  File "/home/sk-uma/create_dataset/test_compel.py", line 26, in <module>
    compel.build_conditioning_tensor(prompt)
  File "/opt/conda/lib/python3.10/site-packages/compel/compel.py", line 112, in build_conditioning_tensor
    conditioning, _ = self.build_conditioning_tensor_for_conjunction(conjunction)
  File "/opt/conda/lib/python3.10/site-packages/compel/compel.py", line 186, in build_conditioning_tensor_for_conjunction
    this_conditioning, this_options = self.build_conditioning_tensor_for_prompt_object(p)
  File "/opt/conda/lib/python3.10/site-packages/compel/compel.py", line 218, in build_conditioning_tensor_for_prompt_object
    return self._get_conditioning_for_flattened_prompt(prompt), {}
  File "/opt/conda/lib/python3.10/site-packages/compel/compel.py", line 282, in _get_conditioning_for_flattened_prompt
    return self.conditioning_provider.get_embeddings_for_weighted_prompt_fragments(
  File "/opt/conda/lib/python3.10/site-packages/compel/embeddings_provider.py", line 535, in get_embeddings_for_weighted_prompt_fragments
    text_embeddings = torch.cat(text_embeddings_list, dim=-1)
RuntimeError: Sizes of tensors must match except in dimension 1. Expected size 77 but got size 154 for tensor number 1 in the list.

The problem comes from a difference between SDXL's tokenizer and tokenizer_2: the pad_token of tokenizer is <|endoftext|>, while the pad_token of tokenizer_2 is !. Both tokenizers also merge runs of consecutive !s into a single token. As a result, tokenizer and tokenizer_2 produce different numbers of tokens for the same prompt, and the error occurs.
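
The pad token mismatch can be observed directly from the tokenizer configs (a minimal sketch, assuming the stock stabilityai/stable-diffusion-xl-base-1.0 checkpoint, whose tokenizers are loaded here via transformers):

from transformers import CLIPTokenizer

# Load SDXL's two tokenizers from their subfolders in the model repo.
tokenizer_1 = CLIPTokenizer.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", subfolder="tokenizer"
)
tokenizer_2 = CLIPTokenizer.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", subfolder="tokenizer_2"
)

# The first pads with <|endoftext|>, the second pads with "!".
print(repr(tokenizer_1.pad_token))  # '<|endoftext|>'
print(repr(tokenizer_2.pad_token))  # '!'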

The simplest workaround is to pass the same tokenizer for both encoders, so that both branches produce token sequences of identical length:

compel = Compel(
    tokenizer=[pipeline.tokenizer, pipeline.tokenizer],
    text_encoder=[pipeline.text_encoder, pipeline.text_encoder_2],
    returned_embeddings_type=ReturnedEmbeddingsType.PENULTIMATE_HIDDEN_STATES_NON_NORMALIZED,
    requires_pooled=[False, True],
    truncate_long_prompts=False,
)

Clement-Lelievre commented Jul 31, 2024

using compel==2.0.3 (latest as of today)

I've had a similar case where the part of the prompt that's tokenized differently across the two SDXL tokenizers is !':
tokenizer 1 encodes it as [13222]
tokenizer 2 encodes it as [0, 262]
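
These encodings can be checked directly (a minimal sketch, assuming the stock SDXL base tokenizers; the expected outputs are the values reported above):

from transformers import CLIPTokenizer

tokenizer_1 = CLIPTokenizer.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", subfolder="tokenizer"
)
tokenizer_2 = CLIPTokenizer.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", subfolder="tokenizer_2"
)

# add_special_tokens=False so BOS/EOS don't obscure the difference.
print(tokenizer_1.encode("!'", add_special_tokens=False))  # [13222], per the report
print(tokenizer_2.encode("!'", add_special_tokens=False))  # [0, 262], per the report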

However, the token sequence lengths being different and both above 77 does not by itself mean the call to compel will raise an error like the RuntimeError above; I've had plenty of counter-examples. For example, the prompt "!'" * 50 builds token sequences of lengths 101 and 102 respectively, yet works without error.

So something else breaks it: the shape of the embeddings. Typically the mismatch will be 77 vs 154 (2 * 77).

This method is responsible: https://github.com/damian0815/compel/blob/v2.0.3/src/compel/embeddings_provider.py#L282

The call to compel breaks when:

  • using tokenizers that have identical vocabularies but differing pad tokens (e.g. SDXL's tokenizers)
  • one tokenizer produces a token sequence whose length n pads out to k chunks of 77, i.e. ceil(n / 77) = k
  • the other tokenizer produces, from the same text prompt, a different length m such that ceil(m / 77) != k (typically k+1 or k-1)

In this case, one tokenizer exits the while loop referenced above with a shape of 77 * k while the other exits with shape 77 * (k+1) or 77 * (k-1). That breaks later, specifically in get_embeddings_for_weighted_prompt_fragments, when attempting text_embeddings = torch.cat(text_embeddings_list, dim=-1).
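
The failure condition reduces to ceiling-division arithmetic. Here is a minimal sketch (num_chunks is a hypothetical helper, not compel's API) using the lengths from the examples in this thread:

import math

MAX_LEN = 77  # CLIP's maximum sequence length

def num_chunks(token_count: int) -> int:
    # Each sequence is padded up to the next multiple of 77, so the
    # resulting embedding covers ceil(token_count / 77) chunks.
    return math.ceil(token_count / MAX_LEN)

# "!'" * 50: lengths 101 and 102 both pad out to 2 chunks -> no error.
print(num_chunks(101), num_chunks(102))  # 2 2

# Lengths 77 and 78 pad out to 1 vs 2 chunks -> shapes 77 vs 154 -> RuntimeError.
print(num_chunks(77), num_chunks(78))    # 1 2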

Having said that, the responsibility does not seem to lie with compel, nor with transformers (the library holding the CLIPTokenizer class). It seems to come from, as stated above, the fact that the pad_tokens differ across the two tokenizers, as can be seen in the tokenizers' config files.

cc @damian0815

damian0815 (Owner) commented

good work and thank you for tracking down the numbers. i've long suspected there's some weird edge case issue but haven't had a good repro for it


Clement-Lelievre commented Aug 2, 2024

> good work and thank you for tracking down the numbers. i've long suspected there's some weird edge case issue but haven't had a good repro for it

Hi @damian0815, below is a repro snippet:

from compel import Compel
from diffusers import StableDiffusionXLPipeline

pipeline = StableDiffusionXLPipeline.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0")

compel = Compel(
    tokenizer=[pipeline.tokenizer, pipeline.tokenizer_2],
    text_encoder=[pipeline.text_encoder, pipeline.text_encoder_2],
    truncate_long_prompts=False,
)
prompt = "3" * 74 + "!'"
compel([prompt])

This length-76 prompt yields 77 tokens with tokenizer 1 and 78 with tokenizer 2, hence the latter produces two batches of 77, i.e. a shape of 154 vs 77.
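
For completeness, appending the following lines to the snippet above should print the mismatched token counts (a sketch reusing pipeline and prompt from that snippet; the input_ids here include the BOS/EOS special tokens):

# Token counts per tokenizer, including BOS/EOS; run after the snippet above.
ids_1 = pipeline.tokenizer(prompt).input_ids
ids_2 = pipeline.tokenizer_2(prompt).input_ids
print(len(ids_1), len(ids_2))  # 77 78, per the description above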
