[Reporting a Bug] Error when prompt contains "!" #94
Comments
I've had a similar case where part of the prompt is tokenized differently across the two SDXL tokenizers. However, just because the token sequence lengths are different and both above 77 doesn't mean that the call to compel will raise an error like the RuntimeError above; I've had plenty of counter-examples. So there's something else that breaks it: the shape of the embeddings. Typically the mismatch will be 77 vs 154 (2*77). This method is responsible: https://github.com/damian0815/compel/blob/v2.0.3/src/compel/embeddings_provider.py#L282
The call to compel breaks when the two tokenizers need a different number of 77-token chunks: in that case, one exits the while loop referenced above with a shape of 77*k while the other exits with shape 77*(k+1) or 77*(k-1), and that breaks later. Having said that, the responsibility does not seem to be on compel's side nor on transformers' side (the library holding the tokenizers).
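The chunk-count condition described above can be sketched as a small check. This is a hypothetical helper (not part of compel); the 77-token window size comes from the discussion above, and the function name is mine:

```python
import math

MAX_LEN = 77  # CLIP window size used in the chunking described above


def chunk_counts_match(token_lengths, max_len=MAX_LEN):
    """Hypothetical pre-flight check: True when every tokenizer needs the
    same number of max_len-token windows, i.e. the padded embeddings will
    agree in shape (77*k for the same k)."""
    counts = {max(1, math.ceil(n / max_len)) for n in token_lengths}
    return len(counts) == 1


assert chunk_counts_match([77, 77])      # same single window: fine
assert chunk_counts_match([80, 90])      # lengths differ, but both need 2 windows: fine
assert not chunk_counts_match([77, 78])  # 1 vs 2 windows: the breaking case
```

This matches the observation that differing token counts alone are not enough to trigger the error; only prompts whose counts fall into a different number of windows break.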
cc @damian0815
good work and thank you for tracking down the numbers. i've long suspected there's some weird edge case issue but haven't had a good repro for it
Hi @damian0815, below is a repro snippet:

```python
from compel import Compel
from diffusers import StableDiffusionXLPipeline

pipeline = StableDiffusionXLPipeline.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0")
compel = Compel(
    tokenizer=[pipeline.tokenizer, pipeline.tokenizer_2],
    text_encoder=[pipeline.text_encoder, pipeline.text_encoder_2],
    truncate_long_prompts=False,
)
prompt = "3" * 74 + "!'"
compel([prompt])
```

This len-76 prompt will have 77 tokens for tokenizer 1 and 78 with tokenizer 2, hence the latter will produce two batches of 77, i.e. a shape of 154 vs 77.
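The arithmetic behind the 154-vs-77 mismatch in the repro can be sketched as follows, assuming each prompt is padded out to whole 77-token windows when `truncate_long_prompts=False` (helper name is mine, not compel's API):

```python
import math


def padded_length(n_tokens, window=77):
    # Round the token count up to a whole number of 77-token windows.
    return max(1, math.ceil(n_tokens / window)) * window


assert padded_length(77) == 77   # tokenizer 1: exactly one window
assert padded_length(78) == 154  # tokenizer 2: spills into a second window
```

So a single extra token at the window boundary doubles one embedding's length while leaving the other unchanged, producing the shape mismatch.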
When using SDXL, an error will occur if a certain prompt contains too many "!" characters.
The minimal code that reproduces the problem is below.
At this time, the following error occurs.
The problem is due to the difference between SDXL's `tokenizer` and `tokenizer_2`. The `pad_token` of `tokenizer` is `<|endoftext|>`, while the `pad_token` of `tokenizer_2` is `!`. These tokenizers also treat consecutive `!`s as one token. For this reason, the number of tokens produced by `tokenizer` and `tokenizer_2` differs, and an error occurs. The simplest solution is to load a similar tokenizer.