Add truncation to CLIP model #2969
Conversation
@tomaarsen would you be willing to take a look at this? Happy to address any comments to get this merged.
Hello! Big apologies for missing this PR when it was opened. Because of this, I propose to only use … In short: the tokenizer maximum length is now abided by, and updating the maximum sequence length is now only possible by updating … What do you think?
Thanks a lot for taking a look. What you say generally makes sense. My only concern is that there is already a getter and setter for … on `SentenceTransformer` itself (sentence_transformers/SentenceTransformer.py, lines 1747 to 1770 at 3fd59c3). Would it maybe be better to add a getter and setter to this class that in turn gets and sets …?
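A delegating property along those lines could look like the following minimal sketch. The class and attribute names here are hypothetical illustrations, not the actual sentence-transformers API:

```python
class CLIPModule:
    """Stand-in for the first module of a CLIP SentenceTransformer."""

    def __init__(self, model_max_length=77):
        # Assumed attribute mirroring processor.tokenizer.model_max_length.
        self.max_seq_length = model_max_length


class Wrapper:
    """Stand-in for the top-level model that wraps the module."""

    def __init__(self, first_module):
        self._first = first_module

    @property
    def max_seq_length(self):
        # Getter: read through to the underlying module.
        return self._first.max_seq_length

    @max_seq_length.setter
    def max_seq_length(self, value):
        # Setter: write through so the wrapper and module stay in sync.
        self._first.max_seq_length = value


module = CLIPModule()
wrapper = Wrapper(module)
wrapper.max_seq_length = 64  # updates the underlying module as well
```

With this pattern, existing code that reads or assigns `model.max_seq_length` keeps working, while the single source of truth lives on the wrapped module.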
I pushed a commit that adds a getter and setter. Let me know what you think. PS: I verified that …
It seems that Sentence Transformers currently does not support truncation on CLIP models, which leads to an error when calling:

```python
SentenceTransformer("sentence-transformers/clip-ViT-L-14").encode("my long text" * 26)
```

This PR adds truncation support to the CLIP model, allowing the standard Sentence Transformers `max_seq_length` argument to be passed in and defaulting it to `processor.tokenizer.model_max_length` (i.e. 77).
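The failure likely stems from CLIP's text encoder having a fixed budget of 77 positions, so token sequences longer than that cannot be embedded. The effect of tokenizer-side truncation can be sketched in plain Python (an illustration only, not the actual tokenizer code):

```python
def truncate_token_ids(token_ids, max_seq_length=77):
    """Keep at most max_seq_length ids, as a tokenizer called with
    truncation=True would; 77 matches CLIP's model_max_length."""
    return token_ids[:max_seq_length]


# A sequence longer than the limit is cut down to fit the encoder.
long_ids = list(range(200))
truncated = truncate_token_ids(long_ids)
```

After this PR, `encode` on an over-long input returns an embedding for the truncated text rather than raising an error.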