What is the text input capacity of eva-clip-18b? To my knowledge, OpenAI's CLIP is only effective for inputs of fewer than 20 tokens #165
I would like to know the text input capacity of eva-clip-18b. To my knowledge, OpenAI's CLIP is only effective for inputs of fewer than 20 tokens.
OpenAI's CLIP has two major shortcomings:
1. The text input capacity is very limited. It supports at most 77 tokens of input, and according to LongCLIP's experiments, its effective input length does not exceed 20 tokens.
2. Poor performance in pure text retrieval. There are two main reasons. First, the training objective of the CLIP model is to align text with images, with no specialized optimization for pure text retrieval. Second, the training data for the CLIP model consists mainly of relatively short texts, which makes it difficult to generalize to broader text retrieval scenarios.
Does eva-clip-18b have similar limitations to OpenAI's CLIP for text retrieval?