Unlike the popular shallow-alignment method, which maps image features into the input space of the language model, CogVLM bridges the gap between the frozen pretrained language model and the image encoder with a trainable visual expert module in the attention and FFN layers. CogVLM enables deep fusion of vision-language features without sacrificing any performance on NLP tasks.
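For anyone trying to picture what a "visual expert" in the attention and FFN layers might look like, here is a minimal sketch under my own assumptions, not CogVLM's actual code. It assumes that image tokens are projected with a separate set of trainable weights while text tokens keep using the frozen language model weights, and that attention then runs over the whole mixed sequence. The names `routed_projection`, `w_text`, and `w_image` are hypothetical, and the 2x2 matrices are toy stand-ins for real QKV or FFN weights.

```python
# Minimal sketch of modality-routed projections (hypothetical names, toy sizes).
# Assumption: image tokens use trainable "expert" weights, text tokens use
# the frozen language-model weights; attention would then run over all tokens.

def matvec(weight, vec):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weight]


def routed_projection(hidden_states, is_image, w_text, w_image):
    """Project each token with the weights that match its modality."""
    return [
        matvec(w_image if img else w_text, h)
        for h, img in zip(hidden_states, is_image)
    ]


# Toy 2-dimensional hidden states: two image tokens followed by one text token.
hidden_states = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
is_image = [True, True, False]

w_text = [[1.0, 0.0], [0.0, 1.0]]   # stands in for a frozen LM projection
w_image = [[2.0, 0.0], [0.0, 2.0]]  # stands in for a trainable expert projection

projected = routed_projection(hidden_states, is_image, w_text, w_image)
print(projected)  # [[2.0, 4.0], [6.0, 8.0], [5.0, 6.0]]
```

In this sketch a text-only input never touches `w_image`, which is one way to read the claim that NLP performance is preserved: the text path is the original frozen model.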
Can LLaVA implement the techniques CogVLM uses to improve the model? I don't understand the technical details, so I'd welcome comments from anyone here who does.
CogVLM