The new SOTA model in town - CogVLM #576

aiaicode · 2023-10-14T13:38:47Z


Different from the popular shallow-align method which maps image features into the input space of language model, CogVLM bridges the gap between the frozen pretrained language model and image encoder by a trainable visual expert module in the attention and FFN layers. CogVLM enables deep fusion of visual language features without sacrificing any performance on NLP tasks.

CogVLM

Can LLaVA adopt the techniques CogVLM uses to improve the model? I don't understand the technical details, but I'd be interested in comments from anyone here who does.
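To make the quoted description a bit more concrete, here is a toy sketch of the routing idea as I read it: text tokens keep using the frozen language-model weights, while image tokens go through a separate, trainable set of weights (the "visual expert"). This is not CogVLM's actual code. The names `routed_projection`, `w_language`, `w_visual`, and `is_image` are made up for illustration, and plain Python lists stand in for tensors. Going by the description, the same per-token routing would presumably sit in both the attention projections and the FFN of each layer.

```python
# Toy illustration only: plain Python lists stand in for tensors,
# and all names below are hypothetical, not taken from CogVLM's code.

def matvec(weight, vec):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weight]


def routed_projection(tokens, is_image, w_language, w_visual):
    """Project each token with the weights that match its modality.

    Text tokens use the frozen language-model weights; image tokens use
    the separate, trainable visual-expert weights.
    """
    return [
        matvec(w_visual if image else w_language, tok)
        for tok, image in zip(tokens, is_image)
    ]


# Made-up 2-dimensional example: two image tokens, then one text token.
w_language = [[1.0, 0.0], [0.0, 1.0]]  # frozen (identity, for readability)
w_visual = [[2.0, 0.0], [0.0, 2.0]]    # trainable visual expert
tokens = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
is_image = [True, True, False]

outputs = routed_projection(tokens, is_image, w_language, w_visual)

# With no image tokens, every token takes the original language path.
text_only = routed_projection(tokens, [False] * len(tokens), w_language, w_visual)
```

Because text tokens never touch `w_visual`, a text-only input is processed exactly as the original language model would process it, which matches the quoted claim that NLP performance is not sacrificed.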
