[Enhancement] Add flexible projector selection (Linear, MLP, Attention) for multimodal alignment #1904

nguyencongtuyenlp · 2025-09-09T15:23:54Z

👋 Hi @haotian-liu, this PR is ready for review.

Adds flexible projector options (Linear, 2-layer MLP, lightweight Attention).

Keeps the linear projector as the default, so existing setups stay backward compatible.

Verified initialization and training for all projector types without breaking existing configs.

Would appreciate your feedback when you have time 🙏

This PR introduces a new projectors.py module and updates llava_arch.py to allow flexible selection of projector types for image-text alignment.

- Added support for Linear, 2-layer MLP, and lightweight Attention-based projectors.
- Updated LlavaMetaModel to use the new build_projector() API.
- Kept backward compatibility (linear as default).
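
For reviewers who want a concrete picture of the selection logic, here is a minimal sketch of what a `build_projector()` factory could look like. It is not the code from this PR: the signature, the `projector_type` argument, the `AttentionProjector` class, and the layer choices (GELU, a single `nn.MultiheadAttention` layer with 4 heads) are all assumptions made for illustration.

```python
import torch
import torch.nn as nn


class AttentionProjector(nn.Module):
    """Hypothetical lightweight projector: one self-attention layer, then a linear map."""

    def __init__(self, in_dim: int, out_dim: int, num_heads: int = 4):
        super().__init__()
        # in_dim must be divisible by num_heads.
        self.attn = nn.MultiheadAttention(in_dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(in_dim)
        self.proj = nn.Linear(in_dim, out_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, num_patches, in_dim) image features from the vision tower.
        attended, _ = self.attn(x, x, x)
        # Residual connection and layer norm, then map to the language model width.
        return self.proj(self.norm(x + attended))


def build_projector(in_dim: int, out_dim: int, projector_type: str = "linear") -> nn.Module:
    """Return a projector module; the signature is assumed, not taken from the PR."""
    if projector_type == "linear":
        # Default branch, so callers that never pass a type keep the old behavior.
        return nn.Linear(in_dim, out_dim)
    if projector_type == "mlp":
        # 2-layer MLP: two linear layers with a nonlinearity in between.
        return nn.Sequential(
            nn.Linear(in_dim, out_dim),
            nn.GELU(),
            nn.Linear(out_dim, out_dim),
        )
    if projector_type == "attention":
        return AttentionProjector(in_dim, out_dim)
    raise ValueError(f"Unknown projector type: {projector_type!r}")
```

Under these assumptions, `build_projector(1024, 4096)` returns a plain `nn.Linear`, and raising on an unknown type surfaces config typos at model construction instead of silently falling back to the default.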

Motivation:
Different projector architectures may improve performance depending on dataset size and modality alignment. This change allows the community to experiment more easily.

Tested:

- Verified training initializes with each projector type.
- Code runs without breaking existing configs.

Fixes #1884

Update LLaVA: improve model architecture and training components

a99bbf7

nguyencongtuyenlp changed the title ~~Add flexible projector selection (Linear, MLP, Attention) for multimodal alignment~~ [Enhancement] Add flexible projector selection (Linear, MLP, Attention) for multimodal alignment Sep 9, 2025
