Conversation
@nguyencongtuyenlp nguyencongtuyenlp commented Sep 9, 2025

👋 Hi @haotian-liu, this PR is ready for review.

Adds flexible projector options (Linear, 2-layer MLP, lightweight Attention).

Keeps backward compatibility, with linear as the default.

Verified initialization and training for all projector types without breaking existing configs.

Would appreciate your feedback when you have time 🙏

This PR introduces a new projectors.py module and updates llava_arch.py to allow flexible selection of projector types for image-text alignment.

  • Added support for Linear, 2-layer MLP, and lightweight Attention-based projectors.
  • Updated LlavaMetaModel to use the new build_projector() API.
  • Kept backward compatibility (linear as default).
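
The factory described above could look roughly like the sketch below (a minimal PyTorch illustration, not the PR's actual code: the `build_projector` signature, the attention-projector design, and the type strings other than `linear` are assumptions for illustration):

```python
import torch
import torch.nn as nn

class AttentionProjector(nn.Module):
    """Lightweight single-head self-attention over image tokens,
    followed by a linear map into the text embedding space.
    (Illustrative design; the PR's actual attention projector may differ.)"""
    def __init__(self, vision_dim: int, text_dim: int):
        super().__init__()
        self.attn = nn.MultiheadAttention(vision_dim, num_heads=1, batch_first=True)
        self.proj = nn.Linear(vision_dim, text_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, num_image_tokens, vision_dim)
        attended, _ = self.attn(x, x, x)
        return self.proj(attended)  # (batch, num_image_tokens, text_dim)

def build_projector(projector_type: str, vision_dim: int, text_dim: int) -> nn.Module:
    """Select a projector by name; 'linear' stays the default behavior."""
    if projector_type == "linear":
        return nn.Linear(vision_dim, text_dim)
    if projector_type == "mlp2x_gelu":
        # 2-layer MLP with GELU activation between the layers
        return nn.Sequential(
            nn.Linear(vision_dim, text_dim),
            nn.GELU(),
            nn.Linear(text_dim, text_dim),
        )
    if projector_type == "attention":
        return AttentionProjector(vision_dim, text_dim)
    raise ValueError(f"Unknown projector type: {projector_type!r}")
```

All three variants map `(batch, tokens, vision_dim)` features to `(batch, tokens, text_dim)`, so they are drop-in replacements for one another from the model's point of view, which is what makes the default-to-`linear` fallback backward compatible.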

Motivation:
Different projector architectures may improve performance depending on dataset size and modality alignment. This change allows the community to experiment more easily.

Tested:

  • Verified training initializes with each projector type.
  • Code runs without breaking existing configs.

Fixes #1884

@nguyencongtuyenlp nguyencongtuyenlp changed the title Add flexible projector selection (Linear, MLP, Attention) for multimodal alignment [Enhancement] Add flexible projector selection (Linear, MLP, Attention) for multimodal alignment Sep 9, 2025

Successfully merging this pull request may close these issues.

[Usage] Mismatch between mm_projector in saved config and mm_projector used in code.