[Enhancement] Add flexible projector selection (Linear, MLP, Attention) for multimodal alignment #1904
👋 Hi @haotian-liu, this PR is ready for review.
Adds flexible projector options (Linear, 2-layer MLP, lightweight Attention).
Keeps backward compatibility: `linear` remains the default projector.
Verified initialization and training for all projector types without breaking existing configs.
Would appreciate your feedback when you have time 🙏
This PR introduces a new `projectors.py` module and updates `llava_arch.py` to allow flexible selection of projector types for image-text alignment.

- `LlavaMetaModel` now uses the new `build_projector()` API.
- Keeps backward compatibility (`linear` as default).

Motivation:
Different projector architectures may perform better depending on dataset size and how well the image and text modalities align. This change makes it easier for the community to experiment with them.
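For a concrete picture of the projector-selection API, here is a minimal PyTorch sketch of a `build_projector()`-style factory covering the three types. It is not the code in this PR: the function signatures, the `"mlp"` and `"attention"` type strings, `build_projector_from_config()`, and the single self-attention layer used for the attention projector are all assumptions. The config fallback reads `mm_projector_type` with `"linear"` as the default, following the pattern in LLaVA's existing `build_vision_projector()`.

```python
# Hypothetical sketch, not the code from this PR: names and signatures are assumptions.
from types import SimpleNamespace

import torch
import torch.nn as nn


class AttentionProjector(nn.Module):
    """Assumed 'lightweight attention' design: one self-attention layer over the
    visual tokens (residual + LayerNorm), then a linear map to the LLM width."""

    def __init__(self, in_dim: int, out_dim: int, num_heads: int = 8):
        super().__init__()
        # in_dim must be divisible by num_heads for nn.MultiheadAttention.
        self.attn = nn.MultiheadAttention(in_dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(in_dim)
        self.proj = nn.Linear(in_dim, out_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, num_visual_tokens, in_dim)
        attn_out, _ = self.attn(x, x, x, need_weights=False)
        return self.proj(self.norm(x + attn_out))


def build_projector(projector_type: str, in_dim: int, out_dim: int) -> nn.Module:
    """Map a projector name to a module that turns vision features (in_dim)
    into LLM hidden states (out_dim)."""
    if projector_type == "linear":
        return nn.Linear(in_dim, out_dim)
    if projector_type == "mlp":
        # 2-layer MLP with a GELU in between.
        return nn.Sequential(
            nn.Linear(in_dim, out_dim), nn.GELU(), nn.Linear(out_dim, out_dim)
        )
    if projector_type == "attention":
        return AttentionProjector(in_dim, out_dim)
    raise ValueError(f"Unknown projector type: {projector_type!r}")


def build_projector_from_config(config) -> nn.Module:
    # Older configs that lack mm_projector_type fall back to "linear".
    projector_type = getattr(config, "mm_projector_type", "linear")
    return build_projector(projector_type, config.mm_hidden_size, config.hidden_size)


if __name__ == "__main__":
    feats = torch.randn(2, 576, 1024)  # e.g. 24x24 patch features of width 1024
    for name in ("linear", "mlp", "attention"):
        out = build_projector(name, 1024, 4096)(feats)
        print(name, tuple(out.shape))  # (2, 576, 4096) for every type
    legacy_cfg = SimpleNamespace(mm_hidden_size=1024, hidden_size=4096)
    print(type(build_projector_from_config(legacy_cfg)).__name__)  # Linear
```

In this sketch every projector keeps the `(batch, tokens, dim)` layout and only changes the last dimension, so the rest of the model does not need to know which type was selected, and a config without `mm_projector_type` gets the same `nn.Linear(mm_hidden_size, hidden_size)` as before.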
Tested:
Fixes #1884