llava conversion fixes #399
Conversation
```python
return {
    "projector_hidden_act": config.activation.hf_name,
    "multimodal_projector_bias": config.add_linear_biases,
    # Not in LlavaConfig, but needed for consistency check in LlavaBaseModelConverter.
```
Why remove this? It's essential to ensure compatibility.
This is to ensure compatibility with what?
As stated in the comment, this is not in LlavaConfig, and it caused issues when trying to load Apriel-1.5, where it would set the default value for this parameter and fail in the assertion here: https://github.com/ServiceNow/Fast-LLM/pull/399/files#diff-319643f77a4055995eb8f844aee095266ba3b15fa11f52e16acd89386058e51bL314
The projector intermediate size needs to match the LM hidden size, which is not guaranteed on the Fast-LLM side. The entry is not in the final output; it's there specifically for the assertion in https://github.com/ServiceNow/Fast-LLM/pull/399/files#diff-319643f77a4055995eb8f844aee095266ba3b15fa11f52e16acd89386058e51bL314. A failing assertion points to an actual error elsewhere.
What do you mean by "load Apriel-1.5"? Shouldn't that go through import?
Hmm, I guess this could be due to the bug you fixed above, where the intermediate size was set incorrectly on import?
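For context, the pattern being discussed looks roughly like this. This is a minimal sketch with hypothetical names (`export_projector_config`, `finalize_llava_config`, `intermediate_size`), not the actual Fast-LLM converter code:

```python
# Hypothetical sketch of the consistency-check pattern discussed above;
# function and key names are illustrative, not taken from Fast-LLM.
def export_projector_config(config) -> dict:
    return {
        "projector_hidden_act": config.activation.hf_name,
        "multimodal_projector_bias": config.add_linear_biases,
        # Not in LlavaConfig; carried along only for the consistency check.
        "projector_intermediate_size": config.intermediate_size,
    }

def finalize_llava_config(projector_cfg: dict, text_cfg: dict) -> dict:
    # The projector's output size must equal the language model's hidden
    # size, which is not guaranteed on the Fast-LLM side. The helper entry
    # is stripped here, so it never reaches the exported LlavaConfig.
    assert projector_cfg.pop("projector_intermediate_size") == text_cfg["hidden_size"]
    return {**projector_cfg, "text_config": text_cfg}
```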
```diff
 return [
     *cls.embeddings_converter_class.get_converters(
-        config.embeddings, "vision_encoder.embeddings", "model.vision_tower"
+        config.embeddings, "vision_encoder.embeddings", "vision_tower"
```
Why these changes? The current names are required for LlavaForConditionalGeneration and confirmed to work. The model prefix is explicitly needed for LlavaForConditionalGeneration (https://github.com/huggingface/transformers/blob/main/src/transformers/models/llava/modeling_llava.py#L316), and the language model is a MistralModel, which takes no model prefix.
Hmm, indeed, it's strange.
Without all these changes, we're not able to load https://huggingface.co/ServiceNow-AI/Apriel-1.5-15b-Thinker/tree/main in fast-llm. The weights in that model somehow match this different format, with language_model.model...
It must have something to do with _checkpoint_conversion_mapping:
https://github.com/huggingface/transformers/blob/main/src/transformers/models/llava/modeling_llava.py#L306-L311
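For reference, at the time of writing, the mapping at those lines looks roughly like this (copied from the linked file; it may change upstream):

```python
_checkpoint_conversion_mapping = {
    "^language_model.model": "model.language_model",
    "^vision_tower": "model.vision_tower",
    "^multi_modal_projector": "model.multi_modal_projector",
    "^language_model.lm_head": "lm_head",
}
```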
My understanding is that there are two equivalent ways to see the model. It can either be a LlavaForConditionalGeneration with a MistralModel text model, or a LlavaModel with a MistralForCausalLM. Main exports in the first format, but the dev branch seems to use the second one, though it still uses LlavaForConditionalGeneration as the architecture (maybe _checkpoint_conversion_mapping addresses the mismatch?).
I'd think the first option is more appropriate, but I could be wrong. Maybe we could just support both cases.
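A minimal sketch of how such a regex-based key remap behaves, assuming the mapping shown above; this is not the transformers loading code itself, and the inner key names in the examples are illustrative:

```python
import re

# Mapping as shown above; patterns are anchored regexes applied to
# state-dict keys when loading an old-format checkpoint.
MAPPING = {
    "^language_model.model": "model.language_model",
    "^vision_tower": "model.vision_tower",
    "^multi_modal_projector": "model.multi_modal_projector",
    "^language_model.lm_head": "lm_head",
}

def remap_key(key: str) -> str:
    # Apply the first pattern that matches; leave unmatched keys unchanged.
    for pattern, replacement in MAPPING.items():
        new_key, n = re.subn(pattern, replacement, key)
        if n:
            return new_key
    return key

# Old (dev-branch / Apriel-1.5) layout -> layout exported by current main:
print(remap_key("language_model.model.layers.0.self_attn.q_proj.weight"))
# -> model.language_model.layers.0.self_attn.q_proj.weight
print(remap_key("language_model.lm_head.weight"))
# -> lm_head.weight
```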
I see. From what I understand, this _checkpoint_conversion_mapping is something they added for backward compatibility. So indeed, I think you're right that the first option is the right one, but our Apriel-1.5 checkpoint uses the older format.
How should we support both cases? Shall we create a new format called llava_legacy or something?
That would work.
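Something like the following could work as a starting point. This is a purely hypothetical sketch of selecting HF weight prefixes per format; none of these names or keys come from the actual Fast-LLM codebase:

```python
# Hypothetical: pick the HF weight-name prefixes for each checkpoint format.
def hf_prefixes(checkpoint_format: str) -> dict[str, str]:
    if checkpoint_format == "llava":
        # Layout exported by current transformers main
        # (LlavaModel wrapped in LlavaForConditionalGeneration).
        return {
            "vision_tower": "model.vision_tower",
            "projector": "model.multi_modal_projector",
            "language_model": "model.language_model",
            "lm_head": "lm_head",
        }
    if checkpoint_format == "llava_legacy":
        # Older layout used by the Apriel-1.5 checkpoint.
        return {
            "vision_tower": "vision_tower",
            "projector": "multi_modal_projector",
            "language_model": "language_model.model",
            "lm_head": "language_model.lm_head",
        }
    raise ValueError(f"Unknown checkpoint format: {checkpoint_format}")
```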
✨ Description
Small changes to be able to load Apriel-1.5-15B:
- Remove `projector_intermediate_size` from the llava_hybrid and llava converters (use the text config's hidden size, like in llava)
- Support the `gelu` activation

With these changes, I am able to load Apriel-1.5-15B in fast-llm.
This also points to the issue that these problems were not caught by the conversion tests.