@RaymondLi0 RaymondLi0 commented Dec 2, 2025

✨ Description

Small changes to be able to load Apriel-1.5-15B:

  • remove projector_intermediate_size from llava_hybrid and llava converter (use text-config's hidden-size, like in llava)
  • map hf's gelu activation
  • fix some of the hf weight prefixes

With these changes, I am able to load Apriel-1.5-15B in fast-llm.
This also suggests that these issues were not caught by the conversion tests.

@RaymondLi0 RaymondLi0 marked this pull request as ready for review December 3, 2025 22:25
@RaymondLi0 RaymondLi0 changed the title Raymond/gelu act llava conversion fixes Dec 3, 2025
return {
"projector_hidden_act": config.activation.hf_name,
"multimodal_projector_bias": config.add_linear_biases,
# Not in LlavaConfig, but needed for consistency check in LlavaBaseModelConverter.
Collaborator

Why remove this? It is essential to ensure compatibility.

Contributor Author

This is to ensure compatibility with what?
As stated in the comment, this is not in LlavaConfig. And it caused issues when trying to load Apriel-1.5, where the default value would be set for this param and then fail the assertion here: https://github.com/ServiceNow/Fast-LLM/pull/399/files#diff-319643f77a4055995eb8f844aee095266ba3b15fa11f52e16acd89386058e51bL314

Collaborator

The projector intermediate size needs to match the LM hidden size, which is not guaranteed on the Fast-LLM side. The entry is not in the final output; it's there specifically for the assertion in https://github.com/ServiceNow/Fast-LLM/pull/399/files#diff-319643f77a4055995eb8f844aee095266ba3b15fa11f52e16acd89386058e51bL314. A failing assertion points to an actual error elsewhere.

What do you mean by "load Apriel-1.5"? Shouldn't that go through import?
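
For context, the consistency check under discussion can be sketched as follows (a minimal sketch with a hypothetical function name; the real assertion lives in the Fast-LLM converter linked above):

```python
# Minimal sketch of the consistency check (hypothetical name, not Fast-LLM code).
# In Llava, the projector's output size must equal the LM hidden size,
# because the projector feeds image features straight into the language model.
def check_projector_consistency(projector_output_size: int, text_hidden_size: int) -> None:
    if projector_output_size != text_hidden_size:
        raise ValueError(
            f"Projector output size {projector_output_size} does not match "
            f"LM hidden size {text_hidden_size}"
        )
```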

Collaborator

Hmm, I guess this could be due to the bug you fixed above, where the intermediate size was set incorrectly on import?

return [
*cls.embeddings_converter_class.get_converters(
config.embeddings, "vision_encoder.embeddings", "model.vision_tower"
config.embeddings, "vision_encoder.embeddings", "vision_tower"
Collaborator

Why these changes? The current names are required for LlavaForConditionalGeneration and confirmed to work. The model prefix is explicitly needed for LlavaForConditionalGeneration (https://github.com/huggingface/transformers/blob/main/src/transformers/models/llava/modeling_llava.py#L316), and the language model is a MistralModel, which takes no model prefix.

Contributor Author

Hmm, indeed, it's strange.
Without these changes, we're not able to load https://huggingface.co/ServiceNow-AI/Apriel-1.5-15b-Thinker/tree/main in fast-llm. The weights in that model somehow match a different format, with the language_model.model... prefix.

Collaborator

@jlamypoirier jlamypoirier Dec 5, 2025


My understanding is that there are two equivalent ways to see the model. It can either be a LlavaForConditionalGeneration with a MistralModel text model, or a LlavaModel with a MistralForCausalLM. The main branch exports in the first format, but the dev branch seems to use the second one, though it still uses LlavaForConditionalGeneration as the architecture (maybe _checkpoint_conversion_mapping addresses the mismatch?).

I'd think the first option is more appropriate, but I could be wrong. Maybe we could just support both cases.
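
To make the two layouts concrete, here is an illustrative sketch of rewriting legacy weight names into the current layout. The prefixes are simplified assumptions based on this thread; the real _checkpoint_conversion_mapping in transformers uses regex anchors and may differ:

```python
# Hypothetical sketch: the same Llava checkpoint can nest its weights two ways.
# Layout A (current): model.vision_tower.*, model.language_model.*, lm_head.*
# Layout B (legacy): vision_tower.*, language_model.model.*, language_model.lm_head.*

# A _checkpoint_conversion_mapping-style table of prefix rewrites (assumed, not
# copied from transformers).
LEGACY_TO_CURRENT = {
    "language_model.model.": "model.language_model.",
    "language_model.lm_head.": "lm_head.",
    "vision_tower.": "model.vision_tower.",
    "multi_modal_projector.": "model.multi_modal_projector.",
}

def remap_legacy_keys(state_dict: dict) -> dict:
    """Rewrite legacy-layout weight names into the current layout."""
    out = {}
    for key, tensor in state_dict.items():
        for old, new in LEGACY_TO_CURRENT.items():
            if key.startswith(old):
                key = new + key[len(old):]
                break
        out[key] = tensor
    return out
```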

Contributor Author


I see. From what I understand, this _checkpoint_conversion_mapping is something they made for backward compatibility. So indeed I think you're right that the first option is the right one, but our Apriel-1.5 checkpoint uses this older format.
How should we support both cases? Shall we create a new format called llava_legacy or something?
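
A lightweight way to support both cases (a hypothetical sketch, not the actual Fast-LLM converter API) would be to detect which prefix convention a checkpoint uses and choose the converter prefixes accordingly:

```python
# Hypothetical sketch: choose weight-name prefixes per checkpoint layout.
# "legacy" checkpoints (like the Apriel-1.5 export) nest the LM under
# language_model.model.*; the current layout uses model.language_model.*.
def detect_layout(weight_names: list[str]) -> str:
    """Return 'legacy' or 'current' based on the prefixes present."""
    if any(n.startswith("language_model.model.") for n in weight_names):
        return "legacy"
    return "current"

# (vision tower prefix, language model prefix) for each layout (assumed names).
PREFIXES = {
    "current": ("model.vision_tower", "model.language_model"),
    "legacy": ("vision_tower", "language_model.model"),
}
```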

Collaborator


That would work.
