
Question about attention mask of text_hidden_states #275

@1145284121

Description


Thanks for open-sourcing this excellent work!

I noticed that HunyuanVideo handles prompt padding and the attention mask of the MMDiT's 3D full attention differently from some other implementations (e.g., CogVideo and Flux).

def _get_llama_prompt_embeds(self, prompt, ...):
    text_inputs = self.tokenizer(
        prompt,
        max_length=max_sequence_length,
        padding="max_length",
        truncation=True,
        return_tensors="pt",
        return_attention_mask=True,  # explicitly request the padding mask
    )
    text_input_ids = text_inputs.input_ids
    prompt_attention_mask = text_inputs.attention_mask

    prompt_embeds = self.text_encoder(
        input_ids=text_input_ids,
        attention_mask=prompt_attention_mask,  # pass the mask to the encoder
        output_hidden_states=True,
    )

    # return both the embeddings and the padding mask
    return prompt_embeds, prompt_attention_mask
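
For context, a minimal standalone sketch of what that tokenizer call produces (the checkpoint name, prompt, and max_length below are placeholders for illustration, not the pipeline's actual values):

from transformers import AutoTokenizer

# Placeholder checkpoint for illustration; HunyuanVideo uses a Llama-based text encoder.
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
tokenizer.pad_token = tokenizer.eos_token  # Llama tokenizers define no pad token by default

text_inputs = tokenizer(
    ["a cat walking on grass"],
    max_length=16,
    padding="max_length",
    truncation=True,
    return_tensors="pt",
    return_attention_mask=True,
)
# attention_mask marks real tokens with 1 and the padded tail with 0
print(text_inputs.attention_mask)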

HunyuanVideo's Llama text encoder returns prompt_attention_mask, and the mask is used in the attention processor:

hidden_states = F.scaled_dot_product_attention(
    query, key, value, attn_mask=attention_mask, dropout_p=0.0, is_causal=False
)  # here attention_mask is not None
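
For anyone skimming, here is a rough, self-contained sketch of how a text padding mask can be broadcast into the attn_mask consumed by scaled_dot_product_attention over the concatenated video + text sequence. The shapes, token counts, and variable names are my own assumptions, not the actual processor code:

import torch
import torch.nn.functional as F

# Illustrative sketch only -- not the actual HunyuanVideo attention processor.
batch, heads, dim = 2, 8, 64
num_video_tokens, num_text_tokens = 120, 16
seq_len = num_video_tokens + num_text_tokens

query = torch.randn(batch, heads, seq_len, dim)
key = torch.randn(batch, heads, seq_len, dim)
value = torch.randn(batch, heads, seq_len, dim)

# Text padding mask from the tokenizer: True = real token, False = padding.
prompt_attention_mask = torch.ones(batch, num_text_tokens, dtype=torch.bool)
prompt_attention_mask[:, 10:] = False  # pretend the last 6 text positions are padding

# Video tokens are always valid, so prepend an all-True block.
key_valid = torch.cat(
    [torch.ones(batch, num_video_tokens, dtype=torch.bool), prompt_attention_mask],
    dim=1,
)  # [B, S]

# Broadcast to [B, 1, 1, S]: every query may attend only to valid keys.
attention_mask = key_valid[:, None, None, :]

hidden_states = F.scaled_dot_product_attention(
    query, key, value, attn_mask=attention_mask, dropout_p=0.0, is_causal=False
)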

But in CogVideo and Flux (both MMDiT architectures):

if prompt_embeds is None:
    prompt_embeds = self._get_t5_prompt_embeds(
        prompt=prompt,
        num_videos_per_prompt=num_videos_per_prompt,
        max_sequence_length=max_sequence_length,
        device=device,
        dtype=dtype,
    )

the T5 encoder does not return the prompt's padding mask, and the attention call becomes:

hidden_states = F.scaled_dot_product_attention(
    query, key, value, attn_mask=attention_mask, dropout_p=0.0, is_causal=False
)  # here attention_mask is None
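
To make the contrast concrete, here is a toy check (my own illustration, not CogVideo/Flux code) showing that when no attn_mask is passed, the padded positions still receive attention weight and change the outputs of the real tokens:

import torch
import torch.nn.functional as F

torch.manual_seed(0)
batch, heads, seq_len, dim = 1, 2, 8, 16
query = torch.randn(batch, heads, seq_len, dim)
key = torch.randn(batch, heads, seq_len, dim)
value = torch.randn(batch, heads, seq_len, dim)

# Pretend the last 3 positions are padding.
key_valid = torch.ones(batch, seq_len, dtype=torch.bool)
key_valid[:, 5:] = False

out_unmasked = F.scaled_dot_product_attention(query, key, value)
out_masked = F.scaled_dot_product_attention(
    query, key, value, attn_mask=key_valid[:, None, None, :]
)

# The outputs at the real (non-padded) positions differ, so without a mask the
# padded tokens do influence the result.
print((out_unmasked[..., :5, :] - out_masked[..., :5, :]).abs().max())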

I'm confused about what led to this difference.

  • Is it to stay consistent with the training phase?
  • Or did the Hunyuan team run experiments showing that adding the mask improves performance?
