
scaled_dot_product_attention throws an error when both attn_mask and is_causal are set #80

Open
@moskomule

Description


Hi, thanks for developing such a wonderful project.

I found that torch.nn.functional.scaled_dot_product_attention throws an error when both attn_mask and is_causal are set.
However, the current language_model.py code uses both:

is_causal=True # LM attention is causal (masked)

A simple fix is to create the causal mask yourself, but if there are other ways, I'd like to know.
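
For reference, a minimal sketch of that workaround, assuming a boolean mask where True marks positions that may be attended to and purely illustrative tensor shapes (not the actual shapes used in language_model.py): build the causal mask explicitly, combine it with the existing attn_mask, and call SDPA with is_causal=False.

```python
import torch
import torch.nn.functional as F

# Illustrative shapes: (batch, heads, seq_len, head_dim)
B, H, T, D = 2, 4, 16, 32
q = torch.randn(B, H, T, D)
k = torch.randn(B, H, T, D)
v = torch.randn(B, H, T, D)

# Stand-in for an existing (e.g. padding) mask; True = attend, False = ignore.
attn_mask = torch.ones(B, 1, T, T, dtype=torch.bool)

# Lower-triangular causal mask, combined with the existing mask via broadcasting.
causal_mask = torch.tril(torch.ones(T, T)).bool()
combined_mask = attn_mask & causal_mask

# Pass only attn_mask; is_causal stays False so the two arguments no longer conflict.
out = F.scaled_dot_product_attention(q, k, v, attn_mask=combined_mask, is_causal=False)
```

The trade-off is that supplying an explicit mask can prevent SDPA from dispatching to the fused causal kernels, so it may be slower than is_causal=True alone.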
