Add SDPA and FlashAttention-2 support to LayoutLMv3 #42225
What does this PR do?
Adds SDPA and FlashAttention-2 support to LayoutLMv3 via the unified attention interface, following the same pattern used in BERT and other recently updated model implementations.
Fixes #35467
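For reference, opting into the new backends would use the standard `attn_implementation` argument. A minimal sketch, assuming the `microsoft/layoutlmv3-base` checkpoint and config overrides that are not spelled out in this PR:

```python
from transformers import LayoutLMv3Model

# Hypothetical usage: per this PR, the SDPA/FA2 paths require the relative
# and spatial attention biases to be disabled (eager mode stays the default).
model = LayoutLMv3Model.from_pretrained(
    "microsoft/layoutlmv3-base",
    attn_implementation="sdpa",          # or "flash_attention_2"
    has_relative_attention_bias=False,   # config override, assumed name
    has_spatial_attention_bias=False,
)
```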
Implementation
- Refactored `LayoutLMv3SelfAttention` to use the `ALL_ATTENTION_FUNCTIONS` interface instead of separate attention classes
- Added a `layoutlmv3_eager_attention_forward` function that implements the CogView attention mechanism (alpha=32 scaling) with support for LayoutLMv3's relative position bias and spatial attention bias (see the sketch after this list)
- Added `_supports_sdpa = True` and `_supports_flash_attn = True` flags to `LayoutLMv3PreTrainedModel`
- Switched to `create_bidirectional_mask` (replacing `get_extended_attention_mask`)
- Threaded the `layer_idx` parameter through the attention classes for consistency
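To make the eager path concrete, here is a minimal sketch of CogView-style (PB-Relax) attention with the two additive biases; the signatures are simplified illustrations, not the exact code from this PR:

```python
import math

import torch
from torch import nn


def cogview_softmax(attention_scores: torch.Tensor, alpha: float = 32.0) -> torch.Tensor:
    # PB-Relax trick from CogView: shrink scores by alpha, subtract the row
    # max, then rescale before softmax so fp16 scores cannot overflow.
    scaled = attention_scores / alpha
    max_value = scaled.amax(dim=-1, keepdim=True)
    return nn.functional.softmax((scaled - max_value) * alpha, dim=-1)


def layoutlmv3_eager_attention_forward(query, key, value, attention_mask=None,
                                       rel_pos=None, rel_2d_pos=None):
    # query/key/value: (batch, num_heads, seq_len, head_dim)
    scores = torch.matmul(query, key.transpose(-1, -2)) / math.sqrt(query.size(-1))
    if rel_pos is not None:         # relative position bias
        scores = scores + rel_pos
    if rel_2d_pos is not None:      # spatial (2D) attention bias
        scores = scores + rel_2d_pos
    if attention_mask is not None:  # additive mask (0 for keep, -inf for drop)
        scores = scores + attention_mask
    return torch.matmul(cogview_softmax(scores), value)
```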
Backward compatibility

The SDPA and FlashAttention-2 backends are incompatible with LayoutLMv3's relative position bias and spatial attention bias. The code raises a clear `ValueError` when users attempt to combine these features, directing them to use eager mode; a sketch of the guard follows. Default behavior (eager attention with all bias features) remains unchanged.
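A hedged sketch of the guard (the helper name, message, and placement are illustrative, not the merged code):

```python
def _validate_attention_backend(config):
    # Hypothetical helper; `_attn_implementation` is the standard field that
    # transformers sets from the `attn_implementation` argument.
    uses_bias = getattr(config, "has_relative_attention_bias", False) or getattr(
        config, "has_spatial_attention_bias", False
    )
    if config._attn_implementation in ("sdpa", "flash_attention_2") and uses_bias:
        raise ValueError(
            "SDPA and FlashAttention-2 do not support LayoutLMv3's relative "
            "position bias or spatial attention bias; load the model with "
            'attn_implementation="eager" or disable the biases in the config.'
        )
```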
Type of change

How has this change been tested?
`ruff check` and `ruff format` both pass. A sketch of a quick eager-vs-SDPA parity check is shown below.
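Such a parity check could look like the following (the checkpoint, config overrides, and tolerances are assumptions, not part of this PR's test suite):

```python
import torch
from transformers import LayoutLMv3Model

overrides = dict(has_relative_attention_bias=False, has_spatial_attention_bias=False)
eager = LayoutLMv3Model.from_pretrained(
    "microsoft/layoutlmv3-base", attn_implementation="eager", **overrides
).eval()
sdpa = LayoutLMv3Model.from_pretrained(
    "microsoft/layoutlmv3-base", attn_implementation="sdpa", **overrides
).eval()

# Dummy document inputs: token ids, word bounding boxes, and a page image.
input_ids = torch.randint(0, 1000, (1, 16))
bbox = torch.zeros(1, 16, 4, dtype=torch.long)
pixel_values = torch.rand(1, 3, 224, 224)

with torch.no_grad():
    out_eager = eager(input_ids=input_ids, bbox=bbox, pixel_values=pixel_values)
    out_sdpa = sdpa(input_ids=input_ids, bbox=bbox, pixel_values=pixel_values)

torch.testing.assert_close(
    out_eager.last_hidden_state, out_sdpa.last_hidden_state, atol=1e-4, rtol=1e-4
)
```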
Before submitting

Who can review?
@vasqu - Thank you for the detailed feedback on #41801! I've refactored the implementation to use the unified attention interface pattern as you suggested. Would definitely appreciate your feedback on this updated approach.
@ArthurZucker @Cyrilvallez - attention implementation reviewers