
Conversation

@jackiehimel

What does this PR do?

Adds SDPA and FlashAttention-2 support to LayoutLMv3 using the unified attention interface pattern, following the same architecture used in BERT and other recent model implementations.
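
For reviewers skimming, the user-facing change boils down to being able to pass `attn_implementation` when loading the model. A hedged example (the config overrides below are illustrative, and the non-eager backends require the attention biases to be disabled, see "Backward compatibility"):

```python
from transformers import LayoutLMv3Model

# Illustrative only: the checkpoint's default config enables the relative and
# spatial attention biases, which only the eager backend supports, so they are
# overridden here (forwarding these kwargs to the config is an assumption of
# this sketch, not a claim about the PR).
model = LayoutLMv3Model.from_pretrained(
    "microsoft/layoutlmv3-base",
    has_relative_attention_bias=False,
    has_spatial_attention_bias=False,
    attn_implementation="sdpa",  # or "flash_attention_2"; the default stays "eager"
)
```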

Fixes #35467

Implementation

  • Refactored LayoutLMv3SelfAttention to use the ALL_ATTENTION_FUNCTIONS interface instead of separate attention classes
  • Created a layoutlmv3_eager_attention_forward function that implements the CogView attention mechanism (alpha=32 scaling) with support for LayoutLMv3's relative position bias and spatial attention bias (see the sketch after this list)
  • Added _supports_sdpa = True and _supports_flash_attn = True flags to LayoutLMv3PreTrainedModel
  • Updated mask creation to use create_bidirectional_mask (replacing get_extended_attention_mask)
  • Threaded layer_idx parameter through attention classes for consistency
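
Since the list above is terse, here is a minimal sketch of what the eager path could look like. The signature and bias handling are illustrative assumptions, not the PR's exact code; the CogView step rescales the logits and subtracts the row-wise max for numerical stability:

```python
import math

import torch
from torch import nn


def layoutlmv3_eager_attention_forward(
    query,                 # (batch, num_heads, seq_len, head_dim)
    key,
    value,
    attention_mask=None,   # additive mask broadcastable to (batch, num_heads, q_len, k_len)
    rel_pos=None,          # relative position bias (assumption: passed in pre-computed)
    rel_2d_pos=None,       # spatial attention bias (assumption: passed in pre-computed)
    dropout=0.0,
    alpha=32.0,            # CogView scaling constant
):
    head_dim = query.shape[-1]
    # Pre-scale the query by sqrt(head_dim); the additive biases get the same scaling.
    scores = torch.matmul(query / math.sqrt(head_dim), key.transpose(-1, -2))
    if rel_pos is not None:
        scores = scores + rel_pos / math.sqrt(head_dim)
    if rel_2d_pos is not None:
        scores = scores + rel_2d_pos / math.sqrt(head_dim)
    if attention_mask is not None:
        scores = scores + attention_mask
    # CogView trick: shrink by alpha, subtract the per-row max, scale back up,
    # then softmax; this keeps the logits in a numerically safe range.
    scaled = scores / alpha
    scaled = (scaled - scaled.amax(dim=-1, keepdim=True)) * alpha
    probs = nn.functional.dropout(nn.functional.softmax(scaled, dim=-1), p=dropout)
    return torch.matmul(probs, value), probs
```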

Backward compatibility

The SDPA and FlashAttention-2 backends are incompatible with LayoutLMv3's relative position bias and spatial attention bias. The code raises a clear ValueError when a non-eager backend is requested while either bias is enabled, directing users to eager mode. Default behavior (eager attention with all bias features) remains unchanged.
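
A rough sketch of the kind of guard described above. The config attribute names follow the existing LayoutLMv3Config fields; the exact placement and error message in the PR may differ:

```python
def _validate_attn_implementation(config):
    # Assumption: a check like this lives in the attention module or the
    # pretrained model's __init__; shown standalone here for clarity.
    uses_bias = getattr(config, "has_relative_attention_bias", False) or getattr(
        config, "has_spatial_attention_bias", False
    )
    if config._attn_implementation != "eager" and uses_bias:
        raise ValueError(
            "LayoutLMv3's relative position bias and spatial attention bias are only "
            "supported with attn_implementation='eager'. Either disable the biases or "
            "switch back to eager attention."
        )
```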

Type of change

  • New feature (non-breaking change which adds functionality)

How has this change been tested?

  • Code follows the unified attention interface pattern established in the codebase
  • Linting: ruff check and ruff format both pass
  • Ready for CI test suite execution

Before submitting

  • Did you read the contributor guideline, Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

Who can review?

@vasqu - Thank you for the detailed feedback on #41801! I've refactored the implementation to use the unified attention interface pattern as you suggested. I'd appreciate your feedback on this updated approach.

@ArthurZucker @Cyrilvallez - attention implementation reviewers

- Add unified attention implementation following BERT pattern
- Support eager, SDPA, and FlashAttention-2 attention backends
- Handle relative position bias limitations correctly
- Use create_bidirectional_mask for proper mask handling

Fixes huggingface#35467

Override test to use eager attention since LayoutLMv3's relative position bias is incompatible with SDPA/FlashAttention.
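
A minimal sketch of the kind of test override this refers to; the class name, method name, and the skip-based approach are assumptions, not the PR's exact test code:

```python
import unittest


class LayoutLMv3ModelTest(unittest.TestCase):
    # In the real suite this would extend the common model tester mixin; only
    # the override's intent is illustrated here.
    def test_eager_matches_sdpa_inference(self):
        # The default LayoutLMv3 config enables relative position bias and
        # spatial attention bias, which only the eager backend supports, so the
        # generic SDPA-vs-eager equivalence check does not apply.
        self.skipTest("LayoutLMv3's attention biases require eager attention.")
```
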
@github-actions

[For maintainers] Suggested jobs to run (before merge)

run-slow: layoutlmv3
