
Conversation

@Kurt232 (Contributor) commented Aug 15, 2025

The helper _flash_attention_forward now falls back to the cumulative_seqlens_q/k keys that may arrive inside TransformersKwargs when the explicit cu_seq_lens_q/k arguments are absent.

A warning is raised on conflict, so users notice any override.

What does this PR do?

I noticed that current Transformers models pass TransformersKwargs through as extra kwargs, but the flash attention function rejects the cumulative_seqlens_q and cumulative_seqlens_k entries because the argument names do not match.

So I updated _flash_attention_forward to ensure compatibility with TransformersKwargs.

Fixes #40193
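For illustration, here is a minimal sketch of the fallback behaviour described above; this is not the code in this PR, and the helper name, warning text, and precedence rule are assumptions:

```python
import logging

logger = logging.getLogger(__name__)

# Minimal sketch (hypothetical helper, not the transformers implementation):
# prefer the explicit cu_seq_lens_q/k arguments, otherwise fall back to
# cumulative_seqlens_q/k arriving via **kwargs, warning if both disagree.
def _resolve_cu_seq_lens(cu_seq_lens_q=None, cu_seq_lens_k=None, **kwargs):
    fallback_q = kwargs.get("cumulative_seqlens_q")
    fallback_k = kwargs.get("cumulative_seqlens_k")

    if cu_seq_lens_q is None:
        cu_seq_lens_q = fallback_q
    elif fallback_q is not None and fallback_q is not cu_seq_lens_q:
        logger.warning("Both cu_seq_lens_q and cumulative_seqlens_q were passed; using cu_seq_lens_q.")

    if cu_seq_lens_k is None:
        cu_seq_lens_k = fallback_k
    elif fallback_k is not None and fallback_k is not cu_seq_lens_k:
        logger.warning("Both cu_seq_lens_k and cumulative_seqlens_k were passed; using cu_seq_lens_k.")

    return cu_seq_lens_q, cu_seq_lens_k
```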

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a Github issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

@Kurt232 Kurt232 changed the title fix to accept cumulative_seqlens from TransformersKwargs in FA #40193 fix to accept cumulative_seqlens from TransformersKwargs in FA Aug 15, 2025
@Cyrilvallez (Member)

cc @vasqu, can you check? We want to have one and only one name for the same objects everywhere, and avoid such warnings and reattributions of names. Probably we need to update the TransformersKwargs.

@Kurt232 (Contributor, Author) commented Aug 18, 2025

I think it is necessary to update TransformersKwargs; my fix is a temporary measure until all members of TransformersKwargs and FlashAttentionKwargs are updated.

@vasqu (Contributor) commented Aug 18, 2025

@Cyrilvallez We have

  • the FlashAttentionKwargs in modeling_flash_attention_utils
  • the TransformersKwargs in generic
  • the PagedAttentionArgs in continuous_batching (+ some dependencies there for other methods 👀)

Tbh, I'd be more in favor of unifying these than keeping this workaround 😅 especially since it's more of a typing issue than a functional issue (the dataclasses clash with what's really supposed to be passed as kwargs).
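For context, these classes only annotate **kwargs and are not instantiated at runtime, which is why a field/parameter name mismatch shows up as a typing/naming problem rather than a functional one. A minimal sketch of the pattern, with an illustrative class name and field set (not the actual transformers definitions):

```python
from typing import Optional, TypedDict

import torch
from typing_extensions import Unpack


class ExampleAttentionKwargs(TypedDict, total=False):
    # Describes what callers may pass as keyword arguments.
    cumulative_seqlens_q: Optional[torch.LongTensor]
    cumulative_seqlens_k: Optional[torch.LongTensor]


def attention_forward(hidden_states: torch.Tensor, **kwargs: Unpack[ExampleAttentionKwargs]) -> torch.Tensor:
    # If an inner function expects cu_seq_lens_q/k instead, the kwargs above are
    # silently dropped or rejected, even though the tensors themselves are fine.
    return hidden_states
```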

@Cyrilvallez (Member) commented Aug 20, 2025

Yes, we don't really want this kind of workaround that clutters the code - @Kurt232, do you want to fix the typings @vasqu mentioned instead? Otherwise we can do it 🤗
But TLDR, let's fix the root cause immediately, instead of first merging this and then reverting

@Cyrilvallez (Member)

Also, @vasqu, can you make sure the current typing would be BC? If you changed it in your refactor, it may need to be updated in flash_attention_forward instead of the Kwargs classes - otherwise it would plainly break BC for all downstream libs that used to pass them.

@vasqu (Contributor) commented Aug 20, 2025

Checked: it was introduced in #33932 and the signature of the function hasn't changed with regard to those kwargs. I.e. the dataclasses would need to be updated imo.

I could search for when the dataclass was changed, but I'm pretty sure the renamings happened somewhere along all of the FA/kwarg changes.

@Kurt232 (Contributor, Author) commented Aug 20, 2025

@Cyrilvallez I'm willing to contribute this; do I just need to create a new PR to fix it?
@vasqu I want to double check: do I just need to rename the kwargs dataclass fields to match the FA function signature, e.g. cumulative_seqlens_q/k -> cu_seq_lens_q/k?

@vasqu (Contributor) commented Aug 20, 2025

@Kurt232 Yes, that's correct

@Cyrilvallez (Member)

@Kurt232 Either use this PR and revert the previous changes, or open a new PR, whichever you want

cumulative_seqlens_q/k -> cu_seq_lens_q/k:
- in the FlashAttentionKwargs in modeling_flash_attention_utils
- in the TransformersKwargs in generic
- in the PagedAttentionArgs in continuous_batching
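
A minimal before/after sketch of the rename in question, field names only (the real classes carry more fields and this is not their actual definition):

```python
from typing import Optional, TypedDict

import torch


class FlashAttentionKwargsBefore(TypedDict, total=False):
    cumulative_seqlens_q: Optional[torch.LongTensor]
    cumulative_seqlens_k: Optional[torch.LongTensor]


class FlashAttentionKwargsAfter(TypedDict, total=False):
    # Renamed to match the cu_seq_lens_q/k parameters of _flash_attention_forward.
    cu_seq_lens_q: Optional[torch.LongTensor]
    cu_seq_lens_k: Optional[torch.LongTensor]
```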

@Kurt232 Kurt232 force-pushed the fix/args_in_flash_attention_forward branch from fedbf6d to dc0624d on August 21, 2025 08:17
@Kurt232 (Contributor, Author) commented Aug 21, 2025

PagedAttention was added in git@211f2b0 and is used by ContinuousBatchingManager in src/transformers/generation/continuous_batching.py.

At git@242bb2ca, in src/transformers/generation/continuous_batching.py (current):
I checked that it is BC after renaming cumulative_seqlens_q/k -> cu_seq_lens_q/k,
because they are created in ContinuousBatchProcessor.setup_static_tensors:L762, used in ContinuousBatchingManager._model_forward:L1233, and destroyed with ContinuousBatchProcessor.

Please review @vasqu. Thx😄

@vasqu (Contributor) left a comment


Can you add a 🚨 to the title? This is definitely breaking (API-wise) for continuous batching (CB).

I'm not sure if CB is user-facing in this case, so it might not be too bad, but I would check with @ArthurZucker.

@Kurt232 Kurt232 changed the title fix to accept cumulative_seqlens from TransformersKwargs in FA 🚨 fix to accept cumulative_seqlens from TransformersKwargs in FA Aug 21, 2025
unused function arg in `PagedAttentionCache.update`

Co-authored-by: Anton Vlasjuk <[email protected]>
Comment on lines +38 to 39
cu_seq_lens_q: (batch_size + 1,), dtype torch.int32. The cumulative sequence lengths
of the sequences in the batch, used to index into q.
Collaborator left a comment:

My main issue with this naming is that it is not helpful for newbies; cu does not mean anything!
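
For readers hitting the same confusion: "cu" is short for "cumulative". A small illustrative example (not code from this PR) of how such a tensor relates to per-sequence lengths:

```python
import torch

# Per-sequence token counts for a packed/varlen batch of 3 sequences.
seq_lens = torch.tensor([5, 3, 7], dtype=torch.int32)

# Cumulative sequence lengths: shape (batch_size + 1,), starting at 0; each
# adjacent pair delimits one sequence's slice inside the flattened q/k tensors.
cu_seq_lens = torch.nn.functional.pad(torch.cumsum(seq_lens, dim=0, dtype=torch.int32), (1, 0))
print(cu_seq_lens)  # tensor([ 0,  5,  8, 15], dtype=torch.int32)
```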

@ArthurZucker (Collaborator)

Don't worry, let's just revert for continuous batching for now, the rest is fine!

@vasqu (Contributor) commented Aug 22, 2025

(we can remove the 🚨 as well then - only CB might've been breaking)

k, v = cache.update(k, v, module.layer_idx, **kwargs)

sliding_window = (-1, -1) if not getattr(module, "sliding_window", False) else (module.sliding_window, 0)
if implementation is not None:
@Kurt232 (Contributor, Author) commented Aug 22, 2025


I think paged_attention_forward should use cu_seq_lens_q/k instead of cumulative_seqlens_q/k to keep consistency with flash_attn_varlen_func.

https://github.com/huggingface/transformers/blob/29ddcacea3ad9d3cdf6c5d8e51d1d39cbc5e7dfa/src/transformers/modeling_flash_attention_utils.py#L557C1-L578C3
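
For reference, a hedged sketch of how the varlen kernel is typically invoked. The argument names follow the flash-attn package as I recall them and should be verified against the linked wrapper; this is not the transformers code itself:

```python
# Illustrative only: requires the flash-attn package and CUDA tensors packed as
# (total_tokens, num_heads, head_dim); the cumulative sequence lengths delimit
# each sequence inside the packed token dimension.
from flash_attn import flash_attn_varlen_func


def varlen_attention(q, k, v, cu_seq_lens_q, cu_seq_lens_k, max_seqlen_q, max_seqlen_k):
    return flash_attn_varlen_func(
        q,
        k,
        v,
        cu_seqlens_q=cu_seq_lens_q,  # note: flash-attn itself spells these cu_seqlens_*
        cu_seqlens_k=cu_seq_lens_k,
        max_seqlen_q=max_seqlen_q,
        max_seqlen_k=max_seqlen_k,
        causal=True,
    )
```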

@Kurt232 Kurt232 changed the title 🚨 fix to accept cumulative_seqlens from TransformersKwargs in FA fix to accept cumulative_seqlens from TransformersKwargs in FA Aug 22, 2025
@ArthurZucker (Collaborator) left a comment


Okay! I don't want to fight over this, it's a small nit!

@ArthurZucker ArthurZucker merged commit 14b89fe into huggingface:main Aug 25, 2025
20 of 22 checks passed
@Kurt232 Kurt232 deleted the fix/args_in_flash_attention_forward branch August 25, 2025 09:02


Development

Successfully merging this pull request may close these issues.

_flash_attention_forward can't receive cumulative_seqlens_q and cumulative_seqlens_k in TransformersKwargs from Inputs in forward
