Generate broadcast for matrix multiplication where one of the matrices is filled with ones #4166

@umangyadav

DOR (Definition of Ready)

Ready, no blockers

Description

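Consider the batched GEMM C = A * B,

where A has shape [B, M, 1] and is filled with ones. (The leading B in each shape is the batch size, not the matrix B.)

Let the matrix B have shape [B, 1, N].

In this case C has shape [B, M, N].

Because the inner dimension is 1 and every element of A is 1, each output element is C[b, m, n] = A[b, m, 0] * B[b, 0, n] = B[b, 0, n]. C is therefore just B broadcast along the M axis:

C = broadcast(B) to shape [B, M, N]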
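
The equivalence can be checked numerically with a minimal NumPy sketch. This only illustrates the math; it is not the compiler pass itself. The helper name `ones_gemm_as_broadcast` and the sizes used are hypothetical, chosen just for this example.

```python
import numpy as np


def ones_gemm_as_broadcast(b_mat: np.ndarray, m: int) -> np.ndarray:
    """Hypothetical helper: result of ones([batch, m, 1]) @ b_mat without a GEMM."""
    batch, k, n = b_mat.shape
    # The rewrite is only valid when the inner (reduction) dimension is 1.
    assert k == 1, "inner dimension must be 1"
    # broadcast_to returns a read-only view; no data is copied.
    return np.broadcast_to(b_mat, (batch, m, n))


# Illustrative sizes (assumptions, not taken from the issue).
batch, m, n = 2, 3, 4
a = np.ones((batch, m, 1), dtype=np.float32)
b = np.arange(batch * n, dtype=np.float32).reshape(batch, 1, n)

c_gemm = a @ b                          # reference: batched matmul
c_bcast = ones_gemm_as_broadcast(b, m)  # proposed replacement

assert c_gemm.shape == (batch, m, n)
assert np.array_equal(c_gemm, c_bcast)
```

With an inner dimension of 1 there is no accumulation, so the two results match exactly rather than only within a floating-point tolerance.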

This could be a useful transformation for the seqLenKV=1 case in attention. That case is unusual, but in general this transformation can be useful for further graph simplifications.

DOD (Definition of Done)

Unit tests are added
Performance improvement is shown
