Hi, thanks for your great work on Transformer Engine!
I am working on a project that requires high-performance batched matrix multiplication (i.e., 3D tensor multiplication) where both inputs are stored in the FP8 data type. However, I noticed that te.Linear only takes a single input tensor and multiplies it with its own internal weight, so it does not fit my case, where I need to multiply two arbitrary 3D tensors (see the sketch below).
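To make the use case concrete, here is a minimal sketch of what I am after. The shapes are arbitrary, and torch.bmm in BF16 only stands in for the batched FP8 GEMM I actually need; the te.Linear part just shows why its interface does not cover my case:

```python
import torch
import transformer_engine.pytorch as te

# Reference semantics of the operation I need: C[i] = A[i] @ B[i] for every
# batch index i, except that in my case A and B are already stored in FP8.
batch, m, k, n = 8, 128, 256, 64
a = torch.randn(batch, m, k, device="cuda", dtype=torch.bfloat16)
b = torch.randn(batch, k, n, device="cuda", dtype=torch.bfloat16)
c_ref = torch.bmm(a, b)  # what I want, but with a and b as FP8 tensors

# te.Linear, by contrast, takes a single activation tensor and multiplies it
# with its own internal weight parameter, so my second tensor `b` has no place here:
layer = te.Linear(k, n, bias=False, params_dtype=torch.bfloat16)
with te.fp8_autocast(enabled=True):
    out = layer(a.reshape(batch * m, k))  # FP8 GEMM, but only against layer.weight
```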
Could you please advise which function or API in Transformer Engine is recommended for performing batched matrix multiplication (GEMM) directly on two FP8 3D tensors? Is there a public interface for this use case, or is it only available through the lower-level generic_gemm/general_gemm functions? Either way, could you share an example or best practice for this scenario?