simpleTensorCoreGEMM has errors in output when compiled with CUDA10 for Turing GPUs 

simpleTensorCoreGEMM has errors in output(beyond the additive tolerance of 1e-5 and multiplicative tol of 1.01) when compiled with CUDA10 for Turing GPU (arch=sm_70, RTX 2080Ti)

I did not modify any datatypes in the run and both the wmma based explicit GEMM implementation and the cuBlasGemmEx call use the Tensorcores.

I am wondering what might be causing the errors beyond the specified tolerance limits?

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

simpleTensorCoreGEMM has errors in output when compiled with CUDA10 for Turing GPUs #18

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

simpleTensorCoreGEMM has errors in output when compiled with CUDA10 for Turing GPUs #18

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions