GEMM Problem Shape --m=8 --n=8192 --k=8192 Does NOT Work (Disposition: Failed, no Runtime reported)
/tools/profiler/cutlass_profiler --dist=uniform,min:-2.3,max:2.3,scale:-1 --kernels=cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_bf16_bf16_128x128x64_4x1x1_0_tnn_align8_stream_k_warpspecialized_cooperative_epi_nosmem --m=8 --n=8192 --k=8192 --verification-enabled=false
=============================
Problem ID: 1
Provider: CUTLASS
OperationKind: gemm
Operation: cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_bf16_bf16_128x128x64_4x1x1_0_tnn_align8_stream_k_warpspecialized_cooperative_epi_nosmem
Status: Success
Verification: OFF
Disposition: Failed
Arguments: --gemm_kind=universal --m=8 --n=8192 --k=8192 --A=bf16:row --B=bf16:column --C=bf16:column --D=bf16:column \
--alpha=1 --beta=0 --split_k_mode=serial --split_k_slices=1 --batch_count=1 --raster_order=heuristic \
--runtime_input_datatype_a=invalid --runtime_input_datatype_b=invalid --use_pdl=false --enable_sm90_mixed_dtype_shuffle_test=false \
--swizzle_size=1 --op_class=tensorop --accum=f32 --cta_m=128 --cta_n=128 --cta_k=64 --cluster_m=1 --cluster_n=1 \
--cluster_k=1 --cluster_m_fallback=0 --cluster_n_fallback=0 --cluster_k_fallback=0 --stages=7 --warps_m=4 \
--warps_n=2 --warps_k=1 --inst_m=64 --inst_n=128 --inst_k=16 --min_cc=90 --max_cc=90
Bytes: 134479872 bytes
FLOPs: 1073872896 flops
FLOPs/Byte: 7
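As a sanity check, the Bytes and FLOPs lines of both runs can be reproduced from the problem shape alone. The sketch below is a reconstruction inferred from the printed numbers, not taken from cutlass_profiler's source: it assumes bf16 operands (2 bytes each), that A, B and D are each moved once (C is not read because beta=0), and that the epilogue contributes 2 flops per output element. The helper names `gemm_bytes` and `gemm_flops` are hypothetical.

```python
# Hypothetical helpers: reconstruct the profiler's "Bytes" and "FLOPs" lines.
# Assumptions (inferred from the reported values, not from the profiler code):
#   - every operand is bf16 (2 bytes per element)
#   - A (m x k) and B (k x n) are read once, D (m x n) is written once
#   - C is not read because beta=0
#   - the epilogue adds 2 flops per output element (alpha/beta scaling)

BF16_BYTES = 2  # size of one bf16 element in bytes


def gemm_bytes(m: int, n: int, k: int, elem: int = BF16_BYTES) -> int:
    """Bytes for reading A and B and writing D."""
    return elem * (m * k + k * n + m * n)


def gemm_flops(m: int, n: int, k: int) -> int:
    """2*m*n*k multiply-add flops plus 2*m*n epilogue flops."""
    return 2 * m * n * k + 2 * m * n


if __name__ == "__main__":
    for k in (8192, 128):
        b = gemm_bytes(8, 8192, k)
        f = gemm_flops(8, 8192, k)
        # Integer division mirrors the truncated "FLOPs/Byte" value in the report
        print(f"k={k}: Bytes={b} FLOPs={f} FLOPs/Byte={f // b}")
```

Under these assumptions the failing k=8192 run moves about 128 MiB, almost all of it the 8192 x 8192 B operand, while the k=128 run moves about 2 MiB; the arithmetic intensity is the same (about 7 FLOPs/Byte) in both cases.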
GEMM Problem Shape --m=8 --n=8192 --k=128 Works (Disposition: Not verified, Runtime reported)
./tools/profiler/cutlass_profiler --dist=uniform,min:-2.3,max:2.3,scale:-1 --kernels=cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_bf16_bf16_128x128x64_4x1x1_0_tnn_align8_stream_k_warpspecialized_cooperative_epi_nosmem --m=8 --n=8192 --k=128 --verification-enabled=false
=============================
Problem ID: 1
Provider: CUTLASS
OperationKind: gemm
Operation: cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_bf16_bf16_128x128x64_4x1x1_0_tnn_align8_stream_k_warpspecialized_cooperative_epi_nosmem
Status: Success
Verification: OFF
Disposition: Not verified
Arguments: --gemm_kind=universal --m=8 --n=8192 --k=128 --A=bf16:row --B=bf16:column --C=bf16:column --D=bf16:column \
--alpha=1 --beta=0 --split_k_mode=serial --split_k_slices=1 --batch_count=1 --raster_order=heuristic \
--runtime_input_datatype_a=invalid --runtime_input_datatype_b=invalid --use_pdl=false --enable_sm90_mixed_dtype_shuffle_test=false \
--swizzle_size=1 --op_class=tensorop --accum=f32 --cta_m=128 --cta_n=128 --cta_k=64 --cluster_m=1 --cluster_n=1 \
--cluster_k=1 --cluster_m_fallback=0 --cluster_n_fallback=0 --cluster_k_fallback=0 --stages=7 --warps_m=4 \
--warps_n=2 --warps_k=1 --inst_m=64 --inst_n=128 --inst_k=16 --min_cc=90 --max_cc=90
Bytes: 2230272 bytes
FLOPs: 16908288 flops
FLOPs/Byte: 7
Runtime: 0.0130992 ms
Memory: 158.567 GiB/s
Math: 1290.79 GFLOP/s
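For the working k=128 run, the Memory and Math lines follow directly from Bytes, FLOPs and Runtime. The sketch below assumes GiB means 2**30 bytes and GFLOP means 1e9 flops, which is what the printed values are consistent with; the `throughput` function is a hypothetical helper, not part of CUTLASS.

```python
# Hypothetical helper: recompute "Memory" (GiB/s) and "Math" (GFLOP/s)
# from the Bytes, FLOPs and Runtime (ms) values reported by the profiler.
# Assumes GiB = 2**30 bytes and GFLOP = 1e9 flops.


def throughput(num_bytes: int, flops: int, runtime_ms: float) -> tuple[float, float]:
    seconds = runtime_ms * 1e-3
    gib_per_s = num_bytes / seconds / 2**30
    gflop_per_s = flops / seconds / 1e9
    return gib_per_s, gflop_per_s


if __name__ == "__main__":
    # Values copied from the k=128 report
    mem, gflops = throughput(2230272, 16908288, 0.0130992)
    print(f"Memory: {mem:.3f} GiB/s")
    print(f"Math: {gflops:.2f} GFLOP/s")
```

The failing k=8192 run prints no Runtime line, so these figures cannot be derived for it.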