Commit 5b90b7f
Update cudnn-frontend to 1.0.3 to fix cuDNN v9 SDPA NaNs (#650)
* Update cudnn frontend to 1.0.3 to fix cudnn v9 Nans
Signed-off-by: Charlene Yang <[email protected]>
* make d_out contiguous for bwd
Signed-off-by: Charlene Yang <[email protected]>
* remove cudnnDestroy to let torch handle it
Signed-off-by: Charlene Yang <[email protected]>
* Update transformer_engine/pytorch/attention.py
Co-authored-by: Tim Moon <[email protected]>
Signed-off-by: cyanguwa <[email protected]>
* Update transformer_engine/pytorch/attention.py
Co-authored-by: Tim Moon <[email protected]>
Signed-off-by: cyanguwa <[email protected]>
* Update transformer_engine/pytorch/attention.py
Co-authored-by: Tim Moon <[email protected]>
Signed-off-by: cyanguwa <[email protected]>
---------
Signed-off-by: Charlene Yang <[email protected]>
Signed-off-by: cyanguwa <[email protected]>
Co-authored-by: Tim Moon <[email protected]>
1 parent df9c29e commit 5b90b7f
File tree
3 files changed, +4 -6 lines changed
- 3rdparty
- transformer_engine
  - common/fused_attn
  - pytorch
Submodule cudnn-frontend updated 49 files
- CMakeLists.txt +1 -1
- README.FE.1.0.md +5
- README.md +7 -1
- include/cudnn_backend_base.h -2
- include/cudnn_frontend.h +3 -1
- include/cudnn_frontend/cudnn_interface.h +3 -3
- include/cudnn_frontend/node/batchnorm.h +4 -2
- include/cudnn_frontend/node/batchnorm_inference.h +5 -2
- include/cudnn_frontend/node/bn_finalize.h +4 -2
- include/cudnn_frontend/node/conv_dgrad.h +4 -2
- include/cudnn_frontend/node/conv_fprop.h +5 -2
- include/cudnn_frontend/node/conv_wgrad.h +4 -2
- include/cudnn_frontend/node/dbn.h +4 -2
- include/cudnn_frontend/node/dbn_weight.h +5 -2
- include/cudnn_frontend/node/dln.h +4 -2
- include/cudnn_frontend/node/genstats.h +4 -2
- include/cudnn_frontend/node/instancenorm.h +8 -4
- include/cudnn_frontend/node/layernorm.h +5 -2
- include/cudnn_frontend/node/matmul.h +5 -2
- include/cudnn_frontend/node/pointwise.h +4 -2
- include/cudnn_frontend/node/reduction.h +4 -2
- include/cudnn_frontend/node/reshape.h +5 -2
- include/cudnn_frontend/node/rmsnorm.h +9 -4
- include/cudnn_frontend/node/rng.h +4 -2
- include/cudnn_frontend/node/scaled_dot_product_flash_attention.h +119 -4
- include/cudnn_frontend/node_interface.h +27 -4
- include/cudnn_frontend_ConvDesc.h -3
- include/cudnn_frontend_Engine.h -3
- include/cudnn_frontend_EngineConfig.h -3
- include/cudnn_frontend_EngineFallbackList.h -1
- include/cudnn_frontend_ExecutionPlan.h -3
- include/cudnn_frontend_Filters.h -2
- include/cudnn_frontend_Heuristics.h -3
- include/cudnn_frontend_MatMulDesc.h -3
- include/cudnn_frontend_Operation.h -3
- include/cudnn_frontend_OperationGraph.h -3
- include/cudnn_frontend_PointWiseDesc.h -3
- include/cudnn_frontend_ReductionDesc.h -3
- include/cudnn_frontend_Resample.h -3
- include/cudnn_frontend_Rng.h -3
- include/cudnn_frontend_VariantPack.h -3
- python_bindings/properties.cpp +9 -5
- samples/legacy_samples/conv_sample.h -3
- samples/legacy_samples/cpu_references.h +26 -18
- samples/legacy_samples/norm_samples.cpp +1 -1
- samples/legacy_samples/test_list.cpp +9 -9
- samples/python/test_conv_bias.py +4 -2
- samples/python/test_mhas.py +21 -9
- setup.py +1 -1
Diff bodies were not captured; only the hunk positions survive.
First changed file: 5 lines removed (original lines 155-159).
Second changed file: 3 lines added, one each at new lines 1736, 1806, and 1888.
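The commit message mentions making `d_out` contiguous for the backward pass, but the changed lines themselves are not shown here. As a rough illustration of what "contiguous" means for a row-major tensor, and not the code from this commit, the pure-Python sketch below compares a view's strides against the dense layout. The helper names (`row_major_strides`, `is_contiguous`, `transpose`) and the example shape are hypothetical, and the check is a simplification of what a real framework does. In PyTorch, the analogous fix is calling `Tensor.contiguous()`, which returns a densely packed copy when the tensor is not already contiguous.

```python
from typing import Sequence, Tuple


def row_major_strides(shape: Sequence[int]) -> Tuple[int, ...]:
    """Element strides of a densely packed row-major array with this shape."""
    strides = []
    step = 1
    for dim in reversed(shape):
        strides.append(step)
        step *= dim
    return tuple(reversed(strides))


def is_contiguous(shape: Sequence[int], strides: Sequence[int]) -> bool:
    """Simplified check: strides match the dense layout (size-1 dims ignored)."""
    expected = row_major_strides(shape)
    return all(d == 1 or s == e for d, s, e in zip(shape, strides, expected))


def transpose(shape: Sequence[int], strides: Sequence[int], a: int, b: int):
    """Swap two dims of a view without moving data, as a transposed view does."""
    new_shape, new_strides = list(shape), list(strides)
    new_shape[a], new_shape[b] = new_shape[b], new_shape[a]
    new_strides[a], new_strides[b] = new_strides[b], new_strides[a]
    return tuple(new_shape), tuple(new_strides)


# Hypothetical gradient layout; the dimensions are illustrative only.
shape = (2, 16, 8, 64)
strides = row_major_strides(shape)
t_shape, t_strides = transpose(shape, strides, 1, 2)

print(is_contiguous(shape, strides))      # dense layout
print(is_contiguous(t_shape, t_strides))  # transposed view of the same memory
```

A gradient that reaches the backward pass as a transposed or sliced view fails this kind of check. A kernel that assumes dense strides would then read the wrong elements unless the tensor is first copied into a contiguous buffer.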