323 commits
3c995c7
Extend DualGemm: support batched mode + decouple B0/B1 layouts (#790)
aakhundov Feb 13, 2023
8f5c242
Update dual_gemm_common.h
hwu36 Feb 13, 2023
9fb38ac
fix alignmentC=8 for imma N=128 (#822)
hwu36 Feb 15, 2023
a101ac2
Fix some typos (#791)
MARD1NO Feb 16, 2023
34bed24
Update helper.h
hwu36 Feb 16, 2023
d8359c8
Changes to iterators to support s8 gemm with f16 outputs (#812)
gsujankumar Feb 16, 2023
91b8de8
streamk fix (#830)
hwu36 Feb 20, 2023
95f673e
Update base_grouped.h (#832)
ppwwyyxx Feb 21, 2023
9cdbe33
Add fixed_channel and few_channel mode to int8 in generator (#829)
ShuaiShao93 Feb 22, 2023
f303889
fMHA: Sync FW with xFormers (#828)
danthe3rd Feb 23, 2023
65688c2
streamk fix (#836)
hwu36 Feb 23, 2023
92ebbf1
Fix typos (#839)
apivovarov Feb 27, 2023
f396cdd
ex24[gemm_grouped]: Allow to change layout/dtype (#841)
danthe3rd Mar 1, 2023
a31b43b
Re-enable aarch64 support lost in 277bd6e5379e0c1e1eb64db1a654b30e1ef…
psaab Mar 2, 2023
a68e2f9
Reduce versbosity in manifest.py (#845)
Mar 7, 2023
c4f6b8c
Updates for 3.0 (#857)
ANIKET-SHIVAM Mar 9, 2023
7e370c9
Fix typos 2 (#842)
apivovarov Mar 10, 2023
29801e3
Hide streams and typinfo from nvrtc (#853)
kroburg Mar 10, 2023
86cae03
expose StoreT parameter for potential speed (#838)
erees1 Mar 10, 2023
af332d4
Add missing comma in cutlass/arch/mma_sm90.h (#862)
thakkarV Mar 14, 2023
2670b97
Fix sign-compare warning in `reorder_array` (#869)
malfet Mar 20, 2023
6116706
Set batch_strides on Params::update (#883)
jackkosaian Mar 20, 2023
209faf7
remove spurious comma (#871)
thakkarV Mar 20, 2023
42290f5
Fix for dangling pointers (#885)
alexander-zinoviev Mar 25, 2023
77549ae
Update PUBLICATIONS.md
hwu36 Mar 26, 2023
87070b6
add a CUTLASS publication (#893)
yzhaiustc Mar 28, 2023
1eef5c3
add guards for __CUDA_ARCH__ >= 530 (#891)
ptrblck Mar 28, 2023
15d9d31
CUTLASS 3.0 Hopper GEMMs are GETTs in disguise (#897)
thakkarV Mar 29, 2023
bc36122
[layout] Fix AffineRank2ColumnMajor::packed() (#879)
Mar 29, 2023
660a05f
fix split_k_mode and add reduction kernel for f16 input/accum/output …
Mar 30, 2023
ecbd245
Enable shared memory intrinsics and ldmatrix PTX on Clang. (#754)
Gregory-Meyer Apr 1, 2023
0964bdb
update gemm and conv2d cmdline --help output (#878)
Adnios Apr 1, 2023
2ba1ef1
Increase max dynamic SMEM size in GemmSoftmax (#903)
aakhundov Apr 3, 2023
0435979
Remove const from 3.x GemmUniversalAdapter::operator() (#905)
aakhundov Apr 4, 2023
e2d439e
Add tile_n=32 and tile_k=32 kernels in generator.py (#858)
ShuaiShao93 Apr 6, 2023
9b8166e
fMHA: Add backward pass (#844)
danthe3rd Apr 7, 2023
d572cc1
CUTLASS 3.1 (#915)
ANIKET-SHIVAM Apr 15, 2023
4a68cf7
added support of b2b bmm (#849)
PivovarA Apr 15, 2023
43cfbe0
Allow L2 prefect for clang compiler (#914)
grypp Apr 15, 2023
54bebe4
Fix some typos in CuTe tutorials (#912)
aakhundov Apr 17, 2023
9a83bd3
CUTLASS 3.1 Python interface documentation (#917)
jackkosaian Apr 18, 2023
e36912f
Fix for dangling references in the MHA example (#918)
alexander-zinoviev Apr 20, 2023
180c562
Add missing checks for NVRTC in CuTe (#921)
jszuppe Apr 25, 2023
df02482
Add missing schedules argument in SM90 fp16 op generation (#920)
aakhundov Apr 26, 2023
fe2f491
Get SM count with cudaDeviceGetAttribute in KernelHardwareInfo (#927)
aakhundov Apr 28, 2023
6f8596c
Add missing #include directive to get access to cutlass::epilogue::th…
Gregory-Meyer Apr 29, 2023
7c04f95
Updates for 3.1 (#932)
ANIKET-SHIVAM Apr 29, 2023
24c8b7d
Fix cuTE compilation with clang (#939)
JanuszL May 9, 2023
b250fac
Make operator() const-correct and add missing static functions. (#936)
Gregory-Meyer May 9, 2023
fcfbd23
Fix host compilation of cute::cast_smem_ptr_to_uint. (#940)
Gregory-Meyer May 10, 2023
19c4a48
replace division with multiplication in GELU (#942)
wllqwzx May 12, 2023
e2953d4
Update gemm_api.md
hwu36 May 12, 2023
b974048
Adding 128x256 tile for 16b input datatype WGMMA gemm (#950)
May 17, 2023
6fbc0d3
Update layout.md
hwu36 May 18, 2023
13f4134
Stream-K with broadcast (#892)
alihassanijr May 22, 2023
f079619
More updates for 3.1 (#958)
ANIKET-SHIVAM May 24, 2023
b4ab501
Adds CUDA path for x86-64 (#957)
alihassanijr May 24, 2023
d3e7271
Add support for sparse GEMM with row broadcasted bias vector (#951)
alexsamardzic May 24, 2023
7859fe3
Update PUBLICATIONS.md
hwu36 May 24, 2023
4638250
Update CHANGELOG.md
hwu36 May 24, 2023
6f47420
Update README.md
hwu36 May 24, 2023
7dbf423
Add conversion from ElementBias to ElementCompute (#961)
jackkosaian May 27, 2023
fde824a
Update Hopper performance plot for CUTLASS 3.1 + CTK 12.1 (#967)
thakkarV Jun 1, 2023
87349d3
Add grouped b2b GEMM (#970)
jackkosaian Jun 5, 2023
473a670
Fix Int8 and TF32 generator (#976)
ANIKET-SHIVAM Jun 12, 2023
f6d42f2
add library_dirs (#977)
grimoire Jun 14, 2023
9b923dd
fix minor typos (#984)
thecodingwizard Jul 5, 2023
e066ced
fix epilogue iterator error (#995)
ChangyouSiom Jul 11, 2023
f679663
Add RMS norm (#979)
masahi Jul 11, 2023
146d314
Update fMHA kernels (#992)
danthe3rd Jul 13, 2023
8e85580
fix layout bug (#1006)
lygztq Jul 19, 2023
d20f3a9
spelling (#1007)
sophiawisdom Jul 20, 2023
a0d787b
Fix one publication (#1019)
MKimiSH Jul 28, 2023
4575443
CUTLASS 3.2 (#1024)
ANIKET-SHIVAM Aug 8, 2023
2d9a557
torch.bfloat16 support in cutlass python (#1037)
sophiawisdom Aug 16, 2023
7e5ee8b
[doc] fix: fix typos in the comment (#1049)
eric-haibin-lin Aug 16, 2023
3930f70
Fix typo in `0x_gemm_tutorial.md` (#1035)
chelini Aug 17, 2023
2e56cfa
fix typo (#1047)
zjjott Aug 18, 2023
2a9fa23
Avoid cute::print compiler warnings with -Wformat-security (#1041)
ahendriksen Aug 18, 2023
27de343
Add one Publication which is inspired by cutlass (#1022)
reed-lau Aug 22, 2023
a88c41c
Updates for 3.2 release (#1065)
ANIKET-SHIVAM Aug 26, 2023
7618e9b
Fix numeric conversion warning (#1021)
vincentccc Aug 27, 2023
6673df0
fix typos (#1059)
reed-lau Aug 27, 2023
3a8f57a
Add simple hash and eq methods for gemm_operations. (#1053)
ipiszy Aug 28, 2023
34fd980
fix cinttypes issue with STDC_FORMAT_MACROS (#1068)
tmm1 Aug 29, 2023
e01b9b5
Shard gemm reference templates into multiple TUs for parallel compila…
thakkarV Aug 30, 2023
88c0d7c
make only visible on device (#1071)
drisspg Sep 7, 2023
34bbadd
standarize fp8 generator (#1078)
ANIKET-SHIVAM Sep 7, 2023
a77b2c9
style(examples): typo (#1080)
tpoisonooo Sep 11, 2023
6407bcd
fix matrix B indices (#1089)
yzhaiustc Sep 12, 2023
8783c41
Replace 0x1f with 0xffffffff in __shfl_sync (#1097)
vmarkovtsev Sep 18, 2023
e0aaa3c
fix GmmaDescriptor print format string error (#1102)
reed-lau Sep 20, 2023
90d3b0f
CUTLASS 3.2.1 (#1113)
ANIKET-SHIVAM Sep 26, 2023
14f69bd
[fix] fix comparison operator for integer_subbyte (#1090)
KnowingNothing Sep 26, 2023
67ae8e0
Change the position of minus sign in line1549 array.h (#1091)
ptxu78 Sep 26, 2023
5cd735c
Fix Parallel Split-K on Gemm Operation Profiler (#1109)
Sep 26, 2023
7d8317a
Support for Mixed Input TensorOp (#1084)
Sep 27, 2023
26986bb
Fix type typo in rmsnorm (#1119)
abcdabcd987 Oct 3, 2023
ff61a49
Allow changing epsilon parameter in RMS norm kernel (#1112)
masahi Oct 3, 2023
61a38f8
Add #include <limits> to platform.h (#1121)
felker Oct 3, 2023
5f13dca
set kIsHeavy member variables (#1012)
FabianSchuetze Oct 4, 2023
4082fed
Add missing int64 and uint64 overloads for conj (#1127)
klecki Oct 6, 2023
ff02da2
Fx parallel split-k (#1116)
Oct 6, 2023
1125901
Add config.yml issue template with Discord link. (#1135)
jrhemstad Oct 10, 2023
fa8dfe6
fix missing return warning for repeat and axpby (#1124)
reed-lau Oct 12, 2023
757275f
Adding more Threadblock Tiles for Mixed-input TensorOp (BF16 * S8) in…
Oct 13, 2023
5e1a0a5
fix alignmentC for h16816_s8xf16 (#1146)
hwu36 Oct 17, 2023
fb10fa5
Fix broken pipeline link in docs (#1143)
milesvant Oct 18, 2023
7a7796a
Fix is_zero (#1147)
cyyever Oct 23, 2023
922fb51
clean the format (#1140)
reed-lau Oct 25, 2023
c008b4a
CUTLASS 3.3.0 (#1167)
IonThruster Nov 2, 2023
557be3a
Fix several typos (#1169)
wang-y-z Nov 3, 2023
1d7f2a2
Fix several broken links (#1168)
wang-y-z Nov 3, 2023
39c6a83
fix missing return warning (#1173)
reed-lau Nov 4, 2023
5ae8133
Doc only change changelog 3.3 (#1180)
Nov 13, 2023
1ab6cc7
Fix `std::abs` overloading for `bfloat16_t` (#1179)
chhwang Nov 13, 2023
6e60b9b
enable L2::128B prefetch for cp.async by default (#1177)
reed-lau Nov 13, 2023
b5d8a5d
Allow SM90 pingpong kernel to use custom tile schedulers (#1194)
klevzoff Nov 15, 2023
8098336
Updates to Python interface for PyPI packaging (#1209)
jackkosaian Nov 28, 2023
eb01d54
fix cp.async L2 prefetch typo (#1187)
reed-lau Nov 28, 2023
56fc3df
Adding missing `typename` (#1191)
chsigg Nov 29, 2023
a759e85
Add subclass declarations to generated files. (#1193)
chsigg Nov 30, 2023
99c4eeb
Explicitly cast `blockIdx` to `uint3` (#1192)
chsigg Nov 30, 2023
10b850f
Fix some sign conversion warnings (#1172)
cyyever Nov 30, 2023
60c8251
Remove unused variables (#1195)
chsigg Dec 1, 2023
2375a07
Qualify calls to make_fragment_? from templated base class. (#1196)
chsigg Dec 1, 2023
bef1fbc
Add missing `#include <cstdio>` (#1197)
chsigg Dec 1, 2023
4a1709e
Fixed illegal PTX syntax (#1225)
hwu36 Dec 1, 2023
e9e30c2
Updates and Bug fixes to CUTLASS 3.3 (#1232)
IonThruster Dec 5, 2023
a75b4ac
Fix Stream-K reduce bug in epilogue with broadcast (#1224)
alihassanijr Dec 5, 2023
9c9b51d
Update PUBLICATIONS.md
hwu36 Dec 7, 2023
f188f9b
Fix typo in quickstart.md (#1257)
FedyuninV Dec 7, 2023
f4a0216
Fix bug in single source GEMM with residual + streamk (#1249)
alihassanijr Dec 7, 2023
e1483d5
Collection of changes to fix clang build. (#1200)
chsigg Dec 8, 2023
30ec1a4
Use size_t index to iterate up to std::vector::size() (#1251)
andportnoy Dec 9, 2023
f60786b
Remove undefined behavior from default constructor of PredicatedTileA…
Gregory-Meyer Dec 12, 2023
b7508e3
Fix inline ptx escaping for predicates. (#1264)
chsigg Dec 14, 2023
8236f30
CUTLASS 3.4.0 (#1286)
IonThruster Dec 29, 2023
5c756eb
Add support for sparse GEMM with visitor epilogue (#1189)
alexsamardzic Jan 4, 2024
c9591a6
fix typo (#1279)
jeejeelee Jan 4, 2024
d4be5ab
Allow per-column bias in EpilogueTensorBroadcast (#1275)
alihassanijr Jan 4, 2024
8ac2edc
expose stream API in python kernel call interfaces (#1287)
K-Wu Jan 5, 2024
74d1f3e
Fix cute::array<T, 0> iterator (#1273)
ezhulenev Jan 8, 2024
acba5be
Fix flops calculation and tensor b stride calculation in the example …
getianao Jan 8, 2024
2f589ff
Updates for 3.4 release. (#1305)
ANIKET-SHIVAM Jan 16, 2024
751eb9a
Update license year (#1306)
ANIKET-SHIVAM Jan 16, 2024
362abbf
Support ElementD to be void for tma (#1153)
kongroo Jan 16, 2024
ca37d63
Remove sparse GEMM with row broadcasted bias vector (#1302)
alexsamardzic Jan 17, 2024
139b93d
update publications (#1308)
jayhshah Jan 17, 2024
b4b5b11
Update PUBLICATIONS.md
hwu36 Jan 18, 2024
9385141
Update PUBLICATIONS.md
hwu36 Jan 19, 2024
092f14d
fix tile_size_mnk compilation warning (#1294)
reed-lau Jan 30, 2024
8825fbf
fix unrecognized print format specifier for int8/uint8 (#1303)
reed-lau Jan 30, 2024
6e3df97
Modify comments in code examples/08_turing_tensorop_gemm/turing_tens…
xws117 Feb 1, 2024
57e01e1
Fix missing include file (#1318)
LyricZhao Feb 3, 2024
47a3ebb
Add a missing platform include (#1328)
drisspg Feb 3, 2024
bbe579a
Updates for CUTLASS 3.4.1 (#1346)
ANIKET-SHIVAM Feb 15, 2024
a8f2c80
fix `tile_size(TiledCopy<Args...> const&)` error (#1357)
luliyucoordinate Feb 24, 2024
ffa34e7
(NFC) improve doc: Add missing verb to sentence (#1377)
chelini Mar 4, 2024
629f465
CUTLASS 3.5.0 (#1411)
thakkarV Mar 19, 2024
c4e3e12
group gemm set stride L = cute::Int<0> (#1416)
Xseventh Mar 20, 2024
8f7d278
[NFC] improve doc: fix typo in mma doc (#1417)
ThomsonTan Mar 27, 2024
28cbacb
fix stride compilation warning (#1415)
reed-lau Mar 30, 2024
f9ece1b
Python `Gemm` `tile_descriptions` fix (#1439)
jeromeku Mar 30, 2024
19f3cc3
Fix uint128 operator add (#1400)
reed-lau Apr 2, 2024
8e7d9f4
add missing header for size_t in `numeric_types.h` (#1420)
Ghost-LZW Apr 9, 2024
a40e08e
Update 02_layout_algebra.md (#1451)
yazdanimehdi Apr 10, 2024
7d49e6c
Updates for CUTLASS 3.5.0 (#1468)
thakkarV Apr 12, 2024
5c447dd
Update packed_stride.hpp to add CUTLASS_HOST_DEVICE decorator to new …
djns99 Apr 19, 2024
acc3ee1
Fix typos in cute docs (#1486)
irasin May 2, 2024
033d9ef
[Documentation] Fixes the confusion between concatenated vs. composed…
May 2, 2024
637b159
Fix C++17 version detection in helper_macros.hpp (#1479)
nickjeliopoulos May 28, 2024
2448bb5
Update gemm_api_3x.md (#1386)
RaulPPelaez Jul 10, 2024
dbfced0
Fix typos in convolution tests (#1433)
alexander-zinoviev Jul 10, 2024
81b06ee
Fix B operand variable name and comments (#1458)
andylolu2 Jul 10, 2024
d6580c3
Support use of external/system GTest installation (#1469)
iskunk Jul 10, 2024
c5239d8
Add Faster Neighborhood Attention to pubs (#1471)
alihassanijr Jul 10, 2024
e48c761
[bug] fix device thread `gemm.h` constructor (#1473)
luliyucoordinate Jul 10, 2024
843adf0
Fix SMEM index for C in CuTe examples (#1477)
joerowell Jul 10, 2024
52fb43f
fix mbarrier invalidate (#1494)
KTong821 Jul 10, 2024
56b46e2
Fix grouped gemm invalid memory access to problem shapes (#1543)
kongroo Jul 10, 2024
be60a0b
CUTLASS 3.5.1 (#1623)
thakkarV Jul 29, 2024
5b283c8
Add more GMMA shapes (#1630)
tridao Jul 29, 2024
fbd116c
fix build on SM 5.2 (#1664)
eqy Jul 31, 2024
8b2a040
Profiler docs and argument update for raster order (#1667)
depaulmillz Jul 31, 2024
1f2b590
Skip void-C kernels in the profiler when beta is non zero (#1661)
alihassanijr Jul 31, 2024
36cbfcf
Add extended wgmma shapes for all data types (#1666)
sklevtsov-nvidia Jul 31, 2024
eee0cab
Stamp out 1x1x1 clusters, 128x256 CTA shape (#1665)
alihassanijr Aug 1, 2024
06b2134
1x1x1 cluster launch (#1673)
depaulmillz Aug 1, 2024
19b4c5e
Fix isnan namespace qualification in cutlass/functional.h (#1679)
mhoemmen Aug 5, 2024
e22ba59
support data type w2 used in cutlass_library (#1517)
gavinchen430 Aug 6, 2024
2049c6c
5476 cutlass 3x gemm kernels (#1695)
depaulmillz Aug 8, 2024
7192f4a
Add CLayout_64x208 (#1680)
tridao Aug 8, 2024
4e5a8f6
3.5.1 plots and updated readme (#1708)
depaulmillz Aug 12, 2024
fb17043
Update half.h (#1709)
eqy Aug 14, 2024
8d8cfdf
update 3.5.1 readme/changelog
hwu36 Aug 15, 2024
865be73
Merge pull request #1713 from NVIDIA/351_sparse_update
d-k-b Aug 15, 2024
b0296bf
fix uint128
hwu36 Aug 16, 2024
3f084f7
Add couple configs into generator.py for mixed input MM (#1350)
alexsamardzic Aug 16, 2024
f93a691
Merge pull request #1714 from NVIDIA/u128_div
d-k-b Aug 16, 2024
4dbf5db
Use CUDA runtime API to retrieve function pointer to driver API (#1700)
shunfan-shao Aug 19, 2024
f7b19de
minor fix for a double quote in CMakeLists.txt (#1727)
Shreya-gaur Aug 20, 2024
e1976da
Add support for mixed 4-bit/8-bit data types GEMM (#1413)
alexsamardzic Aug 30, 2024
6c30441
Update barrier.h (#1782)
Algy Sep 4, 2024
7369adc
Add Sm90LinCombPerColBias (#1774)
ucassjy Sep 4, 2024
06e3377
Remove extraneous comma in declaration (#1776)
saagarjha Sep 5, 2024
82f5075
set_slice3x3 -> set_slice_3x3 (#1784)
lucifer1004 Sep 6, 2024
323c817
Support ComputeFn where output type differs from input type (#1771)
tridao Sep 6, 2024
21d0534
fix assertion (#1790)
seanxwzhang Sep 9, 2024
dbdae51
Support for TMA Epilogue for Group Gemm and add pingpong ptr array & …
Junkai-Wu Sep 11, 2024
3a8c01a
Prefix a member template name with the template keyword. (#1796)
shumway Sep 11, 2024
9f68995
add publication: ‘EVT: Accelerating Deep Learning Training with Epilo…
reed-lau Sep 16, 2024
1ebda1c
Fix MMA promotion interval assertions (#1641)
LyricZhao Sep 16, 2024
2991ce1
Add print_svg for mma (#1733)
reed-lau Sep 18, 2024
44dae8b
Adjust profiler space for SM89 (#1553)
wenlei-bao Sep 19, 2024
e2b0789
Add some can implement rules of hopper convolution. (#1835)
Junkai-Wu Sep 25, 2024
b27c49e
Fix cute doc (#1529)
jiweibo Oct 7, 2024
477a677
Fix typos in test/unit/conv/cache_testbed_output.h (#1652)
alexander-zinoviev Oct 7, 2024
0837a2a
Fix typo in comment (#1787)
Oct 7, 2024
cc3c29a
CUTLASS 3.6.0 (#1850)
yzhaiustc Oct 9, 2024
5366879
Handle MNK Sm90{Row, Col}Reduction problem shapes (#1803)
saagarjha Oct 14, 2024
755194a
add is_last_tile
hwu36 Oct 17, 2024
08101d9
Improve sm90 mixed dtype kernel (#1883)
sklevtsov-nvidia Oct 18, 2024
5b50a8f
Add GMMA shape m64n40k16 (#1864)
tridao Oct 22, 2024
d65266a
Add all supported GMMA shapes (#1890)
sklevtsov-nvidia Oct 22, 2024
f3a3bfc
add maximum support (#1833)
Xinyu302 Oct 23, 2024
ea69cc2
fix typo (#1853)
sijialouintel Oct 23, 2024
b0c09ed
fix by adding public (#1753)
Xinyu302 Oct 23, 2024
83ae20c
added mapping for bf16 to torch::kBFloat16 (#1843)
Bogumil-Sapinski-Mobica Oct 23, 2024
e5f3caf
Fix README (#1658)
leimao Oct 23, 2024
03e3bff
Adjusting code indentation (#1639)
103yiran Oct 23, 2024
f02913c
Include of regular_tile_iterator.h fixed for NVRTC (#1765)
MaxAkaAltmer Oct 23, 2024
12626bc
Update gemm_f16n_f16t_f32t_tensor_op_f32_sm80.cu with include "cutlas…
houqi Oct 23, 2024
be692b4
remove redundant hardcoded packing configs in mixed dtype gemm (#1894)
IwakuraRein Oct 23, 2024
a424ca6
fix wrong A/BLayout in MMA_Traits for binary mma and append other MMA…
CalebDu Oct 24, 2024
08a4995
Add a print for the uint{x}b_t type. (#1871)
luliyucoordinate Oct 24, 2024
e8a8b69
Refactor some GroupedGEMM logic (#1899)
azhurkevich Oct 26, 2024
19f5159
feat: support kFactor 8 used in mma tensor op tile iterator (#1512)
gavinchen430 Oct 29, 2024
9004ed2
Update publications (#1912)
wenlei-bao Nov 6, 2024
32e3c38
remove restriction of stride == kernel in nhwc_pooling (#1896)
thorneliu Nov 6, 2024
d656afb
fix undefined in device code error (#1880)
luliyucoordinate Nov 6, 2024
8aa95db
Fix the racing condition of mixed-input gemm when writing the registe…
IwakuraRein Nov 8, 2024
b0e09d7
Fix `cutlass` python library with cuda `12.6.2.post1` (#1942)
danthe3rd Nov 18, 2024
80243e0
add {uint4, uint2, int2} => {fp16, bf16} conversion (#1966)
IwakuraRein Dec 3, 2024
4c42f73
Improve mixed dtype GEMM (#1972)
IwakuraRein Dec 6, 2024
2b6cfd3
fix a typo that fails the compiling when ElementScale is not the same…
IwakuraRein Dec 10, 2024
33c5843
Fix CuTe README Typo (#1951)
leimao Dec 11, 2024
e1cd8c7
Fix Typo (#1962)
leimao Dec 11, 2024
b12b66f
3.6.0 update
Dec 20, 2024
87eaa69
doc and swap stuff
hwu36 Dec 25, 2024
2 changes: 1 addition & 1 deletion .github/ISSUE_TEMPLATE/bug_report.md
@@ -20,4 +20,4 @@ A clear and concise description of what you expected to happen.
- Environment location: [Bare-metal, Docker, Cloud(specify cloud provider)]

**Additional context**
-Add any other context about the problem here.
+Add any other context about the problem here.
5 changes: 5 additions & 0 deletions .github/ISSUE_TEMPLATE/config.yml
@@ -0,0 +1,5 @@
+blank_issues_enabled: true
+contact_links:
+  - name: CUTLASS Discord
+    url: https://discord.gg/nvidiadeveloper
+    about: Come chat about using and contributing to CUTLASS!
2 changes: 1 addition & 1 deletion .github/ISSUE_TEMPLATE/documentation_request.md
@@ -32,4 +32,4 @@ A clear and concise description of what documentation you believe it is needed a
A clear and concise description of what you want to happen.

**Steps taken to search for needed documentation**
-List any steps you have taken:
+List any steps you have taken:
2 changes: 1 addition & 1 deletion .github/ISSUE_TEMPLATE/submit_question.md
@@ -7,4 +7,4 @@ assignees: ''

---

-**What is your question?**
+**What is your question?**
2 changes: 1 addition & 1 deletion .github/workflows/labeler.yml
@@ -8,4 +8,4 @@ jobs:
steps:
- uses: actions/labeler@main
with:
-repo-token: "${{ secrets.GITHUB_TOKEN }}"
+repo-token: "${{ secrets.GITHUB_TOKEN }}"
2 changes: 1 addition & 1 deletion .github/workflows/new-issues-to-triage-projects.yml
@@ -32,4 +32,4 @@ jobs:
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
GITHUB_PROJECT_URL: https://github.com/NVIDIA/cutlass
-GITHUB_PROJECT_COLUMN_NAME: 'Needs prioritizing'
+GITHUB_PROJECT_COLUMN_NAME: 'Needs prioritizing'
2 changes: 1 addition & 1 deletion .github/workflows/stale.yml
@@ -54,4 +54,4 @@ jobs:
exempt-pr-labels: "0 - Blocked,0 - Backlog,good first issue"
days-before-pr-stale: 90
days-before-pr-close: -1
-operations-per-run: 50
+operations-per-run: 50
1 change: 1 addition & 0 deletions .gitignore
@@ -1,2 +1,3 @@
# PyCache files
__pycache__/
+cutlass_library.egg-info/
345 changes: 279 additions & 66 deletions CHANGELOG.md

Large diffs are not rendered by default.

84 changes: 57 additions & 27 deletions CITATION.cff
@@ -5,33 +5,61 @@ message: >-
following metadata.
type: software
authors:
- given-names: Andrew
email: [email protected]
family-names: Kerr
- given-names: Vijay
family-names: Thakkar
email: [email protected]
affiliation: NVIDIA
- given-names: Pradeep
family-names: Ramani
email: [email protected]
affiliation: NVIDIA
- given-names: Cris
family-names: Cecka
email: [email protected]
affiliation: NVIDIA
- given-names: Aniket
family-names: Shivam
email: [email protected]
affiliation: NVIDIA
- given-names: Honghao
family-names: Lu
email: [email protected]
affiliation: NVIDIA
- given-names: Ethan
family-names: Yan
email: [email protected]
affiliation: NVIDIA
- given-names: Jack
family-names: Kosaian
email: [email protected]
affiliation: NVIDIA
- given-names: Mark
family-names: Hoemmen
email: [email protected]
affiliation: NVIDIA
- given-names: Haicheng
family-names: Wu
affiliation: NVIDIA
email: [email protected]
- given-names: Manish
family-names: Gupta
affiliation: Google
email: [email protected]
- given-names: Dustyn
family-names: Blasig
email: [email protected]
affiliation: NVIDIA
- given-names: Pradeep
family-names: Ramini
email: [email protected]
- given-names: Andrew
family-names: Kerr
email: [email protected]
affiliation: NVIDIA
- given-names: Matt
family-names: Nicely
email: [email protected]
affiliation: NVIDIA
- given-names: Duane
family-names: Merrill
email: [email protected]
affiliation: NVIDIA
- given-names: Aniket
family-names: Shivam
email: [email protected]
- given-names: Dustyn
family-names: Blasig
email: [email protected]
affiliation: NVIDIA
- given-names: Fengqi
family-names: Qiao
email: [email protected]
affiliation: NVIDIA
- given-names: Piotr
family-names: Majcher
@@ -49,10 +77,12 @@ authors:
family-names: Wang
email: [email protected]
affiliation: NVIDIA
- given-names: Matt
family-names: Nicely
email: [email protected]
affiliation: NVIDIA
- given-names: Manish
family-names: Gupta
affiliation: Google
email: [email protected]


repository-code: 'https://github.com/NVIDIA/cutlass'
abstract: >-
CUTLASS is a collection of CUDA C++ template
@@ -71,12 +101,12 @@ abstract: >-
flexibility simplifies their use as building blocks
within custom kernels and applications.
keywords:
- 'cutlass, tensor cores, cuda'
- 'cutlass, tensor cores, cuda, cute, nvidia, gpu, linear algebra, matrix computations'
license: BSD-3-Clause
license-url: https://github.com/NVIDIA/cutlass/blob/v2.9.0/LICENSE.txt
version: '2.9'
date-released: '2022-04-27'
license-url: https://github.com/NVIDIA/cutlass/blob/v3.0.0/LICENSE.txt
version: '3.0.0'
date-released: '2023-01-23'
identifiers:
- type: url
value: "https://github.com/NVIDIA/cutlass/tree/v2.9.0"
description: The GitHub release URL of tag 2.9.0
value: "https://github.com/NVIDIA/cutlass/tree/v3.0.0"
description: The GitHub release URL of tag 3.0.0