CUTLASS strives to achieve the highest performance possible on NVIDIA GPUs while also offering a
flexible composition that can be easily applied to solve new problems related to Deep Learning and
linear algebra. Though we intend to make CUTLASS as simple and straightforward as possible, given
a tradeoff between simplicity and performance, CUTLASS chooses performance. Consequently, several
design patterns are necessary to yield a composable structure while also satisfying these performance
## <a name="S-patterns-tiles-iterators"></a> Tiles and Iterators
Efficient dense linear algebra computations emphasize data movement to match the execution of mathematical operators to the flow of data. Consequently, CUTLASS defines a rich set of primitives for partitioning a tile of data among participating threads, warps, and threadblocks. CUTLASS applies the familiar iterator design pattern to provide an abstraction layer to (1.) access these tile objects and (2.) traverse a sequence of objects embedded in a higher-level data structure. These subpartitions are typically defined by compile-time constants
specifying element type, size, and data layout. CUTLASS refers to subpartitions as _tiles_.
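The tile-and-iterator idea can be sketched in plain host C++. This is a minimal illustration under stated assumptions, not CUTLASS's actual classes. `TileShape`, `ThreadStripedTileIterator`, and `thread_partial_sum` are hypothetical names. The tile is assumed to be row-major, with its element type and extent fixed as template parameters, and rows are assumed to be dealt out to threads in a strided pattern. On a GPU, the `thread_id` argument would typically be derived from `threadIdx.x`. Here it is an ordinary integer so the sketch runs on the host.

```cpp
#include <cassert>

// Hypothetical compile-time tile description (not CUTLASS's API): the element
// type, extent, and row-major layout are all fixed by template parameters.
template <typename Element, int Rows, int Columns>
struct TileShape {
  using element_type = Element;
  static constexpr int kRows = Rows;
  static constexpr int kColumns = Columns;
  static constexpr int kCount = Rows * Columns;

  // Row-major linear offset of element (row, column) within the tile.
  static constexpr int offset(int row, int column) { return row * Columns + column; }
};

// Hypothetical iterator that partitions a tile among ThreadCount threads:
// thread t visits rows t, t + ThreadCount, t + 2 * ThreadCount, ...
template <typename Tile, int ThreadCount>
class ThreadStripedTileIterator {
  static_assert(Tile::kRows % ThreadCount == 0, "rows must divide evenly among threads");

 public:
  using Element = typename Tile::element_type;

  // Number of rows each thread owns; known at compile time.
  static constexpr int kIterations = Tile::kRows / ThreadCount;

  ThreadStripedTileIterator(const Element* tile, int thread_id)
      : tile_(tile), thread_id_(thread_id), iteration_(0) {
    assert(thread_id >= 0 && thread_id < ThreadCount);
  }

  // Pointer to the first element of the row this thread currently owns.
  const Element* get() const {
    return tile_ + Tile::offset(thread_id_ + iteration_ * ThreadCount, 0);
  }

  // Advance to this thread's next row.
  ThreadStripedTileIterator& operator++() {
    ++iteration_;
    return *this;
  }

  bool valid() const { return iteration_ < kIterations; }

 private:
  const Element* tile_;
  int thread_id_;
  int iteration_;
};

// Sums the elements owned by one thread by walking its rows with the iterator.
template <typename Tile, int ThreadCount>
typename Tile::element_type thread_partial_sum(const typename Tile::element_type* tile,
                                               int thread_id) {
  typename Tile::element_type sum = 0;
  for (ThreadStripedTileIterator<Tile, ThreadCount> it(tile, thread_id); it.valid(); ++it) {
    const typename Tile::element_type* row = it.get();
    for (int c = 0; c < Tile::kColumns; ++c) {
      sum += row[c];
    }
  }
  return sum;
}
```

Consider a 4 x 8 tile of `int` holding the values 0 through 31, split across two threads. Thread 0 owns rows 0 and 2, and `thread_partial_sum` returns 184 for it. Thread 1 owns rows 1 and 3 and sums to 312. The tile extent and thread count are template parameters, so `kIterations` is a compile-time constant, which can let a compiler unroll the traversal loop. This reflects the point above that subpartitions are defined by compile-time constants.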
_Iterators_ are familiar design patterns in C++ that provide an abstraction for accessing individual
# Copyright
Copyright (c) 2017-2019, NVIDIA CORPORATION. All rights reserved.
```
Redistribution and use in source and binary forms, with or without modification, are permitted