Note: MatX is approaching a 1.0 release with several major updates. 1.0 will include CUDA JIT capabilities that allow better kernel fusion and overall improvements in kernel runtimes. Along with the JIT capabilities, most files have changes that enable more efficient kernels. MatX 1.0 will require C++20 support in both the CUDA and host compilers, and CUDA 11.8 will no longer be supported.
Notable Changes:
- apply() and apply_idx() operators for writing lambda-based custom operators
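
As a rough illustration of the new lambda-based custom operators, the sketch below applies a device lambda element-wise over two tensors. It assumes apply() takes the callable followed by its input operators and that apply_idx() behaves similarly but also passes element indices to the callable; the exact signatures may differ, so consult the operator documentation.

```cpp
// Minimal sketch of the lambda-based operators (assumed signatures; see the
// MatX operator docs for the exact API). Requires nvcc with --extended-lambda
// for the __device__ lambda.
#include <matx.h>

int main() {
  using namespace matx;
  cudaExecutor exec{};

  auto a   = make_tensor<float>({16});
  auto b   = make_tensor<float>({16});
  auto out = make_tensor<float>({16});

  (a = 2.0f).run(exec);                      // broadcast a scalar
  (b = range<0>({16}, 0.0f, 1.0f)).run(exec); // 0, 1, 2, ...

  // apply(): run the lambda element-wise over the inputs
  // (assumed form: apply(callable, op...)).
  (out = apply([] __device__ (float x, float y) { return x * y + 1.0f; }, a, b)).run(exec);

  // apply_idx() is described as index-aware, i.e. the callable also receives
  // the element indices; its exact signature is not shown here.

  cudaDeviceSynchronize();
  print(out);
  return 0;
}
```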
Full Changelog
- Add profiling unit tests and fix timer safety by @cliffburdick in #1060
- Fixed-size reductions by @cliffburdick in #1061
- Fix gcc warning by @cliffburdick in #1062
- Added enum documentation for all operators by @cliffburdick in #1063
- Support ND operators and transforms to/from python by @cliffburdick in #1064
- Add prerun_done_ flag to prevent duplicate PreRun executions in transform operators by @cliffburdick in #1065
- Fix some iterator issues that come up with CCCL ToT by @miscco in #1066
- Properly use an `if constexpr` to guard segmented CUB algorithms by @miscco in #1067
- Fix cuTENSORNet/cuDSS library path and update to new cuTensorNet API by @cliffburdick in #1069
- Added apply() operator by @cliffburdick in #1072
- Update stdd docs by @cliffburdick in #1076
- Update release container to CUDA 13.0.1 by @tmartin-gh in #1068
- Add apply_idx operator for index-aware computations by @cliffburdick in #1077
- Fix missing include of `<cuda/std/utility>` by @miscco in #1078
Full Changelog: v0.9.3...v0.9.4