Releases: NVIDIA/MatX

v0.9.4

27 Oct 17:38
ad55c6b

Note: MatX is approaching a 1.0 release with several major updates. 1.0 will add CUDA JIT capabilities that enable better kernel fusion and improved kernel runtimes. Alongside the JIT work, most files have been updated to make the kernels more efficient. MatX 1.0 will require C++20 support in both the CUDA and host compilers, and CUDA 11.8 will no longer be supported.

Notable Changes:

  • apply() and apply_idx() operators for writing lambda-based custom operators (see the sketch below)
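A rough sketch of the idea follows; the exact apply() call form and the ones() fill are assumptions based on this note rather than the documented API:

```cpp
// Sketch only: apply()'s signature is assumed from the release note above,
// not taken from the MatX documentation.
#include <matx.h>
using namespace matx;

int main() {
  auto a = make_tensor<float>({16});
  auto b = make_tensor<float>({16});
  (a = ones<float>({16})).run();   // default CUDA executor

  // Assumed usage: apply() evaluates a device lambda element-wise over its
  // operator inputs (device lambdas need nvcc's --extended-lambda flag).
  (b = apply([] __device__(float x) { return x > 0.5f ? 0.5f : x; }, a)).run();

  cudaDeviceSynchronize();
  print(b);
  return 0;
}
```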

Full Changelog: v0.9.3...v0.9.4

v0.9.3

26 Sep 23:30
86d0b82

New operators: find_peaks, zipvec

Key Updates:

  • C2R FFT transforms (see the sketch after this list)
  • Indexing speedup for accessing tensors
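For the new C2R path, a forward/inverse round trip might look like the sketch below; shapes follow the usual R2C/C2R pairing (N real samples to N/2 + 1 complex bins), and the generator call is an assumption:

```cpp
// Sketch of a complex-to-real (C2R) inverse FFT round trip.
#include <matx.h>
using namespace matx;

int main() {
  constexpr index_t N = 64;
  auto sig  = make_tensor<float>({N});                             // real input
  auto freq = make_tensor<cuda::std::complex<float>>({N / 2 + 1});
  auto out  = make_tensor<float>({N});                             // real output

  (sig = random<float>({N}, UNIFORM)).run();   // generator call assumed
  (freq = fft(sig)).run();    // real-to-complex forward FFT
  (out = ifft(freq)).run();   // complex-to-real inverse FFT (the new C2R path)

  cudaDeviceSynchronize();
  return 0;
}
```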

Full Changelog: v0.9.2...v0.9.3

v0.9.2

29 Jul 19:13
fa9e872

New operator: interp
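A minimal sketch of how interp might be used, assuming it keeps the usual sample-points/sample-values/query-points convention (the real signature may differ):

```cpp
// Sketch only: interp's argument order is assumed, not documented here.
#include <matx.h>
using namespace matx;

int main() {
  auto x  = make_tensor<float>({5});   // sample points
  auto v  = make_tensor<float>({5});   // sample values at x
  auto xq = make_tensor<float>({9});   // query points
  auto vq = make_tensor<float>({9});   // interpolated result

  (x  = linspace<float>(0.f, 4.f, 5)).run();   // generator signature assumed
  (v  = x * x).run();
  (xq = linspace<float>(0.f, 4.f, 9)).run();

  (vq = interp(x, v, xq)).run();   // assumed call form
  cudaDeviceSynchronize();
  print(vq);
  return 0;
}
```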

Other Additions:

  • Improvements to sparse support, including a new batched tri-diagonal solver
  • Automatic vectorization and ILP support
  • DLPack updated to 1.1
  • Many bug fixes

Full Changelog: v0.9.1...v0.9.2

v0.9.1

14 May 15:43
4475c22

Sparse support + bugfixes

  • New operators: argminmax, dense2sparse, sparse2dense, interp1, normalize, argsort
  • Removed the requirement for --expt-relaxed-constexpr
  • Added MatX NVTX domain
  • Significantly improved speed of svd and inv
  • Python integration sample
  • Experimental sparse tensor support (SpMM and solver routines supported; see the sketch after this list)
  • Significantly reduced FFT memory usage
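A sketch of the sparse round trip under these new operators; the experimental COO factory and the exact call forms are assumptions:

```cpp
// Sketch only: dense2sparse/sparse2dense and SpMM are named in this release,
// but the factory and signatures shown here are assumed.
#include <matx.h>
using namespace matx;

int main() {
  auto D = make_tensor<float>({8, 8});
  (D = eye<float>({8, 8})).run();   // a dense matrix that is mostly zeros

  // Assumed experimental factory for an empty COO sparse tensor.
  auto S = experimental::make_zero_tensor_coo<float, index_t>({8, 8});
  (S = dense2sparse(D)).run();      // dense -> sparse

  auto B = make_tensor<float>({8, 4});
  auto C = make_tensor<float>({8, 4});
  (B = ones<float>({8, 4})).run();
  (C = matmul(S, B)).run();         // SpMM: sparse x dense

  auto D2 = make_tensor<float>({8, 8});
  (D2 = sparse2dense(S)).run();     // sparse -> dense
  cudaDeviceSynchronize();
  return 0;
}
```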

v0.9.0

15 Oct 18:12
af55b57

v0.9.0 adds comprehensive support for more host CPU transforms, such as BLAS and LAPACK, including multi-threaded versions.
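A sketch of what a host-side transform looks like; the executor type name below is an assumption (MatX's host executor naming has varied), but passing an executor to run() is the standard model:

```cpp
// Sketch only: HostExecutor is an assumed name for the multi-threaded
// host executor; the run(exec) pattern itself is standard MatX.
#include <matx.h>
using namespace matx;

int main() {
  auto A = make_tensor<float>({64, 32});
  auto B = make_tensor<float>({32, 16});
  auto C = make_tensor<float>({64, 16});

  HostExecutor exec{};   // assumed executor type name

  (A = ones<float>({64, 32})).run(exec);
  (B = ones<float>({32, 16})).run(exec);
  (C = matmul(A, B)).run(exec);   // dispatches to a host BLAS backend
  print(C);
  return 0;
}
```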

Beyond the CPU support, there are many more minor improvements:

  • Added several new operators, including vector_norm, matrix_norm, frexp, diag, and more
  • Many compiler fixes to support a wider range of older and newer compilers
  • Performance improvements to avoid overhead of permutation operators when unnecessary
  • Much more!


v0.8.0

04 Apr 17:27
7719779

Release highlights:

  • Features
    • Updated cuTENSOR and cuTensorNet versions
    • Added configurable print formatting (see the sketch after this list)
    • ARM FFT support via NVPL
    • New operators: abs2(), outer(), isnan(), isinf()
    • Many more unit tests for the CPU code paths
  • Bug fixes for matmul on Hopper, 2D FFTs, and more
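For the print formatting change, usage might look like this sketch; the setter and enum names are assumptions based on the note above:

```cpp
// Sketch only: set_print_format_type and MATX_PRINT_FORMAT_MLAB are assumed
// names for the configurable print formatting added in this release.
#include <matx.h>
using namespace matx;

int main() {
  auto t = make_tensor<float>({2, 3});
  (t = ones<float>({2, 3})).run();
  cudaDeviceSynchronize();

  print(t);                                      // default style
  set_print_format_type(MATX_PRINT_FORMAT_MLAB); // assumed: MATLAB-like style
  print(t);
  return 0;
}
```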

Full Changelog: v0.7.0...v0.8.0

v0.7.0

04 Jan 21:06

Full Changelog: v0.6.0...v0.7.0

v0.6.0

02 Oct 16:50

Full Changelog: v0.5.0...v0.6.0

v0.5.0

03 Jul 21:38

Notable Updates

  • Documentation rewritten to include working, unit-test-backed examples for every function
  • Polyphase resampler based on SciPy/cuSignal's resample_poly (see the sketch after this list)
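A sketch of the resampler, assuming the (input, filter, up, down) argument order of its SciPy counterpart:

```cpp
// Sketch only: resample_poly's argument order mirrors scipy.signal's
// resample_poly here and is an assumption.
#include <matx.h>
using namespace matx;

int main() {
  constexpr index_t n = 128, taps = 16;
  constexpr int up = 3, down = 2;
  auto sig    = make_tensor<float>({n});
  auto filter = make_tensor<float>({taps});
  auto out    = make_tensor<float>({n * up / down});  // approximate output size

  (sig = random<float>({n}, UNIFORM)).run();            // generator call assumed
  (filter = ones<float>({taps}) / float(taps)).run();   // simple averaging taps

  (out = resample_poly(sig, filter, up, down)).run();
  cudaDeviceSynchronize();
  return 0;
}
```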

Full Changelog: v0.4.1...v0.5.0

v0.4.1

02 Jun 15:17

This is a minor release focused mostly on bug fixes for different compilers and CUDA versions. One major feature was added: all reductions are now supported on the host using a single-threaded executor. Multi-threaded executor support is coming soon.
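A sketch of a host reduction under this model; the single-threaded executor's type name is an assumption, while the run(exec) pattern is standard:

```cpp
// Sketch only: SingleThreadHostExecutor is an assumed spelling; the note
// above confirms host reductions, not this exact type name.
#include <matx.h>
#include <cstdio>
using namespace matx;

int main() {
  auto data  = make_tensor<float>({1024});
  auto total = make_tensor<float>({});   // 0-D tensor holds the scalar result

  SingleThreadHostExecutor exec{};       // assumed executor type name

  (data = ones<float>({1024})).run(exec);
  (total = sum(data)).run(exec);         // reduction runs entirely on the CPU

  printf("sum = %f\n", total());         // operator() reads back the 0-D value
  return 0;
}
```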

Full Changelog: v0.4.0...v0.4.1