 ### Features:
 ### Bugfixes:
 
+## 1.15.0-rc2 (July 27, 2023)
+### Features:
+#### RDMA CORE (IB, ROCE, etc.)
+* Implemented is_reachable_v2 for IB interfaces
+#### Build
+* Enabled build with binutils 2.40
+* Added versioned dependency to switch between packages with the same names
+
+### Bugfixes:
+#### UCP
+* Fixed endpoint reconfiguration error due to wrong locality detection
+#### RDMA CORE (IB, ROCE, etc.)
+* Fixed performance degradation when indirect atomic key is not supported by the hardware
+* Fixed remote access error to strict-order key because of wrong offset
+#### GPU (CUDA, ROCM)
+* Fixed CUDA IPC performance degradation after libnuma removal
+
 ## 1.15.0-rc1 (May 10, 2023)
-TBD
+### Features:
+#### UCP
+* Added 2-stage pipeline protocol in the new protocol infrastructure
+* Added reset and abort functionality of rendezvous protocols in the new infrastructure
+* Added zero-copy rendezvous data send protocol in the new infrastructure
+* Added support for user memory handle in the new protocol infrastructure
+* Added option to force ODP registration for certain memory types
+* Enabled lock-free memory region deregistration
+* Updated allow/deny transport list feature to control auxiliary transport selection
+* Multiple performance improvements in the new protocol infrastructure
+* Multiple improvements in error and debug messages
+#### UCT
+* Split UCT_MD_MKEY_PACK_FLAG_INVALIDATE into two flags for RMA and AMO
+* Added put_zcopy and get_zcopy scheme support for self transport
+* Added base implementation of is_reachable_v2 API using intra/inter flag
+* Introduced MD capability for non-blocking registration memory types
+#### RDMA CORE (IB, ROCE, etc.)
+* Added option to control CQE zipping per CQ RX/TX direction
+* Added option to specify how DCI selects port under RoCE LAG
+* Added hw_dcs to the list of policies to select DCI by an endpoint
+* Removed implicit on-demand paging
+* Added option to set RoCE LAG DCT port for response under queue affinity mode
+* Improved IB memlock limit logging
+#### UCS
+* Added ucs_string_buffer_rbrk() to split token
+#### GPU (CUDA, ROCM)
+* Added support for atomic reply_buffer on GPU memory
+* Added system device information for AMD GPUs
+* Improved performance estimation of gdr_copy transport
+* Added a simplistic implementation of performance estimation of cuda_ipc transport
+* Improved performance estimation of cuda_ipc on Hopper architecture
+* Added rcache parameters for rocm transports
+* Introduced dmabuf support for rocm transports
+* Implemented asynchronous progress for the zcopy operations in the rocm_copy transport
+* Added option to enable using cross-device dmabuf file descriptor for rocm
+#### Java
+* Added Java bindings for exported memh feature
+#### Tests
+* Added a rocm docker container for testing
+* Added option to send client_id in iodemo test
+* Added support for multiple connections to the same server in iodemo test
+* Added synchronization before exit to hello world examples
+#### Tools
+* Added user-side memcpy option for AM benchmarks in ucx_perftest
+* Added Wireshark Lua dissectors for some UCX protocols
+#### Build
+* Added a separate xpmem deb subpackage
+* Added aarch64 support to the binary distribution pipeline
+* Removed dependency on libnuma
+
+### Bugfixes:
+#### UCP
+* Fixed crash during connection manager cleanup
+* Fixed rkey index calculation for rendezvous protocol
+* Fixed rcache dump function
+* Removed logging from rkey unpack in release mode
+* Fixed double free of rkey in rendezvous protocol
+* Fixed rendezvous pipeline protocol error flow
+* Fixed error handling in rendezvous get zcopy protocol
+* Replayed pending requests of wireup EP CM during connection establishment to prevent potential ordering issues and wrong configuration
+* Passed user-provided memory type to the function that checks whether the buffer can be sent inline
+* Avoided memory registration during UCP context initialization
+* Fixed CPU/device atomics selection in the new protocol infrastructure
+* Multiple fixes in the new protocol infrastructure information output
+#### UCT
+* Fixed exported memh packing
+* Fixed an error in checking the return status of the multi-threaded memory registration function
+#### RDMA CORE (IB, ROCE, etc.)
+* Added check for UAR support to memory domain opening
+* Fixed updating port counters for devx qp
+* Fixed ibv_create_cq error message on node without InfiniBand
+* Fixed performance degradation due to using 2 paths on NDR400 by default
+* Removed unnecessary async lock that would otherwise block UD progress
+#### UCS
+* Fixed displaying wrong environment variable suggestions
+* Fixed VFS warning output
+* Fixed SEGV in ucs_debug_backtrace_next() caused by ENOMEM while handling a previous SEGV
+* Fixed memory corruption when using UCX_MPOOL_FIFO=y
+#### UCM
+* Fixed mremap() override
+#### GPU (CUDA, ROCM)
+* Fixed usage of dmabuf when the buffer is not page-aligned
+* Removed async_cb from cuda_copy to avoid an issue with the UCP worker async lock
+#### Java
+* Fixed leak of jucx_request global references
+#### Documentation
+* Updated ucp_worker_release_address description
+#### Tests
+* Fixed wrong usage of ep_close in examples
+#### Tools
+* Removed support for librte from perf
+* Fixed worker flush deadlock when using multiple workers in ucx_perftest
+#### Build
+* Changed 'unsupported option' ICC command line warning to error
+* Removed unused fault-injection configuration option
+* Fixed obsolete macro warnings in new autoconf/libtool
+* Fixed building UCX with GCC 13
+* Fixed UCX RPM build on machines that have libxpmem-devel rpm from MLNX_OFED installation
+* Fixed ucx-rdmacm package requirements
+* Fixed compilation errors with armcc-22.1
+* Fixed passing port number to goperftest
 
+## 1.14.1 (May 22, 2023)
+### Bugfixes:
+* Fixed ROCm to prevent the locking of host pinned memory
+* Added CUDA 12-based UCX builds to the release flow
+* Increased the maximum number of endpoint configurations
+* Fixed filter for slow lanes in selection logic
+* Fixed TCP transport bandwidth calculation
+* Fixed device detection for ROCM
+* Fixed compatibility with CUDA 12
+* Fixed rendezvous threshold for multi-path configurations
+* Fixed error message in case of static link
+* Fixed BlueField-3 detection
+* Multiple fixes for Azure CI pipeline
 
 ## 1.14.0 (March 13, 2023)
 ### Features:
|