|
11 | 11 | ### Features: |
12 | 12 | ### Bugfixes: |
13 | 13 |
|
| 14 | +## 1.13.0 (May 19, 2022) |
| 15 | +### Features
| 16 | +#### Core
| 17 | +* Added new objects to VFS: local and remote address of endpoint, statistics of ucp_ep_create success/failure, failed/destroyed endpoints |
| 18 | +* Added support for UCX static libraries |
| 19 | +* Added profiling for rkey management routines |
| 20 | +* PCIe relaxed order enabled by default for AMD CPUs |
| 21 | +#### UCP |
| 22 | +* Added API to pass pre-registered memory handle to UCP operations |
| 23 | +* Added implementation of AM rendezvous protocol |
| 24 | +* Added 2-stage pipeline rendezvous protocol for GPU |
| 25 | +* Added support for fragment mem_type for v1 pipeline proto, disabled by default |
| 26 | +* Added active message support for proto v2 |
| 27 | +* Added UCP memory registration cache |
| 28 | +* Improved adaptive progress - deactivate iface when all p2p lanes are destroyed |
| 29 | +* Added support for user memh in proto_v1 |
| 30 | +* Added support for selecting local address when creating a client endpoint |
| 31 | +* Added option to limit GPUDirectRDMA size in rendezvous protocol, UCX_RNDV_MEMTYPE_DIRECT_SIZE |
| 32 | +* Deprecated UCX_SOCKADDR_AUX_TLS configuration parameter |
| 33 | +#### UCT |
| 34 | +* Introduced API uct_md_mkey_pack_v2 |
| 35 | +* Introduced UCT iface features API |
| 36 | +* Introduced max_inflight_eps parameter in perf_attr API |
| 37 | +* Introduced UCT_SEND_FLAG_PEER_CHECK flag that forces checking connectivity to a peer |
| 38 | +* Introduced UCX_RCACHE_PURGE_ON_FORK to enable/disable cleaning regions when application is forking |
| 39 | +#### RDMA CORE (IB, ROCE, etc.) |
| 40 | +* Introduced NDR autorecognition |
| 41 | +* Introduced CQE zipping support |
| 42 | +* Set the default MAX_RD_ATOMIC to maximum value supported by the hardware |
| 43 | +#### ROCM |
| 44 | +* Increased maximum number of HSA agents |
| 45 | +#### UCS |
| 46 | +* Added topo module infrastructure |
| 47 | +* Added memtrack and rcache information to VFS |
| 48 | +#### Tools |
| 49 | +* Added support for pre-registered memory in ucx_perftest |
| 50 | +* Added loopback transport support for UCT perf tests |
| 51 | +### Bugfixes |
| 52 | +#### Core |
| 53 | +* Fixed memory not being deallocated by ucp_mem_unmap when there is no rcache
| 54 | +* Fixed versioning infrastructure |
| 55 | +* Multiple code improvements: refactoring, debug prints and assertions, etc. |
| 56 | +* Multiple improvements in build, test and docs infrastructure |
| 57 | +#### UCP |
| 58 | +* Resolving remote EP ID when creating local EP disabled by default |
| 59 | +* Multiple fixes in keepalive protocol |
| 60 | +* Fixed initialization request send state if software RMA/AMO in use |
| 61 | +* Fixed error handling in RMA and BW lanes selection logic |
| 62 | +* Fixed CM wireup fallback |
| 63 | +* Fixed occasional crash in finalize |
| 64 | +* Fixed AM proto flags |
| 65 | +* Fixed single zcopy proto initialization for AM |
| 66 | +* Fixed proto v2 selection, take into account user header length |
| 67 | +* Fixed selecting auxiliary transports when creating EP for sending EP_REMOVED |
| 68 | +* Fixed printing invalid configuration |
| 69 | +* Fixed allocation of indirect remote ID for internal EP if connected EP supports PEER_FAILURE |
| 70 | +* Fixed memh allocation when no rcache |
| 71 | +* Fixed protocol selection logic for UCP AM send |
| 72 | +* Fixed error handling flow for EP discard requests from pending queue |
| 73 | +* Fixed EP destroy flow |
| 74 | +* Fixed rsc_index for prereg_md_map |
| 75 | +* Fixed wireup error handling flow: create EP which sends WIREUP_MSG/EP_REMOVED with AM lane only
| 76 | +* Fixed probe for multi-fragment eager |
| 77 | +* Fixed alignment for AM rdesc init |
| 78 | +* Fixed perf estimation for proto v2 |
| 79 | +* Fixed CM wireup with proto v2 |
| 80 | +* Fixed EP discard flow during fast-forward |
| 81 | +* Fixed datatype issue in TAG send |
| 82 | +* Fixed EP refcount overflow |
| 83 | +* Fixed EP error handling flow |
| 84 | +* Fixed wire compatibility in address unpacking |
| 85 | +* Fixed ucp_ep_close_nb for failed endpoint when related requests have registered memory that should be invalidated |
| 86 | +* Fixed fragmented proto v2 |
| 87 | +* Fixed UCP address v2 packing/unpacking and usage of seg_size |
| 88 | +* Fixed purge requests on failed endpoint |
| 89 | +* Fixed error handling of connecting p2p lanes during WIREUP phase |
| 90 | +* Fixed UCP endpoint use after free |
| 91 | +#### UCT |
| 92 | +* Fixed ABI break of uct_ep_params_t |
| 93 | +* Fixed common intra-node keepalive protocol |
| 94 | +* Fixed a typo UCT_PERF_ATTR_FIELD_REMOTE_SYS_DEIVCE -> UCT_PERF_ATTR_FIELD_REMOTE_SYS_DEVICE |
| 95 | +* Fixed potential crash on MD mem alloc |
| 96 | +* Disabled PEER_FAILURE capability for XPMEM |
| 97 | +#### RDMA CORE (IB, ROCE, etc.) |
| 98 | +* Fixed 2G aligned MR registration |
| 99 | +* Fixed FC_HARD_REQ resending |
| 100 | +* Fixed remote access to invalidated MR |
| 101 | +* Fixed max_rd_atomic_dc value for DV |
| 102 | +* Fixed DC handshake logic |
| 103 | +* Fixed error handling flows |
| 104 | +* Fixed flush(CANCEL) with UD and DC transports |
| 105 | +* Fixed multi-path handling for passive endpoint with UD transport |
| 106 | +* Fixed attributes for DV QP creation |
| 107 | +* Fixed device query |
| 108 | +* Fixed memory leak in case of disabling RDMA transport |
| 109 | +* Fixed dci->pool_index initialization |
| 110 | +* Fixed fallback if port speed not detected |
| 111 | +* Fixed tag offload recv for inlined data |
| 112 | +* Fixed PKEY index initialization |
| 113 | +* Disabled mlx5 ifaces on verbs MD |
| 114 | +#### TCP |
| 115 | +* Fixed flush(CANCEL) |
| 116 | +* Fixed close protocol when UCT EP pairs have only RX capability |
| 117 | +* Fixed query local/remote saddr |
| 118 | +#### GPU (CUDA, ROCM) |
| 119 | +* Fixed a bug in invalidating address range in CUDA_IPC |
| 120 | +* Fixed CUDA context caching and cleanup |
| 121 | +* Fixed ROCM initialization |
| 122 | +* Fixed ROCM components compilation |
| 123 | +* Fixed IPC tls reachability check |
| 124 | +* Fixed ROCM memory type detection |
| 125 | +* Use ROCM remote_agent if available |
| 126 | +#### KNEM |
| 127 | +* Fixed memory registration cost |
| 128 | +#### UCM |
| 129 | +* Fixed potential hang on init |
| 130 | +#### UCS |
| 131 | +* Fixed name shadow problem in CentOS6.x |
| 132 | +#### Tools |
| 133 | +* Print stream API limits and handle stream feature in ucx_info |
| 134 | +* Replaced ucp_ep_close_nb by ucp_ep_close_nbx in examples |
| 135 | +* Replaced completed field by checking UCS status in io_demo |
| 136 | +#### JAVA |
| 137 | +* Throw exception if ucp_mem_query failed |
| 138 | +#### GO |
| 139 | +* Disabled go bindings in rpmbuild |
| 140 | +* Fixed configure behavior when the Go compiler cannot be found
| 141 | +* Standalone performance benchmark |
| 142 | +* Increased port range + make it dependent on agent_id |
| 143 | +* Check compiler minimum version |
| 144 | +* Set GOCACHE to a local directory that is cleared for each job in CI |
| 145 | +* Disabled module for goperftest |
| 146 | +* Fixed OOS build |
| 147 | + |
| 148 | +## 1.12.1 (March 21, 2022) |
| 149 | +#### Bugfixes |
| 150 | +* Fixed memory hooks for CUDA 11.5
| 151 | +* Fixed memory type cache merge |
| 152 | +* Fixed continuously triggering wakeup fd when keepalive is used |
| 153 | +* Fixed memtype cache fallback when memory hooks are not installed |
| 154 | +* Fixed parsing header flags of worker address |
| 155 | +* Fixed pipeline protocol when sending from host memory to GPU memory |
| 156 | +* Fixed transport progress not deactivated when all transport's connections are closed |
| 157 | +* Fixed progress loop in io_demo application |
| 158 | +* Fixed ROCm segfault when using internal_ops functions |
| 159 | +* Fixed ROCm memory hooks |
| 160 | +* Fixed performance regression on A64FX |
| 161 | +* Fixed DCT create failure with rdma-core v22 |
| 162 | +* Fixed golang bindings build |
| 163 | +* Fixed .deb package build on Ubuntu 22.04 |
| 164 | +* Fixed build on archlinux |
| 165 | + |
| 166 | +#### Important changes |
| 167 | +* If CUDA memory hooks on the driver API cannot be installed, the memory type cache and
| 168 | + the memory registration cache will be disabled. This may lead to lower performance
| 169 | + of some applications on setups with NVIDIA GPUs, even if CUDA memory is not
| 170 | + being used. Prior to this change, failing to install driver API hooks could
| 171 | + lead to runtime errors or data corruption when CUDA memory is used and the
| 172 | + application is linked statically with the CUDA runtime.
| 173 | + To revert to the previous behavior (when the application is linked
| 174 | + dynamically with the CUDA runtime), the user can set UCX_MEM_CUDA_HOOK_MODE=reloc.
| 175 | + For more information, see https://github.com/openucx/ucx/pull/7865.
| 176 | + |
14 | 177 | ## 1.12.0 (January 12, 2022) |
15 | 178 | ### Features: |
16 | 179 | #### Core |
|
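The 1.12.1 "Important changes" entry above says the previous hook behavior can be restored by setting `UCX_MEM_CUDA_HOOK_MODE=reloc`. As a rough sketch of how a launcher script might apply that setting to a child process only, assuming the application is linked dynamically with the CUDA runtime as the entry requires: the helper names `ucx_env_with_reloc_hooks` and `run_with_reloc_hooks` are hypothetical and are not part of UCX.

```python
import os
import subprocess
import sys


def ucx_env_with_reloc_hooks(base_env=None):
    """Return a copy of the environment with UCX_MEM_CUDA_HOOK_MODE=reloc.

    Hypothetical helper: only the variable name and value come from the
    changelog entry; everything else is illustrative.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["UCX_MEM_CUDA_HOOK_MODE"] = "reloc"
    return env


def run_with_reloc_hooks(argv):
    """Run a command with the reloc hook mode set, returning its exit code."""
    completed = subprocess.run(argv, env=ucx_env_with_reloc_hooks(), check=False)
    return completed.returncode


if __name__ == "__main__":
    # Stand-in for a real UCX application: the child just echoes the variable.
    child = [
        sys.executable,
        "-c",
        "import os; print(os.environ['UCX_MEM_CUDA_HOOK_MODE'])",
    ]
    sys.exit(run_with_reloc_hooks(child))
```

Setting the variable on a copied environment keeps the change scoped to the launched process rather than the whole shell session, which makes it easier to compare runs with and without the setting.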