Commit 4f554ab

Merge pull request #9244 from rakhmets/topic/news-1.15.0-rc1-rc2
NEWS: Updated NEWS for 1.15.0-rc1 and 1.15.0-rc2.
2 parents 5d4b390 + 7babd13 commit 4f554ab

1 file changed: +131 -1 lines changed

NEWS

Lines changed: 131 additions & 1 deletion
@@ -11,9 +11,139 @@
 ### Features:
 ### Bugfixes:

+## 1.15.0-rc2 (July 27, 2023)
+### Features:
+#### RDMA CORE (IB, ROCE, etc.)
+* Implemented is_reachable_v2 for IB interfaces
+#### Build
+* Enabled build with binutils 2.40
+* Added versioned dependency to switch between packages with the same names
+
+### Bugfixes:
+#### UCP
+* Fixed endpoint reconfiguration error due to wrong locality detection
+#### RDMA CORE (IB, ROCE, etc.)
+* Fixed performance degradation when indirect atomic key is not supported by the hardware
+* Fixed remote access error to strict-order key because of wrong offset
+#### GPU (CUDA, ROCM)
+* Fixed CUDA IPC performance degradation after libnuma removal
+
 ## 1.15.0-rc1 (May 10, 2023)
-TBD
+### Features:
+#### UCP
+* Added 2-stage pipeline protocol in the new protocol infrastructure
+* Added reset and abort functionality of rendezvous protocols in the new infrastructure
+* Added zero-copy rendezvous data send protocol in the new infrastructure
+* Added support for user memory handle in the new protocol infrastructure
+* Added option to force ODP registration for certain memory types
+* Enabled lock-free memory region deregistration
+* Updated allow/deny transport list feature to control auxiliary transport selection
+* Multiple performance improvements of the new protocol infrastructure
+* Multiple improvements in error and debug messages
+#### UCT
+* Split UCT_MD_MKEY_PACK_FLAG_INVALIDATE into two flags for RMA and AMO
+* Added put_zcopy and get_zcopy scheme support for self transport
+* Added base implementation of is_reachable_v2 API using intra/inter flag
+* Introduced MD capability for non-blocking registration memory types
+#### RDMA CORE (IB, ROCE, etc.)
+* Added option to control CQE zipping per CQ RX/TX direction
+* Added option to specify how DCI selects port under RoCE LAG
+* Added hw_dcs to the list of policies to select DCI by an endpoint
+* Removed implicit on-demand paging
+* Added option to set RoCE LAG DCT port for response under queue affinity mode
+* Improved IB memlock limit logging
#### UCS
56+
* Added ucs_string_buffer_rbrk() to split token
57+
#### GPU (CUDA, ROCM)
58+
* Added support for atomic reply_buffer on GPU memory
59+
* Added system device information for AMD GPUs
60+
* Improved performance estimation of gdr_copy transport
61+
* Added a simplistic implementation of performance estimation of cuda_ipc transport
62+
* Improved performance estimation of cuda_ipc on Hopper architecture
63+
* Added rcache parameters for rocm transports
64+
* Introduced dmabuf support for rocm transports
65+
* Implemented asynchronous progress for the zcopy operations in the rocm_copy transport
66+
* Added option to enable using cross-device dmabuf file descriptor for rocm
67+
#### Java
68+
* Added Java bindings for exported memh feature
69+
#### Tests
70+
* Added a rocm docker container for testing
71+
* Added option to send client_id in iodemo test
72+
* Added support for multiple connections to the same server in iodemo test
73+
* Added synchronization before exit to hello world examples
74+
#### Tools
75+
* Added user-side memcpy option for AM benchmarks in ucx_perftest
76+
* Added wireshark LUA dissectors for some UCX protocols
77+
#### Build
78+
* Added a separate xpmem deb subpackage
79+
* Added aarch64 support to the binary distribution pipeline
80+
* Removed dependency on libnuma
81+
+### Bugfixes:
+#### UCP
+* Fixed crash during connection manager cleanup
+* Fixed rkey index calculation for rendezvous protocol
+* Fixed rcache dump function
+* Removed logging from rkey unpack in release mode
+* Fixed double free of rkey in rendezvous protocol
+* Fixed rendezvous pipeline protocol error flow
+* Fixed error handling in rendezvous get zcopy protocol
+* Replay pending requests of wireup EP CM during connection establishment to prevent potential ordering issues and wrong configuration
+* Pass user-provided memory type to the function that checks whether the buffer can be sent inline or not
+* Avoid memory registration during UCP context initialization
+* Fixed CPU/device atomics selection in the new protocol infrastructure
+* Multiple fixes in the new protocol infrastructure information output
+#### UCT
+* Fixed exported memh packing
+* Fixed an error in checking return status of multi-threaded memory registration function
+#### RDMA CORE (IB, ROCE, etc.)
+* Added check for UAR support to memory domain opening
+* Fixed updating port counters for devx qp
+* Fixed ibv_create_cq error message on node without InfiniBand
+* Fixed performance degradation due to using 2 paths on NDR400 by default
+* Removed unnecessary async lock which otherwise would block UD progress
#### UCS
106+
* Fixed displaying wrong environment variable suggestions
107+
* Fixed VFS warning output
108+
* Fixed SEGV in ucs_debug_backtrace_next(), upon previous SEGV handling, due to ENOMEM situation
109+
* Fixed memory corruption when using UCX_MPOOL_FIFO=y
110+
#### UCM
111+
* Fixed mremap() override
112+
#### GPU (CUDA, ROCM)
113+
* Fixed usage of dmabuf when the buffer is not page-aligned
114+
* Removed async_cb from cuda_copy to avoid the issue with UCP worker async lock
115+
#### Java
116+
* Fixed leakage of jucx_request global references
117+
#### Documentation
118+
* Updated ucp_worker_release_address description
119+
#### Tests
120+
* Fixed wrong usage of ep_close in examples
121+
#### Tools
122+
* Removed support for librte from perf
123+
* Fixed worker flush deadlock when using multiple workers in ucx_perftest
124+
#### Build
125+
* Changed 'unsupported option' ICC command line warning to error
126+
* Removed never used fault-injection configuration option
127+
* Fixed obsolete macro warnings in new autoconf/libtool
128+
* Fixed building UCX with GCC 13
129+
* Fixed UCX RPM build on machines that have libxpmem-devel rpm from MLNX_OFED installation
130+
* Fixed ucx-rdmacm package requirements
131+
* Fixed compilation errors with armcc-22.1
132+
* Fixed passing port number to goperftest

+## 1.14.1 (May 22, 2023)
+### Bugfixes:
+* Fixed ROCm to prevent the locking of host pinned memory
+* Added CUDA 12 based UCX builds to the release flow
+* Increased the maximum number of endpoint configurations
+* Fixed filter for slow lanes in selection logic
+* Fixed TCP transport bandwidth calculation
+* Fixed device detection for ROCM
+* Fixed compatibility with CUDA 12
+* Fixed rendezvous threshold for multi-path configurations
+* Fixed error message in case of static link
+* Fixed BlueField-3 detection
+* Multiple fixes for Azure CI pipeline

 ## 1.14.0 (March 13, 2023)
 ### Features:
