Commit 75f0c25

Merge pull request #8241 from evgeny-leksikov/v1.13.0-rc1
v1.13.0 update NEWS and AUTHORS
2 parents 5879c44 + 918a101 commit 75f0c25

2 files changed, +171 -0 lines changed

AUTHORS

Lines changed: 8 additions & 0 deletions
@@ -17,6 +17,8 @@ Devendar Bureddy <[email protected]>
 Devesh Sharma <[email protected]>
 Dmitry Gladkov <[email protected]>
 Doug Jacobsen <[email protected]>
+Edgar Gabrriel <[email protected]>
+Elad Guttel <[email protected]>
 Elad Persiko <[email protected]>
 Eugene Voronov <[email protected]>
 Evgeny Leksikov <[email protected]>
@@ -31,9 +33,11 @@ Howard Pritchard <[email protected]>
 Huaxiang Fan <[email protected]>
 Igor Ivanov <[email protected]>
 Ilya Nelkenbaum <[email protected]>
+Ivan Kochin <[email protected]>
 Jakir Kham <[email protected]>
 Jason Gunthorpe <[email protected]>
 Jeff Daily <[email protected]>
+Liang Jiakun <[email protected]>
 John Snyder <[email protected]>
 Jonas Zhou <[email protected]>
 Joseph Schuchart <[email protected]>
@@ -48,10 +52,13 @@ Manjunath Gorentla Venkata <[email protected]>
 Marek Schimara <[email protected]>
 Mark Allen <[email protected]>
 Matthew Baker <[email protected]>
+Matthias Diener <[email protected]>
 Mike Dubman <[email protected]>
 Mikhail Brinskiy <[email protected]>
+
 Nathan Hjelm <[email protected]>
 Netanel Yosephian <[email protected]>
+Ofir Farjon <[email protected]>
 Olly Perks <[email protected]>
 
 Pavan Balaji <[email protected]>
@@ -75,6 +82,7 @@ Stephen Richmond <[email protected]>
 Swen Boehm <[email protected]>
 Tony Curtis <[email protected]>
 Valentin Petrov <[email protected]>
+Vasily Philipov <[email protected]>
 Wenbin Lu <[email protected]>
 
 Yossi Itigin <[email protected]>

NEWS

Lines changed: 163 additions & 0 deletions
@@ -11,6 +11,169 @@
 ### Features:
 ### Bugfixes:
 
+## 1.13.0 (May 19, 2022)
+#### Features
+##### Core
+* Added new objects to VFS: local and remote address of endpoint, statistics of ucp_ep_create success/failure, failed/destroyed endpoints
+* Added support for UCX static libraries
+* Added profiling for rkey management routines
+* PCIe relaxed order enabled by default for AMD CPUs
+#### UCP
+* Added API to pass pre-registered memory handle to UCP operations
+* Added implementation of AM rendezvous protocol
+* Added 2-stage pipeline rendezvous protocol for GPU
+* Added support for fragment mem_type for v1 pipeline proto, disabled by default
+* Added active message support for proto v2
+* Added UCP memory registration cache
+* Improved adaptive progress - deactivate iface when all p2p lanes are destroyed
+* Added support for user memh in proto_v1
+* Added support for selecting local address when creating a client endpoint
+* Added option to limit GPUDirectRDMA size in rendezvous protocol, UCX_RNDV_MEMTYPE_DIRECT_SIZE
+* Deprecated UCX_SOCKADDR_AUX_TLS configuration parameter
+#### UCT
+* Introduced API uct_md_mkey_pack_v2
+* Introduced UCT iface features API
+* Introduced max_inflight_eps parameter in perf_attr API
+* Introduced UCT_SEND_FLAG_PEER_CHECK flag that forces checking connectivity to a peer
+* Introduced UCX_RCACHE_PURGE_ON_FORK to enable/disable cleaning regions when application is forking
+#### RDMA CORE (IB, ROCE, etc.)
+* Introduced NDR autorecognition
+* Introduced CQE zipping support
+* Set the default MAX_RD_ATOMIC to maximum value supported by the hardware
+#### ROCM
+* Increased maximum number of HSA agents
+#### UCS
+* Added topo module infrastructure
+* Added memtrack and rcache information to VFS
+#### Tools
+* Added support for pre-registered memory in ucx_perftest
+* Added loopback transport support for UCT perf tests
+### Bugfixes
+#### Core
+* Fixed not deallocating memory from ucp_mem_unmap if no rcache
+* Fixed versioning infrastructure
+* Multiple code improvements: refactoring, debug prints and assertions, etc.
+* Multiple improvements in build, test and docs infrastructure
+#### UCP
+* Resolving remote EP ID when creating local EP disabled by default
+* Multiple fixes in keepalive protocol
+* Fixed initialization request send state if software RMA/AMO in use
+* Fixed error handling in RMA and BW lanes selection logic
+* Fixed CM wireup fallback
+* Fixed occasional crash in finalize
+* Fixed AM proto flags
+* Fixed single zcopy proto initialization for AM
+* Fixed proto v2 selection, take into account user header length
+* Fixed selecting auxiliary transports when creating EP for sending EP_REMOVED
+* Fixed printing invalid configuration
+* Fixed allocation of indirect remote ID for internal EP if connected EP supports PEER_FAILURE
+* Fixed memh allocation when no rcache
+* Fixed protocol selection logic for UCP AM send
+* Fixed error handling flow for EP discard requests from pending queue
+* Fixed EP destroy flow
+* Fixed rsc_index for prereg_md_map
+* Fixed wireup error handling flow Create EP which send WIREUP_MSG/EP_REMOVED with AM lane only
+* Fixed probe for multi-fragment eager
+* Fixed alignment for AM rdesc init
+* Fixed perf estimation for proto v2
+* Fixed CM wireup with proto v2
+* Fixed EP discard flow during fast-forward
+* Fixed datatype issue in TAG send
+* Fixed EP refcount overflow
+* Fixed EP error handling flow
+* Fixed wire compatibility in address unpacking
+* Fixed ucp_ep_close_nb for failed endpoint when related requests have registered memory that should be invalidated
+* Fixed fragmented proto v2
+* Fixed UCP address v2 packing/unpacking and usage of seg_size
+* Fixed purge requests on failed endpoint
+* Fixed error handling of connecting p2p lanes during WIREUP phase
+* Fixed UCP endpoint use after free
+#### UCT
+* Fixed ABI break of uct_ep_params_t
+* Fixed common intra-node keepalive protocol
+* Fixed a typo UCT_PERF_ATTR_FIELD_REMOTE_SYS_DEIVCE -> UCT_PERF_ATTR_FIELD_REMOTE_SYS_DEVICE
+* Fixed potential crash on MD mem alloc
+* Disabled PEER_FAILURE capability for XPMEM
+#### RDMA CORE (IB, ROCE, etc.)
+* Fixed 2G aligned MR registration
+* Fixed FC_HARD_REQ resending
+* Fixed remote access to invalidated MR
+* Fixed max_rd_atomic_dc value for DV
+* Fixed DC handshake logic
+* Fixed error handling flows
+* Fixed flush(CANCEL) with UD and DC transports
+* Fixed multi-path handling for passive endpoint with UD transport
+* Fixed attributes for DV QP creation
+* Fixed device query
+* Fixed memory leak in case of disabling RDMA transport
+* Fixed dci->pool_index initialization
+* Fixed fallback if port speed not detected
+* Fixed tag offload recv for inlined data
+* Fixed PKEY index initialization
+* Disabled mlx5 ifaces on verbs MD
+#### TCP
+* Fixed flush(CANCEL)
+* Fixed close protocol when UCT EP pairs have only RX capability
+* Fixed query local/remote saddr
+#### GPU (CUDA, ROCM)
+* Fixed a bug in invalidating address range in CUDA_IPC
+* Fixed CUDA context caching and cleanup
+* Fixed ROCM initialization
+* Fixed ROCM components compilation
+* Fixed IPC tls reachability check
+* Fixed ROCM memory type detection
+* Use ROCM remote_agent if available
+#### KNEM
+* Fixed memory registration cost
+#### UCM
+* Fixed potential hang on init
+#### UCS
+* Fixed name shadow problem in CentOS6.x
+#### Tools
+* Print stream API limits and handle stream feature in ucx_info
+* Replaced ucp_ep_close_nb by ucp_ep_close_nbx in examples
+* Replaced completed field by checking UCS status in io_demo
+#### JAVA
+* Throw exception if ucp_mem_query failed
+#### GO
+* Disabled go bindings in rpmbuild
+* Fixed configure behavior if can't find go compiler
+* Standalone performance benchmark
+* Increased port range + make it dependent on agent_id
+* Check compiler minimum version
+* Set GOCACHE to a local directory that is cleared for each job in CI
+* Disabled module for goperftest
+* Fixed OOS build
+
+## 1.12.1 (March 21, 2022)
+#### Bugfixes
+* Fixed memory hooks for Cuda 11.5
+* Fixed memory type cache merge
+* Fixed continuously triggering wakeup fd when keepalive is used
+* Fixed memtype cache fallback when memory hooks are not installed
+* Fixed parsing header flags of worker address
+* Fixed pipeline protocol when sending from host memory to GPU memory
+* Fixed transport progress not deactivated when all transport's connections are closed
+* Fixed progress loop in io_demo application
+* Fixed ROCm segfault when using internal_ops functions
+* Fixed ROCm memory hooks
+* Fixed performance regression on A64FX
+* Fixed DCT create failure with rdma-core v22
+* Fixed golang bindings build
+* Fixed .deb package build on Ubuntu 22.04
+* Fixed build on archlinux
+
+#### Important changes
+* If Cuda memory hooks on driver API cannot be installed, memory type cache and
+memory registration cache will be disabled. This may lead to lower performance
+of some applications on setups with NVIDIA GPUs, even if Cuda memory is not
+being used. Prior to this change, failing to install driver API hooks could
+lead to runtime errors or data corruption when Cuda memory is used and linked
+statically with cuda runtime.
+In order to revert to previous behavior (when the application is linked
+dynamically with cuda runtime), the user can set UCX_MEM_CUDA_HOOK_MODE=reloc.
+See more info in https://github.com/openucx/ucx/pull/7865.
+
 ## 1.12.0 (January 12, 2022)
 ### Features:
 #### Core
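
The 1.12.1 "Important changes" entry in the NEWS diff above names one runtime setting, UCX_MEM_CUDA_HOOK_MODE=reloc, for reverting to the previous hook behavior. A minimal sketch of applying it for a single job follows. It assumes a POSIX shell and an application that links the CUDA runtime dynamically; `./my_app` is a hypothetical placeholder, not something named in the commit.

```shell
# Assumption: the application links the CUDA runtime dynamically. Per the
# NEWS entry, this setting reverts to the behavior used before UCX 1.12.1.
export UCX_MEM_CUDA_HOOK_MODE=reloc

# Confirm the variable is set in the environment the job will inherit.
echo "UCX_MEM_CUDA_HOOK_MODE=${UCX_MEM_CUDA_HOOK_MODE}"

# ./my_app is a hypothetical placeholder for the UCX-based application.
# ./my_app
```

Exporting the variable only affects processes started from that shell, so the default behavior stays in place for other jobs.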
