Releases: openucx/ucx

v1.19.1

27 Nov 16:38
a702467

1.19.1 (Sep 18, 2025)

Features:

UCP

  • Do not require transport memory support if rendezvous protocol is not used

Build

  • Added CUDA 13 support to the release pipeline
  • Added Rocky OS support to the release pipeline

Bugfixes:

UCS

  • Fixed Netlink fetch mechanism

v1.19.1-rc2

21 Oct 14:42
a702467

Pre-release

1.19.1 (Oct 21, 2025)

Features:

UCP

  • Do not require transport memory support if rendezvous protocol is not used

Build

  • Added CUDA 13 support to the release pipeline
  • Added Rocky OS support to the release pipeline

Bugfixes:

UCS

  • Fixed Netlink fetch mechanism

v1.19.1-rc1

21 Sep 13:36
41180bd

Pre-release

1.19.1 (Sep 18, 2025)

Features:

UCP

  • Do not require transport memory support if rendezvous protocol is not used

Build

  • Added CUDA 13 support to the release pipeline

v1.19.0

06 Aug 12:23
e463614

1.19.0 (August 6, 2025)

Features:

UCP

  • Enabled multi-GPU support within a single process
  • Added dynamic selection between strong and weak fences in RMA flush operations
  • Improved endpoint reconfiguration capabilities
  • Added All2All lane selection for multi-NIC-GPU systems
  • Improved rkey debug info when config cache limit is reached
  • Improved UCP protocol selection based on available memory types
  • Removed dummy memory key from irrelevant transports (TCP, CMA and CUDA)
  • Improved RNDV performance with device-local staging buffers
  • Enabled error handling for RMA get_offload protocols

UCT

  • Defined uct_rkey_unpack_v2 API to support passing sys-dev

RDMA CORE (IB, ROCE, etc.)

  • Added SRD transport support in EFA with reordering, AM, and control operations
  • Removed XGVMI BF2 support (umem)
  • Removed device memory indirect key
  • Fixed VFS objects for DCIs and pools
  • Added routing table cache to the reachability check
  • Fixed strict order usage in IB auxiliary rkeys
  • Improved various init logging messages

CUDA

  • Added multi-context support for remote key unpacking to CUDA IPC
  • Added context-switching-aware resource management to CUDA IPC
  • Use buffer ID to detect VA recycling in CUDA IPC
  • Added support for allocating CUDA memory on specific system devices
  • Added multi-device support in CUDA copy
  • Improved protocol lane selection for GPU memory operations
  • Relaxed CUDA context requirements in CUDA copy
  • Added deadlock prevention in CUDA copy
  • Added support for address range detection for VMM
  • Enabled memory attributes query after switching CUDA GPU
  • Added multi-GPU send tests for CUDA transports
  • Removed host-to-host performance estimation from CUDA copy transport
  • Replaced cuCtxCreate with cuDevicePrimaryCtxRetain
  • Improved various init logging messages
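
The "buffer ID to detect VA recycling" item above can be illustrated with a minimal sketch. It is not UCX's implementation: the type and function names (ipc_cache_entry_t, ipc_cache_lookup) are hypothetical, and the cache is a plain array for brevity. The idea is that a cache keyed only by virtual address can return a stale mapping once the address is freed and handed out again, so each entry also stores an allocation ID that is compared on lookup. CUDA exposes such an ID through the CU_POINTER_ATTRIBUTE_BUFFER_ID pointer attribute; here it is simply passed in as a number.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cache entry: one cached mapping per base address. */
typedef struct {
    uintptr_t va;         /* base virtual address used as the lookup key */
    uint64_t  buffer_id;  /* allocation ID recorded when the entry was created */
    bool      valid;      /* false for unused slots */
} ipc_cache_entry_t;

typedef enum {
    CACHE_MISS,   /* no entry for this address */
    CACHE_HIT,    /* same address, same allocation: mapping can be reused */
    CACHE_STALE   /* same address, different allocation: VA was recycled */
} cache_status_t;

/* Find the entry for va and compare its stored ID with the current one. */
static cache_status_t ipc_cache_lookup(const ipc_cache_entry_t *entries,
                                       size_t count, uintptr_t va,
                                       uint64_t buffer_id)
{
    for (size_t i = 0; i < count; ++i) {
        if (!entries[i].valid || (entries[i].va != va)) {
            continue;
        }
        return (entries[i].buffer_id == buffer_id) ? CACHE_HIT : CACHE_STALE;
    }
    return CACHE_MISS;
}
```

A caller would treat CACHE_STALE like a miss, after first releasing the old mapping, so that a recycled address never reuses a handle that belongs to a freed allocation.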

ROCM

  • Added control parameters for IPC handle cache and signal pool size
  • Optimized ROCm memory type detection with caching

UCS

  • Removed compilation warnings

Tools

  • Added name filter option (-F 'str') to ucx_info for config and feature dumps
  • Improved ucx_info input validation

Bugfixes:

UCP

  • Made UCX_TLS=^ib disable all transports including auxiliary
  • Fixed send request status handling
  • Fixed performance degradation in RNDV by optimizing md cache updates
  • Fixed protocol selection when first lane is filtered out by fragment size
  • Fixed rkey selection by using memory registration flag

UCT

RDMA CORE (IB, ROCE, etc.)

  • Improved reliability of DC transport by adding DCI validation and separating connection logic
  • Fixed segfault in DC fence operation

GPU (CUDA, ROCM)

  • Updated ROCm configuration for ROCm 6.3 compatibility
  • Fixed system device detection for CUDA async memory operations
  • Fixed legacy type detection during CUDA IPC mpack
  • Fixed CUDA IPC RMA operations by using correct context for local buffers

UCS

  • Use UCS function for counting leading zeros on x86 architecture
  • Fixed a compilation warning
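
For readers unfamiliar with the operation named in the leading-zeros item above, the following is a minimal portable sketch of counting leading zeros in a 32-bit value. It is an illustration only and not the UCS function; the name clz32 is hypothetical. One reason projects wrap this operation in their own helper is that compiler builtins such as GCC's __builtin_clz leave the result undefined for a zero argument, whereas a helper can define it.

```c
#include <stdint.h>

/* Count leading zero bits of a 32-bit value; returns 32 for zero input. */
static unsigned clz32(uint32_t x)
{
    unsigned n = 0;

    if (x == 0) {
        return 32;
    }

    /* Shift left until the most significant bit is set, counting the shifts. */
    while ((x & 0x80000000u) == 0) {
        x <<= 1;
        ++n;
    }
    return n;
}
```

The loop runs at most 31 times; production code would normally use a hardware instruction or builtin and keep a loop like this only as a fallback.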

Shared Memory

  • Fixed FIFO availability check for sm transport

Documentation

  • Fixed open-mpi clone instruction

Build

  • Fixed enum-int-mismatch warnings with GCC 15

v1.19.0-rc2

22 Jul 08:17
13ae265

Pre-release

1.19.0 (June 18, 2025)

Features:

UCP

  • Enabled multi-GPU support within a single process
  • Added dynamic selection between strong and weak fences in RMA flush operations
  • Improved endpoint reconfiguration capabilities
  • Added All2All lane selection for multi-NIC-GPU systems
  • Improved rkey debug info when config cache limit is reached
  • Improved UCP protocol selection based on available memory types
  • Removed dummy memory key from irrelevant transports (TCP, CMA and CUDA)
  • Improved RNDV performance with device-local staging buffers
  • Enabled error handling for RMA get_offload protocols

UCT

  • Defined uct_rkey_unpack_v2 API to support passing sys-dev

RDMA CORE (IB, ROCE, etc.)

  • Added SRD transport support in EFA with reordering, AM, and control operations
  • Removed XGVMI BF2 support (umem)
  • Removed device memory indirect key
  • Fixed VFS objects for DCIs and pools
  • Added routing table cache to the reachability check
  • Fixed strict order usage in IB auxiliary rkeys
  • Improved various init logging messages

CUDA

  • Added multi-context support for remote key unpacking to CUDA IPC
  • Added context-switching-aware resource management to CUDA IPC
  • Use buffer ID to detect VA recycling in CUDA IPC
  • Added support for allocating CUDA memory on specific system devices
  • Added multi-device support in CUDA copy
  • Improved protocol lane selection for GPU memory operations
  • Relaxed CUDA context requirements in CUDA copy
  • Added deadlock prevention in CUDA copy
  • Added support for address range detection for VMM
  • Enabled memory attributes query after switching CUDA GPU
  • Added multi-GPU send tests for CUDA transports
  • Removed host-to-host performance estimation from CUDA copy transport
  • Replaced cuCtxCreate with cuDevicePrimaryCtxRetain
  • Improved various init logging messages

ROCM

  • Added control parameters for IPC handle cache and signal pool size
  • Optimized ROCm memory type detection with caching

UCS

  • Removed compilation warnings

Tools

  • Added name filter option (-F 'str') to ucx_info for config and feature dumps
  • Improved ucx_info input validation

Bugfixes:

UCP

  • Made UCX_TLS=^ib disable all transports including auxiliary
  • Fixed send request status handling
  • Fixed performance degradation in RNDV by optimizing md cache updates
  • Fixed protocol selection when first lane is filtered out by fragment size
  • Fixed rkey selection by using memory registration flag

UCT

RDMA CORE (IB, ROCE, etc.)

  • Improved reliability of DC transport by adding DCI validation and separating connection logic
  • Fixed segfault in DC fence operation

GPU (CUDA, ROCM)

  • Updated ROCm configuration for ROCm 6.3 compatibility
  • Fixed system device detection for CUDA async memory operations
  • Fixed legacy type detection during CUDA IPC mpack
  • Fixed CUDA IPC RMA operations by using correct context for local buffers

UCS

  • Use UCS function for counting leading zeros on x86 architecture
  • Fixed a compilation warning

Shared Memory

  • Fixed FIFO availability check for sm transport

Documentation

  • Fixed open-mpi clone instruction

Build

  • Fixed enum-int-mismatch warnings with GCC 15

v1.19.0-rc1

24 Jun 12:22
71a4b63

Pre-release

1.19.0 (June 18, 2025)

Features:

UCP

  • Enabled multi-GPU support within a single process
  • Added dynamic selection between strong and weak fences in RMA flush operations
  • Improved endpoint reconfiguration capabilities
  • Added All2All lane selection for multi-NIC-GPU systems
  • Improved rkey debug info when config cache limit is reached
  • Improved UCP protocol selection based on available memory types
  • Removed dummy memory key from irrelevant transports (TCP, CMA and CUDA)
  • Improved RNDV performance with device-local staging buffers
  • Enabled error handling for RMA get_offload protocols

UCT

  • Defined uct_rkey_unpack_v2 API to support passing sys-dev

RDMA CORE (IB, ROCE, etc.)

  • Added SRD transport support in EFA with reordering, AM, and control operations
  • Removed XGVMI BF2 support (umem)
  • Removed device memory indirect key
  • Fixed VFS objects for DCIs and pools
  • Added routing table cache to the reachability check
  • Fixed strict order usage in IB auxiliary rkeys
  • Improved various init logging messages

CUDA

  • Added multi-context support for remote key unpacking to CUDA IPC
  • Added context-switching-aware resource management to CUDA IPC
  • Use buffer ID to detect VA recycling in CUDA IPC
  • Added support for allocating CUDA memory on specific system devices
  • Added multi-device support in CUDA copy
  • Improved protocol lane selection for GPU memory operations
  • Relaxed CUDA context requirements in CUDA copy
  • Added deadlock prevention in CUDA copy
  • Added support for address range detection for VMM
  • Enabled memory attributes query after switching CUDA GPU
  • Added multi-GPU send tests for CUDA transports
  • Removed host-to-host performance estimation from CUDA copy transport
  • Replaced cuCtxCreate with cuDevicePrimaryCtxRetain
  • Improved various init logging messages

ROCM

  • Added control parameters for IPC handle cache and signal pool size
  • Optimized ROCm memory type detection with caching

UCS

  • Removed compilation warnings

Tools

  • Added name filter option (-F 'str') to ucx_info for config and feature dumps
  • Improved ucx_info input validation

Bugfixes:

UCP

  • Made UCX_TLS=^ib disable all transports including auxiliary
  • Fixed send request status handling
  • Fixed performance degradation in RNDV by optimizing md cache updates
  • Fixed protocol selection when first lane is filtered out by fragment size
  • Fixed rkey selection by using memory registration flag

UCT

RDMA CORE (IB, ROCE, etc.)

  • Improved reliability of DC transport by adding DCI validation and separating connection logic
  • Fixed segfault in DC fence operation

GPU (CUDA, ROCM)

  • Updated ROCm configuration for ROCm 6.3 compatibility
  • Fixed system device detection for CUDA async memory operations
  • Fixed legacy type detection during CUDA IPC mpack
  • Fixed CUDA IPC RMA operations by using correct context for local buffers

UCS

  • Use UCS function for counting leading zeros on x86 architecture
  • Fixed a compilation warning

Shared Memory

  • Fixed FIFO availability check for sm transport

Documentation

  • Fixed open-mpi clone instruction

Build

  • Fixed enum-int-mismatch warnings with GCC 15

v1.18.1

28 Apr 16:20
d9aa565

1.18.1 (April 28, 2025)

Features:

CUDA

  • Added config keys to update cuda_copy bandwidth for coherent platforms
  • Improved cache invalidation of memory allocated using CUDA memory pool

AZP

  • Added Ubuntu 24.04 to build and release pipeline

Bugfixes:

UCP

  • Fixed assertion failure when maximum lane fragment is smaller than AM header
  • Fixed potential use-after-free of the active message user header during protocol reconfiguration

CUDA

  • Fixed registration of CUDA Fabric memory allocated by UCT
  • Fixed VA recycling check of memory allocated using VMM and CUDA memory pool

RDMA CORE (IB, ROCE, etc.)

  • Do not use ConnectX-8 SMI subdevices for communication
  • Fixed remote access error by disabling ODP when the device supports DDP
  • Fixed configuration logic by disabling DDP when AR is disabled

UCM

  • Fixed crash with bistro hooks for CUDA 12.9 on amd64

v1.18.1 RC3

17 Apr 17:02
938ffcd

Pre-release

1.18.1-rc3 (April 17, 2025)

Bugfixes:

UCM

  • Fixed crash with bistro hooks for CUDA 12.9 on amd64

v1.18.1 RC2

09 Apr 16:12
81baeb1

Pre-release

1.18.1-rc2 (April 9, 2025)

Features:

CUDA

  • Added config keys to update cuda_copy bandwidth for coherent platforms
  • Improved cache invalidation of memory allocated using CUDA memory pool

Bugfixes:

UCP

  • Fixed assertion failure when maximum lane fragment is smaller than AM header

CUDA

  • Fixed registration of CUDA Fabric memory allocated by UCT
  • Fixed VA recycling check of memory allocated using VMM and CUDA memory pool

RDMA CORE (IB, ROCE, etc.)

  • Do not use ConnectX-8 SMI subdevices for communication
  • Fixed remote access error by disabling ODP when the device supports DDP
  • Fixed configuration logic by disabling DDP when AR is disabled

v1.18.1 RC1

21 Feb 22:58
3ed7241

Pre-release

1.18.1-rc1 (February 20, 2025)

Features:

AZP

  • Added Ubuntu 24.04 to build and release pipeline

Bugfixes:

UCP

  • Fixed potential use-after-free of the active message user header during protocol reconfiguration