Description
Describe the bug
I am trying to use RDMA over the srd transport, but UCX_PROTO_INFO=y shows that UCX falls back to "software emulation" instead of "zero-copy" when it is used through NIXL.
See below:
+--------------------------------+-------------------------------------------------------------+
| ucp_context_0 inter-node cfg#3 | remote memory write by ucp_put* from host memory to cuda |
+--------------------------------+------------------------------------------+------------------+
| 0..inf | software emulation | srd/rdmap135s0:1 |
+--------------------------------+------------------------------------------+------------------+
However, when I run ucx_perftest inside the same Pod, it shows zero-copy:
+---------------------------+---------------------------------------------------------------------------------------------------------------+
| perftest inter-node cfg#2 | remote memory write by ucp_put* from host memory to cuda |
+---------------------------+-----------+---------------------------------------------------------------------------------------------------+
| 0..1651 | copy-in | srd/rdmap85s0:1 |
| 1652..inf | zero-copy | 25% on srd/rdmap85s0:1, 25% on srd/rdmap86s0:1, 25% on srd/rdmap87s0:1 and 25% on srd/rdmap88s0:1 |
+---------------------------+-----------+---------------------------------------------------------------------------------------------------+
What is the difference between the two configurations that lets ucx_perftest use SRD with RDMA (zero-copy), while NIXL with the UCX backend falls back to software emulation?
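For reference, a minimal sketch of the kind of ucx_perftest invocation used for the comparison above, run with the same Pod environment (the test name, message size and server address below are illustrative, not the exact command):

# server side
UCX_PROTO_INFO=y ucx_perftest -t ucp_put_bw -m host,cuda -s 1048576
# client side: ucp_put from host memory to cuda memory on the server
UCX_PROTO_INFO=y ucx_perftest -t ucp_put_bw -m host,cuda -s 1048576 <server-ip>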
Steps to Reproduce
- Command line
- UCX version used (from github branch XX or release YY) + UCX configure flags (can be checked by ucx_info -v)
  - Using commit: 7ec95b95e524a87e81cac92f5ca8523e3966b16b
- Any UCX environment variables used (equivalent shell exports are sketched after this list)
  - name: UCX_RNDV_THRESH
    value: "inf"
  - name: UCX_MAX_COMPONENT_MDS
    value: "32"
  - name: UCX_MAX_RMA_LANES
    value: "4"
  - name: UCX_PROTO_INFO
    value: "y"
  - name: UCX_RNDV_SCHEME
    value: "put_zcopy"
Setup and versions
- OS version (e.g. Linux distro) + CPU architecture (x86_64/aarch64/ppc64le/...)
  cat /etc/issue or cat /etc/redhat-release + uname -a
  -> Running inside Kubernetes on Ubuntu 24.04
- For Nvidia Bluefield SmartNIC include
  cat /etc/mlnx-release (the string identifies software and firmware setup)
- For RDMA/IB/RoCE related issues:
  - Driver version:
    rpm -q rdma-core or rpm -q libibverbs
    -> rdma-core = rdma-core/noble-updates,now 50.0-2ubuntu0.2 amd64
    -> libibverbs = libibverbs1/noble-updates,now 50.0-2ubuntu0.2 amd64 [installed]
  - or: MLNX_OFED version
    ofed_info -s
    -
  - HW information from ibstat or ibv_devinfo -vv command:
hca_id: rdmap137s0
transport: unspecified (4)
fw_ver: 0.0.0.0
node_guid: 398e:ae8a:0001:1400
sys_image_guid: 0000:0000:0000:0000
vendor_id: 0x1d0f
vendor_part_id: 61346
hw_ver: 0xEFA2
phys_port_cnt: 1
max_mr_size: 0x3000000000
page_size_cap: 0xfffff000
max_qp: 256
max_qp_wr: 4096
device_cap_flags: 0x00000000
max_sge: 2
max_sge_rd: 1
max_cq: 512
max_cqe: 32768
max_mr: 262144
max_pd: 256
max_qp_rd_atom: 0
max_ee_rd_atom: 0
max_res_rd_atom: 0
max_qp_init_rd_atom: 0
max_ee_init_rd_atom: 0
atomic_cap: ATOMIC_NONE (0)
max_ee: 0
max_rdd: 0
max_mw: 0
max_raw_ipv6_qp: 0
max_raw_ethy_qp: 0
max_mcast_grp: 0
max_mcast_qp_attach: 0
max_total_mcast_qp_attach: 0
max_ah: 1024
max_fmr: 0
max_srq: 0
max_pkeys: 1
local_ca_ack_delay: 0
general_odp_caps:
rc_odp_caps:
NO SUPPORT
uc_odp_caps:
NO SUPPORT
ud_odp_caps:
NO SUPPORT
xrc_odp_caps:
NO SUPPORT
completion_timestamp_mask not supported
core clock not supported
device_cap_flags_ex: 0x0
tso_caps:
max_tso: 0
rss_caps:
max_rwq_indirection_tables: 0
max_rwq_indirection_table_size: 0
rx_hash_function: 0x0
rx_hash_fields_mask: 0x0
max_wq_type_rq: 0
packet_pacing_caps:
qp_rate_limit_min: 0kbps
qp_rate_limit_max: 0kbps
tag matching not supported
num_comp_vectors: 32
port: 1
state: PORT_ACTIVE (4)
max_mtu: 4096 (5)
active_mtu: 4096 (5)
sm_lid: 0
port_lid: 0
port_lmc: 0x01
link_layer: Unspecified
max_msg_sz: 0x22e0
port_cap_flags: 0x00000000
port_cap_flags2: 0x0000
max_vl_num: 1 (1)
bad_pkey_cntr: 0x0
qkey_viol_cntr: 0x0
sm_sl: 0
pkey_tbl_len: 1
gid_tbl_len: 1
subnet_timeout: 0
init_type_reply: 0
active_width: 4X (2)
active_speed: 50.0 Gbps (64)
GID[ 0]: fe80:0000:0000:0000:0491:8aff:feae:8e39
- For GPU related issues:
  - GPU type
  - CUDA:
    - Driver version
    - Check if peer-direct is loaded:
      lsmod | grep nv_peer_mem and/or gdrcopy: lsmod | grep gdrdrv
Additional information (depending on the issue)
- OpenMPI version
- Output of ucx_info -d to show transports and devices recognized by UCX:
# Transport: srd
# Device: rdmap162s0:1
# Type: network
# System device: rdmap162s0 (22)
#
# capabilities:
# bandwidth: 23571.39/ppn + 0.00 MB/sec
# latency: 620 nsec
# overhead: 75 nsec
# put_bcopy: <= 4K
# put_zcopy: <= 1G, up to 1 iov
# put_opt_zcopy_align: <= 512
# put_align_mtu: <= 4K
# get_bcopy: <= 4K
# get_zcopy: <= 1G, up to 1 iov
# get_opt_zcopy_align: <= 512
# get_align_mtu: <= 4K
# am_short: <= 21
# am_bcopy: <= 4085
# am_zcopy: <= 4085, up to 1 iov
# am_opt_zcopy_align: <= 512
# am_align_mtu: <= 4K
# am header: <= 4085
# connection: to iface
# device priority: 0
# device num paths: 1
# max eps: inf
# device address: 11 bytes
# iface address: 3 bytes
# error handling: peer failure
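In case it helps narrow this down, the memory-domain section of the same ucx_info -d output may show whether the EFA memory domain can register cuda buffers in the NIXL process environment; a sketch for pulling those lines out (the exact field names are an assumption and vary by UCX version):

ucx_info -d | grep -iE "memory domain|component|register|memory types"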
- Configure result - config.log
- Log file - configure UCX with "--enable-logging" - and run with "UCX_LOG_LEVEL=data"
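A sketch for producing the requested data-level log from the commit above (the launched binary and paths are placeholders):

./autogen.sh
./contrib/configure-release --enable-logging --prefix=$HOME/ucx-install
make -j && make install
UCX_LOG_LEVEL=data <nixl-application> > ucx_data.log 2>&1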