Description
Bug report
Required Info:
- Operating System:
- Ubuntu 22.04
- Installation type:
- From Humble binaries
- Version or commit hash:
- Latest Humble
- Client library (if applicable):
- rclcpp
- Relevant hardware info:
  - Memory: 15 GiB system memory
  - Processor: 11th Gen Intel(R) Core(TM) i7-11800H @ 2.30GHz, 16 cores
  - Display: GA107M [GeForce RTX 3050 Mobile]
Summary
I have been testing the performance of CycloneDDS with shared memory (SHM) enabled in order to demonstrate the performance improvements it should provide. I had already encountered latencies I was not expecting using just CycloneDDS + iceoryx standalone, which I reported in this issue.
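For reference, SHM was enabled roughly as sketched below (a minimal, illustrative sketch: the SharedMemory settings are the standard iceoryx knobs exposed by the Humble-era CycloneDDS 0.10 configuration, and the exact file used in my runs may differ slightly). iceoryx's RouDi daemon has to be running before the test processes start.

```bash
# Illustrative sketch of enabling the iceoryx SHM transport for CycloneDDS (Humble / CycloneDDS 0.10).
cat > cyclonedds_shm.xml <<'EOF'
<?xml version="1.0" encoding="UTF-8" ?>
<CycloneDDS xmlns="https://cdds.io/config">
  <Domain Id="any">
    <SharedMemory>
      <Enable>true</Enable>
      <LogLevel>info</LogLevel>
    </SharedMemory>
  </Domain>
</CycloneDDS>
EOF
export CYCLONEDDS_URI=file://$PWD/cyclonedds_shm.xml
# iceoryx's shared-memory daemon must be running for the SHM transports to work
iox-roudi &
```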
I have encountered two further issues that only happen in rmw_cyclonedds:
- The performance gain is almost non-existent: the SHM transport performs about the same as the COPY one, whereas in my CycloneDDS standalone tests the SHARED latency was half that of COPY.
- There is a latency explosion when increasing the number of subscribers, going well over 200 ms. On my testing machine this happens at around 18 subscribers. Please see the next section for the exact details of the test.
Steps to reproduce issue
I used the Apex.AI performance_test package and the Dockerfile they provide (you can follow the instructions there to spin up the container directly).
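For completeness, spinning up the container boils down to something like the following (the image tag and the --shm-size value here are illustrative and the documented instructions are the authoritative ones; the point is that /dev/shm must be large enough for iceoryx to hold the 8 MB samples, since Docker's default is only 64 MB):

```bash
# Illustrative only; follow the instructions shipped alongside the Dockerfile for the exact steps.
docker build -t perf_test .
# iceoryx allocates its chunks from POSIX shared memory (/dev/shm), so give the container a large enough --shm-size
docker run -it --shm-size=2g perf_test
```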
I defined the following test:
---
experiments:
  -
    process_configuration: INTER_PROCESS
    execution_strategy: INTER_THREAD
    sample_transport:
      - BY_COPY
      - SHARED_MEMORY
      - LOANED_SAMPLES
    msg: Array8m
    pubs: 1
    subs:
      - 1
      - 2
      - 4
      - 6
      - 8
      - 10
      - 12
      - 14
      - 16
      - 18
      - 20
      - 22
      - 24
      - 26
      - 28
      - 30
      - 32
    rate: 30
    reliability: BEST_EFFORT
    durability: VOLATILE
    history: KEEP_LAST
    history_depth: 5
    max_runtime: 30
    ignore_seconds: 5
After installing the performance_report package, you can run it with:
ros2 run performance_report runner --configs definition.yaml
This test measures the average latency between one publisher publishing an 8 MB payload (around the size of a typical point cloud or a Full HD image) at 30 Hz and an increasing number of subscribers, for the three transports available for Cyclone: UDP (copy), SHM, and SHM with loaned samples. The QoS settings are the ones specified in the YAML.
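For scale (rough back-of-the-envelope numbers): an 8 MB sample at 30 Hz is about 240 MB/s per subscriber, so the BY_COPY runs at 32 subscribers have to move on the order of 7.5 GB/s, whereas a zero-copy shared-memory path should in principle write each sample only once regardless of the subscriber count.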
Expected results
Quite low latency, as expected from iceoryx, mostly independent of the number of subscribers. Alternatively, if this is indeed an issue, at least the same results as the ones obtained there. For reference:
Actual results
A negligible performance gain below 10 subscribers, which keeps worsening until it explodes at around 24 subscribers.
I also ran the test on one of our robots and the explosion happens even sooner, so it seems to be related to CPU capabilities (as they have the same 16 GB of RAM):
Both the reference plot and the results plots were produced on the same machine under the same conditions, one run right after the other, with the same memory pool configuration. At first I thought it might be running out of RAM or starting to swap, but I can confirm that does not seem to be the case.
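For what it's worth, memory and swap were watched with standard tools along these lines while the tests ran (nothing specific to this setup):

```bash
# Watch the swap (si/so) and free-memory columns once per second while the experiment runs
vmstat -w 1
# or take a quick snapshot of RAM and swap usage
free -h
```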
Do you guys have any ideas why this may happen and why it only seems to happen in the RMW integration of Cyclone?
Many thanks in advance!