@shijin-aws
Contributor

@shijin-aws shijin-aws commented Nov 25, 2025

Commit 10ca04b introduced a bug: it stores the cq data
in a 16-bit intermediate variable before passing it to
the device. uint16_t is too small, so the value is
silently truncated when the application uses more than
16 bits of cq data, even though the efa provider
supports 32 bits. This patch fixes it.

Signed-off-by: Shi Jin <[email protected]>
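
The truncation described above can be illustrated with a minimal, standalone sketch. It is not the efa provider code: `stage_cq_data_16` and `stage_cq_data_32` are hypothetical helpers that only model what happens when a value passes through a 16-bit versus a 32-bit intermediate variable.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not from the efa provider): models a code path
 * that stages cq data in a 16-bit temporary before handing it on.
 */
static uint32_t stage_cq_data_16(uint64_t cq_data)
{
	uint16_t tmp = (uint16_t)cq_data; /* bits 16..31 are dropped here */
	return tmp;
}

/*
 * Hypothetical helper: the same staging step with a 32-bit temporary,
 * which preserves every bit of a 32-bit cq data value.
 */
static uint32_t stage_cq_data_32(uint64_t cq_data)
{
	uint32_t tmp = (uint32_t)cq_data;
	return tmp;
}
```

With these helpers, `stage_cq_data_16(0x12345678)` yields `0x5678` without any warning or error, while `stage_cq_data_32(0x12345678)` yields `0x12345678`. Values that fit in 16 bits pass through both unchanged, which is why the problem only appears for applications that use the upper bits.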
@shijin-aws shijin-aws requested a review from a team November 25, 2025 03:02
@shijin-aws shijin-aws changed the title prov/efa: Fix cq data size in efa-rdm pkt post prov/efa, fabtests: Fix cq data size in efa-rdm pkt post, fix cq_data test bugs Nov 25, 2025
@shijin-aws shijin-aws requested review from a team and j-xiong and removed request for a team November 25, 2025 03:44
@shijin-aws
Contributor Author

@j-xiong can you review the fabtests change?

The current run_test() has a bug: the cq data
check and the cq data size check set the same
return code. When the cq data check fails but the
cq data size check succeeds, the failure is
overwritten and the return code is still 0. This
patch fixes the issue by making the test return
an error immediately when any check fails.

Signed-off-by: Shi Jin <[email protected]>
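
The return-code problem can be sketched as follows. This is an illustration under stated assumptions, not the actual fabtests `run_test()`: `check`, `run_checks_shared_ret`, and `run_checks_early_return` are hypothetical names, and the two boolean parameters stand in for the outcomes of the cq data check and the cq data size check.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a single check: 0 on success, -1 on failure. */
static int check(bool ok)
{
	return ok ? 0 : -1;
}

/*
 * Buggy pattern: both checks assign to the same variable, so the
 * result of the second check overwrites the result of the first.
 */
static int run_checks_shared_ret(bool data_ok, bool size_ok)
{
	int ret;

	ret = check(data_ok); /* a failure here is lost ... */
	ret = check(size_ok); /* ... because this assignment overwrites it */
	return ret;
}

/* Fixed pattern: return the error as soon as any check fails. */
static int run_checks_early_return(bool data_ok, bool size_ok)
{
	int ret;

	ret = check(data_ok);
	if (ret)
		return ret;

	ret = check(size_ok);
	if (ret)
		return ret;

	return 0;
}
```

Calling `run_checks_shared_ret(false, true)` returns 0 even though the first check failed, whereas `run_checks_early_return(false, true)` returns -1.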
@a-szegel
Contributor

Hey @j-xiong, why did this PR fail Intel CI?

@shijin-aws
Contributor Author

The AppVeyor failure looks unrelated:

- name:   rdm_atomic -o all -p tcp
  timestamp: Tue 11/25/2025 17:17:47.13
  result: Fail
  time:   13
  server_cmd: C:\projects\libfabric\fabtests\x64\Debug-v142\rdm_atomic -o all -p tcp -I 10 -s 127.0.0.1
  server_stdout:
hmem allocation error: Not enough space
  client_cmd: C:\projects\libfabric\fabtests\x64\Debug-v142\rdm_atomic -o all -p tcp -I 10 -s 127.0.0.1 127.0.0.1
  client_stdout:
hmem allocation error: Not enough space
- name:   rdm_atomic -o all -v -p tcp
  timestamp: Tue 11/25/2025 17:17:57.13
  result: Fail
  time:   10
  server_cmd: C:\projects\libfabric\fabtests\x64\Debug-v142\rdm_atomic -o all -v -p tcp -I 10 -s 127.0.0.1
  server_stdout:
hmem allocation error: Not enough space
  client_cmd: C:\projects\libfabric\fabtests\x64\Debug-v142\rdm_atomic -o all -v -p tcp -I 10 -s 127.0.0.1 127.0.0.1
  client_stdout:
hmem allocation error: Not enough space
- name:   rdm_cntr_pingpong -p tcp

@aingerson
Contributor

@a-szegel That is an unrelated mpich test suite failure with tcp (bad file descriptor). You can ignore it.

@shijin-aws shijin-aws merged commit 0682ba4 into ofiwg:main Nov 25, 2025
19 of 20 checks passed