Fix for CR-1186978 #8273

Merged
merged 4 commits into from
Jul 8, 2024

Conversation

aktondak
Collaborator

@aktondak aktondak commented Jul 4, 2024

Problem solved by the commit

This fixes an issue where Windows raised an "abort" when more than 8 xrt-smi processes were spawned simultaneously.

Bug / issue (if any) fixed, which PR introduced the bug, how it was discovered

CR-1186978

How problem was solved, alternative solutions (if any) and why they were rejected

The problem was solved by guarding the opening of the hardware context and the kernel creation in a try-catch block. We do not support more than 8 hardware contexts, and in such cases the tool should report an exception instead of being aborted by the OS.
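The guard described above can be sketched as follows. This is a minimal illustration, not the code from this PR: `fake_hw_context`, `test_result`, and `run_guarded_test` are hypothetical stand-ins, and the simulated limit simply mirrors the 8-context limit stated in the description. The point is only that a failure during context creation is caught and turned into a reported test error, whereas an uncaught C++ exception would end in `std::terminate` and an abort.

```cpp
#include <exception>
#include <stdexcept>
#include <string>

// Hypothetical stand-in: the limit of 8 comes from the PR description.
constexpr int max_hw_contexts = 8;

// Hypothetical stand-in for a hardware context. The constructor throws
// when the simulated driver has no free context slot left.
struct fake_hw_context {
  explicit fake_hw_context(int contexts_in_use) {
    if (contexts_in_use >= max_hw_contexts)
      throw std::runtime_error("no free hardware context");
  }
};

// Hypothetical result record for one validation test.
struct test_result {
  bool passed;
  std::string error;
};

// Guard context creation so that a failure is reported as a test error
// rather than escaping as an uncaught exception.
test_result run_guarded_test(int contexts_in_use) {
  try {
    fake_hw_context ctx{contexts_in_use};
    (void)ctx;  // kernel creation and the test body would follow here
    return {true, ""};
  }
  catch (const std::exception& e) {
    // Record the failure; the caller can print it and mark the test FAILED.
    return {false, e.what()};
  }
}
```

With this shape, `run_guarded_test(8)` returns a failed result carrying the exception message, while any value below the limit passes.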

What has been tested and how, request additional testing if necessary

Tested with the test case used in the CR. The test case now fails with the tool reporting the following error instead of an abort:
Test 1 [00c5:00:01.1] : df-bw
Description : Run bandwidth test on data fabric
Xclbin : C:\Windows\System32\DriverStore\FileRepository\ipukmddrv.inf_amd64_d029eeb7affac76f\validate_17f0_10.xclbin
Details : Kernel name is 'DPU_PDI_0'
DPU-Sequence : C:\Windows\System32\DriverStore\FileRepository\ipukmddrv.inf_amd64_d029eeb7affac76f\DPU_Sequence/df_bw.txt
Details : Buffer size: '1'GB
No. of iterations: '600'
Error(s) : Command failed to complete successfully
(ERT_CMD_STATE_ERROR)
Test Status : [FAILED]

@gbuildx
Collaborator

gbuildx commented Jul 4, 2024

Can one of the admins verify this patch?

Collaborator

@AShivangi AShivangi left a comment

Let's add this catch to the tct one-column and all-column tests as well!

Collaborator

@AShivangi AShivangi left a comment

Looks good!

@stsoe stsoe merged commit 54b1a03 into Xilinx:master Jul 8, 2024
17 checks passed
@aktondak aktondak deleted the CR-1186978 branch July 24, 2024 23:07

4 participants