Skip to content

Conversation

@aktondak
Copy link
Collaborator

@aktondak aktondak commented Dec 18, 2025

Problem solved by the commit

This PR adds correct error handling when archive for xrt-smi validate is not found. There are following cases :

  1. When shim doesn't provide an archive path (alveo tests) : No error, simply return nullptr
  2. When shim provides a path (ryzenAI tests) and archive is not found at the said path : throw error with steps to install the required archive. This is handled by report_missing_archive API.
  3. Shim provides a path and archive exists : return archive pointer

Bug / issue (if any) fixed, which PR introduced the bug, how it was discovered

None. Required for correct error handling when upstreaming the XRT package

How problem was solved, alternative solutions (if any) and why they were rejected

Solved via correct error handling

Risks (if any) associated the changes in the commit

None

What has been tested and how, request additional testing if necessary

Tested on linux after manually deleting the installed xrt_smi_strix.a archive. Following is the error seen when the archive is not found :

aktondak@xsjstrix45:/proj/rdi/staff/aktondak/xdna/xdna-driver/xrt/build$ xrt-smi validate 
WARNING: User doesn't have admin permissions to set performance mode. Running validate in Default mode
Error: Archive not found: amdxdna/bins/xrt_smi_strx.a
To install the required archive, run:
  smi_install_archive.sh xrt_smi_strx.a 2.21.0
Or download manually from:
  https://github.com/Xilinx/VTD/tree/main/archive
And place it in: /home/aktondak/.local/share/xrt/2.21.0/amdxdna/bins/
XRT build version: 2.21.0
Build hash: 1701c2b96ea62df00e7de3936d519096e09ebce1
Build date: 2025-12-22 12:06:20
Git branch: archive_update
PID: 1475895
UID: 50084377
[Mon Dec 22 20:43:36 2025 GMT]
HOST: xsjstrix45
EXE: /opt/xilinx/xrt/bin/unwrapped/xrt-smi
[xrt-smi] ERROR: No such file or directory 'amdxdna/bins/xrt_smi_strx.a'

Documentation impact (if any)

None

Signed-off-by: Akshay Tondak <[email protected]>
@github-actions
Copy link
Contributor

clang-tidy review says "All clean, LGTM! 👍"

@aktondak aktondak requested a review from stsoe December 18, 2025 23:06
} catch (const std::exception& /*e*/) {
// Continue without archive - this is not a fatal error
} catch (const std::exception& e) {
std::cerr << "Error opening archive: " << e.what() << std::endl;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to expand on this error and direct the user what to do. The code (platform_path) already is checks a few locations, one being custom to the current XRT build

  • $HOME/.local/share/xrt/<major>.<minor>.<patch>/amdxdna/bins

I suggest, but I am not sure what is best, that we make the catch Linux specific and direct the user to run an install script from VTD to copy the proper archive to to this version specific location. Let discuss.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For reference, based on our offline discussion, we will ship a Linux specific bash script with the npu deb package which this error should point to. The script will run a "wget" to download correct version of archive from the VTD repository based on the installed version of XRT

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. Let's do what you suggested. Can you add a script under core/tools/xbutil2 that can install the proper archive under $HOME/.local/share/xrt/<major>.<minor>.<patch>/amdxdna/bins? I will sort out where we install this script and then we can amend the message in the catch statement accordingly. I think you need some code in XBUtilities to compute the proper arguments to the script, like XRT major.minor.patch from a compile time define from CMake and the base archive name?

@aktondak aktondak added the do not merge hold off on merging label Dec 19, 2025
parthash0804 and others added 10 commits December 22, 2025 11:03
* add preemption and gemm test for aie4

Signed-off-by: AShivangi <[email protected]>

* fix lint errors

Signed-off-by: AShivangi <[email protected]>

---------

Signed-off-by: AShivangi <[email protected]>
* Event trace updates

Signed-off-by: Akshay Tondak <[email protected]>

* build fix and clang updates

Signed-off-by: Akshay Tondak <[email protected]>

* Re add sudo requirement

Signed-off-by: Akshay Tondak <[email protected]>

* update error handling

Signed-off-by: Akshay Tondak <[email protected]>

* Removing sudo requirememt and remove redundant try catch

Signed-off-by: Akshay Tondak <[email protected]>

---------

Signed-off-by: Akshay Tondak <[email protected]>
Co-authored-by: Akshay Tondak <[email protected]>
When executing a vector of runs or a runlist, it is the responsibility
of the caller to make sure the runlist is idle.

The profile's main execution routine for iterations inserts the
necessary wait in between iterations of the underlying runlist.

Signed-off-by: Soren Soe <[email protected]>
Added --queue-limit (-l) to control max number of pending jobs
in xrt-runner queue when running in script mode.

```
xrt-runner --help
usage: xrt-runner.exe [options]
 [--recipe <recipe.json>] recipe file to run
 [--profile <profile.json>] execution profile
 [--iterations <number>] override all profile iterations
 [--script <script>] runner script, enables multi-threaded execution
 [--threads <number>] number of threads to use when running script (default: #jobs)
 [--queue-limit <number>] max jobs in job queue running script (default: threads or jobs)
 [--dir <path>] directory containing artifacts (default: current dir)
 [--mode <latency|throughput|validate>] execute only specified mode (default: all)
 [--verbose <val>] set XRT verbosity level to specified value (default: 0)
 [--progress] show progress (same as --verbose 6)
 [--report [<file>]] output runner metrics to <file> or use stdout for no <file> or '-'

% xrt-runner.exe --recipe recipe.json --profile profile.json [--iterations <num>] [--dir <path>]
% xrt-runner.exe --script runner.json [--threads <num>] [--iterations <num>] [--dir <path>]
Note, [--threads <number>] overrides the default number, where default is the number of
jobs in the runner script.

Note, [--queue-limit <num>] sets the maximum number of pending jobs when running in
script mode.  If not set, the max number is set to 2 * number of threads used.

Note, [--iterations <num>] overrides iterations in profile.json, but not in runner script.
If the runner script specifies iterations for a recipe/profile pair, then this value is
sticky for that recipe/profile pair.

Note, [--mode <latency|throughput|validate>] filters execution sections in profile.json such
only specified modes are executed. If the runner script specifies a mode for a recipe/profile
pair, then this value is sticky for that recipe/profile pair.
```

Signed-off-by: Soren Soe <[email protected]>
* xbtracer: Link against abseil for protobuf 22+

Protobuf 22+ uses abseil for logging internally. On systems with newer
protobuf (e.g., Arch Linux with protobuf 33.0), the xbtracer tools fail
to link with undefined references to absl::log_internal symbols.

This fix:
- Detects protobuf >= 4.0 (corresponds to protobuf 22+ with new versioning)
- Finds abseil and links xbtracer targets against absl::log_internal_check_op
  and absl::log_internal_message when needed

Signed-off-by: Kashif Rasul <[email protected]>

* Remove seemingly redundant explicit link with ${XBTRACER_ABSL_LIBS}

Signed-off-by: Soren Soe <[email protected]>

---------

Signed-off-by: Kashif Rasul <[email protected]>
Signed-off-by: Soren Soe <[email protected]>
Co-authored-by: Soren Soe <[email protected]>
Signed-off-by: Akshay Tondak <[email protected]>
@aktondak aktondak requested a review from jvillarre as a code owner December 22, 2025 19:28
Copy link
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clang-tidy made some suggestions

@aktondak aktondak removed the request for review from jvillarre December 22, 2025 20:01
@github-actions
Copy link
Contributor

clang-tidy review says "All clean, LGTM! 👍"

Signed-off-by: Akshay Tondak <[email protected]>
@aktondak aktondak requested a review from stsoe December 22, 2025 22:21
@github-actions
Copy link
Contributor

clang-tidy review says "All clean, LGTM! 👍"

Copy link
Collaborator

@stsoe stsoe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. One suggestion related to inclusion of version.h.

#include "common/sysinfo.h"
#include "common/smi.h"
#include "common/module_loader.h"
#include "xrt/detail/version.h"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like how you are using xrt_build_version, but the problem with version.h is that it regenerates with different timestamp every time we run cmake, this breaks ccache. We have probably want to move xrt_build_version from version.h.in to version-slim.h.in and then have your code include version-slim.h. Can you try that?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've moved it to vesion-slim.h.ln. I see the incremental builds and not showing cmake prints anymore. Should've noticed that earlier

@github-actions
Copy link
Contributor

clang-tidy review says "All clean, LGTM! 👍"

Signed-off-by: Akshay Tondak <[email protected]>
@github-actions
Copy link
Contributor

clang-tidy review says "All clean, LGTM! 👍"

@github-actions
Copy link
Contributor

clang-tidy review says "All clean, LGTM! 👍"

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

do not merge hold off on merging

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants