
Conversation

@ScottTodd (Member) commented Jun 4, 2025

My main motivation here is making it easier to run either fetch_artifacts.py or install_rocm_from_artifacts.py on developer machines to pull down larger, more complete sets of artifacts. The existing code had an explicit list of artifact names and component types to download, with special-case handling for "base only" and "no tests".

Now:

  • (Edit: this was load-bearing... needs some dependency modeling to solve more generally) Artifact names available for download are derived from the scraped index file, rather than from a hardcoded list
  • Components (lib, dev, test) are considered based on the file name patterns parsed by the ArtifactName utility class (see the sketch after this list)
  • Target families are treated as "generic" or target-specific using the same ArtifactName utility class
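
As a rough illustration of the kind of parsing involved (the exact file name pattern and the real ArtifactName implementation in TheRock may differ; this is an assumed sketch, not the actual class):

from dataclasses import dataclass

@dataclass(frozen=True)
class ParsedArtifactName:
    name: str           # e.g. "core-hip"
    component: str      # e.g. "lib", "dev", "test", "run"
    target_family: str  # "generic" or a target-specific family

def parse_artifact_name(file_name: str) -> ParsedArtifactName:
    # Assumes a "{name}_{component}_{target_family}.tar.xz" pattern.
    stem = file_name.removesuffix(".tar.xz")
    name, component, target_family = stem.rsplit("_", maxsplit=2)
    return ParsedArtifactName(name, component, target_family)

print(parse_artifact_name("core-hip_dev_generic.tar.xz"))
# -> ParsedArtifactName(name='core-hip', component='dev', target_family='generic')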

While I was modifying the files, I also made these changes:

  • More type hints
  • New flags: --verbose and --dry-run (see the argparse sketch after this list)
  • Line-wrapping and style tweaks for documentation/comments
  • Renamed --build-dir to --output-dir
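
A minimal sketch of how the new flags could be wired up; the flag names come from this PR's description, but everything else here (the parser setup, the logging behavior) is an assumption, not the actual fetch_artifacts.py code:

import argparse
import logging

def make_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Fetch build artifacts.")
    parser.add_argument("--output-dir", required=True, help="Where to place downloads (formerly --build-dir)")
    parser.add_argument("--verbose", action="store_true", help="Enable debug logging")
    parser.add_argument("--dry-run", action="store_true", help="Show what would be downloaded without fetching")
    return parser

def main() -> None:
    args = make_parser().parse_args()
    logging.basicConfig(level=logging.DEBUG if args.verbose else logging.INFO)
    if args.dry_run:
        logging.info("Dry run: would download artifacts to %s", args.output_dir)
        return
    # ... actual download logic would go here ...

if __name__ == "__main__":
    main()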

This change couldn't easily be split into multiple commits due to how interconnected these scripts are and how repetitive some of the documentation is. I've at least added a few new unit tests so future changes can be made with more confidence. I could split this into smaller PRs on request, though many of the changes overlap.

Comment on lines -142 to -153
base_artifacts = [
    "core-runtime_run",
    "core-runtime_lib",
    "sysdeps_lib",
    "base_lib",
    "amd-llvm_run",
    "amd-llvm_lib",
    "core-hip_lib",
    "core-hip_dev",
    "rocprofiler-sdk_lib",
    "host-suite-sparse_lib",
]
@ScottTodd (Member, Author) commented:

Ah... this is a workaround for us not having the artifact dependency graph modeled.

When using these scripts locally, I typically want "everything", maybe excluding certain components (e.g. tests, docs) and maybe excluding certain names (e.g. miopen).

When running on CI for tests of an individual project, we want that project's test artifacts and whatever they depend on.
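
A toy sketch of the dependency modeling being discussed; the mapping below is invented purely for illustration (the point of this comment is that TheRock does not currently model this graph):

ARTIFACT_DEPS = {
    "core-hip_test": ["core-hip_lib", "core-runtime_lib"],
    "core-hip_lib": ["amd-llvm_lib", "sysdeps_lib"],
    "core-runtime_lib": ["sysdeps_lib"],
    "amd-llvm_lib": [],
    "sysdeps_lib": [],
}

def transitive_deps(root: str, deps: dict[str, list[str]]) -> set[str]:
    # Walks the graph to collect the root plus everything it depends on.
    needed: set[str] = set()
    stack = [root]
    while stack:
        artifact = stack.pop()
        if artifact not in needed:
            needed.add(artifact)
            stack.extend(deps.get(artifact, []))
    return needed

print(sorted(transitive_deps("core-hip_test", ARTIFACT_DEPS)))
# -> ['amd-llvm_lib', 'core-hip_lib', 'core-hip_test', 'core-runtime_lib', 'sysdeps_lib']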

ScottTodd added a commit that referenced this pull request Jun 16, 2025
Splitting this change off from #772.

* Replaced tuple with `ArtifactDownloadRequest` dataclass. We can more
easily attach metadata to this, use it in unit tests, and pass it
between more functions without needing to remember what the tuple
represents.
* Renamed some functions and variables to reflect what they _do_, not
how they are implemented. This makes it easier to change implementations
later without also needing to change the function names.
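
A sketch of what such a dataclass might look like; the field names here are assumptions for illustration, not the actual definition from that commit:

from dataclasses import dataclass
from pathlib import Path

@dataclass(frozen=True)
class ArtifactDownloadRequest:
    # What to download and where to put it; metadata fields can be added
    # later without breaking call sites, unlike a positional tuple.
    url: str
    output_path: Path

request = ArtifactDownloadRequest(
    url="https://example.com/artifacts/core-hip_lib_generic.tar.xz",
    output_path=Path("artifacts/core-hip_lib_generic.tar.xz"),
)
print(f"would fetch {request.url} -> {request.output_path}")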
ScottTodd added a commit that referenced this pull request Jun 16, 2025
Splitting this change off from #772.

This will make it easier to work with artifacts that are on S3 and other
non-local locations, where we'll have a URL and a filename, not a Path.
We can also see about using
[`cloudpathlib`](https://cloudpathlib.drivendata.org/stable/), "A Python
library with classes that mimic `pathlib.Path`'s interface for URIs from
different cloud storage services".
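
For example (bucket and key below are placeholders; assumes `pip install cloudpathlib[s3]`):

from cloudpathlib import CloudPath

artifact = CloudPath("s3://example-bucket/artifacts/core-hip_lib_generic.tar.xz")
print(artifact.name)    # pathlib-style attribute: "core-hip_lib_generic.tar.xz"
print(artifact.parent)  # s3://example-bucket/artifacts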
ScottTodd added a commit that referenced this pull request Jun 17, 2025
Splitting off more changes from
#772.

I'm hoping to reuse this script for local rocm Python wheel building,
particularly on Windows: #827, so
I'll have a few more refactors coming soon. We should also be able to
use the script on Linux after some workflow refactors here:
https://github.com/ROCm/TheRock/blob/fc8767523fc639a17ba2fd9a5b5137eca334617f/.github/workflows/release_portable_linux_packages.yml#L136-L155

See also discussion on #779.
ScottTodd added a commit that referenced this pull request Jun 27, 2025
I have more ideas for how to refactor here (see
#772), but this is a reasonably
non-invasive way to download all artifacts from a workflow run without
affecting existing workflows that only want to download a subset.

A few things I'm grappling with on the design side here, which might
warrant a new script that could replace this one (roughly sketched after this list):

* List available workflow runs, release builds, etc. (from S3
directories, an index page, or the github API?)
* Enumerate files available for a given run (artifacts archives,
tarballs, python wheels)
* Filter files (based on target, category like "run" vs "test" vs
"doc"), optionally tracking dependencies
* Download files (staging in temp dirs? caching?)
* Extract files (as needed), optionally deleting the originals if not
cached
* Install files (into a venv for wheels)
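
Put together as stub functions, that hypothetical replacement script might decompose like this; every name and signature below is an assumption, not real code:

from pathlib import Path

def list_runs() -> list[str]:
    """List available workflow runs and release builds (S3 directories, an index page, or the GitHub API)."""
    ...

def enumerate_files(run_id: str) -> list[str]:
    """Enumerate files available for a run: artifact archives, tarballs, Python wheels."""
    ...

def filter_files(files: list[str], targets: set[str], categories: set[str]) -> list[str]:
    """Filter by target and category ("run" vs "test" vs "doc"), optionally following dependencies."""
    ...

def download(files: list[str], staging_dir: Path) -> list[Path]:
    """Download files, possibly staging in temp dirs or a cache."""
    ...

def extract(archives: list[Path], keep_originals: bool) -> list[Path]:
    """Extract as needed, optionally deleting the originals if not cached."""
    ...

def install_wheels(wheels: list[Path], venv_dir: Path) -> None:
    """Install wheels into a venv."""
    ...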
@ScottTodd (Member, Author) commented:
I've landed a few parts of this in smaller commits already. Closing stale PR.

@ScottTodd ScottTodd closed this Jul 10, 2025
@github-project-automation github-project-automation bot moved this from TODO to Done in TheRock Triage Jul 10, 2025
@ScottTodd ScottTodd deleted the users/scotttodd/s3-fetching-qol branch July 10, 2025 17:12