Skip to content

Commit eb109a6

Browse files
ScottToddclaude
andauthored
Introduce tests for artifact structure (#3802)
## Motivation Progress on #3796. This adds a more comprehensive set of tests using the actual built artifacts, unlike the unit test in #3765 that just does static analysis of the artifact descriptors (toml files). > [!IMPORTANT] > The tests are currently FAILING, so I didn't hook them up to any workflows yet. > > Proposed plan: > 1. (This PR) Land the tests, runnable via workflow_dispatch > 2. Fix the artifact definitions / population code to the tests start passing > 3. Integrate the tests into existing CI workflows Running these tests continuously will guard against future regressions that would otherwise go unnoticed or be difficult to debug (race conditions!). ## Technical Details Files in artifacts should be unique across all artifacts. If they are not, we use extra storage and can run into race conditions during artifact extraction/flattening. This adds two key tests: 1. Cross-artifact collision testing (e.g. 'blas' vs 'support') 2. Within-artifact collision testing (e.g. 'blas_dev' vs 'blas_test') ## Test Plan * Fetch various sets of artifacts with `fetch_artifacts.py` locally and run the tests locally ```bash set THEROCK_ARTIFACTS_DIR=D:/scratch/claude/artifacts/22681848469-linux-gfx94X-dcgpu python -m pytest tests/test_artifact_structure.py -v ``` * Triggered test runs with workflow_dispatch in my fork ## Test Result | Run | Format | Fetch | Validate | Result | |---|---|---|---|---| | [22743296445](https://github.com/ScottTodd/TheRock/actions/runs/22743296445) | classic `.tar.xz` | 74s | 169s | 2 FAILED | | [22743345907](https://github.com/ScottTodd/TheRock/actions/runs/22743345907) | kpack `.tar.zst` | 12s | 18s | 2 FAILED | > [!NOTE] > * These are on github-hosted runners which may have slower network access to S3 than our eventual S3 hosted runners > * The multi-arch tests using .zst are much faster than the classic tests using .xz Cross-artifact logs snippet: ``` E Failed: Found 29 cross-artifact collision(s) across 193 archives (see #3796): E include/mxDataGenerator/DataGenerator.hpp E - blas (blas_dev_gfx94X-dcgpu.tar.xz) E - support (support_dev_generic.tar.xz) E include/mxDataGenerator/PreSwizzle.hpp E - blas (blas_dev_gfx94X-dcgpu.tar.xz) E - support (support_dev_generic.tar.xz) ``` (27 files in `include/mxDataGenerator/*`, 9 files in `lib/cmake/mxDataGenerator/*`, all shared between `blas_dev` and `support_dev`) Within-artifact logs snippet: ``` E Failed: Found within-artifact component collisions in 10 artifact(s) (4682 total files). Components should be disjoint (see #3796): E base (33 files): E share/rocprofiler-register/tests/CMakeLists.txt [run, test] E share/rocprofiler-register/tests/amdhip/CMakeLists.txt [run, test] E share/rocprofiler-register/tests/amdhip/amdhip.cpp [run, test] E share/rocprofiler-register/tests/amdhip/amdhip.hpp [run, test] E share/rocprofiler-register/tests/bin/CMakeLists.txt [run, test] E ... and 28 more E blas (1 files): E share/hipsparse/test/hipsparse_clientmatrices.cmake [dev, test] E core-ocl (2 files): E share/opencl/ocltst/liboclperf.so [lib, test] E share/opencl/ocltst/liboclruntime.so [lib, test] ``` Within-artifact summary: ``` ┌─────────────────────┬─────────────┬───────────────────┐ │ Artifact │ Components │ Overlapping files │ ├─────────────────────┼─────────────┼───────────────────┤ │ rocprofiler-compute │ lib vs test │ 4,139 │ ├─────────────────────┼─────────────┼───────────────────┤ │ rocprofiler-sdk │ run vs test │ 452 │ ├─────────────────────┼─────────────┼───────────────────┤ │ rocprofiler-sdk │ lib vs test │ 10 │ ├─────────────────────┼─────────────┼───────────────────┤ │ base │ run vs test │ 33 │ ├─────────────────────┼─────────────┼───────────────────┤ │ rocprofiler-compute │ run vs test │ 22 │ ├─────────────────────┼─────────────┼───────────────────┤ │ rocrtst │ run vs test │ 12 │ ├─────────────────────┼─────────────┼───────────────────┤ │ rdc │ lib vs test │ 4 │ ├─────────────────────┼─────────────┼───────────────────┤ │ rocdecode │ dev vs test │ 3 │ ├─────────────────────┼─────────────┼───────────────────┤ │ rocgdb │ dev vs test │ 2 │ ├─────────────────────┼─────────────┼───────────────────┤ │ core-ocl │ lib vs test │ 2 │ ├─────────────────────┼─────────────┼───────────────────┤ │ blas │ dev vs test │ 1 │ ├─────────────────────┼─────────────┼───────────────────┤ │ miopen │ run vs test │ 1 │ ├─────────────────────┼─────────────┼───────────────────┤ │ rocprofiler-compute │ doc vs test │ 1 │ └─────────────────────┴─────────────┴───────────────────┘ ``` ## Submission Checklist - [x] Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests. --------- Co-authored-by: Claude <[email protected]>
1 parent 222992d commit eb109a6

File tree

2 files changed

+384
-0
lines changed

2 files changed

+384
-0
lines changed
Lines changed: 74 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,74 @@
1+
# Copyright Advanced Micro Devices, Inc.
2+
# SPDX-License-Identifier: MIT
3+
4+
name: Test Artifact Structure
5+
6+
on:
7+
workflow_dispatch:
8+
inputs:
9+
artifact_group:
10+
type: string
11+
description: "Artifact group (e.g. gfx94X-dcgpu, gfx110X-all)"
12+
required: true
13+
amdgpu_targets:
14+
type: string
15+
description: "Individual GPU targets for split artifacts (e.g. gfx942, gfx1100)"
16+
default: ""
17+
artifact_run_id:
18+
type: string
19+
description: "Run ID to fetch artifacts from (defaults to current run)"
20+
default: ""
21+
workflow_call:
22+
inputs:
23+
artifact_group:
24+
type: string
25+
required: true
26+
amdgpu_targets:
27+
type: string
28+
default: ""
29+
artifact_run_id:
30+
type: string
31+
default: ""
32+
33+
permissions:
34+
contents: read
35+
36+
jobs:
37+
test_artifact_structure:
38+
name: "Validate artifact structure (${{ inputs.artifact_group }})"
39+
runs-on: ${{ github.repository_owner == 'ROCm' && 'azure-linux-scale-rocm' || 'ubuntu-24.04' }}
40+
env:
41+
ARTIFACT_RUN_ID: "${{ inputs.artifact_run_id != '' && inputs.artifact_run_id || github.run_id }}"
42+
ARTIFACT_GROUP: ${{ inputs.artifact_group }}
43+
AMDGPU_TARGETS: ${{ inputs.amdgpu_targets }}
44+
ARTIFACTS_DIR: ${{ github.workspace }}/artifacts
45+
steps:
46+
- name: Checkout repository
47+
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
48+
49+
- name: Setting up Python
50+
uses: actions/setup-python@a309ff8b426b58ec0e2a45f0f869d46889d02405 # v6.2.0
51+
with:
52+
python-version: "3.12"
53+
54+
- name: Install test requirements
55+
run: |
56+
pip install -r requirements-test.txt
57+
58+
- name: Fetch artifact archives (no extract)
59+
env:
60+
GITHUB_TOKEN: ${{ github.token }}
61+
run: |
62+
python ./build_tools/fetch_artifacts.py \
63+
--run-id=${ARTIFACT_RUN_ID} \
64+
--artifact-group=${ARTIFACT_GROUP} \
65+
--amdgpu-targets=${AMDGPU_TARGETS} \
66+
--output-dir=${ARTIFACTS_DIR} \
67+
--no-extract
68+
69+
- name: Validate artifact structure
70+
env:
71+
THEROCK_ARTIFACTS_DIR: ${{ env.ARTIFACTS_DIR }}
72+
run: |
73+
python -m pytest tests/test_artifact_structure.py \
74+
-v --log-cli-level=info --timeout=300

tests/test_artifact_structure.py

Lines changed: 310 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,310 @@
1+
# Copyright Advanced Micro Devices, Inc.
2+
# SPDX-License-Identifier: MIT
3+
4+
"""Structural validation of artifact archives.
5+
6+
These tests scan artifact archives WITHOUT extracting them and check invariants
7+
that the build system should maintain. They run on CPU (no GPU required) and
8+
are designed to catch issues before artifacts are installed or tested on GPU
9+
runners.
10+
11+
Tests:
12+
- Cross-artifact overlaps: no two artifacts should produce files that
13+
flatten to the same path (causes silent overwrites or race conditions,
14+
see https://github.com/ROCm/TheRock/issues/3758).
15+
- Within-artifact component overlaps: different components (lib, run,
16+
test, etc.) of the same artifact should contain disjoint files (the
17+
component scanner should enforce this via the extends chain).
18+
- Manifest validation: every archive should have artifact_manifest.txt
19+
as its first member.
20+
21+
Usage:
22+
THEROCK_ARTIFACTS_DIR=/path/to/archives \\
23+
python -m pytest tests/test_artifact_structure.py -v --log-cli-level=info
24+
25+
THEROCK_ARTIFACTS_DIR should point to a directory containing artifact archives
26+
(*.tar.zst or *.tar.xz) as produced by the build pipeline.
27+
"""
28+
29+
import dataclasses
30+
import logging
31+
import os
32+
import sys
33+
from collections import defaultdict
34+
from pathlib import Path
35+
36+
import pytest
37+
38+
THIS_DIR = Path(__file__).resolve().parent
39+
sys.path.insert(0, str(THIS_DIR.parent / "build_tools"))
40+
41+
from _therock_utils.artifacts import ArtifactName, _open_archive_for_read
42+
43+
logger = logging.getLogger(__name__)
44+
45+
ARTIFACTS_DIR = os.getenv("THEROCK_ARTIFACTS_DIR", "")
46+
47+
48+
@pytest.fixture(scope="session")
49+
def artifacts_dir() -> Path:
50+
if not ARTIFACTS_DIR:
51+
pytest.skip("THEROCK_ARTIFACTS_DIR not set")
52+
path = Path(ARTIFACTS_DIR).resolve()
53+
if not path.is_dir():
54+
pytest.fail(f"THEROCK_ARTIFACTS_DIR is not a directory: {path}")
55+
return path
56+
57+
58+
def list_archive_files(archive_path: Path) -> tuple[list[str], list[str]]:
59+
"""List the flattened file paths in an artifact archive.
60+
61+
Reads artifact_manifest.txt (must be the first tar member) to get basedir
62+
prefixes, then strips those prefixes from each file member to produce
63+
the flattened output path.
64+
65+
Returns:
66+
(manifest_prefixes, flattened_file_paths)
67+
"""
68+
prefixes: list[str] = []
69+
flattened: list[str] = []
70+
71+
with _open_archive_for_read(archive_path) as tf:
72+
# Extract prefixes from the artifact manifest.
73+
manifest_member = tf.next()
74+
if manifest_member is None or manifest_member.name != "artifact_manifest.txt":
75+
raise ValueError(
76+
f"{archive_path.name}: expected artifact_manifest.txt as first member, "
77+
f"got {manifest_member.name if manifest_member else 'empty archive'}"
78+
)
79+
with tf.extractfile(manifest_member) as mf:
80+
# The artifact_manifest.txt will contain prefix lines like:
81+
# math-libs/BLAS/hipBLAS-common/stage
82+
# math-libs/BLAS/rocRoller/stage
83+
# math-libs/BLAS/hipBLAS/stage
84+
prefixes = [
85+
line for line in mf.read().decode().splitlines() if line.strip()
86+
]
87+
88+
# Strip prefixes from each file member to produce flattened paths.
89+
# For example:
90+
# prefix: math-libs/BLAS/hipBLAS/stage
91+
# member: math-libs/BLAS/hipBLAS/stage/include/hipblas/hipblas.h
92+
# flattened: include/hipblas/hipblas.h
93+
while member := tf.next():
94+
if member.isdir():
95+
continue
96+
name = member.name
97+
for prefix in prefixes:
98+
prefix_slash = prefix + "/"
99+
if name.startswith(prefix_slash):
100+
flattened_path = name[len(prefix_slash) :]
101+
if flattened_path:
102+
flattened.append(flattened_path)
103+
break
104+
105+
return prefixes, flattened
106+
107+
108+
def discover_archives(artifacts_dir: Path) -> list[Path]:
109+
"""Find all artifact archives in a directory."""
110+
archives = []
111+
for ext in ("*.tar.zst", "*.tar.xz"):
112+
archives.extend(artifacts_dir.glob(ext))
113+
return sorted(archives)
114+
115+
116+
@dataclasses.dataclass
117+
class ArchiveInfo:
118+
"""Metadata and flattened file listing for a single artifact archive."""
119+
120+
artifact_name: str
121+
component: str
122+
filename: str
123+
flattened_paths: set[str]
124+
125+
126+
@dataclasses.dataclass
127+
class CrossArtifactOverlap:
128+
"""A flattened file path found in multiple artifacts."""
129+
130+
path: str
131+
sources: list[tuple[str, str]] # (artifact_name, archive_filename)
132+
133+
134+
@dataclasses.dataclass
135+
class ComponentOverlap:
136+
"""Files duplicated across components within one artifact."""
137+
138+
artifact_name: str
139+
overlaps: dict[str, list[str]] # path -> [component_name, ...]
140+
141+
142+
@pytest.fixture(scope="session")
143+
def archive_index(artifacts_dir: Path) -> list[ArchiveInfo]:
144+
"""Scan all archives once and return a flat list of ArchiveInfo."""
145+
archives = discover_archives(artifacts_dir)
146+
if not archives:
147+
pytest.skip(f"No artifact archives found in {artifacts_dir}")
148+
149+
index: list[ArchiveInfo] = []
150+
skipped: list[str] = []
151+
152+
for archive_path in archives:
153+
archive_name = archive_path.name
154+
an = ArtifactName.from_filename(archive_name)
155+
if an is None:
156+
logger.warning("Skipping unrecognized archive: %s", archive_name)
157+
skipped.append(archive_name)
158+
continue
159+
160+
logger.info("Listing %s ...", archive_name)
161+
try:
162+
_prefixes, flattened_paths = list_archive_files(archive_path)
163+
except Exception:
164+
logger.exception("Failed to read %s", archive_name)
165+
skipped.append(archive_name)
166+
continue
167+
168+
index.append(
169+
ArchiveInfo(
170+
artifact_name=an.name,
171+
component=an.component,
172+
filename=archive_name,
173+
flattened_paths=set(flattened_paths),
174+
)
175+
)
176+
177+
if skipped:
178+
logger.warning("Skipped %d archives: %s", len(skipped), skipped)
179+
180+
artifact_names = {a.artifact_name for a in index}
181+
logger.info(
182+
"Indexed %d archives across %d artifacts", len(index), len(artifact_names)
183+
)
184+
return index
185+
186+
187+
def _format_cross_artifact_overlaps(overlaps: list[CrossArtifactOverlap]) -> str:
188+
"""Format cross-artifact overlaps into readable summary."""
189+
lines = []
190+
for overlap in sorted(overlaps, key=lambda o: o.path):
191+
lines.append(f" {overlap.path}")
192+
for label, archive in sorted(overlap.sources):
193+
lines.append(f" - {label} ({archive})")
194+
return "\n".join(lines)
195+
196+
197+
def _format_component_overlaps(overlaps: list[ComponentOverlap]) -> str:
198+
"""Format within-artifact component overlaps into readable summary."""
199+
lines = []
200+
total = 0
201+
for overlap in sorted(overlaps, key=lambda o: o.artifact_name):
202+
total += len(overlap.overlaps)
203+
lines.append(f" {overlap.artifact_name} ({len(overlap.overlaps)} files):")
204+
for fpath in sorted(overlap.overlaps):
205+
comps = sorted(set(overlap.overlaps[fpath]))
206+
lines.append(f" {fpath} [{', '.join(comps)}]")
207+
return total, "\n".join(lines)
208+
209+
210+
class TestArtifactStructure:
211+
"""Structural validation of artifact archives."""
212+
213+
def test_no_cross_artifact_overlaps(self, archive_index: list[ArchiveInfo]):
214+
"""No two artifacts should contain files that flatten to the same path.
215+
216+
This catches both same-basedir overlaps (like #3758) and the subtler
217+
cross-basedir case where different stage dirs install identically-named
218+
files (e.g., two subprojects both installing "bin/sequence.yaml").
219+
220+
See https://github.com/ROCm/TheRock/issues/3796
221+
"""
222+
# For each flattened path, track which artifacts contain it.
223+
# flattened_path -> { artifact_name: set of archive filenames }
224+
path_artifacts: dict[str, dict[str, set[str]]] = defaultdict(
225+
lambda: defaultdict(set)
226+
)
227+
228+
for info in archive_index:
229+
for fpath in info.flattened_paths:
230+
path_artifacts[fpath][info.artifact_name].add(info.filename)
231+
232+
# A cross-artifact overlap is a path claimed by more than one artifact
233+
# (e.g. "blas", "support").
234+
overlaps: list[CrossArtifactOverlap] = []
235+
for fpath, artifact_archives in path_artifacts.items():
236+
if len(artifact_archives) > 1:
237+
sources = []
238+
for artifact_name, archives in artifact_archives.items():
239+
for archive in archives:
240+
sources.append((artifact_name, archive))
241+
overlaps.append(CrossArtifactOverlap(path=fpath, sources=sources))
242+
243+
if overlaps:
244+
summary = _format_cross_artifact_overlaps(overlaps)
245+
pytest.fail(
246+
f"Found {len(overlaps)} cross-artifact overlap(s) across "
247+
f"{len(archive_index)} archives "
248+
f"(see https://github.com/ROCm/TheRock/issues/3796):\n{summary}"
249+
)
250+
251+
logger.info(
252+
"Checked %d unique paths across %d archives, no cross-artifact overlaps",
253+
len(path_artifacts),
254+
len(archive_index),
255+
)
256+
257+
def test_no_within_artifact_component_overlaps(
258+
self, archive_index: list[ArchiveInfo]
259+
):
260+
"""Components of the same artifact should contain disjoint files.
261+
262+
The component scanner (artifact_builder.py) enforces disjointness via
263+
the extends chain (lib -> run -> dbg -> dev -> doc). However, the
264+
"test" component has no extends, so it can re-claim files already taken
265+
by other components.
266+
267+
See https://github.com/ROCm/TheRock/issues/3796
268+
"""
269+
# Group archives by artifact name, merging target variants per component.
270+
# artifact_name -> component -> set of flattened paths
271+
by_artifact: dict[str, dict[str, set[str]]] = defaultdict(
272+
lambda: defaultdict(set)
273+
)
274+
for info in archive_index:
275+
by_artifact[info.artifact_name][info.component].update(info.flattened_paths)
276+
277+
# A component overlap is a path claimed by more than one component
278+
# within a single artifact name (e.g. "blas_run", "blas_test").
279+
overlaps: list[ComponentOverlap] = []
280+
for artifact_name, component_files in by_artifact.items():
281+
component_names = sorted(component_files.keys())
282+
# fpath -> list of component names that contain it
283+
artifact_overlaps: dict[str, list[str]] = {}
284+
for i in range(len(component_names)):
285+
for j in range(i + 1, len(component_names)):
286+
c1, c2 = component_names[i], component_names[j]
287+
for fpath in component_files[c1] & component_files[c2]:
288+
artifact_overlaps.setdefault(fpath, []).extend([c1, c2])
289+
290+
if artifact_overlaps:
291+
overlaps.append(
292+
ComponentOverlap(
293+
artifact_name=artifact_name, overlaps=artifact_overlaps
294+
)
295+
)
296+
297+
if overlaps:
298+
total, summary = _format_component_overlaps(overlaps)
299+
pytest.fail(
300+
f"Found within-artifact component overlaps in "
301+
f"{len(overlaps)} artifact(s) ({total} total files). "
302+
f"Components (_run, _test, etc.) should be disjoint "
303+
f"(see https://github.com/ROCm/TheRock/issues/3796):\n{summary}"
304+
)
305+
306+
artifact_names = {a.artifact_name for a in archive_index}
307+
logger.info(
308+
"Checked %d artifacts, no within-artifact component overlaps",
309+
len(artifact_names),
310+
)

0 commit comments

Comments
 (0)