Make it possible to manually trigger `nightly` tests based on a `branch` build #377

jameslamb · 2025-06-04T23:55:19Z

Description

On branch-25.08 across RAPIDS, as of this writing if you try to do the following:

merge a PR
manually trigger the test.yaml workflow, setting RAPIDS_SHA to the commit from that just-merged PR

The wrong CI artifacts will be downloaded, which means the tests will not actually be testing the intended state of the code, and might even outright fail.

It should be possible to do this and have the tests run against the artifacts built from that just-merged commit.

Benefits of this work

Makes it possible to get a successful run of the nightly tests after merging relevant fixes, without having to wait for the next scheduled run.

Benefits of this:

failing nightlies can block CI on PRs (since build-planning#127, "Make nightly failures more visible to developers")
nightly tests generally cover a wider range of OS / Python / CUDA versions than PR CI, so you want to know as soon as possible if a change has really fixed them
- this can also be determined in PRs, to be fair, using the changes from #308 ("make matrix type configurable")

Acceptance Criteria

it is possible to manually trigger the nightly test workflow (test.yaml) and have it use artifacts produced from a recent branch build

Approach

Let's start with what "wrong CI artifacts will be downloaded" means.

When RAPIDS CI workloads run, they need to find the relevant just-built-in-CI conda packages or wheels to download.
That's done via rapids-github-run-id (rapidsai/gha-tools/tools/rapids-github-run-id).

It's important that this resolve to exactly 1 run ID, to uniquely identify a single artifact to download.

branch and nightly builds can be identical on most of the characteristics that GitHub CLI allows you to filter workflow runs by:

repo
commit SHA
branch
workflow file (by convention, build.yaml)
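
As a rough illustration, those four characteristics map onto `gh run list` filters something like the sketch below. This is not the actual rapids-github-run-id implementation; the helper name `run_list_args` and the example repo and SHA values are made up for illustration.

```python
def run_list_args(repo, sha, branch, workflow="build.yaml"):
    """Build a `gh run list` argument list from the four shared characteristics."""
    return [
        "gh", "run", "list",
        "--repo", repo,
        "--commit", sha,
        "--branch", branch,
        "--workflow", workflow,
        # ask for the run ID plus the fields needed to tell runs apart
        "--json", "databaseId,event,createdAt",
    ]


# Hypothetical values, for illustration only.
args = run_list_args("rapidsai/example", "abc1234", "branch-25.08")
```

A branch build and a nightly build of the same commit would both match these filters, which is why some additional discriminator (today, the triggering event) is needed.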

So today, rapids-github-run-id has logic like "build.yaml runs triggered by a push must be branch builds, build.yaml runs triggered by a workflow_dispatch must be nightly builds".

That is fine under normal operation, by convention:

branch builds are triggered by push when a PR is merged
nightly builds are triggered by workflow_dispatch triggered by a nightly scheduled job (rapidsai/workflows - nightly-pipeline-trigger.yaml)

But in the scenario described in this issue, there is no workflow_dispatch-triggered build.yaml run for that just-created commit SHA, which leads to rapids-github-run-id not finding any runs!

Until rapidsai/gha-tools#192 is fixed, that means that the latest artifacts produced from any successful build.yaml run on that branch will be used. After that is fixed, this will result in a loud "could not find a run ID" type of error.
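
The failure mode can be sketched with made-up run records. This is a simplified model of the event-based convention described above, not the real rapids-github-run-id code; the mapping, function name, run IDs, and SHAs are all hypothetical.

```python
# Assumed convention: the triggering event stands in for the build type.
EVENT_FOR_BUILD_TYPE = {"branch": "push", "nightly": "workflow_dispatch"}


def find_run_ids(runs, sha, build_type):
    """Return IDs of build.yaml runs at `sha` whose event matches the build type."""
    event = EVENT_FOR_BUILD_TYPE[build_type]
    return [r["databaseId"] for r in runs if r["headSha"] == sha and r["event"] == event]


runs = [
    # last night's nightly build, at an older commit
    {"databaseId": 101, "headSha": "aaa111", "event": "workflow_dispatch"},
    # branch build created by merging the PR
    {"databaseId": 102, "headSha": "bbb222", "event": "push"},
]

# A manually triggered nightly test run asks for a "nightly" build at the new commit...
nightly_matches = find_run_ids(runs, "bbb222", "nightly")
# ...but the only build at that commit is the branch build.
branch_matches = find_run_ids(runs, "bbb222", "branch")
```

Here `nightly_matches` is empty even though a perfectly usable build (run 102) exists for that commit.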

@ajschmidt8 and I talked about the fragility of relying on the event type for distinguishing between build types here: rapidsai/gha-tools#164 (comment)

I think the most reliable way to support this pattern is to do as @ajschmidt8 suggested there... make it possible to circumvent all of this by directly providing a run ID as an input when triggering test.yaml workflows.

This could look something like this:

make rapids-github-run-id look for an env variable like RAPIDS_GITHUB_RUN_ID, and just return that if it's present and non-empty
add an input to all the test.yaml workflows allowing users to specify that on calls
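
A minimal sketch of the first step, written in Python for illustration only (it is not the tool's actual code). `RAPIDS_GITHUB_RUN_ID` is the proposed variable name from above; `resolve_run_id` and the `lookup` callback are hypothetical stand-ins for the existing search logic.

```python
import os


def resolve_run_id(lookup, env=None):
    """Return RAPIDS_GITHUB_RUN_ID if present and non-empty, else fall back to `lookup()`."""
    env = os.environ if env is None else env
    override = env.get("RAPIDS_GITHUB_RUN_ID", "")
    if override:
        # an explicit run ID bypasses the event-based search entirely
        return override
    return lookup()


# Made-up run IDs, for illustration only.
explicit = resolve_run_id(lambda: "999", env={"RAPIDS_GITHUB_RUN_ID": "123456"})
fallback = resolve_run_id(lambda: "999", env={"RAPIDS_GITHUB_RUN_ID": ""})
```

With this shape, the new test.yaml input would only need to be exported as that environment variable; leaving the input empty keeps today's behavior.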

Notes

N/A


jameslamb · 2025-06-10T18:22:52Z

Some other design ideas that came out of an offline discussion with @bdice

nightly test runs could install from rapidsai-nightly / nightly PyPI instead of downloading CI artifacts
rapids-github-run-id could be modified to not differentiate between branch and nightly builds, and just take like "the latest build.yaml run on branch {branch} at commit {sha}"
- (would need to think about the implications of that though)
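
For discussion, the second idea might look roughly like the sketch below. It assumes run records carrying the fields that `gh run list --json` can return (`databaseId`, `headBranch`, `headSha`, `event`, `createdAt`); the function name and sample data are made up.

```python
def latest_run_id(runs, branch, sha):
    """Pick the most recently created run on `branch` at `sha`, ignoring the event."""
    candidates = [r for r in runs if r["headBranch"] == branch and r["headSha"] == sha]
    if not candidates:
        raise LookupError(f"no build.yaml run found on {branch} at {sha}")
    # ISO 8601 UTC timestamps in the same format sort correctly as strings
    return max(candidates, key=lambda r: r["createdAt"])["databaseId"]


runs = [
    {"databaseId": 201, "headBranch": "branch-25.08", "headSha": "bbb222",
     "event": "push", "createdAt": "2025-06-10T18:00:00Z"},
    {"databaseId": 202, "headBranch": "branch-25.08", "headSha": "bbb222",
     "event": "workflow_dispatch", "createdAt": "2025-06-11T05:00:00Z"},
    {"databaseId": 203, "headBranch": "branch-25.08", "headSha": "ccc333",
     "event": "push", "createdAt": "2025-06-11T09:00:00Z"},
]

chosen = latest_run_id(runs, "branch-25.08", "bbb222")
```

One implication is visible in the sample data: when both a branch build and a nightly build exist for the same commit, whichever ran last wins, regardless of which one the caller intended.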

bdice · 2025-06-10T18:36:39Z

> nightly test runs could install from rapidsai-nightly / nightly PyPI instead of downloading CI artifacts

This might be fragile, as it introduces a weird race condition with PRs that might have been merged around the same time as nightly builds. It could be nice to have some way to use rapidsai-nightly / nightly PyPI in a manually triggered test job, but it's probably not a hard requirement.

> rapids-github-run-id could be modified to not differentiate between branch and nightly builds, and just take like "the latest build.yaml run on branch {branch} at commit {sha}"

I really just want this for branch builds, not nightlies.

The primary need I have is a way to fix check-nightly-ci. Once we admin-merge a PR that is supposed to fix CI, we need a way to manually run test.yaml against the builds made from merging that PR, to unblock check-nightly-ci.

vyasr · 2025-06-10T21:52:54Z

I'm confused. What would resolving this issue allow that we cannot already do by specifying a build_type of branch in test.yaml, i.e. why isn't rapidsai/build-planning#147 enough? Is the issue that the switch to GitHub artifacts from downloads.rapids.ai broke this functionality?

bdice · 2025-06-10T22:02:44Z

Yes, artifacts broke the previous functionality. They tie artifacts to a run ID, rather than a commit.

vyasr · 2025-06-10T22:33:14Z

OK, got it, so as a result we've gone back to having the same problem that we used to. That makes sense.

jameslamb added the feature request label Jun 4, 2025

jameslamb self-assigned this Jun 10, 2025
