
Commit 8a7cf41

Implement GitHub action

Implements a GitHub action. Refer to gh_action/README.md for its usage. Ref packagecontrol/thecrawl#66. Ref packagecontrol/thecrawl#166.

1 parent 6c09806 commit 8a7cf41

7 files changed: +550 -0 lines

.gitignore

Lines changed: 1 addition & 0 deletions
@@ -17,3 +17,4 @@ dist/
 *.sublime-workspace
 st_package_reviewer/_version.py
 uv.lock
+.thecrawl/

README.md

Lines changed: 5 additions & 0 deletions
@@ -16,6 +16,11 @@ reported by the tool,
 [refer to the wiki][wiki].
 
 
+## Usage as a GitHub Action
+
+See gh_action/README.md for how to run this as a composite action on channel/registry PRs.
+
+
 ## Installation
 
 Requires **Python 3.13**.

gh_action/README.md

Lines changed: 55 additions & 0 deletions
@@ -0,0 +1,55 @@

# PR Channel Action

This composite action diffs a Package Control channel registry between a PR’s base and head commits, crawls only the changed and added packages using a local or cloned copy of thecrawl, downloads each release archive, and runs `st_package_reviewer` on the extracted contents. The job fails if any crawl, download, unzip, or review step fails.
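
Because the entry point is an ordinary shell script, it can also be invoked directly for local testing, outside of Actions. A minimal sketch, assuming `gh` is authenticated and the command is run from a checkout of this repository:

```bash
# Review a channel PR locally (requires gh and uv on PATH)
./gh_action/action.sh \
  --pr https://github.com/wbond/package_control_channel/pull/9236 \
  --file repository.json
# Optionally point at a local clone of thecrawl instead of the default URL:
#   --thecrawl ../thecrawl
```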

## Inputs

- `pr` (required): Full PR URL, e.g. `https://github.com/wbond/package_control_channel/pull/9236`.
- `file` (optional): Path to the channel or repository file inside the repo. Default: `repository.json`.
- `thecrawl` (optional): Path to a local `thecrawl` repo, or a git URL to clone a fork/branch/commit. Default: `https://github.com/packagecontrol/thecrawl`.

You can pin a ref with `@ref` for HTTPS URLs, e.g.:

- `https://github.com/packagecontrol/thecrawl.git@feature-branch`
- `https://github.com/packagecontrol/thecrawl.git@v1.2.3`
- `https://github.com/packagecontrol/thecrawl.git@abc1234`

## Example Usage

```yaml
name: Channel Diff and Review
on:
  pull_request:
    paths:
      - 'repository.json'

jobs:
  diff-and-review:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Diff and review changed/added packages
        uses: ./gh_action
        with:
          pr: ${{ github.event.pull_request.html_url }}
          file: repository.json
          # thecrawl: ../thecrawl  # optional path
          # thecrawl: https://github.com/packagecontrol/thecrawl@my-branch  # optional URL with ref
```

## Notes

- The action ensures `uv` is available via `astral-sh/setup-uv`. GitHub’s hosted runners include `gh` (GitHub CLI) by default.
- If `thecrawl` is not provided, the action clones `https://github.com/packagecontrol/thecrawl`.
- Network access is required to fetch raw files, zipballs, and the GitHub API. For GitHub zipball downloads, the action falls back to `gh api` if `curl` fails.

## What It Does

- Resolves base/head repos and SHAs via `gh pr view`.
- Builds a registry JSON at both SHAs using your local or cloned `thecrawl` (`uv run -m scripts.generate_registry`).
- Diffs registries by package name; prints Removed/Changed/Added to stderr and emits changed+added names to stdout (sketched below).
- For each changed/added package:
  - Runs `uv run -m scripts.crawl --registry <target-registry> --workspace <ws.json> --name <pkg>`.
  - Reads the workspace JSON and downloads each release zip.
  - Unpacks the zip and runs `uv run st_package_reviewer <extracted_dir>`.
  - Aggregates failures and fails the job if any occurred.
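
The diff step is implemented by `gh_action/diff_repository.py`, which is part of this commit but not rendered above. A minimal sketch of the contract `action.sh` relies on — a Removed/Changed/Added summary on stderr, changed and added names on stdout — might look like the following; the top-level `"packages"` key is an assumption about the generated registry's shape:

```python
#!/usr/bin/env python3
"""Hypothetical sketch of diff_repository.py; not the committed implementation."""
import argparse
import json
import sys

parser = argparse.ArgumentParser()
parser.add_argument("--base-file", required=True)
parser.add_argument("--target-file", required=True)
parser.add_argument("--print-changed-added", action="store_true")
args = parser.parse_args()

def load_packages(path):
    # Assumption: the registry JSON maps package names to metadata
    # under a top-level "packages" key.
    with open(path) as f:
        return json.load(f)["packages"]

base = load_packages(args.base_file)
target = load_packages(args.target_file)

removed = sorted(set(base) - set(target))
added = sorted(set(target) - set(base))
changed = sorted(n for n in set(base) & set(target) if base[n] != target[n])

# Human-readable summary on stderr...
for label, names in (("Removed", removed), ("Changed", changed), ("Added", added)):
    print(f"{label}: {', '.join(names) or '<none>'}", file=sys.stderr)

# ...and machine-readable names on stdout for action.sh to consume.
if args.print_changed_added:
    for name in changed + added:
        print(name)
```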

gh_action/action.sh

Lines changed: 276 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,276 @@
1+
#!/usr/bin/env bash
2+
set -euo pipefail
3+
4+
usage() {
5+
cat >&2 <<EOF
6+
Usage: $0 --pr <pr_url> [--file <path>] [--thecrawl <path-or-url[@ref]>]
7+
8+
Arguments:
9+
--pr GitHub Pull Request URL (e.g. https://github.com/wbond/package_control_channel/pull/9236)
10+
--file Path within the repo to the channel JSON (default: repository.json)
11+
--thecrawl Path to local thecrawl repo or URL to clone (supports @ref to pin, default: https://github.com/packagecontrol/thecrawl)
12+
13+
Requires: gh, uv
14+
EOF
15+
}
16+
17+
PR_URL=""
18+
REL_PATH="repository.json"
19+
THECRAWL="https://github.com/packagecontrol/thecrawl"
20+
21+
while [[ $# -gt 0 ]]; do
22+
case "$1" in
23+
--pr)
24+
PR_URL="$2"; shift 2;;
25+
--file)
26+
REL_PATH="$2"; shift 2;;
27+
--thecrawl)
28+
THECRAWL="$2"; shift 2;;
29+
-h|--help)
30+
usage; exit 0;;
31+
*)
32+
echo "Unknown argument: $1" >&2; usage; exit 2;;
33+
esac
34+
done
35+
36+
if [[ -z "$PR_URL" ]]; then
37+
echo "Error: --pr is required" >&2; usage; exit 2
38+
fi
39+
40+
if ! command -v gh >/dev/null 2>&1; then
41+
echo "Error: gh (GitHub CLI) is required" >&2; exit 2
42+
fi
43+
if ! command -v uv >/dev/null 2>&1; then
44+
echo "Error: uv is required" >&2; exit 2
45+
fi
46+
47+
# Robust ZIP downloader with fallback to gh for GitHub zipball URLs
48+
download_zip() {
49+
local url="$1" dest="$2"
50+
mkdir -p "$(dirname "$dest")"
51+
rm -f "$dest.part" "$dest"
52+
# First try curl with retries
53+
if curl -fSL --retry 3 --retry-all-errors --connect-timeout 15 --max-time 600 \
54+
-o "$dest.part" "$url"; then
55+
mv "$dest.part" "$dest"
56+
return 0
57+
fi
58+
rm -f "$dest.part"
59+
# Fallback for codeload.github.com/<owner>/<repo>/zip/<ref>
60+
if [[ "$url" =~ ^https://codeload\.github\.com/([^/]+)/([^/]+)/zip/(.+)$ ]]; then
61+
local owner="${BASH_REMATCH[1]}" repo="${BASH_REMATCH[2]}" ref="${BASH_REMATCH[3]}"
62+
echo " curl failed; using gh api zipball for $owner/$repo@$ref" >&2
63+
if gh api -H "Accept: application/octet-stream" \
64+
"repos/${owner}/${repo}/zipball/${ref}" > "$dest.part"; then
65+
mv "$dest.part" "$dest"
66+
return 0
67+
fi
68+
rm -f "$dest.part"
69+
fi
70+
return 1
71+
}
72+
73+
# Normalize relative path (strip leading ./)
74+
REL_PATH="${REL_PATH#./}"
75+
76+
echo "Resolving PR metadata via gh: $PR_URL" >&2
77+
78+
# Derive base repo from PR URL (owner/repo)
79+
BASE_NWO=$(echo "$PR_URL" | awk -F/ '{print $4"/"$5}')
80+
# Head repo from PR data (may be same as base)
81+
HEAD_NWO=$(gh pr view "$PR_URL" --json headRepository -q '.headRepository.nameWithOwner')
82+
BASE_SHA=$(gh pr view "$PR_URL" --json baseRefOid -q .baseRefOid)
83+
HEAD_SHA=$(gh pr view "$PR_URL" --json headRefOid -q .headRefOid)
84+
85+
if [[ -z "$BASE_NWO" || -z "$BASE_SHA" || -z "$HEAD_SHA" ]]; then
86+
echo "Error: failed to resolve PR details via gh" >&2
87+
echo " PR: $PR_URL" >&2
88+
echo " base nwo: ${BASE_NWO:-<empty>}" >&2
89+
echo " base sha: ${BASE_SHA:-<empty>}" >&2
90+
echo " head nwo: ${HEAD_NWO:-<empty>} (may match base)" >&2
91+
echo " head sha: ${HEAD_SHA:-<empty>}" >&2
92+
echo "Hint:" >&2
93+
echo " - Commands used: 'gh pr view <url> --json baseRefOid,headRefOid,headRepository'" >&2
94+
exit 2
95+
fi
96+
97+
# Fallback: if HEAD_NWO is empty, assume same as base (same-repo PR)
98+
if [[ -z "$HEAD_NWO" ]]; then
99+
HEAD_NWO="$BASE_NWO"
100+
fi
101+
102+
BASE_URL="https://raw.githubusercontent.com/${BASE_NWO}/${BASE_SHA}/${REL_PATH}"
103+
HEAD_URL="https://raw.githubusercontent.com/${HEAD_NWO}/${HEAD_SHA}/${REL_PATH}"
104+
105+
echo "Base URL: $BASE_URL" >&2
106+
echo "Target URL: $HEAD_URL" >&2
107+
108+
# Locate or clone thecrawl
109+
resolve_crawler_path() {
110+
if [[ -n "$THECRAWL" ]]; then
111+
if [[ "$THECRAWL" =~ ^https?:// || "$THECRAWL" =~ ^git@ ]]; then
112+
local repo_path="${GITHUB_WORKSPACE:-$PWD}/.thecrawl"
113+
# For HTTPS URLs, allow trailing @ref
114+
local url_base="$THECRAWL"
115+
local ref=""
116+
if [[ "$url_base" =~ ^https?://.+@.+$ ]]; then
117+
ref="${url_base##*@}"
118+
url_base="${url_base%*@$ref}"
119+
fi
120+
121+
if [[ -d "$repo_path/.git" ]]; then
122+
# Existing clone: update remote and optionally checkout ref
123+
git -C "$repo_path" remote set-url origin "$url_base" >/dev/null 2>&1 || true
124+
if [[ -n "$ref" ]]; then
125+
echo "Checking out thecrawl ref '$ref' in $repo_path" >&2
126+
git -C "$repo_path" fetch --depth 1 origin "$ref" >&2
127+
git -C "$repo_path" checkout -q FETCH_HEAD >&2
128+
fi
129+
echo "$repo_path"; return
130+
fi
131+
132+
if [[ -n "$ref" ]]; then
133+
echo "Cloning thecrawl $url_base at ref '$ref' into $repo_path" >&2
134+
git init -q "$repo_path" >&2
135+
git -C "$repo_path" remote add origin "$url_base" >&2
136+
git -C "$repo_path" fetch --depth 1 origin "$ref" >&2
137+
git -C "$repo_path" checkout -q FETCH_HEAD >&2
138+
else
139+
echo "Cloning thecrawl from $url_base into $repo_path" >&2
140+
git clone --depth 1 "$url_base" "$repo_path" >&2
141+
fi
142+
echo "$repo_path"; return
143+
fi
144+
echo "$THECRAWL"; return
145+
fi
146+
echo "Error: could not resolve thecrawl path" >&2
147+
return 2
148+
}
149+
150+
CRAWLER_REPO=$(resolve_crawler_path)
151+
if [[ ! -d "$CRAWLER_REPO" ]]; then
152+
echo "Error: could not find or clone thecrawl" >&2
153+
exit 2
154+
fi
155+
156+
echo "Using thecrawl at: $CRAWLER_REPO" >&2
157+
158+
TMPDIR=$(mktemp -d)
159+
trap 'rm -rf "$TMPDIR"' EXIT
160+
161+
BASE_REG="$TMPDIR/base_registry.json"
162+
HEAD_REG="$TMPDIR/head_registry.json"
163+
164+
echo "Generating base registry…" >&2
165+
(cd "$CRAWLER_REPO" && uv run -m scripts.generate_registry -c "$BASE_URL" -o "$BASE_REG")
166+
167+
echo "Generating target registry…" >&2
168+
(cd "$CRAWLER_REPO" && uv run -m scripts.generate_registry -c "$HEAD_URL" -o "$HEAD_REG")
169+
170+
SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
171+
# Invoke Python diff to print results and collect changed+added package names
172+
mapfile -t PKGS < <(python3 "$SCRIPT_DIR/diff_repository.py" --base-file "$BASE_REG" --target-file "$HEAD_REG" --print-changed-added \
173+
| tr -d '\r' \
174+
| sed '/^$/d')
175+
176+
if [[ ${#PKGS[@]} -eq 0 ]]; then
177+
echo "No changed or added packages to crawl." >&2
178+
exit 0
179+
fi
180+
181+
echo "Crawling ${#PKGS[@]} package(s) from target registry…" >&2
182+
failures=0
183+
for pkg in "${PKGS[@]}"; do
184+
[[ -z "$pkg" ]] && continue
185+
echo "- Crawling: $pkg" >&2
186+
# Use workspace file output for robust parsing
187+
wsdir="$TMPDIR/workspaces"
188+
mkdir -p "$wsdir"
189+
wsfile="$wsdir/${pkg}.json"
190+
set +e
191+
(cd "$CRAWLER_REPO" && uv run -m scripts.crawl --registry "$HEAD_REG" --workspace "$wsfile" --name "$pkg" 2> >(cat >&2))
192+
STATUS=$?
193+
set -e
194+
if [[ $STATUS -ne 0 || ! -s "$wsfile" ]]; then
195+
echo " ! Crawl failed for $pkg" >&2
196+
failures=$((failures+1))
197+
continue
198+
fi
199+
200+
# Extract release URLs (and versions) from workspace
201+
mapfile -t RELS < <(python3 "$SCRIPT_DIR/parse_workspace.py" "$wsfile" "$pkg")
202+
if [[ ${#RELS[@]} -eq 0 ]]; then
203+
echo " ! No releases found for $pkg" >&2
204+
failures=$((failures+1))
205+
continue
206+
fi
207+
208+
i=0
209+
for rec in "${RELS[@]}"; do
210+
url="${rec%%$'\t'*}"
211+
ver="${rec#*$'\t'}"
212+
# if no tab present, ver==url; fix that
213+
if [[ "$ver" == "$url" ]]; then ver=""; fi
214+
215+
i=$((i+1))
216+
disp_ver="$ver"
217+
[[ -z "$disp_ver" ]] && disp_ver="r$i"
218+
# sanitize for filesystem path
219+
safe_ver=$(printf "%s" "$disp_ver" | tr -d '\r' | sed 's/[^A-Za-z0-9._-]/_/g')
220+
221+
workdir="$TMPDIR/review/$pkg/$safe_ver"
222+
mkdir -p "$workdir"
223+
224+
zipfile="$workdir/pkg.zip"
225+
echo " Downloading release $disp_ver: $url" >&2
226+
if ! download_zip "$url" "$zipfile"; then
227+
echo " ! Download failed for $pkg@$disp_ver" >&2
228+
failures=$((failures+1))
229+
continue
230+
fi
231+
232+
echo " Unpacking…" >&2
233+
# Prefer unzip; fallback to Python zipfile
234+
if command -v unzip >/dev/null 2>&1; then
235+
if ! unzip -q -o "$zipfile" -d "$workdir"; then
236+
echo " ! Unzip failed for $pkg@$disp_ver" >&2
237+
failures=$((failures+1))
238+
continue
239+
fi
240+
else
241+
python3 - "$zipfile" "$workdir" <<'PY'
242+
import sys, zipfile, os
243+
zf = zipfile.ZipFile(sys.argv[1])
244+
zf.extractall(sys.argv[2])
245+
PY
246+
if [[ $? -ne 0 ]]; then
247+
echo " ! Unzip failed for $pkg@$disp_ver (python)" >&2
248+
failures=$((failures+1))
249+
continue
250+
fi
251+
fi
252+
253+
# Determine the top-level extracted directory
254+
topdir=$(find "$workdir" -mindepth 1 -maxdepth 1 -type d | head -n1)
255+
if [[ -z "$topdir" ]]; then
256+
echo " ! Could not locate extracted folder for $pkg@$disp_ver" >&2
257+
failures=$((failures+1))
258+
continue
259+
fi
260+
261+
echo " Reviewing with st_package_reviewer: $topdir" >&2
262+
if ! uv run st_package_reviewer "$topdir"; then
263+
echo " ! Review failed for $pkg@$disp_ver" >&2
264+
failures=$((failures+1))
265+
continue
266+
fi
267+
done
268+
done
269+
270+
if [[ $failures -gt 0 ]]; then
271+
echo "Completed crawling with $failures failure(s)." >&2
272+
exit 1
273+
else
274+
echo "Completed crawling successfully." >&2
275+
exit 0
276+
fi
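
`action.sh` also relies on `gh_action/parse_workspace.py` (part of this commit, not rendered above) to turn the crawl's workspace JSON into one tab-separated `<url>\t<version>` record per release. A minimal sketch under that contract; the workspace layout (a per-package `releases` list with `url` and `version` keys) is an assumption:

```python
#!/usr/bin/env python3
"""Hypothetical sketch of parse_workspace.py; not the committed implementation."""
import json
import sys

wsfile, pkg = sys.argv[1], sys.argv[2]
with open(wsfile) as f:
    ws = json.load(f)

# Assumption: the workspace maps package names to a list of releases,
# each carrying at least a download URL and, optionally, a version.
for release in ws.get("packages", {}).get(pkg, {}).get("releases", []):
    url = release.get("url", "")
    version = release.get("version", "")
    if url:
        # action.sh splits each record on the first tab; an empty version
        # falls back to a synthetic "rN" label there.
        print(f"{url}\t{version}")
```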

gh_action/action.yml

Lines changed: 25 additions & 0 deletions
@@ -0,0 +1,25 @@

name: Package Reviewer PR action
description: Review a repository or channel PR
inputs:
  pr:
    description: Pull Request URL (e.g. https://github.com/wbond/package_control_channel/pull/123)
    required: true
  file:
    description: Path to the channel JSON within the repo
    default: repository.json
    required: false
  thecrawl:
    description: "Optional path to a local thecrawl repo, or a URL to clone (supports @ref to pin)"
    default: https://github.com/packagecontrol/thecrawl
    required: false
runs:
  using: composite
  steps:
    - name: Ensure uv is available
      uses: astral-sh/setup-uv@v3

    - name: Run Package Reviewer
      shell: bash
      run: |
        set -euo pipefail
        "${{ github.action_path }}/action.sh" --pr "${{ inputs.pr }}" --file "${{ inputs.file }}" --thecrawl "${{ inputs.thecrawl }}"
