Commit dff7c70

initial commit
1 parent 1219e34 commit dff7c70

2 files changed

+313
-1
lines changed

.github/workflows/metricx.yml

Lines changed: 129 additions & 0 deletions
@@ -0,0 +1,129 @@
name: Record clone metrics

on:
  workflow_call:
    inputs:
      observed-repo:
        description: "Repository to fetch clone metrics from (OWNER/REPO)"
        required: false
        default: ${{ github.repository }}
        type: string
      metrics-repo:
        description: "Target repo for storing metrics (OWNER/REPO)"
        required: true
        type: string
    secrets:
      METRICS_PAT:
        required: true

permissions:
  contents: read

jobs:
  record:
    runs-on: ubuntu-latest
    steps:
      - name: Install jq
        run: sudo apt-get update && sudo apt-get install -y jq

      - name: Fetch clone stats
        env:
          GH_TOKEN: ${{ secrets.METRICS_PAT }}
          OBSERVED_REPO: ${{ inputs.observed-repo }}
        run: |
          curl -L \
            -D headers.txt \
            -o clones.json \
            -H "Authorization: Bearer $GH_TOKEN" \
            -H "Accept: application/vnd.github+json" \
            -H "X-GitHub-Api-Version: 2022-11-28" \
            "https://api.github.com/repos/$OBSERVED_REPO/traffic/clones"

          CLONES=$(jq -r '.count // 0' clones.json)
          UNIQUES=$(jq -r '.uniques // 0' clones.json)

          echo "CLONES=$CLONES" >> "$GITHUB_ENV"
          echo "UNIQUES=$UNIQUES" >> "$GITHUB_ENV"

      - name: Debug auth identity
        env:
          GH_TOKEN: ${{ secrets.METRICS_PAT }}
        run: |
          curl -s \
            -H "Authorization: Bearer $GH_TOKEN" \
            https://api.github.com/user | jq '{login, id}'

      - name: Debug repo access
        env:
          GH_TOKEN: ${{ secrets.METRICS_PAT }}
          OBSERVED_REPO: ${{ inputs.observed-repo }}
        run: |
          curl -s \
            -H "Authorization: Bearer $GH_TOKEN" \
            "https://api.github.com/repos/$OBSERVED_REPO" | jq '{full_name, private}'

      - name: Debug clone API
        run: |
          echo "===== clones.json ====="
          cat clones.json
          echo "======================"


      - name: Append and Deduplicate metrics CSV
        run: |
          set -e

          # 1. Setup Repo
          git clone https://x-access-token:${{ secrets.METRICS_PAT }}@github.com/${{ inputs.metrics-repo }}.git metrics-temp
          cd metrics-temp
          git config user.name "metrics-bot"
          git config user.email "[email protected]"

          # 2. Setup File
          # Key the CSV on the observed repo so that an overridden observed-repo
          # input is stored under the repo the data actually came from.
          REPO="${{ inputs.observed-repo }}"
          # tr maps character to character, so each "/" becomes one "_"
          FILE="data/$(echo "$REPO" | tr '/' '_').csv"
          mkdir -p data
          if [ ! -f "$FILE" ]; then
            echo "date,repository,clones,uniques" > "$FILE"
          fi

          # 3. GET THE LATEST DATE FROM JSON
          # This looks at the last entry in the clones array and grabs the first 10 chars (YYYY-MM-DD)
          # Note: the last recorded date is not necessarily today's date
          LATEST_JSON_DATE=$(jq -r '.clones[-1].timestamp[0:10]' ../clones.json)

          # If the clones array is empty, fall back to the system date to avoid errors
          if [ "$LATEST_JSON_DATE" == "null" ] || [ -z "$LATEST_JSON_DATE" ]; then
            LATEST_JSON_DATE=$(date +%F)
          fi

          # 4. APPEND ALL DATA (Raw)
          # Get daily stats from clones.json and append to $FILE
          jq -r --arg REPO "$REPO" '.clones[] | [.timestamp[0:10], $REPO, .count, .uniques] | @csv' ../clones.json >> "$FILE"
          # Append the summary using the LATEST_JSON_DATE and the Tilde (~) for sorting
          jq -r --arg REPO "$REPO" --arg DATE "$LATEST_JSON_DATE" \
            '[ "\($DATE)~ 14-day total", $REPO, .count, .uniques ] | @csv' ../clones.json >> "$FILE"

          # 5. DEDUPLICATE AND SORT DESCENDING (The "Magic" Step)
          # Check if the first line is actually the header. If not, we add it.
          if ! head -n 1 "$FILE" | grep -q "^date,"; then
            # No header found? Prepend it to the file.
            SED_HEADER="date,repository,clones,uniques"
            sed -i "1i $SED_HEADER" "$FILE"
          fi

          # Save the header to the tmp file first
          head -n 1 "$FILE" > "$FILE.tmp"
          # Take everything EXCEPT the header (tail -n +2), keep the last row seen
          # for each (date, repository) key with awk, and sort descending by date
          # -F, : Use comma as separator
          tail -n +2 "$FILE" | awk -F, '{a[$1,$2]=$0} END {for (i in a) print a[i]}' | sort -t, -k1,1r >> "$FILE.tmp"

          mv "$FILE.tmp" "$FILE"

          # 6. Commit and Push
          git add "$FILE"
          git commit -m "metrics: record and deduplicate stats for $REPO" || true
          git push

README.md

Lines changed: 184 additions & 1 deletion
@@ -1,2 +1,185 @@
1-
# github-clone-archiver
# Metrics Workflow 📊

This is a reusable GitHub Actions workflow designed to automate the collection and archival of repository clone statistics. It solves the "14-day limit" problem of GitHub's native traffic insights by persisting data into a central repository.

## 🏗 Architecture

The system uses a **three-repo architecture** to maintain security and organization:

1. **Workflows Repo (`metrics-workflows`)**: Central hub containing the reusable logic (`metrics.yml`).
2. **Observed Repo(s)**: The repositories being tracked. Each triggers the workflow on a schedule.
3. **Observer Repo (`metrics-database`)**: The central storage where `.csv` files are maintained and updated.

---

### 🔍 How it Works

GitHub only keeps traffic data (clones and visitors) for **14 days**. This workflow acts as a "Data Logger":

1. It runs on a daily schedule and asks the GitHub API for the clone history of the **Observed Repo**.
2. It clones your **Observer Repo** and opens the CSV log for that repository, creating it if needed.
3. It appends every daily row the API returned, plus a "14-day total" summary row dated with the latest day in the response.
4. It deduplicates the file by date and repository, keeping the newest copy of each row, and sorts it so the most recent stats are always at the top.

> [!IMPORTANT]
> **Access Requirement:** You must be the owner or a collaborator with appropriate permissions on the **Observed Repos**. This workflow requires an authorized Personal Access Token (PAT) to read traffic data that is otherwise hidden from the public.


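
The append, deduplicate, and sort steps above can be sketched in plain shell. This is a minimal illustration under stated assumptions, not the workflow's own code: `merge_rows` is a hypothetical helper, the sample rows are invented, and fields are left unquoted for readability (the workflow's `jq ... @csv` output quotes string fields). Feeding two overlapping batches into the same file shows why re-running the job is safe: the second batch replaces the 2024-05-02 row instead of duplicating it, and the header stays on the first line.

```shell
# merge_rows FILE: append CSV rows from stdin, then keep one row per
# (date, repository) key and sort newest first. Hypothetical helper.
HEADER="date,repository,clones,uniques"

merge_rows() {
  # Write the header if the file is missing or empty.
  [ -s "$1" ] || echo "$HEADER" > "$1"
  # Append the new rows read from stdin.
  cat >> "$1"
  # Rebuild the file: header first, then deduplicated rows.
  # A later row overwrites an earlier one with the same key.
  {
    head -n 1 "$1"
    tail -n +2 "$1" \
      | awk -F, '{ row[$1 FS $2] = $0 } END { for (k in row) print row[k] }' \
      | LC_ALL=C sort -t, -k1,1r
  } > "$1.tmp"
  mv "$1.tmp" "$1"
}

# Two overlapping batches, as two consecutive daily runs would produce.
demo=$(mktemp)
printf '%s\n' '2024-05-01,me/app,3,2' '2024-05-02,me/app,1,1' | merge_rows "$demo"
printf '%s\n' '2024-05-02,me/app,4,2' '2024-05-03,me/app,5,3' | merge_rows "$demo"
cat "$demo"
```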
---

## 🔐 Security & Permissions

To allow a workflow running in an **Observed Repo** to write data to the **Observer Repo**, specific permissions must be configured via a Fine-Grained Personal Access Token (PAT).

### 1. The Personal Access Token (PAT)

Create a token named **"Metrics Workflow"** in your [Developer Settings](https://github.com/settings/tokens?type=beta) under Personal Access Tokens/Fine-grained tokens:

* **Repository access**:
  * Select **Only select repositories**.
  * Include all **Observed Repositories** AND the **Observer Repository**.
* **Permissions**:
  * `Administration`: Read-only (required to access the `/traffic/clones` API).
  * `Metadata`: Read-only.
  * `Contents`: **Read & Write** (required to push `.csv` updates).

### 2. Repository Secrets

In **each** Observed Repo (Settings > Secrets and variables > Actions), add the following secret:

* **Name**: `METRICS_PAT`
* **Value**: Paste the token generated above.

> [!NOTE]
> The Observer Repo does not need a secret or a separate PAT of its own. The single **"Metrics Workflow"** PAT has "Write" access to the Observer Repo, so the workflow running in an Observed Repo can push the data once it has finished gathering it.

---

## 🚀 Usage

To implement this in an **Observed Repo**, create a file at `.github/workflows/metrics.yml`:

```yaml
name: Collect Metrics

on:
  schedule:
    - cron: '0 0 * * *' # Runs daily at midnight UTC
  workflow_dispatch: # Allows manual triggering

jobs:
  update-metrics:
    # This is required for the runner to operate within the Observed repo
    permissions:
      contents: read
    uses: myspace/workflows-repo/.github/workflows/metrics.yml@main
    with:
      metrics-repo: myspace/observer-repo
    secrets:
      METRICS_PAT: ${{ secrets.METRICS_PAT }}
```

This is your caller workflow for the observed repository.

---

## 📈 Data Structure

The workflow creates or updates one CSV file per repository in the `data/` directory of the Observer Repo. The file is named after the repository, with `/` replaced by `_` (e.g., `data/myspace_observed-repo.csv`).

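
As a small illustration of that naming rule, the sketch below turns an `OWNER/REPO` string into the CSV path. `csv_path_for` is a hypothetical helper written for this example; only the `data/` prefix, the `/` to `_` substitution, and the `.csv` suffix come from the workflow.

```shell
# csv_path_for OWNER/REPO: print the CSV path used for that repository.
# Hypothetical helper mirroring the workflow's naming rule.
csv_path_for() {
  # tr maps characters one-to-one, so each "/" becomes a single "_".
  printf 'data/%s.csv\n' "$(printf '%s' "$1" | tr '/' '_')"
}

csv_path_for "myspace/observed-repo"
# prints: data/myspace_observed-repo.csv
```

Because GitHub owner names cannot contain `/`, the first `_` produced by the substitution always marks the owner/repo boundary, although repository names that themselves contain `_` make the mapping harder to read back by eye.
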
### Sorting Logic:

The CSV is automatically deduplicated and sorted in **reverse chronological order** (newest first).

* **Daily Stats**: Recorded as `YYYY-MM-DD`.
* **14-day Totals**: Recorded as `YYYY-MM-DD~ 14-day total`.

The tilde (`~`) sorts after digits, letters, and the quote character in byte order. In a descending sort, the **Total** summary for a specific day therefore appears immediately **above** the individual daily stats for that same day.

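
The effect of the tilde can be checked with a few invented rows. The sketch below assumes byte-order collation (`LC_ALL=C`); other locales may rank punctuation differently. `sort_newest_first` is a hypothetical wrapper around the same `sort -t, -k1,1r` call the workflow uses, and the rows are unquoted for readability. The summary row for 2024-05-03 comes out first, followed by the daily rows for 2024-05-03 and 2024-05-02.

```shell
# sort_newest_first: sort CSV rows on stdin by the first field, descending.
# LC_ALL=C forces byte-order comparison, where "~" (0x7E) ranks above
# digits, letters, and the double quote.
sort_newest_first() {
  LC_ALL=C sort -t, -k1,1r
}

printf '%s\n' \
  '2024-05-02,me/app,4,2' \
  '2024-05-03,me/app,5,3' \
  '2024-05-03~ 14-day total,me/app,40,12' \
  | sort_newest_first
```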
---

## 🛠 Maintenance

* **Adding Repos**: To track a new repository, add it to the PAT's repository access list, add the `METRICS_PAT` secret to the new repo, and create the caller workflow.
* **Data Integrity**: The workflow uses `awk` to keep a single row per date and repository. If it runs multiple times in one day, the most recently appended (most complete) row replaces the earlier ones, preventing duplicates.

---

## 🛡️🔐 Single Token vs. High Security

This workflow requires cross-repository permissions. You can choose between a **Standard** setup (the current setup, easier to maintain) or a **High Security** setup (follows the Principle of Least Privilege).

### Option 1: Standard Setup (Single Token)

Recommended for solo developers or small setups.

* **Token Name**: `Metrics-Unified-Token`
* **Scope**: All Observed Repos **AND** the Observer Repo.
* **Permissions**:
  * `Metadata`: Read-only
  * `Administration`: Read-only
  * `Contents`: **Read & Write**
* **Workflow Secret**: Store as `METRICS_PAT` in all Observed repos.

### Option 2: High Security Setup (Dual Token)

Recommended for teams or sensitive source code. This ensures the "Writer" token cannot be used to modify source code in your Observed repositories.

#### A. The "Traffic Reader" Token

* **Scope**: All **Observed Repos** only.
* **Permissions**: `Metadata` (Read), `Administration` (Read).
* **Usage**: Used by the workflow to fetch clone data from the GitHub API.
* **Secret Name**: `READER_PAT`

#### B. The "Database Writer" Token

* **Scope**: The **Observer Repo** only.
* **Permissions**: `Contents` (Read & Write).
* **Usage**: Used by the workflow to `git push` the CSV file.
* **Secret Name**: `WRITER_PAT`

---

## 🛡️🚀 Usage (High Security Example)

If you choose the **High Security** route, update your caller workflow in the Observed Repo as follows:

```yaml
jobs:
  update-metrics:
    uses: myspace/workflows-repo/.github/workflows/metrics.yml@main
    with:
      metrics-repo: myspace/observer-repo
    secrets:
      # We pass the Writer token to the reusable workflow
      # so it can push to the central database repo
      METRICS_PAT: ${{ secrets.WRITER_PAT }}
```

> [!TIP]
> **Why do we pass the Writer token?** The reusable workflow pushes to the **Observer Repo** with whatever token it receives as `METRICS_PAT`. Passing `WRITER_PAT` gives it the specific authority needed to write to the database without any permission to write to your source code. Note that the workflow in this commit also uses `METRICS_PAT` for the `/traffic/clones` request and declares no second secret, so a token scoped only to the Observer Repo cannot read the Observed Repo's traffic. For the dual-token setup to work, the reusable workflow must also accept `READER_PAT` and use it in the fetch step.

---

## 🛡️🗂️ Permission Table Reference

| Permission | Requirement | Reason |
| --- | --- | --- |
| `Metadata` | Read | Basic repository access |
| `Administration` | Read | Required to access `/traffic/clones` API |
| `Contents` | Read/Write | Required to push `.csv` changes to Observer Repo |

---

### How to verify your permissions

If the workflow fails with a `403 Forbidden` error during the **push** phase, check that your PAT (the one passed to `METRICS_PAT`) has `Contents: Write` access specifically for the **Observer Repo**.
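
The read side can be checked from a terminal in a similar spirit. The sketch below probes the same `/traffic/clones` endpoint the workflow calls and translates the HTTP status into a hint. `explain_status` and `probe_traffic` are hypothetical helpers, and the hint texts are rough guidance written for this example, not messages returned by GitHub. The token is assumed to be exported as `GH_TOKEN`.

```shell
# explain_status CODE: map an HTTP status to a likely cause.
# The hints are assumptions for illustration, not GitHub's wording.
explain_status() {
  case "$1" in
    200) echo "token can read traffic data" ;;
    401) echo "token is missing, expired, or malformed" ;;
    403) echo "token is valid but lacks a required permission" ;;
    404) echo "repository not found or not in the token's repository access" ;;
    *)   echo "unexpected status: $1" ;;
  esac
}

# probe_traffic OWNER/REPO: print only the HTTP status code of the
# traffic endpoint. Reads the PAT from the GH_TOKEN environment variable.
probe_traffic() {
  curl -s -o /dev/null -w '%{http_code}' \
    -H "Authorization: Bearer $GH_TOKEN" \
    -H "Accept: application/vnd.github+json" \
    "https://api.github.com/repos/$1/traffic/clones"
}
```

With `GH_TOKEN` set, `explain_status "$(probe_traffic myspace/observed-repo)"` prints the hint for that repository. The two functions are kept separate so the status mapping can be reused for other endpoints.
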
