Commit e24471e

Initial commit
1 parent 6608f30 commit e24471e

63 files changed (+3151 / -507 lines)

Lines changed: 161 additions & 0 deletions
@@ -0,0 +1,161 @@
# Restore Checkpoint Chain Action

A reusable GitHub Actions composite action that restores build outputs from checkpoint tarballs using progressive restoration.

## Purpose

When using checkpoint-based builds with the GitHub Actions cache:

1. **On first run**: The build creates checkpoint tarballs in `build/{mode}/checkpoints/` and `build/shared/checkpoints/`.
2. **On a validated cache hit**: The build can be skipped entirely if the newest checkpoint in the chain is restored.
3. **Progressive restoration**: The action walks the checkpoint chain from the newest checkpoint toward the oldest and restores the first valid one.
4. **Resumable builds**: The build resumes from whichever checkpoint was restored and creates only the remaining, newer checkpoints.

## Usage

```yaml
- name: Restore build output from checkpoint chain
  id: restore-checkpoint
  uses: ./.github/actions/restore-checkpoint
  with:
    package-name: 'onnxruntime-builder'
    build-mode: ${{ steps.build-mode.outputs.mode }}
    checkpoint-chain: 'finalized,wasm-synced,wasm-released,wasm-compiled,source-cloned'
    cache-hit: ${{ steps.checkpoint-cache.outputs.cache-hit }}
    cache-valid: ${{ steps.validate-cache.outputs.cache_valid }}

- name: Build (if needed)
  if: steps.restore-checkpoint.outputs.needs-build == 'true'
  run: pnpm --filter onnxruntime-builder build --prod
```

The `checkpoint-chain` value must list the checkpoints for the selected package and build mode; the value above is the ONNX Runtime dev chain from the Package-Specific Checkpoint Chains table.

## Inputs

| Input | Required | Description | Example |
|-------|----------|-------------|---------|
| `package-name` | Yes | Package directory name under `packages/` | `onnxruntime-builder` |
| `build-mode` | Yes | Build mode (`dev` or `prod`) | `prod` |
| `checkpoint-chain` | Yes | Comma-separated list of checkpoints, ordered newest to oldest | `finalized,wasm-synced,wasm-compiled` |
| `cache-hit` | Yes | Whether the checkpoint cache was hit (`true`/`false`) | `${{ steps.cache.outputs.cache-hit }}` |
| `cache-valid` | Yes | Whether checkpoint validation passed (`true`/`false`) | `${{ steps.validate.outputs.cache_valid }}` |

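The `checkpoint-chain` input is split on commas into a bash array with `IFS=',' read -ra`, the same idiom the action's script uses. The minimal sketch below shows that parsing step. The `parse_chain` name is hypothetical, and the whitespace trimming is an addition the action does not perform: as written, the action would look for a file such as ` wasm-synced.tar.gz` (with a leading space) if the input contained spaces after the commas, and would skip it as not found.

```bash
#!/usr/bin/env bash
# Hypothetical helper: split a comma-separated checkpoint chain into one
# name per line. Trimming whitespace is NOT done by the action itself.
parse_chain() {
  local chain="$1" item
  local -a items
  IFS=',' read -ra items <<< "$chain"
  for item in "${items[@]}"; do
    # Strip leading whitespace, then trailing whitespace
    item="${item#"${item%%[![:space:]]*}"}"
    item="${item%"${item##*[![:space:]]}"}"
    if [ -n "$item" ]; then
      printf '%s\n' "$item"
    fi
  done
}

parse_chain "finalized, wasm-synced ,source-cloned"
```

The call above prints `finalized`, `wasm-synced` and `source-cloned` on separate lines, in the same newest-to-oldest order as the input.
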
## Outputs

| Output | Description | Values |
|--------|-------------|--------|
| `restored` | Whether any checkpoint was restored | `true` or `false` |
| `checkpoint-restored` | Name of the restored checkpoint | Checkpoint name, or empty if none |
| `checkpoint-index` | Index of the restored checkpoint in the chain | `0` (newest) to `N-1` (oldest, where `N` is the chain length), or `-1` if none |
| `needs-build` | Whether the build must run to create the remaining checkpoints | `true` or `false` |

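These outputs are produced by the composite step appending `key=value` lines to the file named by `$GITHUB_OUTPUT`. The action's `outputs:` block then maps underscore keys such as `needs_build` to the hyphenated names in this table. When exercising the script locally, that file can be inspected with a small helper. The sketch below assumes a temporary file stands in for the runner-provided `$GITHUB_OUTPUT`, and `read_output` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the value from the last KEY=value line in a
# GITHUB_OUTPUT-style file (empty if the key was never written).
read_output() {
  local key="$1" file="$2" line value=""
  while IFS= read -r line; do
    if [[ "$line" == "$key="* ]]; then
      value="${line#"$key="}"
    fi
  done < "$file"
  printf '%s\n' "$value"
}

# Local stand-in for the runner-provided $GITHUB_OUTPUT file
OUTPUT_FILE="$(mktemp)"
trap 'rm -f "$OUTPUT_FILE"' EXIT

# The same lines the restore step writes after restoring index 2
echo "restored=true" >> "$OUTPUT_FILE"
echo "checkpoint_restored=wasm-compiled" >> "$OUTPUT_FILE"
echo "checkpoint_index=2" >> "$OUTPUT_FILE"
echo "needs_build=true" >> "$OUTPUT_FILE"

read_output needs_build "$OUTPUT_FILE"          # prints: true
read_output checkpoint_restored "$OUTPUT_FILE"  # prints: wasm-compiled
```
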
## How It Works

### Progressive Restoration Algorithm

1. **Parse the checkpoint chain**: Split the comma-separated list into an array.
2. **Walk the chain from newest to oldest**. For each checkpoint:
   - Skip it if its tarball does not exist
   - Skip it if the gzip integrity check (`gzip -t`) fails
   - Otherwise, select it and stop searching
3. **Extract the selected checkpoint** into the output directory (`build/{mode}/out/`). If no checkpoint qualifies, the action fails instead (see Error Handling).
4. **Determine whether a build is needed**:
   - Index 0 (newest): The build can be skipped
   - Index > 0 (older): The build must run to create the newer checkpoints

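Steps 2 and 4 can be sketched as two small bash functions that mirror the action's loop: the same directory rule for `source-cloned`, the same `gzip -t` check, and the same index-based build decision. The function names `select_checkpoint` and `needs_build` are hypothetical, and the temporary directory layout only imitates `build/{mode}/checkpoints/` and `build/shared/checkpoints/`.

```bash
#!/usr/bin/env bash
# Print "INDEX NAME" for the first checkpoint (newest first) that exists
# and passes `gzip -t`, or "-1" when none qualifies.
select_checkpoint() {
  local mode_dir="$1" shared_dir="$2" chain="$3"
  local -a checkpoints
  local i name file
  IFS=',' read -ra checkpoints <<< "$chain"
  for i in "${!checkpoints[@]}"; do
    name="${checkpoints[$i]}"
    # Only source-cloned lives in the shared directory
    if [ "$name" = "source-cloned" ]; then
      file="${shared_dir}/${name}.tar.gz"
    else
      file="${mode_dir}/${name}.tar.gz"
    fi
    if [ -f "$file" ] && gzip -t "$file" 2>/dev/null; then
      printf '%s %s\n' "$i" "$name"
      return 0
    fi
  done
  printf '%s\n' "-1"
  return 1
}

# needs-build is false only when the newest checkpoint (index 0) was restored
needs_build() {
  if [ "$1" -eq 0 ]; then echo false; else echo true; fi
}

tmp="$(mktemp -d)"
trap 'rm -rf "$tmp"' EXIT
mkdir -p "$tmp/prod" "$tmp/shared" "$tmp/src"
echo "compiled output" > "$tmp/src/out.wasm"
# wasm-compiled is a real tarball; wasm-synced is deliberately corrupt
tar -czf "$tmp/prod/wasm-compiled.tar.gz" -C "$tmp/src" out.wasm
printf 'not a gzip stream' > "$tmp/prod/wasm-synced.tar.gz"

result="$(select_checkpoint "$tmp/prod" "$tmp/shared" \
  "finalized,wasm-synced,wasm-compiled,source-cloned")"
echo "selected: $result"                            # selected: 2 wasm-compiled
echo "needs build: $(needs_build "${result%% *}")"  # needs build: true
```

Here `finalized` is missing and `wasm-synced` fails `gzip -t`, so the walk settles on `wasm-compiled` at index 2, matching Scenario 2.
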
### Checkpoint Locations

Both locations are relative to `packages/{package-name}/`:

- **Shared checkpoints**: `build/shared/checkpoints/` (e.g., `source-cloned`)
- **Mode-specific checkpoints**: `build/{mode}/checkpoints/` (e.g., `finalized`, `wasm-compiled`)

Currently only `source-cloned` is shared between dev and prod modes; the action selects the shared directory by matching that checkpoint name.

## Example Scenarios

### Scenario 1: Complete Cache (finalized found)
```
Checkpoint chain: finalized,wasm-synced,wasm-compiled,source-cloned
Found: finalized (index 0)
Result: restored=true, needs-build=false
Action: Skip build entirely
```

### Scenario 2: Partial Cache (wasm-compiled found)
```
Checkpoint chain: finalized,wasm-synced,wasm-compiled,source-cloned
Found: wasm-compiled (index 2)
Result: restored=true, needs-build=true
Action: Build runs to create wasm-synced → finalized
```

### Scenario 3: Early Cache (source-cloned found)
```
Checkpoint chain: finalized,wasm-synced,wasm-compiled,source-cloned
Found: source-cloned (index 3)
Result: restored=true, needs-build=true
Action: Build runs to create wasm-compiled → wasm-synced → finalized
```

### Scenario 4: No Cache (cache miss or failed validation)
```
Checkpoint chain: finalized,wasm-synced,wasm-compiled,source-cloned
Found: none (restoration skipped)
Result: restored=false, needs-build=true
Action: Build runs from scratch
```

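The "Action" line of each scenario follows directly from the restored index: every checkpoint with a smaller index still has to be built, oldest first. The sketch below is a hypothetical `remaining_checkpoints` helper that reproduces those lines. Note that the action's own log lists the same checkpoints newest first, whereas this helper prints them in build order.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the checkpoints the build still has to create,
# in build order (oldest first), joined with " → " as in the scenarios.
remaining_checkpoints() {
  local chain="$1" restored_index="$2"
  local -a checkpoints
  local i out=""
  IFS=',' read -ra checkpoints <<< "$chain"
  # -1 (nothing restored) means every checkpoint must be built
  if [ "$restored_index" -lt 0 ]; then
    restored_index=${#checkpoints[@]}
  fi
  # Indices below restored_index are newer than the restored checkpoint;
  # walking them downward to 0 yields build order
  for (( i = restored_index - 1; i >= 0; i-- )); do
    out+="${out:+ → }${checkpoints[$i]}"
  done
  printf '%s\n' "$out"
}

chain="finalized,wasm-synced,wasm-compiled,source-cloned"
remaining_checkpoints "$chain" 2   # wasm-synced → finalized (Scenario 2)
remaining_checkpoints "$chain" 3   # wasm-compiled → wasm-synced → finalized (Scenario 3)
```
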
## Package-Specific Checkpoint Chains

| Package | Checkpoint Chain |
|---------|------------------|
| **ONNX Runtime** (dev) | `finalized,wasm-synced,wasm-released,wasm-compiled,source-cloned` |
| **ONNX Runtime** (prod) | `finalized,wasm-synced,wasm-optimized,wasm-released,wasm-compiled,source-cloned` |
| **Yoga Layout** (dev) | `finalized,wasm-synced,wasm-released,wasm-compiled,source-configured,source-cloned` |
| **Yoga Layout** (prod) | `finalized,wasm-synced,wasm-optimized,wasm-released,wasm-compiled,source-configured,source-cloned` |
| **Models** | `finalized,quantized,converted,downloaded` |
| **Node.js Smol** | `finalized,binary-compressed,binary-stripped,binary-released,source-patched,source-cloned` |

Note: ONNX Runtime and Yoga Layout include `wasm-optimized` only in prod mode.

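For workflows that cover several packages, the table can be encoded as a lookup so the `checkpoint-chain` input is derived from the package and mode instead of being typed by hand. This is a sketch under three assumptions. First, `checkpoint_chain` is a hypothetical helper. Second, the package keys are the `packages/` directory names listed in the action's `package-name` description. Third, Models and Node.js Smol use the same chain in both modes, since the table lists only one chain for each.

```bash
#!/usr/bin/env bash
# Hypothetical lookup transcribed from the table above.
checkpoint_chain() {
  local package="$1" mode="$2"
  case "$package:$mode" in
    onnxruntime-builder:dev)
      echo "finalized,wasm-synced,wasm-released,wasm-compiled,source-cloned" ;;
    onnxruntime-builder:prod)
      echo "finalized,wasm-synced,wasm-optimized,wasm-released,wasm-compiled,source-cloned" ;;
    yoga-layout-builder:dev)
      echo "finalized,wasm-synced,wasm-released,wasm-compiled,source-configured,source-cloned" ;;
    yoga-layout-builder:prod)
      echo "finalized,wasm-synced,wasm-optimized,wasm-released,wasm-compiled,source-configured,source-cloned" ;;
    # Assumed mode-independent (the table gives a single chain)
    models:*)
      echo "finalized,quantized,converted,downloaded" ;;
    node-smol-builder:*)
      echo "finalized,binary-compressed,binary-stripped,binary-released,source-patched,source-cloned" ;;
    *)
      echo "unknown package/mode: $package/$mode" >&2
      return 1 ;;
  esac
}

checkpoint_chain onnxruntime-builder prod
```
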
## Expected Checkpoint Structure

Checkpoints should contain a `Final/` directory with the build outputs:

```
finalized.tar.gz
└── Final/
    ├── output.wasm
    ├── output.mjs
    └── output.js
```

The tarball is extracted into `build/{mode}/out/`, so these files end up under `build/{mode}/out/Final/`.

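The action does not check this layout itself; it only verifies the gzip stream before extracting. If an extra guard is wanted, a tarball's listing can be searched for a top-level `Final/` entry. The sketch below makes two assumptions: `has_final_dir` is a hypothetical helper, and entries may be stored as either `Final/...` or `./Final/...`, depending on how the archive was created.

```bash
#!/usr/bin/env bash
# Hypothetical check (not performed by the action): does a checkpoint
# tarball contain a top-level Final/ directory?
has_final_dir() {
  local tarball="$1" listing
  # Capture the whole listing first so no pipe is closed early
  listing="$(tar -tzf "$tarball")" || return 1
  grep -Eq '^(\./)?Final/' <<< "$listing"
}

# Build a sample checkpoint matching the layout above
work="$(mktemp -d)"
trap 'rm -rf "$work"' EXIT
mkdir -p "$work/src/Final"
touch "$work/src/Final/output.wasm" "$work/src/Final/output.mjs" "$work/src/Final/output.js"
tar -czf "$work/finalized.tar.gz" -C "$work/src" Final

if has_final_dir "$work/finalized.tar.gz"; then
  echo "layout ok"
fi
```
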
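## Error Handling

When `cache-hit` and `cache-valid` are both `true`, the action fails with a detailed log if:
- No valid checkpoint is found in the chain (the log then lists the contents of both checkpoint directories)
- Every checkpoint tarball that exists is corrupted (fails `gzip -t`)
- Extraction fails
- The output directory cannot be created

When restoration is skipped because of a cache miss or failed validation, the action does not fail; it reports `restored=false` and `needs-build=true`.
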
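`gzip -t` decompresses the whole stream and checks its CRC, so it rejects truncated or damaged archives, but it does not inspect the tar structure inside. The minimal sketch below shows a truncated tarball being rejected. `is_valid_checkpoint` is a hypothetical name, and the extra `tar -tzf` pass is an optional, stricter variant that the action does not perform.

```bash
#!/usr/bin/env bash
# Hypothetical integrity check: gzip -t (what the action runs) plus an
# optional full tar listing to confirm the archive structure is readable.
is_valid_checkpoint() {
  local file="$1"
  [ -f "$file" ] || return 1
  gzip -t "$file" 2>/dev/null || return 1
  tar -tzf "$file" > /dev/null 2>&1
}

dir="$(mktemp -d)"
trap 'rm -rf "$dir"' EXIT
mkdir -p "$dir/src"
head -c 100000 /dev/urandom > "$dir/src/blob.bin"
tar -czf "$dir/good.tar.gz" -C "$dir/src" blob.bin
# Simulate an interrupted transfer by keeping only the first half
size=$(wc -c < "$dir/good.tar.gz")
head -c $(( size / 2 )) "$dir/good.tar.gz" > "$dir/truncated.tar.gz"

if is_valid_checkpoint "$dir/good.tar.gz"; then echo "good: valid"; fi
if ! is_valid_checkpoint "$dir/truncated.tar.gz"; then echo "truncated: rejected"; fi
```
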
## Benefits

### Progressive Restoration
- **Partial cache hits are useful**: Intermediate checkpoints are not wasted
- **Resumable builds**: The build continues from any point in the pipeline
- **Faster iterations**: Completed phases are skipped even when the final checkpoint is missing

### Consistency
- **Single restoration logic**: Shared across all workflows
- **Maintainability**: Changes are made in one place
- **Debugging**: Detailed logs show which checkpoint was used

### Efficiency
- **Maximize cache utilization**: Any valid checkpoint is used, not just the final one
- **Reduce build times**: Early phases are not rebuilt unnecessarily
- **CI cost savings**: Less compute time means lower costs

## Migration Notes

This action replaces the older single-checkpoint restoration pattern. All packages now use progressive restoration with standardized checkpoint naming:

- All final checkpoints are named **`finalized`** (previously `wasm-finalized`, `quantized`, etc.)
- All restoration happens through checkpoint chains
- There are no separate caches for `Final/` outputs; only checkpoints are cached

Lines changed: 209 additions & 0 deletions
@@ -0,0 +1,209 @@
name: 'Restore Checkpoint Chain'
description: 'Restores build output from checkpoint tarballs, walking backward to find the latest valid checkpoint'
author: 'Socket Security'

inputs:
  package-name:
    description: 'Package name (e.g., onnxruntime-builder, yoga-layout-builder, models, node-smol-builder)'
    required: true
  build-mode:
    description: 'Build mode (dev or prod)'
    required: true
  checkpoint-chain:
    description: 'Ordered list of checkpoints to try (newest to oldest), comma-separated (e.g., "finalized,wasm-synced,wasm-released,wasm-compiled")'
    required: true
  cache-hit:
    description: 'Whether checkpoint cache was hit (true/false)'
    required: true
  cache-valid:
    description: 'Whether checkpoint cache validation passed (true/false)'
    required: true

outputs:
  restored:
    description: 'Whether any checkpoint was restored (true/false)'
    value: ${{ steps.restore.outputs.restored || steps.skip.outputs.restored }}
  checkpoint-restored:
    description: 'Name of the checkpoint that was restored (empty if none)'
    value: ${{ steps.restore.outputs.checkpoint_restored || steps.skip.outputs.checkpoint_restored }}
  checkpoint-index:
    description: 'Index of restored checkpoint (0=newest, higher=older, -1=none)'
    value: ${{ steps.restore.outputs.checkpoint_index || steps.skip.outputs.checkpoint_index }}
  needs-build:
    description: 'Whether build needs to run to complete remaining checkpoints (true/false)'
    value: ${{ steps.restore.outputs.needs_build || steps.skip.outputs.needs_build }}

runs:
  using: 'composite'
  steps:
    - name: Restore build output from checkpoint chain
      id: restore
      if: inputs.cache-hit == 'true' && inputs.cache-valid == 'true'
      shell: bash
      # Pass inputs through env instead of splicing them into the script text,
      # so input values are never parsed as shell code
      env:
        PACKAGE_NAME: ${{ inputs.package-name }}
        BUILD_MODE: ${{ inputs.build-mode }}
        CHECKPOINT_CHAIN: ${{ inputs.checkpoint-chain }}
      run: |
        set -e
        echo "🔄 Restoring build output from checkpoint chain..."
        echo ""

        MODE_CHECKPOINT_DIR="packages/${PACKAGE_NAME}/build/${BUILD_MODE}/checkpoints"
        SHARED_CHECKPOINT_DIR="packages/${PACKAGE_NAME}/build/shared/checkpoints"
        OUTPUT_DIR="packages/${PACKAGE_NAME}/build/${BUILD_MODE}/out"

        echo "📦 Package: ${PACKAGE_NAME}"
        echo "🔧 Build mode: ${BUILD_MODE}"
        echo "📁 Mode checkpoint directory: ${MODE_CHECKPOINT_DIR}"
        echo "📁 Shared checkpoint directory: ${SHARED_CHECKPOINT_DIR}"
        echo "📤 Output directory: ${OUTPUT_DIR}"
        echo ""

        # Parse checkpoint chain into array (comma-separated)
        IFS=',' read -ra CHECKPOINTS <<< "$CHECKPOINT_CHAIN"

        echo "🔗 Checkpoint chain (newest → oldest):"
        INDEX=0
        for CHECKPOINT in "${CHECKPOINTS[@]}"; do
          echo " [$INDEX] ${CHECKPOINT}"
          INDEX=$((INDEX + 1))
        done
        echo ""

        # Walk the chain from newest to oldest and stop at the first valid checkpoint
        RESTORED_CHECKPOINT=""
        RESTORED_INDEX=-1
        RESTORED_FILE=""

        INDEX=0
        for CHECKPOINT in "${CHECKPOINTS[@]}"; do
          # source-cloned is in the shared directory; all others are mode-specific
          if [ "${CHECKPOINT}" = "source-cloned" ]; then
            CHECKPOINT_FILE="${SHARED_CHECKPOINT_DIR}/${CHECKPOINT}.tar.gz"
          else
            CHECKPOINT_FILE="${MODE_CHECKPOINT_DIR}/${CHECKPOINT}.tar.gz"
          fi

          echo "🔍 Checking checkpoint [$INDEX]: ${CHECKPOINT}"

          # Check if checkpoint exists
          if [ ! -f "${CHECKPOINT_FILE}" ]; then
            echo " ⏭️ Not found, trying next..."
            INDEX=$((INDEX + 1))
            continue
          fi

          echo " ✓ Found: ${CHECKPOINT_FILE}"

          # Verify the gzip stream (catches truncated or corrupted archives)
          if ! gzip -t "${CHECKPOINT_FILE}" 2>/dev/null; then
            echo " ⚠️ Corrupted, trying next..."
            INDEX=$((INDEX + 1))
            continue
          fi

          echo " ✓ Integrity verified"

          # This is our restoration point
          RESTORED_CHECKPOINT="${CHECKPOINT}"
          RESTORED_INDEX=${INDEX}
          RESTORED_FILE="${CHECKPOINT_FILE}"

          echo ""
          echo "✅ Found valid checkpoint: ${CHECKPOINT} (index ${INDEX})"
          break
        done

        # Fail if no checkpoint in the chain is usable
        if [ -z "${RESTORED_CHECKPOINT}" ]; then
          echo ""
          echo "❌ No valid checkpoints found in chain"
          echo " Available mode checkpoints:"
          ls -lh "${MODE_CHECKPOINT_DIR}" 2>/dev/null || echo " (mode checkpoint directory not found)"
          echo " Available shared checkpoints:"
          ls -lh "${SHARED_CHECKPOINT_DIR}" 2>/dev/null || echo " (shared checkpoint directory not found)"
          echo ""
          echo "restored=false" >> "$GITHUB_OUTPUT"
          echo "checkpoint_restored=" >> "$GITHUB_OUTPUT"
          echo "checkpoint_index=-1" >> "$GITHUB_OUTPUT"
          echo "needs_build=true" >> "$GITHUB_OUTPUT"
          exit 1
        fi

        echo ""
        echo "📦 Restoring from checkpoint: ${RESTORED_CHECKPOINT}"
        echo ""

        # Show tarball contents. sed -n '1,20p' reads all of its input, so tar never
        # receives SIGPIPE (head would close the pipe early, and that fails the step
        # because GitHub runs shell: bash with -eo pipefail)
        echo "📋 Checkpoint contents:"
        tar -tzf "${RESTORED_FILE}" | sed -n '1,20p'
        TOTAL_ENTRIES=$(tar -tzf "${RESTORED_FILE}" | wc -l | tr -d ' ')
        if [ "${TOTAL_ENTRIES}" -gt 20 ]; then
          echo "... (${TOTAL_ENTRIES} total entries)"
        fi
        echo ""

        # Extract checkpoint
        echo "📦 Extracting checkpoint to ${OUTPUT_DIR}..."
        mkdir -p "${OUTPUT_DIR}"
        tar -xzf "${RESTORED_FILE}" -C "${OUTPUT_DIR}"
        echo "✅ Checkpoint extracted successfully"
        echo ""

        # Any index above 0 leaves newer checkpoints for the build to create
        NEEDS_BUILD="false"
        if [ "${RESTORED_INDEX}" -gt 0 ]; then
          NEEDS_BUILD="true"
          echo "⚙️ Build will run to complete remaining checkpoints:"
          REMAINING_INDEX=0
          for CHECKPOINT in "${CHECKPOINTS[@]}"; do
            if [ "${REMAINING_INDEX}" -lt "${RESTORED_INDEX}" ]; then
              echo " • ${CHECKPOINT} (will be created)"
            fi
            REMAINING_INDEX=$((REMAINING_INDEX + 1))
          done
        else
          echo "✅ Latest checkpoint restored - build can be skipped"
        fi
        echo ""

        # Summarize what was extracted (same SIGPIPE-safe listing as above)
        if [ -d "${OUTPUT_DIR}" ]; then
          echo "📁 Extracted output:"
          find "${OUTPUT_DIR}" -type f | sed -n '1,20p'
          FILES_COUNT=$(find "${OUTPUT_DIR}" -type f | wc -l | tr -d ' ')
          if [ "${FILES_COUNT}" -gt 20 ]; then
            echo "... (${FILES_COUNT} total files)"
          fi
          echo ""
        fi

        echo "✅ Restoration complete"
        echo " Checkpoint: ${RESTORED_CHECKPOINT}"
        echo " Index: ${RESTORED_INDEX} (0=newest, $(( ${#CHECKPOINTS[@]} - 1 ))=oldest)"
        echo " Needs build: ${NEEDS_BUILD}"
        echo ""

        echo "restored=true" >> "$GITHUB_OUTPUT"
        echo "checkpoint_restored=${RESTORED_CHECKPOINT}" >> "$GITHUB_OUTPUT"
        echo "checkpoint_index=${RESTORED_INDEX}" >> "$GITHUB_OUTPUT"
        echo "needs_build=${NEEDS_BUILD}" >> "$GITHUB_OUTPUT"

    - name: Skip restoration (build will run from scratch)
      id: skip
      if: inputs.cache-hit != 'true' || inputs.cache-valid != 'true'
      shell: bash
      env:
        CACHE_HIT: ${{ inputs.cache-hit }}
        CACHE_VALID: ${{ inputs.cache-valid }}
      run: |
        echo "⏭️ Skipping checkpoint restoration (build will run from scratch)"
        echo " Cache hit: ${CACHE_HIT}"
        echo " Cache valid: ${CACHE_VALID}"
        echo ""
        echo "restored=false" >> "$GITHUB_OUTPUT"
        echo "checkpoint_restored=" >> "$GITHUB_OUTPUT"
        echo "checkpoint_index=-1" >> "$GITHUB_OUTPUT"
        echo "needs_build=true" >> "$GITHUB_OUTPUT"
