# Restore Checkpoint Chain Action

A reusable GitHub Actions composite action that restores build outputs from checkpoint tarballs using progressive restoration.

## Purpose

When using checkpoint-based builds with the GitHub Actions cache:

1. **On first run**: The build creates checkpoint tarballs in `build/{mode}/checkpoints/` and `build/shared/checkpoints/`.
2. **On cache hit**: The build may be skipped entirely if the newest checkpoint in the chain can be restored.
3. **Progressive restoration**: The action walks the checkpoint chain from newest to oldest to find the latest valid checkpoint.
4. **Resumable builds**: The build resumes from whichever checkpoint was restored and creates only the remaining checkpoints.

## Usage

```yaml
- name: Restore build output from checkpoint chain
  id: restore-checkpoint
  uses: ./.github/actions/restore-checkpoint
  with:
    package-name: 'onnxruntime-builder'
    build-mode: ${{ steps.build-mode.outputs.mode }}
    checkpoint-chain: 'finalized,wasm-synced,wasm-released,wasm-compiled,source-cloned'
    cache-hit: ${{ steps.checkpoint-cache.outputs.cache-hit }}
    cache-valid: ${{ steps.validate-cache.outputs.cache_valid }}

- name: Build (if needed)
  if: steps.restore-checkpoint.outputs.needs-build == 'true'
  run: pnpm --filter onnxruntime-builder build --prod
```

## Inputs

| Input | Required | Description | Example |
|-------|----------|-------------|---------|
| `package-name` | Yes | Package name in the `packages/` directory | `onnxruntime-builder` |
| `build-mode` | Yes | Build mode (`dev` or `prod`) | `prod` |
| `checkpoint-chain` | Yes | Comma-separated list of checkpoints, ordered newest to oldest | `finalized,wasm-synced,wasm-compiled` |
| `cache-hit` | Yes | Whether the checkpoint cache was hit (`true`/`false`) | `${{ steps.cache.outputs.cache-hit }}` |
| `cache-valid` | Yes | Whether checkpoint validation passed (`true`/`false`) | `${{ steps.validate.outputs.cache_valid }}` |

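
The sketch below shows one way the action might validate two of these inputs in a `shell: bash` step. The function names `parse_checkpoint_chain` and `validate_build_mode` are hypothetical. Trimming whitespace and rejecting empty entries are illustrative choices, not the action's actual implementation.

```bash
# Hypothetical helper: split a comma-separated chain into one trimmed
# checkpoint name per line, failing on empty input or empty entries.
parse_checkpoint_chain() {
  local chain="$1"
  local -a parts=()
  local name
  IFS=',' read -r -a parts <<< "$chain"
  if [ "${#parts[@]}" -eq 0 ]; then
    echo "::error::checkpoint-chain is empty" >&2
    return 1
  fi
  for name in "${parts[@]}"; do
    # Trim leading and trailing whitespace so 'a, b' behaves like 'a,b'.
    name="${name#"${name%%[![:space:]]*}"}"
    name="${name%"${name##*[![:space:]]}"}"
    if [ -z "$name" ]; then
      echo "::error::checkpoint-chain contains an empty entry" >&2
      return 1
    fi
    printf '%s\n' "$name"
  done
}

# Hypothetical helper: accept only the two documented build modes.
validate_build_mode() {
  case "$1" in
    dev|prod) return 0 ;;
    *)
      echo "::error::build-mode must be 'dev' or 'prod', got '$1'" >&2
      return 1
      ;;
  esac
}
```

For example, `parse_checkpoint_chain 'finalized, wasm-compiled'` prints `finalized` and `wasm-compiled` on separate lines.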
## Outputs

| Output | Description | Values |
|--------|-------------|--------|
| `restored` | Whether any checkpoint was successfully restored | `true` or `false` |
| `checkpoint-restored` | Name of the restored checkpoint | Checkpoint name, or empty if none |
| `checkpoint-index` | Index of the restored checkpoint in the chain | `0` (newest) to `N-1` (oldest), or `-1` if none |
| `needs-build` | Whether the build must run to create the remaining checkpoints | `true` or `false` |

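
A step inside a composite action sets outputs by appending `name=value` lines to the file named by `$GITHUB_OUTPUT`. The action's `outputs:` block then maps those step outputs. As a minimal sketch, assuming the action derives all four outputs from the restored checkpoint's index, the writing step could look like this. The helper name `write_restore_outputs` is hypothetical.

```bash
# Hypothetical helper: publish the four documented outputs from the
# restored checkpoint's index (-1 when nothing was restored) and name.
write_restore_outputs() {
  local index="$1"
  local name="$2"
  local restored=false needs_build=true
  if [ "$index" -ge 0 ]; then
    restored=true
  fi
  # Only restoring the newest checkpoint (index 0) lets the build be skipped.
  if [ "$index" -eq 0 ]; then
    needs_build=false
  fi
  {
    echo "restored=$restored"
    echo "checkpoint-restored=$name"
    echo "checkpoint-index=$index"
    echo "needs-build=$needs_build"
  } >> "$GITHUB_OUTPUT"
}
```

Calling `write_restore_outputs 2 wasm-compiled` records `restored=true` and `needs-build=true`, which matches Scenario 2 under Example Scenarios.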
## How It Works

### Progressive Restoration Algorithm

1. **Parse the checkpoint chain**: Split the comma-separated list into an array.
2. **Walk the chain** from newest to oldest. For each checkpoint:
   - Check whether its tarball exists.
   - Verify the tarball's integrity.
   - If it is valid, stop searching.
3. **Extract the checkpoint** into the output directory.
4. **Determine whether a build is needed**:
   - Index 0 (newest): the build can be skipped.
   - Index > 0 (older): the build must run to create the remaining, newer checkpoints.
   - No valid checkpoint (index `-1`): the build runs from scratch.

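
Step 2 might look like the Bash sketch below. It is not the action's source. To keep it short, it searches a single directory, assumes tarballs are named `<checkpoint>.tar.gz`, and uses `gzip -t` plus `tar -tzf` as the integrity check. The function name `find_latest_checkpoint` is hypothetical.

```bash
# Hypothetical sketch: walk the chain newest -> oldest and print
# "<index> <name>" for the first checkpoint whose tarball exists and
# passes an integrity check, or "-1" if none does. Searches one directory
# and assumes tarballs are named <name>.tar.gz.
find_latest_checkpoint() {
  local dir="$1" chain="$2"
  local -a names=()
  local i name tarball
  IFS=',' read -r -a names <<< "$chain"
  for i in "${!names[@]}"; do
    name="${names[$i]}"
    tarball="$dir/$name.tar.gz"
    if [ ! -f "$tarball" ]; then
      echo "Checkpoint '$name' not found, trying an older one" >&2
      continue
    fi
    # gzip -t checks the compressed stream (including its CRC); tar -tzf
    # confirms the contents parse as a tar archive.
    if ! gzip -t "$tarball" 2>/dev/null || ! tar -tzf "$tarball" >/dev/null 2>&1; then
      echo "Checkpoint '$name' is corrupted, trying an older one" >&2
      continue
    fi
    echo "Restoring checkpoint '$name' (index $i)" >&2
    printf '%s %s\n' "$i" "$name"
    return 0
  done
  printf '%s\n' "-1"
}
```

The printed index corresponds to the `checkpoint-index` output. An index of `0` lets the build be skipped, a larger index means the build resumes, and `-1` means it starts from scratch.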
### Checkpoint Locations

- **Shared checkpoints**: `build/shared/checkpoints/` (e.g., `source-cloned`)
- **Mode-specific checkpoints**: `build/{mode}/checkpoints/` (e.g., `finalized`, `wasm-compiled`)

Currently, only `source-cloned` is shared between the `dev` and `prod` modes.

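
The split between shared and mode-specific directories can be expressed as a small path-resolution helper. This sketch makes two assumptions. First, `build/` sits inside the package directory (for example `packages/onnxruntime-builder`). Second, tarballs are named `<checkpoint>.tar.gz`, as in the `finalized.tar.gz` example under Expected Checkpoint Structure. The function name `checkpoint_tarball_path` is hypothetical.

```bash
# Hypothetical helper: map a checkpoint name to its tarball path.
# Only `source-cloned` lives in the shared directory; every other
# checkpoint is stored per build mode.
checkpoint_tarball_path() {
  local package_dir="$1" mode="$2" name="$3"
  case "$name" in
    source-cloned)
      printf '%s\n' "$package_dir/build/shared/checkpoints/$name.tar.gz" ;;
    *)
      printf '%s\n' "$package_dir/build/$mode/checkpoints/$name.tar.gz" ;;
  esac
}
```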
## Example Scenarios

### Scenario 1: Complete Cache (finalized found)
```
Checkpoint chain: finalized,wasm-synced,wasm-compiled,source-cloned
Found: finalized (index 0)
Result: restored=true, needs-build=false
Action: Skip build entirely
```

### Scenario 2: Partial Cache (wasm-compiled found)
```
Checkpoint chain: finalized,wasm-synced,wasm-compiled,source-cloned
Found: wasm-compiled (index 2)
Result: restored=true, needs-build=true
Action: Build runs to create wasm-synced → finalized
```

### Scenario 3: Early Cache (source-cloned found)
```
Checkpoint chain: finalized,wasm-synced,wasm-compiled,source-cloned
Found: source-cloned (index 3)
Result: restored=true, needs-build=true
Action: Build runs to create wasm-compiled → wasm-synced → finalized
```

### Scenario 4: No Cache
```
Checkpoint chain: finalized,wasm-synced,wasm-compiled,source-cloned
Found: none
Result: restored=false, needs-build=true
Action: Build runs from scratch
```

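
The "Action" lines above follow one rule: every checkpoint newer than the restored one, that is, at a lower index, still has to be built. Below is a hypothetical helper, `remaining_checkpoints`, that lists those checkpoints in build order. It is a sketch, not part of the action.

```bash
# Hypothetical helper: given the chain (newest -> oldest) and the index of
# the restored checkpoint, print the checkpoints the build still has to
# create, oldest first. An index of -1 means nothing was restored.
remaining_checkpoints() {
  local chain="$1" index="$2"
  local -a names=()
  local i
  IFS=',' read -r -a names <<< "$chain"
  if [ "$index" -lt 0 ]; then
    # Nothing restored: every checkpoint in the chain must be built.
    index=${#names[@]}
  fi
  # Newer checkpoints sit at lower indices, so count down toward 0.
  for (( i = index - 1; i >= 0; i-- )); do
    printf '%s\n' "${names[$i]}"
  done
}
```

With the chain above, index `2` yields `wasm-synced` then `finalized` (Scenario 2), and index `0` yields nothing (Scenario 1).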
## Package-Specific Checkpoint Chains

| Package | Checkpoint Chain |
|---------|------------------|
| **ONNX Runtime** (dev) | `finalized,wasm-synced,wasm-released,wasm-compiled,source-cloned` |
| **ONNX Runtime** (prod) | `finalized,wasm-synced,wasm-optimized,wasm-released,wasm-compiled,source-cloned` |
| **Yoga Layout** (dev) | `finalized,wasm-synced,wasm-released,wasm-compiled,source-configured,source-cloned` |
| **Yoga Layout** (prod) | `finalized,wasm-synced,wasm-optimized,wasm-released,wasm-compiled,source-configured,source-cloned` |
| **Models** | `finalized,quantized,converted,downloaded` |
| **Node.js Smol** | `finalized,binary-compressed,binary-stripped,binary-released,source-patched,source-cloned` |

Note: ONNX Runtime and Yoga Layout include `wasm-optimized` only in prod mode.

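
For the two WASM packages, the only difference between modes is the extra `wasm-optimized` step, so a workflow could assemble the `checkpoint-chain` input from the mode. The sketch below shows one way. The keys `onnx` and `yoga` and the function name `checkpoint_chain_for` are placeholders, not real package names or helpers.

```bash
# Hypothetical helper: assemble the checkpoint-chain input for ONNX Runtime
# or Yoga Layout, following the table above. Prod mode adds
# `wasm-optimized` between `wasm-synced` and `wasm-released`.
checkpoint_chain_for() {
  local package="$1" mode="$2"
  local newest="finalized,wasm-synced" rest
  case "$package" in
    onnx) rest="wasm-released,wasm-compiled,source-cloned" ;;
    yoga) rest="wasm-released,wasm-compiled,source-configured,source-cloned" ;;
    *)
      echo "::error::no checkpoint chain defined for '$package'" >&2
      return 1
      ;;
  esac
  if [ "$mode" = "prod" ]; then
    newest="$newest,wasm-optimized"
  fi
  printf '%s\n' "$newest,$rest"
}
```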
## Expected Checkpoint Structure

Checkpoints should contain a `Final/` directory with the build outputs:

```
finalized.tar.gz
└── Final/
    ├── output.wasm
    ├── output.mjs
    └── output.js
```

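
Below is a minimal sketch of extracting a checkpoint and checking for this layout, with problems reported through the `::error::` workflow command. The helper name `extract_checkpoint` is hypothetical. Treating a missing `Final/` directory as a failure is an assumption.

```bash
# Hypothetical helper: extract a checkpoint tarball into an output
# directory and confirm it produced the expected Final/ directory.
# Each failure emits an ::error:: annotation and returns non-zero.
extract_checkpoint() {
  local tarball="$1" out_dir="$2"
  if ! mkdir -p "$out_dir"; then
    echo "::error::Cannot create output directory $out_dir" >&2
    return 1
  fi
  if ! tar -xzf "$tarball" -C "$out_dir"; then
    echo "::error::Failed to extract $tarball" >&2
    return 1
  fi
  if [ ! -d "$out_dir/Final" ]; then
    echo "::error::$tarball does not contain a Final/ directory" >&2
    return 1
  fi
}
```

An archive with this layout can be created with `tar -czf finalized.tar.gz -C <build-dir> Final`, where `<build-dir>` is the directory that contains `Final/`.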
## Error Handling

The action fails with a detailed error message if:

- No valid checkpoint is found in the chain
- All tarballs are corrupted
- Extraction fails
- The output directory is invalid

## Benefits

### Progressive Restoration
- **Partial cache hits are useful**: Intermediate checkpoints are not wasted
- **Resumable builds**: Continue from any point in the pipeline
- **Faster iterations**: Skip completed phases even if the final checkpoint is missing

### Consistency
- **Single restoration logic**: Shared across all workflows
- **Maintainability**: Update in one place
- **Debugging**: Detailed logging shows which checkpoint was used

### Efficiency
- **Maximize cache utilization**: Use any valid checkpoint, not just the final one
- **Reduce build times**: Skip unnecessary rebuilds of early phases
- **CI cost savings**: Less compute time means lower costs

## Migration Notes

This action replaced the older single-checkpoint restoration pattern. All packages now use progressive restoration with standardized checkpoint naming:

- All final checkpoints are named **`finalized`** (previously `wasm-finalized`, `quantized`, etc.)
- All restoration happens through checkpoint chains
- No separate `Final` output caches (checkpoint-only caching)