Commit 5c5b84e

Use cached PyTorch wheels on MacOS jobs (#9484)
One current drawback of using a pinned PyTorch commit on CI is that we need to build the PyTorch wheel on all MacOS jobs, because they have no Docker image. Building the wheel is usually not too bad because we have sccache in place to make the compilation faster. However, it's still slower than using a prebuilt wheel, and sccache is not available on the GitHub MacOS runner `macos-latest-xlarge` (no access to S3).

As all MacOS jobs build exactly the same PyTorch wheel, the proposal here is to cache the wheel in the S3 `gha-artifacts` bucket, which is publicly readable, e.g. https://gha-artifacts.s3.us-east-1.amazonaws.com/cached_artifacts/pytorch/executorch/pytorch_wheels/Darwin/311/torch-2.7.0a0%2Bgit295f2ed-cp311-cp311-macosx_14_0_arm64.whl. A job can check S3 for a matching wheel and use it instead. If there is no such wheel, it continues building PyTorch normally. Once a new wheel is built, and if the runner has write access to S3, it uploads the wheel so that other jobs can pick it up going forward.

### Testing

All CI jobs pass (failures are pre-existing from trunk). Here are some quick numbers on how this helps reduce the durations of different MacOS jobs.

* Apple workflow:
  * build-benchmark-app: [BEFORE](https://github.com/pytorch/executorch/actions/runs/14002229786/job/39210715922) ~80m → [AFTER](https://github.com/pytorch/executorch/actions/runs/14001343158/job/39214390843) ~44m
  * build-frameworks-ios: [BEFORE](https://github.com/pytorch/executorch/actions/runs/14002229786/job/39210732212) ~80m → [AFTER](https://github.com/pytorch/executorch/actions/runs/14001343158/job/39214394644) ~44m
  * build-demo-ios: [BEFORE](https://github.com/pytorch/executorch/actions/runs/14003433493/job/39213882743) ~55m → [AFTER](https://github.com/pytorch/executorch/actions/runs/14001343158/job/39214390955) ~23m
* Apple perf workflow:
  * build-benchmark-app: [BEFORE](https://github.com/pytorch/executorch/actions/runs/13982706236/job/39208203350) ~80m → [AFTER](https://github.com/pytorch/executorch/actions/runs/14001347585/job/39214401072) ~48m
  * export model (llama): [BEFORE](https://github.com/pytorch/executorch/actions/runs/13982706236/job/39150917351) ~30m → [AFTER](https://github.com/pytorch/executorch/actions/runs/14001347585/job/39214401617) ~13m
* All MacOS jobs in pull and trunk:
  * BEFORE ~417m on commit b195ed9 → AFTER ~268m

Overall, I'm seeing the duration of individual MacOS jobs drop by close to 2x, and the total across all MacOS jobs in pull and trunk drop by about 1.5x (~417m → ~268m). This is very useful for reducing the cost of running MacOS jobs (remember the budget request to the OSS team because of the $$$ GitHub MacOS runners).
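To put the before/after durations above side by side, here is a minimal sketch that turns each pair into a speedup ratio. The `speedup` helper is hypothetical (it is not part of the commit) and assumes `awk` is available on the machine; the inputs are the approximate minute counts quoted in the commit message.

```shell
# Hypothetical helper, not part of the commit.
# speedup BEFORE AFTER: print BEFORE / AFTER rounded to two decimals.
speedup() {
  awk -v before="$1" -v after="$2" 'BEGIN { printf "%.2f\n", before / after }'
}

speedup 80 44    # build-benchmark-app (Apple workflow)
speedup 55 23    # build-demo-ios
speedup 30 13    # export model (llama)
speedup 417 268  # all MacOS jobs in pull and trunk
```

The per-job ratios land between roughly 1.8x and 2.4x, while the aggregate over all MacOS jobs works out to about 1.56x.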
1 parent 60280d9 commit 5c5b84e

3 files changed, +42 -6 lines changed

.ci/scripts/utils.sh

+39 -5
@@ -60,12 +60,46 @@ install_pytorch_and_domains() {
   # Fetch the target commit
   pushd pytorch || return
   git checkout "${TORCH_VERSION}"
-  git submodule update --init --recursive
 
-  export USE_DISTRIBUTED=1
-  # Then build and install PyTorch
-  python setup.py bdist_wheel
-  pip install "$(echo dist/*.whl)"
+  local system_name=$(uname)
+  if [[ "${system_name}" == "Darwin" ]]; then
+    local platform=$(python -c 'import sysconfig; import platform; v=platform.mac_ver()[0].split(".")[0]; platform=sysconfig.get_platform().split("-"); platform[1]=f"{v}_0"; print("_".join(platform))')
+  fi
+  local python_version=$(python -c 'import platform; v=platform.python_version_tuple(); print(f"{v[0]}{v[1]}")')
+  local torch_release=$(cat version.txt)
+  local torch_short_hash=${TORCH_VERSION:0:7}
+  local torch_wheel_path="cached_artifacts/pytorch/executorch/pytorch_wheels/${system_name}/${python_version}"
+  local torch_wheel_name="torch-${torch_release}%2Bgit${torch_short_hash}-cp${python_version}-cp${python_version}-${platform:-}.whl"
+
+  local cached_torch_wheel="https://gha-artifacts.s3.us-east-1.amazonaws.com/${torch_wheel_path}/${torch_wheel_name}"
+  # Cache PyTorch wheel is only needed on MacOS, Linux CI already has this as part
+  # of the Docker image
+  local torch_wheel_not_found=0
+  if [[ "${system_name}" == "Darwin" ]]; then
+    pip install "${cached_torch_wheel}" || torch_wheel_not_found=1
+  else
+    torch_wheel_not_found=1
+  fi
+
+  # Found no such wheel, we will build it from source then
+  if [[ "${torch_wheel_not_found}" == "1" ]]; then
+    echo "No cached wheel found, continue with building PyTorch at ${TORCH_VERSION}"
+
+    git submodule update --init --recursive
+    USE_DISTRIBUTED=1 python setup.py bdist_wheel
+    pip install "$(echo dist/*.whl)"
+
+    # Only AWS runners have access to S3
+    if command -v aws && [[ -z "${GITHUB_RUNNER:-}" ]]; then
+      for wheel_path in dist/*.whl; do
+        local wheel_name=$(basename "${wheel_path}")
+        echo "Caching ${wheel_name}"
+        aws s3 cp "${wheel_path}" "s3://gha-artifacts/${torch_wheel_path}/${wheel_name}"
+      done
+    fi
+  else
+    echo "Use cached wheel at ${cached_torch_wheel}"
+  fi
 
   # Grab the pinned audio and vision commits from PyTorch
   TORCHAUDIO_VERSION=$(cat .github/ci_commit_pins/audio.txt)
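
The wheel lookup in the hunk above depends on assembling a URL that exactly matches the name of a previously uploaded wheel. As a rough illustration, the sketch below pulls that naming logic into two standalone functions. Both `macos_platform_tag` and `cached_wheel_url` are hypothetical names, not part of the commit, and the platform tag is derived with shell parameter expansion rather than the Python one-liner, assuming `sysconfig.get_platform()` returns a three-part value such as `macosx-11.0-arm64` (that sample value is illustrative only).

```shell
# Hypothetical helpers, not part of the commit.

# macos_platform_tag MAC_VER SYSCONFIG_PLATFORM
# Mirrors the Python one-liner: keep the OS name and architecture, and
# replace the middle field with "<major macOS version>_0".
# Assumes SYSCONFIG_PLATFORM has exactly three "-"-separated fields.
macos_platform_tag() {
  local major="${1%%.*}"    # "14.4.1" -> "14"
  local os_name="${2%%-*}"  # "macosx-11.0-arm64" -> "macosx"
  local arch="${2##*-}"     # "macosx-11.0-arm64" -> "arm64"
  printf '%s_%s_0_%s\n' "${os_name}" "${major}" "${arch}"
}

# cached_wheel_url SYSTEM PYVER TORCH_RELEASE TORCH_COMMIT PLATFORM_TAG
# Builds the public S3 URL using the same layout as the diff.
cached_wheel_url() {
  local system_name="$1" python_version="$2" torch_release="$3"
  local torch_commit="$4" platform_tag="$5"
  local short_hash
  # First 7 characters of the commit, like ${TORCH_VERSION:0:7} in the diff
  short_hash=$(printf '%.7s' "${torch_commit}")
  local wheel_path="cached_artifacts/pytorch/executorch/pytorch_wheels/${system_name}/${python_version}"
  # The "+" of the local version label is percent-encoded as %2B
  local wheel_name="torch-${torch_release}%2Bgit${short_hash}-cp${python_version}-cp${python_version}-${platform_tag}.whl"
  printf 'https://gha-artifacts.s3.us-east-1.amazonaws.com/%s/%s\n' "${wheel_path}" "${wheel_name}"
}

tag=$(macos_platform_tag "14.4.1" "macosx-11.0-arm64")
cached_wheel_url "Darwin" "311" "2.7.0a0" "295f2ed" "${tag}"
```

With these inputs the sketch reproduces the example URL quoted in the commit message. Because any mismatch in a component (Python version, macOS major version, short hash) yields a different URL, `pip install` then fails and the script falls back to building from source.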

.github/workflows/_unittest.yml

+2
@@ -49,4 +49,6 @@ jobs:
       ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }}
       script: |
         set -eux
+        # This is needed to get the prebuilt PyTorch wheel from S3
+        ${CONDA_RUN} --no-capture-output pip install awscli==1.37.21
         .ci/scripts/unittest-macos.sh --build-tool "${{ inputs.build-tool }}" --build-mode "${{ inputs.build-mode }}" --editable "${{ inputs.editable }}"
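
Installing `awscli` here matters because the upload step in `.ci/scripts/utils.sh` only runs when an `aws` command is on the `PATH` and `GITHUB_RUNNER` is empty or unset. The sketch below isolates that guard in a hypothetical `can_upload_wheel` function (not part of the commit). It uses POSIX `[ ]` instead of the `[[ ]]` in the original. Reading `GITHUB_RUNNER` as a marker for GitHub-hosted runners is an inference from the "Only AWS runners have access to S3" comment, not something the commit states.

```shell
# Hypothetical wrapper around the guard used in .ci/scripts/utils.sh.
# Succeeds only if an `aws` command is available and GITHUB_RUNNER is
# empty or unset.
can_upload_wheel() {
  command -v aws >/dev/null 2>&1 && [ -z "${GITHUB_RUNNER:-}" ]
}

if can_upload_wheel; then
  echo "would upload built wheels to S3"
else
  echo "skipping upload (no aws CLI, or GITHUB_RUNNER is set)"
fi
```

Note that the guard only checks that the CLI exists, not that the runner holds valid credentials, so `aws s3 cp` itself can still fail on a runner without write access.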

.github/workflows/trunk.yml

+1 -1
@@ -228,7 +228,7 @@ jobs:
     name: test-coreml-delegate
     uses: pytorch/test-infra/.github/workflows/macos_job.yml@main
     with:
-      runner: macos-13-xlarge
+      runner: macos-latest-xlarge
       python-version: '3.11'
       submodules: 'true'
       ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }}
