Bug description
I trained a model with the MPS backend; loading the checkpoint fails inside a Docker container (python:3.10-slim-trixie, with almost nothing but PyTorch installed). Loading the same model outside the container works fine.
I am loading the model using:
model.load_from_checkpoint(self.best_model_path, map_location=torch.device("cpu"))
But when the .to(device) operation runs, an error occurs; it seems the parameters cannot be converted correctly.
Reference: pytorch/pytorch#160846
What version are you seeing the problem on?
v2.5
Reproduced in studio
No response
How to reproduce the bug
# Model trained on MPS, loaded in a CPU-only Docker container
from pytorch_forecasting import NHiTS
model = NHiTS.load_from_checkpoint(path, map_location=torch.device("cpu"))
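As a possible workaround (a sketch only, not verified against pytorch_forecasting), Lightning's internal model.to(device) call can be bypassed by deserializing the checkpoint with torch.load(..., map_location=...) and restoring the state dict manually. The snippet below illustrates the idea with a plain torch.nn.Linear stand-in; the real fix would construct NHiTS from the checkpoint's saved hyperparameters (assumption: a "hyper_parameters" key is present, as it is when save_hyperparameters() was used during training).

```python
import io
import torch

# Stand-in for the real model; substitute NHiTS(**ckpt["hyper_parameters"])
# in the actual scenario.
model = torch.nn.Linear(4, 2)

# Simulate a checkpoint written on another machine.
buf = io.BytesIO()
torch.save({"state_dict": model.state_dict()}, buf)
buf.seek(0)

# map_location remaps every stored tensor to CPU at deserialization time,
# so no MPS kernels are needed in the container.
ckpt = torch.load(buf, map_location=torch.device("cpu"))

restored = torch.nn.Linear(4, 2)
restored.load_state_dict(ckpt["state_dict"])
assert all(p.device.type == "cpu" for p in restored.parameters())
```

Because load_state_dict never calls Module._apply with a device conversion, the torchmetrics _apply code path from the traceback is never reached.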
Error messages and logs
File "/opt/program/predictor.py", line 69, in predict
clf = cls.get_model()
File "/opt/program/predictor.py", line 58, in get_model
cls.model = model.load_from_checkpoint(model_path, device=torch.device("cpu"))
File "/usr/local/lib/python3.10/site-packages/o5_fcst/models/pytorch_model.py", line 144, in load_from_checkpoint
model = self.model_class.load_from_checkpoint(self.best_model_path, map_location=device)
File "/usr/local/lib/python3.10/site-packages/lightning/pytorch/utilities/model_helpers.py", line 125, in wrapper
return self.method(cls, *args, **kwargs)
File "/usr/local/lib/python3.10/site-packages/lightning/pytorch/core/module.py", line 1662, in load_from_checkpoint
loaded = _load_from_checkpoint(
File "/usr/local/lib/python3.10/site-packages/lightning/pytorch/core/saving.py", line 99, in _load_from_checkpoint
return model.to(device)
File "/usr/local/lib/python3.10/site-packages/lightning/fabric/utilities/device_dtype_mixin.py", line 55, in to
return super().to(*args, **kwargs)
File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1369, in to
return self._apply(convert)
File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 928, in _apply
module._apply(fn)
File "/usr/local/lib/python3.10/site-packages/torchmetrics/metric.py", line 907, in _apply
_dummy_tensor = fn(torch.zeros(1, device=self.device))
File "/usr/local/lib/python3.10/site-packages/torch/utils/_device.py", line 103, in torch_function
return func(*args, **kwargs)
NotImplementedError: Could not run 'aten::empty.memory_format' with arguments from the 'MPS' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit https://fburl.com/ptmfixes for possible resolutions. 'aten::empty.memory_format' is only available for these backends: [CPU, Meta, QuantizedCPU, QuantizedMeta, MkldnnCPU, SparseCPU, SparseMeta, SparseCsrCPU, SparseCsrMeta, BackendSelect, Python, FuncTorchDynamicLayerBackMode, Functionalize, Named, Conjugate, Negative, ZeroTensor, ADInplaceOrView, AutogradOther, AutogradCPU, AutogradCUDA, AutogradHIP, AutogradXLA, AutogradMPS, AutogradIPU, AutogradXPU, AutogradHPU, AutogradVE, AutogradLazy, AutogradMTIA, AutogradMAIA, AutogradPrivateUse1, AutogradPrivateUse2, AutogradPrivateUse3, AutogradMeta, AutogradNestedTensor, Tracer, AutocastCPU, AutocastMTIA, AutocastMAIA, AutocastXPU, AutocastMPS, AutocastCUDA, FuncTorchBatched, BatchedNestedTensor, FuncTorchVmapMode, Batched, VmapMode, FuncTorchGradWrapper, PythonTLSSnapshot, FuncTorchDynamicLayerFrontMode, PreDispatch, PythonDispatcher].
Environment
Current environment
Collecting environment information...
PyTorch-Lightning: 2.5.0
PyTorch version: 2.8.0
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A
OS: macOS 15.6 (arm64)
GCC version: Could not collect
Clang version: 17.0.0 (clang-1700.0.13.5)
CMake version: Could not collect
Libc version: N/A
Python version: 3.10.18 (main, Aug 8 2025, 16:50:16) [Clang 20.1.4 ] (64-bit runtime)
Python platform: macOS-15.6-arm64-arm-64bit
Is CUDA available: False
CUDA runtime version: No CUDA
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
Is XPU available: False
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
CPU:
Apple M4 Pro
More info
No response
cc @lantiga