Supporting PyTorch GPU compatibility on Apple Silicon chips #914
Ideally yes, SB3 should support that device too (not a big change), but at the moment it seems it would require some operation-call changes to be fully supported. Those need to be addressed first (or we wait until torch has equivalent functions for all platforms), and the changes should not interfere with existing code at all; otherwise this could spur lots of hidden changes. |
@ryanrudes could you test again? (there was a PyTorch release recently) And maybe test with PyTorch nightly build, it apparently works: DLR-RM/rl-baselines3-zoo#267 I will update the |
from stable_baselines3 import PPO
import gym
env = gym.make("Pendulum-v1")
ppo = PPO("MlpPolicy", env, device="mps")
ppo.learn(total_timesteps=1000)
With the current pip version of PyTorch (1.12), it raises the following exception:
Following the suggestion of the traceback ( So it is not completely stable for the moment. We may have to wait until the next release... |
thanks @qgallouedec for the feedback =) We do need to wait for more coverage yes, issue is here: pytorch/pytorch#77764 |
The bug no longer occurs with the new version of PyTorch (1.12.1) |
def obs_as_tensor(
    obs: Union[np.ndarray, Dict[Union[str, int], np.ndarray]], device: th.device
) -> Union[th.Tensor, TensorDict]:
    """
    Moves the observation to the given device.
    :param obs:
    :param device: PyTorch device
    :return: PyTorch tensor of the observation on a desired device.
    """
    if isinstance(obs, np.ndarray):
        return th.as_tensor(obs, dtype=th.float32).to(device)
    elif isinstance(obs, dict):
        return {key: th.as_tensor(_obs).to(device) for (key, _obs) in obs.items()}
    else:
        raise Exception(f"Unrecognized type of observation {type(obs)}")
This fixed the problem of converting float64 to float32! But afterwards, when training the model
so then add |
For the moment, consider that SB3 is not compatible with MPS. But we are working on it: #951 Have you seen something in the documentation about float64 and MPS? (You can answer in the PR conversation.) |
yes MPS framework For a more extensive list of which data types do and don’t run: Avoid Float64 on all Apple devices. Even if the hardware supports Double physically (AMD or Intel), the Metal API doesn’t let you access it. https://developer.apple.com/metal/Metal-Feature-Set-Tables.pdf |
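One way to act on the float64 restriction is to downcast only double-precision observations when the target device is MPS, and to leave every other dtype untouched. The sketch below is not SB3 code: `mps_safe_dtype` and `obs_to_tensor` are hypothetical helper names, and `obs_to_tensor` assumes NumPy and PyTorch are installed when it is called.

```python
def mps_safe_dtype(dtype_name: str, device_type: str) -> str:
    """Return the dtype name to use for an array headed to `device_type`.

    Only float64 is remapped, and only for MPS; all other dtypes pass through.
    """
    if device_type == "mps" and dtype_name == "float64":
        return "float32"
    return dtype_name


def obs_to_tensor(obs, device):
    """Hypothetical helper: move one array-like observation to `device`.

    `device` may be a string such as "mps" or a torch.device object.
    """
    import numpy as np  # imported lazily so the dtype logic above has no dependencies
    import torch as th

    arr = np.asarray(obs)
    device_type = str(device).split(":")[0]  # "mps:0" -> "mps"
    target = mps_safe_dtype(arr.dtype.name, device_type)
    # astype(copy=False) avoids a copy when the dtype already matches.
    return th.as_tensor(arr.astype(target, copy=False), device=device)
```

Compared with forcing every observation to float32, this keeps integer observations (for example uint8 images) in their original dtype; whether that matters depends on the policy's preprocessing.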
This modified version of obs_as_tensor should work. Make these changes in stable_baselines3/common/utils.py. The modified obs_as_tensor function should now automatically convert the observation to float32 if the device is an MPS device.
Original
def obs_as_tensor(
    obs: Union[np.ndarray, Dict[Union[str, int], np.ndarray]], device: th.device
) -> Union[th.Tensor, TensorDict]:
    """
    Moves the observation to the given device.
    :param obs:
    :param device: PyTorch device
    :return: PyTorch tensor of the observation on a desired device.
    """
    if isinstance(obs, np.ndarray):
        return th.as_tensor(obs, device=device)
    elif isinstance(obs, dict):
        return {key: th.as_tensor(_obs, device=device) for (key, _obs) in obs.items()}
    else:
        raise Exception(f"Unrecognized type of observation {type(obs)}")
A workaround
def obs_as_tensor(obs: Union[np.ndarray, Dict[Union[str, int], np.ndarray]], device: th.device) -> Union[th.Tensor, TensorDict]:
    """
    Moves the observation to the given device.
    :param obs:
    :param device: PyTorch device
    :return: PyTorch tensor of the observation on a desired device.
    """
    # MPS does not support float64, so force float32 there; keep the input dtype elsewhere
    dtype = th.float32 if device.type == "mps" else None
    if isinstance(obs, np.ndarray):
        return th.as_tensor(obs, device=device, dtype=dtype)
    elif isinstance(obs, dict):
        return {key: th.as_tensor(_obs, device=device, dtype=dtype) for (key, _obs) in obs.items()}
    else:
        raise ValueError(f"Unsupported observation format: {obs}")
Although it works normally, the CPU continues to be more performant than MPS. Honestly, I don't know if this workaround is worth it, but it worked nonetheless. |
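A claim like "the CPU is faster than MPS" is easier to check with a small timing harness. The following is a minimal sketch using only the standard library; `best_time` and its `sync` parameter are hypothetical names introduced here, not part of SB3 or PyTorch.

```python
import time


def best_time(fn, repeats=3, sync=None):
    """Return the fastest wall-clock time (in seconds) over `repeats` calls to `fn`.

    `sync` is an optional callable run after each call, intended for a device
    synchronization function so that queued GPU work is included in the timing.
    """
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        if sync is not None:
            sync()  # wait for outstanding device work before stopping the clock
        timings.append(time.perf_counter() - start)
    return min(timings)


if __name__ == "__main__":
    # Stand-in workload; replace with a short training run on each device.
    elapsed = best_time(lambda: sum(i * i for i in range(100_000)))
    print(f"best of 3: {elapsed:.4f} s")
```

To compare devices, `fn` could wrap a fixed-length run such as the `PPO("MlpPolicy", env, device=...)` example earlier in this thread, once per device. On recent PyTorch releases, `torch.mps.synchronize` may be a suitable `sync` argument; that is an assumption about the installed version.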
Sorry @traderpedroso, but I don't see the difference between the workaround and the original code. |
Apologies, I hadn't noticed that I duplicated the functions. I have now updated the code. Thank you for pointing it out. |
@traderpedroso thanks for creating this issue. The M1 Pro already comes with quite a lot of CPU cores, which distribute training nicely. I was wondering if you have done any benchmarks and observed any significant performance improvements with |
I must admit that I was profoundly disheartened by the limitations of MPS, particularly due to the lack of support. After conducting numerous tests, I found that for reinforcement learning, CPUs have consistently proven to be the optimal choice, or at most, TPUs. However, when it comes to leveraging GPUs from Nvidia, AMD, or MPS, their performance has been largely indistinguishable in my experience. Nevertheless, when combining PyTorch with MPS for NLP and image processing tasks, we are able to witness an exhilarating performance boost, as exemplified below.
import sys
import platform
import torch
import pandas as pd
import sklearn as sk

has_gpu = torch.cuda.is_available()
# torch.has_mps only reports whether PyTorch was built with MPS support;
# is_available() also checks that this machine can actually use it.
has_mps = torch.backends.mps.is_available()
device = "mps" if has_mps else "cuda" if has_gpu else "cpu"

print(f"Python Platform: {platform.platform()}")
print(f"PyTorch Version: {torch.__version__}")
print()
print(f"Python {sys.version}")
print(f"Pandas {pd.__version__}")
print(f"Scikit-Learn {sk.__version__}")
print("GPU is", "available" if has_gpu else "NOT AVAILABLE")
print("MPS (Apple Metal) is", "AVAILABLE" if has_mps else "NOT AVAILABLE")
print(f"Target device is {device}")
import torch
from torch import nn, optim
import torch.nn.functional as F
import torchvision
from torchvision import datasets, transforms

EPOCHS = 5

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 20, 5, 1)
        self.conv2 = nn.Conv2d(20, 50, 5, 1)
        self.fc1 = nn.Linear(4*4*50, 500)
        self.fc2 = nn.Linear(500, 10)

    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = F.max_pool2d(x, 2, 2)
        x = F.relu(self.conv2(x))
        x = F.max_pool2d(x, 2, 2)
        x = x.view(-1, 4*4*50)
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        return F.log_softmax(x, dim=1)

def train(model, device, train_loader, optimizer, epoch):
    model.train()
    for batch_idx, (data, target) in enumerate(train_loader):
        data, target = data.to(device), target.to(device)
        optimizer.zero_grad()
        output = model(data)
        loss = F.nll_loss(output, target)
        loss.backward()
        optimizer.step()
        if batch_idx % 10 == 0:
            print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
                epoch, batch_idx * len(data), len(train_loader.dataset),
                100. * batch_idx / len(train_loader), loss.item()))

def test(model, device, test_loader):
    model.eval()
    test_loss = 0
    correct = 0
    with torch.no_grad():
        for data, target in test_loader:
            data, target = data.to(device), target.to(device)
            output = model(data)
            test_loss += F.nll_loss(output, target, reduction='sum').item()  # sum up batch loss
            pred = output.argmax(dim=1, keepdim=True)  # get the index of the max log-probability
            correct += pred.eq(target.view_as(pred)).sum().item()
    test_loss /= len(test_loader.dataset)
    print('\nTest set: Average loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)\n'.format(
        test_loss, correct, len(test_loader.dataset),
        100. * correct / len(test_loader.dataset)))

def main():
    print("PyTorch version:", torch.__version__)
    print("Torchvision version:", torchvision.__version__)
    # device = torch.device("mps")
    print("Using Device: ", device)
    train_loader = torch.utils.data.DataLoader(
        datasets.MNIST('../data', train=True, download=True,
                       transform=transforms.Compose([
                           transforms.ToTensor(),
                           transforms.Normalize((0.1307,), (0.3081,))
                       ])),
        batch_size=64, shuffle=True)
    test_loader = torch.utils.data.DataLoader(
        datasets.MNIST('../data', train=False, transform=transforms.Compose([
            transforms.ToTensor(),
            transforms.Normalize((0.1307,), (0.3081,))
        ])),
        batch_size=64, shuffle=True)
    model = Net().to(device)
    optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.5)
    for epoch in range(1, EPOCHS + 1):
        train(model, device, train_loader, optimizer, epoch)
        test(model, device, test_loader)

if __name__ == "__main__":
    main() |
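Device selection like that at the top of the script above can be factored into a small helper that also allows opting out of the GPU, which may be useful given the reports in this thread that CPU training can be faster for reinforcement learning. This is a sketch under assumptions: `pick_device` and `detect_device` are hypothetical names, and `detect_device` assumes PyTorch is installed when it is called.

```python
def pick_device(cuda_available: bool, mps_available: bool, prefer_gpu: bool = True) -> str:
    """Choose a device string from availability flags.

    With prefer_gpu=False the CPU is always returned.
    """
    if prefer_gpu and cuda_available:
        return "cuda"
    if prefer_gpu and mps_available:
        return "mps"
    return "cpu"


def detect_device(prefer_gpu: bool = True) -> str:
    """Query PyTorch for available backends and pick a device string."""
    import torch  # imported lazily so pick_device stays dependency-free

    # torch.backends.mps may be missing in older PyTorch releases.
    mps_backend = getattr(torch.backends, "mps", None)
    mps_available = mps_backend is not None and mps_backend.is_available()
    return pick_device(torch.cuda.is_available(), mps_available, prefer_gpu)
```

The returned string could then be passed wherever a device is expected, for example `Net().to(detect_device())` or the `device=` argument used in the PPO example earlier in the thread.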
Same problem here. It's really painful not being able to use MPS correctly today :(
def obs_as_tensor(
To avoid the error: TypeError: Cannot convert a MPS Tensor to float64 dtype as the MPS framework doesn't support float64. Please use float32 instead.
But I have no idea whether MPS is able to speed up PPO models... |
Any updates on this feature request? Is it possible to use MPS with stable baselines3 now? |
Could the release of MLX play any role in improving the performance of stable-baselines3 on Apple silicon? This post discusses why MLX is implemented as an alternative to PyTorch rather than within it. Unfortunately (?) it seems like stable-baselines3 would need to support MLX in addition to PyTorch to reap the benefits. |
Please have a look at the PR and the other comments, you can give it a try using
If you want to have a performance boost (not only on Apple silicon), I would recommend you to have a look at SBX (SB3 + Jax): https://github.com/araffin/sbx |
🚀 Feature
PyTorch recently released support for GPU acceleration using the Apple Silicon chips. This should be supported in stable-baselines3 by the "mps" device (I believe).
Minimal Example
The Mac Silicon GPU device is not automatically recognized by stable-baselines at the moment, so it defaults to "cpu". If you try to force it to use the "mps" device, this stack trace appears.