#24 adds a multi-GPU PyTorch example that demonstrates how to use Distributed Data Parallel (DDP) training. However, in that example, training with multiple GPUs is not faster than training with a single GPU. See #24 (comment)
It would be worthwhile to monitor the training more closely, for instance by tracking GPU utilization per rank, to understand why this is the case.
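A minimal sketch of what such monitoring could look like, independent of the example's actual code: the helper below (the name `log_gpu_stats` and the call frequency are just placeholders) reads utilization and memory for the GPU owned by the current rank via `pynvml` (install with `pip install nvidia-ml-py`) and could be called every N steps from the training loop. Watching `nvidia-smi` on the node would give the same information interactively.

```python
import os

import pynvml

pynvml.nvmlInit()


def log_gpu_stats(step: int) -> None:
    """Print utilization/memory for the GPU assigned to this DDP rank."""
    # LOCAL_RANK is set by torchrun / torch.distributed.launch;
    # fall back to 0 for single-process runs.
    local_rank = int(os.environ.get("LOCAL_RANK", 0))
    handle = pynvml.nvmlDeviceGetHandleByIndex(local_rank)
    util = pynvml.nvmlDeviceGetUtilizationRates(handle)  # .gpu / .memory in %
    mem = pynvml.nvmlDeviceGetMemoryInfo(handle)         # bytes
    print(
        f"step {step} rank {local_rank}: "
        f"gpu={util.gpu}% mem_util={util.memory}% "
        f"mem_used={mem.used / 2**30:.1f} GiB"
    )
```

If GPU utilization turns out to be low, that would point toward an input-pipeline or communication bottleneck rather than the model computation itself.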