#24 adds a multi-GPU PyTorch example that demonstrates how to use Distributed Data Parallel (DDP) training. However, in that example, training with multiple GPUs is not faster than training with a single GPU. See #24 (comment)
It would be worthwhile to monitor the training more closely, for instance by tracking GPU utilization per rank, to understand why this is the case.
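A minimal sketch of what such monitoring could look like, independent of the example's actual code: the helper below (the name `log_gpu_stats` and the call frequency are just placeholders) reads utilization and memory for the GPU owned by the current rank via `pynvml` (install with `pip install nvidia-ml-py`) and could be called every N steps from the training loop. Watching `nvidia-smi` on the node would give the same information interactively.

```python
import os

import pynvml

pynvml.nvmlInit()


def log_gpu_stats(step: int) -> None:
    """Print utilization/memory for the GPU assigned to this DDP rank."""
    # LOCAL_RANK is set by torchrun / torch.distributed.launch;
    # fall back to 0 for single-process runs.
    local_rank = int(os.environ.get("LOCAL_RANK", 0))
    handle = pynvml.nvmlDeviceGetHandleByIndex(local_rank)
    util = pynvml.nvmlDeviceGetUtilizationRates(handle)  # .gpu / .memory in %
    mem = pynvml.nvmlDeviceGetMemoryInfo(handle)         # bytes
    print(
        f"step {step} rank {local_rank}: "
        f"gpu={util.gpu}% mem_util={util.memory}% "
        f"mem_used={mem.used / 2**30:.1f} GiB"
    )
```

If GPU utilization turns out to be low, that would point toward an input-pipeline or communication bottleneck rather than the model computation itself.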