I will edit the code to add support for running on a CUDA GPU, as I am trying to use this as a benchmark.