Description
As GPUs offer increasingly more compute power in lower precision (down to four bits!) than in double precision, we have to consider how to make efficient use of future machines with pySDC.
Reducing the precision in the implementation is simple, but the questions are:
- Where can we reduce the floating point precision with minimal impact on the accuracy of the solution?
- Where can we actually gain something by reduced floating point precision?
Note that the cost of individual arithmetic operations is bounded from below by the kernel launch cost; see, for instance, the discussion in this article. Therefore, I doubt that we win much by simply switching everything to lower precision.
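A minimal sketch of how one could check this, assuming CuPy is installed and a GPU is available: time an element-wise kernel at different precisions and problem sizes. For small arrays the launch overhead should dominate, so the precision barely matters; only for large arrays would the reduced memory traffic of `float32`/`float16` start to pay off. The sizes and the kernel are illustrative choices, not part of any existing benchmark.

```python
import cupy as cp
import cupyx.profiler

for n in (1_000, 10_000_000):
    for dtype in (cp.float64, cp.float32, cp.float16):
        x = cp.ones(n, dtype=dtype)
        # Time a trivial element-wise kernel; for small n the launch
        # overhead dominates regardless of the precision.
        result = cupyx.profiler.benchmark(lambda: 2.0 * x + 1.0, n_repeat=100)
        print(f"n={n:>10}, {cp.dtype(dtype).name:>8}: "
              f"{result.gpu_times.mean() * 1e6:7.1f} us per call")
```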
A good starting point would be to choose a problem with an iterative linear solver that is launched as a single kernel. Then, we can start by doing only the implicit solves in single, half, or four-bit precision and see what we gain. Possibly, we will have to increase the precision between SDC iterations, and we will probably have to choose quite a large problem resolution to see a difference.
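A hedged sketch of the idea, not pySDC code: solve the implicit system of a sweep with an inner conjugate gradient in single precision, while the outer iterate stays in double precision. The 1D Laplacian, the variable names, and the time-step factor are illustrative assumptions only.

```python
import cupy as cp
import cupyx.scipy.sparse as csp
import cupyx.scipy.sparse.linalg as cspl

n = 4096                 # problem resolution (large enough for precision to matter)
dt_qd = 1e-3             # stand-in for the dt * q_delta factor in front of the operator

# 1D Laplacian with homogeneous Dirichlet boundaries, assembled in double precision.
off = cp.ones(n - 1)
main = -2.0 * cp.ones(n)
A = csp.diags([off, main, off], offsets=[-1, 0, 1], format='csr') * n**2
I = csp.identity(n, format='csr')

u = cp.sin(cp.linspace(0, cp.pi, n))   # current iterate, float64
rhs = u.copy()                         # right-hand side of the implicit solve, float64

# Inner solve (I - dt_qd * A) u_new = rhs, carried out entirely in single precision ...
M32 = (I - dt_qd * A).astype(cp.float32)
u32, info = cspl.cg(M32, rhs.astype(cp.float32), x0=u.astype(cp.float32), maxiter=200)

# ... then promote the result back to double precision for the rest of the iteration.
u_new = u32.astype(cp.float64)
residual = cp.linalg.norm((I - dt_qd * A) @ u_new - rhs)
print(f"CG info: {info}, double-precision residual: {float(residual):.2e}")
```

The interesting experiments would then be how the residual after promoting back to double precision compares to an all-double-precision solve, and how the run time scales with the resolution.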
The scope of this project appears well suited to a bachelor thesis, an internship, or the like. If you or anyone you know would find this interesting, please get in touch!
There is no need for a deep understanding of SDC or Python. Basic proficiency with the latter and a low level of fear of maths are sufficient. This would be a nice opportunity to get to know GPU programming in Python.