Description
I found the following sentence in the README at commit 8876629:
Line 23 in 8876629
| * Sorting on GPU. Sorting is currently done on the CPU asynchronously at a lower framerate (~10 fps), which increases how often you'll see pops especially when the viewpoint changes quickly |
This sentence was removed in commit 2213483:
Line 13 in 2213483
| ### TODO |
However, it seems sorting still happens on the CPU:
| orderAndDepthTempSort.sort { $0.depth > $1.depth } |
A CPU comparison sort takes O(N log N) time. Here are some timings (M2 Pro MacBook Pro, release mode, random numbers):
〇 UInt32
1000000: 0.071 s
10000000: 0.759 s
〇 Float32
1000000: 0.084 s
10000000: 0.934 s
At these speeds, sorting impedes real-time rendering of 3D Gaussians.
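For anyone who wants to reproduce this kind of measurement, here is a minimal sketch of a timing helper. The function name `timeDescendingSort` is hypothetical and not taken from the repository; it only assumes the standard library's in-place `sort` with the same descending comparison as the depth sort quoted above.

```swift
import Foundation

// Hypothetical helper (not from the repository): fills an array with random
// Float32 values, sorts it in descending order, and reports the wall-clock
// time spent in the sort call only.
func timeDescendingSort(count: Int) -> (seconds: Double, sorted: [Float]) {
    var values = (0..<count).map { _ in Float.random(in: 0..<1) }
    let start = Date()
    values.sort { $0 > $1 }  // same direction as the depth comparison above
    let elapsed = Date().timeIntervalSince(start)
    return (elapsed, values)
}
```

Calling it with `count: 1_000_000` and `count: 10_000_000` in a release build gives numbers of the same kind as those listed above; debug builds will be much slower and are not comparable.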
It seems that a GPU radix sort for Metal is missing. (https://developer.apple.com/forums/thread/105886)
I did find an implementation for Apple Silicon, but I find it difficult to understand. (https://github.com/ShoYamanishi/AppleNumericalComputing/tree/main/05_radix_sort#54-metal-radix-sort-implementations)
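To make the idea easier to discuss, here is a CPU-only sketch of a least-significant-digit radix sort over float depths, written in Swift. It is not Metal code and is not taken from the linked implementation or from this repository; the function names are hypothetical. It only shows the three steps per pass (histogram, exclusive prefix sum, scatter) that a GPU version would have to parallelize, plus the usual trick of mapping a float's bit pattern to an unsigned key that sorts in the same order.

```swift
// Map a Float to a UInt32 whose unsigned ordering matches the float ordering.
// Negative values: flip all bits. Non-negative values: set the sign bit.
func sortableKey(_ value: Float) -> UInt32 {
    let bits = value.bitPattern
    return (bits & 0x8000_0000) != 0 ? ~bits : bits | 0x8000_0000
}

// Hypothetical helper: returns indices ordered by descending depth
// (farthest first), using four 8-bit LSD radix passes. Ties keep their
// original relative order because each pass is stable.
func radixSortedIndicesByDescendingDepth(_ depths: [Float]) -> [UInt32] {
    let n = depths.count
    // Inverting the key turns an ascending radix sort into a descending one.
    var keys = depths.map { ~sortableKey($0) }
    var indices = Array(0..<UInt32(n))
    var keysOut = [UInt32](repeating: 0, count: n)
    var indicesOut = [UInt32](repeating: 0, count: n)

    for pass in 0..<4 {
        let shift = UInt32(pass * 8)

        // 1. Histogram: count how many keys fall into each of 256 buckets.
        var offsets = [Int](repeating: 0, count: 256)
        for key in keys {
            offsets[Int((key >> shift) & 0xFF)] += 1
        }

        // 2. Exclusive prefix sum: turn counts into starting offsets.
        var running = 0
        for bucket in 0..<256 {
            let count = offsets[bucket]
            offsets[bucket] = running
            running += count
        }

        // 3. Scatter: move each key and its index to its bucket position.
        for i in 0..<n {
            let bucket = Int((keys[i] >> shift) & 0xFF)
            keysOut[offsets[bucket]] = keys[i]
            indicesOut[offsets[bucket]] = indices[i]
            offsets[bucket] += 1
        }

        swap(&keys, &keysOut)
        swap(&indices, &indicesOut)
    }
    return indices
}
```

The work here is proportional to N per pass rather than N log N overall, which is why radix sort is the usual choice for GPU sorting. Whether it is actually faster for this renderer would need to be measured.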
Is anyone planning to implement sorting on the GPU to improve performance? Or is there a library we could adopt directly to speed this up?