Any progress or hint on implementing GPU sorting? #39

@Yang-Xijie

I found the following sentence in the README at commit 8876629:

* Sorting on GPU. Sorting is currently done on the CPU asynchronously at a lower framerate (~10 fps), which increases how often you'll see pops especially when the viewpoint changes quickly

which was removed in commit 2213483:

### TODO

However, it seems sorting still happens on the CPU side:

orderAndDepthTempSort.sort { $0.depth > $1.depth }
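
For readers unfamiliar with the code, here is a minimal sketch of what that line does. The `IndexedDepth` struct and the `farthestFirstOrder` function are hypothetical names made up for illustration; the element type actually used in the repository may differ.

```swift
// Hypothetical element type; the project's actual struct may differ.
struct IndexedDepth {
    var index: UInt32   // position of the splat in the splat buffer
    var depth: Float    // distance from the camera
}

// Returns splat indices ordered farthest first (largest depth first),
// using the same comparison as the line quoted above.
func farthestFirstOrder(_ items: [IndexedDepth]) -> [UInt32] {
    var sorted = items
    sorted.sort { $0.depth > $1.depth }   // comparison sort, O(N log N)
    return sorted.map { $0.index }
}

let order = farthestFirstOrder([
    IndexedDepth(index: 0, depth: 1.5),
    IndexedDepth(index: 1, depth: 4.0),
    IndexedDepth(index: 2, depth: 0.25),
])
print(order)
```

The whole array is re-sorted on the CPU each time the depths change, which is why the cost grows with the number of splats.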


A CPU comparison sort takes O(N log N) time. Here are some timings (M2 Pro MacBook Pro, release build, random numbers):

〇 UInt32
1000000: 0.071 s
10000000: 0.759 s
〇 Float32
1000000: 0.084 s
10000000: 0.934 s
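
A rough sketch of how timings like these could be reproduced. This is an assumed setup (ascending `sort()` on freshly generated random values, wall-clock time via `Date`), not necessarily how the numbers above were measured; `timeSort` is a hypothetical helper name. Build in release mode, since debug builds are much slower.

```swift
import Foundation

// Times one in-place sort of `count` random values; returns seconds elapsed.
func timeSort<T: Comparable>(count: Int, makeRandom: () -> T) -> Double {
    var values = (0..<count).map { _ in makeRandom() }
    let start = Date()
    values.sort()
    let elapsed = Date().timeIntervalSince(start)
    // Read the result so the sort cannot be discarded by the optimizer.
    precondition(count == 0 || values[0] <= values[count - 1])
    return elapsed
}

// Use 10_000_000 as well to mirror the second row of each group.
let uintSeconds = timeSort(count: 1_000_000) { UInt32.random(in: 0...UInt32.max) }
let floatSeconds = timeSort(count: 1_000_000) { Float.random(in: 0..<1) }
print("UInt32  1000000: \(uintSeconds) s")
print("Float32 1000000: \(floatSeconds) s")
```

Results will vary with hardware, build settings, and input distribution, so treat the output as indicative only.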

At these speeds, sorting impedes real-time rendering of 3D Gaussians.


It seems that there is no ready-made GPU radix sort for Metal. (https://developer.apple.com/forums/thread/105886)

I did find an implementation for Apple Silicon, but I find it difficult to understand. (https://github.com/ShoYamanishi/AppleNumericalComputing/tree/main/05_radix_sort#54-metal-radix-sort-implementations)
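
To make the discussion concrete, here is a CPU reference sketch of a least-significant-digit radix sort over Float depths. It is plain Swift, not Metal code, and it is not taken from the linked repository; `sortableKey` and `radixSortedIndicesDescending` are hypothetical names. A GPU version would typically parallelize the same three steps per pass (histogram, prefix sum, scatter), which is where most of the complexity comes from.

```swift
// Maps a Float to a UInt32 whose unsigned order matches the float's numeric
// order (NaN excluded): flip all bits for negatives, only the sign bit otherwise.
func sortableKey(_ value: Float) -> UInt32 {
    let bits = value.bitPattern
    let mask: UInt32 = (bits & 0x8000_0000) != 0 ? 0xFFFF_FFFF : 0x8000_0000
    return bits ^ mask
}

// Returns indices into `depths`, ordered by descending depth, using an
// LSD radix sort with 8-bit digits (4 passes over 32-bit keys).
func radixSortedIndicesDescending(_ depths: [Float]) -> [UInt32] {
    let count = depths.count
    // Inverting the key turns the ascending radix sort into a descending one.
    var keys = depths.map { ~sortableKey($0) }
    var indices = (0..<count).map { UInt32($0) }
    var keysOut = [UInt32](repeating: 0, count: count)
    var indicesOut = [UInt32](repeating: 0, count: count)

    for pass in 0..<4 {
        let shift = UInt32(pass * 8)
        // 1. Histogram: count the keys that fall into each of the 256 buckets.
        var offsets = [Int](repeating: 0, count: 256)
        for key in keys { offsets[Int((key >> shift) & 0xFF)] += 1 }
        // 2. Exclusive prefix sum: turn counts into bucket start offsets.
        var running = 0
        for bucket in 0..<256 {
            let bucketCount = offsets[bucket]
            offsets[bucket] = running
            running += bucketCount
        }
        // 3. Stable scatter: move each element to its bucket's next free slot.
        for i in 0..<count {
            let bucket = Int((keys[i] >> shift) & 0xFF)
            let destination = offsets[bucket]
            keysOut[destination] = keys[i]
            indicesOut[destination] = indices[i]
            offsets[bucket] = destination + 1
        }
        swap(&keys, &keysOut)
        swap(&indices, &indicesOut)
    }
    return indices
}

let sortedIndices = radixSortedIndicesDescending([1.5, -2.0, 4.0, 0.25, 0.0])
print(sortedIndices)
```

The work per pass is linear in the number of elements, and each step maps onto a data-parallel primitive, which is why radix sort is a common choice for GPU depth sorting.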

Is anyone planning to implement GPU sorting to improve performance? Or is there a library we could adopt directly to speed this up?
