Toy path tracer I did in 2018 for my own learning purposes. Somewhat based on Peter Shirley's Ray Tracing in One Weekend minibook (highly recommended!), and on Kevin Beason's smallpt.
I decided to write blog posts about things I discover as I do this:
- Part 0: Intro
- Part 1: Initial C++ and walkthrough
- Part 2: Fix stupid performance issue
- Part 3: C#, Unity and Burst
- Part 4: Correctness fixes and Mitsuba
- Part 5: simple GPU version via Metal
- Part 6: simple GPU version via D3D11
- Part 7: initial C++ SIMD & SoA
- Part 8: SSE SIMD for HitSpheres
- Part 9: ryg optimizes my code
- Part 10: Update all implementations to match
- Part 11: Buffer-oriented approach on CPU
- Part 12: Buffer-oriented approach on GPU D3D11
- Part 13: GPU thread group data optimization
- Part 14: Make it run on iOS
- Part 15: A bunch of path tracing links
- Part 16: Unity C# Burst optimization
- Part 17: WebAssembly
Note: it can only do spheres, there's no bounding volume hierarchy of any sort, and a lot of stuff is hardcoded.
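Because there is no acceleration structure, every ray has to be tested against every sphere. Below is a minimal sketch of that brute-force closest-hit loop. It assumes normalized ray directions and keeps sphere data as structure-of-arrays (the SoA layout mentioned in Part 7). `SpheresSoA` and `ClosestSphereHit` are hypothetical names, and this is a plain scalar loop, not the project's SIMD `HitSpheres` code:

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical SoA sphere storage: one array per component instead of an
// array of Sphere structs, which makes it easier to process several spheres
// at once with SIMD (this sketch itself stays scalar).
struct SpheresSoA
{
    std::vector<float> cx, cy, cz; // sphere centers
    std::vector<float> radius;     // sphere radii

    void Add(float x, float y, float z, float r)
    {
        cx.push_back(x); cy.push_back(y); cz.push_back(z); radius.push_back(r);
    }
    int Count() const { return static_cast<int>(radius.size()); }
};

// Brute force: test the ray origin + t*dir against every sphere and return
// the index of the closest hit with tMin < t < tMax, or -1 on a miss.
// dir is assumed to be normalized. On a hit, outT receives the hit distance.
int ClosestSphereHit(const SpheresSoA& s,
                     float ox, float oy, float oz,
                     float dx, float dy, float dz,
                     float tMin, float tMax, float& outT)
{
    assert(tMin < tMax);
    int hitIndex = -1;
    float closest = tMax;
    for (int i = 0; i < s.Count(); ++i)
    {
        // Vector from the sphere center to the ray origin.
        float ocx = ox - s.cx[i], ocy = oy - s.cy[i], ocz = oz - s.cz[i];
        // With |dir| == 1 the intersection is t^2 + 2*b*t + c = 0.
        float b = ocx * dx + ocy * dy + ocz * dz;
        float c = ocx * ocx + ocy * ocy + ocz * ocz - s.radius[i] * s.radius[i];
        float discr = b * b - c;
        if (discr <= 0.0f)
            continue; // ray misses (or only grazes) this sphere
        float sq = std::sqrt(discr);
        // Prefer the nearer root; use the farther one if the ray starts
        // inside the sphere.
        float t = -b - sq;
        if (t <= tMin)
            t = -b + sq;
        if (t > tMin && t < closest)
        {
            closest = t;
            hitIndex = i;
        }
    }
    if (hitIndex >= 0)
        outT = closest;
    return hitIndex;
}
```

Each ray costs one test per sphere, which is fine for ~50 spheres but is exactly the cost a bounding volume hierarchy would cut down in larger scenes.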
Performance numbers in Mray/s on a scene with ~50 spheres and two light sources, running on the CPU:
| Language | Approach | Ryzen 5950 | AMD TR1950 | MBP 2021 | MBP 2018 | MBA 2020 | iPhone 11 | iPhone X | iPhone SE |
|---|---|---|---|---|---|---|---|---|---|
| C++ | SIMD Intrinsics | 281.0 | 187.0 | 105.4 | 74.0 | 32.3 | 26.4 | 12.9 | 8.5 |
| | Scalar | 141.2 | 100.0 | 84.8 | 35.7 | 15.9 | | | |
| | WebAssembly (no threads, no SIMD) | 8.4 | 5.0 | 8.1 | 5.6 | | | | |
| C# | Unity Burst "manual" SIMD | 227.2 | 133.0 | 103.7 | 60.0 | 29.7 | | | |
| | Unity Burst | 82.0 | 36.0 | | | | | | |
| | Unity (Editor) | 6.5 | 3.4 | | | | | | |
| | Unity (player Mono) | 6.7 | 3.5 | | | | | | |
| | Unity (player IL2CPP) | 39.1 | 63.8 | 17.2 | | | | | |
| | .NET 6.0 | 91.5 | 53.0 | 40.9 | | | | | |
| | .NET Core 2.0 | 86.1 | 53.0 | 23.6 | | | | | |
| | Mono --llvm | 35.1 | 22.0 | | | | | | |
| | Mono | 23.6 | 3.6 | 6.1 | | | | | |
More detailed specs of the machines above are:
- Ryzen 5950: AMD Ryzen 5950X (3.4GHz, 16c/32t), Visual Studio 2022.
- AMD TR1950: AMD ThreadRipper 1950X (3.4GHz, SMT disabled, 16c/16t), Visual Studio 2017.
- MBP 2021: Apple MacBook Pro M1 Max (8+2 cores), Xcode 13.2.
- MBP 2018: Apple MacBook Pro mid-2018 (Core i9 2.9GHz, 6c/12t).
- MBA 2020: Apple MacBook Air 2020 (Core i7 1.2GHz, 4c/8t).
- iPhone 11: A13 chip.
- iPhone X: A11 chip.
- iPhone SE: A9 chip.
Software versions:
- Unity 2021.3.16. Burst 1.6.6 (safety checks off). C# testing in editor, Release mode.
- Mono 6.12.
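The CPU numbers are in Mray/s, i.e. millions of rays per second. A small sketch of how such a figure and a speedup ratio can be computed. It assumes `rayCount` totals every ray cast by all threads (primary, bounce and any light-sampling rays) over `seconds` of wall-clock time; what exactly the tracer counts as a ray is an assumption here, and the function names are hypothetical:

```cpp
#include <cassert>
#include <cstdint>

// Throughput in millions of rays per second.
// Assumption: rayCount is summed over all threads and all ray kinds,
// and seconds is the wall-clock time spent tracing them.
double MraysPerSecond(std::uint64_t rayCount, double seconds)
{
    assert(seconds > 0.0);
    return static_cast<double>(rayCount) / seconds / 1.0e6;
}

// Relative speed of two configurations, both measured in Mray/s.
double Speedup(double fasterMrays, double slowerMrays)
{
    assert(slowerMrays > 0.0);
    return fasterMrays / slowerMrays;
}
```

For example, `Speedup(281.0, 141.2)` is about 1.99, so on the Ryzen 5950 the SIMD intrinsics version roughly doubles the scalar C++ throughput.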
And on the GPU, via a compute shader in D3D11 or Metal depending on the platform:
| GPU | Perf (Mray/s) |
|---|---|
| D3D11 | |
| GeForce RTX 3080Ti | 3920 |
| GeForce GTX 1080Ti | 1854 |
| Metal | |
| MBP 2024 (M4 Max) | 1680 |
| MBP 2021 (M1 Max) | 1065 |
| MBP 2018 (Radeon Pro 560X) | 246 |
| MBA 2020 (Iris Plus) | 201 |
| iPhone 11 Pro (A13) | 80 |
| iPhone X (A11) | 46 |
| iPhone SE (A9) | 20 |
A lot of stuff in the implementation is totally suboptimal or uses the tech in a "wrong" way. I know it's just a simple toy, ok :)
- C++ projects:
  - Windows (Visual Studio 2017) in `Cpp/Windows/ToyPathTracer.sln`. DX11 Win32 app that displays the result as a fullscreen CPU-updated or GPU-rendered texture. Pressing G toggles between GPU and CPU tracing, A toggles animation, P toggles progressive accumulation.
  - Mac/iOS (Xcode 10) in `Cpp/Apple/ToyPathTracer.xcodeproj`. Metal app that displays the result as a fullscreen CPU-updated or GPU-rendered texture. Pressing G toggles between GPU and CPU tracing, A toggles animation, P toggles progressive accumulation. Should work on both Mac (`Test Mac` target) and iOS (`Test iOS` target).
  - WebAssembly in `Cpp/Emscripten/build.sh`. CPU, single threaded, no SIMD.
- C# project in `Cs/TestCs.sln`. A command line app that renders some frames and dumps out a final TGA screenshot at the end.
- Unity project in `Unity`. I used Unity 2021.3.16.
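The P key toggles progressive accumulation. A common way to implement it, shown here as a hypothetical sketch rather than the project's code, is a running average: each new noisy frame is blended into an accumulation buffer with weight 1/(N+1), so after N+1 frames every pixel holds the plain mean of all samples so far. The buffer is assumed to be a flat float array, and `frameIndex` is assumed to restart at 0 whenever accumulation is reset (for example when animation changes the scene):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: blend one freshly traced frame into the running mean.
// frameIndex == 0 means "first frame since reset", which simply copies frame
// into accum; later frames get weights 1/2, 1/3, 1/4, ...
void AccumulateFrame(std::vector<float>& accum,
                     const std::vector<float>& frame,
                     int frameIndex)
{
    assert(accum.size() == frame.size());
    assert(frameIndex >= 0);
    const float w = 1.0f / static_cast<float>(frameIndex + 1);
    for (std::size_t i = 0; i < accum.size(); ++i)
        accum[i] += (frame[i] - accum[i]) * w; // incremental mean update
}
```

With accumulation turned off, always passing `frameIndex = 0` just overwrites the buffer with the latest frame, so the same code path covers both modes.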
