| 1 | +# Detail Manager Compute Infrastructure - Implementation Status |
| 2 | + |
| 3 | +## Completed Components |
| 4 | + |
| 5 | +### 1. GPU Data Structures (`DetailManager_Compute.h`) |
| 6 | +✅ **DetailInstanceGPU** (128 bytes, cache-aligned) |
| 7 | +- Transform data: position, scale, rotation_y |
| 8 | +- Rendering data: lighting (c_hemi, c_sun), object_id, vis_id, color_rgb |
| 9 | +- Bounding data: AABB min/max, sphere radius |
| 10 | +- Metadata: slot coordinates, flags, fade distance |
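As a rough illustration of how the fields listed above could fit into 128 bytes, here is a minimal sketch. The field order, the types, and the padding members (`pad0`, `reserved`) are assumptions made for this example; the actual definition in `DetailManager_Compute.h` may differ.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical layout: names follow the field list in this document, but the
// order, types, and padding are assumptions, not the contents of the header.
struct alignas(16) DetailInstanceGPU {
    // Transform data (16 bytes)
    float         position[3];
    float         scale;
    // Rotation, lighting, object id (16 bytes)
    float         rotation_y;
    float         c_hemi;
    float         c_sun;
    std::uint32_t object_id;
    // Color and visibility list (16 bytes)
    float         color_rgb[3];
    std::uint32_t vis_id;        // 0 = still, 1 = wave1, 2 = wave2
    // Bounding data (32 bytes)
    float         aabb_min[3];
    float         sphere_radius;
    float         aabb_max[3];
    float         fade_distance;
    // Metadata (16 bytes)
    std::int32_t  slot_x;
    std::int32_t  slot_z;
    std::uint32_t flags;
    std::uint32_t pad0;
    // Explicit padding up to the 128-byte stride (32 bytes)
    std::uint32_t reserved[8];
};

// Compile-time checks keep the CPU struct in step with the GPU-side stride.
static_assert(sizeof(DetailInstanceGPU) == 128, "stride must stay 128 bytes");
static_assert(offsetof(DetailInstanceGPU, aabb_min) == 48, "bounds start at byte 48");
```

Grouping the members into 16-byte rows keeps each group aligned like a `float4` in the structured buffer, and the `static_assert` lines fail the build if someone changes the struct without updating the shader-side definition.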
| 11 | + |
| 12 | +✅ **FrustumGPU** (documented as 64 bytes; six `float4` planes alone would occupy 96 bytes, so confirm against the header) |
| 13 | +- 6 frustum planes extracted from view-projection matrix |
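The plane extraction can be sketched on the CPU as follows. This is an illustration under stated assumptions, not the contents of `BuildFrustumGPU()`: it assumes a row-vector convention (`clip = v * M`), a `m[row][col]` matrix layout, and D3D-style depth in `0..w`. `Plane`, `FrustumPlanes`, `Mat4`, and `build_frustum` are hypothetical names. Note that six `float4`-sized planes take 96 bytes in this sketch.

```cpp
#include <array>
#include <cmath>

// Plane in the form a*x + b*y + c*z + d >= 0 for points on the inner side.
struct Plane { float a, b, c, d; };

// Hypothetical stand-in for FrustumGPU: left, right, bottom, top, near, far.
struct FrustumPlanes { std::array<Plane, 6> planes; };
static_assert(sizeof(FrustumPlanes) == 96, "six float4-sized planes");

// Assumption: m[row][col], row-vector convention (clip = v * M), depth 0..w.
using Mat4 = std::array<std::array<float, 4>, 4>;

inline Plane normalized(Plane p) {
    const float len = std::sqrt(p.a * p.a + p.b * p.b + p.c * p.c);
    if (len > 0.0f) { p.a /= len; p.b /= len; p.c /= len; p.d /= len; }
    return p;
}

// Builds (w column) + sign * (given column), then normalizes the result.
inline Plane combine(const Mat4& m, int col, float sign) {
    return normalized(Plane{m[0][3] + sign * m[0][col],
                            m[1][3] + sign * m[1][col],
                            m[2][3] + sign * m[2][col],
                            m[3][3] + sign * m[3][col]});
}

inline FrustumPlanes build_frustum(const Mat4& vp) {
    FrustumPlanes f{};
    f.planes[0] = combine(vp, 0, +1.0f);  // left:   w + x >= 0
    f.planes[1] = combine(vp, 0, -1.0f);  // right:  w - x >= 0
    f.planes[2] = combine(vp, 1, +1.0f);  // bottom: w + y >= 0
    f.planes[3] = combine(vp, 1, -1.0f);  // top:    w - y >= 0
    // near: z >= 0 under the assumed 0..w depth range (z column alone)
    f.planes[4] = normalized(Plane{vp[0][2], vp[1][2], vp[2][2], vp[3][2]});
    f.planes[5] = combine(vp, 2, -1.0f);  // far:    w - z >= 0
    return f;
}

// Signed distance from a point to a normalized plane; negative means outside.
inline float signed_distance(const Plane& p, float x, float y, float z) {
    return p.a * x + p.b * y + p.c * z + p.d;
}
```

With an identity matrix the six planes bound the box `[-1, 1] × [-1, 1] × [0, 1]`, which makes a convenient sanity check. A column-vector engine convention would use matrix rows instead of columns.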
| 14 | + |
| 15 | +✅ **DetailCullParams** (64 bytes) |
| 16 | +- Camera position/direction |
| 17 | +- Fade limits (start/end squared distances) |
| 18 | +- SSA (screen-space area) thresholds for culling/LOD |
| 19 | +- Frame number for temporal effects |
| 20 | + |
| 21 | +✅ **IndirectDrawArgs** |
| 22 | +- Matches the `D3D11_DRAW_INDEXED_INSTANCED_INDIRECT_ARGS` layout (five 32-bit fields, 20 bytes) |
| 23 | +- `instance_count` written by compute shader |
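For reference, the argument layout consumed by `DrawIndexedInstancedIndirect` is five 32-bit values. The sketch below mirrors that layout; only `instance_count` is named in this document, so the other member names are assumptions modeled on the D3D11 field names.

```cpp
#include <cstddef>
#include <cstdint>

// Mirrors the five-field indexed indirect argument layout used by D3D11.
// Member names other than instance_count are assumptions for illustration.
struct IndirectDrawArgs {
    std::uint32_t index_count_per_instance;  // indices in one detail mesh
    std::uint32_t instance_count;            // written by the compute shader
    std::uint32_t start_index_location;
    std::int32_t  base_vertex_location;      // signed in the D3D11 layout
    std::uint32_t start_instance_location;
};

static_assert(sizeof(IndirectDrawArgs) == 20, "five 32-bit fields");
static_assert(offsetof(IndirectDrawArgs, instance_count) == 4,
              "compute shader writes the second DWORD");
```

Because `instance_count` sits at byte offset 4, the culling pass only has to write that one value per visibility list; the remaining fields can be filled once on the CPU.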
| 24 | + |
| 25 | +### 2. Compute Shader (`detail_cull.cs`) |
| 26 | +✅ **Culling Pipeline** (256 threads per group) |
| 27 | +1. Distance culling (fade_limit_sqr) |
| 28 | +2. Frustum culling (sphere vs 6 planes) |
| 29 | +3. SSA (Screen Space Area) culling |
| 30 | +4. Sort into 3 visibility lists by vis_id (still/wave1/wave2) |
| 31 | +5. Atomic counter updates |
| 32 | +6. Output to UAV buffers |
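A CPU reference for steps 1, 4, 5, and 6 can help when checking parity with the shader. The sketch below is a simplified model under assumptions: `InstanceLite`, `VisibleLists`, and `cull_one` are hypothetical names, the frustum and SSA steps are omitted, and `std::atomic::fetch_add` stands in for the shader's atomic counter increment.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Reduced instance record for this sketch (the real struct has more fields).
struct InstanceLite { float x, y, z; std::uint32_t vis_id; };

// One index list and one counter per visibility list (still/wave1/wave2).
struct VisibleLists {
    std::array<std::vector<std::uint32_t>, 3> indices;
    std::array<std::atomic<std::uint32_t>, 3> counts;

    explicit VisibleLists(std::size_t capacity) {
        for (std::size_t i = 0; i < 3; ++i) {
            indices[i].assign(capacity, 0u);  // worst case: everything visible
            counts[i].store(0u);
        }
    }
};

// Processes one instance, as a single GPU thread would.
inline void cull_one(std::uint32_t index, const InstanceLite& inst,
                     const float cam[3], float fade_limit_sqr,
                     VisibleLists& out) {
    assert(inst.vis_id < 3);
    const float dx = inst.x - cam[0];
    const float dy = inst.y - cam[1];
    const float dz = inst.z - cam[2];
    const float dist_sqr = dx * dx + dy * dy + dz * dz;

    if (dist_sqr > fade_limit_sqr) return;  // step 1: distance culling
    // Steps 2 and 3 (frustum and SSA tests) are omitted in this sketch.

    // Steps 4-6: reserve a slot in the matching list, then write the index.
    const std::uint32_t slot = out.counts[inst.vis_id].fetch_add(1u);
    out.indices[inst.vis_id][slot] = index;
}
```

On the GPU the order of entries within a list depends on thread scheduling, so a parity test should compare the lists as sets rather than as sequences.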
| 33 | + |
| 34 | +✅ **Functions:** |
| 35 | +- `SphereInsidePlane()` - Plane/sphere intersection |
| 36 | +- `FrustumCullSphere()` - 6-plane frustum test |
| 37 | +- `ComputeSSA()` - Screen-space area for LOD |
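The sphere tests can be sketched in C++ as below. This is one common formulation, not necessarily what `detail_cull.cs` does: it assumes normalized planes with inward-facing normals, and `sphere_outside_plane` and `frustum_cull_sphere` are hypothetical names whose return conventions may differ from the shader's `SphereInsidePlane()` and `FrustumCullSphere()`.

```cpp
#include <array>

// Normalized plane: nx*x + ny*y + nz*z + d >= 0 on the inner side (assumed).
struct Plane  { float nx, ny, nz, d; };
struct Sphere { float x, y, z, radius; };

// True when the sphere lies entirely on the outer side of the plane.
inline bool sphere_outside_plane(const Sphere& s, const Plane& p) {
    const float dist = p.nx * s.x + p.ny * s.y + p.nz * s.z + p.d;
    return dist < -s.radius;
}

// True when the sphere can be culled: fully outside at least one plane.
// The test is conservative; spheres near frustum corners may pass as visible.
inline bool frustum_cull_sphere(const Sphere& s,
                                const std::array<Plane, 6>& planes) {
    for (const Plane& p : planes) {
        if (sphere_outside_plane(s, p)) return true;
    }
    return false;
}
```

A sphere that straddles a plane is kept, which errs on the side of drawing an instance rather than dropping one that is partly visible.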
| 38 | + |
| 39 | +### 3. Manager Class (`DetailManager_Compute.h`) |
| 40 | +✅ **DetailComputeManager** |
| 41 | +- Instance management (Begin/Add/End) |
| 42 | +- GPU buffer allocation (structured buffers, UAVs, SRVs) |
| 43 | +- Compute dispatch (culling pass) |
| 44 | +- Indirect rendering support |
| 45 | +- Statistics tracking |
| 46 | + |
| 47 | +✅ **GPU Resources:** |
| 48 | +- Instance buffer (all instances) + SRV |
| 49 | +- Visible indices buffers [3] + UAVs + SRVs |
| 50 | +- Counter buffer (atomic) + UAV |
| 51 | +- Indirect args buffers [3] + UAVs |
| 52 | +- Constant buffer (cull params) |
| 53 | +- Compute shader reference |
| 54 | + |
| 55 | +✅ **Utility Functions:** |
| 56 | +- `BuildFrustumGPU()` - Extract planes from VP matrix |
| 57 | +- `ConvertToGPUInstance()` - CPU SlotItem → GPU format |
| 58 | + |
| 59 | +## Next Steps |
| 60 | + |
| 61 | +### Phase 1: Implementation (Current) |
| 62 | +- [ ] Create `DetailManager_Compute.cpp` with buffer creation/destruction |
| 63 | +- [ ] Implement `CreateBuffers()` - allocate structured buffers |
| 64 | +- [ ] Implement `UploadInstances()` - CPU→GPU transfer |
| 65 | +- [ ] Implement `DispatchCulling()` - bind resources & dispatch compute |
| 66 | +- [ ] Implement `RenderIndirect()` - DrawIndexedInstancedIndirect calls |
| 67 | +- [ ] Compile shader: `detail_cull.cs` → compiled bytecode |
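For `DispatchCulling()`, the number of thread groups follows from the instance count and the 256-thread group size. A minimal sketch of that arithmetic, with `kThreadsPerGroup` and `dispatch_group_count` as hypothetical names:

```cpp
#include <cstdint>

// Thread group size quoted in this document; the name is an assumption.
constexpr std::uint32_t kThreadsPerGroup = 256;

// Rounds up so that a partially filled last group still gets dispatched.
// Assumes instance_count is far below the uint32 limit (no overflow guard).
constexpr std::uint32_t dispatch_group_count(std::uint32_t instance_count) {
    return (instance_count + kThreadsPerGroup - 1) / kThreadsPerGroup;
}

static_assert(dispatch_group_count(100000) == 391, "100K instances -> 391 groups");
```

Because the last group is usually only partly filled, the shader would typically need an early-out for thread indices at or beyond the instance count.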
| 68 | + |
| 69 | +### Phase 2: Integration |
| 70 | +- [ ] Add compute path toggle to `DetailManager.h` |
| 71 | +- [ ] Integrate with existing `cache_Decompress()` to build instance list |
| 72 | +- [ ] Replace `UpdateVisibleM()` with compute dispatch |
| 73 | +- [ ] Hook into `hw_Render()` for indirect draws |
| 74 | +- [ ] Add console commands for enable/disable/stats |
| 75 | + |
| 76 | +### Phase 3: Testing & Optimization |
| 77 | +- [ ] Verify visual parity with CPU path |
| 78 | +- [ ] Profile GPU/CPU times |
| 79 | +- [ ] Benchmark instance counts (10K, 100K, 500K) |
| 80 | +- [ ] Tune thread group size (current: 256) |
| 81 | +- [ ] Add GPU timestamps for perf tracking |
| 82 | + |
| 83 | +## Architecture Benefits |
| 84 | + |
| 85 | +### Memory Efficiency |
| 86 | +- CPU: ~80 bytes per SlotItem (pointers, matrices, scattered data) |
| 87 | +- GPU: 128 bytes per instance (larger per instance, but cache-aligned and contiguous) |
| 88 | +- Structured buffers avoid pre-multiplied VB/IB waste |
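To put the per-instance figure in context, the buffer sizes for a given instance cap can be estimated at compile time. The sketch assumes 32-bit visible indices and that each of the three visible-index buffers is sized for the worst case (every instance visible in one list); both are assumptions, and all names are hypothetical.

```cpp
#include <cstdint>

constexpr std::uint64_t kInstanceStride = 128;  // bytes, from this document
constexpr std::uint64_t kIndexSize      = 4;    // assumed 32-bit indices
constexpr std::uint64_t kListCount      = 3;    // still / wave1 / wave2

// Bytes needed for the instance buffer at a given instance cap.
constexpr std::uint64_t instance_buffer_bytes(std::uint64_t max_instances) {
    return max_instances * kInstanceStride;
}

// Bytes needed for all three visible-index buffers, each sized worst case.
constexpr std::uint64_t visible_index_bytes(std::uint64_t max_instances) {
    return kListCount * max_instances * kIndexSize;
}

static_assert(instance_buffer_bytes(100000) == 12800000, "12.8 MB at 100K");
static_assert(visible_index_bytes(100000) == 1200000, "1.2 MB at 100K");
```

At the 100,000-instance cap this comes to roughly 14 MB of GPU memory for instances plus index lists, before counters and indirect argument buffers.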
| 89 | + |
| 90 | +### Performance Gains |
| 91 | +- **Culling**: per-instance tests run in parallel on the GPU instead of serially on the CPU |
| 92 | +- **Draw Calls**: 3 indirect draws instead of hundreds of batched draws |
| 93 | +- **Bandwidth**: no per-batch constant buffer updates (64×4 float4s per batch) |
| 94 | +- **Scalability**: targets 100K+ instances (to be confirmed by the Phase 3 benchmarks) |
| 95 | + |
| 96 | +### Flexibility |
| 97 | +- Easy to add Hi-Z occlusion culling (future) |
| 98 | +- Temporal anti-aliasing support (frame counter) |
| 99 | +- GPU-driven LOD selection |
| 100 | +- Foundation for grass state texture |
| 101 | + |
| 102 | +## File Structure |
| 103 | + |
| 104 | +``` |
| 105 | +src/Layers/xrRender/ |
| 106 | +├── DetailManager_Compute.h ✅ Created (GPU structs, manager class) |
| 107 | +└── DetailManager_Compute.cpp ⏳ Next (implementation) |
| 108 | +
|
| 109 | +res/gamedata/shaders/r5/ |
| 110 | +└── detail_cull.cs ✅ Created (frustum culling) |
| 111 | +
|
| 112 | +Documentation/ |
| 113 | +├── GRASS_RENDERING_ARCHITECTURE.md ✅ Created (design doc) |
| 114 | +└── COMPUTE_INFRASTRUCTURE_STATUS.md ✅ This file |
| 115 | +``` |
| 116 | + |
| 117 | +## Shader Compilation |
| 118 | + |
| 119 | +The compute shader needs to be compiled: |
| 120 | +``` |
| 121 | +input: res/gamedata/shaders/r5/detail_cull.cs |
| 122 | +output: <shader_cache>/detail_cull_cs.cso |
| 123 | +``` |
| 124 | + |
| 125 | +The engine should auto-compile the shader on first load; otherwise, use the shader compiler tool. |
| 126 | + |
| 127 | +## Configuration |
| 128 | + |
| 129 | +Future console variables: |
| 130 | +``` |
| 131 | +r_detail_compute 1/0 - Enable GPU compute path |
| 132 | +r_detail_compute_stats 1/0 - Show culling statistics |
| 133 | +r_detail_max_instances 100000 - Max instances to allocate |
| 134 | +``` |
| 135 | + |
| 136 | +## Notes |
| 137 | + |
| 138 | +- All structures are tightly packed for GPU efficiency |
| 139 | +- Frustum extraction follows standard VP matrix decomposition |
| 140 | +- SSA calculation matches existing CPU logic for consistency |
| 141 | +- Vis_id splitting (0/1/2) preserves animation system |
| 142 | +- Indirect args allow GPU to control draw count dynamically |
| 143 | + |
| 144 | +--- |
| 145 | + |
| 146 | +**Status**: Header and shader source are in place; the `.cpp` implementation (Phase 1) is next |
| 147 | +**Branch**: `yohji/feat/mt-detailmanager` |
| 148 | +**Last Updated**: 2025-10-09 |