
Commit b97914c

Merge branch 'yohji/feat/computeshaders' into revolution
2 parents 4b8873d + 19ec375 commit b97914c

38 files changed (+4604 / -67 lines)

COMPUTE_INFRASTRUCTURE_STATUS.md

Lines changed: 148 additions & 0 deletions
# Detail Manager Compute Infrastructure - Implementation Status

## Completed Components

### 1. GPU Data Structures (`DetailManager_Compute.h`)

**DetailInstanceGPU** (128 bytes, cache-aligned)
- Transform data: position, scale, rotation_y
- Rendering data: lighting (c_hemi, c_sun), object_id, vis_id, color_rgb
- Bounding data: AABB min/max, sphere radius
- Metadata: slot coordinates, flags, fade distance
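
A CPU-side mirror of the struct makes the 128-byte stride checkable at compile time. The sketch below is hypothetical: only the field list and the 128-byte size come from this document, while the field order, the exact types, the `slot_x`/`slot_z` names and the padding are assumptions rather than the contents of `DetailManager_Compute.h`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical CPU-side mirror of DetailInstanceGPU. Field order, types and
// padding are assumptions; the HLSL struct must declare the same fields in the
// same order for the structured-buffer stride to match.
struct alignas(64) DetailInstanceGPU // 128 bytes = two 64-byte cache lines
{
    // Transform data
    float    position[3];
    float    scale;
    float    rotation_y;
    // Rendering data
    float    c_hemi;
    float    c_sun;
    uint32_t object_id;
    uint32_t vis_id;        // 0 = still, 1 = wave1, 2 = wave2
    float    color_rgb[3];
    // Bounding data
    float    aabb_min[3];
    float    aabb_max[3];
    float    sphere_radius;
    // Metadata (slot_x / slot_z are hypothetical names)
    int32_t  slot_x;
    int32_t  slot_z;
    uint32_t flags;
    float    fade_distance;
    uint32_t pad[9];        // 92 bytes of payload padded up to 128
};

// The buffer's StructureByteStride must equal sizeof(DetailInstanceGPU).
static_assert(sizeof(DetailInstanceGPU) == 128, "stride must stay 128 bytes");
static_assert(offsetof(DetailInstanceGPU, aabb_min) == 48, "AABB starts at byte 48");
```

With such a mirror, `D3D11_BUFFER_DESC::StructureByteStride` can be set from `sizeof(DetailInstanceGPU)`, and the `static_assert`s catch accidental layout changes.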

**FrustumGPU** (96 bytes)
- 6 frustum planes (one float4 each) extracted from the view-projection matrix

**DetailCullParams** (64 bytes)
- Camera position/direction
- Fade limits (start/end squared distances)
- SSA thresholds for culling/LOD
- Frame number for temporal effects
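
The fade limits are stored as squared distances so the shader can compare against |p - camera|² without a square root per instance. A hypothetical layout and the distance test, mirrored in C++; the field names, their order and the two SSA fields are assumptions.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical DetailCullParams layout (names and order are assumptions). Each
// float3 is followed by a scalar so the pair fills one 16-byte cbuffer register,
// and the total stays a multiple of 16 bytes as constant buffers require.
struct DetailCullParams
{
    float    camera_pos[3];
    float    fade_start_sqr;   // squared distance where fading begins
    float    camera_dir[3];
    float    fade_limit_sqr;   // squared distance beyond which instances are culled
    float    ssa_discard;      // below this SSA an instance is culled
    float    ssa_lod;          // hypothetical LOD threshold
    uint32_t frame_number;     // for temporal effects
    uint32_t pad[5];           // C++-side padding only: 44 bytes of payload up to 64
};
static_assert(sizeof(DetailCullParams) == 64, "matches the 64 bytes listed above");

// Step 1 of the cull: compare squared distances, so no sqrt is needed.
inline bool DistanceCulled(const DetailCullParams& p, const float pos[3])
{
    const float dx = pos[0] - p.camera_pos[0];
    const float dy = pos[1] - p.camera_pos[1];
    const float dz = pos[2] - p.camera_pos[2];
    return dx * dx + dy * dy + dz * dz > p.fade_limit_sqr;
}
```

In HLSL, an array inside a cbuffer occupies one 16-byte register per element, so the shader-side declaration should not copy `pad[5]` literally; the 64-byte size only needs to hold for the buffer's `ByteWidth`.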

**IndirectDrawArgs**
- Matches the `D3D11_DRAW_INDEXED_INSTANCED_INDIRECT_ARGS` layout (five 32-bit values, 20 bytes)
- `instance_count` is written by the compute shader
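
D3D11 reads indirect arguments as five consecutive 32-bit values, so `instance_count` sits at byte offset 4 of each args buffer; that is the dword the compute shader has to update. A mirror struct with hypothetical snake_case field names:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Mirror of the D3D11_DRAW_INDEXED_INSTANCED_INDIRECT_ARGS layout.
// Field names are illustrative snake_case versions of the D3D11 ones.
struct IndirectDrawArgs
{
    uint32_t index_count_per_instance;
    uint32_t instance_count;          // written by the compute shader
    uint32_t start_index_location;
    int32_t  base_vertex_location;
    uint32_t start_instance_location;
};
static_assert(sizeof(IndirectDrawArgs) == 20, "five 32-bit values");
static_assert(offsetof(IndirectDrawArgs, instance_count) == 4, "second dword");

// Hypothetical per-frame reset value: zero instances until the cull pass runs.
constexpr IndirectDrawArgs MakeArgs(uint32_t index_count)
{
    return IndirectDrawArgs{index_count, 0u, 0u, 0, 0u};
}
```

`MakeArgs()` reflects one assumption about the frame flow: if the shader increments `instance_count` atomically, each args buffer must be reset to zero instances before every culling dispatch. In an append-buffer design, `ID3D11DeviceContext::CopyStructureCount` into byte offset 4 would be an alternative to writing the count from the shader.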

### 2. Compute Shader (`detail_cull.cs`)

**Culling Pipeline** (256 threads per group)
1. Distance culling (`fade_limit_sqr`)
2. Frustum culling (sphere vs. 6 planes)
3. SSA (screen-space area) culling
4. Bin surviving instances into 3 visibility lists by `vis_id` (still/wave1/wave2)
5. Atomic counter updates
6. Output to UAV buffers
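
Steps 4 to 6 amount to an atomic append into one of three lists. The C++ sketch below reproduces that logic on the CPU, which may be handy as a reference when checking parity: `std::atomic::fetch_add` stands in for HLSL `InterlockedAdd`, and steps 1 to 3 are collapsed into a precomputed `visible` flag. All type and function names are hypothetical.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct CullInput
{
    uint32_t vis_id;   // 0 = still, 1 = wave1, 2 = wave2
    bool     visible;  // result of steps 1-3 (distance, frustum, SSA)
};

struct VisibleLists
{
    std::array<std::vector<uint32_t>, 3> indices;      // stand-ins for the 3 index UAVs
    std::array<std::atomic<uint32_t>, 3> counters{};   // stand-in for the counter buffer (zeroed)

    explicit VisibleLists(std::size_t capacity)
    {
        for (auto& list : indices)
            list.resize(capacity);                     // worst case: all instances in one list
    }
};

// Body of one "thread"; safe to call concurrently for different ids.
inline void CullOne(const std::vector<CullInput>& in, uint32_t id, VisibleLists& out)
{
    if (id >= in.size() || !in[id].visible || in[id].vis_id > 2)
        return;                                        // tail thread or culled instance
    const uint32_t list = in[id].vis_id;
    const uint32_t slot = out.counters[list].fetch_add(1); // reserve an output slot
    out.indices[list][slot] = id;                      // write the instance index
}
```

Because slots are reserved with atomics, the order of indices within a list varies between runs on the GPU; only the per-list counts are deterministic, which matters when comparing against the CPU path.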

**Functions:**
- `SphereInsidePlane()` - Plane/sphere intersection
- `FrustumCullSphere()` - 6-plane frustum test
- `ComputeSSA()` - Screen-space area for LOD
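
A sketch of the three helpers, written in C++ rather than HLSL so it can be tested on the CPU. The return conventions (`FrustumCullSphere()` returning `true` when the sphere is culled) and the SSA formula, radius squared over distance squared, are assumptions; the notes below only state that SSA matches the existing CPU logic, so the real formula should be checked against the CPU `DetailManager` code.

```cpp
#include <algorithm>
#include <array>
#include <cassert>

struct Plane { float nx, ny, nz, d; };  // inside when n·p + d >= 0, n normalized

// Keep the sphere unless it lies entirely on the outer side of the plane.
inline bool SphereInsidePlane(const Plane& pl, float cx, float cy, float cz, float r)
{
    return pl.nx * cx + pl.ny * cy + pl.nz * cz + pl.d >= -r;
}

// True when the sphere is fully outside at least one of the 6 planes.
inline bool FrustumCullSphere(const std::array<Plane, 6>& planes,
                              float cx, float cy, float cz, float r)
{
    for (const Plane& pl : planes)
        if (!SphereInsidePlane(pl, cx, cy, cz, r))
            return true;
    return false;
}

// Projected-area proxy: r^2 / dist^2, guarded against a zero distance.
inline float ComputeSSA(float radius, float dist_sqr)
{
    return radius * radius / std::max(dist_sqr, 1e-6f);
}
```

The sphere test is conservative: spheres straddling a plane are kept, so a few instances just outside the frustum corners may still be drawn.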

### 3. Manager Class (`DetailManager_Compute.h`)

**DetailComputeManager**
- Instance management (Begin/Add/End)
- GPU buffer allocation (structured buffers, UAVs, SRVs)
- Compute dispatch (culling pass)
- Indirect rendering support
- Statistics tracking
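
The Begin/Add/End pattern can be sketched as a capacity-bounded batch: instances are collected between `Begin()` and `End()`, and anything beyond the GPU allocation (for example the proposed `r_detail_max_instances`) is dropped and counted for the statistics. `InstanceBatch` and its members are hypothetical names, not the actual `DetailComputeManager` interface.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

template <typename Instance>
class InstanceBatch
{
public:
    explicit InstanceBatch(uint32_t max_instances) : max_(max_instances)
    {
        items_.reserve(max_instances);
    }

    void Begin() { items_.clear(); dropped_ = 0; open_ = true; }

    // Returns false when the batch is closed or the GPU capacity is exhausted.
    bool Add(const Instance& inst)
    {
        if (!open_) return false;
        if (items_.size() >= max_) { ++dropped_; return false; }
        items_.push_back(inst);
        return true;
    }

    // Closes the batch; the caller would now upload items() to the instance buffer.
    uint32_t End() { open_ = false; return static_cast<uint32_t>(items_.size()); }

    const std::vector<Instance>& items() const { return items_; }
    uint32_t dropped() const { return dropped_; }

private:
    std::vector<Instance> items_;
    uint32_t max_ = 0;
    uint32_t dropped_ = 0;
    bool open_ = false;
};
```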

**GPU Resources:**
- Instance buffer (all instances) + SRV
- Visible indices buffers [3] + UAVs + SRVs
- Counter buffer (atomic) + UAV
- Indirect args buffers [3] + UAVs
- Constant buffer (cull params)
- Compute shader reference
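
A rough memory estimate for these resources, under the following assumptions: 128-byte instances, each visible-index list sized for the worst case (every instance landing in one list), one 32-bit counter per list, 20-byte indirect args, and a 64-byte constant buffer. The struct and function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint64_t kInstanceStride  = 128; // sizeof(DetailInstanceGPU)
constexpr uint64_t kListCount       = 3;   // still / wave1 / wave2
constexpr uint64_t kArgsBytes       = 20;  // five 32-bit indirect args
constexpr uint64_t kCullParamsBytes = 64;  // DetailCullParams

struct DetailGpuBudget
{
    uint64_t instance_bytes;
    uint64_t visible_index_bytes; // all 3 lists
    uint64_t counter_bytes;
    uint64_t indirect_args_bytes; // all 3 buffers
    uint64_t cbuffer_bytes;

    constexpr uint64_t total() const
    {
        return instance_bytes + visible_index_bytes + counter_bytes +
               indirect_args_bytes + cbuffer_bytes;
    }
};

constexpr DetailGpuBudget EstimateBudget(uint64_t max_instances)
{
    return DetailGpuBudget{
        max_instances * kInstanceStride,
        kListCount * max_instances * sizeof(uint32_t), // worst case per list
        kListCount * sizeof(uint32_t),
        kListCount * kArgsBytes,
        kCullParamsBytes,
    };
}
```

For 100,000 instances this comes to 14,000,136 bytes (about 14 MB), dominated by the 12.8 MB instance buffer.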

**Utility Functions:**
- `BuildFrustumGPU()` - Extract planes from VP matrix
- `ConvertToGPUInstance()` - CPU SlotItem → GPU format
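
`BuildFrustumGPU()` can follow the usual Gribb/Hartmann plane extraction. This sketch assumes row-vector matrices (clip = p × VP, as with D3DX-style math), D3D's 0..1 depth range, and inward-facing planes normalized so that a sphere test can compare the signed distance against the radius. The `Plane` and `Mat4` types are stand-ins for the engine's own math types.

```cpp
#include <array>
#include <cassert>
#include <cmath>

struct Plane { float a, b, c, d; };                 // inside when a*x + b*y + c*z + d >= 0
using Mat4 = std::array<std::array<float, 4>, 4>;   // m[row][col]; clip = p * m

inline std::array<Plane, 6> BuildFrustumGPU(const Mat4& m)
{
    // Column j of m, read as a plane acting on homogeneous (x, y, z, 1).
    auto col = [&](int j) { return Plane{m[0][j], m[1][j], m[2][j], m[3][j]}; };
    auto add = [](Plane p, Plane q) { return Plane{p.a + q.a, p.b + q.b, p.c + q.c, p.d + q.d}; };
    auto sub = [](Plane p, Plane q) { return Plane{p.a - q.a, p.b - q.b, p.c - q.c, p.d - q.d}; };
    const Plane cx = col(0), cy = col(1), cz = col(2), cw = col(3);

    std::array<Plane, 6> planes = {{
        add(cw, cx), sub(cw, cx),   // left, right:  -w <= x <= w
        add(cw, cy), sub(cw, cy),   // bottom, top:  -w <= y <= w
        cz,          sub(cw, cz),   // near, far:     0 <= z <= w
    }};

    // Normalize so distances are in world units (required by the sphere test).
    for (Plane& p : planes) {
        const float len = std::sqrt(p.a * p.a + p.b * p.b + p.c * p.c);
        if (len > 0.0f) { p.a /= len; p.b /= len; p.c /= len; p.d /= len; }
    }
    return planes;
}
```

With column-vector matrices (clip = VP × p) the same formulas apply to rows instead of columns; an OpenGL-style -w..w depth range would change the near plane to `add(cw, cz)`.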

## Next Steps

### Phase 1: Implementation (Current)
- [ ] Create `DetailManager_Compute.cpp` with buffer creation/destruction
- [ ] Implement `CreateBuffers()` - allocate structured buffers
- [ ] Implement `UploadInstances()` - CPU→GPU transfer
- [ ] Implement `DispatchCulling()` - bind resources & dispatch compute
- [ ] Implement `RenderIndirect()` - `DrawIndexedInstancedIndirect` calls
- [ ] Compile shader: `detail_cull.cs` → compiled bytecode
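
For `DispatchCulling()`, the thread-group count follows from the 256-thread group size: round up so the final partial group covers the tail, and have the shader return early for thread ids at or beyond the instance count. A sketch with hypothetical names; 65,535 is the value of D3D11's `D3D11_CS_DISPATCH_MAX_THREAD_GROUPS_PER_DIMENSION`.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint32_t kCullGroupSize   = 256;   // must match numthreads in detail_cull.cs
constexpr uint32_t kMaxGroupsPerDim = 65535; // D3D11_CS_DISPATCH_MAX_THREAD_GROUPS_PER_DIMENSION

// Groups for Dispatch(groups, 1, 1). Rounding up means the last group may run
// past the end of the instance buffer, so the shader must skip ids >= count.
constexpr uint32_t CullGroupCount(uint32_t instance_count,
                                  uint32_t group_size = kCullGroupSize)
{
    return (instance_count + group_size - 1) / group_size;
}

// The largest Phase 3 benchmark size fits in a single dispatch dimension.
static_assert(CullGroupCount(500000) <= kMaxGroupsPerDim, "500K instances fit in X");
```

Even the 500K benchmark case needs only 1,954 groups, so a one-dimensional dispatch is enough. If the group size is retuned, this constant and the shader's `numthreads` have to change together.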

### Phase 2: Integration
- [ ] Add compute path toggle to `DetailManager.h`
- [ ] Integrate with existing `cache_Decompress()` to build the instance list
- [ ] Replace `UpdateVisibleM()` with compute dispatch
- [ ] Hook into `hw_Render()` for indirect draws
- [ ] Add console commands for enable/disable/stats

### Phase 3: Testing & Optimization
- [ ] Verify visual parity with CPU path
- [ ] Profile GPU/CPU times
- [ ] Benchmark instance counts (10K, 100K, 500K)
- [ ] Tune thread group size (current: 256)
- [ ] Add GPU timestamps for perf tracking

## Architecture Benefits

### Memory Efficiency
- CPU: ~80 bytes per SlotItem (pointers, matrices, scattered data)
- GPU: 128 bytes per instance; larger per item, but contiguous and cache-aligned, so it streams efficiently
- Structured buffers avoid pre-multiplied VB/IB waste

### Performance Gains
- **Culling**: runs in parallel on the GPU instead of serially on the CPU
- **Draw Calls**: 3 indirect draws vs. hundreds of batched draws
- **Bandwidth**: no per-batch constant buffer uploads (64×4 float4s = 4 KB per batch)
- **Scalability**: 100K+ instances possible
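
The bandwidth figure works out as 64 × 4 float4s × 16 bytes = 4 KB of constant-buffer data per batch. The comparison below is a back-of-the-envelope sketch: it assumes the instance buffer is re-uploaded only when the detail cache changes, so a steady-state frame on the compute path sends just the 64-byte `DetailCullParams` plus a reset of three 32-bit counters. The batch count passed in is an illustrative parameter, not a measurement.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint64_t kFloat4Bytes = 16;
constexpr uint64_t kBatchBytes  = 64 * 4 * kFloat4Bytes; // 64×4 float4s per batch
static_assert(kBatchBytes == 4096, "4 KB per batch");

// Batched CPU path: per-instance constants re-uploaded for every batch, every frame.
constexpr uint64_t BatchedUploadBytes(uint64_t batches) { return batches * kBatchBytes; }

// Compute path in steady state (assumption: instance buffer unchanged this frame):
// one 64-byte DetailCullParams update plus resetting three 32-bit counters.
constexpr uint64_t ComputeUploadBytes() { return 64 + 3 * 4; }
```

At an illustrative 300 batches per frame, the batched path uploads 1,228,800 bytes (about 1.2 MB) per frame against 76 bytes for the compute path.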

### Flexibility
- Easy to add Hi-Z occlusion culling (future)
- Temporal anti-aliasing support (frame counter)
- GPU-driven LOD selection
- Foundation for grass state texture

## File Structure

```
src/Layers/xrRender/
├── DetailManager_Compute.h      ✅ Created (GPU structs, manager class)
└── DetailManager_Compute.cpp    ⏳ Next (implementation)

res/gamedata/shaders/r5/
└── detail_cull.cs               ✅ Created (frustum culling)

Documentation/
├── GRASS_RENDERING_ARCHITECTURE.md     ✅ Created (design doc)
└── COMPUTE_INFRASTRUCTURE_STATUS.md    ✅ This file
```

## Shader Compilation

The compute shader needs to be compiled:
```
input:  res/gamedata/shaders/r5/detail_cull.cs
output: <shader_cache>/detail_cull_cs.cso
```

The engine should compile it automatically on first load; otherwise, use the shader compiler tool.

## Configuration

Future console variables:
```
r_detail_compute         1/0      - Enable GPU compute path
r_detail_compute_stats   1/0      - Show culling statistics
r_detail_max_instances   100000   - Max instances to allocate
```

## Notes

- All structures are tightly packed for GPU efficiency
- Frustum extraction follows the standard plane extraction from the VP matrix
- SSA calculation matches existing CPU logic for consistency
- `vis_id` splitting (0/1/2) preserves the animation system
- Indirect args allow the GPU to control the draw count dynamically

---

**Status**: Infrastructure complete, ready for implementation phase
**Branch**: `yohji/feat/mt-detailmanager`
**Last Updated**: 2025-10-09
