I was looking at runtime traces and noticed massive bubbles when running on lower-end GPUs. Emulator work is bursty, so this has always been an issue: the reported GPU load looks low, but the GPU is actually bouncing between 100% and something like 10% over and over, and the resulting average makes it look like the GPU is not doing much.
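To make the averaging effect concrete, here is a rough sketch with made-up numbers (not taken from any real trace). It assumes a hypothetical pattern of 2 ms bursts at 100% load separated by 6 ms bubbles at 10%, and compares the time-weighted average with the share of time the GPU is actually saturated.

```python
def summarize(samples):
    """Summarize a list of (duration_ms, utilization_percent) samples.

    Returns the time-weighted average utilization and the fraction of
    time spent at or above 90% utilization.
    """
    total = sum(duration for duration, _ in samples)
    mean = sum(duration * util for duration, util in samples) / total
    saturated = sum(duration for duration, util in samples if util >= 90) / total
    return mean, saturated


# Hypothetical trace (assumed values): short full-load bursts with long bubbles.
trace = [(2, 100), (6, 10)] * 50

mean, saturated = summarize(trace)
print(f"average load: {mean:.1f}%")             # average load: 32.5%
print(f"time spent saturated: {saturated:.0%}")  # time spent saturated: 25%
```

With these assumed numbers a monitor would report roughly a third load, even though every burst pins the GPU and everything in between is a bubble.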
The current architecture dates from a time when submits were expensive (and they still are). That design is ideal for games but not for emulators, since the CPU can request data at any time, and when it does we get a full system stall.