-
Notifications
You must be signed in to change notification settings - Fork 289
video_core/shader: Optimize fragment shader by skipping passthrough TEV stages #1029
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
I'll take a look and see if I can fix it |
All fixed |
Got a little carried away there, I think I'm going to stop here with optimizations |
6200d83
to
e1cfa00
Compare
Everything should be working now, I went and tested my library of games and there are no longer any graphical issues. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's start the review, but please note that this area of the emulator is not my strong point, so you will need to provide info in some places.
// Compile the vertex shader. | ||
shader_engine->SetupBatch(vs_setup, regs.internal.vs.main_offset); | ||
// Compute current VS config hash | ||
u64 vs_hash = vs_setup.GetProgramCodeHash() ^ vs_setup.GetSwizzleDataHash(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason to use an xor? Is the calculation of the hash less time consuming than just falling through the rest of the function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The hash comparison acts as a cache check. if the shader configuration hasn't changed since last time, skip shader setup
@@ -291,6 +291,9 @@ class PicaCore { | |||
PrimitiveAssembler primitive_assembler; | |||
CommandList cmd_list; | |||
std::unique_ptr<ShaderEngine> shader_engine; | |||
u64 last_vs_hash = 0xDEADBEEFDEADBEEF; // Track last used VS hash |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nitpick, but why not just start at 0.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can change it when I'm not busy if you want
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I forget why I didn't do that, probably just to track the value for debugging or something
WriteTevStage(index); | ||
} | ||
if (!tev_stage_processed) { | ||
out += "combiner_output = rounded_primary_color;\n"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you sure about this? If all stages are passthrough then is the primary color used?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The tev_stage_processed flag stays false if all stages are detected as passthrough by the IsPassThroughTevStage helper function, and that if statement is triggered when false.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that I understood, I'm asking for justifying the use of rounded_primary_color
AppendAlphaCombiner(stage.alpha_op); | ||
out += ");\n"; | ||
} | ||
if (IsPassThroughTevStage(stage)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we check it here again? Are there other codepaths than on line 164?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That check is pointless and can be removed. I must've left that there from testing.
"clamp(alpha_output_{} * {}.0, 0.0, 1.0));\n", | ||
index, stage.GetColorMultiplier(), index, stage.GetAlphaMultiplier()); | ||
// Batch static appends for color_results | ||
out += "color_results_1 = "; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have you considered maybe using a more optimized string structure for concatenating? I agree that using fmt may not be the best idea, but is it so to just use string concat? (A "haven't researched, out of scope" response is a valid response :P)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't look into that, but I will right now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Made that change and pushed to GitHub, I noticed a decent improvement in performance as a result
@@ -1610,4 +1606,51 @@ std::vector<u32> GenerateFragmentShader(const FSConfig& config, const Profile& p | |||
return module.Assemble(); | |||
} | |||
|
|||
// Helper to detect passthrough TEV stages for optimization | |||
static bool IsPassThroughTevStage(const TexturingRegs::TevStageConfig& stage) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this function different than the previous IsPassThroughTevStage one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They were the same, but including constant as a passthrough stage on glsl caused graphical glitches, and so I had to be more conservative on that optimization for glsl. I believe I made that change in one of the most recent commits
src/video_core/shader/shader_jit.cpp
Outdated
|
||
namespace Pica::Shader { | ||
|
||
JitEngine::JitEngine() = default; | ||
JitEngine::~JitEngine() = default; | ||
JitEngine::JitEngine() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you explain what this is all about and the reasoning behind it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it offloads shader compilation to a thread pool to reduce the load on the main thread
By the way, I have played the LM2 intro side by side on 2121.1 and the msys2 artifact from this build, and the vulkan shader stutter, with the cache cleaned up beforehand, seems to be LONGER (a few ms) in this PR than on 2121.1. |
do you have at least a 3-3.5 ms render thread delay? you still need a delay, just much less. On my hardware, level D-1 of LM2 went from requiring a 9.5 ms delay on 2121.1 just to get to where the stuttering infrequent enough to be playable, to only needing a 3-4 ms delay to eliminate stuttering altogether |
As per the project readme, don't repeatedly merge master into your branch. A maintainer will do it if/when necessary. |
I did that because the PR that was recently merged had modified files that may have an effect on this PR |
This comment was marked as off-topic.
This comment was marked as off-topic.
This comment was marked as off-topic.
This comment was marked as off-topic.
…EV stages This change adds a fast-path optimization in the fragment shader generator to detect and skip TEV stages that simply pass through their input unchanged. This reduces shader complexity and improves performance for common rendering cases where TEV stages are configured as passthrough. The optimization checks for: - Replace operation for both color and alpha - Previous buffer as source - No color/alpha modifiers - Unity multipliers This is a safe optimization as it preserves exact PICA behavior while reducing unnecessary shader instructions.
…acking VS config hash
…c op config fields when disabled
…inds in OpenGL rasterizer
…n overhead in TEV stage emission
ab87c11
to
f831d9e
Compare
This change adds a fast-path optimization in the fragment shader generator to detect and skip TEV stages that simply pass through their input unchanged. This reduces shader complexity and improves performance for common rendering cases where TEV stages are configured as passthrough.
The optimization checks for:
This is a safe optimization as it preserves exact PICA behavior while reducing unnecessary shader instructions.
This change also increases performance in games like Luigi's Mansion: Dark Moon