video_core/shader: Optimize fragment shader by skipping passthrough TEV stages #1029

jbm11208 · 2025-05-10T22:58:56Z

This change adds a fast-path optimization in the fragment shader generator to detect and skip TEV stages that simply pass through their input unchanged. This reduces shader complexity and improves performance for common rendering cases where TEV stages are configured as passthrough.

The optimization checks for:

- Replace operation for both color and alpha
- Previous buffer as source
- No color/alpha modifiers
- Unity multipliers

This is a safe optimization as it preserves exact PICA behavior while reducing unnecessary shader instructions.
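
The four checks listed above can be sketched as a single predicate. This is a minimal illustration, not the PR's code: `TevStage` and its enums are simplified, hypothetical stand-ins for the emulator's `TexturingRegs::TevStageConfig`, and the field names are assumptions made for the example.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified stand-ins for the emulator's TEV register types.
enum class Operation { Replace, Modulate, Add };
enum class Source { PrimaryColor, Texture0, Constant, Previous };
enum class ColorModifier { SourceColor, OneMinusSourceColor };
enum class AlphaModifier { SourceAlpha, OneMinusSourceAlpha };

struct TevStage {
    Operation color_op = Operation::Replace;
    Operation alpha_op = Operation::Replace;
    Source color_source1 = Source::Previous;
    Source alpha_source1 = Source::Previous;
    ColorModifier color_modifier1 = ColorModifier::SourceColor;
    AlphaModifier alpha_modifier1 = AlphaModifier::SourceAlpha;
    std::uint32_t color_multiplier = 1;
    std::uint32_t alpha_multiplier = 1;
};

// True only when the stage forwards the previous stage's output untouched:
// Replace op, Previous source, identity modifiers, and multipliers of 1.
inline bool IsPassThrough(const TevStage& s) {
    return s.color_op == Operation::Replace &&
           s.alpha_op == Operation::Replace &&
           s.color_source1 == Source::Previous &&
           s.alpha_source1 == Source::Previous &&
           s.color_modifier1 == ColorModifier::SourceColor &&
           s.alpha_modifier1 == AlphaModifier::SourceAlpha &&
           s.color_multiplier == 1 && s.alpha_multiplier == 1;
}

// Small usage example: a default stage passes through, a scaled one does not.
inline void RunPassThroughExample() {
    TevStage stage;
    assert(IsPassThrough(stage));
    stage.color_multiplier = 2;
    assert(!IsPassThrough(stage));
}
```

Every condition must hold at once; relaxing any one of them (for example, also accepting a constant source) changes the output of the stage and is no longer a pure skip.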

This change also improves performance in games such as Luigi's Mansion: Dark Moon.

PabloMK7 · 2025-05-11T10:19:23Z

Looks like the change is breaking some visuals, notice the fountain:

Here is how it should look:

Algiuxs · 2025-05-11T14:06:55Z

> Looks like the change is breaking some visuals, notice the fountain:
>
> Here is how it should look:

Also the red bush

jbm11208 · 2025-05-11T15:15:44Z

I'll take a look and see if I can fix it

jbm11208 · 2025-05-11T21:29:13Z

All fixed

jbm11208 · 2025-05-12T06:09:39Z

Got a little carried away there, I think I'm going to stop here with optimizations

jbm11208 · 2025-05-16T17:23:09Z

Everything should be working now. I tested my library of games and there are no longer any graphical issues.

PabloMK7

Let's start the review, but please note that this area of the emulator is not my strong point, so you will need to provide info in some places.

PabloMK7 · 2025-05-19T18:51:09Z

src/video_core/pica/pica_core.cpp

-    // Compile the vertex shader.
-    shader_engine->SetupBatch(vs_setup, regs.internal.vs.main_offset);
+    // Compute current VS config hash
+    u64 vs_hash = vs_setup.GetProgramCodeHash() ^ vs_setup.GetSwizzleDataHash();


Is there a reason to use an xor? Is the calculation of the hash less time consuming than just falling through the rest of the function?

The hash comparison acts as a cache check: if the shader configuration hasn't changed since last time, shader setup is skipped.
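
A minimal sketch of the cache check described in the reply, assuming the two hashes are already available and cheap to read. `VsSetup`, `VsBatchTracker`, and `CombineMixed` are hypothetical names for illustration; only the XOR combine and the initial value come from the diff. The sketch also shows why the reviewer's question about XOR is relevant: XOR is symmetric, so two different configurations can combine to the same value.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the vertex shader setup; only the two hashes matter here.
struct VsSetup {
    std::uint64_t program_code_hash;
    std::uint64_t swizzle_data_hash;
};

// XOR combine, as in the diff. Cheap, but swapping the two inputs gives the same result.
inline std::uint64_t CombineXor(const VsSetup& s) {
    return s.program_code_hash ^ s.swizzle_data_hash;
}

// Order-sensitive alternative in the style of boost::hash_combine
// (an assumption for comparison, not something the PR uses).
inline std::uint64_t CombineMixed(const VsSetup& s) {
    std::uint64_t h = s.program_code_hash;
    h ^= s.swizzle_data_hash + 0x9E3779B97F4A7C15ULL + (h << 6) + (h >> 2);
    return h;
}

class VsBatchTracker {
public:
    // Returns true when setup ran, false when it was skipped as unchanged.
    bool SetupIfChanged(const VsSetup& s) {
        const std::uint64_t h = CombineXor(s);
        if (h == last_vs_hash) {
            return false; // same combined hash as last time: skip the setup
        }
        last_vs_hash = h;
        ++setup_calls; // stands in for the real (expensive) batch setup
        return true;
    }

    int setup_calls = 0;

private:
    std::uint64_t last_vs_hash = 0xDEADBEEFDEADBEEFULL; // initial value from the diff
};

inline void RunTrackerExample() {
    VsBatchTracker tracker;
    const bool first = tracker.SetupIfChanged(VsSetup{1, 2});
    const bool second = tracker.SetupIfChanged(VsSetup{1, 2});
    assert(first && !second);
}
```

Whether the skip pays off depends on how the hashes are obtained: if they are recomputed from the program code on every call, the check may cost as much as the setup it avoids.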

PabloMK7 · 2025-05-19T18:51:41Z

src/video_core/pica/pica_core.h

@@ -291,6 +291,9 @@ class PicaCore {
    PrimitiveAssembler primitive_assembler;
    CommandList cmd_list;
    std::unique_ptr<ShaderEngine> shader_engine;
+    u64 last_vs_hash = 0xDEADBEEFDEADBEEF; // Track last used VS hash


Nitpick, but why not just start at 0?

I can change it when I'm not busy if you want

I forget why I didn't do that; probably just to make the value easy to spot while debugging.
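
One way to sidestep the choice of initial value is to avoid a sentinel altogether. Any fixed starting value, whether 0 or 0xDEADBEEFDEADBEEF, could in principle equal a real hash, in which case the first setup would be skipped. The sketch below uses `std::optional` (C++17) so that "no hash seen yet" is a distinct state; `VsHashTracker` is a hypothetical name and this is not what the PR does.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

class VsHashTracker {
public:
    // Returns true when `hash` differs from the last one seen,
    // or when no hash has been recorded yet.
    bool Update(std::uint64_t hash) {
        if (last_hash.has_value() && *last_hash == hash) {
            return false;
        }
        last_hash = hash;
        return true;
    }

private:
    std::optional<std::uint64_t> last_hash; // empty until the first batch
};

inline void RunHashTrackerExample() {
    VsHashTracker tracker;
    const bool first = tracker.Update(0); // a real hash of 0 is still "new"
    const bool again = tracker.Update(0);
    assert(first && !again);
}
```

The cost is one extra flag next to the stored hash, which is unlikely to matter next to the work the check is meant to avoid.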

PabloMK7 · 2025-05-19T18:53:53Z

src/video_core/shader/generator/glsl_fs_shader_gen.cpp

        WriteTevStage(index);
    }
+    if (!tev_stage_processed) {
+        out += "combiner_output = rounded_primary_color;\n";


Are you sure about this? If all stages are passthrough then is the primary color used?

The tev_stage_processed flag stays false if all stages are detected as passthrough by the IsPassThroughTevStage helper function, and that if statement is triggered when false.

Yes, I understood that. I'm asking you to justify the use of rounded_primary_color.
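
For reference, the control flow under discussion can be sketched as follows. This only mirrors the shape of the diff: `StageInfo` and `EmitCombiner` are hypothetical, and each stage is reduced to a single flag. It does not answer the reviewer's question of whether `rounded_primary_color` is the right value to fall back to.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical: each stage is reduced to a single "is passthrough" flag.
struct StageInfo {
    bool passthrough;
};

// Emits a placeholder line per non-passthrough stage. If every stage was
// skipped, emits the fallback assignment shown in the diff.
inline std::string EmitCombiner(const std::vector<StageInfo>& stages) {
    std::string out;
    bool tev_stage_processed = false;
    for (std::size_t i = 0; i < stages.size(); ++i) {
        if (stages[i].passthrough) {
            continue; // skipped stages contribute no shader code
        }
        out += "// tev stage " + std::to_string(i) + "\n";
        tev_stage_processed = true;
    }
    if (!tev_stage_processed) {
        out += "combiner_output = rounded_primary_color;\n";
    }
    return out;
}

inline void RunEmitExample() {
    const std::vector<StageInfo> all_skipped{StageInfo{true}, StageInfo{true}};
    assert(EmitCombiner(all_skipped) == "combiner_output = rounded_primary_color;\n");
}
```

The fallback runs only when the loop emitted nothing, so its value decides the final color for every fully passthrough configuration.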

PabloMK7 · 2025-05-19T18:54:41Z

src/video_core/shader/generator/glsl_fs_shader_gen.cpp

-            AppendAlphaCombiner(stage.alpha_op);
-            out += ");\n";
-        }
+    if (IsPassThroughTevStage(stage)) {


Why do we check it here again? Are there code paths other than the one on line 164?

That check is pointless and can be removed. I must've left that there from testing.

PabloMK7 · 2025-05-19T18:56:20Z

src/video_core/shader/generator/glsl_fs_shader_gen.cpp

-                           "clamp(alpha_output_{} * {}.0, 0.0, 1.0));\n",
-                           index, stage.GetColorMultiplier(), index, stage.GetAlphaMultiplier());
+    // Batch static appends for color_results
+    out += "color_results_1 = ";


Have you considered using a more optimized string structure for concatenation? I agree that using fmt may not be the best idea, but is plain string concatenation the best option either? (A "haven't researched, out of scope" response is a valid response :P)

I didn't look into that, but I will right now.

I made that change and pushed it to GitHub. I noticed a decent performance improvement as a result.
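
As an illustration of the trade-off being discussed, here is a sketch that builds shader text with a single pre-reserved `std::string` and plain appends, using only the standard library. The helper names and the 48-byte per-line estimate are assumptions for the example; the emitted text follows the `clamp(alpha_output_N * M.0, 0.0, 1.0)` pattern visible in the removed line, and nothing here is taken from the PR's final code.

```cpp
#include <cassert>
#include <string>

// Appends one clamp expression for a TEV stage without any format parsing.
inline void AppendAlphaClamp(std::string& out, unsigned index, unsigned multiplier) {
    out += "clamp(alpha_output_";
    out += std::to_string(index);
    out += " * ";
    out += std::to_string(multiplier);
    out += ".0, 0.0, 1.0)";
}

// Builds one clamp statement per stage into a buffer reserved up front,
// so the appends usually avoid repeated reallocation.
inline std::string BuildAlphaClamps(unsigned stage_count) {
    std::string out;
    out.reserve(stage_count * 48u); // rough per-line estimate (assumption)
    for (unsigned i = 0; i < stage_count; ++i) {
        AppendAlphaClamp(out, i, 1);
        out += ";\n";
    }
    return out;
}

inline void RunAppendExample() {
    std::string line;
    AppendAlphaClamp(line, 2, 4);
    assert(line == "clamp(alpha_output_2 * 4.0, 0.0, 1.0)");
}
```

Which approach is faster in practice would need measuring in the generator itself; the sketch only shows where the allocation and formatting costs sit.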

PabloMK7 · 2025-05-19T18:57:58Z

src/video_core/shader/generator/spv_fs_shader_gen.cpp

@@ -1610,4 +1606,51 @@ std::vector<u32> GenerateFragmentShader(const FSConfig& config, const Profile& p
    return module.Assemble();
 }

+// Helper to detect passthrough TEV stages for optimization
+static bool IsPassThroughTevStage(const TexturingRegs::TevStageConfig& stage) {


Why is this function different from the previous IsPassThroughTevStage one?

They were the same, but treating constant sources as passthrough in the GLSL generator caused graphical glitches, so I had to be more conservative with that optimization for GLSL. I believe I made that change in one of the most recent commits.

PabloMK7 · 2025-05-19T18:58:19Z

src/video_core/shader/shader_jit.cpp


 namespace Pica::Shader {

-JitEngine::JitEngine() = default;
-JitEngine::~JitEngine() = default;
+JitEngine::JitEngine() {


Can you explain what this is all about and the reasoning behind it?

It offloads shader compilation to a thread pool to reduce the load on the main thread.
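
A rough sketch of the idea, under loud assumptions: `std::async` stands in for the thread pool (it is not a pool), `Compile` is a placeholder for the real JIT compile step, and `AsyncShaderCache` is a hypothetical name. `Get` blocks until the shader is ready, in the spirit of the "Block Until Shader Compilation Completes" commit in this PR.

```cpp
#include <cassert>
#include <cstdint>
#include <future>
#include <unordered_map>

// Placeholder for a compiled shader; the real engine stores JIT-compiled code.
struct CompiledShader {
    std::uint64_t key;
};

// Placeholder for the expensive compile step.
inline CompiledShader Compile(std::uint64_t key) {
    return CompiledShader{key};
}

class AsyncShaderCache {
public:
    // Starts compiling `key` on another thread unless it is pending or done.
    void Request(std::uint64_t key) {
        if (pending.count(key) != 0 || ready.count(key) != 0) {
            return;
        }
        pending.emplace(key, std::async(std::launch::async, Compile, key));
    }

    // Returns the compiled shader, blocking until compilation has finished.
    const CompiledShader& Get(std::uint64_t key) {
        auto done = ready.find(key);
        if (done != ready.end()) {
            return done->second;
        }
        Request(key);
        auto it = pending.find(key);
        CompiledShader shader = it->second.get(); // waits for the worker
        pending.erase(it);
        return ready.emplace(key, shader).first->second;
    }

private:
    std::unordered_map<std::uint64_t, std::future<CompiledShader>> pending;
    std::unordered_map<std::uint64_t, CompiledShader> ready;
};

inline void RunAsyncExample() {
    AsyncShaderCache cache;
    cache.Request(7); // kick off compilation early
    assert(cache.Get(7).key == 7);
}
```

The benefit comes only from calling `Request` well before `Get`; if the two always happen back to back, the main thread waits just as long as it would compiling inline.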

PabloMK7 · 2025-05-19T19:00:41Z

By the way, I have played the LM2 intro side by side on 2121.1 and the msys2 artifact from this build, and the vulkan shader stutter, with the cache cleaned up beforehand, seems to be LONGER (a few ms) in this PR than on 2121.1.

jbm11208 · 2025-05-19T20:05:51Z

> By the way, I have played the LM2 intro side by side on 2121.1 and the msys2 artifact from this build, and the vulkan shader stutter, with the cache cleaned up beforehand, seems to be LONGER (a few ms) in this PR than on 2121.1.

Do you have at least a 3-3.5 ms render thread delay? You still need a delay, just a much smaller one. On my hardware, level D-1 of LM2 went from requiring a 9.5 ms delay on 2121.1 just to get to the point where the stuttering is infrequent enough to be playable, to only needing a 3-4 ms delay to eliminate stuttering altogether.

OpenSauce04 · 2025-05-26T16:33:28Z

As per the project readme, don't repeatedly merge master into your branch. A maintainer will do it if/when necessary.

jbm11208 · 2025-05-26T16:37:35Z

> As per the project readme, don't repeatedly merge master into your branch. A maintainer will do it if/when necessary.

I did that because the PR that was recently merged had modified files that may have an effect on this PR.


pull-request-size bot added the size/L label May 10, 2025

PabloMK7 self-requested a review May 10, 2025 23:04

jbm11208 marked this pull request as ready for review May 11, 2025 01:20

pull-request-size bot added size/XL and removed size/L labels May 12, 2025

OpenSauce04 added the manual merge and enhancement labels May 12, 2025

OpenSauce04 force-pushed the LM2-Optimized branch from 6200d83 to e1cfa00 Compare May 16, 2025 16:12

PabloMK7 reviewed May 19, 2025

pull-request-size bot added size/XXL size/XL and removed size/XL size/XXL labels May 20, 2025

jbm11208 added 20 commits June 9, 2025 19:39

Update licensing

f44ad5f

Fix clang-format and compilation errors

c0c6f81

Fix Graphical Glitches

f493ea6

Fix clang-format in spv_fs_shader_gen.cpp

6032639

More general optimizations

608bb44

Minimize Shader Cache Growth for Disabled Lighting

ca9fad0

Implement LRU-based shader cache eviction and size limit in JIT engine

3a7d7c9

Implement parallel shader compilation and safe fallback for missing shaders

3731d2b

Optimize shader state changes: avoid redundant SetupBatch calls by tracking VS config hash

9fbbb68

more optimizations

f8d1586

Reduce shader permutations: mask out fog, alpha test, blend, and logic op config fields when disabled

1c2a141

Optimize texture state changes: avoid redundant texture and sampler binds in OpenGL rasterizer

721056c

fix opengl

75d4faa

Optimize shader generation: reduce string formatting and concatenation overhead in TEV stage emission

52047cd

remove debug output

6422ce3

Fix: Block Until Shader Compilation Completes to Prevent Glitches

b0c1eff

Remove allowing constant stage alpha & color sources to fix graphical issues

e112b7f

Use fmt for string concat in glsl_fs_shader_gen.cpp

6e090f4

Move Shader JIT Multithreading to a Separate Branch

f831d9e

OpenSauce04 force-pushed the LM2-Optimized branch from ab87c11 to f831d9e Compare June 9, 2025 18:39
