Getting better compile time performance on large shader libraries #9354
Replies: 2 comments
There's a lot of detail here, and it's hard for me (as a person who is not primarily a performance-oriented engineer) to tease apart what the take-away points are. That said, I'll try to lend my thoughts as somebody who has been involved with Slang since the start.

Aside: I want to note that I greatly appreciate you setting up the closest thing possible to an "apples to apples" comparison between the two compilers.

**Why use Slang instead of XYZ, if Slang is slower?**

This isn't one of the questions you asked, but I think it's a question that looms around any comparison like this between tools. If all you have is what we might call "vanilla" HLSL code, and all you want is a command-line tool that you can invoke on it, my opinion is that the main reasons to use the Slang compiler instead of something else ultimately amount to the same reason: the unique things it enables you to do.

Our main focus on the Slang project (since even back in the days when it was just a research effort) has been to show that GPU shading languages could be significantly better than what most developers were stuck using. If the DXC team announced tomorrow that their compiler now supports the Slang language, cross-compiles to all the targets the Slang compiler supports, and also has better performance? I'd actually consider that a win. All along, my personal goal has been to help get better tools into the hands of developers, and we only designed and implemented a new language to show people just how much better things could be.

**Why is the Slang compiler slower than DXC?**

I believe there are two main factors here.

*The Slang compiler supports the Slang language, not just HLSL.* Even if you are just feeding it vanilla HLSL, the compiler still carries the machinery needed for the full Slang language. An (entirely justified) argument can be made that the Slang compiler should only make you "pay for what you use," so that when handing the Slang compiler vanilla HLSL it wouldn't need to turn on any of its more advanced features. That's a good aspirational goal but, in many cases, being able to support some of the features of Slang at all requires particular approaches to the compiler's design and architecture, such that they aren't just things we can turn on/off as needed.

*The Slang codebase is less mature than DXC.* Those of us working on Slang need to be honest about the fact that there are many quality-of-implementation (QOI) differences between the Slang codebase/compiler and DXC. DXC is quite simply a more mature codebase, and there are times when it shows. Slang has been in development for less time than DXC, and has had many fewer active contributors over the history of the project (on average). Moving the Slang project to open governance under Khronos has given us more attention and some new contributors, but catching up to other tools in terms of overall maturity will take time.

It is also important to note that the DXC project leveraged a lot of pre-existing mature code, in clang and LLVM. The use of clang/LLVM gave DXC a foundation with many person-years of effort put into it, which includes a lot of thoughtful engineering work to optimize the performance of the compiler framework. Such technology choices have benefited some aspects of DXC, such as compilation performance, but they also represent trade-offs. At the start of the Slang project we considered whether to build on top of clang/LLVM, and made a conscious decision to follow a different path for our compiler architecture. The clang compiler architecture does not match well with the vision we had/have for the Slang language, and LLVM is (even now) not a good match for several of the compilation targets we wanted to support. The DXC team took the trade-off that let them (more) easily build a mature compiler for a less ambitious language. The Slang team took the trade-off that made it possible for us to build a compiler for a much more ambitious language at all.

**Can the performance gap be overcome?**

The question is phrased as a binary yes/no, but the answer (whatever it is) is likely to be much more subtle. One thing I can state with some confidence is that it would take less engineering effort to greatly improve the performance of the Slang compiler codebase than it would to make the DXC compiler codebase accept the Slang language. So if the goal is to close the gap by having a compiler for a language as powerful as Slang, but with compilation performance closer to DXC, we know which side of the gap to start from.

Another thing that isn't explicit in the question, but that I'd like to make sure sees discussion (I hope that other Slang contributors will follow up here...) is the question of whether the performance gap should be overcome by speeding up the compiler as it exists today, or by changing the approach to compilation itself. These amount to the typical question of whether one should optimize the code for an existing algorithm, or pick a different/better algorithm, when trying to optimize. When code is coming from an existing engine/renderer, such as Unity's SRP system, the shaders will typically have been authored in many ways (large and small) around the design and limitations of the HLSL language and the DXC compiler. Sometimes it is possible to refactor such code to be closer to more idiomatic Slang-first designs, but it is not always easy to work around design choices that are deeply baked into the engine/renderer.

Some quick thoughts on things that might be worth exploring (I'm saying this as somebody who doesn't primarily focus on performance analysis/optimization, so I really hope others from the Slang project can help with deeper insights):

**Command-Line**
Of course it's only after writing that giant post that I realize I should cross-reference this with a specific performance-related issue that, for all I know, could be related to some of the counter-intuitive perf results seen in the OP of this discussion when trying to use pre-compiled modules: #9400. It doesn't immediately seem like that issue is related to the performance issues being observed here, but it also isn't something I'd want to rule out. Either way, it is an example of a performance concern raised by a user that we believe we can (and will) address.
I'm interested in exploring the feature set of Slang, with cross-API compilation, modules, and specialization constants being key areas of focus.
I have found that for the same HLSL shader, compiled to SPIR-V, slang takes considerably more time than DXC. I've tested various vertex and fragment shaders and measured slangc taking between 4x and 20x longer than dxc.
This is of concern to me because these time differences can compound in codebases with very large shader libraries and particularly ones with large permutations of shader variants.
Real-world shader benchmark
To start, I grabbed the HLSL output (after a custom precompiler) from a Unity URP Shader Graph called VegetationSSS. It is about 11 thousand lines long due to a couple dozen `#include`s of the SRP shader library. This is a very big shader, but it is also very common in URP.

I ended up converting all the SRP library HLSL (which is usually `#include`d) to modules with specialization constants, to give slang the best chance.

I ran these both on a Windows desktop and a Linux laptop; the laptop ended up being faster due to better IO and better single-core performance. I tested their output to SPIR-V.
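For context, a minimal sketch of what moving an `#include`d HLSL library into a Slang module with a specialization constant can look like (the module, constant, and function names here are illustrative, not the actual Unity SRP identifiers, and this is not the author's code):

```slang
// unity_srp_core_all.slang -- hypothetical module sketch.
// A specialization constant replaces a #define-driven shader keyword,
// so one SPIR-V binary can be specialized at pipeline-creation time
// instead of compiling one variant per keyword combination.
module unity_srp_core_all;

[SpecializationConstant]
public static const int kUseSSS = 0;

public float3 ApplySubsurface(float3 albedo, float3 sss)
{
    // Branch on the specialization constant; drivers can fold this
    // away once the constant is given a concrete value.
    return (kUseSSS != 0) ? albedo + sss : albedo;
}
```

The win, if any, comes from compiling the module once and reusing it across variants, rather than re-checking the library text per permutation.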
Specs
I ran a few different configurations, including one where the whole library is pulled in with a single `import unity_srp_core_all;`.

Looking at these measurements, `linkAndOptimizeIR` and `specializeModule` fare much better than I expected, having put in a couple dozen specialization constants; with the Unity SRP shader libraries there is a lot of potential for savings there. DXC looks to be nearly 3-4x as fast as slang at compiling the same fragment shader in the best case.
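As a sketch of the consuming side of that configuration (again with illustrative names, not the actual shader-graph output), the per-variant entry point shrinks to an `import` plus calls into the precompiled module:

```slang
// Variant entry point: imports the precompiled library module instead
// of re-parsing thousands of #include'd lines for every permutation.
import unity_srp_core_all;

[shader("fragment")]
float4 fragMain(float3 albedo : COLOR0) : SV_Target
{
    // Illustrative values; the real shader graph feeds these from
    // material textures and lighting.
    float3 sss = float3(0.1, 0.05, 0.02);
    return float4(ApplySubsurface(albedo, sss), 1.0);
}
```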
Moving code to modules greatly improves `compileInner`, to the point of taking less time than dxc in the compilation step, but it shifts the time taken to `checkAllTranslationUnits`, to the point of overall not saving much time at all.

Perf analysis
Here are some of the results of the compilation of the full shader-graph HLSL to SPIR-V on Linux.
The first flamegraph is slang and the second is dxc.
I've highlighted the same call both make to the SPIR-V library (`spvtools::Optimizer::Run`) after most of the AST is parsed.
On slang, this call takes less time, presumably thanks to the optimization slang does itself. However, from looking at this graph it looks like slang spends more time optimizing than the time difference with dxc.
Another thing to note is that in the DXC graph, the majority of the time is this call to spvtools, whereas in slang, the AST building dominates the time taken.
Observations
What this test has shown me is that:
- `checkAllTranslationUnits` scaling up.

Compared to DXC
I redid this test at a smaller scale with 1 thousand lines of dead code (attached), and even with just a dummy pixel shader returning white and no other code. What I see is that, ignoring the time spent loading the builtin module before `checkAllTranslationUnits`, slang's runtime is very close to DXC's.

Slang seems to do a parse for all translation units followed by a check for all translation units. DXC does a parse and a `DiagnoseTranslationUnit` together, one translation unit at a time.
I'm not sure if `DiagnoseTranslationUnit` is equivalent to `checkTranslationUnits`, and `DiagnoseTranslationUnit` is barely measurable in terms of performance.

How to repro
Dummy pixel shader
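The dummy shader itself isn't shown above; a minimal equivalent matching the description (a pixel shader that just returns white, my reconstruction rather than the author's exact file, accepted as HLSL by both slangc and dxc) would be:

```slang
// Dummy pixel shader: returns solid white; no includes, no other code.
float4 main() : SV_Target
{
    return float4(1.0, 1.0, 1.0, 1.0);
}
```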
The most basic shader takes
Slang Results
DXC Results
Pixel shader with dead code from Unity SRP Shader Library
This requires cloning https://github.com/Unity-Technologies/Graphics and compiling this shader at the root of that repo.
I've attached the output of the precompiler so you don't need to do the cloning.
test.preprocessed.hlsl.txt
Slang Results
DXC Results
Questions
I'm posting here because I'm curious why there is such a large difference in performance between the two compilers.
Is there a reason in the design of slang that causes such a discrepancy? Can it be overcome?
Has anyone had luck tuning slang to improve the validation performance?