[SYCL][NFCI] Refactor device code split implementation once again #8833

AlexeySachkov · 2023-03-28T14:57:48Z

Apologies in advance for a not-so-small PR (or rather PR description?).
The PR is marked as NFCI because no functional changes are intended, but I'm not 100% sure there are no corner cases where behavior changes.

Intro

This is a refactoring of how we perform device code split in sycl-post-link, which is intended to solve several existing issues with the current implementation:

(1) increased peak RAM consumption by sycl-post-link
(2) bad scaling as more and more split "dimensions" are added
(3) increased test maintenance cost due to the non-deterministic order (between commits) of output files produced by sycl-post-link

A bit more context about the issues above:

(1) Increased peak RAM consumption is caused by the fact that we currently keep all splits in memory, even though we could process them one by one and discard each of them as soon as it is stored to disk. This was implemented as a memory consumption optimization in #5021, but it got accidentally reverted in #7302 in an attempt to work around (2).

(2) is pretty much summarized in our source code:

llvm/llvm/tools/sycl-post-link/sycl-post-link.cpp

Lines 806 to 811 in afebb25

    
    // TODO this nested splitting scheme will not scale well when other split
    // "dimensions" will be added. Some infra/"split manager" needs to be
    // implemented in this case - e.g. all needed splitters are registered, then
    // split manager applies them in the order added and runs needed tforms on the
    // "leaf" ModuleDesc's resulted from splitting. Some bookkeeping is needed for
    // ESIMD splitter to link back needed modules.

(3) is caused by a bad implementation decision made in #7302: because every split is now identified by a hash, every time you add a new split "dimension" or a new feature to take into account, the hashes for existing tests change. Just look at how many unrelated tests had to be updated in #7512, #8056 and #8167.

Now to the PR itself:

It introduces a new infrastructure for categorizing/grouping kernel functions: instead of using hashes, we now build a string description for each kernel function and then group kernels with the same description string together.

The string description is built by a new entity: it accepts a set of rules, where each rule is a simple function that returns a string for the passed llvm::Function. The results of all rules are concatenated, and the rules are invoked in the stable order of their registration.

There is a simple API for building those rules. It provides some predefined rules for the most popular use cases, like turning a function attribute or a piece of metadata into a string descriptor for the function. It is also possible to pass a custom callback to implement more complicated logic.
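To make the rule mechanism concrete, here is a minimal sketch of such a description builder. It is not the code from this PR: KernelInfo is a hypothetical stand-in for llvm::Function, and DescriptionBuilder, registerAttributeRule, registerCustomRule and the '-' separator are names and choices assumed purely for illustration.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for llvm::Function: a name plus string attributes.
struct KernelInfo {
  std::string Name;
  std::map<std::string, std::string> Attributes;
};

// Hypothetical description builder: each rule maps a kernel to a string
// fragment, and fragments are concatenated in registration order.
class DescriptionBuilder {
public:
  using Rule = std::function<std::string(const KernelInfo &)>;

  // Predefined rule: the value of a string attribute (empty if absent).
  void registerAttributeRule(const std::string &AttrName) {
    Rules.push_back([AttrName](const KernelInfo &K) {
      auto It = K.Attributes.find(AttrName);
      return It == K.Attributes.end() ? std::string() : It->second;
    });
  }

  // Custom callback for more complicated logic.
  void registerCustomRule(Rule R) { Rules.push_back(std::move(R)); }

  // Invoke all rules in the order they were registered.
  std::string describe(const KernelInfo &K) const {
    std::string Result;
    for (const Rule &R : Rules) {
      Result += R(K);
      Result += '-'; // separator, so adjacent fragments cannot run together
    }
    return Result;
  }

private:
  std::vector<Rule> Rules;
};
```

Two kernels end up in the same group exactly when describe() returns the same string for both, so adding a new split "dimension" in this sketch amounts to one more register* call.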

How does this PR help with issues above?

(1) and (2) are fixed in conjunction: sycl-post-link was refactored to avoid storing more than one split module at a time, which is possible because the PR unifies the per-scope and optional-kernel-features splitters into a single generic splitter. The new kernel categorization API seems flexible enough to provide that infrastructure, so the merged splitters still look OK code-wise.
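The "one split module at a time" flow can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the sycl-post-link code: SplitModule, LazySplitter, hasMoreSplits, nextSplit and processAll are hypothetical names, and a module is reduced to a list of kernel names.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a split module: just the kernels it contains.
struct SplitModule {
  std::string Description;
  std::vector<std::string> Kernels;
};

using KernelGroups = std::map<std::string, std::vector<std::string>>;

// Hypothetical generic splitter: the grouping is computed up front (names
// only), but each split module is materialized only when it is requested.
class LazySplitter {
public:
  explicit LazySplitter(KernelGroups G)
      : Groups(std::move(G)), Next(Groups.begin()) {}
  LazySplitter(const LazySplitter &) = delete; // Next points into Groups

  bool hasMoreSplits() const { return Next != Groups.end(); }

  // Build the next split; the caller owns it and may drop it right away.
  std::unique_ptr<SplitModule> nextSplit() {
    std::unique_ptr<SplitModule> M(new SplitModule{Next->first, Next->second});
    ++Next;
    return M;
  }

private:
  KernelGroups Groups;
  KernelGroups::const_iterator Next;
};

// Process splits one by one and return how many were "stored".
std::size_t processAll(LazySplitter &Splitter,
                       std::vector<std::string> &Stored) {
  std::size_t Count = 0;
  while (Splitter.hasMoreSplits()) {
    std::unique_ptr<SplitModule> M = Splitter.nextSplit();
    Stored.push_back(M->Description); // stands in for writing M to disk
    ++Count;
    // M is destroyed at the end of this iteration, before the next split
    // is built, so at most one SplitModule is alive at any time.
  }
  return Count;
}
```

The point of the design is that peak memory is bounded by the largest single split rather than by the sum of all splits.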

(3) is fixed by using string identifiers instead of hashes, as well as by using a data structure which sorts identifiers.
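As an illustration of why sorted string identifiers give a stable output order, consider the sketch below. It assumes std::map as the sorting data structure, which may differ from the container actually used in the PR; DescribedKernel, groupByDescription and outputOrder are hypothetical names.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Kernel name paired with an already-built description string (hypothetical).
using DescribedKernel = std::pair<std::string, std::string>;
using KernelGroups = std::map<std::string, std::vector<std::string>>;

// Group kernel names by description. std::map keeps its keys sorted, so the
// iteration order depends only on the set of descriptions, not on the order
// in which kernels were visited and not on any hash values.
KernelGroups groupByDescription(const std::vector<DescribedKernel> &Kernels) {
  KernelGroups Groups;
  for (const DescribedKernel &K : Kernels)
    Groups[K.second].push_back(K.first);
  return Groups;
}

// Collect the group keys in output order, e.g. to number output files.
std::vector<std::string> outputOrder(const KernelGroups &Groups) {
  std::vector<std::string> Order;
  for (const auto &Entry : Groups)
    Order.push_back(Entry.first);
  return Order;
}
```

In this sketch, visiting the same kernels in a different order yields the same sequence of groups, so the numbering of output files does not shift between runs or commits unless the descriptions themselves change.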

Any other benefits from this PR?

About 50 lines of code less to support :)

Extending device code split for more optional features would be even easier than it is now: instead of making several changes in various places around the UsedOptionalFeatures structure, it will take just 1-3 lines of code. Please also note that UsedOptionalFeatures contains tons of inconsistencies in its implementation, which will all be gone with this PR: in operator== we don't use the hash and instead compare certain fields directly (and we do miss some of them); the generateModuleName method skips some of the optional features and ignores them.

Cross-module device_global usages checks should now work at all split dimensions (except for ESIMD).

Any potential downsides?

With the current UsedOptionalFeatures there is a possibility to embed various information (used aspects, large-grf flag, etc.) directly during device code split to avoid re-gathering that information later when we generate properties. With the suggested approach, that would be harder to do, because it doesn't seem to fit naturally into the proposed infrastructure: see the changes I made around large-grf in this PR.

However, we have never actually implemented this and re-querying some metadata from function doesn't seem like a bottleneck, so it should really be a very minor and only theoretical downside.

sarnex

wow, doing it this way is so much simpler and easier to understand. there's so much less nonsense now. thanks for doing this!

AlexeySachkov · 2023-04-20T14:34:12Z

@sarnex, @asudarsa, sorry for the delay. I've rebased the PR on top of #8763 and it is now ready for review. Changes since the last update:

863abed fixes a warning about improper sycl-post-link options being emitted even for valid cases
b80c1c4 is a merge commit. It is buildable, but LIT tests fail, because it essentially reverts all modifications to module splitters which were done in [SYCL] Add support to propagate compile flags to device backend compiler #8763
adadb0d restores the reverted functionality. It is a good example, which highlights:
  - the simplicity of extending device code split (just compare the amount of changes made in ModuleSplitter.cpp with [SYCL] Add support to propagate compile flags to device backend compiler #8763);
  - the inability to propagate properties computed at the device code split phase to sycl-post-link: we don't really "compute" them anymore. That's the same thing as with large-grf, which is mentioned in the PR description;
  - the amount of changes in tests after extending the new splitter is now minimal: the order of output modules is stable again
9158ca8 fixes an incorrect merge in one of the tests
7ff7531 addresses a comment from @sarnex

I would like you to take another look at the PR before I merge it, to review the recent changes.

sarnex

looks great to me, only nits, thanks a lot! looking forward to making use of this soon!

AlexeySachkov · 2023-04-27T10:49:02Z

@sarnex, @asudarsa, hopefully this is now the final iteration and the patch will be ready for merge once CI passes.

I finally figured out the root cause of the pre-commit failures. It turned out that #8763 (inadvertently, I presume) doesn't emit the optLevel device image property when the invoke_simd feature is involved. The property simply gets lost during the merge of the two modules produced by the ESIMD splitter:

llvm/llvm/tools/sycl-post-link/ModuleSplitter.h

Lines 65 to 75 in 7bdbd59

    
    Properties merge(const Properties &Other) const {
      Properties Res;
      Res.HasESIMD = HasESIMD == Other.HasESIMD
                         ? HasESIMD
                         : SyclEsimdSplitStatus::SYCL_AND_ESIMD;
      Res.UsesLargeGRF = UsesLargeGRF || Other.UsesLargeGRF;
      // Scope remains global
      // OptLevel is expected to be the same for both merging EPGs
      assert(OptLevel == Other.OptLevel && "OptLevels are not same");
      return Res;
    }

As you can see, we return Res, whose OptLevel is still the default -1: its value is never updated from this or Other.

llvm/llvm/tools/sycl-post-link/ModuleSplitter.h

Lines 60 to 61 in 7bdbd59

    
    // front-end opt level for kernel compilation
    int OptLevel = -1;

In my PR, I "compute" the property after all splitting and merging is done, based on the actual content of the module, so the property gets set for modules containing invoke_simd.
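The lost value can be reproduced with a simplified stand-in for the struct quoted above. In this sketch, Props, mergeAsQuoted and mergeKeepingOptLevel are hypothetical names; the second method shows one possible local fix and is an assumption for illustration, not what this PR does (the PR computes the property after splitting and merging instead).

```cpp
#include <cassert>

// Simplified, hypothetical version of the Properties struct quoted above.
struct Props {
  bool UsesLargeGRF = false;
  int OptLevel = -1; // -1 means "not set"

  // Mirrors the quoted merge: Res.OptLevel is never assigned, so the result
  // keeps the default -1 even when both inputs agree on a real value.
  Props mergeAsQuoted(const Props &Other) const {
    Props Res;
    Res.UsesLargeGRF = UsesLargeGRF || Other.UsesLargeGRF;
    assert(OptLevel == Other.OptLevel && "OptLevels are not same");
    return Res;
  }

  // One possible local fix (an assumption, not the approach of this PR):
  // carry the shared value over into the result.
  Props mergeKeepingOptLevel(const Props &Other) const {
    Props Res = mergeAsQuoted(Other);
    Res.OptLevel = OptLevel;
    return Res;
  }
};
```

Merging two Props that both have OptLevel set to 2 yields -1 with mergeAsQuoted and 2 with mergeKeepingOptLevel.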

There were two changes since last update:

d674eb2 - first attempt to fix the problem. It didn't help, but it is still useful: we should only look at entry points when "computing" the property, because we could have pulled in some functions from other translation units to make a kernel self-contained.
0f49bfb - rather a hack, which essentially restores the behavior implemented in [SYCL] Add support to propagate compile flags to device backend compiler #8763. It should fix the pre-commit failures

My plan is the following:

get review from you and proceed with merge once the PR is accepted
submit an issue to IGC/NEO folks
submit a tracker to remove the hack inserted in 0f49bfb

Please let me know if there are questions or concerns. @asudarsa, it would be especially good to hear feedback from you, because the PR touches the work you recently did on propagating compilation options to backends.

asudarsa · 2023-04-27T13:27:27Z

llvm/tools/sycl-post-link/ModuleSplitter.cpp

+        ::sycl::kernel_props::ATTR_LARGE_GRF, "large-grf");
+    Categorizer.registerListOfIntegersInMetadataSortedRule("sycl_used_aspects");
+    Categorizer.registerListOfIntegersInMetadataRule("reqd_work_group_size");
+    Categorizer.registerSimpleStringAttributeRule(


Looks good. Thanks

asudarsa

Hi @AlexeySachkov

Overall looks good to me. Thanks

sarnex

new changes lgtm also, thanks.

in my experience invoke_simd is very sensitive to the environment, so im not surprised changing the optlevel causes an issue. dropping the flag and making a bug for the gpu people makes sense, ill email you who to assign it to

AlexeySachkov · 2023-04-28T07:52:16Z

Merge with sycl branch to properly restart CI: since I had regressions on L0 at some point, I don't want to merge this without making sure that I actually fixed them

sarnex · 2023-04-28T13:42:48Z

@AlexeySachkov Thanks again for doing this! I'm going to use this for some work I'm doing immediately!

AlexeySachkov added 13 commits February 28, 2023 07:56

WIP

3bb0d74

Merge remote-tracking branch 'origin/sycl' into private/asachkov/gene…

354be1a

…ric-module-splitter

start using new splitter

4ca6479

fixes

acc36d0

remaining fixes

8921214

remove dead code

74577a3

move some code around

0108a36

refactor using std::variant

0887e06

move some method definitions into class definition

a39bba1

use SmallString + tiny refactorings

eab7e02

Add some comments

f802bc2

cleanup some large-grf-related dead code

2c11625

a bit of clang-format

471967c

AlexeySachkov requested review from dm-vodopyanov, steffenlarsen, asudarsa and sarnex March 28, 2023 14:58

sarnex approved these changes Mar 28, 2023

Fixes to -ir-output-only flow

466e2df

AlexeySachkov added 5 commits March 31, 2023 05:29

Merge remote-tracking branch 'origin/sycl' into private/asachkov/gene…

619eef4

…ric-module-splitter

tiny sycl-post-link cleanup

a76e8d2

Some renamings

7384fe4

Refactoring

2a36701

clang-format

1528921

AlexeySachkov marked this pull request as ready for review March 31, 2023 10:06

AlexeySachkov requested a review from a team as a code owner March 31, 2023 10:06

AlexeySachkov changed the title ~~Refactor device code split implementation once again~~ [NFCI] Refactor device code split implementation once again Mar 31, 2023

Apply comments

7ff7531

Better solution for review comments

e4b452e

sarnex approved these changes Apr 20, 2023

AlexeySachkov added 2 commits April 21, 2023 03:41

Apply comments

d61e93e

Apply clang-format

ec37831

sarnex approved these changes Apr 21, 2023

Consider only entry points when emitting optLevel property

d674eb2

This (rather) hacky change should help fix regressions

0f49bfb

asudarsa reviewed Apr 27, 2023

asudarsa approved these changes Apr 27, 2023

sarnex approved these changes Apr 27, 2023

Merge remote-tracking branch 'origin/sycl' into private/asachkov/gene…

a4dc3d4

…ric-module-splitter

AlexeySachkov merged commit 67da385 into intel:sycl Apr 28, 2023

jzc mentioned this pull request May 18, 2023

[SYCL][CUDA][HIP] Throw a runtime error with invalid sub-group size to kernel #6103

Open

AlexeySachkov deleted the private/asachkov/generic-module-splitter branch May 22, 2024 09:48
