(dialects and transforms): csl stencil lowering #2747
base: main
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2747      +/-   ##
==========================================
+ Coverage   89.79%   89.85%   +0.05%
==========================================
  Files         394      408      +14
  Lines       48747    51021    +2274
  Branches     7471     7912     +441
==========================================
+ Hits        43774    45843    +2069
- Misses       3793     3928     +135
- Partials     1180     1250      +70
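As a sanity check on the report, the percentages follow from the hit, miss, and partial counts. This minimal Python sketch assumes Codecov's documented ratio `hits / (hits + misses + partials)` and a round-down-to-two-decimals setting; the repository's actual Codecov configuration is not shown here.

```python
import math


def coverage_pct(hits: int, misses: int, partials: int) -> float:
    """Coverage in percent, rounded down to two decimals (assumed setting)."""
    lines = hits + misses + partials
    return math.floor(hits / lines * 10000) / 100


base = coverage_pct(43774, 3793, 1180)  # main
head = coverage_pct(45843, 3928, 1250)  # #2747
print(base, head)  # 89.79 89.85
```

The counts are also consistent with the Lines row: 43774 + 3793 + 1180 = 48747 and 45843 + 3928 + 1250 = 51021.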
func.func @gauss_seidel(%a : memref<1024x512xtensor<512xf32>>, %b : memref<1024x512xtensor<512xf32>>) {
  %0 = stencil.external_load %a : memref<1024x512xtensor<512xf32>> -> !stencil.field<[-1,1023]x[-1,511]xtensor<512xf32>>
  %1 = stencil.load %0 : !stencil.field<[-1,1023]x[-1,511]xtensor<512xf32>> -> !stencil.temp<[-1,1023]x[-1,511]xtensor<512xf32>>
  %2 = stencil.external_load %b : memref<1024x512xtensor<512xf32>> -> !stencil.field<[-1,1023]x[-1,511]xtensor<512xf32>>
Unless I missed something specific to the tensorized aspect of it, you could write the function as taking the `!stencil.field`s directly here; they would get lowered to the memref you are converting from anyway 🙂
Always happy to work with as many kernels as I can get my hands on, and with any variety of representations we might want to support. IIRC, this particular one was provided by @mesham; the tensorisation pass applied to it should only rewrite types and accesses.
…#2766)
Introduces an intermediate dialect with ops:
* `csl_stencil.prefetch` indicates prefetched buffer transfers
* `csl_stencil.access` performs a `stencil.access` to a prefetched buffer
The `stencil-to-csl-stencil` transform:
* lowers `dmp.swap` to `csl_stencil.prefetch`
* adds prefetched buffers to the signature of `stencil.apply`
* lowers `stencil.access` to `csl_stencil.access` iff it accesses a prefetched buffer
For a more detailed description, see the document in #2747. This PR implements Step 1 outlined there.

---------

Co-authored-by: n-io
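To make the Step 1 rewrite concrete, here is a minimal Python sketch over a toy, hypothetical IR (the `Op` class and `stencil_to_csl_stencil` function are illustrative, not xDSL's API). It assumes ops arrive as a flat, ordered list, with accesses listed after the apply rather than nested inside its region, and that a swap's result names the prefetched buffer.

```python
from __future__ import annotations

from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Op:
    """Toy IR op (hypothetical, not xDSL's Operation class)."""
    name: str
    operands: tuple[str, ...]
    result: Optional[str] = None


def stencil_to_csl_stencil(ops: list[Op]) -> list[Op]:
    """Sketch of the Step 1 rewrite over a flat, ordered op list."""
    prefetched: set[str] = set()
    out: list[Op] = []
    for op in ops:
        if op.name == "dmp.swap":
            # A halo swap becomes an explicit prefetch; its result names the buffer.
            out.append(replace(op, name="csl_stencil.prefetch"))
            if op.result is not None:
                prefetched.add(op.result)
        elif op.name == "stencil.apply":
            # Prefetched buffers are appended to the apply's operands ("signature").
            out.append(replace(op, operands=op.operands + tuple(sorted(prefetched))))
        elif op.name == "stencil.access" and op.operands[0] in prefetched:
            # Only accesses that read a prefetched buffer are rewritten.
            out.append(replace(op, name="csl_stencil.access"))
        else:
            out.append(op)
    return out


ops = [
    Op("dmp.swap", ("t0",), "buf"),
    Op("stencil.apply", ("t0",), "r"),
    Op("stencil.access", ("buf",), "x"),
    Op("stencil.access", ("t0",), "y"),
]
print([op.name for op in stencil_to_csl_stencil(ops)])
# ['csl_stencil.prefetch', 'stencil.apply', 'csl_stencil.access', 'stencil.access']
```

The access to `t0` is left untouched because `t0` was never prefetched, matching the "iff" condition above.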
This PR implements the `csl_stencil.apply` op as outlined in Step 2 of #2747.
This operation combines a `csl_stencil.prefetch` (symmetric buffer communication across a given stencil shape) with a `stencil.apply`. Please see the op's docstring for a detailed description.

---------

Co-authored-by: n-io
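One way to read "symmetric buffer communication across a given stencil shape" is sketched below in Python: every offset in the shape has its mirror image, so each PE sends to and receives from the same set of neighbours. The star shape, the helper names, and the one-z-column-per-neighbour buffer size (512 elements, matching `tensor<512xf32>` in the kernel above) are assumptions for illustration.

```python
from __future__ import annotations


def star_offsets(radius: int) -> list[tuple[int, int]]:
    """Neighbour offsets of a 2D star (cross-shaped) stencil, centre excluded."""
    offsets: list[tuple[int, int]] = []
    for d in range(1, radius + 1):
        offsets += [(d, 0), (-d, 0), (0, d), (0, -d)]
    return offsets


def is_symmetric(offsets: list[tuple[int, int]]) -> bool:
    """True if the mirror image of every offset is also part of the shape."""
    shape = set(offsets)
    return all((-x, -y) in shape for x, y in shape)


def receive_buffer_elems(offsets: list[tuple[int, int]], z_size: int) -> int:
    """Elements received per PE if every neighbour sends one z-column of z_size."""
    if not is_symmetric(offsets):
        raise ValueError("exchange pattern is not symmetric")
    return len(offsets) * z_size


print(receive_buffer_elems(star_offsets(1), 512))  # 2048
```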
This PR implements the conversion to the `csl_stencil.apply` op as outlined in Step 2 of #2747.
The `csl_stencil.apply` op combines a `csl_stencil.prefetch` (symmetric buffer communication across a given stencil shape) with a `stencil.apply`. The transformation consists of several steps:
* When rewriting a `stencil.apply`, select the `csl_stencil.prefetch` with the biggest memory overhead (if several are present) to be fused with the apply op into a `csl_stencil.apply`
* Find a suitable split of ops to be divided across the two code blocks
  * Re-order arithmetic, e.g. `(a+b)+c -> (c+a)+b`, to access, consume, and reduce data of neighbours (for input stencil only)
  * Move this into the first code block, and move all other ops into the second code block
  * Fallback strategy: move everything into the first code block
* Add `tensor.InsertSliceOp` to insert the computed chunk into the returned z-value tensor
* Set up code block args
* Move ops into the new regions according to the determined split
* Translate arg usage to match the new block args
* Set up yield ops
* Run tensor update shape on the chunk region

---------

Co-authored-by: n-io
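Two of the steps above, choosing the prefetch to fuse and splitting a sum between the two code blocks, can be sketched in Python as follows. All names are hypothetical, memory overhead is assumed to be neighbours × elements per neighbour, and the sum is assumed to be a flat list of accesses; with `a` and `c` as neighbour reads and `b` as a local read, the split mirrors the `(a+b)+c -> (c+a)+b` example.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Prefetch:
    """Hypothetical summary of one csl_stencil.prefetch."""
    field: str
    num_neighbours: int
    elems_per_neighbour: int

    @property
    def memory_overhead(self) -> int:
        return self.num_neighbours * self.elems_per_neighbour


def select_prefetch(prefetches: list[Prefetch]) -> Prefetch:
    """Pick the prefetch with the biggest memory overhead to fuse with the apply."""
    return max(prefetches, key=lambda p: p.memory_overhead)


@dataclass(frozen=True)
class Access:
    """Hypothetical stand-in for one stencil.access term of a flat sum."""
    field: str
    offset: tuple[int, int]


def split_sum(terms: list[Access], prefetched: str) -> tuple[list[Access], list[Access]]:
    """Reorder a flat sum so that neighbour reads of the prefetched field come first.

    Returns (first_block, second_block): the first block consumes and reduces
    the communicated neighbour data, the second adds the remaining terms.
    Fallback: if no term reads a neighbour of the prefetched field, everything
    goes into the first block.
    """
    is_neighbour = [t.field == prefetched and t.offset != (0, 0) for t in terms]
    first = [t for t, n in zip(terms, is_neighbour) if n]
    second = [t for t, n in zip(terms, is_neighbour) if not n]
    if not first:
        return list(terms), []
    return first, second


a, b, c = Access("a", (1, 0)), Access("b", (0, 0)), Access("a", (-1, 0))
print(split_sum([a, b, c], "a"))  # first block: a and c; second block: b
```

Reordering floating-point additions is not bit-exact in general, so the reordered sum may differ slightly from the original.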
Is this branch still useful?
This PR proposes a `csl_stencil` dialect that manages stencil accesses across the stencil pattern, including communication between PEs, with the goal of preparing the stencil dialect for lowering to CSL.
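As an illustration of the PE-level communication this dialect has to manage, the Python sketch below lists the PEs a given PE would exchange data with for a star stencil. It assumes a rectangular 2D PE grid with one stencil column per PE, and `pe_neighbours` is a hypothetical helper, not part of the PR.

```python
from __future__ import annotations


def pe_neighbours(x: int, y: int, width: int, height: int, radius: int) -> list[tuple[int, int]]:
    """PEs that PE (x, y) exchanges data with for a star stencil of the given radius.

    Assumes a width x height PE grid; positions outside the grid are skipped,
    so PEs on the edges have fewer communication partners.
    """
    partners: list[tuple[int, int]] = []
    for d in range(1, radius + 1):
        for dx, dy in ((d, 0), (-d, 0), (0, d), (0, -d)):
            nx, ny = x + dx, y + dy
            if 0 <= nx < width and 0 <= ny < height:
                partners.append((nx, ny))
    return partners


print(pe_neighbours(0, 0, 4, 4, 1))  # [(1, 0), (0, 1)]
```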