-
Notifications
You must be signed in to change notification settings - Fork 49
Annotate liveness pass (+ reuseLDS pass changes) #2015
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: develop
Are you sure you want to change the base?
Conversation
What quickly draws my attention is that Initially that also makes me think is not a good idea. I'd rather have rock.alloc/rock.dealloc and rock.live_in/rock.live_out; the former ops are for memory management, and the latter serve as metadata actually. However the point here is that To me it looks like it was a mistake to call it |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR introduces a new liveness annotation system to enable attention pipelining of the outer loop by replacing the previous rock.dealloc
approach with rock.live_in
and rock.live_out
annotations, allowing for multiple liveness ranges per buffer allocation.
- Replaces
rock.dealloc
withrock.live_in
/rock.live_out
for more flexible LDS memory management - Adds a new
AnnotateLiveness
pass that automatically detects and marks buffer liveness based on write/read patterns - Updates the
ReuseLDS
pass to work with the new liveness annotations and handle multiple liveness ranges per buffer
Reviewed Changes
Copilot reviewed 30 out of 30 changed files in this pull request and generated 11 comments.
Show a summary per file
File | Description |
---|---|
mlir/lib/Dialect/Rock/Transforms/AnnotateLiveness.cpp | New pass implementation for automatic liveness annotation |
mlir/lib/Dialect/Rock/Transforms/ReuseLDS.cpp | Major refactor to use new liveness annotations and interference graph analysis |
mlir/include/mlir/Dialect/Rock/IR/RockOps.td | Defines new rock.live_in and rock.live_out operations |
mlir/test/Dialect/Rock/lowering_reuse_lds.mlir | Updated test cases to use new liveness annotations |
mlir/test/Dialect/Rock/lowering_annotate_liveness.mlir | New test file for the liveness annotation pass |
Multiple test files | Removal of rock.dealloc calls throughout existing tests |
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
|
||
// Annotate lifetime of memory allocation on GPU memory hierachy. | ||
def Rock_GpuDeallocOp: | ||
Rock_Op<"dealloc", [MemoryEffects<[MemFree<DefaultResource>]>]>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like rock.dealloc
annotated the memref input arg with [MemoryEffects<[MemFree<DefaultResource>]
. Based on Pablo's earlier comment, he mentioned that we were never using this for actual deallocations, but is there a chance that the presence of this annotation was leading some community passes actually treating this like it was doing deallocations?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think so, because we got rid of rock.dealloc in RockToGPU.cpp (see MIGPUDeallocRewritePattern), which is the last pass of buildKernelPipeline().
return emitError("The size of rock.alloc should be greather than zero."); | ||
} | ||
|
||
//===-----------------------------------------------------===// |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want verifier ops for the new LiveIn and LiveOut ops?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
funcPm.addPass(rock::createRockAnnotateLivenessPass()); | ||
funcPm.addPass(rock::createRockReuseLDSPass()); | ||
funcPm.addPass(rock::createRockOutputSwizzlePass()); | ||
funcPm.addPass(rock::createRockAnnotateLivenessPass()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a comment explaining why we need to call RockAnnotateLiveness
and RockReuseLDS
again after running RockOutputSwizzle
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, I'll add the following comments:
// We run reuse LDS before the output swizzle pass because it uses a heuristic to determine whether to swizzle or not, and that heuristic needs the actual LDS usage.
// After running output swizzle, we'll create a new LDS buffer and we need to run reuse LDS again to be able to reuse LDS memory.
// (outside the loop). This would be incorrect because the buffer is alive for | ||
// the whole loop. However, in practise, this is not a problem because if there | ||
// are any interferences they will also happen in the epilogue and prologue. | ||
// This might need to get improved if changes to pipelining are made. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this worth filing a case for?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure, I'd say if it's not a lot of work, we could fix it. Because this might happen in the future if we change the pipeline we currently use. But if it's a lot of work it might not be worth it...
} | ||
|
||
// Annotate LDS buffer usage based on the following assumptions: | ||
// 1. Liveness range is determined by a pattern of write(), then read() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What will happen right now if the number of writes
and reads
does not match? Should we catch that error in this pass?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's ok if they don't match. You can have write(buffer), write(buffer), read(buffer). I guess what you mean is something like: write(buffer), write(buffer), read(buffer), write(buffer)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should fail for those cases, see line 206: "Found a non closed read-write pattern"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that was the case that I was thinking of. Can you add a LIT test for the failing case as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
acd8802
to
f2299cc
Compare
…DS). Also update ReuseLDS pass accordingly.
f7e3311
to
4b9228f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work!
@@ -0,0 +1,305 @@ | |||
//===- AnnotateLiveness - MLIR Rock ops lowering passes -----===// | |||
// | |||
// Copyright 2025 The MLIR Authors. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: The MLIR authors?
} | ||
|
||
// Annotate LDS buffer usage based on the following assumptions: | ||
// 1. Liveness range is determined by a pattern of write(), then read() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
1. Liveness range is determined by a pattern of write(), then read()
is difficult to understand. What about:
1. Liveness range is determined by a pattern of one or more write() ops, and then one or more read() ops. In other words, there cannot be a write() after a read().
// read([0, 1, 2]). | ||
// clang-format on | ||
// | ||
// Where write(buffer, indices, data), read(indices), alloc(size). We would be |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A bit pedantic but I would prefer read(buffer, indices)
instead of read(indices
)
func::FuncOp func = getOperation(); | ||
|
||
// Only run this pass on GPU kernel functions. | ||
if (!func->hasAttr("kernel")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (!func->hasAttr("kernel")) {
LLVM_DEBUG(llvm::dbgs() << "Skipping RockAnnotateLivenessPass on func with no kernel attribute";
return;
}
@@ -0,0 +1,142 @@ | |||
// RUN: sed s/##TOKEN_ARCH##/%arch/g %s | rocmlir-opt -rock-annotate-liveness | FileCheck %s | |||
|
|||
#wg = #gpu.address_space<workgroup> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Either use #wg
everywhere or use #gpu.address_space<workgroup>
everywhere
//===-----------------------------------------------------===// | ||
|
||
LogicalResult GpuDeallocOp::verify() { | ||
LogicalResult LiveInOp::verify() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If live_in
/live_out
targets only LDS memory, this is a good moment to check if the GpuAllocOp
is LDS or not.
@@ -0,0 +1,47 @@ | |||
// RUN: sed s/##TOKEN_ARCH##/%arch/g %s | rocmlir-opt -rock-annotate-liveness -verify-diagnostics | |||
|
|||
#wg = #gpu.address_space<workgroup> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: same here, let's use either #wg
or #gpu.address_space<workgroup>
below
// RUN: sed s/##TOKEN_ARCH##/%arch/g %s | rocmlir-opt -rock-annotate-liveness -verify-diagnostics | ||
|
||
#wg = #gpu.address_space<workgroup> | ||
#priv = #gpu.address_space<private> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: #priv
is not used
bool hasRead = lastRead != nullptr; | ||
bool hasWrite = currentWrite != nullptr; | ||
if (hasRead != hasWrite) { | ||
return buffer->emitError("Found a non closed read-write pattern"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I understand this. Let's take non_closed_read_write_pattern
as an example.
Yes there is a write on a buffer that is not read later, but we could have a valid program that does that, right? It would be cleaned by dce at some point.
I guess supporting that complicates the logic, but can't we treat the return
op as an implicit live.live_out
for all buffers?
// Update the last read (could be write, read, read, ... pattern) | ||
lastRead = op; | ||
if (!currentWrite) { | ||
return buffer->emitError("Read before write"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this error makes sense, we are reading from LDS before writing to it. Because it makes no sense to read from an uninitialized buffer (i.e., if we have not written anything to it yet). But I wonder if this error string is appropriate. Actually, the error is not (any) "Read before write", but reading from a position of LDS that has never been written (e.g., we care about the first read before write) - not sure how to put it cleanly.
Motivation
In order to allow attention pipelining of the outer loop, we need to be able to annotate the liveness of LDS buffers. Currently, we do:
Note that we currently use rock.dealloc manually. However, for attention, we need to do:
So, after doing pipelining, we would end up with multiple calls to rock.dealloc. The main idea is that currently we have a single liveness range for each alloc, it starts when we call rock.alloc() and ends when we call rock.dealloc().
Technical Details
The solution of this PR is to introduce rock.live_in and rock.live_out (instead of rock.dealloc), then, decouple rock.alloc from liveness analysis. So, we can have multiple regions where a buffer is used and then not used, then used again.
Also, we introduce a new pass AnnotateLiveness that automatically marks the liveness of the buffers. The assumption we use to do this: there are blocks of write() then load() (or write(), write(), ... load(), load()). That block would be a liveness range, we would add rock.live_in before the first write() and rock.live_out after the last load().
See annotateLiveness() comment for more details about other assumptions.
Test Plan
Tests pass.
Test Result
All tests pass.
Submission Checklist