Commit a60b578

Copilot and wsmoses authored
Extend SimpleGVN to support load-load forwarding (#2625)
* Initial plan

* Implement load-load forwarding in SimpleGVN

Co-authored-by: wsmoses <1260124+wsmoses@users.noreply.github.com>

* Apply clang-format to SimpleGVN.cpp

Co-authored-by: wsmoses <1260124+wsmoses@users.noreply.github.com>

* Remove unused next_argument label

Co-authored-by: wsmoses <1260124+wsmoses@users.noreply.github.com>

* Update SimpleGVN README to document load-load forwarding

Co-authored-by: wsmoses <1260124+wsmoses@users.noreply.github.com>

* Combine dominatesAndCovers functions to avoid redundancy

Co-authored-by: wsmoses <1260124+wsmoses@users.noreply.github.com>

* Refactor collectMemoryOps per review feedback

- Use pointer parameter instead of bool for Calls to indicate if nocapture calls should be collected
- Iterate over uses instead of users to get argument index directly
- Use isNoCapture utility function from Utils.h
- Avoid recollecting in second phase by tracking eliminated loads

Co-authored-by: wsmoses <1260124+wsmoses@users.noreply.github.com>

* Change Calls parameter from pointer to reference with sentinel pattern

- Changed Calls from pointer to reference parameter in collectMemoryOps
- Use sentinel value (nullptr CallInst) to distinguish between modes
- Empty Calls on entry = reject nocapture calls (store-load forwarding)
- Non-empty Calls on entry = collect nocapture calls (load-load forwarding)
- Eliminates need for separate boolean parameter

Co-authored-by: wsmoses <1260124+wsmoses@users.noreply.github.com>

* Remove unnecessary sentinel pattern for nocapture call collection

Instead of using sentinel values and calling collectMemoryOps again, directly traverse uses to collect nocapture calls in the second phase. This simplifies the code and avoids the complexity of the sentinel pattern.

Co-authored-by: wsmoses <1260124+wsmoses@users.noreply.github.com>

* Refactor aliasing check into common helper and improve tests

- Extract hasAliasingWriteBetween() helper to check for stores/calls between instructions
- Reuse this helper in both store-load and load-load forwarding
- Update load_load_no_nocapture_call.ll: place both loads after call to properly test no forwarding
- Update load_load_with_store_between.ll: demonstrate both load-load forwarding before and after store

Co-authored-by: wsmoses <1260124+wsmoses@users.noreply.github.com>

* fix

* fix

* fix

* fix

* fix

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: wsmoses <1260124+wsmoses@users.noreply.github.com>
Co-authored-by: William S. Moses <gh@wsmoses.com>
1 parent 54eda9b commit a60b578

10 files changed, +508 -185 lines changed
enzyme/Enzyme/SimpleGVN.cpp

Lines changed: 282 additions & 175 deletions
Large diffs are not rendered by default.
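
Because the `SimpleGVN.cpp` diff is not rendered here, the helper named in the commit message, `hasAliasingWriteBetween()`, cannot be read directly. The following is only a minimal C++ sketch of what such a check might do within a single basic block, written against a toy instruction model rather than LLVM's API. The types `ToyInst` and `OpKind`, the function `rangesOverlap`, and the helper's signature are all assumptions made for illustration; the real helper presumably also has to reason about dominance and control flow across blocks.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an LLVM instruction inside one basic block.
// The real pass works on llvm::Instruction; none of these names come from it.
enum class OpKind { Load, Store, NoCaptureCall, Other };

struct ToyInst {
  OpKind kind;
  std::size_t offset; // byte offset from the tracked base pointer
  std::size_t size;   // number of bytes accessed (ignored for calls)
};

// True when the half-open byte ranges [aOff, aOff + aSize) and
// [bOff, bOff + bSize) share at least one byte.
inline bool rangesOverlap(std::size_t aOff, std::size_t aSize,
                          std::size_t bOff, std::size_t bSize) {
  return aOff < bOff + bSize && bOff < aOff + aSize;
}

// Returns true if any instruction strictly between positions `from` and `to`
// may write the bytes [offset, offset + size): either a store that overlaps
// the range, or a call that receives the pointer (assumed here, conservatively,
// to be able to write any byte behind it).
inline bool hasAliasingWriteBetween(const std::vector<ToyInst> &block,
                                    std::size_t from, std::size_t to,
                                    std::size_t offset, std::size_t size) {
  for (std::size_t i = from + 1; i < to && i < block.size(); ++i) {
    const ToyInst &inst = block[i];
    if (inst.kind == OpKind::NoCaptureCall)
      return true;
    if (inst.kind == OpKind::Store &&
        rangesOverlap(inst.offset, inst.size, offset, size))
      return true;
  }
  return false;
}
```

In this model both forwarding modes would ask the same question: a load at position `to` may reuse a value produced at position `from` only when the function returns `false` for the load's byte range.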

enzyme/enzyme

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+./enzyme

enzyme/test/Enzyme/SimpleGVN/README.md

Lines changed: 50 additions & 10 deletions
@@ -8,24 +8,41 @@ SimpleGVN is a GVN-like (Global Value Numbering) optimization pass that forwards
 
 ## How It Works
 
-The pass:
-1. Identifies function arguments with both `noalias` and `nocapture` attributes
-2. Verifies all uses are exclusively loads, stores, or GEP instructions
+The pass operates in two phases:
+
+### Phase 1: Store-to-Load Forwarding
+1. Identifies function arguments with both `noalias` and `nocapture` attributes, and allocas
+2. Verifies all uses are exclusively loads, stores, GEP instructions, or casts
 3. For each load, finds dominating stores that cover the load's memory range
 4. Replaces the load with the stored value if no aliasing store exists in between
 
+### Phase 2: Load-to-Load Forwarding
+1. Re-collects loads and stores, this time allowing nocapture function calls
+2. For each load, finds dominating loads that cover the same memory range
+3. Replaces the load with the value from the dominating load if:
+   - No aliasing store exists between the two loads
+   - No nocapture function call exists between the two loads (as they may modify memory)
+
 ## Test Cases
 
+### Store-to-Load Forwarding Tests
 - **basic.ll** - Simple store-to-load forwarding
 - **offset.ll** - Forwarding with GEP offsets
 - **dominance.ll** - Verifies dominance requirements
 - **intermediate_store.ll** - Handles intermediate stores correctly
 - **no_noalias.ll** - Rejects optimization when noalias is missing
-- **call_use.ll** - Rejects when argument has non-memory uses
+- **call_use.ll** - Rejects when argument has non-memory uses (non-nocapture calls)
 - **struct_field.ll** - Handles struct field accesses
 - **type_conversion.ll** - Tests byte-level extraction
 - **comprehensive.ll** - Multiple loads/stores at different offsets
 
+### Load-to-Load Forwarding Tests
+- **load_load_basic.ll** - Simple load-to-load forwarding
+- **load_load_offset.ll** - Load-to-load forwarding with GEP offsets
+- **load_load_nocapture_call.ll** - No forwarding when nocapture call exists between loads
+- **load_load_no_nocapture_call.ll** - Optimization disabled when call lacks nocapture attribute
+- **load_load_with_store_between.ll** - No load-to-load forwarding when store exists between loads
+
 ## Running the Tests
 
 Using opt with the new pass manager:
@@ -38,21 +55,44 @@ Using opt with the legacy pass manager (LLVM < 16):
 opt -load LLVMEnzyme-18.so -simple-gvn -S < test.ll
 ```
 
-## Example
+## Examples
+
+### Store-to-Load Forwarding Example
 
 Input:
 ```llvm
-define i32 @foo(i32* noalias nocapture %ptr) {
-  store i32 42, i32* %ptr
-  %v = load i32, i32* %ptr
+define i32 @foo(ptr noalias nocapture %ptr) {
+  store i32 42, ptr %ptr
+  %v = load i32, ptr %ptr
   ret i32 %v
 }
 ```
 
 Output after SimpleGVN:
 ```llvm
-define i32 @foo(i32* noalias nocapture %ptr) {
-  store i32 42, i32* %ptr
+define i32 @foo(ptr noalias nocapture %ptr) {
+  store i32 42, ptr %ptr
   ret i32 42
 }
 ```
+
+### Load-to-Load Forwarding Example
+
+Input:
+```llvm
+define i32 @bar(ptr noalias nocapture %ptr) {
+  %v1 = load i32, ptr %ptr
+  %v2 = load i32, ptr %ptr
+  %sum = add i32 %v1, %v2
+  ret i32 %sum
+}
+```
+
+Output after SimpleGVN:
+```llvm
+define i32 @bar(ptr noalias nocapture %ptr) {
+  %v1 = load i32, ptr %ptr
+  %sum = add i32 %v1, %v1
+  ret i32 %sum
+}
+```
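
The two phases described in the README can be modelled on straight-line code with a short C++ sketch. This is a toy under stated assumptions, not the pass itself: it tracks one pointer split into independent slots, ignores dominance across blocks, and treats every call that receives the pointer as a possible write. The names `Inst`, `Kind`, `kNotForwarded` and `forwardLoads` are hypothetical and do not appear in the commit.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical instruction model for a single basic block.
enum class Kind { Load, Store, NoCaptureCall };

struct Inst {
  Kind kind;
  int slot;  // which slot of the tracked pointer is accessed (unused for calls)
  int value; // Load: id of the value it defines; Store: id of the value written
};

constexpr int kNotForwarded = -1;

// For each instruction, returns the id of the value that would replace it if
// it is a forwardable load, or kNotForwarded otherwise.
inline std::vector<int> forwardLoads(const std::vector<Inst> &block) {
  std::vector<int> repl(block.size(), kNotForwarded);

  // Phase 1: store-to-load. Walk backwards from each load; the nearest
  // same-slot store supplies the value unless a call is seen first.
  for (std::size_t i = 0; i < block.size(); ++i) {
    if (block[i].kind != Kind::Load)
      continue;
    for (std::size_t j = i; j-- > 0;) {
      const Inst &prev = block[j];
      if (prev.kind == Kind::NoCaptureCall)
        break; // the call may have overwritten the slot
      if (prev.kind == Kind::Store && prev.slot == block[i].slot) {
        repl[i] = prev.value;
        break;
      }
    }
  }

  // Phase 2: load-to-load. Loads not handled above reuse the nearest earlier
  // same-slot load, provided no same-slot store or call lies in between.
  for (std::size_t i = 0; i < block.size(); ++i) {
    if (block[i].kind != Kind::Load || repl[i] != kNotForwarded)
      continue;
    for (std::size_t j = i; j-- > 0;) {
      const Inst &prev = block[j];
      if (prev.kind == Kind::NoCaptureCall)
        break;
      if (prev.slot != block[i].slot)
        continue; // access to a different slot cannot alias in this model
      if (prev.kind == Kind::Store)
        break;
      // prev is a same-slot load: reuse whatever value it resolved to.
      repl[i] = repl[j] != kNotForwarded ? repl[j] : prev.value;
      break;
    }
  }
  return repl;
}
```

Feeding this model the instruction sequence of `test_load_load_with_store_between` (two loads, a store of the first load's value, two more loads) resolves the second, third and fourth loads to the first load's value, which matches the CHECK lines of that test. Placing a call between two loads leaves the second load untouched, as in `test_load_load_nocapture_call`.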
Lines changed: 31 additions & 0 deletions
@@ -0,0 +1,31 @@
+; RUN: if [ %llvmver -lt 16 ]; then %opt < %s %loadEnzyme -simple-gvn -S | FileCheck %s; fi
+; RUN: %opt < %s %newLoadEnzyme -passes="simple-gvn" -S | FileCheck %s
+
+declare void @julia___conv_filter__271_37475({ { {} addrspace(10)*, [1 x [2 x i64]], i64, i64 }, [4 x i64] }* noalias nocapture nofree noundef nonnull writeonly sret({ { {} addrspace(10)*, [1 x [2 x i64]], i64, i64 }, [4 x i64] }) align 8 dereferenceable(72) %0);
+
+; Function Attrs: noinline
+define private {} addrspace(10)* @julia__conv_filter__37469([1 x {} addrspace(10)*]* %return_roots) {
+top:
+  %0 = alloca { { {} addrspace(10)*, [1 x [2 x i64]], i64, i64 }, [4 x i64] }, align 8
+  call fastcc void @julia___conv_filter__271_37475({ { {} addrspace(10)*, [1 x [2 x i64]], i64, i64 }, [4 x i64] }* noalias nocapture nofree noundef nonnull writeonly sret({ { {} addrspace(10)*, [1 x [2 x i64]], i64, i64 }, [4 x i64] }) align 8 dereferenceable(72) %0)
+  %a5 = getelementptr inbounds { { {} addrspace(10)*, [1 x [2 x i64]], i64, i64 }, [4 x i64] }, { { {} addrspace(10)*, [1 x [2 x i64]], i64, i64 }, [4 x i64] }* %0, i64 0, i32 0, i32 0
+  %a6 = load {} addrspace(10)*, {} addrspace(10)** %a5, align 8
+  %a7 = getelementptr inbounds [1 x {} addrspace(10)*], [1 x {} addrspace(10)*]* %return_roots, i64 0, i64 0
+  store {} addrspace(10)* %a6, {} addrspace(10)** %a7, align 8
+  %srcloccs2 = getelementptr inbounds { { {} addrspace(10)*, [1 x [2 x i64]], i64, i64 }, [4 x i64] }, { { {} addrspace(10)*, [1 x [2 x i64]], i64, i64 }, [4 x i64] }* %0, i64 0, i32 0, i32 0
+  %a8 = load {} addrspace(10)*, {} addrspace(10)** %srcloccs2, align 8
+  ret {} addrspace(10)* %a8
+}
+
+; CHECK: define private {} addrspace(10)* @julia__conv_filter__37469([1 x {} addrspace(10)*]* %return_roots)
+; CHECK-NEXT: top:
+; CHECK-NEXT: %0 = alloca { { {} addrspace(10)*, [1 x [2 x i64]], i64, i64 }, [4 x i64] }, align 8
+; CHECK-NEXT: call fastcc void @julia___conv_filter__271_37475({ { {} addrspace(10)*, [1 x [2 x i64]], i64, i64 }, [4 x i64] }* noalias nocapture nofree noundef nonnull writeonly sret({ { {} addrspace(10)*, [1 x [2 x i64]], i64, i64 }, [4 x i64] }) align 8 dereferenceable(72) %0)
+; CHECK-NEXT: %a5 = getelementptr inbounds { { {} addrspace(10)*, [1 x [2 x i64]], i64, i64 }, [4 x i64] }, { { {} addrspace(10)*, [1 x [2 x i64]], i64, i64 }, [4 x i64] }* %0, i64 0, i32 0, i32 0
+; CHECK-NEXT: %a6 = load {} addrspace(10)*, {} addrspace(10)** %a5, align 8
+; CHECK-NEXT: %a7 = getelementptr inbounds [1 x {} addrspace(10)*], [1 x {} addrspace(10)*]* %return_roots, i64 0, i64 0
+; CHECK-NEXT: store {} addrspace(10)* %a6, {} addrspace(10)** %a7, align 8
+; CHECK-NEXT: %srcloccs2 = getelementptr inbounds { { {} addrspace(10)*, [1 x [2 x i64]], i64, i64 }, [4 x i64] }, { { {} addrspace(10)*, [1 x [2 x i64]], i64, i64 }, [4 x i64] }* %0, i64 0, i32 0, i32 0
+; CHECK-NEXT: ret {} addrspace(10)* %a6
+; CHECK-NEXT: }
+
Lines changed: 18 additions & 0 deletions
@@ -0,0 +1,18 @@
+; RUN: if [ %llvmver -lt 16 ]; then %opt < %s %loadEnzyme -opaque-pointers -simple-gvn -S | FileCheck %s; fi
+; RUN: %opt < %s %newLoadEnzyme -opaque-pointers -passes="simple-gvn" -S | FileCheck %s
+
+; Test basic load-to-load forwarding with noalias nocapture argument
+
+define i32 @test_load_load_basic(ptr noalias nocapture %ptr) {
+entry:
+  %val1 = load i32, ptr %ptr, align 4
+  %val2 = load i32, ptr %ptr, align 4
+  %sum = add i32 %val1, %val2
+  ret i32 %sum
+}
+
+; CHECK: define i32 @test_load_load_basic(ptr noalias nocapture %ptr)
+; CHECK-NEXT: entry:
+; CHECK-NEXT: %val1 = load i32, ptr %ptr, align 4
+; CHECK-NEXT: %sum = add i32 %val1, %val1
+; CHECK-NEXT: ret i32 %sum
Lines changed: 24 additions & 0 deletions
@@ -0,0 +1,24 @@
+; RUN: if [ %llvmver -lt 16 ]; then %opt < %s %loadEnzyme -opaque-pointers -simple-gvn -S | FileCheck %s; fi
+; RUN: %opt < %s %newLoadEnzyme -opaque-pointers -passes="simple-gvn" -S | FileCheck %s
+
+; Test that optimization is NOT applied when call does not have nocapture attribute
+; and that even with nocapture, we don't forward if there's an intermediate call
+
+declare void @external_func(ptr)
+
+define i32 @test_load_load_no_nocapture_call(ptr noalias nocapture %ptr) {
+entry:
+  call void @external_func(ptr %ptr)
+  %val1 = load i32, ptr %ptr, align 4
+  %val2 = load i32, ptr %ptr, align 4
+  %sum = add i32 %val1, %val2
+  ret i32 %sum
+}
+
+; CHECK: define i32 @test_load_load_no_nocapture_call(ptr noalias nocapture %ptr)
+; CHECK-NEXT: entry:
+; CHECK-NEXT: call void @external_func(ptr %ptr)
+; CHECK-NEXT: %val1 = load i32, ptr %ptr, align 4
+; CHECK-NEXT: %val2 = load i32, ptr %ptr, align 4
+; CHECK-NEXT: %sum = add i32 %val1, %val2
+; CHECK-NEXT: ret i32 %sum
Lines changed: 24 additions & 0 deletions
@@ -0,0 +1,24 @@
+; RUN: if [ %llvmver -lt 16 ]; then %opt < %s %loadEnzyme -opaque-pointers -simple-gvn -S | FileCheck %s; fi
+; RUN: %opt < %s %newLoadEnzyme -opaque-pointers -passes="simple-gvn" -S | FileCheck %s
+
+; Test load-to-load forwarding with nocapture function call between loads
+; The nocapture call should prevent forwarding
+
+declare void @nocapture_func(ptr nocapture)
+
+define i32 @test_load_load_nocapture_call(ptr noalias nocapture %ptr) {
+entry:
+  %val1 = load i32, ptr %ptr, align 4
+  call void @nocapture_func(ptr nocapture %ptr)
+  %val2 = load i32, ptr %ptr, align 4
+  %sum = add i32 %val1, %val2
+  ret i32 %sum
+}
+
+; CHECK: define i32 @test_load_load_nocapture_call(ptr noalias nocapture %ptr)
+; CHECK-NEXT: entry:
+; CHECK-NEXT: %val1 = load i32, ptr %ptr, align 4
+; CHECK-NEXT: call void @nocapture_func(ptr nocapture %ptr)
+; CHECK-NEXT: %val2 = load i32, ptr %ptr, align 4
+; CHECK-NEXT: %sum = add i32 %val1, %val2
+; CHECK-NEXT: ret i32 %sum
Lines changed: 20 additions & 0 deletions
@@ -0,0 +1,20 @@
+; RUN: if [ %llvmver -lt 16 ]; then %opt < %s %loadEnzyme -opaque-pointers -simple-gvn -S | FileCheck %s; fi
+; RUN: %opt < %s %newLoadEnzyme -opaque-pointers -passes="simple-gvn" -S | FileCheck %s
+
+; Test load-to-load forwarding with GEP offsets
+
+define i32 @test_load_load_offset(ptr noalias nocapture %ptr) {
+entry:
+  %gep = getelementptr i32, ptr %ptr, i64 1
+  %val1 = load i32, ptr %gep, align 4
+  %val2 = load i32, ptr %gep, align 4
+  %sum = add i32 %val1, %val2
+  ret i32 %sum
+}
+
+; CHECK: define i32 @test_load_load_offset(ptr noalias nocapture %ptr)
+; CHECK-NEXT: entry:
+; CHECK-NEXT: %gep = getelementptr i32, ptr %ptr, i64 1
+; CHECK-NEXT: %val1 = load i32, ptr %gep, align 4
+; CHECK-NEXT: %sum = add i32 %val1, %val1
+; CHECK-NEXT: ret i32 %sum
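
The README speaks of a dominating access that "covers" a load's memory range, and the offset tests reach that range through a GEP. As a rough illustration of the arithmetic involved, the C++ sketch below computes a byte offset for a single-index GEP, checks whether one byte range contains another, and extracts the covered bytes from a wider integer. It assumes a little-endian layout and values of at most 8 bytes; `ByteRange`, `covers`, `gepByteOffset` and `extractBytes` are hypothetical names, and the actual pass may perform these steps quite differently.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical description of the bytes touched by one memory access,
// relative to the tracked base pointer.
struct ByteRange {
  std::uint64_t offset;
  std::uint64_t size;
};

// Byte offset of `getelementptr T, ptr %p, i64 index` when T occupies
// elemSize bytes (single-index GEPs only).
inline std::uint64_t gepByteOffset(std::uint64_t elemSize,
                                   std::uint64_t index) {
  return elemSize * index;
}

// True when every byte of `inner` lies inside `outer`.
inline bool covers(ByteRange outer, ByteRange inner) {
  return inner.size <= outer.size && outer.offset <= inner.offset &&
         inner.offset - outer.offset <= outer.size - inner.size;
}

// Extracts the bytes of `loadRange` from an earlier value `stored` that
// occupies `storedRange`. Assumes little-endian byte order and integers of
// at most 8 bytes. Returns false, leaving `out` untouched, when the earlier
// access does not cover the load.
inline bool extractBytes(std::uint64_t stored, ByteRange storedRange,
                         ByteRange loadRange, std::uint64_t &out) {
  if (storedRange.size > 8 || loadRange.size == 0 ||
      !covers(storedRange, loadRange))
    return false;
  // In little-endian layout, byte k of the value sits at bit position 8 * k.
  std::uint64_t shift = 8 * (loadRange.offset - storedRange.offset);
  std::uint64_t value = stored >> shift;
  if (loadRange.size < 8)
    value &= (std::uint64_t{1} << (8 * loadRange.size)) - 1;
  out = value;
  return true;
}
```

For the test above, both loads go through the same `%gep`, so their ranges are identical (offset 4, size 4, assuming a 4-byte `i32`) and coverage holds trivially; the extraction step only matters when a narrower load reads part of a wider earlier access.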
Lines changed: 29 additions & 0 deletions
@@ -0,0 +1,29 @@
+; RUN: if [ %llvmver -lt 16 ]; then %opt < %s %loadEnzyme -opaque-pointers -simple-gvn -S | FileCheck %s; fi
+; RUN: %opt < %s %newLoadEnzyme -opaque-pointers -passes="simple-gvn" -S | FileCheck %s
+
+; Test that load-to-load forwarding does not happen when there's a store between loads
+; But does happen after the store when there's no intervening write
+
+define i32 @test_load_load_with_store_between(ptr noalias nocapture %ptr) {
+entry:
+  %val1 = load i32, ptr %ptr, align 4
+  %val2 = load i32, ptr %ptr, align 4
+  store i32 %val1, ptr %ptr, align 4
+  %val3 = load i32, ptr %ptr, align 4
+  %val4 = load i32, ptr %ptr, align 4
+  %sum1 = add i32 %val1, %val2
+  %sum2 = add i32 %sum1, %val3
+  %sum3 = add i32 %sum2, %val4
+  ret i32 %sum3
+}
+
+; val1 and val2 should be forwarded (load-load before store)
+; val3 and val4 should both be forwarded from the store (store-load forwarding)
+; CHECK: define i32 @test_load_load_with_store_between(ptr noalias nocapture %ptr)
+; CHECK-NEXT: entry:
+; CHECK-NEXT: %val1 = load i32, ptr %ptr, align 4
+; CHECK-NEXT: store i32 %val1, ptr %ptr, align 4
+; CHECK-NEXT: %sum1 = add i32 %val1, %val1
+; CHECK-NEXT: %sum2 = add i32 %sum1, %val1
+; CHECK-NEXT: %sum3 = add i32 %sum2, %val1
+; CHECK-NEXT: ret i32 %sum3
Lines changed: 29 additions & 0 deletions
@@ -0,0 +1,29 @@
+; RUN: if [ %llvmver -lt 16 ]; then %opt < %s %loadEnzyme -opaque-pointers -simple-gvn -S | FileCheck %s; fi
+; RUN: %opt < %s %newLoadEnzyme -opaque-pointers -passes="simple-gvn" -S | FileCheck %s
+
+; Test load-to-load forwarding with nocapture function call between loads
+; The nocapture call should prevent forwarding
+
+declare void @nocapture_func(ptr nocapture)
+
+define i32 @test_load_load_nocapture_call(ptr noalias nocapture %ptr) {
+entry:
+  store i32 0, ptr %ptr, align 4
+  br label %next
+
+next:
+  call void @nocapture_func(ptr nocapture %ptr)
+  %val2 = load i32, ptr %ptr, align 4
+  ret i32 %val2
+}
+
+; CHECK: define i32 @test_load_load_nocapture_call(ptr noalias nocapture %ptr)
+; CHECK-NEXT: entry:
+; CHECK-NEXT: store i32 0, ptr %ptr, align 4
+; CHECK-NEXT: br label %next
+
+; CHECK: next: ; preds = %entry
+; CHECK-NEXT: call void @nocapture_func(ptr nocapture %ptr)
+; CHECK-NEXT: %val2 = load i32, ptr %ptr, align 4
+; CHECK-NEXT: ret i32 %val2
+; CHECK-NEXT: }
