-
Notifications
You must be signed in to change notification settings - Fork 839
[Outlining] Remove overlapping sequences #7146
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
1a589b6 to
0d961b8
Compare
| bool operator<(const Interval& other) const { | ||
| return start < other.start && weight < other.weight; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should take end into account as well. Otherwise the std::set<Interval> returned by IntervalProcessor::getOverlaps() will not be able to hold two intervals that differ only in their ends.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
src/support/intervals.cpp
Outdated
|
|
||
| std::set<Interval> | ||
| IntervalProcessor::getOverlaps(std::vector<Interval>& intervals) { | ||
| std::sort(intervals.begin(), intervals.end(), [](Interval a, Interval b) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| std::sort(intervals.begin(), intervals.end(), [](Interval a, Interval b) { | |
| std::sort(intervals.begin(), intervals.end(), [](const Interval& a, const Interval& b) { |
Just to avoid copying intervals around unnecessarily.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
src/support/intervals.cpp
Outdated
| }); | ||
|
|
||
| std::set<Interval> overlaps; | ||
| auto& firstInterval = intervals[0]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There should be an early return if the input vector is empty to avoid UB here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
src/passes/hash-stringify-walker.cpp
Outdated
| for (auto startIdx : substring.StartIndices) { | ||
| auto interval = | ||
| Interval(startIdx, | ||
| startIdx + substring.Length - 1, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm surprised we're using intervals inclusive of their ends. Would this work without the - 1 as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, done
src/passes/hash-stringify-walker.cpp
Outdated
| auto interval = | ||
| Interval(startIdx, | ||
| startIdx + substring.Length - 1, | ||
| substring.Length * substring.StartIndices.size()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably worth a comment about why we are using this weight.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
src/passes/hash-stringify-walker.cpp
Outdated
| std::set<Interval> overlaps = IntervalProcessor::getOverlaps(intervals); | ||
| std::set<unsigned> doNotInclude; | ||
| for (auto& interval : overlaps) { | ||
| doNotInclude.insert(intervalMap[interval]); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could simplify the code here and get away without any map or set lookups if IntervalProcessor returned a sequence of kept indices in its input vector rather than a set of removed intervals. With a sequence of kept indices, we could directly construct the list of kept substrings.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
great idea, thanks!
| ) | ||
| ) | ||
|
|
||
| ;; Test that no attempt is made to outline overlapping repeat substrings |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be good to add comments about what the overlapping substrings are.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
test/lit/passes/outlining.wast
Outdated
| (drop (i32.add | ||
| (i32.const 0) | ||
| (i32.const 1) | ||
| )) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could make the test more concise and easy to read by just using constants and drops, unless I'm missing some reason why this wouldn't work.
(drop (i32.const 0))
(drop (i32.const 1))
(drop (i32.const 2))
(drop (i32.const 3))
(drop (i32.const 0))
(drop (i32.const 1))
(drop (i32.const 2))
(drop (i32.const 3))
(drop (i32.const 1))
(drop (i32.const 2))
(drop (i32.const 1))
(drop (i32.const 2))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good call, thanks
src/support/intervals.h
Outdated
| }; | ||
|
|
||
| struct IntervalProcessor { | ||
| static std::set<Interval> getOverlaps(std::vector<Interval>&); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add gTests for edge cases
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
tlively
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would still be good to include gtest unit tests for the various kinds of overlaps.
src/passes/hash-stringify-walker.cpp
Outdated
| for (Index i = 0; i < substrings.size(); i++) { | ||
| auto substring = substrings[i]; | ||
| for (auto startIdx : substring.StartIndices) { | ||
| // TODO: This weight was picked with an assumption |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What assumption?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
src/passes/Outlining.cpp
Outdated
| DBG(printHashString(stringify.hashString, stringify.exprs)); | ||
| // Remove substrings that are substrings of longer repeat substrings. | ||
| substrings = StringifyProcessor::dedupe(substrings); | ||
| // Remove substrings with overlapping indices |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| // Remove substrings with overlapping indices | |
| // Remove substrings with overlapping indices. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
| std::vector<Interval> intervals; | ||
| std::vector<int> substringIdxs; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be good to have a comment saying how these two vectors relate to each other.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
src/passes/hash-stringify-walker.cpp
Outdated
| if (substringsIncluded.find(substringIdx) != substringsIncluded.end()) { | ||
| continue; | ||
| } | ||
| substringsIncluded.insert(substringIdx); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| if (substringsIncluded.find(substringIdx) != substringsIncluded.end()) { | |
| continue; | |
| } | |
| substringsIncluded.insert(substringIdx); | |
| if (!substringsIncluded.insert(substringIdx)->second) { | |
| continue; | |
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
src/support/intervals.h
Outdated
| }; | ||
|
|
||
| struct IntervalProcessor { | ||
| // TODO: Given a vector of Interval, returns a vector of the indices, mapping |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the TODO here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it was for me to review the comment before submitting
src/passes/hash-stringify-walker.cpp
Outdated
| if (seenCount[substringIdx] == substring.StartIndices.size() && | ||
| substringsIncluded.insert(substringIdx).second) { | ||
| result.push_back(substring); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So we're only considering a substring for outlining at all if all of its ocurrences survive overlap filtering? Could we keep the substring in consideration and just remove the particular occurrence of it that had the overlap instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
src/support/intervals.cpp
Outdated
| std::sort( | ||
| intIntervals.begin(), intIntervals.end(), [](const auto& a, const auto& b) { | ||
| return a.first.start < b.first.end; | ||
| return a.first.start < b.first.start; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need for the lambda here if you fix operator< to be a total order (meaning that for any pair of intervals a and b, exactly one of a < b,b < a, or a == b is true)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
src/support/intervals.cpp
Outdated
|
|
||
| std::vector<int> result; | ||
| auto& firstInterval = intIntervals[0]; | ||
| auto& formerInterval = intIntervals[0]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Making this a reference means that when you do formerInterval = latterInterval below, it writes to the first element in intIntervals, which is a little odd. Intervals should be small enough that copying them is cheap, so let's just make this a non-reference. Alternatively, to avoid copying intervals, you could make this an index.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
src/support/intervals.cpp
Outdated
| firstInterval = nextInterval; | ||
| } else { | ||
| result.push_back(firstInterval.second); | ||
| if (latterInterval.first.weight > formerInterval.first.weight) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the weights are equal, perhaps you can choose to keep the interval with the nearest end to reduce its potential to overlap with subsequent intervals.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
src/support/intervals.h
Outdated
| // back to the original input vector, of non-overlapping indices, ie, the | ||
| // intervals that overlap have already been removed. | ||
| // Given a vector of Interval, returns a vector of the indices that, mapping | ||
| // back to the original input vector, do not overlap with each other, ie: the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| // back to the original input vector, do not overlap with each other, ie: the | |
| // back to the original input vector, do not overlap with each other, i.e. the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't look like it!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
batched suggestion
test/gtest/intervals.cpp
Outdated
| std::vector<Interval> intervals; | ||
| intervals.emplace_back(Interval{0, 4, 2}); | ||
| intervals.emplace_back(Interval{4, 8, 2}); | ||
| ASSERT_EQ(IntervalProcessor::filterOverlaps(intervals).size(), 2u); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be better to test the precise results rather than just the size of the results. You can still do it with a single ASSERT_EQ:
std::vector<int> expected{0, 1};
ASSERT_EQ(IntervalProcessor::filterOverlaps(intervals), expected);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
src/support/intervals.h
Outdated
|
|
||
| struct IntervalProcessor { | ||
| // Given a vector of Interval, returns a vector of the indices that, mapping | ||
| // back to the original input vector, do not overlap with each other, ie: the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
adjust punctuation around ie to , i.e.,
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
src/passes/hash-stringify-walker.cpp
Outdated
|
|
||
| // Construct intervals | ||
| for (Index i = 0; i < substrings.size(); i++) { | ||
| auto substring = substrings[i]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To avoid copying the list of start indices and whatever else is in there:
| auto substring = substrings[i]; | |
| auto& substring = substrings[i]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
batched suggestion
src/passes/hash-stringify-walker.cpp
Outdated
| result.push_back(SuffixTree::RepeatedSubstring( | ||
| {substrings[i].Length, std::move(startIndices[i])})); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this be an emplace_back to really be sure we're not copying the vector of start indices?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
src/passes/hash-stringify-walker.cpp
Outdated
| auto interval = | ||
| Interval(startIdx, startIdx + substring.Length, substring.Length); | ||
| intervals.push_back(interval); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could be slightly shorter using emplace_back.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
src/support/intervals.cpp
Outdated
| #include <iostream> | ||
| #include <optional> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like these are stale.
| #include <iostream> | |
| #include <optional> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
batched suggestion
src/support/intervals.cpp
Outdated
| // former overlaps with candidate | ||
| // replace former if the weights are the same but the candidate ends earlier |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| // former overlaps with candidate | |
| // replace former if the weights are the same but the candidate ends earlier | |
| // Former overlaps with candidate. Replace former if the weights are the same but the candidate ends earlier. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
batched suggestion
src/support/intervals.h
Outdated
| // back to the original input vector, of non-overlapping indices, ie, the | ||
| // intervals that overlap have already been removed. | ||
| // Given a vector of Interval, returns a vector of the indices that, mapping | ||
| // back to the original input vector, do not overlap with each other, ie: the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't look like it!
test/gtest/intervals.cpp
Outdated
| ASSERT_EQ(IntervalProcessor::filterOverlaps(intervals), expected); | ||
| } | ||
|
|
||
| TEST(IntervalsTest, TestOverlapFoundA) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be best to give these more descriptive names like TestOverlapInnerNested, TestOverlapOuterNested, TestOverlapAscendingSequence, TestOverlapDescendingSequence, etc. Ideally the role each interval plays in a test should be clear from the name of the test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
| ;; CHECK-NEXT: (drop | ||
| ;; CHECK-NEXT: (i32.const 1) | ||
| ;; CHECK-NEXT: ) | ||
| ;; CHECK-NEXT: (drop | ||
| ;; CHECK-NEXT: (i32.const 2) | ||
| ;; CHECK-NEXT: ) | ||
| ;; CHECK-NEXT: ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this substring isn't outlined at all, it's not clear to me that the test is behaving as expected. If we add more repetitions of this shorter substring, will it be outlined?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
Co-authored-by: Thomas Lively <[email protected]>
tlively
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM with these last comments addressed.
| result.emplace_back(SuffixTree::RepeatedSubstring( | ||
| {substrings[i].Length, std::move(startIndices[i])})); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| result.emplace_back(SuffixTree::RepeatedSubstring( | |
| {substrings[i].Length, std::move(startIndices[i])})); | |
| result.emplace_back(substrings[i].Length, std::move(startIndices[i])); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When I remove SuffixTree::RepeatedSubstring, I get the following error:
FAILED: src/passes/CMakeFiles/passes.dir/hash-stringify-walker.cpp.o /usr/bin/c++ -I/src -I/third_party/FP16/include -I/third_party/llvm-project/include -I -DBUILD_LLVM_DWARF -Wall -Werror -Wextra -Wno-unused-parameter -Wno-dangling-pointer -fno-omit-frame-pointer -fno-rtti -Wno-implicit-int-float-conversion -Wno-unknown-warning-option -Wswitch -Wimplicit-fallthrough -Wnon-virtual-dtor -fPIC -fdiagnostics-color=always -g -g3 -std=c++17 -MD -MT src/passes/CMakeFiles/passes.dir/hash-stringify-walker.cpp.o -MF src/passes/CMakeFiles/passes.dir/hash-stringify-walker.cpp.o.d -o src/passes/CMakeFiles/passes.dir/hash-stringify-walker.cpp.o -c /src/passes/hash-stringify-walker.cpp /src/passes/hash-stringify-walker.cpp: In static member function ‘static std::vector<wasm::SuffixTree::RepeatedSubstring> wasm::StringifyProcessor::filterOverlaps(const std::vector<wasm::SuffixTree::RepeatedSubstring>&)’: /src/passes/hash-stringify-walker.cpp:184:26: error: no matching function for call to ‘std::vector<wasm::SuffixTree::RepeatedSubstring>::emplace_back(<brace-enclosed initializer list>)’ 184 | result.emplace_back( | ~~~~~~~~~~~~~~~~~~~^ 185 | {substrings[i].Length, std::move(startIndices[i])}); | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ In file included from /usr/include/c++/14/vector:72, from /src/wasm.h:34, from /src/ir/boolean.h:20, from /src/ir/bits.h:20, from /src/ir/properties.h:20, from /src/ir/iteration.h:20, from /src/passes/stringify-walker.h:20, from /src/passes/hash-stringify-walker.cpp:17: /usr/include/c++/14/bits/vector.tcc:111:7: note: candidate: ‘std::vector<_Tp, _Alloc>::reference std::vector<_Tp, _Alloc>::emplace_back(_Args&& ...) [with _Args = {}; _Tp = wasm::SuffixTree::RepeatedSubstring; _Alloc = std::allocator<wasm::SuffixTree::RepeatedSubstring>; reference = wasm::SuffixTree::RepeatedSubstring&]’ 111 | vector<_Tp, _Alloc>:: | ^~~~~~~~~~~~~~~~~~~ /usr/include/c++/14/bits/vector.tcc:111:7: note: candidate expects 0 arguments, 1 provided At global scope: cc1plus: note: unrecognized command-line option ‘-Wno-unknown-warning-option’ may have been intended to silence earlier diagnostics cc1plus: note: unrecognized command-line option ‘-Wno-implicit-int-float-conversion’ may have been intended to silence earlier diagnostics [3/18] Building CXX object src/passes/CMakeFiles/passes.dir/Outlining.cpp.o ninja: build stopped: subcommand failed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no constructor provided on SuffixTree, so we'll keep as-is
test/gtest/intervals.cpp
Outdated
| intervals.emplace_back(Interval{12, 18, 3}); | ||
| intervals.emplace_back(Interval{14, 21, 5}); | ||
| intervals.emplace_back(Interval{23, 28, 5}); | ||
| std::vector<int> expected{4, 3, 7}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Am I reading correctly that this is keeping {6, 20, 15} at index 4 followed by {4, 11, 1} at index 3? That doesn't seem right.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed this expected vector was wrong and the test was failing. it is keeping {0, 4, 7}.
test/gtest/intervals.cpp
Outdated
|
|
||
| TEST(IntervalsTest, TestOne) { | ||
| std::vector<Interval> intervals; | ||
| intervals.emplace_back(Interval{0, 2, 5}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we're using emplace_back, we can just pass the arguments to the constructor without having to construct and then move a temporary object.
| intervals.emplace_back(Interval{0, 2, 5}); | |
| intervals.emplace_back(0, 2, 5); |
test/gtest/intervals.cpp
Outdated
| intervals.emplace_back(Interval{2, 4, 2}); | ||
| intervals.emplace_back(Interval{3, 6, 5}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By the name of the test, I would have expected to see an interval nested fully within another interval.
tlively
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🎉 🎉 🎉
While determining whether repeat sequences of instructions are candidates for outlining, remove sequences that overlap, giving weight to sequences that are longer and appear more frequently.