Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Martien.exhaustiveswp #242

Merged
merged 15 commits into from
Dec 12, 2024
Merged

Martien.exhaustiveswp #242

merged 15 commits into from
Dec 12, 2024

Conversation

martien-de-jong
Copy link
Collaborator

@martien-de-jong martien-de-jong commented Nov 20, 2024

Set the baseline for GEMM and Conv2D_bf16
Add a provision to try multiple strategies for a single II.
Add a critical path heuristic

For best effect on QoR, the pre-pipeliner and the prescheduler should be skipped. This is forced by

#pragma clang loop pipeliner(disable)

in the source, or globally using --enable-pipeliner=0 --enable-misched=0

QOR:
I ran a like-for like comparison with the base aie-public version The CSV reporting failed, but the result json file shows significant overall improvements (up to >10%) with only a few minor sub-percent regressions

This is the tail of the diff. Trust me the regression head is tiny in comparison

image

N.NumPushedLatest = 0;
N.LastEarliestPusher = -1;
N.LastLatestPusher = -1;
}
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These parameters represents the path that pushed Earliest of the failing node beyond its modulo cycle in the previous iteration. If we don't reset it, we can use it to provide look-ahead in a rerun.

llvm/lib/Target/AIE/AIEPostPipeliner.cpp Outdated Show resolved Hide resolved
}
Scoreboard[C] |= Slots;
MII = std::max(MII, C + 1);
Counts += getSlotCounts(MI, TII);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

; CHECK-NEXT: nop
; CHECK-NEXT: nop
; CHECK-NEXT: nop
; CHECK-NEXT: nop
Copy link
Collaborator

@andcarminati andcarminati Nov 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The original vst.conv's can be also included, because they can be used a baseline for the epilogue merging.

@martien-de-jong martien-de-jong force-pushed the martien.exhaustiveswp branch 2 times, most recently from f70cc9f to 9257966 Compare November 28, 2024 16:55
if (K < NInstr) {
N.Earliest = N.StaticEarliest;
N.Latest = N.StaticLatest;
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does that imply that nodes outside of the first iteration never get an Earliest or Latest? Should we make that explicit with an optional?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They get a default 0, -1 from reset, which are sane values, not sentinels. Earliest of the second iteration is used to represent LCD latencies, but this is propagated back to the first iteration.

Copy link
Collaborator

@gbossu gbossu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'm done with the review, let me know if you want to discuss any of the comments :)

# (c) Copyright 2024 Advanced Micro Devices, Inc. or its affiliates

# RUN: llc --mtriple=aie2 -O2 --start-before=postmisched %s \
# RUN: --debug-only=postpipeliner-summary -o - | FileCheck %s
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the purpose of --debug-only=postpipeliner-summary in this test?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

developer convenience. update_llc_tests shows it on stderr, and the last line reports the final II.

Martien de Jong added 4 commits December 11, 2024 12:03
Colors for discriminating edge types in dot graph
Include cycles neceessary for resources of ancesters/offspring
in Earliest/Latest
int NumPushedLatest = 0;

// Latest corrected by taking Earliest of an LCD successor into account
int LCDLatest = -1;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason not to have a symmetric LCDEarliest for bottom-up scheduling?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. That's an oversight.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But I guess we would not have a strategy to make use of it anyway?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct. Leaving it for now, but we should keep in mind that LcdLatest is asymmetric wrt TopDown now.

Copy link
Collaborator

@gbossu gbossu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! I think this adds a lot of flexibility to the pipeliner and I'm sure we'll do great things with it 💪

@martien-de-jong martien-de-jong enabled auto-merge (rebase) December 12, 2024 15:15
@martien-de-jong martien-de-jong merged commit fcfddf3 into aie-public Dec 12, 2024
8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants