
FusionLoops based on MZ's LoopNest #233

Open · wants to merge 290 commits into pytorch_fusion
Conversation

lly-zero-one
Collaborator

No description provided.

ZolotukhinM and others added 27 commits February 21, 2020 11:58
…pper accessors to make the code more explicit. (pytorch#186)

* Remove wrapper function accessors from TensorNode: instead access function_'s members directly through function().

* Remove TensorNode class.

* Remove TensorOperationNode class.
* formatted guard elimination

* initial impl of symbolic shapes
…ytorch#191)

* Remove BaseStmtNode class.

* Use `const BaseExprNode*` instead of Expr in classes from ir.h

* Rename Expr->ExprHandler, Var->VarHandler, BaseExprNode->Expr, Variable->Var.

* Fixup CUDA build.

* Rename {Expr,Var}Handler to {Expr,Var}Handle.

* Fixup after rebase.
…tBinaryOp. (pytorch#192)

* Backport a clang-tidy fix: replace BINARY_ACCEPT with IRPrinter::visitBinaryOp.

* Make visitBinaryOp a local function rather than a method of IRPrinter.
…ts. (pytorch#194)

All `test_*` functions are now moved into a test-class (with no changes to them).
* Add rand benchmark.
* Add an option to disable texpr fuser.
…remainder (pytorch#198)

* Add the cast_float, backward ops and also fix the remainder

* fix the conflict

* change expr to exprhandle

* formatting

* fix the linter
* Fix some ir printer bugs

* also true_stmt
…rch#142)

* Enable axis splitting and GPU grid binding with variable shapes

* farewell ExprStmt, we hardly knew ye
* Add workflow.md.

* Remove the suggestions from the doc.

* Add language reference.

* Add language reference.

* Address some of the comments.
* Adding bitwise integer ops: &,^,<<, >>
* Add the cast_float, backward ops and also fix the remainder

fix the conflict

change expr to exprhandle

formatting

fix the linter

add the type_as support

* fix the threshold failure
* Aten op: where
This requires a helper function that promotes types for the condition expression.
* LLVM codegen for fmod, remainder
* fix testATengeInt
…t way.

LoopNest is my attempt to simplify our core abstraction. The main idea
behind this change is to merge two classes: `TensorExprNode` and `For`
(derived from `Stmt`). Currently they represent basically the same
thing, but in a slightly different way. `TensorExprNode` attaches some
metadata and provides a different way for traversing through
siblings/parents/children. `For` represents the same structure, but
without any metadata. Once a kernel is lowered to `For` statements, they
are immediately consumed by a codegen, which lowers them to LLVM IR or
prints them as a CUDA string.

This PR adds some functionality to `For` statements (and to other types
of statements as well) and implements `SplitWithTail` and
`ComputeInline` using only those.  The implementation is just a proof of
concept: it doesn't cover all corner cases, but they are trivial to add.
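To make the `SplitWithTail` transformation concrete, here is a minimal Python sketch of its semantics (the function name and signature are illustrative, not the actual C++ API from this PR): splitting a loop of extent `n` by `factor` produces a main nest of `n // factor` outer by `factor` inner iterations, plus a tail loop for the remaining `n % factor` iterations.

```python
def split_with_tail(n, factor):
    """Return the iteration order produced by splitting range(n) by factor.

    Hypothetical sketch of the SplitWithTail semantics, not the real API.
    """
    order = []
    # Main nest: the original index is recovered as j_outer * factor + j_inner.
    for j_outer in range(n // factor):
        for j_inner in range(factor):
            order.append(j_outer * factor + j_inner)
    # Tail loop: the n % factor leftover iterations, offset past the main nest.
    for j_tail in range(n % factor):
        order.append(j_tail + (n // factor) * factor)
    return order

# The split visits exactly the same indices, in the same order,
# as the original unsplit loop.
assert split_with_tail(256, 17) == list(range(256))
```

The key invariant is that the transformation preserves the iteration space: no index is dropped or duplicated, which is what the test in this PR checks by comparing the generated statements.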

As a demo, I added a test where we create a simple tensor expression,
split one of its axes, and then lower it to a Stmt. The demo shows
that we produce exactly the same result.

For reference, below is the output of the test (Root stmt is produced
by the new implementation, Ref stmt by the existing one):
```
[ RUN      ] TensorExprTest.LoopNest_LLVM
Root stmt:
for (int n = 0; n < N; n++) {
  for (int i = 0; i < 1024; i++) {
    for (int j_outer = 0; j_outer < ((256 - 0) / 17); j_outer++) {
      for (int j_inner = 0; j_inner < 17; j_inner++) {
        g[(((n * (1024 * 256)) + (i * 256)) + (((j_outer * 17) + j_inner) * 1))] = (((A[(((n * ((1 * 256) * 1024)) + (i * (1 * 256))) + ((j_outer * 17) + j_inner))] + B[(((n * ((1 * 256) * 1024)) + (i * (1 * 256))) + ((j_outer * 17) + j_inner))]) + C[(((n * ((1 * 256) * 1024)) + (i * (1 * 256))) + ((j_outer * 17) + j_inner))]) + D[(((n * ((1 * 256) * 1024)) + (i * (1 * 256))) + ((j_outer * 17) + j_inner))]);
      }
    }
    for (int j_tail = 0; j_tail < ((256 - 0) % 17); j_tail++) {
      g[(((n * (1024 * 256)) + (i * 256)) + ((j_tail + (((256 - 0) / 17) * 17)) * 1))] = (((A[(((n * ((1 * 256) * 1024)) + (i * (1 * 256))) + (j_tail + (((256 - 0) / 17) * 17)))] + B[(((n * ((1 * 256) * 1024)) + (i * (1 * 256))) + (j_tail + (((256 - 0) / 17) * 17)))]) + C[(((n * ((1 * 256) * 1024)) + (i * (1 * 256))) + (j_tail + (((256 - 0) / 17) * 17)))]) + D[(((n * ((1 * 256) * 1024)) + (i * (1 * 256))) + (j_tail + (((256 - 0) / 17) * 17)))]);
    }
  }
}

Ref stmt:
for (int n = 0; n < N; n++) {
  for (int i = 0; i < 1024; i++) {
    for (int j_outer = 0; j_outer < ((256 - 0) / 17); j_outer++) {
      for (int j_inner = 0; j_inner < 17; j_inner++) {
        g[(((n * (1024 * 256)) + (i * 256)) + (((j_outer * 17) + j_inner) * 1))] = (((A[(((n * ((1 * 256) * 1024)) + (i * (1 * 256))) + ((j_outer * 17) + j_inner))] + B[(((n * ((1 * 256) * 1024)) + (i * (1 * 256))) + ((j_outer * 17) + j_inner))]) + C[(((n * ((1 * 256) * 1024)) + (i * (1 * 256))) + ((j_outer * 17) + j_inner))]) + D[(((n * ((1 * 256) * 1024)) + (i * (1 * 256))) + ((j_outer * 17) + j_inner))]);
      }
    }
    for (int j_tail = 0; j_tail < ((256 - 0) % 17); j_tail++) {
      g[(((n * (1024 * 256)) + (i * 256)) + ((j_tail + (((256 - 0) / 17) * 17)) * 1))] = (((A[(((n * ((1 * 256) * 1024)) + (i * (1 * 256))) + (j_tail + (((256 - 0) / 17) * 17)))] + B[(((n * ((1 * 256) * 1024)) + (i * (1 * 256))) + (j_tail + (((256 - 0) / 17) * 17)))]) + C[(((n * ((1 * 256) * 1024)) + (i * (1 * 256))) + (j_tail + (((256 - 0) / 17) * 17)))]) + D[(((n * ((1 * 256) * 1024)) + (i * (1 * 256))) + (j_tail + (((256 - 0) / 17) * 17)))]);
    }
  }
}
[       OK ] TensorExprTest.LoopNest_LLVM (3 ms)
```
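The loop bounds in the printed statements above follow directly from the split arithmetic: the `j` loop of extent 256 was split by a factor of 17, giving `((256 - 0) / 17)` outer iterations and `((256 - 0) % 17)` tail iterations. A quick check of those numbers:

```python
# Checking the bounds that appear in the printed statements above:
# the j loop of extent 256 was split by factor 17.
extent, factor = 256, 17

outer_trips = (extent - 0) // factor   # ((256 - 0) / 17) in the output
tail_trips = (extent - 0) % factor     # ((256 - 0) % 17) in the output

# The main nest plus the tail together cover the original extent exactly.
assert outer_trips * factor + tail_trips == extent
```

So the main nest runs 15 x 17 = 255 iterations and the tail loop handles the single remaining one, which is why the Root and Ref statements compute identical indices.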
@lly-zero-one lly-zero-one requested review from zheng-xq and removed request for zheng-xq March 2, 2020 06:48
8 participants