FusionLoops based on MZ's LoopNest #233
Open
lly-zero-one wants to merge 290 commits into bertmaher:pytorch_fusion from lly-zero-one:loop_nest
Conversation
Enable Werror
…pper accessors to make the code more explicit. (pytorch#186)
* Remove wrapper function accessors from TensorNode: instead access function_'s members directly through function().
* Remove TensorNode class.
* Remove TensorOperationNode class.
* formatted guard elimination
* initial impl of symbolic shapes
…ytorch#191)
* Remove BaseStmtNode class.
* Use `const BaseExprNode*` instead of Expr in classes from ir.h.
* Rename Expr->ExprHandler, Var->VarHandler, BaseExprNode->Expr, Variable->Var.
* Fixup CUDA build.
* Rename {Expr,Var}Handler to {Expr,Var}Handle.
* Fixup after rebase.
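For readers following along, here is a minimal standalone sketch of the handle-over-node split this commit lands. The names match the commit message, but the class bodies are illustrative, not the actual ir.h code:

```cpp
// The immutable IR node is now `Expr` (was BaseExprNode); user-facing code
// passes around a thin value-semantics wrapper, `ExprHandle` (briefly named
// ExprHandler mid-commit). Classes in ir.h store the raw node pointer.
struct Expr {
  virtual ~Expr() = default;
};

struct Var : Expr {};  // was Variable

class ExprHandle {     // was Expr, then ExprHandler, finally ExprHandle
 public:
  explicit ExprHandle(const Expr* node) : node_(node) {}
  const Expr* node() const { return node_; }
 private:
  const Expr* node_;   // `const BaseExprNode*` in the old naming
};

class VarHandle : public ExprHandle {  // was Var, then VarHandler
 public:
  explicit VarHandle(const Var* node) : ExprHandle(node) {}
};

int main() {
  Var v;
  VarHandle h(&v);  // user code works with handles, not raw nodes
  return h.node() == &v ? 0 : 1;
}
```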
…tBinaryOp. (pytorch#192)
* Backport a clang-tidy fix: replace BINARY_ACCEPT with IRPrinter::visitBinaryOp.
* Make visitBinaryOp a local function rather than a method of IRPrinter.
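As a rough illustration of the shape of that fix (illustrative names, not the real IRPrinter code), replacing a per-op macro with one file-local template looks like this:

```cpp
#include <iostream>

// Stand-in binary node; the real IR has Add, Sub, Mul, ... visited the same way.
struct Add {
  int lhs;
  int rhs;
};

// One file-local template replaces the BINARY_ACCEPT macro stamped into every
// visit method; it is a free function, not a method of the printer class.
template <typename Op>
static void visitBinaryOp(std::ostream& os, const Op& op, const char* symbol) {
  os << "(" << op.lhs << " " << symbol << " " << op.rhs << ")";
}

int main() {
  visitBinaryOp(std::cout, Add{1, 2}, "+");  // prints: (1 + 2)
  std::cout << "\n";
}
```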
…ts. (pytorch#194) All `test_*` functions are now moved into a test-class (with no changes to them).
* Add rand benchmark.
* Add an option to disable texpr fuser.
…remainder (pytorch#198)
* Add the cast_float, backward ops and also fix the remainder
* fix the conflict
* change expr to exprhandle
* formatting
* fix the linter
* Fix some IR printer bugs
* also true_stmt
…rch#142)
* Enable axis splitting and GPU grid binding with variable shapes (sketched below)
* Farewell ExprStmt, we hardly knew ye
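A standalone C++ sketch of what splitting plus grid binding means when the extent n is only known at runtime: unlike the static SplitWithTail demo later in this PR, the tail cannot be peeled at compile time, so the split is rounded up and a guard masks out-of-range iterations. The loop variables stand in for blockIdx.x and threadIdx.x; all names here are illustrative:

```cpp
#include <cstdio>
#include <vector>

// Split the i axis by blockSize and bind the two new axes to the GPU grid;
// `block` and `thread` stand in for blockIdx.x and threadIdx.x.
void add_one(std::vector<float>& out, const std::vector<float>& in, int n) {
  const int blockSize = 64;
  for (int block = 0; block < (n + blockSize - 1) / blockSize; block++) {
    for (int thread = 0; thread < blockSize; thread++) {
      int i = block * blockSize + thread;
      if (i < n) {  // mask: n is variable, so the ragged last block is guarded
        out[i] = in[i] + 1.0f;
      }
    }
  }
}

int main() {
  int n = 100;  // variable shape, only known at runtime
  std::vector<float> in(n, 1.0f), out(n);
  add_one(out, in, n);
  std::printf("%g\n", out[n - 1]);  // prints 2
}
```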
* Add workflow.md.
* Remove the suggestions from the doc.
* Add language reference.
* Address some of the comments.
* Add bitwise integer ops: &, ^, <<, >>
* Add the cast_float, backward ops and also fix the remainder
  - fix the conflict
  - change expr to exprhandle
  - formatting
  - fix the linter
  - add the type_as support
* fix the threshold failure
* ATen op: where. This requires a helper function that promotes types for the condition expression.
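The promotion idea, sketched standalone (this mirrors ordinary C++ arithmetic promotion and is not the actual PR helper): both operands of the condition are first converted to a common type, and only then compared to select a branch.

```cpp
#include <cstdio>
#include <type_traits>

// Promote both condition operands to a common type before comparing, then
// select between the two branch values; a scalar stand-in for aten::where.
template <typename A, typename B, typename T>
T where_lt(A lhs, B rhs, T if_true, T if_false) {
  using C = std::common_type_t<A, B>;  // e.g. (int, float) -> float
  return static_cast<C>(lhs) < static_cast<C>(rhs) ? if_true : if_false;
}

int main() {
  // 3 (int) and 3.5f (float) are both promoted to float before the compare.
  std::printf("%d\n", where_lt(3, 3.5f, 1, 0));  // prints 1
}
```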
* LLVM codegen for fmod, remainder
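These need separate lowerings because their semantics differ for mixed signs: fmod truncates toward zero (the result takes the dividend's sign), while aten::remainder follows Python's convention (the result takes the divisor's sign). A quick standalone check of the two semantics (the helper name is mine, not the PR's):

```cpp
#include <cmath>
#include <cstdio>

// Python-style remainder: a - b * floor(a / b), so the sign follows b.
double py_remainder(double a, double b) {
  return a - b * std::floor(a / b);
}

int main() {
  std::printf("fmod(-5, 3)      = %g\n", std::fmod(-5.0, 3.0));    // -2
  std::printf("remainder(-5, 3) = %g\n", py_remainder(-5.0, 3.0)); //  1
}
```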
* fix testATengeInt
This reverts commit 5bf52fa.
The moved code wasn't changed.
…t way. LoopNest is my attempt to simplify our core abstraction. The main idea behind this change is to merge two classes: `TensorExprNode` and `For` (derived from `Stmt`). Currently they represent basically the same thing, but in slightly different ways: `TensorExprNode` attaches some metadata and provides a different way of traversing through siblings/parents/children, while `For` represents the same structure without any metadata. Once a kernel is lowered to `For` statements, they are immediately consumed by a codegen, which lowers them to LLVM IR or prints them as a CUDA string.

This PR adds some functionality to `For` statements (and to other types of statements as well) and implements `SplitWithTail` and `ComputeInline` using only those. The implementation is just a proof of concept: it doesn't cover all corner cases, but they are trivial to add.

As a demo, I added a test where we create a simple tensor expression, split one of its axes, and then lower it to a Stmt. The demo shows that we're producing exactly the same result. For reference, below is the output of the test (Root stmt is produced by the new implementation, Ref stmt by the existing one):

```
[ RUN ] TensorExprTest.LoopNest_LLVM
Root stmt:
for (int n = 0; n < N; n++) {
  for (int i = 0; i < 1024; i++) {
    for (int j_outer = 0; j_outer < ((256 - 0) / 17); j_outer++) {
      for (int j_inner = 0; j_inner < 17; j_inner++) {
        g[(((n * (1024 * 256)) + (i * 256)) + (((j_outer * 17) + j_inner) * 1))] = (((A[(((n * ((1 * 256) * 1024)) + (i * (1 * 256))) + ((j_outer * 17) + j_inner))] + B[(((n * ((1 * 256) * 1024)) + (i * (1 * 256))) + ((j_outer * 17) + j_inner))]) + C[(((n * ((1 * 256) * 1024)) + (i * (1 * 256))) + ((j_outer * 17) + j_inner))]) + D[(((n * ((1 * 256) * 1024)) + (i * (1 * 256))) + ((j_outer * 17) + j_inner))]);
      }
    }
    for (int j_tail = 0; j_tail < ((256 - 0) % 17); j_tail++) {
      g[(((n * (1024 * 256)) + (i * 256)) + ((j_tail + (((256 - 0) / 17) * 17)) * 1))] = (((A[(((n * ((1 * 256) * 1024)) + (i * (1 * 256))) + (j_tail + (((256 - 0) / 17) * 17)))] + B[(((n * ((1 * 256) * 1024)) + (i * (1 * 256))) + (j_tail + (((256 - 0) / 17) * 17)))]) + C[(((n * ((1 * 256) * 1024)) + (i * (1 * 256))) + (j_tail + (((256 - 0) / 17) * 17)))]) + D[(((n * ((1 * 256) * 1024)) + (i * (1 * 256))) + (j_tail + (((256 - 0) / 17) * 17)))]);
    }
  }
}
Ref stmt:
for (int n = 0; n < N; n++) {
  for (int i = 0; i < 1024; i++) {
    for (int j_outer = 0; j_outer < ((256 - 0) / 17); j_outer++) {
      for (int j_inner = 0; j_inner < 17; j_inner++) {
        g[(((n * (1024 * 256)) + (i * 256)) + (((j_outer * 17) + j_inner) * 1))] = (((A[(((n * ((1 * 256) * 1024)) + (i * (1 * 256))) + ((j_outer * 17) + j_inner))] + B[(((n * ((1 * 256) * 1024)) + (i * (1 * 256))) + ((j_outer * 17) + j_inner))]) + C[(((n * ((1 * 256) * 1024)) + (i * (1 * 256))) + ((j_outer * 17) + j_inner))]) + D[(((n * ((1 * 256) * 1024)) + (i * (1 * 256))) + ((j_outer * 17) + j_inner))]);
      }
    }
    for (int j_tail = 0; j_tail < ((256 - 0) % 17); j_tail++) {
      g[(((n * (1024 * 256)) + (i * 256)) + ((j_tail + (((256 - 0) / 17) * 17)) * 1))] = (((A[(((n * ((1 * 256) * 1024)) + (i * (1 * 256))) + (j_tail + (((256 - 0) / 17) * 17)))] + B[(((n * ((1 * 256) * 1024)) + (i * (1 * 256))) + (j_tail + (((256 - 0) / 17) * 17)))]) + C[(((n * ((1 * 256) * 1024)) + (i * (1 * 256))) + (j_tail + (((256 - 0) / 17) * 17)))]) + D[(((n * ((1 * 256) * 1024)) + (i * (1 * 256))) + (j_tail + (((256 - 0) / 17) * 17)))]);
    }
  }
}
[ OK ] TensorExprTest.LoopNest_LLVM (3 ms)
```
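Since the LoopNest API here is explicitly a proof of concept, the sketch below demonstrates the transformation itself rather than the API: plain C++ loops computing the same values through the original schedule and through the split-with-tail schedule shown in the demo (extent 256, factor 17, matching the output above).

```cpp
#include <cassert>
#include <vector>

int main() {
  const int M = 256;      // extent of the axis being split
  const int factor = 17;  // split factor, as in the demo
  std::vector<int> a(M), g_ref(M), g_split(M);
  for (int j = 0; j < M; j++) a[j] = j * 3;

  // Original schedule: a single axis j over [0, M).
  for (int j = 0; j < M; j++) g_ref[j] = a[j] + 1;

  // After SplitWithTail(j, 17): a (j_outer, j_inner) nest covering the
  // divisible prefix, plus a j_tail loop for the remaining M % 17 iterations.
  for (int j_outer = 0; j_outer < M / factor; j_outer++) {
    for (int j_inner = 0; j_inner < factor; j_inner++) {
      int j = j_outer * factor + j_inner;
      g_split[j] = a[j] + 1;
    }
  }
  for (int j_tail = 0; j_tail < M % factor; j_tail++) {
    int j = j_tail + (M / factor) * factor;
    g_split[j] = a[j] + 1;
  }

  // The two schedules compute exactly the same values, which is what the
  // Root stmt / Ref stmt comparison in the test demonstrates.
  for (int j = 0; j < M; j++) assert(g_ref[j] == g_split[j]);
  return 0;
}
```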
ZolotukhinM force-pushed the pytorch_fusion branch from 6628d0f to 36e8a6f on March 4, 2020 00:16