FusionLoops based on MZ's LoopNest #233

Status: Open — wants to merge 290 commits into base branch pytorch_fusion.

Commits (290)
5bde90f
fix the schedule_test
zheng-xq Jan 8, 2020
0a006eb
fixed lowering
zheng-xq Jan 9, 2020
d608a9d
LLVM code generation for simple loops
bertmaher Dec 19, 2019
ab35cdd
bugfixes
bertmaher Jan 10, 2020
5b494cd
refcount fixing self-assignment
zheng-xq Jan 10, 2020
b5f6794
Make LOG(FATAL) nonreturn
zheng-xq Jan 11, 2020
0560717
Adding statement conversion for SplitWithTail
zheng-xq Jan 12, 2020
237fb35
Add reference tests for Split
zheng-xq Jan 13, 2020
a436d64
clang-format
bertmaher Jan 13, 2020
a1e1f28
A functional reference check for schedule tests.
zheng-xq Jan 13, 2020
d603a2e
clang-format
zheng-xq Jan 13, 2020
fc296fc
Add support for Float immediates.
Jan 13, 2020
4326190
Get absolute path for ASMJIT_DIR (#24)
bertmaher Jan 13, 2020
a11afd8
Silence deprecation warnings from LLVM
bertmaher Jan 13, 2020
41e8cc3
Include legacy PassManager for debug printing
bertmaher Jan 13, 2020
df63162
Set code model to medium to avoid indirect jumps in generated asm
bertmaher Jan 13, 2020
37b606d
Fix argument type of input float buffers
bertmaher Jan 13, 2020
6e74daf
Add support for Casts in LLVM codegen.
Jan 13, 2020
63cc9bc
Add a complete tensor+lower+llvm test
zheng-xq Jan 14, 2020
62a7507
Enable the failing test
zheng-xq Jan 14, 2020
1808fe9
Enable export of compile_commands.json.
Jan 13, 2020
948fc60
Floating point arithmetic
bertmaher Jan 14, 2020
be0beb5
Test fp32 mul using compute expr
bertmaher Jan 14, 2020
f51d3f0
Broadcast add test using compute expr
bertmaher Jan 14, 2020
e1ddac5
Update to LLVM 9
Jan 14, 2020
41b6c57
Implementation of Broadcast for LLVM.
Jan 14, 2020
f3984e7
Add Buffer operator() overload, and some other minor features
zheng-xq Jan 14, 2020
2ccc9c9
Cleanup use of ConstantInt API.
Jan 14, 2020
228d060
fix accidental experimental changes
zheng-xq Jan 14, 2020
861797d
Change the Compute interface to bring the dim sizes and names together
zheng-xq Jan 15, 2020
0a40ee2
clang-format
zheng-xq Jan 15, 2020
86fdd8c
refactor Buffer into its own files
zheng-xq Jan 15, 2020
30dd262
Add support for vector casts in LLVM CodeGen
Jan 15, 2020
fb36c54
Implement masked loads and stores.
Jan 15, 2020
e909220
Implement vector masked loads and stores.
Jan 15, 2020
395ea95
Add a PaddedBuffer test util
zheng-xq Jan 15, 2020
b8adc89
Improve the user interface for SimpleIREvaluator
zheng-xq Jan 16, 2020
3372a79
Add a test for Block codegen.
Jan 16, 2020
b44ad1c
Fix gtest include path
Jan 16, 2020
b42a518
clang-format
bertmaher Jan 16, 2020
62159ee
Add expressions and support for Max and Min. (#5)
resistor Jan 16, 2020
99c43ca
Rename compiler to tensorexpr and move files around to be more simila…
ZolotukhinM Jan 17, 2020
9ce3294
Add missing include <math.h> (#7)
ZolotukhinM Jan 17, 2020
987188b
Change isnan to std::isnan. It breaks my clang builds. (#8)
zheng-xq Jan 17, 2020
efe7a9f
Change the SimpleIREvaluator frontend (#9)
zheng-xq Jan 17, 2020
2170754
Make LLVM dependency optional. (#10)
resistor Jan 17, 2020
aef8ae1
[wip] Basic fuser pass to select texpr subgraphs
bertmaher Jan 17, 2020
3e68f32
Revert "[wip] Basic fuser pass to select texpr subgraphs"
bertmaher Jan 17, 2020
c6a45bb
Revert changes to the main pytorch CMakeLists.txt (for now).
Jan 17, 2020
a556fae
Add a test for aten::_cast_Float lowering. (#12)
resistor Jan 17, 2020
2502613
Hook tensorexp up to the main build, and switch to c10 logging
bertmaher Jan 17, 2020
0638d66
More ATen op tests. (#16)
resistor Jan 17, 2020
6cd5feb
Fix some missing returns
bertmaher Jan 17, 2020
bb8c52d
Include tests back to the 'all' target. (#14)
ZolotukhinM Jan 17, 2020
86ada3e
Even more ATen op tests. (#18)
resistor Jan 17, 2020
1333dd4
Test for relu ATen op. (#19)
resistor Jan 17, 2020
c7b599f
Add intrinsics function support. (#20)
zheng-xq Jan 18, 2020
8dfbc31
Remove fmax/fmin, as they are already covered by the Max/Min operator…
zheng-xq Jan 18, 2020
1003c71
refactor CallNode and BaseCallNode, so we can have a common concrete …
zheng-xq Jan 19, 2020
ab5cdcf
Add FunctionCall to use existing tensors (#23)
zheng-xq Jan 20, 2020
826b35c
Add the ability to use an existing tensor expression in other compute…
zheng-xq Jan 20, 2020
a71e307
fixing broken compilation on mac/clang
Jan 21, 2020
cb48cf5
adding IRNode for Compare-Select Ops and their LLVM Codegen
Jan 21, 2020
d7a6866
Fix Werror. (#26)
resistor Jan 21, 2020
be1ff18
Add tests for some transcendental ops. (#27)
resistor Jan 21, 2020
4888b68
Add Allocate and Free support. (#29)
zheng-xq Jan 22, 2020
f382220
Tensor expr fuser pass for extremely simple expressions
bertmaher Jan 21, 2020
16ec17b
Make fusion work for arbitrary buffer/tensor combinations of inputs (…
bertmaher Jan 22, 2020
311d874
fix Let02 test
Jan 22, 2020
9eb48f4
Access inputs and intermediates uniformly through Tensors (#31)
bertmaher Jan 22, 2020
356b7b8
adding LLVM Codegen for Let
Jan 22, 2020
f5dc3a6
Adding ComputeInline support. (#35)
zheng-xq Jan 23, 2020
1b70b54
Fix broken tests (#36)
zheng-xq Jan 23, 2020
2dbc14e
Make tx fuser work with arbitrary ranks
bertmaher Jan 22, 2020
52e9365
[fuser] Broadcast args
bertmaher Jan 22, 2020
3cea72a
Improve naming of arg broadcasting function
bertmaher Jan 23, 2020
4b0effc
modifying CMakeLists.txt to enable ninja test && minor update for LLV…
Jan 23, 2020
6161cef
Test cases for tensorexpr fusion (#37)
bertmaher Jan 23, 2020
4981c71
CompareSelect Op: Addressing XQ and Owen's comments
Jan 23, 2020
a74de1a
Sketch sufficient support for constants to get constant alpha working…
resistor Jan 24, 2020
0df8278
Fix indices when inlining non-leaf calls (#39)
bertmaher Jan 24, 2020
00bc846
Fixing the inline ordering issue (#43)
zheng-xq Jan 24, 2020
f6d385d
Avoid creating redundant and/or improperly ordered Constant's in fuse…
resistor Jan 24, 2020
f4aff3f
Move fuser-styled tests to schedule_test (#44)
bertmaher Jan 24, 2020
7d17a1f
Add aten::sub to the new fuser. (#46)
resistor Jan 24, 2020
17eea4e
Refactor CodeGen from SimpleIREval (#47)
zheng-xq Jan 24, 2020
b00ff10
Inline all the things (#45)
bertmaher Jan 24, 2020
1799490
clang-format for atent_test.cpp
Jan 24, 2020
f7b7ea9
Eliminate a ton of warnings for my own sanity. (#48)
resistor Jan 24, 2020
4aea9fa
Add support for type promotion/demotion. (#50)
resistor Jan 24, 2020
f66634e
Flesh out new fuser coverage to several more ops. (#51)
resistor Jan 24, 2020
b60c8db
Adding the first basic CudaCodeGen. (#52)
zheng-xq Jan 25, 2020
9982d9f
aten tests for eq, ge, gt, le, lt
Jan 24, 2020
2df8972
support for aten ops: eq
Jan 25, 2020
158e44f
support for more aten ops: ge, gt, le, lt, ne
Jan 25, 2020
462abfd
Minimal CMake change to link LLVM to libtorch
bertmaher Jan 25, 2020
9079255
Fix issues causing assertion failures in llvm debug builds
bertmaher Jan 25, 2020
f59cd84
Fatal on unimplemented llvm codegen ops (Allocate, etc.)
bertmaher Jan 25, 2020
ebc0404
Optionally compile tx fuser kernels with llvm
bertmaher Jan 25, 2020
f6f2e8b
Test for 2D broadcasted with large dims to show vectorization
bertmaher Jan 25, 2020
a88e155
Updated isSupported for increased op coverage. (#54)
resistor Jan 27, 2020
4932592
Refactor LLVMCodeGen to compile kernel in constructor
bertmaher Jan 27, 2020
b11592b
Cmake integration to PT codebase (#28)
ZolotukhinM Jan 27, 2020
118a51c
Remove old padded_buffer.{cpp,h}. (#56)
ZolotukhinM Jan 27, 2020
6daaaaa
Add support for code generation of Log10 intrinsics with LLVM. (#57)
resistor Jan 28, 2020
16536bd
Remove tests/test_utils.h: inline what's still used and nuke what's u…
ZolotukhinM Jan 28, 2020
137b33a
Move Fuser tests (tests/tests.py) to test/test_tensorexpr.py. (#59)
ZolotukhinM Jan 28, 2020
d42f726
Remove old CMakeLists and README.txt
Jan 28, 2020
6b5acd9
Add support for vectorized and unmasked loads and stores with LLVM. (…
resistor Jan 28, 2020
87d012e
Enable CodeGen-level optimizations in LLVM. (#63)
resistor Jan 28, 2020
3eecc5f
Add Bind/GPUBlock/GPUThread support. (#64)
zheng-xq Jan 28, 2020
2b303d8
Bind/run interface to CodeGen (#60)
bertmaher Jan 28, 2020
92221a8
Fix ambiguity in CreateExtractElementCall (0ull can be a Value*, I gu…
bertmaher Jan 28, 2020
6f27ad2
Allow constants as lhs/rhs args (not just alpha) (#66)
bertmaher Jan 28, 2020
fce3aa4
Use correct tensor type for fuser output (#67)
bertmaher Jan 28, 2020
b033b87
clang-format
bertmaher Jan 28, 2020
adb8d3e
Rename 'compiler' namespace to 'tensorexpr'.
Jan 28, 2020
f135f74
Include all built llvm targets (#68)
bertmaher Jan 28, 2020
0d30f8a
Switch back to linking only the native LLVM target. (#69)
resistor Jan 29, 2020
ce7a305
Virtual dtors for IRVisitor/IRMutator (#70)
bertmaher Jan 29, 2020
585727a
Add semicolon to make nvcc compile (#71)
zheng-xq Jan 29, 2020
3ed78e5
Enable NVRTC for the GPU backend. (#74)
zheng-xq Jan 30, 2020
bd56fa9
Fix non-CUDA testing. (#75)
resistor Jan 30, 2020
b54b508
Getting fused (a)Sin(h), (a)Cos(h), (a)Tan(h), abs working with the i…
protonu Jan 30, 2020
4ea0e1a
remove the leak tests, as we will get rid of refcounting (#76)
zheng-xq Jan 30, 2020
609d15d
Implement aten::min, max, and clamp (#72)
bertmaher Jan 30, 2020
61ccd91
clang-format tensorexpr/tests.h (#77)
bertmaher Jan 30, 2020
aa099b8
Refactor UniqueNameManager into its own files. (#79)
zheng-xq Jan 31, 2020
278cd37
refactor cuda_codegen (#80)
zheng-xq Jan 31, 2020
fd2439b
simplify nvrtc major, minor versions (#81)
zheng-xq Jan 31, 2020
cc15703
Allow CodeGen to take Var args (interpreter support only) (#78)
bertmaher Jan 31, 2020
77e49b3
[LLVMCodeGen] Refactor kernel constructor to be less sprawling (#82)
bertmaher Jan 31, 2020
8b480d0
(TE Interpreter)Support for floor, ceil, trunc, remainder, sqrt and i…
protonu Feb 2, 2020
785e1ae
Add Cond and Mod to SimpleIREval (#84)
zheng-xq Feb 3, 2020
0507806
[LLVMCodeGen] Support dynamic shapes by binding Var args (#86)
bertmaher Feb 3, 2020
ae8c3e2
Add SplitWithMask core support. (#87)
zheng-xq Feb 3, 2020
79c93fd
Add Cuda tests for SplitWithMask (#88)
zheng-xq Feb 3, 2020
aa33334
Disable DEBUG_PRINT (#89)
zheng-xq Feb 3, 2020
dd4f1a1
Remove some debug prints (#90)
zheng-xq Feb 3, 2020
538af26
Fix the no-CUDA build. (#92)
resistor Feb 3, 2020
a15a6b7
Add support for multiple outputs from the fused subgraph. (#91)
resistor Feb 3, 2020
f00cd2d
Remove RefCounting (#93)
zheng-xq Feb 4, 2020
4336e01
Add some comments for KernelScope. Address comments. (#94)
zheng-xq Feb 4, 2020
4d59c21
Completely remove refcount.h (#95)
zheng-xq Feb 4, 2020
04a180c
fix the fuser pass (#97)
zheng-xq Feb 4, 2020
6a76a00
Rename Kernel to KernelArena (#98)
zheng-xq Feb 4, 2020
02dc018
Add support for fusion through ConstantChunk ops. (#96)
resistor Feb 4, 2020
7a8ee00
Fix implicit noexcept deduction warning. (#99)
resistor Feb 4, 2020
b06c9dc
Make llvm tests conditional on USE_LLVM (#100)
bertmaher Feb 4, 2020
3d5600f
Refactor ComputeNode into ComputeValue, to be able to handle arbitrar…
resistor Feb 5, 2020
8ede876
Improve Stmt pretty printing from TensorExprFuser (#102)
bertmaher Feb 5, 2020
4fec4f1
Add support for IfThenElse (#103)
resistor Feb 5, 2020
f25db67
Add end-to-end support and a PyTorch fuser example on CudaCodeGen (#104)
zheng-xq Feb 5, 2020
931ece7
fix rebase errors (#105)
zheng-xq Feb 5, 2020
47177e2
fixes to build on system without LLVM and CUDA (#107)
protonu Feb 6, 2020
343b836
Add support for aten::cat to the new fuser. (#106)
resistor Feb 6, 2020
0e90cdd
Bail out of fusion if we don't have a complete tensor type (for now).…
resistor Feb 6, 2020
d5f5c29
Standardize codegen call() interface and remove bind/run (#109)
bertmaher Feb 6, 2020
0b710a1
Clean up sketchy handling of scalar args in llvm codegen (#110)
bertmaher Feb 6, 2020
db3bce0
Test 2D dynamic shapes (#112)
bertmaher Feb 6, 2020
fa6a3b7
clang-format (#113)
bertmaher Feb 6, 2020
9ea2ddb
Add LLVM codegen for a lot of transcendental ops. (#115)
resistor Feb 6, 2020
5beda95
Fix bug with binary math intrinsics. (#116)
resistor Feb 6, 2020
e5039c6
Use CUDA for 3-arg test (#117)
bertmaher Feb 6, 2020
a309cf4
Refactor CudaCodeGen into generic registration, so we can have both t…
zheng-xq Feb 6, 2020
b40025b
Add instructions on how to rebase on master.
Feb 6, 2020
bc63f99
Dynamic shape support in CUDA codegen (#120)
bertmaher Feb 6, 2020
f2bb122
Disable GPU fuser. Revive the Cuda tests (#121)
zheng-xq Feb 6, 2020
1802c71
Add ExecutionCounter to detect whether the underlying code is execute…
zheng-xq Feb 7, 2020
cfe9824
Adding GPU index flatting to support arbitrary elementwise and broadc…
zheng-xq Feb 7, 2020
a020c93
fix a bug kLog to Intrin::log (#124)
Krovatkin Feb 7, 2020
5497723
Allow scalar variables as inputs (#125)
bertmaher Feb 7, 2020
c1f0b3d
clang-format (#127)
bertmaher Feb 7, 2020
c4fc6d9
Format python tests with `black` (#128)
bertmaher Feb 7, 2020
1d654a9
Add support for fusion in nested blocks. (#129)
resistor Feb 7, 2020
5fde7e8
Teach the LLVM JIT to use dlsym to resolve symbols. (#130)
resistor Feb 7, 2020
53da506
Factor out kernel codegen from tx fusion pass (#131)
bertmaher Feb 8, 2020
1ef2107
Use standard JIT logging in TX fuser.
Feb 9, 2020
ceb1ce1
Move memory management classes (KernelArena, KernelScope, KernelScope…
ZolotukhinM Feb 9, 2020
8a980bd
(IR Interpreter) Adding more Operators: Erfc, Expm1, frac, lgamma, ne…
protonu Feb 10, 2020
e4743b2
Add erfc to llvm codegen (#134)
bertmaher Feb 10, 2020
d387084
Squash some warnings (#135)
bertmaher Feb 10, 2020
238f21e
(IR interpreter) addcmul (#137)
protonu Feb 10, 2020
1036260
Remove IRNode. CodeGen accepts only Stmt. Add ExprEval utility wrappe…
zheng-xq Feb 10, 2020
360e7a3
Add the benchmark from NNC (#141)
zheng-xq Feb 11, 2020
376e0b3
Fix verifier errors in LLVM codegen when conditional loads feed direc…
resistor Feb 12, 2020
327360d
Strength reduction peephole for pow(). (#144)
resistor Feb 12, 2020
5f7b34a
Fix incorrect pow(x, 0) case. (#145)
resistor Feb 12, 2020
56e9156
Use `const Value*` where possible (#146)
bertmaher Feb 12, 2020
5af5528
Make Broadcast work (#147)
zheng-xq Feb 12, 2020
e317774
Fixed CudaCodeGen output streams. Switch to __ldg by default (#148)
zheng-xq Feb 12, 2020
781e75a
Add ElementWise support (#150)
zheng-xq Feb 12, 2020
2b1eda8
Fix an assertion failure when merging constants into aten::cat fusion…
resistor Feb 12, 2020
fad5348
adding LLVM support ops: sigmoid, relu, neg, addcmul, reciprocal, lga…
protonu Feb 12, 2020
fcc16c2
Add more operator support and tests (#140)
lly-zero-one Feb 12, 2020
4339ce7
Fix accidental assignment in condition (#153)
bertmaher Feb 13, 2020
4885664
Add elementwise benchmarks and comparisons. (#155)
zheng-xq Feb 13, 2020
55ed6a4
Backport some of the fixes from the master PR. (#157)
ZolotukhinM Feb 13, 2020
15c4e1d
Adding broadcasting benchmarks (#158)
zheng-xq Feb 13, 2020
b689b5a
Fix the missing aten::pow support (#160)
zheng-xq Feb 13, 2020
4b3fe96
Fix the missing aten::pow support (#161)
zheng-xq Feb 13, 2020
3af53c5
Fixing the failing test (#164)
zheng-xq Feb 14, 2020
fed6761
Add NNC support for aten::slice and aten::unsqueeze. (#159)
resistor Feb 14, 2020
50c126e
Get strides working (#163)
bertmaher Feb 14, 2020
d826c0d
Change the default block size from 1024 to 512. (#165)
zheng-xq Feb 14, 2020
bb2127f
Check that dtype is float before calling std::isnan. (#167)
ZolotukhinM Feb 14, 2020
a9fc714
Check that dtype is float before calling std::isnan. (#168)
ZolotukhinM Feb 14, 2020
0adc044
Remove asmjit backend. (#169)
ZolotukhinM Feb 14, 2020
b329549
Cleanup fuser pass a little. (#170)
ZolotukhinM Feb 14, 2020
54841e8
Add the Binary/unary op and also tests (#154)
lly-zero-one Feb 18, 2020
7a05955
Adding options to set the cuda loop levels, block count and block siz…
zheng-xq Feb 19, 2020
06915c0
fixing failures in LLVM codegen for Compare Select Ops (#173)
protonu Feb 19, 2020
b7bfd90
Add LetStmt support. (#174)
zheng-xq Feb 20, 2020
0387f04
Pass Graph to TensorExprKernel constructor (#177)
bertmaher Feb 20, 2020
ea1e2ad
Broadcast based on input shapes (#178)
bertmaher Feb 20, 2020
69fc6ac
Add PrioritizeLoad to CudaCodeGen. (#179)
zheng-xq Feb 20, 2020
578a4e3
[WIP] Adding 4-Op CompareSelect (#175)
protonu Feb 20, 2020
9da75ac
Remove class FunctionNode. (#180)
ZolotukhinM Feb 21, 2020
7ed486f
Add support for pow() in the LLVM backend. (#182)
resistor Feb 21, 2020
090b7bf
Add support for None operands to aten::clamp in the TE fuser. (#181)
resistor Feb 21, 2020
e7dd481
Add guard elimination support for aten::unsqueeze. (#33371) (#184)
resistor Feb 21, 2020
5b43893
fix for test testATengeInt (#185)
protonu Feb 21, 2020
1244564
Adding Cuda Random support in TE. (#183)
zheng-xq Feb 21, 2020
8202cd7
Remove TensorNode and TensorOperationNode classes and remove some wra…
ZolotukhinM Feb 21, 2020
105c0bb
Fix the broken Cuda build (#188)
zheng-xq Feb 21, 2020
5bf52fa
initial impl of symbolic shapes (#176)
bertmaher Feb 21, 2020
f570657
Add rand_like support, and Python tests (#189)
zheng-xq Feb 23, 2020
5fba20f
Support dynamic shapes in texpr fuser (#190)
bertmaher Feb 24, 2020
a30144c
Use BaseExprNode* in IR classes directly rather than through Expr. (#…
ZolotukhinM Feb 24, 2020
66e813b
Backport a clang-tidy fix: replace BINARY_ACCEPT with IRPrinter::visi…
ZolotukhinM Feb 24, 2020
421cc32
Backport some changes from master. (#193)
ZolotukhinM Feb 24, 2020
5be45c9
Reenable the existing fuser by default and disable it only in our tes…
ZolotukhinM Feb 25, 2020
42aeac3
Add rand benchmark. (#196)
zheng-xq Feb 25, 2020
ab2dc46
Add the cast_float, sigmoid_backward, tanh_backward and also fix the …
lly-zero-one Feb 25, 2020
1a7b387
Fix some ir printer bugs (#201)
bertmaher Feb 25, 2020
f5bc58b
Enable axis splitting and GPU grid binding with variable shapes (#142)
bertmaher Feb 25, 2020
30c15ba
Add a doc about end-to-end tensor expressions workflow. (#195)
ZolotukhinM Feb 25, 2020
06119e0
Adding bitwise integer ops: &,^,<<, >> (#202)
protonu Feb 25, 2020
42a4312
Replace ExprHandle with Expr* in Function, Tensor, and Buffer. (#200)
ZolotukhinM Feb 26, 2020
ba4dfa8
Add the type_as support (#199)
lly-zero-one Feb 26, 2020
b5ea519
Aten op: where (#197)
protonu Feb 26, 2020
32b0c3d
LLVM codegen for fmod, remainder (#206)
protonu Feb 26, 2020
6cb2ad4
fix testATengeInt (#208)
protonu Feb 26, 2020
38531d7
Make functions actually support multiple outputs. (#204)
ZolotukhinM Feb 26, 2020
4f3cadf
Revert "initial impl of symbolic shapes (#176)"
bertmaher Feb 26, 2020
4225716
Move Stmt classes to a separate file. (#209)
ZolotukhinM Feb 27, 2020
1ee1ef2
Add support for more dtypes (#205)
nickgg Feb 27, 2020
af20070
Add indentation to IRPrinter's output. (#211)
ZolotukhinM Feb 27, 2020
0c73c32
[RFC] Add LoopNest class that implements Schedule's API in a differen…
Feb 27, 2020
4cc77de
Add the Fuse axis based on LoopNest
lly-zero-one Mar 2, 2020
136 changes: 136 additions & 0 deletions benchmarks/tensorexpr/benchmark.py
@@ -0,0 +1,136 @@
import argparse
import itertools
import framework
import os
import types
import tensor_engine
#import normalization
import broadcast
#import reduction
import elementwise
#import softmax
#import pooling
#import conv
#import matmul


def main():
    parser = argparse.ArgumentParser(formatter_class=argparse.RawDescriptionHelpFormatter,
        description='''Benchmark operators in specific shapes.
Works only with Python3.\n A few examples:
  * benchmark.py: runs all the default configs with all the benchmarks.
  * benchmark.py reduce: runs all the default configs with all benchmarks with a prefix 'reduce'.
  * benchmark.py layernorm_fwd_cpu_128_32_128_128: runs a particular benchmark in that config.''')
    parser.add_argument('benchmark_names', type=str, default=None, nargs='*',
                        help='name of the benchmark to run')
    parser.add_argument('--device', type=str, default='cpu,cuda',
                        help='a comma-separated list of device names')
    parser.add_argument('--mode', type=str, default='fwd,both',
                        help='a comma-separated list of running modes')
    parser.add_argument('--engine', type=str, default='pt',
                        help='the underlying tensor engine; only pt for now')
    parser.add_argument('--jit_mode', type=str, default='trace',
                        help='the jit mode to use: one of {trace, none}')
    parser.add_argument('--cuda_pointwise_loop_levels', type=int, default=None,
                        help='number of loop levels for Cuda pointwise operations: 2 or 3')
    parser.add_argument('--cuda_pointwise_block_count', type=int, default=None,
                        help='number of blocks for Cuda pointwise operations')
    parser.add_argument('--cuda_pointwise_block_size', type=int, default=None,
                        help='block size for Cuda pointwise operations')
    parser.add_argument('--cuda_fuser', type=str, default='te',
                        help='the Cuda fuser backend to use: one of {te, old, none}')

    args = parser.parse_args()

    def set_global_threads(num_threads):
        os.environ['OMP_NUM_THREADS'] = str(num_threads)
        os.environ['MKL_NUM_THREADS'] = str(num_threads)
        os.environ['TVM_NUM_THREADS'] = str(num_threads)
        os.environ['NNC_NUM_THREADS'] = str(num_threads)

    devices = args.device.split(',')
    # accept 'gpu' as an alias for the 'cuda' device
    devices = ['cuda' if device == 'gpu' else device for device in devices]
    cpu_count = 0
    for index, device in enumerate(devices):
        if device.startswith('cpu'):
            cpu_count += 1
            if cpu_count > 1:
                raise ValueError('more than one CPU device is not allowed: %d' % (cpu_count))
            if device == 'cpu':
                continue
            num_threads_str = device[3:]
            try:
                # see if the device is in 'cpu1' or 'cpu4' format
                num_threads = int(num_threads_str)
                set_global_threads(num_threads)
                devices[index] = 'cpu'
            except ValueError:
                continue

    modes = args.mode.split(',')

    tensor_engine.set_engine_mode(args.engine)

    def run_default_configs(bench_cls, allow_skip=True):
        for mode, device, config in itertools.product(modes, devices, bench_cls.default_configs()):
            benchmark = bench_cls(mode, device, *config)
            benchmark.jit_mode = args.jit_mode
            if not benchmark.is_supported():
                if allow_skip:
                    continue
                else:
                    raise ValueError('attempted to run an unsupported benchmark: %s' % (benchmark.desc()))
            framework.run_benchmark(benchmark, args)

    benchmark_classes = framework.benchmark_classes
    if not args.benchmark_names:
        # by default, run all the benchmarks
        for benchmark_cls in benchmark_classes:
            run_default_configs(benchmark_cls, allow_skip=True)
    else:
        for name in args.benchmark_names:
            # if the name matches a benchmark class (as a substring of its module
            # name), run all the default configs for that class
            match_class_name = False
            for bench_cls in benchmark_classes:
                if name in bench_cls.module():
                    match_class_name = True
                    run_default_configs(bench_cls, allow_skip=True)

            if match_class_name:
                continue

            # if not a class module, parse the name as a full config and run it directly
            match_class_name = False
            for bench_cls in benchmark_classes:
                cls_module = bench_cls.module()
                if name.startswith(cls_module):
                    match_class_name = True
                    if name[len(cls_module)] != '_':
                        raise ValueError('invalid name: %s' % (name))
                    config_str = name[(len(cls_module) + 1):]
                    config = config_str.split('_')
                    if len(config) < 2:
                        raise ValueError('invalid config: %s' % config)
                    mode, device = config[0:2]
                    # TODO: make sure virtual devices such as 'cpu1' and 'cpu4' are supported.
                    if mode not in ['fwd', 'both']:
                        raise ValueError('invalid mode: %s' % (mode))
                    for i, entry in enumerate(config):
                        # convert numeric fields (shape dimensions) to ints
                        try:
                            value = int(entry)
                            config[i] = value
                        except ValueError:
                            pass
                    benchmark = bench_cls(*config)
                    benchmark.jit_mode = args.jit_mode
                    framework.run_benchmark(benchmark, args)

            if not match_class_name:
                available_classes = ', '.join([bench_cls.module() for bench_cls in benchmark_classes])
                raise ValueError('invalid name: %s\nAvailable benchmark classes:\n%s' % (name, available_classes))


if __name__ == '__main__':
    main()
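
For reference, a minimal sketch of how a full config-style benchmark name decomposes under the parsing loop above. The example name comes from the script's own help text; the 'layernorm' module value is an assumption for illustration (actual classes are registered in framework.benchmark_classes):

# Name format handled above: <module>_<mode>_<device>_<dim0>_<dim1>_...
name = "layernorm_fwd_cpu_128_32_128_128"      # example from the help text
cls_module = "layernorm"                       # assumed bench_cls.module() value
config = name[len(cls_module) + 1:].split("_")
# config == ['fwd', 'cpu', '128', '32', '128', '128']
mode, device = config[0:2]                     # 'fwd', 'cpu'
dims = [int(entry) for entry in config[2:]]    # [128, 32, 128, 128]

The remaining numeric fields are forwarded as positional constructor arguments to the matching benchmark class, which is why the script converts each numeric entry to int before calling bench_cls(*config).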