[WIP] TensorTiler2D #1870

Draft: wants to merge 91 commits into base: main

91 commits
747ca3a
First version of tensor tiler
hunhoffe Oct 21, 2024
e5518fa
Merge branch 'main' into tiler-helper
hunhoffe Oct 21, 2024
1f86f05
Add some tests for the tiler
hunhoffe Oct 21, 2024
39e0a5d
Some improvements
hunhoffe Oct 21, 2024
03a0741
Merge branch 'main' into tiler-helper
hunhoffe Oct 22, 2024
8d67307
Some small improvements to tensortiler
hunhoffe Oct 22, 2024
637e314
Stub out example
hunhoffe Oct 22, 2024
d18a2ef
Added simple tiling examples
hunhoffe Oct 22, 2024
3c8ffb3
Merge branch 'main' into tiler-helper
hunhoffe Oct 22, 2024
d293c96
Update programming_examples/basic/tiling_exploration/per_tile/aie2.py
hunhoffe Oct 22, 2024
9c2ce5f
Fix makefile typos
hunhoffe Oct 22, 2024
2a3a484
Add tensor tiler tests
hunhoffe Oct 22, 2024
a47df3a
a couple more tests
hunhoffe Oct 22, 2024
babf9e7
Add a few more tests, remove template
hunhoffe Oct 22, 2024
46a487c
Add one more test
hunhoffe Oct 22, 2024
1071ee0
make tensortile test formatting a bit more sane
hunhoffe Oct 22, 2024
192194d
More python formatting
hunhoffe Oct 22, 2024
4f9656a
A few more tests
hunhoffe Oct 22, 2024
34ea2d8
Merge branch 'main' into tiler-helper
hunhoffe Oct 22, 2024
e437776
add visualization example
hunhoffe Oct 22, 2024
c744299
caption more correctly
hunhoffe Oct 22, 2024
d51e5c8
A bit of progress towards matrix_vector
hunhoffe Oct 23, 2024
87df9a7
Merge branch 'main' into tiler-helper
hunhoffe Oct 23, 2024
3bb4d19
Merge branch 'main' into tiler-helper
hunhoffe Oct 25, 2024
c4d2071
Updates from erika-iron-brainstorming branch
hunhoffe Oct 25, 2024
7b08c1a
Merge branch 'main' into tiler-helper
hunhoffe Oct 25, 2024
b08eebf
python format
hunhoffe Oct 25, 2024
06716e1
Add some visualization of access count (in addition to existing acces…
hunhoffe Oct 25, 2024
7f91465
Merge branch 'main' into tiler-helper
hunhoffe Oct 28, 2024
64879eb
Fix up tests after access count visualization changes
hunhoffe Oct 28, 2024
9accb78
Try to simplify form of sizes/strides by collapses stride value which…
hunhoffe Oct 28, 2024
6ad0860
Rename chunk to tile_group; repeat working in initial tests
hunhoffe Oct 28, 2024
864c72e
Missed adding in previous commit
hunhoffe Oct 28, 2024
db8fe9b
Some refinement for repeat count
hunhoffe Oct 28, 2024
0cf026a
Complete matrix vector tiling sweep test
hunhoffe Oct 28, 2024
eeec0a9
npu_dma_memcpy_nd take TensorTile, some repeat tests
hunhoffe Oct 28, 2024
ec47d34
More tile repeat tests
hunhoffe Oct 28, 2024
516803b
Finish tile repeat test suite
hunhoffe Oct 28, 2024
0d93ac0
Fix bad change from a few commits ago
hunhoffe Oct 28, 2024
364b0a4
Fix another bad change from a few commits ago
hunhoffe Oct 28, 2024
854de4d
First attempt at tile step in tile helper
hunhoffe Oct 30, 2024
100b866
Saving progress
hunhoffe Oct 30, 2024
00e632e
Add test file, will remove later
hunhoffe Oct 30, 2024
108fa81
Merge branch 'main' into tiler-helper
hunhoffe Oct 30, 2024
6dfd1db
move scratch file to somewhere less disruptive
hunhoffe Oct 31, 2024
17f4125
Disable tensor tiler 2d mat mul whole array test (for now)
hunhoffe Oct 31, 2024
d758e6a
Merge branch 'main' into tiler-helper
hunhoffe Oct 31, 2024
b2685bc
Add notes for how to proceed with impl
hunhoffe Oct 31, 2024
f104e7a
First step of tiler cleanup
hunhoffe Nov 1, 2024
246df71
Saving progress
hunhoffe Nov 1, 2024
38d7465
plot size based on tensor dims
hunhoffe Nov 1, 2024
64f99c7
access order looks nice even with larger tensors
hunhoffe Nov 2, 2024
def3c0e
Add experimentation notebook, will probably delete later
hunhoffe Nov 2, 2024
a783ef7
simpler_tiler appears functional
hunhoffe Nov 2, 2024
bd22201
tile sequence access order visualization seems good
hunhoffe Nov 2, 2024
b95b1a6
forgot to add init file
hunhoffe Nov 2, 2024
7f242d9
Animation is working in notebook but not in visualization
hunhoffe Nov 2, 2024
821ace1
Saving progress
hunhoffe Nov 2, 2024
f2c70a3
Just starting to test tile groups
hunhoffe Nov 2, 2024
59836c2
update to tiling speed
hunhoffe Nov 2, 2024
b375cdb
some tile groups working
hunhoffe Nov 2, 2024
0cf2dea
Better sizes for partial
hunhoffe Nov 4, 2024
38f6105
fixed some bugs with partial tile groups
hunhoffe Nov 4, 2024
a83149a
Seems to be working for step iteration without partial and without re…
hunhoffe Nov 4, 2024
07462cb
Fix bug
hunhoffe Nov 4, 2024
32fc56d
Step partial not implemented yet, but the rest seems good
hunhoffe Nov 4, 2024
4153d8d
Remove old code
hunhoffe Nov 4, 2024
aeeebc1
Move new code over to prepare for testing
hunhoffe Nov 4, 2024
0c34136
Fix file paths
hunhoffe Nov 4, 2024
6e6c223
add first few new tests
hunhoffe Nov 4, 2024
b50a5e3
Tests for simple tiler
hunhoffe Nov 4, 2024
a7a5e1c
Remove notebook
hunhoffe Nov 4, 2024
16abd03
Merge branch 'main' into tiler-helper
hunhoffe Nov 4, 2024
827a1e0
Remove old tests, first group of group_tiler tests
hunhoffe Nov 4, 2024
f5039f8
Small iterations on tests
hunhoffe Nov 4, 2024
f9202c2
Small iterations on tests
hunhoffe Nov 4, 2024
48cb941
Add some partial tests
hunhoffe Nov 5, 2024
beb4478
Add more partial tests
hunhoffe Nov 5, 2024
ba191dd
Finish partial tests for now
hunhoffe Nov 5, 2024
f406042
Add checks for type of pattern_repeat
hunhoffe Nov 5, 2024
22b793f
some unchecked changes
hunhoffe Nov 5, 2024
9d02033
fix small bugs
hunhoffe Nov 6, 2024
a821ef5
code simplification is mostly done
hunhoffe Nov 7, 2024
27e7fa6
Start fixing up number of dimensions
hunhoffe Nov 7, 2024
8cea0a6
Try to reduce hard-coded dimensions in 2d tiler
hunhoffe Nov 7, 2024
a5813b5
reduce hardcoded dimensionality a little bit more
hunhoffe Nov 7, 2024
a125ea1
small improvements for testing
hunhoffe Nov 7, 2024
d86ac40
Add first tests for step tiler
hunhoffe Nov 7, 2024
fa803af
Finish step tiler without partial tests
hunhoffe Nov 7, 2024
eca45bf
stub out step tiler partial
hunhoffe Nov 7, 2024
86891f2
Access tensors from tile sequence
hunhoffe Nov 7, 2024
4 changes: 4 additions & 0 deletions programming_examples/basic/dma_transpose/Makefile
@@ -44,5 +44,9 @@ endif
run: ${targetname}.exe build/final.xclbin
${powershell} ./$< -x build/final.xclbin -i build/insts.txt -k MLIR_AIE --M ${M} --K ${K}

generate_access_map: ${srcdir}/aie2.py
mkdir -p ${@D}
python3 $< --generate-access-map ${M} ${K}

clean:
rm -rf build _build inst ${targetname}.exe
15 changes: 14 additions & 1 deletion programming_examples/basic/dma_transpose/README.md
@@ -15,11 +15,24 @@ This reference design can be run on a Ryzen™ AI NPU.
In the [design](./aie2.py), a 2-D array in a row-major layout is read from external memory to `ComputeTile2` with a transposed layout,
by using an implicit copy via the compute tile's Data Movement Accelerator (DMA). The data is read from and written to external memory through the Shim tile (`col`, 0).

This data movement transformation can be visualized as a map which shows the order in which the data is streamed (e.g., in transposed layout):
<p align="center">
<img
src="transpose_data.png">
<h3 align="center"> Visualization of the Transpose Data Transformation for M=32, K=16.
</h3>
</p>
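
The access order behind such a map can be reproduced without the hardware. The following is a minimal sketch, assuming the usual reading of a 4-D `sizes`/`strides` pattern (outermost dimension first, strides counted in elements); `access_order` is a hypothetical helper written for illustration and is not part of the `TensorTile` API.

```python
def access_order(sizes, strides, offset=0):
    """Return flat element indices in the order a 4-D sizes/strides pattern visits them.

    Hypothetical helper: dimensions are listed outermost first, and the
    innermost (last) dimension varies fastest.
    """
    order = []
    for i0 in range(sizes[0]):
        for i1 in range(sizes[1]):
            for i2 in range(sizes[2]):
                for i3 in range(sizes[3]):
                    order.append(
                        offset
                        + i0 * strides[0]
                        + i1 * strides[1]
                        + i2 * strides[2]
                        + i3 * strides[3]
                    )
    return order


# Small M x K row-major matrix; element (r, c) lives at flat index r * K + c.
M, K = 2, 3
transposed = access_order(sizes=[1, 1, K, M], strides=[1, 1, 1, K])
print(transposed)  # [0, 3, 1, 4, 2, 5]
```

With `sizes=[1, 1, K, M]` and `strides=[1, 1, 1, K]`, the inner loop walks down one column (stride `K`) before the outer loop moves to the next column (stride 1), which is why the stream arrives in transposed order.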

The implicit copy is performed using the `object_fifo_link` operation that specifies how input data arriving via `of_in` should be sent further via `of_out` by specifically leveraging the compute tile's DMA. This operation and its functionality are described in more depth in [Section-2b](../../../programming_guide/section-2/section-2b/README.md/#object-fifo-link) of the programming guide.


To compile and run the design for NPU:
```
```bash
make
make run
```

To generate a data visualization of the transpose (like that above), run:
```bash
make generate_access_map
```
52 changes: 38 additions & 14 deletions programming_examples/basic/dma_transpose/aie2.py
@@ -5,27 +5,32 @@
# SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
#
# (c) Copyright 2024 Advanced Micro Devices, Inc. or its affiliates
import argparse
import numpy as np
import sys

from aie.dialects.aie import *
from aie.dialects.aiex import *
from aie.extras.context import mlir_mod_ctx
from aie.helpers.dialects.ext.scf import _for as range_
from aie.helpers.tensortiler.tensortiler2d import TensorTile

N = 4096
M = 64
K = 64

if len(sys.argv) == 3:
M = int(sys.argv[1])
K = int(sys.argv[2])
N = M * K
def my_passthrough(M, K, N, generate_acccess_map=False):
tensor_ty = np.ndarray[(M, K), np.dtype[np.int32]]
data_transform = TensorTile(
tensor_height=M,
tensor_width=K,
sizes=[1, 1, K, M],
strides=[1, 1, 1, K],
offset=0,
)
if generate_acccess_map:
data_transform.visualize(
plot_access_count=False, file_path="transpose_data.png"
)
return

tensor_ty = np.ndarray[(M, K), np.dtype[np.int32]]


def my_passthrough():
with mlir_mod_ctx() as ctx:

@device(AIEDevice.npu1_1col)
@@ -56,8 +61,7 @@ def sequence(A, B, C):
metadata=of_in,
bd_id=1,
mem=A,
sizes=[1, 1, K, M],
strides=[1, 1, 1, K],
tensor_tile=data_transform,
issue_token=True,
)
npu_dma_memcpy_nd(metadata=of_out, bd_id=0, mem=C, sizes=[1, 1, 1, N])
@@ -66,4 +70,24 @@ def sequence(A, B, C):
print(ctx.module)


my_passthrough()
if __name__ == "__main__":
    p = argparse.ArgumentParser()
    p.add_argument("dims", help="M K", type=int, nargs="*", default=[64, 64])
    p.add_argument(
        "--generate-access-map",
        action="store_true",
        help="Produce a file showing data access order",
    )
    args = p.parse_args()

    if len(args.dims) != 2:
        print(
            "ERROR: Must provide either no dimensions or both M and K", file=sys.stderr
        )
        sys.exit(-1)
    my_passthrough(
        M=args.dims[0],
        K=args.dims[1],
        N=args.dims[0] * args.dims[1],
        generate_acccess_map=args.generate_access_map,
    )
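
The command-line handling above can be exercised in isolation. This is a minimal sketch that reuses the same two arguments from the diff; `build_parser` and `parse_dims` are hypothetical names introduced here, and raising `ValueError` instead of exiting is a choice made only to keep the sketch testable.

```python
import argparse


def build_parser():
    # Same arguments as in the diff: optional positional dims plus a flag.
    p = argparse.ArgumentParser()
    p.add_argument("dims", help="M K", type=int, nargs="*", default=[64, 64])
    p.add_argument(
        "--generate-access-map",
        action="store_true",
        help="Produce a file showing data access order",
    )
    return p


def parse_dims(argv):
    """Return (M, K, N, generate_access_map) for a list of CLI tokens."""
    args = build_parser().parse_args(argv)
    if len(args.dims) != 2:
        # The design prints an error and exits; raising keeps this sketch testable.
        raise ValueError("Must provide either no dimensions or both M and K")
    M, K = args.dims
    return M, K, M * K, args.generate_access_map


print(parse_dims([]))
print(parse_dims(["--generate-access-map", "32", "16"]))
```

Because `dims` uses `nargs="*"` with a list default, omitting the dimensions falls back to `[64, 64]`, while passing a single value is rejected by the length check.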
@@ -0,0 +1,92 @@
import numpy as np

from aie.helpers.tensortiler.tensortiler2d import TensorTiler2D, TensorTile
from util import construct_test


# RUN: %python %s | FileCheck %s
def ceildiv(a, b):
    return -(a // -b)


def run_checks(n_aie_cols, n_aie_rows, M, N, K, m, n, k):
    tb_max_n_rows = 4
    tb_n_rows = tb_max_n_rows // 2

    # Define tilers
    c_tiler = TensorTiler2D(M, N, m * n_aie_rows, n)
    c_iter = c_tiler.tile_iter(
        tile_repeat_step_horizontal=n_aie_cols, iter_step=tb_n_rows
    )

    for tb in range(ceildiv(M // m // n_aie_rows, tb_max_n_rows)):
        for pingpong in [0, 1]:
            row_base = tb * tb_max_n_rows + pingpong * tb_max_n_rows // 2
            tb_n_rows = min([tb_max_n_rows // 2, M // m // n_aie_rows - row_base])
            print(tb_n_rows)
            if tb_n_rows <= 0:
                # for small input sizes, we may not even need a "pong" iteration
                break

            for col in range(n_aie_cols):
                C_row_offset = row_base * m * n_aie_rows * N
                C_col_offset = col * n
                C_offset = C_col_offset + C_row_offset
                C_sizes = [tb_n_rows, N // n // n_aie_cols, m * n_aie_rows, n]
                C_strides = [m * n_aie_rows * N, n * n_aie_cols, N, 1]
                expected_c_tile = TensorTile(
                    M, N, offset=C_offset, sizes=C_sizes, strides=C_strides
                )

                c_tile = next(c_iter)
                if c_tile != expected_c_tile:
                    # equivalence for tensor tile checks offset, size, stride
                    # but there may be different but equivalent transformations

                    reference_access, reference_count = expected_c_tile.access_tensors()
                    c_access, c_count = c_tile.access_tensors()

                    """
                    assert (reference_access == c_access).all(), (
                        f"C access orders do not match. "
                        f"Expected ({expected_c_tile}), got ({c_tile})"
                    )
                    assert (reference_count == c_count).all()
                    """
                    print(f"Expected: {expected_c_tile}")
                    print(f"Actual: {c_tile}")


def matrix_whole_array_tiling_sweep():
    n_aie_cols_sweep = [1, 2, 4]  # TODO: when partial, add 3
    n_aie_rows_sweep = [1, 2, 4]  # TODO: when partial, add 3
    M_sweep = range(512, 4096, 512)
    K_sweep = range(512, 4096, 512)
    N_sweep = range(512, 4096, 512)
    m_sweep = [16, 32, 64]
    n_sweep = [16, 32, 64]
    k_sweep = [16, 32, 64]

    for n_aie_cols in n_aie_cols_sweep:
        for n_aie_rows in n_aie_rows_sweep:
            for M in M_sweep:
                for N in N_sweep:
                    for K in K_sweep:
                        for m in m_sweep:
                            for n in n_sweep:
                                for k in k_sweep:
                                    run_checks(
                                        n_aie_cols=n_aie_cols,
                                        n_aie_rows=n_aie_rows,
                                        M=M,
                                        N=N,
                                        K=K,
                                        m=m,
                                        k=k,
                                        n=n,
                                    )
    return


if __name__ == "__main__":
    matrix_whole_array_tiling_sweep()
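
The ping-pong loop in `run_checks` splits the row blocks of C into groups of at most `tb_max_n_rows // 2`. The sketch below isolates just that arithmetic so it can be checked by hand; `pingpong_row_blocks` is a hypothetical helper, and `total_row_blocks` stands in for `M // m // n_aie_rows`.

```python
def ceildiv(a, b):
    # Ceiling division using floor division on negated operands.
    return -(a // -b)


def pingpong_row_blocks(total_row_blocks, tb_max_n_rows=4):
    """List (row_base, n_rows) pairs in the order the ping/pong halves are issued."""
    half = tb_max_n_rows // 2
    blocks = []
    for tb in range(ceildiv(total_row_blocks, tb_max_n_rows)):
        for pingpong in (0, 1):
            row_base = tb * tb_max_n_rows + pingpong * half
            n_rows = min(half, total_row_blocks - row_base)
            if n_rows <= 0:
                # Small inputs may not need a "pong" iteration at all.
                break
            blocks.append((row_base, n_rows))
    return blocks


print(pingpong_row_blocks(5))  # [(0, 2), (2, 2), (4, 1)]
```

With five row blocks, the last transfer covers a single remaining block and the final "pong" iteration is skipped.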
10 changes: 6 additions & 4 deletions programming_examples/basic/matrix_scalar_add/aie2.py
@@ -12,6 +12,7 @@
from aie.dialects.aiex import *
from aie.extras.context import mlir_mod_ctx
from aie.helpers.dialects.ext.scf import _for as range_
from aie.helpers.tensortiler.tensortiler2d import TensorTiler2D

# Size of the entire image
IMAGE_HEIGHT = 16
@@ -68,23 +69,24 @@ def core_body():
of_out1.release(ObjectFifoPort.Produce, 1)

# To/from AIE-array data movement
tiler = TensorTiler2D(IMAGE_HEIGHT, IMAGE_WIDTH, TILE_HEIGHT, TILE_WIDTH)
t = next(tiler.tile_iter()) # Only transfer one (first) tile of data

@runtime_sequence(tile_ty, tile_ty, tile_ty)
def sequence(inTensor, notUsed, outTensor):
npu_dma_memcpy_nd(
metadata=of_in1,
bd_id=1,
mem=inTensor,
sizes=[1, 1, TILE_HEIGHT, TILE_WIDTH],
strides=[1, 1, IMAGE_WIDTH, 1],
tensor_tile=t,
issue_token=True,
)

npu_dma_memcpy_nd(
metadata=of_out1,
bd_id=0,
mem=outTensor,
sizes=[1, 1, TILE_HEIGHT, TILE_WIDTH],
strides=[1, 1, IMAGE_WIDTH, 1],
tensor_tile=t,
)
dma_wait(of_in1, of_out1)
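
The literal `sizes`/`strides` replaced in this hunk describe one tile of a row-major image. As a minimal sketch of that arithmetic, the hypothetical helper below computes the same pattern for any tile position. The source only confirms the values for the first tile, so the offset formula and the example dimensions are assumptions, and this is not the `TensorTiler2D` implementation.

```python
def tile_descriptor(image_width, tile_height, tile_width, tile_row=0, tile_col=0):
    """Return (offset, sizes, strides) for one tile of a row-major image.

    Hypothetical helper: sizes/strides mirror the literals removed in the
    diff; the offset for tiles other than the first is an assumption based
    on plain row-major indexing.
    """
    offset = tile_row * tile_height * image_width + tile_col * tile_width
    sizes = [1, 1, tile_height, tile_width]
    strides = [1, 1, image_width, 1]
    return offset, sizes, strides


# Illustrative dimensions only; they are not taken from the design.
print(tile_descriptor(image_width=128, tile_height=8, tile_width=16))
print(tile_descriptor(image_width=128, tile_height=8, tile_width=16, tile_row=1, tile_col=2))
```

Each row of the tile is contiguous (stride 1), and moving to the next row of the tile skips a full image row (stride `image_width`).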

17 changes: 11 additions & 6 deletions programming_examples/basic/row_wise_bias_add/aie2.py
@@ -11,6 +11,7 @@
from aie.dialects.aiex import *
from aie.extras.context import mlir_mod_ctx
from aie.helpers.dialects.ext.scf import _for as range_
from aie.helpers.tensortiler.tensortiler2d import TensorTiler2D


def row_wise_bias_add(M, N, m, n):
@@ -48,28 +49,32 @@ def core_body():
in_fifo.release(ObjectFifoPort.Consume, 1)
bias_fifo.release(ObjectFifoPort.Consume, 1)

tiler = TensorTiler2D(M, N, m, n, tensor_col_major=True)
t = next(
tiler.tile_iter(tile_group_height=M // m, tile_group_width=N // n)
) # Transfer all tiles at once
bias_tiler = TensorTiler2D(1, N, 1, n)
bias_t = next(bias_tiler.tile_iter(tile_group_width=N // n))

@runtime_sequence(tensor_ty, bias_ty, tensor_ty)
def sequence(inp, bias, out):
npu_dma_memcpy_nd(
metadata=in_fifo,
bd_id=0,
mem=inp,
sizes=[1, N // n, M, n],
strides=[0, n, N, 1],
tensor_tile=t,
)
npu_dma_memcpy_nd(
metadata=bias_fifo,
bd_id=1,
mem=bias,
sizes=[1, 1, N // n, n],
strides=[0, 0, n, 1],
tensor_tile=bias_t,
)
npu_dma_memcpy_nd(
metadata=out_fifo,
bd_id=2,
mem=out,
sizes=[1, N // n, M, n],
strides=[0, n, N, 1],
tensor_tile=t,
)
# of_out will only complete after of_in completes, so we just wait on of_out instead of both
dma_wait(out_fifo)
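
The hand-written patterns removed in this hunk can be checked on a tiny tensor. This sketch assumes a 4-D pattern is read outermost dimension first with strides in elements; `access_order` is a hypothetical helper and the dimensions are illustrative, not taken from the design.

```python
from itertools import product


def access_order(sizes, strides, offset=0):
    """Flat element indices visited by a sizes/strides pattern (last dimension fastest)."""
    return [
        offset + sum(i * s for i, s in zip(idx, strides))
        for idx in product(*(range(size) for size in sizes))
    ]


# Illustrative dimensions: a 2 x 4 tensor split into column blocks of width 2.
M, N, n = 2, 4, 2
tensor_order = access_order(sizes=[1, N // n, M, n], strides=[0, n, N, 1])
bias_order = access_order(sizes=[1, 1, N // n, n], strides=[0, 0, n, 1])
print(tensor_order)  # [0, 1, 4, 5, 2, 3, 6, 7]
print(bias_order)  # [0, 1, 2, 3]
```

The tensor pattern streams one column block of width `n` over all `M` rows before moving to the next block, while the bias pattern is a plain linear read.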
39 changes: 39 additions & 0 deletions programming_examples/basic/tiling_exploration/per_tile/Makefile
@@ -0,0 +1,39 @@
##===- Makefile -----------------------------------------------------------===##
#
# This file licensed under the Apache License v2.0 with LLVM Exceptions.
# See https://llvm.org/LICENSE.txt for license information.
# SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
#
# Copyright (C) 2024, Advanced Micro Devices, Inc.
#
##===----------------------------------------------------------------------===##

srcdir := $(shell dirname $(realpath $(firstword $(MAKEFILE_LIST))))

include ${srcdir}/../../../makefile-common

tensor_height = 32
tensor_width = 32
tile_height = 4
tile_width = 4
data_str=${tensor_height}_${tensor_width}_${tile_height}_${tile_width}

.PHONY: all run clean

all: build/final_${data_str}.xclbin

build/aie_${data_str}.mlir: ${srcdir}/aie2.py
mkdir -p ${@D}
python3 $< --tensor-height ${tensor_height} --tensor-width ${tensor_width} --tile-height ${tile_height} --tile-width ${tile_width} > $@

build/final_${data_str}.xclbin: build/aie_${data_str}.mlir
mkdir -p ${@D}
cd ${@D} && aiecc.py --aie-generate-cdo --aie-generate-npu --no-compile-host \
--no-xchesscc --no-xbridge \
--xclbin-name=${@F} --npu-insts-name=insts_${data_str}.txt $(<:%=../%)

run: build/final_${data_str}.xclbin build/insts_${data_str}.txt
${powershell} python3 ${srcdir}/test.py -x build/final_${data_str}.xclbin -i build/insts_${data_str}.txt -k MLIR_AIE --tensor-height ${tensor_height} --tensor-width ${tensor_width} --tile-height ${tile_height} --tile-width ${tile_width}

clean:
rm -rf build
@@ -0,0 +1,31 @@
<!---//===- README.md -----------------------------------------*- Markdown -*-===//
//
// This file is licensed under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
// Copyright (C) 2024, Advanced Micro Devices, Inc.
//
//===----------------------------------------------------------------------===//-->

# Tiling Exploration

This IRON design flow example, called "Tiling Exploration", demonstrates how data may be `tiled` on input/output. This is a common data transformation pattern, and this example is meant to be interactive.

## Source Files Overview

TODO

## Design Overview

TODO

## Design Component Details

### AIE Array Structural Design

TODO

## Usage

TODO