dma_transpose
matrix_multiplication
matrix_scalar_add
passthrough_dmas
passthrough_dmas_plio
passthrough_kernel
passthrough_pykernel
row_wise_bias_add
tiling_exploration
vector_exp
vector_reduce_add
vector_reduce_max
vector_reduce_min
vector_scalar_add
vector_scalar_add_runlist
vector_scalar_mul
vector_vector_add
vector_vector_add_BDs_init_values
vector_vector_modulo
vector_vector_mul
README.md
lit.local.cfg

README.md

Basic Programming Examples

These programming examples are a starting point for building commonly used compute kernels, covering both single-core and multi-core data processing pipelines. They show how designs can be described in Python and lowered through the mlir-aie tool flow to an executable that runs on the NPU. Passthrough Kernel and Vector Scalar Mul are good designs to start with. See section 3 of the programming guide for a more detailed walkthrough of developing designs.

Passthrough DMAs - This design implements a memcpy operation with object FIFOs using only DMAs, without involving the AIE core.
Passthrough Kernel - This design demonstrates a simple AIE implementation of a vectorized memcpy on a vector of integers, involving AIE core kernel programming.
DMA Transpose - Transposes a matrix with the Shim DMA using npu_dma_memcpy_nd.
Vector Scalar Add - Single tile performs a very simple + operation: the kernel loads data from local memory, increments each value by 1, and stores it back.
Vector Scalar Mul - Single tile performs vector * scalar on a vector of size 4096. The kernel multiplies 1024 elements per call and is invoked multiple times to complete the full vector * scalar compute.
Vector Vector Add - Single tile performs vector + vector of size 1024.
Vector Vector Modulo - Single tile performs vector % vector of size 1024.
Vector Vector Multiply - Single tile performs vector * vector of size 1024.
Vector Reduce Add - Single tile performs a reduction of a vector to return the sum of the elements.
Vector Reduce Max - Single tile performs a reduction of a vector to return the max of the elements.
Vector Reduce Min - Single tile performs a reduction of a vector to return the min of the elements.
Vector Exp - A simple element-wise exponential function, using the lookup table capabilities of the AI Engine.
Matrix Scalar Add - Single tile performs matrix + scalar with a matrix size of 16x8.
Matrix Multiplication - This directory contains multiple designs, spanning single-core and multi-core (whole array) matrix-matrix multiplication as well as matrix-vector multiplication. It also contains sweep infrastructure for benchmarking.
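
As a host-side point of reference, the arithmetic behind a few of the designs listed above can be sketched in plain Python. This is not mlir-aie code and does not use any NPU API. The function names (`scale_tile`, `vector_scalar_mul_tiled`, `vector_reduce`) are hypothetical, and the sizes 4096 and 1024 are taken from the Vector Scalar Mul description. A model like this is the kind of thing a testbench might compare device output against.

```python
# Hypothetical host-side reference model; not part of mlir-aie.


def scale_tile(tile, factor):
    """Stand-in for one kernel invocation: multiply one tile by a scalar."""
    return [x * factor for x in tile]


def vector_scalar_mul_tiled(vec, factor, tile_size=1024):
    """Process `vec` in tiles of `tile_size`, one simulated kernel call per tile.

    Returns the scaled vector and the number of kernel calls made.
    """
    if len(vec) % tile_size != 0:
        raise ValueError("vector length must be a multiple of tile_size")
    out = []
    calls = 0
    for start in range(0, len(vec), tile_size):
        out.extend(scale_tile(vec[start:start + tile_size], factor))
        calls += 1
    return out, calls


def vector_reduce(vec, op):
    """Reference results for the Vector Reduce Add/Max/Min designs."""
    reducers = {"add": sum, "max": max, "min": min}
    return reducers[op](vec)


if __name__ == "__main__":
    data = list(range(4096))
    result, calls = vector_scalar_mul_tiled(data, 3)
    print(calls)  # 4096 / 1024 = 4 invocations
    print(result == [x * 3 for x in data])
    print(vector_reduce(data, "add"), vector_reduce(data, "max"), vector_reduce(data, "min"))
```

With the sizes above, the 4096-element vector is covered by four 1024-element kernel calls. The tiled result is identical to a single-pass multiply, which is the property a correctness check would rely on.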
