These programming examples are a good starting point for those new to NPU programming with IRON, and aim to give an overview of IRON and NPU capabilities. All the designs are self-contained and operate on fixed problem sizes for simplicity. Please see the programming guide for a more detailed walkthrough of developing designs.
- Memcpy - This design demonstrates a highly parallel, parameterized implementation of a memcpy operation that uses shim DMAs in every NPU column. Its goal is to measure memory bandwidth across the full NPU and to evaluate how well a design utilizes the available memory bandwidth across multiple columns and channels.
- SAXPY - This design demonstrates an implementation of a SAXPY operation (i.e. $Z = a*X + Y$) with both scalar and vectorized kernels.
- Vector Reduce Max - This design demonstrates a vector reduce max implementation using a distributed, parallel approach across multiple AIE cores in one NPU column.
- Matrix Multiplication Single Core - This design demonstrates a single core implementation of a matrix multiplication where input matrices are tiled at the different levels of the NPU memory hierarchy.
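
When experimenting with these designs, it can help to have a host-side reference model to compare results against. The sketch below is plain Python and does not use the IRON API. The function names (`saxpy`, `reduce_max_partitioned`, `matmul_tiled`) and the `num_workers` and `tile` parameters are hypothetical and chosen only for illustration. It models the arithmetic of three of the examples above: the SAXPY formula, a reduce max that combines per-chunk partial results (loosely mirroring work split across several cores), and a matrix multiplication that iterates over tiles. It says nothing about how the actual designs move data or schedule work on the NPU.

```python
# Plain-Python reference models. These helpers are hypothetical and are
# not part of IRON; they only mirror the math of the listed examples.

def saxpy(a, x, y):
    """Return z with z[i] = a * x[i] + y[i]."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    return [a * xi + yi for xi, yi in zip(x, y)]


def reduce_max_partitioned(values, num_workers):
    """Compute max(values) by reducing contiguous chunks, then the partials."""
    if not values:
        raise ValueError("values must not be empty")
    if num_workers < 1:
        raise ValueError("num_workers must be at least 1")
    chunk = -(-len(values) // num_workers)  # ceiling division
    # Each chunk stands in for the share of data one worker would reduce.
    partials = [max(values[i:i + chunk]) for i in range(0, len(values), chunk)]
    return max(partials)


def matmul_tiled(A, B, tile):
    """Multiply A (M x K) by B (K x N), visiting the operands tile by tile."""
    M, K, N = len(A), len(B), len(B[0])
    C = [[0] * N for _ in range(M)]
    for i0 in range(0, M, tile):
        for j0 in range(0, N, tile):
            for k0 in range(0, K, tile):
                # Accumulate the contribution of one tile pair into C.
                for i in range(i0, min(i0 + tile, M)):
                    for j in range(j0, min(j0 + tile, N)):
                        acc = C[i][j]
                        for k in range(k0, min(k0 + tile, K)):
                            acc += A[i][k] * B[k][j]
                        C[i][j] = acc
    return C
```

For example, `saxpy(2, [1, 2, 3], [10, 20, 30])` returns `[12, 24, 36]`, and `matmul_tiled` gives the same result for any positive `tile` size, which makes it a convenient check when changing tiling parameters.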