67 changes: 66 additions & 1 deletion programming_examples/README.md
@@ -1,10 +1,75 @@
<!-- This file is auto-generated by generate_readme.py. Do not edit manually. -->

# MLIR-AIR Programming Examples

These programming examples demonstrate how to leverage the AIR design flow with mlir-air Python bindings and the mlir-air intermediate representation (IR) to build applications targeting AI Engines on AMD NPUs.

## Operator Dashboard

See the **[Operator Dashboard](https://xilinx.github.io/mlir-air/programming_examples/)** for the full table of supported operators with NPU1/NPU2 status indicators. The dashboard is auto-generated from LIT test files and published to GitHub Pages on every push to `main`.

| Category | Operation | Datatype(s) | NPU1 | NPU2 | Design Example |
|:---------|:----------|:------------|:----:|:----:|:---------------|
| Linear Algebra | [Matrix Multiplication](matrix_multiplication/) | bf16, i16, i8 | 🟢 | 🟢 | [matrix_multiplication/](matrix_multiplication/) |
| Linear Algebra | [Vector-Matrix Multiplication](vector_matrix_multiplication/) | bf16 | 🟢 | 🟢 | [vector_matrix_multiplication/](vector_matrix_multiplication/) |
| Linear Algebra | [Matrix-Vector Multiplication](matrix_vector_multiplication/bf16/) | bf16 | ⚪ | 🟢 | [matrix_vector_multiplication/bf16/](matrix_vector_multiplication/bf16/) |
| Linear Algebra | [AXPY](axpy/) | bf16 | 🟢 | 🟢 | [axpy/](axpy/) |
| Element-wise | [Element-wise Add](eltwise_add/) | f32 | 🟢 | 🟢 | [eltwise_add/](eltwise_add/) |
| Element-wise | [Element-wise Add (with L2)](eltwise_add_with_l2/) | f32 | 🟢 | 🟢 | [eltwise_add_with_l2/](eltwise_add_with_l2/) |
| Element-wise | [Element-wise Add (bf16)](primitives/vector_examples/vector_add/) | bf16 | 🟢 | 🟢 | [primitives/vector_examples/vector_add/](primitives/vector_examples/vector_add/) |
| Element-wise | [Element-wise Mul](primitives/vector_examples/vector_mul/) | bf16 | 🟢 | 🟢 | [primitives/vector_examples/vector_mul/](primitives/vector_examples/vector_mul/) |
| Activation/Math | [SiLU](silu/) | bf16 | ⚪ | 🟢 | [silu/](silu/) |
| Activation/Math | [GELU](gelu/) | bf16 | ⚪ | 🟢 | [gelu/](gelu/) |
| Activation/Math | [Softmax](softmax/) | bf16 | 🟢 | 🟢 | [softmax/](softmax/) |
| Activation/Math | [Sine / Cosine](sine_cosine/) | bf16 | 🟢 | ⚪ | [sine_cosine/](sine_cosine/) |
| Activation/Math | [RELU](relu/) | bf16 | 🟢 | 🟢 | [relu/](relu/) |
| Activation/Math | [Leaky RELU](leaky_relu/) | bf16 | 🟢 | 🟢 | [leaky_relu/](leaky_relu/) |
| Activation/Math | [Sigmoid](sigmoid/) | bf16 | ⚪ | 🟢 | [sigmoid/](sigmoid/) |
| Activation/Math | [Tanh](primitives/vector_examples/vector_tanh/) | bf16 | ⚪ | 🟢 | [primitives/vector_examples/vector_tanh/](primitives/vector_examples/vector_tanh/) |
| Normalization | [Layer Normalization](layer_norm/) | bf16 | ⚪ | 🟢 | [layer_norm/](layer_norm/) |
| Normalization | [RMS Normalization](rms_norm/) | bf16 | ⚪ | 🟢 | [rms_norm/](rms_norm/) |
| Normalization | [Weighted RMS Normalization](weighted_rms_norm/) | bf16 | ⚪ | 🟢 | [weighted_rms_norm/](weighted_rms_norm/) |
| Aggregation | [Reduction (Add)](primitives/vector_examples/vector_reduce_add/) | bf16 | 🟢 | 🟢 | [primitives/vector_examples/vector_reduce_add/](primitives/vector_examples/vector_reduce_add/) |
| Pooling | [MaxPool](primitives/vector_examples/vector_reduce_max/) | bf16 | 🟢 | 🟢 | [primitives/vector_examples/vector_reduce_max/](primitives/vector_examples/vector_reduce_max/) |
| Pooling | [AveragePool](average_pool/) | bf16 | 🟢 | 🟢 | [average_pool/](average_pool/) |
| LLM Kernels | [Multi-Head Attention (LLaMA2)](llama2_mha/) | bf16 | 🟢 | ⚪ | [llama2_mha/](llama2_mha/) |
| LLM Kernels | [SwiGLU](swiglu/) | bf16 | ⚪ | 🟢 | [swiglu/](swiglu/) |
| LLM Kernels | [FFN SwiGLU (Decode)](ffn_swiglu/decode/) | bf16 | ⚪ | 🟢 | [ffn_swiglu/decode/](ffn_swiglu/decode/) |
| LLM Kernels | [FFN SwiGLU (Prefill)](ffn_swiglu/prefill/) | bf16 | ⚪ | 🟢 | [ffn_swiglu/prefill/](ffn_swiglu/prefill/) |
| LLM Kernels | [RoPE (LUT-based)](rope_lut/) | bf16 | ⚪ | 🟢 | [rope_lut/](rope_lut/) |
| LLM Kernels | [RoPE (On-chip Sin/Cos)](rope_sincos/) | bf16 | 🟢 | 🟢 | [rope_sincos/](rope_sincos/) |
| Attention | [Flash Attention (Dataflow)](flash_attention/dataflow_based/) | bf16 | 🟢 | 🟢 | [flash_attention/dataflow_based/](flash_attention/dataflow_based/) |
| Attention | [Flash Attention (Kernel Fusion)](flash_attention/kernel_fusion_based/) | bf16 | ⚪ | 🟢 | [flash_attention/kernel_fusion_based/](flash_attention/kernel_fusion_based/) |
| Attention | [Grouped Query Attention (GQA)](flash_attention/kernel_fusion_based/) | bf16 | ⚪ | 🟢 | [flash_attention/kernel_fusion_based/](flash_attention/kernel_fusion_based/) |
| Data Movement | [Passthrough (DMA)](passthrough/passthrough_dma/) | u8, i8, i16, u16, f32, bf16 | 🟢 | 🟢 | [passthrough/passthrough_dma/](passthrough/passthrough_dma/) |
| Data Movement | [Passthrough (Channel)](passthrough/passthrough_channel/) | u8 | 🟢 | 🟢 | [passthrough/passthrough_channel/](passthrough/passthrough_channel/) |
| Data Movement | [Passthrough (Kernel)](passthrough/passthrough_kernel/) | u8 | 🟢 | 🟢 | [passthrough/passthrough_kernel/](passthrough/passthrough_kernel/) |
| Data Movement | [Shim DMA 2D](shim_dma_2d/) | i32 | 🟢 | 🟢 | [shim_dma_2d/](shim_dma_2d/) |
| Data Movement | [Data Transfer Transpose](data_transfer_transpose/) | u32 | 🟢 | 🟢 | [data_transfer_transpose/](data_transfer_transpose/) |
| Data Movement | [Transpose (bf16)](data_transfer_transpose/dma_bf16/) | bf16 | ⚪ | 🟢 | [data_transfer_transpose/dma_bf16/](data_transfer_transpose/dma_bf16/) |
| Data Movement | [Matrix Scalar Add](matrix_scalar_add/) | i32 | 🟢 | 🟢 | [matrix_scalar_add/](matrix_scalar_add/) |
| Communication | [Channel Examples](channel_examples/) | i32 | 🟢 | 🟢 | [channel_examples/](channel_examples/) |
| Communication | [Multi-Segment Examples](multi_segment/) | i32 | 🟡 | 🟡 | [multi_segment/](multi_segment/) |
| Communication | [Cascade Reduction](cascade_reduction/) | i32 | 🟢 | 🟢 | [cascade_reduction/](cascade_reduction/) |
| Memory | [Segment Alloc](segment_alloc/) | i32 | 🟢 | 🟢 | [segment_alloc/](segment_alloc/) |
| Spatial | [Segment Unroll](segment_unroll/) | i32 | 🟢 | 🟢 | [segment_unroll/](segment_unroll/) |
| Dataflow | [Herd Dataflow](herd_dataflow/) | bf16 | 🟢 | 🟢 | [herd_dataflow/](herd_dataflow/) |
| Control Flow | [Conditional Branching](conditional_branching/) | i32 | 🟢 | 🟢 | [conditional_branching/](conditional_branching/) |
| CNN | [2D Convolution](conv2d/) | i32 | 🟢 | 🟢 | [conv2d/](conv2d/) |
| CNN | [Bottleneck](bottleneck/) | bf16 | 🟢 | 🟢 | [bottleneck/](bottleneck/) |
| ML Pipeline | [MNIST-FC (Broadcast Bias Add)](mnist_fc/broadcast_bias_add/) | f32 | ⚪ | 🟢 | [mnist_fc/broadcast_bias_add/](mnist_fc/broadcast_bias_add/) |
| ML Pipeline | [MNIST-FC (ReLU 2D)](mnist_fc/relu/) | f32/bf16 | ⚪ | 🟢 | [mnist_fc/relu/](mnist_fc/relu/) |
| ML Pipeline | [MNIST-FC (Argmax)](mnist_fc/argmax/) | f32→i32 | ⚪ | 🟢 | [mnist_fc/argmax/](mnist_fc/argmax/) |
| ML Pipeline | [MNIST-FC (Integration)](mnist_fc/integration/) | f32 | ⚪ | 🟢 | [mnist_fc/integration/](mnist_fc/integration/) |
| Memory | [Shared L1 Buffer](shared_l1/) | bf16 | 🟢 | ⚪ | [shared_l1/](shared_l1/) |
| Quantization | [Dequant (AWQ int4→bf16)](dequant_awq/) | int4/bf16 | ⚪ | 🟢 | [dequant_awq/](dequant_awq/) |
| Primitives | [Scalar/Vector Operations](primitives/) | various | 🟢 | 🟢 | [primitives/](primitives/) |

### Status Legend

- 🟢 Supported and tested
- 🟡 Work in progress
- ⚪ Not yet supported

**NPU1** = AMD Ryzen AI (Phoenix, AIE2) &nbsp;&nbsp; **NPU2** = AMD Ryzen AI (Strix, AIE2P)

## Getting Started

24 changes: 24 additions & 0 deletions programming_examples/generate_readme.py
@@ -312,6 +312,30 @@
"path": "bottleneck",
"datatypes": "bf16",
},
{
"category": "ML Pipeline",
"name": "MNIST-FC (Broadcast Bias Add)",
"path": "mnist_fc/broadcast_bias_add",
"datatypes": "f32",
},
{
"category": "ML Pipeline",
"name": "MNIST-FC (ReLU 2D)",
"path": "mnist_fc/relu",
"datatypes": "f32/bf16",
},
{
"category": "ML Pipeline",
"name": "MNIST-FC (Argmax)",
"path": "mnist_fc/argmax",
"datatypes": "f32\u2192i32",
},
{
"category": "ML Pipeline",
"name": "MNIST-FC (Integration)",
"path": "mnist_fc/integration",
"datatypes": "f32",
},
{
"category": "Memory",
"name": "Shared L1 Buffer",
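The new dashboard entries are plain dicts consumed by `generate_readme.py`. As a minimal sketch of how such an entry could be turned into a table row (the `render_row` name and the status arguments are assumptions here; the real script derives NPU1/NPU2 status from LIT test files):

```python
def render_row(entry, status_npu1="🟢", status_npu2="🟢"):
    """Render one operator-entry dict as a markdown dashboard row.

    Illustrative only: the actual generator also resolves status
    indicators from LIT tests rather than taking them as arguments.
    """
    # Normalize the path so the link always ends with a trailing slash.
    path = entry["path"].rstrip("/") + "/"
    link = f"[{entry['name']}]({path})"
    return (
        f"| {entry['category']} | {link} | {entry['datatypes']} "
        f"| {status_npu1} | {status_npu2} | [{path}]({path}) |"
    )

entry = {
    "category": "ML Pipeline",
    "name": "MNIST-FC (Integration)",
    "path": "mnist_fc/integration",
    "datatypes": "f32",
}
print(render_row(entry, status_npu1="⚪"))
```

This mirrors the six-column row format used in the README table above (Category, Operation, Datatypes, NPU1, NPU2, Design Example).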
24 changes: 24 additions & 0 deletions programming_examples/mnist_fc/argmax/Makefile
@@ -0,0 +1,24 @@
# Copyright (C) 2026, Advanced Micro Devices, Inc.
# SPDX-License-Identifier: MIT
srcdir := $(shell dirname $(realpath $(firstword $(MAKEFILE_LIST))))

ifdef PEANO_INSTALL_DIR
BUILD_DIR := build_peano
else
BUILD_DIR := build_chess
endif

NE0 ?= 10
NE1 ?= 500

all: run

print:
	${powershell} python3 ${srcdir}/run.py --ne0 $(NE0) --ne1 $(NE1) -p

run:
	mkdir -p $(BUILD_DIR)
	cd $(BUILD_DIR) && PEANO_INSTALL_DIR=$(PEANO_INSTALL_DIR) ${powershell} python3 ${srcdir}/run.py --ne0 $(NE0) --ne1 $(NE1) -v

clean:
	rm -rf $(BUILD_DIR) __pycache__
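For reference, the argmax stage this Makefile drives maps f32 logits to an i32 class index (the README lists its datatypes as f32→i32). A hypothetical host-side reference, with shapes assumed from the `NE0`/`NE1` defaults rather than taken from the design source:

```python
import numpy as np

def argmax_ref(logits: np.ndarray) -> np.ndarray:
    """Reference argmax: index of the largest f32 logit per row, as i32.

    Assumption: logits arrive as a (batch, classes) array; the on-NPU
    kernel's exact tiling and layout are not shown here.
    """
    return np.argmax(logits, axis=-1).astype(np.int32)

logits = np.array([[0.1, 2.5, -1.0, 0.3]], dtype=np.float32)
print(argmax_ref(logits))  # → [1]
```

A reference like this is what a `run.py -v` verification step typically compares the device output against.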