Version: Vitis 2024.1
Developing an accelerated AI Engine design for the VCK190 can be done using the Vitis compiler (`v++`). This compiler can be used to compile programmable logic (PL) kernels and connect these PL kernels to the AI Engine and PS device.
In this tutorial, you will learn clocking concepts for the Vitis compiler and how to define clocking for an ADF Graph, as well as PL kernels using clocking automation functionality. The design being used is a simple classifier design as shown in the following figure:
Prerequisites for this tutorial are:
- Familiarity with the `v++ -c --mode aie` flow.
- Familiarity with `gcc`-style command-line compilation.
In the design, the following clocking steps are used:
Kernel Location | Compile Setting |
---|---|
Interpolator, Polar Clip, & Classifier | AI Engine Frequency (1 GHz) |
`mm2s` & `s2mm` | 150 MHz and 100 MHz (`v++ -c` & `v++ -l`) |

For detailed information, see the Clocking the PL Kernels section here.
IMPORTANT: Before beginning the tutorial, make sure you have installed the Vitis 2024.1 software. The Vitis release includes all the embedded base platforms including the VCK190 base platform that is used in this tutorial. In addition, ensure you have downloaded the Common Images for Embedded Vitis Platforms from this link: https://www.xilinx.com/support/download/index.html/content/xilinx/en/downloadNav/embedded-platforms/2024.1.html The common image package contains a prebuilt Linux kernel and root file system that can be used with the AMD Versal™ board for embedded design development using Vitis.
Before starting this tutorial, run the following steps:
- Go to the directory where you have unzipped the Versal Common Image package.
- In a Bash shell, run the `/Common Images Dir/xilinx-versal-common-v2024.1/environment-setup-cortexa72-cortexa53-xilinx-linux` script. This script sets up the `SDKTARGETSYSROOT` and `CXX` variables. If the script is not present, you must run `/Common Images Dir/xilinx-versal-common-v2024.1/sdk.sh`.
- Set up your `ROOTFS` and `IMAGE` to point to the `rootfs.ext4` and `Image` files located in the `/Common Images Dir/xilinx-versal-common-v2024.1` directory.
- Set up your `PLATFORM_REPO_PATHS` environment variable to `$XILINX_VITIS/lin64/Vitis/2024.1/base_platforms/`.
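The setup steps above can be sketched as a small shell snippet. This is only an illustration: `COMMON_IMAGE_DIR` is a hypothetical variable standing in for the directory where you unzipped the common image package, and `XILINX_VITIS` is assumed to have been set already by your Vitis installation.

```shell
# Hypothetical variable: where the common image package was unzipped.
COMMON_IMAGE_DIR="${COMMON_IMAGE_DIR:-/path/to/xilinx-versal-common-v2024.1}"

# Source the SDK environment script if it exists; otherwise point to sdk.sh.
ENV_SCRIPT="$COMMON_IMAGE_DIR/environment-setup-cortexa72-cortexa53-xilinx-linux"
if [ -f "$ENV_SCRIPT" ]; then
    . "$ENV_SCRIPT"   # sets SDKTARGETSYSROOT and CXX
else
    echo "Environment script not found; run $COMMON_IMAGE_DIR/sdk.sh first." >&2
fi

# Root file system and kernel image used later by the package step.
export ROOTFS="$COMMON_IMAGE_DIR/rootfs.ext4"
export IMAGE="$COMMON_IMAGE_DIR/Image"

# Base platform location (assumes XILINX_VITIS is already set).
export PLATFORM_REPO_PATHS="${XILINX_VITIS:-}/lin64/Vitis/2024.1/base_platforms/"
```

Adjust the paths to match your own installation before using the variables in later steps.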
This tutorial targets the VCK190 production board for the 2024.1 release.
You will learn the following:
- Clocking in Versal for PL and AIE kernels using the `--freqhz` directive.
The ADF graph has connections to the PL through the PLIO interfaces. These interfaces can have reference clocking either from the `graph.cpp` through the `PLIO()` constructor or through the `--pl-freq` option. This helps determine what kind of clock can be set on the PL kernels that connect to the PLIO. Here you will set the reference frequency to 200 MHz for all PLIO interfaces.
NOTE: If you do not specify `--pl-freq`, it is set to 1/4 of the AI Engine frequency.
v++ -c --mode aie --target=hw --include="$XILINX_VITIS/aietools/include" --include="./aie" --include="./data" --include="./aie/kernels" --include="./" --freqhz=200000000 --aie.workdir=./Work aie/graph.cpp
Flag | Description |
---|---|
--target | Selects how the compiler builds the graph. Default is `hw`. |
--include | All the typical include files needed to build the graph. |
--freqhz=200000000 | Sets all PLIO reference frequencies. The value is given in Hz (here 200 MHz). |
--aie.workdir | The location where the work directory will be created. |
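The link between the MHz figures in the text and the Hz values passed to `--freqhz`, together with the quarter-rate default mentioned in the note above, can be sketched with shell arithmetic. The helper names are illustrative only, and the 1 GHz AI Engine frequency is taken from the table at the start of this tutorial.

```shell
# Illustrative helpers; not part of the tutorial sources.
mhz_to_hz() { echo $(( $1 * 1000000 )); }

# Default PLIO reference frequency: 1/4 of the AI Engine frequency (both in Hz).
default_plio_hz() { echo $(( $1 / 4 )); }

AIE_HZ=$(mhz_to_hz 1000)   # 1 GHz AI Engine frequency
echo "--freqhz=$(mhz_to_hz 200)"
echo "Default PLIO reference without an explicit setting: $(default_plio_hz "$AIE_HZ") Hz"
```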
In this design, you will use three kernels, MM2S, S2MM, and Polar_Clip, to connect to the PLIO. MM2S and S2MM are AXI memory-mapped to AXI4-Stream HLS designs that handle mapping from DDR and streaming the data to the AI Engine. Polar_Clip is a free-running kernel that contains only two AXI4-Stream interfaces (input and output); it receives data from the AI Engine, processes it, and sends it back to the AI Engine. Clocking of these PLIO kernels is separate from the ADF graph and is specified when compiling each kernel and when linking the design together. There are different methods to achieve clocking.
Run the following commands.
v++ -c --mode hls --platform $PLATFORM_REPO_PATHS/xilinx_vck190_base_202410_1/xilinx_vck190_base_202410_1.xpfm \
    --freqhz=150000000 --config pl_kernels/mm2s.cfg
v++ -c --mode hls --platform $PLATFORM_REPO_PATHS/xilinx_vck190_base_202410_1/xilinx_vck190_base_202410_1.xpfm \
    --freqhz=150000000 --config pl_kernels/s2mm.cfg
v++ -c --mode hls --platform $PLATFORM_REPO_PATHS/xilinx_vck190_base_202410_1/xilinx_vck190_base_202410_1.xpfm \
    --freqhz=200000000 --config ./pl_kernels/polar_clip.cfg
OR use MHz, for example:
v++ -c --mode hls --platform $PLATFORM_REPO_PATHS/xilinx_vck190_base_202410_1/xilinx_vck190_base_202410_1.xpfm \
    --freqhz=150MHz --config pl_kernels/mm2s.cfg
OR prepare a config file and pass it during v++ compile, for example:
v++ -c --mode hls --platform $PLATFORM_REPO_PATHS/xilinx_vck190_base_202410_1/xilinx_vck190_base_202410_1.xpfm \
    --config ./pl_kernels/polar_clip.cfg
In polar_clip.cfg:
[hls]
flow_target=vitis
syn.file=polar_clip.cpp
syn.cflags=-I.
syn.top=polar_clip
package.ip.name=polar_clip
package.output.syn=true
package.output.format=xo
package.output.file=polar_clip.xo
freqhz=200MHz
A brief explanation of the `v++` options:
Flag/Switch | Description |
---|---|
-c | Tells `v++` to run the compiler. |
--mode | Tells `v++` to run the HLS mode for the PL compilation. |
--platform | (required) The platform to be compiled towards. |
--freqhz | Tells the Vitis compiler to target a specific clock frequency, given either in Hz (for example, 150000000) or with a unit suffix (for example, 150MHz). Specifying this helps the compiler make optimizations based on kernel timing. |
--config | Specifies the kernel config file that contains settings for synthesis, such as the top function and kernel name. |
For additional information, see Vitis Compiler Command.
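The three compile commands above differ only in the kernel name and the clock frequency, so the mapping can be expressed as a short loop. The sketch below is a dry run that only prints the commands; `hls_compile_cmds` is a hypothetical helper name, and the platform path assumes `PLATFORM_REPO_PATHS` has been set as described earlier.

```shell
# Dry-run sketch: print one v++ HLS compile command per PL kernel.
hls_compile_cmds() {
    platform="${PLATFORM_REPO_PATHS:-}/xilinx_vck190_base_202410_1/xilinx_vck190_base_202410_1.xpfm"
    # Each entry is <kernel>:<compile frequency in Hz>, as used in this tutorial.
    for spec in mm2s:150000000 s2mm:150000000 polar_clip:200000000; do
        kernel="${spec%%:*}"
        hz="${spec#*:}"
        echo v++ -c --mode hls --platform "$platform" \
            --freqhz="$hz" --config "pl_kernels/$kernel.cfg"
    done
}

hls_compile_cmds
```

Removing the `echo` would run the commands instead of printing them, provided `v++` is on your `PATH`.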
After completion, you will have the `mm2s.xo`, `s2mm.xo`, and `polar_clip.xo` files ready to be used by `v++`. The host application will communicate with these kernels to read/write data into memory.
Now that you have a compiled graph (`libadf.a`) and the PLIO kernels (`mm2s.xo`, `s2mm.xo`, and `polar_clip.xo`), you can link everything up for the VCK190 platform.
A few things to remember in this step:
- For PLIO kernels, you need to specify their connectivity for the system.
- Specify the clocking per PL kernel.
- You need to determine the `TARGET`: `hw` or `hw_emu`.
To link kernels up to the platform and AI Engine, you will need to look at the `system.cfg` file. For this design, the config file looks like this:
[connectivity]
nk=mm2s:1:mm2s
nk=s2mm:1:s2mm
nk=polar_clip:1:polar_clip
stream_connect=mm2s.s:ai_engine_0.DataIn1
stream_connect=ai_engine_0.clip_in:polar_clip.in_sample
stream_connect=polar_clip.out_sample:ai_engine_0.clip_out
stream_connect=ai_engine_0.DataOut1:s2mm.s
Here you might notice some connectivity and clocking options.
- `nk`: This defines your PL kernels as such: `<kernel>:<count>:<naming>`. For this design, you only have one each of the `s2mm`, `mm2s`, and `polar_clip` kernels.
- `stream_connect`: This tells `v++` how to hook up the kernels defined above to the AI Engine instance. Remember, AI Engine only handles stream interfaces.
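To make the `<kernel>:<count>:<naming>` layout of an `nk` entry concrete, the following sketch splits one entry into its three fields. `parse_nk` is a hypothetical helper written for illustration; it is not something `v++` provides.

```shell
# Split an nk entry of the form nk=<kernel>:<count>:<naming> into its fields.
parse_nk() {
    entry="${1#nk=}"          # drop the leading "nk="
    IFS=: read -r kernel count names <<EOF
$entry
EOF
    echo "kernel=$kernel count=$count names=$names"
}

parse_nk "nk=polar_clip:1:polar_clip"
```

The instance name in the third field is what appears on the left or right of each `stream_connect` entry, for example `polar_clip.in_sample`.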
With the changes made, you can now run the following command. The `v++` link command offers three ways to direct clocking at the linker stage: `--clock-id=<id_value>`, `--freqhz`, and `--clock.freqHz`.
v++ --link --target hw --platform $PLATFORM_REPO_PATHS/xilinx_vck190_base_202410_1/xilinx_vck190_base_202410_1.xpfm \
    pl_kernels/s2mm.xo pl_kernels/mm2s.xo pl_kernels/polar_clip.xo ./aie/libadf.a --freqhz=200000000:mm2s.ap_clk --freqhz=200000000:s2mm.ap_clk \
    --config system.cfg --save-temps -o tutorial1.xsa
OR use the `system.cfg` file to direct the clock, using the global `freqhz` option and the `[clock]` directive.
[connectivity]
nk=mm2s:1:mm2s
nk=s2mm:1:s2mm
nk=polar_clip:1:polar_clip
sc=mm2s.s:ai_engine_0.DataIn1
sc=ai_engine_0.clip_in:polar_clip.in_sample
sc=polar_clip.out_sample:ai_engine_0.clip_out
sc=ai_engine_0.DataOut1:s2mm.s
freqhz=200MHz:s2mm.ap_clk
[clock]
freqHz=100000000:polar_clip.ap_clk
Flag/Switch | Description |
---|---|
--link | Tells `v++` that it will be linking a design, so only the `*.xo` and `libadf.a` files are valid inputs. |
--target | Tells `v++` how far of a build it should go: hardware (which will build down to a bitstream) or hardware emulation (which will build the emulation models). |
--platform | Same as in the previous steps. |
--freqhz | Tells the Vitis compiler to use a specific clock frequency for a kernel clock pin, given either in Hz (for example, 200000000) or with a unit suffix (for example, 200MHz). |
--config | Specifies the config file (here `system.cfg`) that contains the connectivity and clocking settings for the link step. |
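The per-kernel `--freqhz=<Hz>:<kernel>.ap_clk` arguments used in the link command above follow a regular pattern, which the sketch below generates from `<kernel>=<MHz>` pairs. `link_clock_flags` is a hypothetical helper for illustration; it only builds the argument string and does not call `v++`.

```shell
# Build --freqhz flags for the link step from <kernel>=<MHz> pairs.
link_clock_flags() {
    flags=""
    for spec in "$@"; do
        kernel="${spec%%=*}"
        mhz="${spec#*=}"
        # Convert MHz to Hz and target the kernel's ap_clk pin.
        flags="$flags --freqhz=$(( mhz * 1000000 )):${kernel}.ap_clk"
    done
    printf '%s\n' "${flags# }"
}

link_clock_flags mm2s=200 s2mm=200
```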
Once linking is done, you can view the clock report that `v++ --link` generates after pre-synthesis: `automation_summary_pre_synthesis.txt`
**IMPORTANT: Do not change anything in this view. This is only for demonstration purposes.**
- The AIE compile frequency is 200 MHz (as given in the command in step 1).
- The PL kernel compile frequency for mm2s is 150 MHz (as given in the command in step 2.1).
- The PL kernel compile frequency for s2mm is 150 MHz (as given in the command in step 2.2).
- The PL kernel compile frequency for Polar_clip is 200 MHz (as given in the command in step 2.3).
To check the platform clock frequencies, run the following command in a terminal: platforminfo /proj/xbuilds/2024.1_daily_latest/internal_platforms/xilinx_vck190_base_202320_1/xilinx_vck190_base_202320_1.xpfm
The clock frequencies requested for linking come from the following sources:
* Clock frequency used in linking for mm2s = 200 MHz (CLI)
* Clock frequency used in linking for s2mm = 200 MHz (CLI)
* Clock frequency used in linking for polar_clip = 100 MHz (config file)
Because these requested frequencies do not exactly match any platform clock frequency, Vitis picks the platform clock that falls within the default tolerance (+/- 10%) of each requested frequency. If a link frequency is outside the tolerance of every platform clock, Vitis instantiates a new MMCM to generate the requested clock frequency.
For linking, Vitis therefore uses the following clock frequencies:
For mm2s:
Frequency given during linking = 200 MHz
Frequency used by Vitis = 208.33 MHz (platform clock within the default tolerance of the requested frequency)
For s2mm:
Frequency given during linking = 200 MHz
Frequency used by Vitis = 208.33 MHz (platform clock within the default tolerance of the requested frequency)
For polar_clip:
Frequency given during linking = 100 MHz
Frequency used by Vitis = 104.17 MHz (platform clock within the default tolerance of the requested frequency)
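The tolerance rule can be checked by hand with a small calculation. The sketch below is only an arithmetic illustration of the +/- 10% rule described above, not the actual selection logic inside Vitis. `within_tolerance` is a hypothetical helper, and the 150 MHz case is an invented example compared against a single platform clock.

```shell
# within_tolerance REQUESTED_MHZ PLATFORM_MHZ [TOLERANCE_PERCENT]
# Succeeds when the platform clock deviates from the request by at most the tolerance.
within_tolerance() {
    awk -v req="$1" -v plat="$2" -v tol="${3:-10}" 'BEGIN {
        dev = (plat - req) / req * 100   # deviation in percent
        if (dev < 0) dev = -dev
        if (dev <= tol) exit 0
        exit 1
    }'
}

# Requests from this tutorial: both platform clocks deviate by roughly 4.2%.
within_tolerance 200 208.33 && echo "200 MHz request: 208.33 MHz platform clock is within tolerance"
within_tolerance 100 104.17 && echo "100 MHz request: 104.17 MHz platform clock is within tolerance"

# Hypothetical request, compared against the 208.33 MHz clock only.
within_tolerance 150 208.33 || echo "150 MHz request: 208.33 MHz is outside tolerance"
```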
NOTE: Any change to the `system.cfg` file can also be done on the command line. Make sure to familiarize yourself with the Vitis compiler options by referring to the documentation here.
When the `v++` linker is complete, you can compile the host code that will run on the Linux that comes with the platform. Compiling code for the design requires the location of the `SDKTARGETSYSROOT`, or a representation of the root file system, that can be used to cross-compile the host code.
- Open `./sw/host.cpp`, and familiarize yourself with the contents. Pay close attention to API calls and the comments provided.

  Note that Xilinx Runtime (XRT) is used in the host application. This API layer is used to communicate with the PL, specifically the PLIO kernels for reading and writing data. To understand how to use this API in an AI Engine application, see Programming the PS Host Application.

  The output size of the kernel run is half of what was allocated earlier. This is something to keep in mind. Because the `s2mm` kernel changes from a 32-bit input/output to a 64-bit input/output, the kernel call must be adjusted. If it is not changed, the application hangs because XRT waits for the full length to be processed when in reality only half the count is transferred (even though all the data is present). In `host.cpp`, comment out lines 117 and 118, and make sure the following line is uncommented: `xrtRunHandle s2mm_rhdl = xrtKernelRun(s2mm_khdl, out_bohdl, nullptr, sizeOut/2);`
- Open the `Makefile`, and familiarize yourself with the contents. Take note of the `GCC_FLAGS` and `GCC_LIB`.
  - `GCC_FLAGS`: Should be self-explanatory that you will be compiling this code with C++.
  - `GCC_LIB`: Has the list of all the specific libraries you will be compiling and linking with. This is the minimum list of libraries needed to compile an AI Engine application for Linux.
- Close the Makefile and run the command: `make host`.
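The halved count passed to the `s2mm` kernel in `host.cpp` follows from the interface width: the same number of bytes needs half as many transfers when each transfer is 64 bits wide instead of 32. The sketch below shows the arithmetic; `transfers_needed` is a hypothetical helper and the 4096-byte buffer size is an invented example, not a value from the tutorial.

```shell
# transfers_needed TOTAL_BYTES WIDTH_BITS
# Number of stream transfers required to move TOTAL_BYTES over an interface WIDTH_BITS wide.
transfers_needed() { echo $(( $1 * 8 / $2 )); }

SIZE_BYTES=4096   # hypothetical output buffer size
echo "32-bit interface: $(transfers_needed "$SIZE_BYTES" 32) transfers"
echo "64-bit interface: $(transfers_needed "$SIZE_BYTES" 64) transfers"
```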
With the host application fully compiled, you can now move to packaging the entire system.
To run the design on hardware using an SD card, you need to package all the files created. For a Linux application, you must make sure that the generated `.xclbin`, `libadf.a`, and all Linux info from the platform are in an easy-to-copy directory.
- Open the `Makefile` with your editor of choice, and familiarize yourself with the contents specific to the `package` task.
- In an easier-to-read command-line view, here is the command:

  v++ --package --target hw --platform $PLATFORM_REPO_PATHS/xilinx_vck190_base_202410_1/xilinx_vck190_base_202410_1.xpfm \
      --package.rootfs ${ROOTFS} \
      --package.kernel_image ${IMAGE} \
      --package.boot_mode=sd \
      --package.image_format=ext4 \
      --package.defer_aie_run \
      --package.sd_file host.exe \
      tutorial1.xsa libadf.a

  NOTE: Remember to change the `${ROOTFS}` and `${IMAGE}` to the proper paths.

  Here you are invoking the packaging capabilities of `v++` and defining how it needs to package your design.

  Switch/Flag | Description |
  ---|---|
  --package.rootfs | Specifies the root file system to be used. This tutorial uses the pre-built one from the platform. |
  --package.kernel_image | Specifies the Linux kernel image to be used. This tutorial also uses the pre-built one from the platform. |
  --package.boot_mode | Specifies how the design is to be booted. For this tutorial, an SD card is used, and the packager creates a directory with all the contents needed to boot from one. |
  --package.image_format | Tells the packager the format of the kernel image and root file system. For Linux, this should be `ext4`. |
  --package.defer_aie_run | Tells the packager to build the boot system so that the AI Engine is programmed but does not start executing. In some designs, you do not want the AI Engine to run until the application is fully loaded. |
  --package.sd_file | Tells the packager what additional files need to be copied to the `sd_card` directory and image. |
- Run the command: `make package`.
- When the packaging is complete, run `cd ./sw && ls` and notice that several new files were created, including the `sd_card` directory.
- Format the SD card with the `sd_card.img` file.
When running the VCK190 board, make sure you have the right onboard switches flipped for booting from the SD card.
- Insert the SD card and turn ON the board.
- Wait for the Linux command prompt to be available on an attached monitor and keyboard.
- To run your application, enter the command: `./host.exe a.xclbin`.
- You should see TEST PASSED, which means that the application ran successfully.
IMPORTANT: To re-run the application, you must power cycle the board.
By modifying the target in both Step 3 and Step 5, you can link and package the design for hardware emulation and run the emulation with the generated script, `launch_hw_emu.sh`.
In this tutorial you learned the following:
- How to adjust clocking for PL kernels and PLIO kernels
- How to modify the `v++` linker options through the command line, as well as the config file
- How datawidth converters, clock-domain crossing, and FIFOs are inserted in `v++`
- How to run an AI Engine application on a VCK190 board
Copyright © 2020–2024 Advanced Micro Devices, Inc