AI_Engine_Development/AIE/Design_Tutorials/08-n-body-simulator/Module_03_pl_kernels/README.md
Lines changed: 2 additions & 5 deletions
@@ -62,7 +62,7 @@ After coming up with 400 tile AI Engine design, the next step is the come up wit
|`packet_receiver`|Packet switching kernel that evaluates packet headers from incoming streams and reroutes data to one of 4 AXI4-Streams|499.5 MHz|
|`s2mm_mp`|Quad-channel data-mover that moves data from AXI4-Stream to DDR.|411 MHz|

-Using Vivado timing closure techniques, you can increase the FMax if needed. To showcase the example, integrate using the 300 MHz clock. There is also a 400 MHz timing-closed design in the [beamforming tutorial](https://github.com/Xilinx/Vitis-Tutorials/tree/master/AI_Engine_Development/Design_Tutorials/03-beamforming).
+Using Vivado timing closure techniques, you can increase the FMax if needed. To showcase the example, integrate using the 300 MHz clock. There is also a 400 MHz timing-closed design in the [beamforming tutorial](../../03-beamforming).
@@ -95,10 +95,7 @@ The `s2mm_mp` kernel is generated from the `kernel/spec.json` specification. Rev
After compiling the PL datamover kernels, you are ready to link the entire hardware design together in the next module, [Module 04 - Full System Design](../Module_04_full_system_design).
AI_Engine_Development/AIE/Design_Tutorials/08-n-body-simulator/Module_04_full_system_design/README.md
Lines changed: 2 additions & 2 deletions
@@ -50,9 +50,9 @@ The following image was taken from the Vivado project for the entire design. It
## References

-*[Beamforming Tutorial - Module_04 - AI Engine and PL Integration](https://github.com/Xilinx/Vitis-Tutorials/tree/master/AI_Engine_Development/Design_Tutorials/03-beamforming)
+*[Beamforming Tutorial - Module_04 - AI Engine and PL Integration](../../03-beamforming)
After compiling the host software, you are ready to create the sd_card.img and run the design on hardware in the next module, [Module 06 - SD Card and Hardware Run](../Module_06_sd_card_and_hw_run).
AI_Engine_Development/AIE/Design_Tutorials/08-n-body-simulator/Module_07_results/README.md
Lines changed: 2 additions & 2 deletions
@@ -33,8 +33,8 @@ Following is a table comparing the executions times to simulate 12,800 particles
|Name|Hardware|Algorithm|Average Execution Time for 1 Timestep (seconds)|
|---|---|--|---|
|Python NBody Simulator|x86 Linux Machine|O(N)|14.96|
-|C++ NBody Simulator|A72 Embedded Arm Processor|O(N<sup>2</sup>)|120.487|
-|AI Engine NBody Simulator|Versal AI Engine IP|O(N)|0.0118|
+|C++ NBody Simulator|A72 Embedded Arm Processor|O(N<sup>2</sup>)|120.591|
+|AI Engine NBody Simulator|Versal AI Engine IP|O(N)|0.0074065|

As you can see, the N-Body Simulator implemented on the AI Engine offers a x2,800 improvement over the Python O(N) implementation and a x24,800 improvement over the C++ O(N<sup>2</sup>) implementation. A vectorized C++ NBody Simulator O(N) implementation can be created with pthreads, but is left as an exercise for the user.
1. Obtain a license to enable beta devices in AMD tools (to use the VCK190 platform).
2. Obtain licenses for AI Engine tools.
3. Follow the instructions for the [Vitis Software Platform Installation](https://docs.amd.com/r/en-US/ug1393-vitis-application-acceleration/Vitis-Software-Platform-Installation) and ensure you have the following tools:

*[Vitis™ Unified Software Development Platform 2024.2](https://docs.amd.com/v/u/en-US/ug1416-vitis-documentation)
-*[Xilinx® Runtime and Platforms (XRT)](https://docs.amd.com/r/en-US/ug1393-vitis-application-acceleration/Installing-Xilinx-Runtime-and-Platforms)
*[Embedded Platform VCK190 Base or VCK190 Base](https://www.xilinx.com/support/download/index.html/content/xilinx/en/downloadNav/embedded-platforms.html)

### *Environment*: Setting Up Your Shell Environment
@@ -83,17 +80,15 @@ which aiecompiler
### HPC Applications
The goal of this tutorial is to create a general-purpose floating point accelerator for HPC applications. This tutorial demonstrates a x24,800 performance improvement using the AI Engine accelerator over the naive C++ implementation on the A72 embedded Arm® processor.

-#### A similar accelerator example was implemented on the AMD UltraScale+™-based Ultra96 device using only PL resources [here](https://www.hackster.io/rajeev-patwari-ultra96-2019/ultra96-fpga-accelerated-parallel-n-particle-gravity-sim-87f45e).
-

|Name|Hardware|Algorithm Complexity|Average Execution Time to Simulate 12,800 Particles for 1 Timestep (seconds)|
|---|---|--|---|
|Python N-Body Simulator|x86 Linux Machine|O(N)|14.96|
-|C++ N-Body Simulator|A72 Embedded Arm Processor|O(N<sup>2</sup>)|120.487|
-|AI Engine N-Body Simulator|Versal AI Engine IP|O(N)|0.0118|
+|C++ N-Body Simulator|A72 Embedded Arm Processor|O(N<sup>2</sup>)|120.591|
+|AI Engine N-Body Simulator|Versal AI Engine IP|O(N)|0.007405|

### PL Data-Mover Kernels
-Another goal of this tutorial is to showcase how to generate PL Data-Mover kernels from the [AMD Vitis Utility Library](https://docs.amd.com/r/en-US/Vitis_Libraries/utils/datamover/kernel_gen_guide.html). These kernels move any amount of data from DDR buffers to AXI-Streams.
+Another goal of this tutorial is to showcase how to generate PL Data-Mover kernels. These kernels move any amount of data from DDR buffers to AXI-Streams.

## The N-Body Problem
The N-Body problem is the problem of predicting the motions of a group of N objects that each exert a gravitational force on one another. For any particle `i` in the system, the summation of the gravitational forces from all the other particles results in the acceleration of particle `i`. From this acceleration, we can calculate what a particle's velocity and position (`x y z vx vy vz`) will be in the next timestep. Newtonian physics describes the behavior of very large bodies/particles within our universe. With certain assumptions, the laws can be applied to bodies/particles ranging from astronomical size to a golf ball (and even smaller).
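
The force summation and timestep update described above can be illustrated with a minimal Python sketch. It is not the tutorial's implementation: the function names `accelerations` and `step`, the `SOFTENING` term, the SI value of `G`, and the explicit Euler update are all assumptions made for illustration. Each particle is a tuple `(x, y, z, vx, vy, vz)`.

```python
import math

G = 6.674e-11      # gravitational constant in SI units (assumed unit system)
SOFTENING = 1e-9   # hypothetical small term that avoids division by zero


def accelerations(particles, masses, g=G):
    """Sum the gravitational pull of every other particle on each particle i."""
    result = []
    for i, (xi, yi, zi, _, _, _) in enumerate(particles):
        ax = ay = az = 0.0
        for j, (xj, yj, zj, _, _, _) in enumerate(particles):
            if i == j:
                continue  # a particle exerts no force on itself
            dx, dy, dz = xj - xi, yj - yi, zj - zi
            dist_sq = dx * dx + dy * dy + dz * dz + SOFTENING
            # a_i += g * m_j * r_ij / |r_ij|^3
            scale = g * masses[j] / (dist_sq * math.sqrt(dist_sq))
            ax += scale * dx
            ay += scale * dy
            az += scale * dz
        result.append((ax, ay, az))
    return result


def step(particles, masses, dt, g=G):
    """Advance every particle by one timestep (explicit Euler, an assumption)."""
    acc = accelerations(particles, masses, g)
    updated = []
    for (x, y, z, vx, vy, vz), (ax, ay, az) in zip(particles, acc):
        # Positions use the current velocity; velocities use the new acceleration.
        updated.append((x + vx * dt, y + vy * dt, z + vz * dt,
                        vx + ax * dt, vy + ay * dt, vz + az * dt))
    return updated
```

Because every particle visits every other particle, this naive loop performs on the order of N<sup>2</sup> interactions per timestep.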
@@ -272,8 +267,6 @@ By default, the Makefiles build the design for the VCK190 Production board (i.e.
*[N-body problem wiki page](https://en.wikipedia.org/wiki/N-body_problem)
Let's get started with running the python model of the N-Body simulator on an x86 machine in [Module 01 - Python Simulations on x86](Module_01_python_sims).