Description
The mainline Xilinx DMA driver creates descriptors strictly on page boundaries, which
limits each descriptor to only 4096 bytes and uses descriptors very inefficiently.
I have reworked the DMA procedure from the ground up. It merges contiguous
address ranges, in both physical and DMA (bus) address spaces, into as few descriptors as
possible. This reduces transfer overhead, as larger transfers no longer have to be split
and interrupted to be iteratively submitted to the IP core. It is particularly optimised
for modern CPUs with an IOMMU, which may be able to map the whole data buffer into a
contiguous DMA address range.
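
To illustrate the merging idea, here is a minimal userspace sketch in C. It is not the code from the reworked driver: the names `dma_seg`, `merge_segments` and `DESC_MAX_LEN` are hypothetical, and the per-descriptor limit assumes a 28-bit length field in the descriptor.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One contiguous piece of the buffer in DMA (bus) address space. */
struct dma_seg {
    uint64_t addr;
    uint64_t len;
};

/* Assumption: the descriptor length field is 28 bits wide. */
#define DESC_MAX_LEN ((uint64_t)0x0FFFFFFF)

/*
 * Merge address-contiguous segments into as few descriptors as possible.
 * Returns the number of descriptors written to out, or SIZE_MAX if
 * out_cap is too small.
 */
size_t merge_segments(const struct dma_seg *in, size_t n,
                      struct dma_seg *out, size_t out_cap)
{
    size_t used = 0;

    assert(in != NULL && out != NULL);

    for (size_t i = 0; i < n; i++) {
        uint64_t addr = in[i].addr;
        uint64_t left = in[i].len;

        while (left > 0) {
            uint64_t take;

            /* Extend the last descriptor if this piece starts where it ends. */
            if (used > 0 &&
                out[used - 1].addr + out[used - 1].len == addr &&
                out[used - 1].len < DESC_MAX_LEN) {
                uint64_t room = DESC_MAX_LEN - out[used - 1].len;

                take = left < room ? left : room;
                out[used - 1].len += take;
            } else {
                /* Otherwise start a new descriptor. */
                if (used == out_cap)
                    return SIZE_MAX;
                take = left < DESC_MAX_LEN ? left : DESC_MAX_LEN;
                out[used].addr = addr;
                out[used].len = take;
                used++;
            }
            addr += take;
            left -= take;
        }
    }
    return used;
}
```

With page-by-page descriptors, three adjacent 4096-byte pages cost three descriptors. In this sketch they collapse into a single 12288-byte descriptor, and a new descriptor is started only where the address range actually breaks.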
Furthermore, handling of the transfer has been drastically slimmed down to bring the
latency down as far as possible. It aims to implement Figures 5-7 from PG195 in the
most efficient manner. Among the improvements:
- The length of the adjacent descriptor blocks is adapted to the Maximum Read Request
Size (MRRS), as PG195 (p. 24) requires.
- The maximum number of descriptors per transfer could be adapted to the FIFO size and
number of channels (per p. 26 of PG195). However, due to a bug in the IP core
(see Known Issues), a transfer is limited to a single adjacent block. This is still
much larger than in the old driver: typically just under 4 GB, or double that if
the MRRS is 1024 bytes.
- The memory for engines is allocated dynamically, which saves a little
kernel memory.
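
The "just under 4 GB" figure can be reproduced with a little arithmetic. The sketch below assumes 32-byte descriptors, a 28-bit length field per descriptor, and that one adjacent block is fetched with a single read request of MRRS bytes. These assumptions are consistent with the numbers quoted above, but the constant and function names are hypothetical and not taken from the driver.

```c
#include <stdint.h>

#define SKETCH_DESC_BYTES   32u             /* assumed size of one descriptor */
#define SKETCH_DESC_LEN_MAX 0x0FFFFFFFull   /* assumed 28-bit length field */

/* Descriptors that fit into one adjacent block fetched with one read request. */
uint32_t descs_per_adjacent_block(uint32_t mrrs_bytes)
{
    return mrrs_bytes / SKETCH_DESC_BYTES;
}

/* Largest transfer that a single adjacent block can describe, in bytes. */
uint64_t max_single_block_transfer(uint32_t mrrs_bytes)
{
    return (uint64_t)descs_per_adjacent_block(mrrs_bytes) * SKETCH_DESC_LEN_MAX;
}
```

Under these assumptions, an MRRS of 512 bytes gives 16 descriptors and 4,294,967,280 bytes per transfer (16 bytes short of 4 GiB), and an MRRS of 1024 bytes gives 32 descriptors and twice that amount.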
To achieve these goals, the driver has the following limitations, and because of them it
won't be submitted as a PR here:
- Transfer queues and support for asynchronous I/O have been removed. Allegedly it is
broken, and almost no one uses it anyway.
- Backward support for Linux kernels is limited to version 4.12.
- Each engine may be opened and operated by only a single thread.
This makes it possible to eliminate most locking.
This reworked version includes my PRs (in stable form) as well as relevant PRs by others
(like alonbl's patch set) that don't concern the XDMA procedure (since it was thrown out in
full anyway).
Other features:
- Reworked poll mode as a compile-time option. Everything is done in the same thread, so
there are no core migrations or context switches.
- Reworked ioctl functions. Added the ability to submit a transfer request over ioctl. This is
primarily intended to circumvent the 2 GB limit for read and write operations in Linux.
- Reworked bypass BAR. It is now properly implemented, so it is possible to transmit
data on it. It could be useful for small transfers that require low and stable latency.
- Descriptor bypass is NOT supported.
You can find the reworked driver in my repo on the "reworked_xdma_main" branch:
https://github.com/Prandr/dma_ip_drivers
Further discussions are best conducted there.