Comparative study of DMA controllers in Pulp Systems and SoC-Hub
Amjad, Zeeshan (2025)
Tietotekniikan DI-ohjelma - Master's Programme in Information Technology
Informaatioteknologian ja viestinnän tiedekunta - Faculty of Information Technology and Communication Sciences
This publication is copyrighted. You may download, display and print it for Your own personal use. Commercial use is prohibited.
Acceptance date
2025-12-05
The permanent address of the publication is
https://urn.fi/URN:NBN:fi:tuni-2025120311233
Abstract
In modern embedded systems, Direct Memory Access (DMA) improves data movement within the system. This thesis presents a comparative analysis of DMA operations in the Parallel Ultra-Low-Power (PULP) platform and the SoC-Hub architecture. The work focuses on data transfers between L1 and L2 memory in PULP and evaluates SoC-Hub's Command-Execute-Interrupt DMA model, which integrates AXI4-Lite configuration, an AXI4 master transfer interface, 32/64-bit data widths, burst capability, and hardware-assisted ping-pong synchronization through its modular Burst Transfer Unit (BTU). Three experiments were performed on the PULP system to measure latency, throughput, and data transfer stability under different synchronization settings. Latency experiments measured transfer delay across buffer sizes, showing higher cycle counts for small transfers due to setup overhead and steady performance for larger ones. Throughput analysis assessed the data rate per cycle, revealing improved efficiency with increasing buffer size up to a saturation point. Sweep experiments varied transfer configurations and identified balanced synchronization as the most efficient setup, minimizing stalls and contention in the memory system. The findings show that DMA efficiency in PULP depends on proper tuning of buffer size and synchronization parameters. In comparison, SoC-Hub's architecture was evaluated theoretically; its double-register ping-pong mechanism and modular BTU design aim to minimize reprogramming latency, improve data-pipeline continuity, and support scalable, high-performance memory operations.
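The effect described above, where small transfers suffer from setup overhead while larger ones approach a saturation point, can be illustrated with a minimal analytic model. This is a sketch with assumed constants (a fixed setup cost of 50 cycles and a 64-bit data path moving 8 bytes per cycle), not measured PULP or SoC-Hub numbers:

```python
# Analytic sketch of DMA throughput vs. buffer size.
# Assumptions (hypothetical, for illustration only):
#   SETUP_CYCLES    - fixed cost of programming the DMA and starting a transfer
#   BYTES_PER_CYCLE - sustained streaming rate of a 64-bit data path
SETUP_CYCLES = 50
BYTES_PER_CYCLE = 8

def transfer_cycles(buffer_bytes: int) -> int:
    """Total cycles for one transfer: fixed setup plus streaming time."""
    return SETUP_CYCLES + buffer_bytes // BYTES_PER_CYCLE

def throughput(buffer_bytes: int) -> float:
    """Effective bytes moved per cycle, including setup overhead."""
    return buffer_bytes / transfer_cycles(buffer_bytes)

for size in (64, 256, 1024, 4096, 16384):
    print(f"{size:6d} B -> {throughput(size):.2f} bytes/cycle")
```

With these assumed constants, throughput rises steeply at first and then flattens out as the streaming time dominates the fixed setup cost, approaching the 8 bytes/cycle limit of the data path; this mirrors the saturation behaviour reported for the PULP latency and throughput experiments.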
