Hyppää sisältöön
    • Suomeksi
    • In English
Trepo
  • Suomeksi
  • In English
  • Kirjaudu
Näytä viite 
  •   Etusivu
  • Trepo
  • TUNICRIS-julkaisut
  • Näytä viite
  •   Etusivu
  • Trepo
  • TUNICRIS-julkaisut
  • Näytä viite
JavaScript is disabled for your browser. Some features of this site may not work without it.

Efficient OpenCL system integration of non-blocking FPGA accelerators

Leppänen, Topi; Lotvonen, Atro; Mousouliotis, Panagiotis; Multanen, Joonas; Keramidas, Georgios; Jääskeläinen, Pekka (2023-03)

 
Avaa tiedosto
1_s2.0_S0141933123000182_main.pdf (1.325Mt)
Lataukset: 



Leppänen, Topi
Lotvonen, Atro
Mousouliotis, Panagiotis
Multanen, Joonas
Keramidas, Georgios
Jääskeläinen, Pekka
03 / 2023

Microprocessors and Microsystems
104772
doi:10.1016/j.micpro.2023.104772
Näytä kaikki kuvailutiedot
Julkaisun pysyvä osoite on
https://urn.fi/URN:NBN:fi:tuni-202303243135

Kuvaus

Peer reviewed
Tiivistelmä
<p>OpenCL functions as a portability layer for diverse heterogeneous hardware platforms including CPUs, GPUs, FPGAs, and hardware accelerators. However, OpenCL programs utilizing multiple of these devices in the same computing platform suffer from poor coordination between OpenCL implementations of different hardware vendors. This paper proposes a vendor-independent open source method for integrating custom FPGA accelerators into a common OpenCL platform. The accelerators are wrapped in a common hardware interface to enable efficient synchronization and data sharing between devices on the same chip. The provided software connects the accelerator to OpenCL runtime and enables the control of diverse FPGA accelerators with OpenCL command queues. The benefits of the integration methodology are demonstrated by creating FPGA accelerators with different development tools and integrating them together on two different types of FPGA devices while showing minimal integration overhead. Direct memory access of the accelerator to external memory is shown to increase the performance by a factor of 8. Non-blocking execution enabled by the on-chip synchronization between devices is shown to remove a 250 μs overhead from dependent kernel launches. Additionally, as a proof of concept and a case study, a fully OpenCL-controllable computing platform with two devices is implemented on an FPGA to compute CNN inference on a real-world input signal.</p>
Kokoelmat
  • TUNICRIS-julkaisut [23470]
Kalevantie 5
PL 617
33014 Tampereen yliopisto
oa[@]tuni.fi | Tietosuoja | Saavutettavuusseloste
 

 

Selaa kokoelmaa

TekijätNimekkeetTiedekunta (2019 -)Tiedekunta (- 2018)Tutkinto-ohjelmat ja opintosuunnatAvainsanatJulkaisuajatKokoelmat

Omat tiedot

Kirjaudu sisäänRekisteröidy
Kalevantie 5
PL 617
33014 Tampereen yliopisto
oa[@]tuni.fi | Tietosuoja | Saavutettavuusseloste