Scalability optimizations for multicore soft processors
Leppänen, Topi (2021)
Master's Programme in Electrical Engineering
Faculty of Information Technology and Communication Sciences
This publication is copyrighted. You may download, display and print it for Your own personal use. Commercial use is prohibited.
Date of approval
2021-03-10
The permanent address of the publication is
https://urn.fi/URN:NBN:fi:tuni-202102122067
Abstract
Growth in single-core performance and energy efficiency has been stagnating for decades. Multicore systems are an efficient way to do parallel computing at the level of threads. Additionally, specializing the processor architecture to better fit the application at hand is one approach to achieving the required energy and performance improvements.
TCEMC is a toolset under development at Tampere University which generates multicore application-specific instruction set processors. This thesis evaluates and improves the toolset. The toolset's ability to scale up the number of cores is tested by seeing how many cores can be fitted on a small field-programmable gate array device.
An 8-fold increase in performance is achieved with 24 cores compared to the equivalent single-core system. The external memory bandwidth can be utilized with 11.4% efficiency. The thesis highlights the bottlenecks of the multicore soft processors that remain to be solved to unleash the full potential of customized multicore systems. The most important of these is scalar access to the external memory, which is shown to be an inefficient way to utilize the external memory bandwidth.
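To put the scaling figure in perspective, the reported speedup can be turned into a per-core parallel efficiency. The sketch below is illustrative arithmetic only and is not taken from the thesis: the function name `parallel_efficiency` is hypothetical, and the inputs are the rounded values quoted in the abstract (an approximately 8-fold speedup on 24 cores).

```python
def parallel_efficiency(speedup: float, cores: int) -> float:
    """Return speedup per core; 1.0 would mean perfectly linear scaling."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return speedup / cores


# Rounded figures from the abstract (assumption: "8-fold" is taken as 8.0).
efficiency = parallel_efficiency(8.0, 24)
print(f"parallel efficiency: {efficiency:.1%}")  # prints "parallel efficiency: 33.3%"
```

Under these assumptions each core delivers about a third of its ideal contribution, which is consistent with the abstract's point that bottlenecks such as external memory access remain.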