Compilation of Single Program Multiple Data kernels with independent subgroup forward progress on CPUs
Nevalainen, Tapio (2025)
Nevalainen, Tapio
2025
Master's Programme in Computer Science
Faculty of Information Technology and Communication Sciences
This publication is copyrighted. You may download, display and print it for your own personal use. Commercial use is prohibited.
Date of approval
2025-12-29
Permanent address of the publication
https://urn.fi/URN:NBN:fi:tuni-2025122812148
Abstract
Heterogeneous computing often uses Single Program Multiple Data (SPMD) programming models, such as OpenCL, which allow data-parallel problems to be parallelized scalably on various hardware architectures. The OpenCL framework abstracts work through kernel functions and concurrently executing work-items, the instances of a kernel, which can synchronize with each other at different granularities, termed subgroups and workgroups. When targeting CPUs, special compiler techniques are used to transform the kernels so that the parallel resources of CPUs can be leveraged efficiently. Enabling autovectorization, which entails the use of vector (SIMD) instructions, is an important part of this transformation.
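To make the kernel transformation more concrete, here is a minimal sketch of the general idea used by CPU kernel compilers: the code between work-group barriers is wrapped in a loop over local IDs, so that each barrier becomes the boundary between two loops. It is written in Python for readability rather than as OpenCL C or compiler output, and the kernel, the work-group size and the names `run_workgroup_looped` and `run_workgroup_naive` are hypothetical illustrations, not PoCL's actual code.

```python
# Hypothetical OpenCL C kernel being emulated (one work-group):
#   local int tmp[WG];
#   size_t lid = get_local_id(0);
#   tmp[lid] = in[lid] * 2;
#   barrier(CLK_LOCAL_MEM_FENCE);
#   out[lid] = tmp[(lid + 1) % WG];

def run_workgroup_looped(inp, wg_size):
    """Barrier-correct execution: one loop per region between barriers."""
    tmp = [0] * wg_size  # stands in for local memory
    out = [0] * wg_size
    # Region before the barrier: iterations are independent, so a C compiler
    # could vectorize this loop (several work-items per SIMD instruction).
    for lid in range(wg_size):
        tmp[lid] = inp[lid] * 2
    # The barrier is the loop boundary: all work-items have finished the
    # first region before any work-item starts the second one.
    for lid in range(wg_size):
        out[lid] = tmp[(lid + 1) % wg_size]
    return out

def run_workgroup_naive(inp, wg_size):
    """Incorrect: runs each work-item to completion, ignoring the barrier."""
    tmp = [0] * wg_size
    out = [0] * wg_size
    for lid in range(wg_size):
        tmp[lid] = inp[lid] * 2
        out[lid] = tmp[(lid + 1) % wg_size]  # may read a not-yet-written slot
    return out

print(run_workgroup_looped([1, 2, 3, 4], 4))  # [4, 6, 8, 2]
print(run_workgroup_naive([1, 2, 3, 4], 4))   # [0, 0, 0, 2]
```

The naive version reads neighbouring `tmp` slots before the other work-items have written them, which is why the barrier has to split the work-item loop rather than simply be dropped.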
This thesis work adds support for proper subgroup-level synchronization to the kernel compiler of PoCL, an open-source OpenCL implementation. More specifically, the added feature covers the special case of independent subgroup synchronization, which allows each subgroup of work-items to progress through the kernel program independently of the other subgroups. This allows PoCL to compile and execute programs that require this feature.
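The practical meaning of independent subgroup progress can be illustrated with a small sketch in which one subgroup waits for data produced by another. The abstract does not describe how PoCL implements the feature, so running each subgroup on its own OS thread is purely an assumption made to show the semantics, and the sizes `SG_SIZE` and `NUM_SG` and the function names are hypothetical. Within a subgroup, the work-items still run as a loop over lanes, which keeps that loop vectorizable.

```python
import threading

SG_SIZE = 4  # hypothetical subgroup size (e.g. the SIMD width)
NUM_SG = 2   # hypothetical number of subgroups in the work-group

def subgroup_body(sg_id, data, results, ready):
    """Run all work-items (lanes) of one subgroup as a loop on one thread."""
    base = sg_id * SG_SIZE
    if sg_id == 0:
        # Producer subgroup: write its slice of data, then publish a flag.
        for lane in range(SG_SIZE):
            data[base + lane] = base + lane
        ready.set()
    else:
        # Consumer subgroup: blocks until subgroup 0 has published. This
        # only finishes if subgroup 0 can progress independently of it.
        ready.wait()
        for lane in range(SG_SIZE):
            results[lane] = data[lane] * 10

def run_workgroup_subgroups():
    data = [0] * (SG_SIZE * NUM_SG)
    results = [0] * SG_SIZE
    ready = threading.Event()
    # The consumer is started first on purpose: a scheduler that ran the
    # subgroups to completion one after another in this order would hang.
    threads = [
        threading.Thread(target=subgroup_body, args=(sg, data, results, ready))
        for sg in reversed(range(NUM_SG))
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results

print(run_workgroup_subgroups())  # [0, 10, 20, 30]
```

The design point the sketch makes is that correctness here does not depend on the order in which subgroups are scheduled, which is exactly what a compiler that runs all subgroups in a single fixed sequence cannot guarantee.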
The implementation successfully compiles programs that require independent subgroup synchronization, and no compiler regressions were observed for other types of programs. Its performance is on par with that of the previous version of PoCL.
