Abstract—This paper presents a demonstration setup for distributed real-time HEVC encoding on a multi-computer system. The demonstrated multi-level parallelization scheme is implemented in the practical Kvazaar open-source HEVC encoder. It allows Kvazaar to exploit parallelism at three levels: 1) Single Instruction Multiple Data (SIMD) optimized coding tools at the data level; 2) Wavefront Parallel Processing (WPP) and Overlapped Wavefront (OWF) parallelization strategies at the thread level; and 3) distributed slice encoding on multi-computer systems at the process level. This interactive demonstration allows visitors to gradually increase the degree of parallelism in Kvazaar and see the benefits of parallelization in live HEVC encoding. Exploiting all three levels of parallelism on a three-laptop setup speeds up Kvazaar by almost 21× over a non-parallelized single-core implementation of Kvazaar.

Keywords—HEVC, multi-level parallelization, distributed HEVC encoding, multi-computer systems

I. INTRODUCTION

High Efficiency Video Coding (HEVC) [1] creates the conditions for cost-effective video transmission and storage, but its inherent computational complexity calls for efficient parallel implementations. This is particularly true for high-quality, live high-definition (HD) video streaming.

This work demonstrates our multi-level parallelization scheme [2], designed for the open-source Kvazaar HEVC encoder [3], which was chosen for its practical encoding capacity. The scheme extends Kvazaar parallelization to multi-computer systems and enables Kvazaar to exploit parallel encoding at the following three levels simultaneously:
1) Data level: Kvazaar coding tools are optimized for Single Instruction Multiple Data (SIMD) processor extensions.
2) Thread level: HEVC Coding Tree Units (CTUs) [1] are encoded on multiple CPU cores in parallel by using Wavefront Parallel Processing (WPP) and Overlapped Wavefront (OWF) parallelization techniques.
3) Process level: HEVC slices [1] are encoded on multiple computers by using Transmission Control Protocol (TCP) over Gigabit Ethernet (1GbE) for communication.
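
As a rough illustration of how the thread and process levels combine, the following Python sketch divides a 1080p frame into CTU rows, assigns contiguous rows to slices, and computes the earliest WPP step at which each CTU could start. It assumes 64 × 64 CTUs, slices made of whole CTU rows, and one time step per CTU. All function names are hypothetical, and the sketch is not the scheduler used in Kvazaar [2].

```python
import math

CTU_SIZE = 64  # assumption: 64 x 64 luma samples per CTU


def ctu_grid(width, height, ctu=CTU_SIZE):
    """Return (columns, rows) of CTUs needed to cover the frame."""
    return math.ceil(width / ctu), math.ceil(height / ctu)


def wpp_start_step(row, col):
    """Earliest step at which CTU (row, col) can start under WPP.

    A CTU waits for its left neighbour and for the above-right CTU,
    so every row trails the row above it by two CTUs.
    """
    return col + 2 * row


def split_rows_into_slices(rows, n_slices):
    """Split CTU rows into contiguous slices of near-equal height."""
    base, extra = divmod(rows, n_slices)
    slices, start = [], 0
    for i in range(n_slices):
        size = base + (1 if i < extra else 0)
        slices.append(range(start, start + size))
        start += size
    return slices


if __name__ == "__main__":
    cols, rows = ctu_grid(1920, 1080)
    wavefront_steps = wpp_start_step(rows - 1, cols - 1) + 1
    print(cols, rows, cols * rows, wavefront_steps)
    print([len(s) for s in split_rows_into_slices(rows, 3)])
```

Under these assumptions, a 1080p frame consists of 30 × 17 CTUs, an ideal wavefront covers the frame in 62 steps instead of 510 sequential ones, and three slices receive 6, 6, and 5 CTU rows.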

II. DEMONSTRATION SETUP

Fig. 1 depicts our end-to-end demonstrator showcased to the visitors. The raw 1080p (1920 × 1080) video is shot by a 4K action camera at 30 frames per second (fps). The raw feed is captured by an HDMI capture card, converted from YUV 4:2:2 to YUV 4:2:0 format, encoded into an HEVC bitstream by Kvazaar, and sent through the 1GbE switch to a fourth laptop, which decodes it with an openHEVC decoder for playback. The components are detailed in Table I.
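
To give a sense of the data volumes in this pipeline, the sketch below computes raw frame sizes and shows one simple way to reduce 4:2:2 chroma to 4:2:0. It assumes 8-bit samples and planar chroma planes stored as lists of rows, and it averages vertically adjacent chroma rows. The conversion filter used in the demonstrator is not specified here, so this is only one plausible choice, and the function names are hypothetical.

```python
def frame_bytes(width, height, chroma="420", bytes_per_sample=1):
    """Size of one raw frame in bytes for the given chroma format."""
    luma = width * height
    if chroma == "422":
        chroma_samples = 2 * (width // 2) * height         # Cb + Cr
    elif chroma == "420":
        chroma_samples = 2 * (width // 2) * (height // 2)  # Cb + Cr
    else:
        raise ValueError("unsupported chroma format: " + chroma)
    return (luma + chroma_samples) * bytes_per_sample


def raw_mbit_per_s(width, height, fps, chroma="420"):
    """Raw video data rate in Mbit/s (1 Mbit = 10**6 bits)."""
    return frame_bytes(width, height, chroma) * 8 * fps / 1e6


def chroma_422_to_420(plane):
    """Halve the vertical chroma resolution by averaging row pairs.

    Assumption: a rounded average of two rows; real converters may
    use other filters or simply drop every second row.
    """
    return [
        [(a + b + 1) // 2 for a, b in zip(top, bottom)]
        for top, bottom in zip(plane[0::2], plane[1::2])
    ]
```

With these assumptions, a 1080p frame shrinks from 4,147,200 bytes in 4:2:2 to 3,110,400 bytes in 4:2:0, and the raw 4:2:0 feed at 30 fps amounts to roughly 746 Mbit/s.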

Kvazaar is configured with the All-Intra (AI) coding configuration, the medium preset [3], and a Quantization Parameter (QP) value of 32. The degree of Kvazaar parallelism can be gradually increased from a single-threaded implementation on a single laptop to a SIMD-optimized and multi-threaded configuration on three laptops.
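
The encoder settings above could be expressed on the Kvazaar command line roughly as follows. This is a sketch under assumptions: the option names reflect our understanding of Kvazaar's command-line help and should be verified with `kvazaar --help` for the installed version, the thread and OWF values are illustrative, and the helper `kvazaar_args` is hypothetical. The switch for SIMD optimizations and the process-level distribution are left out because this section does not describe how they are controlled.

```python
def kvazaar_args(input_path, output_path, multi_threaded, threads=4, owf=2):
    """Build a Kvazaar command line for the settings described above.

    threads and owf are illustrative defaults, not values from the paper.
    """
    args = [
        "kvazaar",
        "-i", input_path,
        "--input-res", "1920x1080",
        "--input-fps", "30",
        "-o", output_path,
        "--preset", "medium",
        "--qp", "32",
        "--period", "1",  # intra period of 1: every picture is intra coded
    ]
    if multi_threaded:
        # Thread-level parallelism: WPP within a frame, OWF across frames.
        args += ["--wpp", "--owf", str(owf), "--threads", str(threads)]
    else:
        # Assumed single-threaded setup: no WPP, no OWF, main thread only.
        args += ["--no-wpp", "--owf", "0", "--threads", "0"]
    return args
```

The returned list can be passed to `subprocess.run(..., check=True)` from the Python standard library to launch the encoder on a raw YUV 4:2:0 file.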

III. VISITOR EXPERIENCE

Visitors can interact with the demonstrator through a Graphical User Interface (GUI) and monitor in real time: 1) the coding speed graph; 2) the CPU usage of the laptops; and 3) the visual quality and frame rate of the decoded video.

The demonstrator seeks to illustrate the stringent requirements that live HEVC coding places on a parallel encoder implementation. Visitors can switch on the fly between the parallelization degrees (I-V) listed in Table II and see in real time their impact on coding speed.

Table II: Coding speed with the demonstrated configurations

<table>
<thead>
<tr>
<th>Parallelization degree</th>
<th>I</th>
<th>II</th>
<th>III</th>
<th>IV</th>
<th>V</th>
</tr>
</thead>
<tbody>
<tr>
<td>SIMD optimizations</td>
<td>no</td>
<td>yes</td>
<td>yes</td>
<td>yes</td>
<td>yes</td>
</tr>
<tr>
<td>Thread parallelization</td>
<td>single</td>
<td>single</td>
<td>multi</td>
<td>multi</td>
<td>multi</td>
</tr>
<tr>
<td>Process parallelization (number of laptops)</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>2</td>
<td>3</td>
</tr>
<tr>
<td>Encoding speed</td>
<td>1 fps</td>
<td>3 fps</td>
<td>12 fps</td>
<td>21 fps</td>
<td>30 fps</td>
</tr>
</tbody>
</table>
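
The real-time requirement behind Table II can be checked with a few lines of Python. The sketch below takes the encoding speeds from the table as listed and the 30 fps camera rate from Section II. The names are hypothetical and the calculation is only illustrative.

```python
CAMERA_FPS = 30  # capture rate of the camera feed

# Encoding speeds per parallelization degree, as listed in Table II.
ENCODING_FPS = {"I": 1, "II": 3, "III": 12, "IV": 21, "V": 30}


def frame_budget_ms(fps):
    """Time available for encoding one frame at the given frame rate."""
    return 1000.0 / fps


def realtime_degrees(speeds, source_fps):
    """Degrees whose encoding speed keeps up with the source frame rate."""
    return [degree for degree, fps in speeds.items() if fps >= source_fps]


def encoded_share(speeds, source_fps):
    """Fraction of captured frames each degree can encode, capped at 1."""
    return {d: min(1.0, fps / source_fps) for d, fps in speeds.items()}
```

With the listed values, each frame must be encoded in about 33.3 ms to keep up with the camera, and only degree V reaches that rate; degree IV keeps up with 70% of the captured frames.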

IV. REFERENCES


This work was supported in part by the European Celtic-Plus project VIRTUOSE and the Academy of Finland (decision no. 301820).