Single Image Super-Resolution Using Convolutional Neural Networks
Saarimäki, Ansse (2018)
Information Technology
Faculty of Computing and Electrical Engineering
This publication is copyrighted. You may download, display and print it for Your own personal use. Commercial use is prohibited.
Date of approval
2018-12-05
The permanent address of the publication is
https://urn.fi/URN:NBN:fi:tty-201811212701
Abstract
Enlarging images is a common need in many applications. Although increasing the pixel count of an image is easy with simple interpolation methods, these fail to increase the amount of detail in the image. Single image super-resolution (SISR) aims to solve the ill-posed problem of producing a high resolution (HR) image from a given low resolution (LR) image. A single LR image always has an infinite number of corresponding HR images, but some of those are more probable than others. This probability density can be estimated with machine learning techniques, and the most probable HR image can be constructed based on that estimate.
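The ill-posedness described above can be illustrated with a toy example. The sketch below is not taken from the thesis: it assumes a simple degradation model (averaging each 2x2 block, implemented in a hypothetical helper `downsample_2x`) and shows two clearly different 4x4 HR images that produce exactly the same 2x2 LR image.

```python
import numpy as np

def downsample_2x(hr):
    """Average each non-overlapping 2x2 block (an assumed, simplified degradation model)."""
    h, w = hr.shape
    return hr.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

# Two different 4x4 "HR" images: a flat gray image and a 0/1 checkerboard.
hr_flat = np.full((4, 4), 0.5)
hr_checker = (np.indices((4, 4)).sum(axis=0) % 2).astype(float)

lr_a = downsample_2x(hr_flat)
lr_b = downsample_2x(hr_checker)

# Both HR images collapse to the same LR image, so the LR image alone
# cannot tell us which HR image it came from.
print(np.allclose(lr_a, lr_b))               # same LR image
print(np.array_equal(hr_flat, hr_checker))   # different HR images
```

Under this assumed model, any HR image whose 2x2 blocks average to the same values is an equally valid explanation of the LR input, which is why a learned prior over HR images is needed to pick a probable one.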
In recent years, artificial neural networks have become the most popular machine learning methods. Convolutional neural networks (CNN) are a subtype of them, inspired by the human visual system. They are used extensively in all fields of image processing, including single image super-resolution. In this thesis, different CNN-based methods for SISR are compared, and their performance is analyzed using both quantitative and qualitative methods. In total, four CNN methods were chosen, and they were compared to three other methods. One of the reference methods was based on more traditional machine learning, and the other two were based on self-similarity of the input images. In contrast to the machine learning approach, self-similarity based methods utilize only the information in the input image and do not require any training on external images.
The results show that CNN-based methods outperform the alternative approaches in both quantitative metrics and qualitative analysis. The methods perform especially well with images that have clear structures and sharp edges, but highly textured images tend to be problematic. Six of the methods aim to minimize pixel-wise reconstruction error, which leads to overly smooth output in textured areas. One method was instead designed to maximize the perceptual quality of the images, at the cost of increased reconstruction error. It was able to generate very realistic textures in some cases, but had a tendency to hallucinate very implausible textures in flat areas. The other CNN-based methods also tended to create erroneous but plausible details, which might be misleading in critical applications like medical imaging. CNN-based SISR is more suitable for entertainment and other consumer applications, especially once the perceptually optimized methods are developed further.
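The link between pixel-wise error minimization and overly smooth textures can be sketched with a toy example. The data and the helper `expected_mse` below are hypothetical and not from the thesis: two equally plausible HR textures are assumed for the same LR input, and the estimate with the lowest expected squared error turns out to be their flat average rather than either sharp texture.

```python
import numpy as np

# Hypothetical toy data: two equally plausible HR textures for one LR input.
candidates = np.array([
    [0.0, 1.0, 0.0, 1.0],
    [1.0, 0.0, 1.0, 0.0],
])

def expected_mse(estimate, candidates):
    """Mean squared error of one estimate, averaged over all candidate HR signals."""
    return np.mean((candidates - estimate) ** 2)

mean_estimate = candidates.mean(axis=0)   # flat, texture-free signal
sharp_estimate = candidates[0]            # one of the plausible sharp textures

mse_mean = expected_mse(mean_estimate, candidates)
mse_sharp = expected_mse(sharp_estimate, candidates)

# The smooth average scores better on pixel-wise error than a sharp guess.
print(mse_mean, mse_sharp)
```

In this simplified setting, committing to a sharp texture is penalized whenever the guess is out of phase with the true signal, so a method trained purely on pixel-wise error is pushed toward the smooth average.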