Image Manipulation and Animation Using Deep Generative Networks
Tripathy, Soumya (2024)
Tampere University
Doctoral Programme in Computing and Electrical Engineering
Faculty of Information Technology and Communication Sciences
This publication is copyrighted. You may download, display and print it for Your own personal use. Commercial use is prohibited.
Date of defence
2024-08-23
The permanent address of the publication is
https://urn.fi/URN:ISBN:978-952-03-3542-7
Abstract
With the popularity of social media, image- and video-based interactions have increased significantly. Generating and manipulating visual data needs to be simplified for seamless visual interaction among people, enabling users without any content-creation background to effortlessly create and share visual data. Beyond social interaction, a fast and easy-to-use image manipulation tool has several applications in movie post-production, the animation industry, and virtual reality, among others. Advances in data-driven deep learning models present the most exciting avenue for creating such tools.
The success of Convolutional Neural Networks (CNNs) in discriminative modeling tasks such as object classification, detection, and segmentation has inspired their introduction to generative modeling, where they simulate the process that generates the training data. Generative Adversarial Networks (GANs) are currently the flagbearers of data-driven generative modeling. They have produced outstanding image quality for random generation of human faces, cars, animals, buildings, and more. However, GANs have not replicated this success in controlled image manipulation or animation tasks such as face reenactment, object animation, and image translation.
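For context, the adversarial objective that underlies GANs can be sketched in a few lines. The toy scalar discriminator and the sampled stand-in for generator outputs below are illustrative assumptions for exposition only, not the models developed in this thesis:

```python
import numpy as np

rng = np.random.default_rng(0)

def discriminator(x, w):
    """Toy logistic discriminator: estimated probability that x is real."""
    return 1.0 / (1.0 + np.exp(-(x * w)))

def gan_losses(real, fake, w):
    """Non-saturating GAN losses for this toy scalar setup.
    The discriminator maximizes log D(real) + log(1 - D(fake));
    the generator maximizes log D(fake) (non-saturating variant)."""
    d_loss = -np.mean(np.log(discriminator(real, w)) +
                      np.log(1.0 - discriminator(fake, w)))
    g_loss = -np.mean(np.log(discriminator(fake, w)))
    return d_loss, g_loss

real = rng.normal(2.0, 0.5, size=100)  # samples from the "real" data
fake = rng.normal(0.0, 0.5, size=100)  # stand-in for untrained generator output
d_loss, g_loss = gan_losses(real, fake, w=1.0)
```

In a full GAN, both networks are deep CNNs updated alternately by gradient descent on these two losses; the toy version only shows how the opposing objectives are computed.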
We study several new approaches to controlled image manipulation with GANs that improve such models' quality, interpretability, and usability. One critical requirement for high-quality image manipulation is a large collection of training data containing inputs paired with ground-truth manipulated images, from which the correspondence between them can be learned. We identify that this requirement is too restrictive for many tasks and propose strategies that harness large amounts of weakly supervised data together with a few strongly annotated images to generate high-quality images using GANs.
In image manipulation tasks such as facial animation, we experimentally demonstrate that existing models cannot generate high-quality animation from single images. We propose two models that allow users to interact directly with the human-interpretable features of the model to create high-quality videos of faces from a single input image. Our final work extends the face manipulation models to non-face objects such as body parts and animals. We propose models that learn robust, shape- and identity-independent features from objects in images, and these features help generate high-quality animation at the output. Extensive experiments evaluate the quality of the images generated by our models, and extensive comparisons with contemporary approaches demonstrate that our models outperform them in various image manipulation and animation tasks.
Collections
- Doctoral dissertations [4901]