Active Disaster Recovery Strategy for Applications Deployed Across Multiple Kubernetes Clusters, Using Service Mesh and Serverless Workloads
Moshfeghifar, Amirhossein (2022)
Master's Programme in Information Technology
Faculty of Information Technology and Communication Sciences
This publication is copyrighted. You may download, display and print it for your own personal use. Commercial use is prohibited.
Approval date: 2022-08-16
Permanent address of the publication: https://urn.fi/URN:NBN:fi:tuni-202206305939
Abstract
Cloud computing has gained significant popularity in recent years, and it would not exist without virtualization technologies. Virtualization is the foundation of cloud computing, and containerization is its next generation. Kubernetes is one of the most widely used container orchestration solutions. Each Kubernetes cluster consists of control-plane and worker nodes that manage the lifecycle of containers. Deploying an application across multiple clusters gives the system high availability, isolation, and scalability. However, while Kubernetes is well suited to managing a single cluster, it has limitations in multi-cluster management. One fundamental approach to multi-cluster Kubernetes is to use a network service mesh, which meshes all clusters together across the network. Another major challenge is architecting an application deployment across geographically separated clusters: a failure in one cluster or in a running application service can propagate to other clusters and cause a disaster in the whole system. In this thesis, we propose and design an active disaster recovery strategy for applications spread across multiple Kubernetes clusters that eliminates these points of failure. In addition, part of the application runs on a serverless platform hosted on one of the clusters to improve performance and optimize resource utilization. Example use cases include clusters running at the edge of the cloud, or backup clusters in the same region that absorb bursts of unpredictable incoming traffic. The performance and resource utilization of the designed solution were evaluated through several experiments that simulate different failure scenarios. The availability of the designed architecture was promising, and the architecture proved practical to implement.