As enterprises move Kubernetes into production and increase the number of Kubernetes clusters and applications in use, they need to deliver the same “enterprise-level” services they provide for other production applications. Implementing Kubernetes backup is critical to protect your applications in the event of an accident, system failure, or deliberate attack. You need an effective and appropriate backup strategy, in addition to whatever built-in resiliency and data protection features your applications may have.
There are several use cases that your Kubernetes backup and recovery strategy should satisfy:
- Recover entire clusters in case of disaster
- Recover a specific application (for instance after data corruption)
- Migrate a cluster from one environment to another (on-prem to cloud or vice versa)
The problem with backing up Kubernetes is that it is easy to get it wrong, making it difficult (or impossible) to restore a cluster or application to a functional state without a heroic effort. This blog explores Kubernetes backup risks and Kubernetes backup best practices by examining seven common Kubernetes backup and recovery mistakes.
Why is Backing Up Kubernetes Different?
Backing up an application running on Kubernetes is not like backing up an app running in a virtual machine. Kubernetes environments are far more dynamic. A single application in production may consist of hundreds of components, including containers/pods, ConfigMaps, certificates, secrets, and volumes.
Instances of containers come and go according to load or other factors, and data is written to separate persistent volumes (PVs), which can be created dynamically. To enable a successful restore, a Kubernetes application backup has to capture all of this information. A full Kubernetes cluster backup must include all Kubernetes control plane data stored in etcd, all namespaces, and all PVs.
Kubernetes Backup and Recovery Mistakes
Operators and developers make errors, even the most robust hardware and software can crash, and with cyber crime reaching new heights, your backups are often the last line of defense against malfeasance. Because Kubernetes backup and recovery is essential to the continued success of your business, here are seven common mistakes you will want to avoid at all costs:
Mistake 1: Managing backups with manually written scripts
If your environment is still small, it may be possible to write scripts that use Kubernetes APIs to back up or snapshot the pods, services, ConfigMaps, data stores, and secrets associated with each application. The problem with this approach is that it simply doesn’t scale. As the number of clusters and applications you have deployed grows, it becomes impossible to keep up. And, as in any environment that depends too heavily on manual scripts, your operations become brittle. It’s difficult to keep track of everything, you end up with different script variants for different clusters, and only a few experts can troubleshoot failures.
Mistake 2: Failure to automate backup processes
The purpose of Kubernetes is to enable you to automate your operations to the greatest extent possible. Automation is essential for operations at scale. Any task that has to be performed on a regular basis should be automated, and that includes your cluster and application backups. Because Kubernetes environments change quickly, you’ll also want the ability to perform ad hoc backups in addition to scheduled ones.
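As a rough illustration of scheduled backups, the sketch below builds a standard Kubernetes CronJob manifest (API version batch/v1) in Python and prints it as JSON, which kubectl accepts as well as YAML. The job name, namespace, container image, and its `--all-namespaces` argument are hypothetical placeholders for whatever backup tool you run; only the CronJob fields themselves come from the Kubernetes API.

```python
import json


def backup_cronjob(name, namespace, schedule, image, args):
    """Build a batch/v1 CronJob manifest that runs a backup container on a schedule."""
    return {
        "apiVersion": "batch/v1",
        "kind": "CronJob",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "schedule": schedule,           # standard five-field cron syntax
            "concurrencyPolicy": "Forbid",  # skip a run if the previous one is still active
            "jobTemplate": {
                "spec": {
                    "template": {
                        "spec": {
                            "restartPolicy": "OnFailure",
                            "containers": [
                                {"name": "backup", "image": image, "args": list(args)}
                            ],
                        }
                    }
                }
            },
        },
    }


if __name__ == "__main__":
    manifest = backup_cronjob(
        name="nightly-backup",                           # hypothetical name
        namespace="backup-system",                       # hypothetical namespace
        schedule="0 2 * * *",                            # every day at 02:00
        image="registry.example.com/backup-runner:1.0",  # hypothetical image
        args=["--all-namespaces"],                       # hypothetical tool flag
    )
    print(json.dumps(manifest, indent=2))
```

If the output is saved to a file and applied with `kubectl apply -f`, an ad hoc run can then be started from the same definition with `kubectl create job manual-backup --from=cronjob/nightly-backup -n backup-system`, so scheduled and on-demand backups share one configuration.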
Mistake 3: Managing backups using multiple tools
Many Kubernetes environments have grown organically over time, so it’s not that unusual to have multiple backup tools:
- You may be using different tools for different Kubernetes distributions, or different tools for on-premises clusters versus clusters running in the public cloud.
- You may use an existing tool or script for backing up etcd (control plane) while relying on another tool or the capabilities of your storage platform to back up PVs.
This can all seem like it’s working fine until disaster strikes. Having multiple tools almost always introduces Kubernetes recovery issues. How do you coordinate restores from two or more tools and restore to a single point in time? What if you have to restore the same application in multiple different clusters? This is when it pays to have a single tool with everything dead simple (or as simple as possible) so that restores work the same everywhere with no surprises.
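To make the point-in-time problem concrete, here is a minimal sketch that assumes you have the completion times of control plane backups from one tool and volume backups from another. It finds the pair that lies closest together and reports the gap between them. The function name and the sample timestamps are illustrative only.

```python
from datetime import datetime, timedelta


def closest_pair(etcd_times, volume_times):
    """Return (etcd_time, volume_time, skew) for the pair with the smallest time gap."""
    if not etcd_times or not volume_times:
        raise ValueError("both tools must have at least one backup")
    return min(
        ((e, v, abs(e - v)) for e in etcd_times for v in volume_times),
        key=lambda pair: pair[2],
    )


if __name__ == "__main__":
    # Illustrative timestamps: one tool runs at 02:00, the other on its own schedule.
    etcd_backups = [datetime(2024, 1, 1, 2, 0), datetime(2024, 1, 2, 2, 0)]
    volume_backups = [datetime(2024, 1, 1, 14, 0), datetime(2024, 1, 2, 2, 30)]
    etcd_time, volume_time, skew = closest_pair(etcd_backups, volume_backups)
    print(f"best match: etcd {etcd_time}, volumes {volume_time}, skew {skew}")
```

Even the best match leaves a window in which cluster objects and volume data can disagree. Any write made during that window is captured by one backup and missed by the other.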
Mistake 4: Not backing up persistent volumes
If you really want your persistent volumes to be persistent, you have to back them up. Kubernetes environments are dynamic, so your backups need to keep up with the state of your cluster and applications. If your backups are only designed to back up the PVs you know about, sooner or later they will fail to back everything up. Whether homegrown or commercial, your Kubernetes backup deployment needs to be able to inspect the state of the cluster to determine what PVs to back up. Policy-based solutions can ensure that you’re backing up PVs at the necessary interval to deliver the desired service level.
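One way to picture "inspecting the state of the cluster" is to compare the claims that actually exist against the ones your backups cover. The sketch below assumes the input is the parsed JSON from `kubectl get pvc --all-namespaces -o json` and that you keep a set of (namespace, name) pairs your backup configuration already protects. The function name and the sample data are hypothetical.

```python
def unprotected_claims(pvc_list, protected):
    """Return sorted (namespace, name) pairs for bound PVCs missing from the protected set."""
    found = set()
    for item in pvc_list.get("items", []):
        # Only bound claims have a volume behind them that holds data.
        if item.get("status", {}).get("phase") != "Bound":
            continue
        meta = item["metadata"]
        found.add((meta["namespace"], meta["name"]))
    return sorted(found - set(protected))


if __name__ == "__main__":
    # Hypothetical sample shaped like the PersistentVolumeClaimList returned by kubectl.
    cluster_state = {
        "items": [
            {"metadata": {"namespace": "shop", "name": "orders-db"}, "status": {"phase": "Bound"}},
            {"metadata": {"namespace": "shop", "name": "cache"}, "status": {"phase": "Pending"}},
            {"metadata": {"namespace": "blog", "name": "uploads"}, "status": {"phase": "Bound"}},
        ]
    }
    already_protected = {("shop", "orders-db")}
    for namespace, name in unprotected_claims(cluster_state, already_protected):
        print(f"not backed up: {namespace}/{name}")
```

Running a check like this on a schedule turns a dynamically created volume that nobody registered into a visible gap, instead of a surprise during a restore.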
Mistake 5: Not having the ability to monitor backup execution
There’s no worse feeling for an administrator than initiating an urgent restore only to discover that the backups have been silently failing for the last two weeks. Whatever backup tool you use, it absolutely must have the ability to monitor backup execution and validate success. This capability has become all the more important now that a favorite trick of sophisticated ransomware attacks is to silently disable backups. When you can’t recover, your only choice is to pay.
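A basic freshness check shows what "validate success" can mean in practice. This sketch assumes your backup tool can export a list of records with a finish time and a status. The field names `finished` and `status` and the value `"Completed"` are assumptions, so adjust them to whatever your tool reports.

```python
from datetime import datetime, timedelta, timezone


def backup_health(records, max_age, now=None):
    """Return (healthy, last_success), where last_success is None if no backup ever succeeded."""
    now = now or datetime.now(timezone.utc)
    successes = [r["finished"] for r in records if r["status"] == "Completed"]
    if not successes:
        return False, None
    last_success = max(successes)
    return (now - last_success) <= max_age, last_success


if __name__ == "__main__":
    now = datetime(2024, 3, 15, 9, 0, tzinfo=timezone.utc)
    # Hypothetical history: the last good backup is two weeks old, later runs failed.
    history = [
        {"finished": datetime(2024, 3, 1, 2, 0, tzinfo=timezone.utc), "status": "Completed"},
        {"finished": datetime(2024, 3, 14, 2, 0, tzinfo=timezone.utc), "status": "Failed"},
        {"finished": datetime(2024, 3, 15, 2, 0, tzinfo=timezone.utc), "status": "Failed"},
    ]
    healthy, last_success = backup_health(history, max_age=timedelta(hours=26), now=now)
    if not healthy:
        print(f"ALERT: no successful backup within 26 hours (last success: {last_success})")
```

The check alerts on the absence of a recent success rather than on the presence of a failure. That way it also fires when backups have been disabled and produce no records at all.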
Mistake 6: Not storing backup credentials securely
As your Kubernetes environment expands to multiple clusters, you’ll need separate backup credentials for each cluster. These credentials can be a major point of vulnerability. At a minimum, they need to be managed and protected from unauthorized access. Ideally, you need a Kubernetes backup deployment that automates handling of backup and other credentials.
Our recent blog post, “Securing Kubernetes: Applying Zero Trust Principles to Your Kubernetes Environment,” explains how to apply built-in Kubernetes capabilities to prevent unauthorized access to your Kubernetes clusters.
Mistake 7: Failure to meet compliance requirements
Every industry has regulatory requirements that must be met, and most organizations also have their own corporate governance. It’s important to know what these requirements are, ensure that you meet or exceed them, and have the ability to demonstrate compliance to auditors and security teams.
Regular reporting can uncover gaps and overlooked Kubernetes backup best practices, reducing the risk of data loss or data breach and preventing excessive downtime and hits to your brand.
Kubernetes Backup and Restore at Rafay
Rafay’s Backup & Restore Service avoids the backup mistakes and eliminates the backup headaches described above. It supports critical backup/restore, disaster recovery, and migration use cases for Kubernetes clusters and applications running on-premises, in the cloud, and at the edge.
Rafay implements Kubernetes backup best practices in a solution that is easy to use, scalable, and secure. Create and manage backup policies, backup locations, and restore policies on a per-project basis, and leverage policies across clusters to achieve greater operational consistency and eliminate common Kubernetes backup risks.
With Rafay’s Backup & Restore Service, you can:
- Centrally configure, automate, and operationalize disaster recovery (DR)
- Leverage easy-to-use workflows to create and manage backups for your entire fleet of clusters
- Eliminate manual handling of backup credentials for enhanced security
- Use flexible controls to specify what data to include or exclude
Ready to find out why so many enterprises and platform teams have partnered with Rafay to streamline Kubernetes operations? Sign up for a free trial today and follow our quickstart guide to see what the Rafay Backup & Restore Service can do.