The Kubernetes Current Blog

Kubernetes Cluster Backup & Restore with Velero and Rafay

Kubernetes cluster backup and restore is one of the top priorities for DevOps teams and KubeMasters running containerized applications in production. Without this critical capability, organizations run the risk of a disruption of service and unplanned downtime. 

The Rafay Kubernetes Management Cloud provides the means for administrators to configure and perform backups for their clusters quickly using popular solutions such as Velero and several others. In this blog, I will describe how administrators can use Velero and Rafay to backup and restore Kubernetes cluster resources.

What is Velero?

Velero is a popular open source tool that can be used to backup and restore Kubernetes cluster resources and persistent volumes. Velero can be used for:

  • Disaster Recovery
  • Data Migration
  • Data Protection

Velero utilizes custom resource definitions (CRDs) to backup and restore the Kubernetes cluster resources. Velero uses Amazon S3 API-compatible object storage as its backup target providing users with a plethora of storage options that provide Amazon S3 compatible APIs. 

Organizations that wish to standardize on using Velero for backup and restore operations can also use Rafay’s cluster blueprinting capability to ensure Velero is employed for all or a subset of clusters. Cluster blueprints enable DevOps teams to install, manage, and enforce the use of a baseline set of software addons across a fleet of Kubernetes clusters to ensure consistency and achieve compliance. 

Installing & Configuring Velero using Rafay

For this exercise, we will use an AWS S3 bucket as the target for Kubernetes cluster backup. Alternatively, we can also use MinIO as the target since it provides compatibility with AWS S3 APIs. 

Configuring S3 Bucket for Cluster Data Backup

Velero can be used to backup both Kubernetes objects and persistent volumes. In this example, we will configure it to backup and restore Kubernetes objects.

Step 1: Create AWS S3 bucket

AWS S3 bucket name to use for cluster backup: k8s-data-backup

AWS Region: us-east-1

Step 2: Configure IAM policy to allow access to the S3 bucket 

Create an AWS credential for an AWS IAM user which has the permissions for the cluster backup S3 bucket as per the IAM policy below.

IAM Policy for the S3 credentials:

    "Version": "2012-10-17",
    "Statement": [
            "Effect": "Allow",
            "Action": [
            "Resource": [
            "Effect": "Allow",
            "Action": [
            "Resource": [

Step 3: Create AWS Credentials 

Create an AWS access key ID and access key secret for the above IAM user. This will be used for programmatic access to AWS to ensure data can be written and read from the specified s3 bucket. 

Velero Cluster Backup Configuration

We want Velero’s resources to be deployed to a dedicated namespace on a target cluster. Below are the steps to deploy Velero to the target cluster:

Step 1: Create Velero namespace

Create “velero” namespace in Rafay. This will allow the Rafay controller to dynamically create this namespace when a cluster blueprint is deployed to a target cluster.

Step 2: Create an addon for cloud credential secret to access backup storage location

Create a S3 cloud credential secret using below YAML template:

apiVersion: v1
kind: Secret
  name: velero-aws-creds
  namespace: velero
type: Opaque
  cloud: |-
<base64 encoding of the cloud credential file>

The cloud credential file format is as follows:

aws_access_key_id: "your_aws_s3_access_key"
aws_secret_access_key: "your_aws_s3_secret_access_key"

Create an addon for this cloud credential as Kubernetes YAML type in “velero” namespace using the above secret YAML file:

Step 3: Create an addon service for Velero controller 

a) Download the latest Velero helm chart

    1. helm repo add vmware-tanzu
    2. helm fetch vmware-tanzu/velero

b) Create a custom-values.yaml file and include the following custom values to deploy the Velero controller. In this example, we have configured a backup schedule which will automatically execute every four hours and backup all Kubernetes objects to the specified S3 bucket. With Velero, it is also possible to exclude the objects that need to be backed up. 

## Configuration settings that directly affect the Velero deployment YAML.

## Enable initContainers for velero-plugin-for-aws
# Init containers to add to the Velero deployment's pod spec. At least one plugin provider image is required.
  - name: velero-plugin-for-aws
    image: velero/velero-plugin-for-aws:v1.1.0
    imagePullPolicy: IfNotPresent
      - mountPath: /target
        name: plugins
## Parameters for the `default` BackupStorageLocation and VolumeSnapshotLocation,
## and additional server settings.
  provider: aws
    name: aws
    bucket: k8s-data-backup
      region: us-east-1
    name: aws
      region: us-east-1
    AWS_SHARED_CREDENTIALS_FILE: /credentials/cloud
# Information about the Kubernetes service account Velero uses.
    create: true
    name: velero
# Info about the secret to be used by the Velero deployment, which
# should contain credentials for the cloud provider IAM account you've
# set up for Velero.
  useSecret: true
  existingSecret: velero-aws-creds
# Whether to deploy the restic daemonset for backup pvc's
deployRestic: false
## Schedule to backup cluster data every 4 hours
     schedule: "0 */4 * * *"
       ttl: "240h"
       includeClusterResources: true
       ## Exclude objects from below namespaces when doing the backup
       - kube-system
       - rafay-infra
       - rafay-system
       - velero
       defaultVolumesToRestic: false
       storageLocation: aws
       - aws

c) Create a Helm 3 addon using the above helm chart and custom values for Velero controller:

d) Add the Velero addon services created above to the cluster’s custom blueprint along with other cluster add-on services:

As you can see from the above example, in one simple operation, this custom cluster blueprint allows the administrator to deploy a large number of cluster-wide services such as (but not limited to):

  1. IBM Spectrum Scale Container Storage Interface (CSI) for persistent volumes (PV)
  2. MetalLB load balancer (LB)
  3. NGINX ingress controller
  4. Alerts integration with OpsGenie
  5. Velero for Kubernetes cluster backup and restore

Custom blueprints enable seamless repeatability and reproducibility. DevOps teams can also perform this operation in a fully automated manner with version-controlled manifests stored in their Git repositories. 

e. Deploy the custom blueprint to the cluster

f. Verify cluster backup:

  1. Go to S3 bucket and make sure that the backup object data are created
  2. Run kubectl get backup -n velero to verify backup is created as scheduled
  3. Run kubectl describe backup <backup_name> -n velero to check the backup snapshot status


Restoring The Cluster From Backup

In case of a Disaster Recovery (DR) or cluster migration, the Kubernetes cluster objects can be restored using the cluster backup snapshot. 

Create a k8s-restore.yaml file with the content below to restore the Kubernetes cluster from the desired <backup_name> and deploy it to the cluster:

kind: Restore
  name: k8s-restore
  namespace: velero
  backupName: <backup_name>

Below is the process to restore the Kubernetes cluster data:

  1. Obtain the backup snapshot information from kubectl get backup -n velero
  2. Identify which backup snapshot to restore the cluster data from by name and check to make sure that the backup snapshot was completed successfully from  kubectl describe backup <backup_name> -n velero
  3. Deploy the above restore manifest to the cluster kubectl apply -f k8s-restore.yaml -n velero


Kubernetes Cluster Backup, Restore and Disaster Recovery Made Easy with Velero and Rafay

Kubernetes cluster backup and recovery is critical for business continuity and avoiding extended applications/cluster downtime. Rafay offers flexibility for DevOps teams to natively backup and restore clusters to the Rafay Kubernetes Management Cloud or integrate with separately managed Amazon S3-compatible backup solutions such as Velero in a seamless manner. 

In addition to supporting standalone Kubernetes cluster backup and restore solutions like Velero, Rafay Kubernetes Management Cloud also provides a fully integrated and turnkey backup solution for organizations. This enables organizations to avoid the learning curve associated with Velero and allows admins to configure and perform backups for their clusters and workloads in just a few minutes. In an upcoming blog, we will describe how this dramatically changes the user experience for administrators and reduces the ongoing operational burden. 

For more information check out these links:

backup , cluster , cluster backup , devops , disaster recovery , helm , helm chart , helm charts , K8s , Kubernetes , kubernetes tutorial , persistent volume , restore , velero

Trusted by leading companies