The Kubernetes Current Blog

Mastering Amazon EKS Upgrades with Rafay’s Kubernetes Operations Platform

Introduction

Amazon Elastic Kubernetes Service (EKS) is a popular managed Kubernetes service that simplifies the deployment, scaling, and management of containerized applications. As with any technology, EKS requires periodic upgrades to ensure the latest features, security patches, and performance improvements are in place. This blog post discusses best practices for upgrading EKS and how the Rafay platform can help streamline the process.

Best Practices for EKS Upgrades

1. Understand the EKS upgrade process

Before upgrading an EKS cluster, it’s essential to understand the upgrade process. AWS regularly releases new versions of EKS, including updates to the Kubernetes control plane and worker nodes. Upgrading an EKS cluster involves upgrading both the control plane and worker nodes.

Kubernetes versions follow a major.minor.patch scheme, and an EKS version upgrade moves the cluster to a new Kubernetes minor version (for example, 1.25 to 1.26). These are the "major" upgrades this post focuses on. They require more planning and preparation than patch-level updates because they can remove or change Kubernetes APIs and may require changes to your applications or infrastructure.
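In Kubernetes terms, an EKS version such as 1.26 is a minor release, and the control plane moves forward one such release per upgrade. A cluster that is several releases behind therefore needs a sequence of upgrades. Here is a minimal sketch of computing that sequence; `upgrade_path` is a hypothetical helper name for illustration, not part of any AWS tool.

```python
def upgrade_path(current: str, target: str) -> list[str]:
    """Return the versions to step through when moving from current to target.

    Versions are 'major.minor' strings such as '1.24'. Assumes the control
    plane can only advance one minor version per upgrade.
    """
    cur_major, cur_minor = (int(part) for part in current.split("."))
    tgt_major, tgt_minor = (int(part) for part in target.split("."))
    if cur_major != tgt_major:
        raise ValueError("this sketch only handles upgrades within one major version")
    if tgt_minor < cur_minor:
        raise ValueError("downgrades are not supported")
    # One entry per hop, ending at the target version.
    return [f"{cur_major}.{minor}" for minor in range(cur_minor + 1, tgt_minor + 1)]


print(upgrade_path("1.23", "1.26"))  # ['1.24', '1.25', '1.26']
```

Each entry in the returned list is a separate upgrade, with its own prerequisites to check.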

To stay up to date with Amazon EKS releases, follow the official AWS documentation page, "Amazon EKS Kubernetes versions".

2. Planning and Scheduling Upgrades

The first step in upgrading an EKS cluster is to create a plan that covers the following:

  • The target version of the EKS cluster
  • Prerequisites and any retired versions or removed APIs listed on the official Amazon EKS release page. Tools like kube-no-trouble or pluto can identify deprecated APIs in use
  • Upgrades to critical add-ons and components, for example CoreDNS, kube-proxy, VPC CNI, and storage drivers
  • The upgrade strategy, including whether to perform a rolling upgrade or a blue-green upgrade
  • The impact of the upgrade on your applications and infrastructure
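To illustrate what the deprecated-API check in the plan looks for, here is a small sketch that scans manifest text for API versions removed in a given Kubernetes release. It is not a substitute for kube-no-trouble or pluto: the `REMOVED_APIS` table is a deliberately incomplete, hand-maintained sample, and `find_removed_apis` is a hypothetical helper name.

```python
import re

# Incomplete sample of (apiVersion, kind) pairs and the Kubernetes release
# in which each was removed. A real check should use a maintained tool.
REMOVED_APIS = {
    ("policy/v1beta1", "PodSecurityPolicy"): "1.25",
    ("batch/v1beta1", "CronJob"): "1.25",
    ("autoscaling/v2beta1", "HorizontalPodAutoscaler"): "1.25",
    ("autoscaling/v2beta2", "HorizontalPodAutoscaler"): "1.26",
    ("flowcontrol.apiserver.k8s.io/v1beta1", "FlowSchema"): "1.26",
}

# Matches only unindented (top-level) apiVersion and kind fields.
_FIELD = re.compile(r"^(apiVersion|kind):[ \t]*[\"']?([^\s\"'#]+)", re.MULTILINE)


def _minor(version: str) -> int:
    return int(version.split(".")[1])


def find_removed_apis(manifest_text: str, target_version: str) -> list[tuple[str, str, str]]:
    """Return (apiVersion, kind, removed_in) for objects that will not
    be served once the cluster runs target_version."""
    findings = []
    # Split a multi-document YAML stream on '---' separator lines.
    for document in re.split(r"^---[ \t]*$", manifest_text, flags=re.MULTILINE):
        fields = {}
        for key, value in _FIELD.findall(document):
            fields.setdefault(key, value)  # keep the first occurrence
        pair = (fields.get("apiVersion"), fields.get("kind"))
        removed_in = REMOVED_APIS.get(pair)
        if removed_in and _minor(removed_in) <= _minor(target_version):
            findings.append((pair[0], pair[1], removed_in))
    return findings


sample = """\
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: nightly-report
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
"""
print(find_removed_apis(sample, "1.26"))
```

The sketch reads plain manifest text only. It does not inspect live cluster objects or Helm releases, which the dedicated tools can do.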

3. Testing Upgrades in Non-Production Environments

Before applying an upgrade to a production environment, testing the upgrade in a non-production environment, such as a staging or development cluster, is crucial. This will help identify potential issues and ensure compatibility with your applications.

Note: We recommend testing upgrades on a cluster that runs a copy of your production workloads.

4. Backup Data and Configuration

Ensure that you have a backup of your cluster’s data and configuration before starting the upgrade process. This can include snapshots of your persistent storage volumes, Kubernetes manifests, and Helm charts, so that you can restore your cluster to its previous state if any issues arise during the upgrade. You can use tools like Velero to back up and restore your EKS cluster.
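As a rough sketch of scripting a pre-upgrade backup, the helper below builds a `velero backup create` command and runs it. It assumes the Velero CLI is installed and already configured for the cluster. The backup naming scheme and the helper names are hypothetical, and the cluster name is assumed to be lowercase so the backup name is valid.

```python
import subprocess
from datetime import datetime, timezone
from typing import Optional


def velero_backup_command(cluster_name: str, namespaces: list[str],
                          stamp: Optional[str] = None) -> list[str]:
    """Build the argument list for a timestamped pre-upgrade backup."""
    stamp = stamp or datetime.now(timezone.utc).strftime("%Y%m%d-%H%M%S")
    command = ["velero", "backup", "create",
               f"{cluster_name}-pre-upgrade-{stamp}", "--wait"]
    if namespaces:
        # Without this flag, Velero backs up all namespaces.
        command += ["--include-namespaces", ",".join(namespaces)]
    return command


def run_backup(cluster_name: str, namespaces: list[str]) -> None:
    # Raises CalledProcessError if the velero CLI exits with a non-zero status.
    subprocess.run(velero_backup_command(cluster_name, namespaces), check=True)


print(" ".join(velero_backup_command("demo", ["payments", "orders"],
                                     stamp="20240101-000000")))
```

Building the command separately from running it makes it easy to review or log the exact invocation before anything touches the cluster.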

5. The Upgrade: Control Plane

Before upgrading the control plane, make sure that all the prerequisites identified in the planning step (step 2) have been addressed.

For example, the official documentation for EKS 1.26 lists two key requirements:

  • The VPC CNI plugin must be version 1.12 or higher
  • The container runtime must be containerd version 1.6.0 or higher
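The two EKS 1.26 requirements above can be expressed as a simple version comparison. In this sketch the observed version strings are assumed to look like a VPC CNI image tag (`v1.12.6-eksbuild.2`) and the `containerRuntimeVersion` value that a node reports (`containerd://1.6.19`). The helper names and the `MINIMUMS_FOR_1_26` table are illustrative.

```python
import re

# Minimum versions taken from the two requirements listed above.
MINIMUMS_FOR_1_26 = {"vpc-cni": (1, 12, 0), "containerd": (1, 6, 0)}


def parse_version(text: str) -> tuple[int, ...]:
    """Extract the first dotted numeric version found in text."""
    match = re.search(r"(\d+)\.(\d+)(?:\.(\d+))?", text)
    if match is None:
        raise ValueError(f"no version found in {text!r}")
    # A missing patch component is treated as 0.
    return tuple(int(part or 0) for part in match.groups())


def unmet_prerequisites(observed: dict[str, str]) -> list[str]:
    """Return one message per component that is missing or too old."""
    problems = []
    for component, minimum in MINIMUMS_FOR_1_26.items():
        found = observed.get(component)
        if found is None:
            problems.append(f"{component}: version unknown")
        elif parse_version(found) < minimum:
            wanted = ".".join(str(part) for part in minimum)
            problems.append(f"{component}: {found} is below {wanted}")
    return problems


print(unmet_prerequisites({
    "vpc-cni": "v1.11.4-eksbuild.1",
    "containerd": "containerd://1.6.19",
}))
```

An empty list means both requirements are met. Anything else should be resolved before the control plane upgrade.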

You also need to check for deprecated API versions and, if your workloads use any, migrate them to supported versions before upgrading.

Once the above steps are complete, you can safely upgrade the control plane. How you do this depends on the interface you or your organization use for lifecycle management. For example, if you use the AWS console, go to the console and upgrade the cluster to the next Kubernetes version (for example, from 1.25 to 1.26).
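If you manage clusters through the AWS API rather than the console, the same step can be scripted. The sketch below uses the boto3 EKS client calls `update_cluster_version` and `describe_update`. The function name, polling interval, and error handling are assumptions for illustration, and the client is passed in so the logic can be exercised without AWS credentials.

```python
import time


def upgrade_control_plane(eks_client, cluster_name: str, target_version: str,
                          poll_seconds: float = 30.0) -> str:
    """Start a control plane upgrade and wait for a final status.

    eks_client is expected to behave like boto3.client("eks").
    Returns the final status reported by EKS, for example "Successful"
    or "Failed".
    """
    response = eks_client.update_cluster_version(
        name=cluster_name, version=target_version
    )
    update_id = response["update"]["id"]
    while True:
        update = eks_client.describe_update(
            name=cluster_name, updateId=update_id
        )["update"]
        if update["status"] != "InProgress":
            return update["status"]
        time.sleep(poll_seconds)
```

With boto3 installed and credentials configured, a call such as `upgrade_control_plane(boto3.client("eks"), "demo-cluster", "1.26")` would start the upgrade, where `demo-cluster` is a placeholder name. Control plane upgrades can take a while, so a production script would also want a timeout.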

Note: Before updating the control plane, check the Kubernetes version of both the control plane and the worker nodes. If the worker nodes run an older version than the control plane, first update them to the control plane’s current version, and only then proceed with upgrading the control plane.
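The check in the note can be sketched as a comparison of each node's kubelet version against the control plane version. Kubelet versions are assumed to look like `v1.25.9-eks-0a21954`, and the function names and node names are hypothetical.

```python
def kubelet_minor(kubelet_version: str) -> tuple[int, int]:
    """Turn a kubelet version such as 'v1.25.9-eks-0a21954' into (1, 25)."""
    major, minor = kubelet_version.lstrip("v").split(".")[:2]
    return int(major), int(minor)


def nodes_behind(control_plane_version: str,
                 kubelet_versions: dict[str, str]) -> list[str]:
    """Return names of nodes running an older minor version than the control plane.

    control_plane_version is a 'major.minor' string such as '1.25'.
    kubelet_versions maps node name to the kubelet version it reports.
    """
    control_plane = tuple(int(part) for part in control_plane_version.split("."))
    return sorted(
        name for name, version in kubelet_versions.items()
        if kubelet_minor(version) < control_plane
    )


print(nodes_behind("1.25", {
    "node-a": "v1.25.9-eks-0a21954",
    "node-b": "v1.24.13-eks-0a21954",
}))  # ['node-b']
```

Any node in the returned list should be brought up to the control plane's version before the control plane itself is upgraded.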

6. The Upgrade: Worker Nodes

After upgrading the control plane, you can proceed with upgrading the worker nodes. This involves upgrading the Kubernetes version (the kubelet) and other dependencies on the worker nodes, such as the container runtime or the AWS Node Termination Handler. You can upgrade the worker nodes with tools like eksctl or the AWS CLI.
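For EKS managed node groups, the worker node step can also be scripted against the AWS API. This sketch uses the boto3 EKS client calls `list_nodegroups` and `update_nodegroup_version`. It assumes managed node groups that use the EKS-optimized AMI, and it only reads the first page of results. Self-managed nodes and custom AMIs need a different process, and the function name is illustrative.

```python
def upgrade_managed_node_groups(eks_client, cluster_name: str,
                                target_version: str) -> dict[str, str]:
    """Request a version update for each managed node group.

    eks_client is expected to behave like boto3.client("eks").
    Returns a mapping of node group name to the update ID EKS assigned,
    which can be passed to describe_update to track progress.
    """
    update_ids = {}
    node_groups = eks_client.list_nodegroups(clusterName=cluster_name)["nodegroups"]
    for node_group in node_groups:
        response = eks_client.update_nodegroup_version(
            clusterName=cluster_name,
            nodegroupName=node_group,
            version=target_version,
        )
        update_ids[node_group] = response["update"]["id"]
    return update_ids
```

This starts all node group updates at once. Upgrading one group at a time and validating in between is the more cautious option.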

7. Monitor and Validate the Upgrade

Closely monitor the upgrade process and validate the health of your applications during and after the upgrade. You can use AWS tools like CloudWatch logs and metrics to monitor the upgrade. You can also use kubectl to track changes to your EKS cluster, such as node versions and readiness, during the upgrade.
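One simple validation is to confirm that every node is Ready and reports the expected Kubernetes version. The sketch below parses the JSON produced by `kubectl get nodes -o json`. The function name and the shape of the summary are assumptions for illustration.

```python
import json


def summarize_nodes(nodes_json: str, expected_minor: str) -> dict[str, list[str]]:
    """Report nodes that are not Ready or not on the expected minor version.

    nodes_json is the output of `kubectl get nodes -o json`.
    expected_minor is a 'major.minor' string such as '1.26'.
    """
    not_ready, wrong_version = [], []
    for node in json.loads(nodes_json)["items"]:
        name = node["metadata"]["name"]
        conditions = {c["type"]: c["status"] for c in node["status"]["conditions"]}
        if conditions.get("Ready") != "True":
            not_ready.append(name)
        kubelet = node["status"]["nodeInfo"]["kubeletVersion"]
        if not kubelet.startswith(f"v{expected_minor}."):
            wrong_version.append(name)
    return {"not_ready": not_ready, "wrong_version": wrong_version}
```

Two empty lists mean every node is Ready and on the target version. This covers node health only, so application-level checks are still needed.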

Problems With The Above Approach

Upgrading an EKS cluster is a complex process involving multiple intricate steps that are not intuitive and are prone to human error. Without the right expertise, even a single missed step could result in costly downtime that affects your business.
Moreover, undertaking such an upgrade would require considerable time and effort from your skilled SRE team. Rafay’s solution eliminates these concerns.

By leveraging Rafay’s Kubernetes Operations platform, you gain the advantage of a streamlined upgrade process. Our expertly designed solution simplifies the steps, minimizing the potential for errors and reducing the risk of downtime. This lets your team focus on more valuable tasks, optimizing efficiency and productivity.

Consider the numbers: On average, a single engineer takes around seven working days to complete the upgrade process, consuming approximately 45 working hours (6-7 working hours a day). By adopting our platform, you could save a significant amount of time and free up your talented team for more impactful projects.

Our solution also shortens the upgrade itself. You can reduce the total time taken for an upgrade from 7 days to approximately half a day, roughly a 14x reduction in elapsed time. Imagine the impact this can have on your operations, enabling you to implement upgrades swiftly and stay ahead of the competition.

Don’t let the complexities of upgrades hinder your progress. Embrace our platform and unleash the true potential of your resources.

How Rafay Can Help

The Rafay platform is designed to simplify Kubernetes operations, including EKS upgrades. Here’s how Rafay can help streamline the upgrade process:

Managed EKS Lifecycle

Rafay offers a managed EKS lifecycle, which includes provisioning, scaling, and upgrading EKS clusters. With Rafay, you can easily schedule and manage upgrades without the need for manual intervention or complex scripts.

Multi-Cluster Management

Manage multiple EKS clusters across different AWS accounts and regions using Rafay’s unified control plane. This allows you to apply consistent policies and best practices across all your EKS clusters.

Backup and Restore

Rafay’s built-in backup and restore capabilities enable you to easily back up your cluster’s data and configuration before upgrading, minimizing the risk of data loss.

Canary and Blue-Green Deployments

Rafay supports advanced deployment strategies like canary and blue-green deployments, enabling you to perform gradual rollouts and minimize the impact of EKS upgrades on your applications.

The canary approach is well suited for scenarios where a lower blast radius is required and the cost of additional infrastructure is not a major concern. With a canary upgrade strategy, compatibility and stability risks are reduced by performing validation out of band on a canary cluster running the new Kubernetes version.

Fig 1.1 – Canary Deployment Strategy

The blue-green approach is well suited for scenarios where an extremely low blast radius is required. With a blue-green upgrade strategy, users can run their applications for an extended period on two clusters, one with the “older” Kubernetes version and its replacement with the “newer” Kubernetes version. Administrators have the option to switch back and forth between the old and new clusters as required.

Fig 1.2 – Blue Green Deployment Strategy

Monitoring and Observability

Gain deep insights into your EKS clusters and applications with Rafay’s integrated monitoring and observability tools. Use real-time data to validate the success of an upgrade and quickly identify issues.

Conclusion

Upgrading your Amazon EKS clusters is necessary to maintain a secure, high-performing, and feature-rich environment. Following the best practices outlined in this blog and adopting Rafay’s Kubernetes Operations Platform will help your EKS upgrades go smoothly, with minimal disruption to your applications and infrastructure.

Try It Out

Sign up here for a free trial and try it out yourself. Also, check out our “Getting Started Guide” and documentation for a number of hands-on exercises that will familiarize you with the capabilities of Rafay’s Kubernetes Operations platform. Watch our YouTube channel for step-by-step tutorials and more.
