5 Practices for Kubernetes Operations with Amazon EKS

In the past several years, organizations of all sizes and verticals have accelerated their IT development pipelines using containerized applications orchestrated by Kubernetes (K8s) and the cloud. But to achieve optimum efficiency, many of these organizations are looking to add managed services.

One of the most popular choices for managed Kubernetes is Amazon Elastic Kubernetes Service (EKS). But as organizations expand their adoption of Amazon EKS, the growing number of K8s clusters and apps can create significant operational challenges, including observability, upgrade management, security and developer productivity.

To address these challenges, a platform or site reliability engineering (SRE) team needs scalable ways to securely manage all of its EKS clusters across every account and region.

Whether the task is provisioning worker nodes on Spot Instances, running Amazon EKS Distro (EKS-D) clusters or securely deploying applications across multiple EKS clusters configured with private endpoints, platform teams need to centralize management to take a holistic approach to operating Kubernetes clusters on AWS.

This article covers ways teams can streamline the use of Amazon EKS and maximize the benefits of this robust Kubernetes management solution.

Filling the Operational Gap for Kubernetes

Enterprises trying to scale modern applications often encounter an “operational gap” between what their Kubernetes strategy enables and what their organization needs to thrive.

This operational gap is largely driven by three common factors:

Cluster scale: standardization becomes increasingly complicated as the number of clusters grows.
Cluster geography: a growing number of availability zones and AWS regions makes managing applications and infrastructure increasingly difficult.
Access control: as more people in the organization see the benefits of K8s and want to use it, configuring and maintaining access control cluster by cluster stops scaling.

For large enterprises using Kubernetes or a managed service like EKS, it is essential to enable the following capabilities to get the most out of the platform and help bridge these operational gaps. Let’s start by exploring these core areas and the important questions that come with them:

Automation

The key question to ask early on is this: “How can we streamline all the cluster and application deployments to keep up with the demands of the business?”

Enterprises operating multiple clusters often face a common challenge: managing the life cycle of one or more cluster fleets in parallel. The key is to establish operational practices that automate cluster and application deployments, Kubernetes upgrades and administrative tasks. This reduces errors, increases productivity and shortens time to market for modern applications.

First, adopt continuous deployment through a GitOps operating model, in which a version control system serves as the source of truth and changes committed there are automatically deployed to Kubernetes clusters. The ability to create any number of pipelines, each made up of multiple stages executed sequentially, helps centralize every aspect of managing both operations and development.
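To make the GitOps idea more concrete, here is a minimal Python sketch of the reconciliation step at its core: the desired state committed to Git is compared with the state observed in a cluster, and the difference becomes a list of actions. Manifests are modeled as plain dictionaries keyed by a hypothetical `kind/namespace/name` string, and `plan_reconciliation` is an illustrative name; GitOps controllers such as Argo CD or Flux do this against the Kubernetes API and handle many more cases.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical key format: "kind/namespace/name" -> simplified spec
Manifests = Dict[str, Dict[str, Any]]


def plan_reconciliation(desired: Manifests, live: Manifests) -> List[Tuple[str, str]]:
    """Return (action, key) pairs that would move `live` toward `desired`.

    `desired` stands in for manifests committed to Git; `live` stands in
    for what is currently running in the cluster.
    """
    actions: List[Tuple[str, str]] = []
    for key in sorted(desired):
        if key not in live:
            actions.append(("create", key))
        elif live[key] != desired[key]:
            actions.append(("update", key))
    # Anything running that is no longer declared in Git is pruned.
    for key in sorted(live):
        if key not in desired:
            actions.append(("delete", key))
    return actions


if __name__ == "__main__":
    desired = {
        "Deployment/shop/web": {"image": "web:1.4", "replicas": 3},
        "Service/shop/web": {"port": 80},
    }
    live = {
        "Deployment/shop/web": {"image": "web:1.3", "replicas": 3},
        "ConfigMap/shop/old-flags": {"beta": "true"},
    }
    for action, key in plan_reconciliation(desired, live):
        print(action, key)
```

Deleting resources that are no longer declared in Git (pruning) is a design choice in this sketch; some teams prefer to flag such resources for review rather than remove them automatically.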

Second, make upgrading Kubernetes versions on Amazon EKS or Amazon EKS-D clusters as simple as possible, whether the upgrade happens in place or through migration to a new cluster. Focus on automating the preflight checks, the cluster upgrade itself and the post-upgrade validation so changes can be verified faster, which helps simplify and standardize application life cycle management. By automating mundane tasks, admins can lower the likelihood of human error, increase overall productivity and free their teams to focus on innovation.
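As an illustration of an automated preflight check, the Python sketch below plans an upgrade as a series of single-minor-version hops (Amazon EKS upgrades the control plane one minor version at a time) and flags worker node versions that would fall outside the allowed version skew after each hop. The function names are hypothetical, version strings are assumed to be plain `major.minor[.patch]`, and the `max_skew` default of 2 is a conservative assumption; check the upstream Kubernetes version skew policy for the versions you actually run.

```python
from typing import List, Tuple


def parse_minor(version: str) -> Tuple[int, int]:
    """Split a version string like "1.29" or "1.29.3" into (1, 29)."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def upgrade_path(current: str, target: str) -> List[str]:
    """List each control plane version to pass through, one minor version at a time."""
    cur_major, cur_minor = parse_minor(current)
    tgt_major, tgt_minor = parse_minor(target)
    if cur_major != tgt_major:
        raise ValueError("major version changes are out of scope for this sketch")
    if tgt_minor < cur_minor:
        raise ValueError("downgrades are not supported")
    return [f"{cur_major}.{m}" for m in range(cur_minor + 1, tgt_minor + 1)]


def preflight_node_skew(node_versions: List[str], next_cp: str, max_skew: int = 2) -> List[str]:
    """Flag node versions that would be out of skew with the next control plane version.

    max_skew is an assumption, not an authoritative value.
    """
    _, cp_minor = parse_minor(next_cp)
    problems: List[str] = []
    for v in node_versions:
        _, node_minor = parse_minor(v)
        if node_minor > cp_minor:
            problems.append(f"node {v} is newer than control plane {next_cp}")
        elif cp_minor - node_minor > max_skew:
            problems.append(f"node {v} is more than {max_skew} minor versions behind {next_cp}")
    return problems


if __name__ == "__main__":
    # Nodes stay on 1.26/1.27 while the control plane walks from 1.27 to 1.30.
    for hop in upgrade_path("1.27", "1.30"):
        issues = preflight_node_skew(["1.26", "1.27"], hop)
        print(hop, "ok" if not issues else issues)
```

In a real rollout, node groups would be upgraded between hops, so later checks would normally see newer node versions; the example deliberately leaves them behind to show the check firing.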

Security

The next important question is: “How can we secure all our clusters and applications across multiple AWS availability zones and regions, restrict usage to the right people and make sure every action can be audited?” Most large IT organizations already use identity management and access control for business applications, but creating and maintaining roles becomes crucial in multicluster environments where, for the sake of efficiency, a single AWS admin may be assigned to a group of clusters. This creates an inherent security risk: an attacker who breaches that one account gains access to every cluster in the group.

Consider strengthening your security posture with role-based access control (RBAC) and zero-trust access, governed by policies and integrated with your corporate single sign-on (SSO) solution. This helps ensure that every application requires strong authentication and secure credentials, and that every network connection is treated as untrusted until proven otherwise.
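For a sense of what role-based access looks like inside a cluster, the Python sketch below builds a Kubernetes RBAC `Role` that can only read pods in one namespace, plus a `RoleBinding` that grants it to a group. Kubernetes RBAC is additive, so anything not granted is denied. The group name `platform-readers` is hypothetical, and how SSO groups map to Kubernetes groups on EKS depends on your identity setup, which this sketch does not cover.

```python
import json
from typing import Any, Dict


def read_only_role(namespace: str) -> Dict[str, Any]:
    """A namespaced Role that can only read pods and their logs."""
    return {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "Role",
        "metadata": {"name": "pod-reader", "namespace": namespace},
        "rules": [
            {
                "apiGroups": [""],  # "" is the core API group
                "resources": ["pods", "pods/log"],
                "verbs": ["get", "list", "watch"],
            }
        ],
    }


def bind_group(role: Dict[str, Any], group: str) -> Dict[str, Any]:
    """Bind the Role to a group name supplied by the identity provider."""
    meta = role["metadata"]
    return {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "RoleBinding",
        "metadata": {"name": f"{meta['name']}-{group}", "namespace": meta["namespace"]},
        "subjects": [
            {"kind": "Group", "name": group, "apiGroup": "rbac.authorization.k8s.io"}
        ],
        "roleRef": {
            "kind": "Role",
            "name": meta["name"],
            "apiGroup": "rbac.authorization.k8s.io",
        },
    }


if __name__ == "__main__":
    role = read_only_role("payments")
    binding = bind_group(role, "platform-readers")  # hypothetical SSO-backed group
    # Emit both objects as a single JSON List manifest.
    print(json.dumps({"apiVersion": "v1", "kind": "List", "items": [role, binding]}, indent=2))
```

Generating these objects from code (or templates) rather than hand-editing them per cluster is one way to keep role definitions identical across a fleet.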

The goal is to allow the right users to access clusters from anywhere, even from behind firewalls, while maintaining a full audit trail of each user and the commands they execute.
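The audit-trail idea can be sketched as an access gateway that records every attempt, allowed or denied, with the user, target cluster and command. Everything here (`AuditedGateway`, `AuditEvent`, the per-user cluster grants) is a hypothetical, in-memory illustration; on EKS, control plane audit logs can also be sent to Amazon CloudWatch Logs, which a production setup would likely rely on instead.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List, Set


@dataclass(frozen=True)
class AuditEvent:
    timestamp: str
    user: str
    cluster: str
    command: str
    allowed: bool


@dataclass
class AuditedGateway:
    """Hypothetical gateway: checks a per-user cluster allowlist and logs every attempt."""

    grants: Dict[str, Set[str]]  # user -> clusters that user may reach
    log: List[AuditEvent] = field(default_factory=list)

    def run(self, user: str, cluster: str, command: str) -> bool:
        # Unknown users get an empty grant set, so access is denied by default.
        allowed = cluster in self.grants.get(user, set())
        self.log.append(
            AuditEvent(
                timestamp=datetime.now(timezone.utc).isoformat(),
                user=user,
                cluster=cluster,
                command=command,
                allowed=allowed,
            )
        )
        # A real gateway would forward allowed commands to the cluster here.
        return allowed

    def history(self, user: str) -> List[AuditEvent]:
        """Every recorded attempt by one user, allowed or denied."""
        return [e for e in self.log if e.user == user]


if __name__ == "__main__":
    gw = AuditedGateway(grants={"alice": {"prod-us-east-1"}})
    gw.run("alice", "prod-us-east-1", "kubectl get pods -n payments")
    gw.run("alice", "prod-eu-west-1", "kubectl delete pod web-0")
    gw.run("mallory", "prod-us-east-1", "kubectl get secrets")
    for event in gw.history("alice"):
        print(event.user, event.cluster, event.command, "allowed" if event.allowed else "denied")
```

Logging denied attempts as well as allowed ones is deliberate: failed access attempts are often the most useful entries when investigating a compromised account.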

Visibility

One of the great things about Kubernetes is that it allows you to run applications across multiple regions, availability zones and clouds. To ensure resources are used effectively and managed across multiple accounts, clusters and AWS regions, platform/SRE teams need full visibility across their entire infrastructure, including on-premises and remote/edge locations, no matter which K8s distribution is employed.

Understanding the status and health of every Amazon EKS and Amazon EKS-D cluster through a detailed, at-a-glance dashboard is critical for production workloads. A single view of all clusters and apps makes it easier for cluster admins to visualize, diagnose and proactively resolve incidents, and to get the most out of Amazon EKS, especially as internal Kubernetes adoption grows.
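As a small illustration of the "single view" idea, the Python sketch below rolls per-cluster records from several accounts and regions up into a fleet summary: status counts per region, Kubernetes versions in use, and a list of clusters that are not `ACTIVE`. The `ClusterRecord` shape and `fleet_summary` name are hypothetical; in practice the records might be gathered from the EKS DescribeCluster API in each account and region, which reports statuses such as `ACTIVE`, `UPDATING` or `FAILED`.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, NamedTuple


class ClusterRecord(NamedTuple):
    name: str
    account: str
    region: str
    status: str   # e.g. "ACTIVE", "UPDATING", "FAILED"
    version: str  # Kubernetes minor version, e.g. "1.29"


def fleet_summary(clusters: Iterable[ClusterRecord]) -> Dict[str, object]:
    """Roll per-cluster records up into an at-a-glance fleet view."""
    clusters = list(clusters)
    by_region: Dict[str, Counter] = defaultdict(Counter)
    for c in clusters:
        by_region[c.region][c.status] += 1
    # Anything not ACTIVE is surfaced for follow-up.
    needs_attention: List[str] = sorted(
        f"{c.account}/{c.region}/{c.name} ({c.status})"
        for c in clusters
        if c.status != "ACTIVE"
    )
    return {
        "total": len(clusters),
        "status_by_region": {r: dict(cnt) for r, cnt in sorted(by_region.items())},
        "versions": dict(Counter(c.version for c in clusters)),
        "needs_attention": needs_attention,
    }


if __name__ == "__main__":
    fleet = [
        ClusterRecord("web", "111111111111", "us-east-1", "ACTIVE", "1.29"),
        ClusterRecord("batch", "111111111111", "eu-west-1", "UPDATING", "1.28"),
        ClusterRecord("edge", "222222222222", "us-east-1", "ACTIVE", "1.29"),
    ]
    print(fleet_summary(fleet))
```

Tracking the version distribution alongside health makes it easy to see which clusters still need an upgrade.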

Governance

Ensuring compliance with internal policies and industry regulations such as HIPAA, PCI DSS or GDPR is a fundamental requirement for any newly operational Kubernetes infrastructure. Building automated workflows around standardized, approved templates for clusters and applications is critical.

Consistency is key when governing the use of Kubernetes through policies, especially for areas such as security, storage and visibility across your entire K8s infrastructure. Ideally, different internal groups can use multiple sets of preapproved cluster configurations at different development stages. Doing so not only simplifies administration but also helps minimize the risk of mismanagement and vulnerabilities. Governance should also include the ability to quickly detect and block changes to cluster and application configurations and notify enterprise administrators, preventing out-of-bounds clusters and the security and support issues they can cause.
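The detect, block and notify loop can be sketched in Python as a check of a proposed cluster configuration against a preapproved template. The template keys, allowed versions and the `enforce` helper are all hypothetical and are not an AWS schema; a real setup might keep separate templates per development stage (for example, one for dev and a stricter one for prod) and run the check in a CI pipeline or admission controller.

```python
from typing import Any, Callable, Dict, List

# Hypothetical preapproved template: each setting maps to a rule the
# cluster configuration must satisfy. Keys are illustrative only.
Rule = Callable[[Any], bool]

PROD_TEMPLATE: Dict[str, Rule] = {
    "endpoint_public_access": lambda v: v is False,
    "endpoint_private_access": lambda v: v is True,
    "audit_logging": lambda v: v is True,
    "version": lambda v: v in {"1.29", "1.30"},
}


def find_violations(config: Dict[str, Any], template: Dict[str, Rule]) -> List[str]:
    """Return one message per template setting the config omits or fails."""
    violations: List[str] = []
    for key, rule in template.items():
        if key not in config:
            violations.append(f"{key}: missing")
        elif not rule(config[key]):
            violations.append(f"{key}: {config[key]!r} is not approved")
    return violations


def enforce(
    cluster: str,
    config: Dict[str, Any],
    template: Dict[str, Rule],
    notify: Callable[[str], None],
) -> bool:
    """Notify admins of each violation; return False to block the change."""
    violations = find_violations(config, template)
    for v in violations:
        notify(f"[{cluster}] {v}")
    return not violations


if __name__ == "__main__":
    proposed = {
        "endpoint_public_access": True,  # drifted from the template
        "endpoint_private_access": True,
        "audit_logging": True,
        "version": "1.30",
    }
    accepted = enforce("payments-prod", proposed, PROD_TEMPLATE, notify=print)
    print("accepted" if accepted else "blocked")
```

Passing `notify` in as a function keeps the policy check independent of the alerting channel, whether that is email, chat or a ticketing system.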

Resources

Conceptually, Kubernetes makes clusters easier and faster to create, even disposable, so internal and external clients can use them cost-effectively. However, to benefit from these fast, flexible clusters, enterprises need to implement new processes to build, integrate, access, maintain and upgrade K8s clusters.

This requires K8s experts, who are hard to hire and retain because demand for that talent is high and supply is low. A centralized platform that reduces complexity and streamlines operations therefore becomes a key component of any successful large-scale Kubernetes deployment.

Kubernetes is an increasingly popular choice for enterprises that want their IT organizations to operate at velocity and scale. As organizations grow their Kubernetes practice in the cloud with tools like Amazon EKS, deeper integration can help fill Kubernetes' operational gaps and help you get the most out of your cloud investment.

This article was originally published in The New Stack.

Author

Kyle Hunter
