Kubernetes as a platform allows teams to scale up applications at a rapid pace, but this may come at a significant cost if done inefficiently.
According to the 2021 CNCF survey, more than 68% of organizations reported an increase in Kubernetes spend over the past year. More than half reported a year-over-year increase of more than 20%, and only a minority of organizations (20%) were able to predict an increase. Adding to the problem, more than 68% of organizations either do not monitor costs or simply rely on monthly estimates.
A lot of organizations are spending more than they actually need to because of a lack of timely visibility into their cost structure. In this blog, we will discuss different models to effectively manage your K8s costs.
Breaking down the cost structure and ensuring the right people in the organization have access to it
There are several models for setting up clusters and sharing resources between teams in customer environments:
- Teams run separate clusters in dedicated accounts
- Teams are allocated a dedicated node group in a shared cluster
- Teams are allocated namespaces in a shared cluster
The first and most important step to effectively manage K8s costs is to set up monitoring mechanisms that provide a good overview of the cost structure. A good monitoring mechanism is half the problem solved.
The major cloud vendors, including AWS, GCP and Azure, provide a reasonably detailed bill that should let you calculate the cost of running K8s resources such as clusters and nodes. In most cases, these bills are adequate for the first two models.
However, if your organization allocates namespaces in a shared cluster, providing visibility into the cost structure requires collecting and aggregating very granular resource utilization metrics from the clusters. You would also want the ability to tie this cost to the actual services you run, and to show the right information to the right people in the organization. Drawing the lines on who gets to see what is one of the most challenging parts of this exercise. Restricting access to cost metrics based on each user's assigned role increases transparency and reduces the time spent on cost optimization exercises.
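To make the aggregation step concrete, here is a minimal sketch in Python. It assumes you already collect per-pod usage samples labeled with a namespace and a service. The field names, the unit rates and the function `cost_by_namespace_and_service` are all hypothetical, chosen only for illustration; real rates would be derived from your cloud bill.

```python
from collections import defaultdict

# Hypothetical unit prices, for illustration only.
CPU_RATE_PER_CORE_HOUR = 0.03
MEM_RATE_PER_GIB_HOUR = 0.004


def cost_by_namespace_and_service(samples):
    """Roll pod-level usage samples up into a cost per (namespace, service)."""
    totals = defaultdict(float)
    for sample in samples:
        cost = (
            sample["cpu_core_hours"] * CPU_RATE_PER_CORE_HOUR
            + sample["mem_gib_hours"] * MEM_RATE_PER_GIB_HOUR
        )
        totals[(sample["namespace"], sample["service"])] += cost
    return dict(totals)


# Made-up samples: two pods of the same service and one pod of another.
samples = [
    {"namespace": "team-a", "service": "checkout",
     "cpu_core_hours": 100.0, "mem_gib_hours": 200.0},
    {"namespace": "team-a", "service": "checkout",
     "cpu_core_hours": 50.0, "mem_gib_hours": 100.0},
    {"namespace": "team-b", "service": "search",
     "cpu_core_hours": 200.0, "mem_gib_hours": 50.0},
]

costs = cost_by_namespace_and_service(samples)
for (namespace, service), cost in sorted(costs.items()):
    print(f"{namespace}/{service}: ${cost:.2f}")
```

Keying the totals by namespace and service keeps the same data usable for both a team-level view and a service-level view, which helps when different roles need different slices of the cost data.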
Implementing chargeback or showback programs
After you’ve successfully set up a structure to bubble up costs at the right granularity to the right set of people, you will be able to make the respective stakeholders (application owners, namespace administrators, etc.) aware of their spend. This is what is called a “showback” of cost.
In some cases, you may want to go a step further and charge individual teams or projects based on their utilization of shared infrastructure resources, referred to as “chargeback”. A lot of organizations have a chargeback program to identify and allocate K8s costs to the respective cost centers.
Dealing with idle costs
One element that needs to be clearly defined in an effective chargeback program is how shared or idle costs are handled. Dealing with idle capacity is one of the most debated topics when creating a chargeback program.
Any K8s cluster is bound to have some idle capacity. These costs are unavoidable, and as an organization you need to come up with an appropriate model to allocate them. There are a few effective models being used by organizations today.
The first is an equal-share model, a very simple model designed for organizations in the initial stages of their chargeback journey. As the name suggests, costs are shared equally among all the tenants (teams or projects), at all levels.
Say a cluster has 3 namespaces, each allocated to a team. Shared or idle costs in this case are allocated equally across all 3 namespaces (teams).
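The arithmetic of an equal split can be sketched in a few lines of Python. The function name `split_idle_equally`, the namespace names and the idle cost of 300 are assumptions made up for this example.

```python
def split_idle_equally(idle_cost, tenants):
    """Divide the idle cost evenly across all tenants."""
    if not tenants:
        raise ValueError("at least one tenant is required")
    share = idle_cost / len(tenants)
    return {tenant: share for tenant in tenants}


# Hypothetical cluster: 3 namespaces and 300 (in any currency) of idle cost.
equal_shares = split_idle_equally(
    300.0, ["namespace-1", "namespace-2", "namespace-3"]
)
for tenant, share in equal_shares.items():
    print(f"{tenant}: {share:.2f}")
```

Each namespace is charged the same amount regardless of how much of the cluster it actually reserves or uses.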
With an allocation model, costs are divided between the tenants based on their original resource allocation. Assuming that the largest tenant requires the largest buffer, this model ensures the largest tenant contributes the largest share of the costs, whether used or unused.
For example, let's say a cluster has 3 namespaces, each allocated to a specific team. Assuming Namespace 1 has 30% of the allocated resources, Namespace 2 has 20% and Namespace 3 has 50%, shared or idle costs will be split 30%, 20% and 50% respectively.
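The same example can be worked through in Python. This is a sketch only: the function name `split_idle_by_allocation` and the idle cost of 1000 are assumptions, while the 30/20/50 split comes from the example above.

```python
def split_idle_by_allocation(idle_cost, allocations):
    """Divide the idle cost in proportion to each tenant's resource allocation."""
    total = sum(allocations.values())
    if total <= 0:
        raise ValueError("total allocation must be positive")
    return {
        tenant: idle_cost * allocated / total
        for tenant, allocated in allocations.items()
    }


# Allocation percentages from the example; the idle cost is hypothetical.
allocations = {"namespace-1": 30, "namespace-2": 20, "namespace-3": 50}
proportional_shares = split_idle_by_allocation(1000.0, allocations)
for tenant, share in proportional_shares.items():
    print(f"{tenant}: {share:.2f}")
```

Because the function normalizes by the total, the allocations do not have to be percentages. Raw figures such as requested CPU cores per namespace would produce the same proportions.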
Kubernetes is very powerful and helps teams and applications scale like never before. Costs associated with Kubernetes are on the rise, and organizations need to start thinking about cost before the dreaded bill shock arrives. There is a pressing need for finer cost visibility into an organization's K8s infrastructure and the applications that run on it, and setting this up in-house is a tedious task.
In an upcoming blog post, we will describe how Rafay’s Kubernetes Operations Platform (KOP) can help provide:
- Full visibility into cost and utilization metrics for clusters, helping customers optimize spend
- Ability to create chargeback groups to track and bill internal teams based on resource consumption for shared cluster scenarios