Kubernetes Clusters as a Service

Self-Service Access to Clusters for Your Developers

Provide developers, cloud operations, and all cloud users with self-service access to Kubernetes clusters using proven templates with guardrails included.

Why Clusters-as-a-Service?

Modern applications require modern orchestration in the form of Kubernetes. Enterprises that streamline the process of setting up Kubernetes clusters by providing self-service access for developer and cloud operation teams gain significant benefits.

Increase Deployment Velocity

Deployments are 4x faster when clusters are available on demand vs. submitting tickets and waiting for infrastructure.

Simplify Cluster Management

The reuse of templatized clusters with policy built-in reduces ongoing management overhead.

Reduce Cognitive Load

When you free your developers from having to call operations teams, you pave the way for them to deliver true value to your business.

Unique Rafay Capabilities for Cluster-as-a-Service

Dozens of enterprise platform teams leverage these unique features to rapidly build cluster-as-a-service automation with Rafay and delight their developers.

K8s Lifecycle Management

Multi-cloud & distro K8s

Support for running clusters in the cloud and on-premises

Provisioning Anywhere

Provisioning support for AWS (EKS), Azure (AKS), GCP (GKE), OCI, Bare Metal (Upstream Kubernetes) and Edge (Upstream Kubernetes)

Infrastructure as Code (IaC)

Support for Terraform (TF)-first or GitOps-first approaches. Support for private Git repos.

Automated K8s Upgrades

Orchestrate K8s upgrades in a phased manner across a fleet of clusters. The platform team may want to perform Day 2 ops centrally or delegate them to downstream teams.
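As an illustration of what a phased upgrade means in practice, here is a minimal Python sketch, not Rafay's implementation. It splits a fleet into waves and stops the rollout as soon as a wave reports a failure. The cluster names, wave sizes, and the `upgrade` callback are all hypothetical.

```python
from typing import Callable, Iterable, List


def plan_waves(clusters: List[str], wave_sizes: Iterable[int]) -> List[List[str]]:
    """Split an ordered cluster list into waves; leftovers form a final wave."""
    waves, start = [], 0
    for size in wave_sizes:
        if start >= len(clusters):
            break
        waves.append(clusters[start:start + size])
        start += size
    if start < len(clusters):
        waves.append(clusters[start:])
    return waves


def run_phased_upgrade(waves: List[List[str]], upgrade: Callable[[str], bool]) -> List[str]:
    """Upgrade wave by wave; do not start the next wave if any cluster failed."""
    upgraded: List[str] = []
    for wave in waves:
        results = {name: upgrade(name) for name in wave}
        upgraded.extend(name for name, ok in results.items() if ok)
        if not all(results.values()):
            break  # halt the rollout so later waves are left untouched
    return upgraded


# Hypothetical fleet: one dev cluster first, then staging, then all of prod.
fleet = ["dev-1", "staging-1", "prod-1", "prod-2", "prod-3"]
waves = plan_waves(fleet, wave_sizes=[1, 1])
```

Ordering the fleet from least to most critical means a failed upgrade surfaces on a low-risk cluster before production is touched.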

API Deprecation Checks

Ability to check for API deprecations before a K8s version upgrade
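A pre-upgrade deprecation check can be pictured as follows. This is a simplified sketch, not the platform's actual check. It compares the `apiVersion` and `kind` of each manifest against a small, non-exhaustive table of APIs that upstream Kubernetes no longer serves from a given release. The function names and the sample manifests are assumptions made for the example.

```python
from typing import Dict, List, Tuple

# (apiVersion, kind) -> first Kubernetes release that no longer serves it.
# A small, non-exhaustive sample from the upstream deprecation guide.
REMOVED_IN: Dict[Tuple[str, str], Tuple[int, int]] = {
    ("extensions/v1beta1", "Ingress"): (1, 22),
    ("networking.k8s.io/v1beta1", "Ingress"): (1, 22),
    ("batch/v1beta1", "CronJob"): (1, 25),
    ("policy/v1beta1", "PodDisruptionBudget"): (1, 25),
    ("policy/v1beta1", "PodSecurityPolicy"): (1, 25),
}


def parse_minor(version: str) -> Tuple[int, int]:
    """Turn '1.25' or 'v1.25.3' into (1, 25)."""
    major, minor = version.lstrip("v").split(".")[:2]
    return int(major), int(minor)


def find_removed_apis(manifests: List[dict], target_version: str) -> List[str]:
    """List resources whose API is no longer served at the target version."""
    target = parse_minor(target_version)
    findings = []
    for manifest in manifests:
        key = (manifest.get("apiVersion"), manifest.get("kind"))
        removed = REMOVED_IN.get(key)
        if removed is not None and target >= removed:
            name = manifest.get("metadata", {}).get("name", "<unnamed>")
            findings.append(f"{key[1]}/{name} uses {key[0]}")
    return findings


# Hypothetical manifests collected from a cluster before upgrading.
manifests = [
    {"apiVersion": "batch/v1beta1", "kind": "CronJob", "metadata": {"name": "nightly"}},
    {"apiVersion": "apps/v1", "kind": "Deployment", "metadata": {"name": "web"}},
]
blockers = find_removed_apis(manifests, "1.25")
```

An empty result means none of the sampled APIs would block the upgrade. A real check would need the full upstream removal list.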

Pre/Post Hooks for K8s upgrades

Examples of hooks include measuring application performance before and after a cluster upgrade, and initiating a backup operation before a cluster upgrade

Staged Rollout of Node OS upgrades

Ability to orchestrate Node OS upgrades (e.g. AMIs) in a phased manner across a fleet of clusters

Bring Pre-existing clusters into Compliance easily

Ability to enforce the same guardrails as newly provisioned clusters

Disaster Recovery

Ability to do a backup/restore of Control Plane configuration (in the case of an unmanaged K8s distro) + Data Plane (for stateful applications)


Logging of “who did what?” and export of audit logs to an external system (e.g. Splunk, Datadog), as needed to demonstrate compliance

Developer Self Service

Flexible User Interfaces

Ability to consume the platform through the preferred interface: UI, Backstage, GitOps or CMDBs (e.g. ServiceNow)

Simple Process for Compute

No time-consuming, ticket-driven process in which the platform team has to provision clusters manually

Visualization of Resources

Ability for end users to quickly look at resources in the clusters and perform operations via the UI. This is especially useful for less technical users.

Streamlined Kubectl Access

No need to mandate VPNs or route access through a bastion host

Temporary Kubectl Access

It shall be possible to implement a “break glass” procedure so that the application team user has temporary access to debug applications in a prod cluster

Visibility of Policy Violations

View into “what resources” are violating policies so that it is easy to remediate and course correct (for future actions)

Visibility of Resources and Cost

To help with scenarios such as application right-sizing exercises and requesting additional compute from the platform team

Repository of Approved Apps

Integrated, low touch experience for installing applications that have been scanned for vulnerabilities etc.

Add-On Lifecycle Management

Installation of Add-ons

Ensure that clusters are always born with the right set of add-ons. Examples of add-ons include security and monitoring tools.

Cluster Overrides

Ability to inject values to a manifest dynamically so that a single add-on can be used org-wide, e.g. AWS Load Balancer Helm chart requires configuration of “clusterName”
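The idea behind an override can be sketched generically in Python. This is an illustration under assumed data, not the platform's override format. One shared set of Helm values is merged with a small per-cluster dictionary, so the same add-on definition yields a different `clusterName` on each cluster. The cluster names and the other value keys are hypothetical.

```python
import copy
from typing import Any, Dict


def deep_merge(base: Dict[str, Any], override: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of base with override applied; nested dicts merge recursively."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Shared values for a single add-on definition (hypothetical contents).
base_values = {"clusterName": "", "replicaCount": 2, "serviceAccount": {"create": True}}

# Per-cluster overrides keyed by cluster name (hypothetical names).
overrides = {
    "prod-us-east": {"clusterName": "prod-us-east"},
    "dev-eu-west": {"clusterName": "dev-eu-west", "replicaCount": 1},
}

# One rendered values set per cluster; base_values itself is never mutated.
rendered = {name: deep_merge(base_values, o) for name, o in overrides.items()}
```

Keeping the shared values immutable and applying overrides at render time is what lets one add-on definition serve the whole organization.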

Drift Detection & Blocking

Ensure that add-ons are always in a compliant state without needing expensive reconciliation tools

Automated Add-on Updates

Support for a staged rollout model for updating add-ons across clusters

Configurable Pre/Post Hooks for Updates

Ability to run custom scripts to determine whether the applications are running fine after an add-on update

Version Control for Add-ons

Ensures that only blessed versions of add-ons can be used and that older versions can be retired or made unavailable. Also allows organizations to demonstrate compliance.

Golden Packs for Add-ons

Allow downstream teams to include more add-ons based on their requirements while ensuring the baseline set of add-ons mandated by the platform team always gets installed

Centralized Visibility

Visibility into add-ons/versions across clusters in the organization


Platform to Support Multiple Teams

Central platform that can deliver “cluster as a service” to multiple teams within the organization, with access to resources controlled by user identity

Curated List of Cluster Templates

Workflows to centrally define templates and share them selectively with downstream users (Developers, SREs) in a consistent manner. Ensures that provisioned clusters conform to approved guidelines.

Version Control for Cluster Templates

Ensures that only blessed versions of templates can be used and that older versions can be retired or made unavailable. Also allows organizations to demonstrate compliance.

Overrides for Cluster Templates

A “one size fits all” approach doesn’t work for enterprises. End users should be able to override cluster template configurations selectively, e.g. instance types and regions, as deemed permissible by the platform team.
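Selective overrides can be thought of as an allowlist check. In the Python sketch below, the template structure, field names, and allowed values are all hypothetical. The platform team declares which fields may be changed and to what, and any other request is rejected before a cluster is provisioned.

```python
from typing import Dict, List

# Hypothetical template: defaults plus the values end users may pick per field.
template = {
    "defaults": {"region": "us-east-1", "instance_type": "m5.large", "k8s_version": "1.29"},
    "allowed_overrides": {
        "region": ["us-east-1", "us-west-2"],
        "instance_type": ["m5.large", "m5.xlarge"],
    },
}


def apply_overrides(template: dict, requested: Dict[str, str]) -> Dict[str, str]:
    """Merge requested overrides into the defaults, rejecting anything not permitted."""
    errors: List[str] = []
    for field, value in requested.items():
        allowed = template["allowed_overrides"].get(field)
        if allowed is None:
            errors.append(f"{field} cannot be overridden")
        elif value not in allowed:
            errors.append(f"{field}={value} is not permitted")
    if errors:
        raise ValueError("; ".join(errors))
    return {**template["defaults"], **requested}


# A developer asks for a larger instance type; everything else stays at the default.
cluster_spec = apply_overrides(template, {"instance_type": "m5.xlarge"})
```

Fields missing from `allowed_overrides`, such as `k8s_version` here, stay under the platform team's control.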


Just in Time User Kubectl Access

Implement K8s RBAC at scale with the company’s IdP as the source of truth, without the need for expensive solutions such as bastions, VPNs etc.

Kubectl Access Audits

Centralized visibility into user activities and the ability to export audits to an external system (e.g. Splunk, Datadog)

Compliance Benchmarks

Ongoing scans against benchmarks such as CIS, NSA hardening recommendations etc. Ability to securely access the fleet of clusters to run periodic scans and centrally aggregate the benchmark reports


Centralized enforcement of policies for security, reliability and operational efficiency. Centralized visibility into policy violations. Examples include allowing images only from blessed repos and ensuring that pods run with appropriate privileges.

Network Policies

Control egress/ingress traffic patterns for clusters (N/S traffic)

Chargeback/ Showback

Collection of granular utilization metrics from clusters. Flexible chargeback/showback models.

Cost Optimization

Implement capabilities such as TTL (e.g. the cluster is automatically deprovisioned) and Schedule (e.g. the cluster scales down and scales up based on a configured schedule)
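The two cost controls above reduce to simple time checks, sketched here in Python with hypothetical function names and parameters. The TTL check flags a cluster for deprovisioning once its lifetime has elapsed. The schedule check returns the node count a cluster should have at a given moment, under an assumed weekday working-hours policy.

```python
from datetime import datetime, time, timedelta, timezone


def is_expired(created_at: datetime, ttl: timedelta, now: datetime) -> bool:
    """True once the cluster has lived for at least its TTL."""
    return now >= created_at + ttl


def desired_nodes(now: datetime, work_start: time, work_end: time, up: int, down: int) -> int:
    """Weekdays within [work_start, work_end) get `up` nodes; any other time gets `down`."""
    is_weekday = now.weekday() < 5  # Monday is 0, Sunday is 6
    in_hours = work_start <= now.time() < work_end
    return up if is_weekday and in_hours else down


# Hypothetical dev cluster created on Monday 2024-01-01 with an 8-hour TTL.
created = datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc)
expired = is_expired(created, timedelta(hours=8), created + timedelta(hours=9))

# Hypothetical schedule: 3 nodes from 08:00 to 18:00 on weekdays, 0 otherwise.
wednesday_morning = desired_nodes(datetime(2024, 1, 3, 10, 0), time(8), time(18), up=3, down=0)
saturday_morning = desired_nodes(datetime(2024, 1, 6, 10, 0), time(8), time(18), up=3, down=0)
```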

Identify Underutilized Clusters

Collection of granular utilization metrics from clusters to show usage by CPU and memory
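Once utilization metrics are collected, spotting underutilized clusters is a matter of applying thresholds. The Python sketch below uses made-up utilization figures and assumed cut-offs of 20% CPU and 30% memory. Suitable thresholds will differ between organizations.

```python
from typing import Dict, List


def underutilized(
    usage: Dict[str, Dict[str, float]],
    cpu_max: float = 0.2,
    mem_max: float = 0.3,
) -> List[str]:
    """Return clusters whose average CPU and memory utilization are both below the cut-offs."""
    return sorted(
        name
        for name, u in usage.items()
        if u["cpu"] < cpu_max and u["memory"] < mem_max
    )


# Hypothetical average utilization per cluster, as fractions of capacity.
usage = {
    "prod-1": {"cpu": 0.65, "memory": 0.70},
    "dev-1": {"cpu": 0.05, "memory": 0.12},
    "staging-1": {"cpu": 0.10, "memory": 0.45},
}
candidates = underutilized(usage)
```

Requiring both metrics to be low avoids flagging clusters that are memory-bound but light on CPU, such as `staging-1` in this example.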

Deployment Options

SaaS and Self-Hosted

A self-hosted, air-gapped option may be necessary for highly regulated industries such as public sector and biotech

Download the Templates

More downloadable templates are coming soon. To get started with providing self-service access to clusters in your enterprise, talk to us about one of the templates below.

CaaS on GCP


Google Kubernetes Engine on Google Cloud Platform


CaaS on vCluster


vCluster on any Kubernetes


CaaS on vSphere


vSphere in Private Data Center


CaaS on EKS


Elastic Kubernetes Service on Amazon Web Services


CaaS on ECS


Elastic Container Service on Amazon Web Services


CaaS on Azure


Azure Kubernetes Service on Azure


CaaS on Upstream Kubernetes


Upstream Kubernetes in Private Data Center, Bare Metal, Edge using PhoenixNAP


CaaS on OKE


Clusters Using Oracle Container Engine for Kubernetes (OKE)

White Paper
Hybrid Cloud Meets Kubernetes

Learn how to Streamline Kubernetes Ops in Hybrid Clouds with AWS & Rafay

"Easily operate and rapidly deploy applications anywhere across multi-cloud and edge environments."

Aamir Hussain

SVP Chief Product Officer, Verizon Business

"Rafay’s unified view for Kubernetes Operations & deep DevOps expertise has allowed us to significantly increase development velocity."

Alec Rooney


"The big draw was that you could centralize the lifecycle management & operations."

Beth Cohen

Cloud Technology Strategist, Verizon Business