Running GPU Infrastructure on Kubernetes: What Enterprise Platform Teams Must Get Right
Scaling GPUs on Kubernetes is a governance problem, where utilization, cost control, and access define success.

At KubeCon Europe 2026, NVIDIA made a set of significant open-source contributions that advance how GPUs are managed in Kubernetes. These developments span resource allocation (DRA), scheduling (KAI), and isolation (Kata Containers). Specifically, NVIDIA donated its DRA Driver for GPUs to the Cloud Native Computing Foundation, transferring governance from a single vendor to full community ownership under the Kubernetes project. The KAI Scheduler was formally accepted as a CNCF Sandbox project, marking its transition from an NVIDIA-governed tool to a community-developed standard. And NVIDIA collaborated with the CNCF Confidential Containers community to introduce GPU support for Kata Containers, extending hardware-level workload isolation to GPU-accelerated workloads. Together, these contributions move GPU infrastructure closer to a first-class, community-owned, scheduler-integrated model.
Kubernetes' Device Plugin framework has been the standard mechanism for exposing GPUs since v1.8. While widely adopted, it has well-known limitations: devices appear to the scheduler only as opaque, countable resources, device attributes and topology are invisible at scheduling time, and sharing or partitioning schemes must be configured out of band.
The Dynamic Resource Allocation (DRA) API introduces a more expressive model: workloads describe the devices they need in ResourceClaims, drivers publish available devices and their attributes in ResourceSlices, and DeviceClasses group devices by selectable criteria, all of it visible to the scheduler.
NVIDIA's DRA driver extends this model for GPUs, enabling requests for full GPUs, MIG partitions, and shared access modes to be expressed and satisfied through the same declarative claim workflow.
This shifts GPU allocation toward a scheduler-visible, declarative workflow, enabling more precise placement and improved utilization.
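As a concrete sketch, a workload can request a GPU through a ResourceClaimTemplate rather than an extended-resource count. The API group/version and the `gpu.nvidia.com` DeviceClass name below depend on the Kubernetes release and driver version you deploy, so verify them against your cluster:

```yaml
# ResourceClaimTemplate: each pod referencing it gets its own claim for one GPU.
apiVersion: resource.k8s.io/v1beta1
kind: ResourceClaimTemplate
metadata:
  name: single-gpu
spec:
  spec:
    devices:
      requests:
      - name: gpu
        deviceClassName: gpu.nvidia.com   # DeviceClass published by the NVIDIA DRA driver
---
apiVersion: v1
kind: Pod
metadata:
  name: gpu-smoke-test
spec:
  restartPolicy: Never
  resourceClaims:
  - name: gpu
    resourceClaimTemplateName: single-gpu
  containers:
  - name: cuda
    image: nvcr.io/nvidia/cuda:12.4.1-base-ubuntu22.04
    command: ["nvidia-smi"]
    resources:
      claims:
      - name: gpu                         # bind the allocated device to this container
```

Because the claim and the devices' published attributes are both visible to the scheduler, placement decisions can account for the actual GPU rather than an opaque counter.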
Distributed AI workloads introduce requirements that the default Kubernetes scheduler does not fully address: gang scheduling, so a multi-pod job starts only when all of its pods can be placed; topology-aware placement across GPUs and nodes; and fairness and quota management across competing teams.
The KAI Scheduler addresses these requirements with hierarchical queues that carry quotas and over-quota sharing, gang scheduling for distributed jobs, and support for fractional GPU allocation.
This reflects a broader trend toward domain-specific schedulers that extend Kubernetes for AI/ML workloads.
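As an illustration, opting a workload into KAI typically means naming the scheduler and assigning a queue. The label key and scheduler name below follow the project's quick-start examples and may differ in your deployment:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: training-worker
  labels:
    kai.scheduler/queue: team-a        # queue assignment; verify the label key for your KAI version
spec:
  schedulerName: kai-scheduler         # route this pod to KAI instead of the default scheduler
  containers:
  - name: trainer
    image: nvcr.io/nvidia/pytorch:24.08-py3
    resources:
      limits:
        nvidia.com/gpu: 1
```

The queue determines the pod's quota, priority, and over-quota sharing behavior, while the scheduler name keeps the default scheduler from racing KAI for the same pods.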
GPU multi-tenancy introduces isolation challenges, particularly in regulated or shared environments.
Kata Containers addresses this by running each pod inside a lightweight virtual machine: the pod gets its own guest kernel and memory boundary, so a container escape is contained by the VM rather than by the shared host kernel.
When combined with emerging hardware security capabilities, this provides a foundation for running sensitive workloads on shared GPU infrastructure with stronger isolation guarantees than standard containers.
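Selecting the Kata runtime is done per pod via a RuntimeClass. The handler name below is an assumption; it must match whatever runtime handler is configured in containerd or CRI-O on your nodes:

```yaml
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: kata
handler: kata                 # must match the handler name configured in the node's CRI runtime
---
apiVersion: v1
kind: Pod
metadata:
  name: isolated-inference
spec:
  runtimeClassName: kata      # run this pod inside a Kata micro-VM
  containers:
  - name: inference
    image: nvcr.io/nvidia/tritonserver:24.08-py3
    resources:
      limits:
        nvidia.com/gpu: 1
```

Only pods that opt in via `runtimeClassName` pay the VM overhead; the rest of the cluster continues to run standard containers.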
While these projects introduce critical primitives, platform teams still need a way to standardize and operate them consistently across clusters.
In Rafay, this is achieved through Blueprints, versioned specifications that define cluster add-ons, policies, and configuration baselines. Blueprints act as the mechanism for turning upstream components into repeatable platform standards across GPU-enabled environments.
A GPU platform blueprint typically includes the GPU driver and operator stack, a DRA driver, a scheduling layer such as KAI, isolation runtimes such as Kata, and monitoring and policy add-ons.
Blueprints are versioned and continuously reconciled, allowing platform teams to roll out changes incrementally, detect and correct configuration drift, and enforce a single baseline across clusters and clouds.
This approach enables organizations to manage heterogeneous GPU fleets while maintaining a consistent operational model.
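The idea can be sketched as a versioned add-on baseline. The YAML below is purely illustrative and is not Rafay's actual blueprint schema; every field name and version number here is hypothetical:

```yaml
# Hypothetical sketch of a versioned GPU platform baseline.
# NOT the real Rafay Blueprint schema; names and versions are illustrative.
name: gpu-platform
version: v1.4.0
addons:
  - name: gpu-operator        # driver and toolkit lifecycle
  - name: nvidia-dra-driver   # declarative GPU allocation
  - name: kai-scheduler       # queues and gang scheduling
  - name: kata-runtime        # VM-isolated pods
policies:
  - gpu-quota-enforcement
  - drift-detection
```

The point is the shape, not the syntax: one versioned artifact pins the GPU stack, and reconciliation keeps every cluster on that baseline.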
The developments presented at KubeCon EU 2026 reflect a broader shift in GPU infrastructure within Kubernetes: allocation, scheduling, and isolation are becoming community-owned, scheduler-integrated primitives rather than vendor-specific add-ons.
For platform teams, the challenge is no longer just provisioning GPUs, but operationalizing them as governed infrastructure, spanning allocation, scheduling, and isolation across the Kubernetes control plane.
