Advancing GPU Scheduling and Isolation in Kubernetes

March 25, 2026

At KubeCon Europe 2026, NVIDIA made a set of significant open-source contributions that advance how GPUs are managed in Kubernetes, spanning resource allocation (DRA), scheduling (the KAI Scheduler), and isolation (Kata Containers). Specifically, NVIDIA donated its DRA Driver for GPUs to the Cloud Native Computing Foundation, transferring governance from a single vendor to full community ownership under the Kubernetes project. The KAI Scheduler was formally accepted as a CNCF Sandbox project, marking its transition from an NVIDIA-governed tool to a community-developed standard. And NVIDIA collaborated with the CNCF Confidential Containers community to introduce GPU support for Kata Containers, extending hardware-level isolation to GPU-accelerated workloads. Together, these contributions move GPU infrastructure closer to a first-class, community-owned, scheduler-integrated model.



1. Dynamic Resource Allocation (DRA): Toward Scheduler-Aware GPUs

Kubernetes' Device Plugin framework has been the standard mechanism for exposing GPUs since v1.8. While widely adopted, it has limitations:

  • Scheduling is largely based on integer resource counts, with limited awareness of device attributes
  • Topology information (e.g., NUMA, NVLink) is not fully integrated into scheduling decisions
  • Sharing and allocation semantics are often implemented out-of-tree or vendor-specific

The Dynamic Resource Allocation (DRA) API introduces a more expressive model through:

  • DeviceClass – describes device capabilities
  • ResourceSlice – represents allocatable capacity
  • ResourceClaim / ResourceClaimTemplate – declarative workload requests
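In manifest form, the objects above might look like the following sketch. The `gpu.nvidia.com` class and driver names follow NVIDIA's DRA driver conventions, and the API version tracks the upstream DRA graduation, so verify both against the cluster you target:

```yaml
# DeviceClass: published by the DRA driver; matches devices it manages
apiVersion: resource.k8s.io/v1beta1
kind: DeviceClass
metadata:
  name: gpu.nvidia.com
spec:
  selectors:
  - cel:
      expression: device.driver == "gpu.nvidia.com"
---
# ResourceClaimTemplate: a declarative, reusable request for one GPU
apiVersion: resource.k8s.io/v1beta1
kind: ResourceClaimTemplate
metadata:
  name: single-gpu-template
spec:
  spec:
    devices:
      requests:
      - name: gpu
        deviceClassName: gpu.nvidia.com
        # Optional CEL selector for attribute-based placement
        # (attribute names are driver-specific; illustrative only)
        # selectors:
        # - cel:
        #     expression: device.attributes["gpu.nvidia.com"].productName.startsWith("NVIDIA")
```

Because the claim is a first-class API object, the scheduler can reason about device attributes at placement time rather than treating GPUs as opaque integer counts.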

NVIDIA's DRA driver extends this model for GPUs, enabling:

  • Attribute-based scheduling aligned with device capabilities
  • Integration with MIG and time-slicing mechanisms
  • Better coordination between the scheduler and allocation lifecycle

This shifts GPU allocation toward a scheduler-visible, declarative workflow, enabling more precise placement and improved utilization.
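A pod consumes a claim by referencing it in its spec; a minimal sketch, where the template name `single-gpu-template` and the CUDA image tag are illustrative placeholders:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-workload
spec:
  resourceClaims:
  - name: gpu                                     # pod-local claim handle
    resourceClaimTemplateName: single-gpu-template
  containers:
  - name: cuda
    image: nvcr.io/nvidia/cuda:12.4.0-base-ubuntu22.04
    command: ["nvidia-smi"]
    resources:
      claims:
      - name: gpu                                 # container consumes the claim
```

Note that the container requests the claim by name instead of setting an `nvidia.com/gpu` resource limit, which is the key workflow difference from the Device Plugin model.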

2. KAI Scheduler: AI-Aware Scheduling Semantics

Distributed AI workloads introduce requirements that are not fully addressed by the default Kubernetes scheduler, including:

  • Gang scheduling for coordinated multi-pod execution
  • Fair sharing of GPUs across teams
  • Avoiding partial allocations that degrade job efficiency

The KAI Scheduler addresses these requirements with:

  • Gang scheduling semantics for all-or-nothing placement
  • Hierarchical queues with Dominant Resource Fairness (DRF)
  • Support for sub-GPU allocation strategies, depending on device capabilities
  • Pre-scheduling simulation to reduce preemption overhead
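In practice, workloads opt in by naming the scheduler and a queue. The sketch below is based on the KAI Scheduler project's published examples at the time of writing; the API group, label key, and quota fields should be treated as assumptions to verify against the project's current documentation:

```yaml
# A team-level queue with a GPU quota (illustrative field names)
apiVersion: scheduling.run.ai/v2
kind: Queue
metadata:
  name: team-a
spec:
  resources:
    gpu:
      quota: 8
---
apiVersion: v1
kind: Pod
metadata:
  name: training-worker
  labels:
    kai.scheduler/queue: team-a   # assumed label key; check KAI docs
spec:
  schedulerName: kai-scheduler    # route the pod to KAI instead of the default scheduler
  containers:
  - name: trainer
    image: nvcr.io/nvidia/cuda:12.4.0-base-ubuntu22.04
    resources:
      limits:
        nvidia.com/gpu: 1
```

For multi-pod jobs, KAI groups the members so they are placed all-or-nothing rather than partially, which is what prevents the stranded-GPU pattern described above.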

This reflects a broader trend toward domain-specific schedulers that extend Kubernetes for AI/ML workloads.

3. Kata Containers: Strengthening GPU Multi-Tenancy

GPU multi-tenancy introduces isolation challenges, particularly in regulated or shared environments.

Kata Containers addresses this by running each pod inside a lightweight virtual machine:

  • Each workload runs in a dedicated microVM
  • GPUs are exposed via VFIO passthrough
  • Isolation is enforced at the hardware virtualization boundary

When combined with emerging hardware security capabilities, this provides a foundation for running sensitive workloads on shared GPU infrastructure with stronger isolation guarantees than standard containers.
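Selecting the Kata runtime is done through the standard Kubernetes RuntimeClass API. A minimal sketch follows; the handler name (`kata-qemu` here) depends on how the runtime is installed on the node, and GPU-specific handlers may carry different names:

```yaml
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: kata-qemu
handler: kata-qemu        # must match the containerd/CRI-O runtime handler on the node
---
apiVersion: v1
kind: Pod
metadata:
  name: isolated-gpu-workload
spec:
  runtimeClassName: kata-qemu   # run this pod inside a microVM
  containers:
  - name: cuda
    image: nvcr.io/nvidia/cuda:12.4.0-base-ubuntu22.04
    resources:
      limits:
        nvidia.com/gpu: 1       # GPU passed through to the microVM via VFIO
```

From the workload's perspective nothing changes; the isolation boundary moves from a shared kernel to the hardware virtualization layer.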

From Upstream Capabilities to Platform Standards

While these projects introduce critical primitives, platform teams still need a way to standardize and operate them consistently across clusters.

In Rafay, this is achieved through Blueprints, versioned specifications that define cluster add-ons, policies, and configuration baselines. Blueprints act as the mechanism for turning upstream components into repeatable platform standards across GPU-enabled environments.

A GPU platform blueprint typically includes:

  • NVIDIA GPU Operator — driver lifecycle and GPU component management
  • KAI Scheduler — deployed as a managed add-on
  • Prometheus/NVIDIA DCGM Exporter — GPU observability
  • OPA Gatekeeper — policy enforcement
  • Network policies — namespace isolation
  • Kata Containers runtime — for stronger workload isolation
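As one concrete example of the namespace-isolation item above, a blueprint can ship a default-deny ingress policy per tenant namespace; this is a minimal sketch using the upstream NetworkPolicy API, with `team-a` as a placeholder namespace:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: team-a
spec:
  podSelector: {}     # empty selector: applies to every pod in the namespace
  policyTypes:
  - Ingress           # deny all inbound traffic unless another policy allows it
```

Teams then layer explicit allow rules on top, so cross-tenant traffic is opt-in rather than opt-out.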

Blueprints are versioned and continuously reconciled, allowing platform teams to:

  • Enforce consistent configuration across clusters
  • Detect and remediate configuration drift
  • Support cluster-specific variations (e.g., MIG vs time-slicing) without duplicating definitions

This approach enables organizations to manage heterogeneous GPU fleets while maintaining a consistent operational model.

Key Takeaways

The developments presented at KubeCon EU 2026 reflect a broader shift in GPU infrastructure within Kubernetes:

  • From node-local, opaque resources → to scheduler-visible, attribute-rich resources
  • From ad hoc scheduling and sharing → toward structured, policy-aware allocation models
  • From best-effort isolation → toward stronger multi-tenant and security boundaries

For platform teams, the challenge is no longer just provisioning GPUs, but operationalizing them as governed infrastructure, spanning allocation, scheduling, and isolation across the Kubernetes control plane.
