Running GPU Infrastructure on Kubernetes: What Enterprise Platform Teams Must Get Right
Scaling GPUs on Kubernetes is a governance problem, where utilization, cost control, and access define success.

At KubeCon Europe 2026, NVIDIA made a set of significant open-source contributions that advance how GPUs are managed in Kubernetes. These developments span resource allocation (DRA), scheduling (KAI), and isolation (Kata Containers). Specifically, NVIDIA donated its DRA Driver for GPUs to the Cloud Native Computing Foundation, transferring governance from a single vendor to full community ownership under the Kubernetes project. The KAI Scheduler was formally accepted as a CNCF Sandbox project, marking its transition from an NVIDIA-governed tool to a community-developed standard. And NVIDIA collaborated with the CNCF Confidential Containers community to introduce GPU support for Kata Containers, extending hardware-level workload isolation to GPU-accelerated workloads. Together, these contributions move GPU infrastructure closer to a first-class, community-owned, scheduler-integrated model.
Kubernetes' Device Plugin framework has been the standard mechanism for exposing GPUs since v1.8. While widely adopted, it has well-known limitations: devices appear to the scheduler only as opaque, countable resources, device attributes and topology are invisible at scheduling time, and sharing or partitioning schemes must be configured out of band.
The Dynamic Resource Allocation (DRA) API introduces a more expressive model: workloads describe the devices they need in ResourceClaims, drivers publish available devices and their attributes in ResourceSlices, and DeviceClasses group devices by selectable criteria, all of it visible to the scheduler.
NVIDIA's DRA driver extends this model for GPUs, enabling requests for full GPUs, MIG partitions, and shared access modes to be expressed and satisfied through the same declarative claim workflow.
This shifts GPU allocation toward a scheduler-visible, declarative workflow, enabling more precise placement and improved utilization.
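As a concrete sketch, a workload can request a GPU through a ResourceClaimTemplate rather than an extended-resource count. The API group/version and the `gpu.nvidia.com` DeviceClass name below depend on the Kubernetes release and driver version you deploy, so verify them against your cluster:

```yaml
# ResourceClaimTemplate: each pod referencing it gets its own claim for one GPU.
apiVersion: resource.k8s.io/v1beta1
kind: ResourceClaimTemplate
metadata:
  name: single-gpu
spec:
  spec:
    devices:
      requests:
      - name: gpu
        deviceClassName: gpu.nvidia.com   # DeviceClass published by the NVIDIA DRA driver
---
apiVersion: v1
kind: Pod
metadata:
  name: gpu-smoke-test
spec:
  restartPolicy: Never
  resourceClaims:
  - name: gpu
    resourceClaimTemplateName: single-gpu
  containers:
  - name: cuda
    image: nvcr.io/nvidia/cuda:12.4.1-base-ubuntu22.04
    command: ["nvidia-smi"]
    resources:
      claims:
      - name: gpu                         # bind the allocated device to this container
```

Because the claim and the devices' published attributes are both visible to the scheduler, placement decisions can account for the actual GPU rather than an opaque counter.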
Distributed AI workloads introduce requirements that the default Kubernetes scheduler does not fully address: gang scheduling, so a multi-pod job starts only when all of its pods can be placed; topology-aware placement across GPUs and nodes; and fairness and quota management across competing teams.
The KAI Scheduler addresses these requirements with hierarchical queues that carry quotas and over-quota sharing, gang scheduling for distributed jobs, and support for fractional GPU allocation.
This reflects a broader trend toward domain-specific schedulers that extend Kubernetes for AI/ML workloads.
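As an illustration, opting a workload into KAI typically means naming the scheduler and assigning a queue. The label key and scheduler name below follow the project's quick-start examples and may differ in your deployment:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: training-worker
  labels:
    kai.scheduler/queue: team-a        # queue assignment; verify the label key for your KAI version
spec:
  schedulerName: kai-scheduler         # route this pod to KAI instead of the default scheduler
  containers:
  - name: trainer
    image: nvcr.io/nvidia/pytorch:24.08-py3
    resources:
      limits:
        nvidia.com/gpu: 1
```

The queue determines the pod's quota, priority, and over-quota sharing behavior, while the scheduler name keeps the default scheduler from racing KAI for the same pods.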
GPU multi-tenancy introduces isolation challenges, particularly in regulated or shared environments.
Kata Containers addresses this by running each pod inside a lightweight virtual machine: the pod gets its own guest kernel and memory boundary, so a container escape is contained by the VM rather than by the shared host kernel.
When combined with emerging hardware security capabilities, this provides a foundation for running sensitive workloads on shared GPU infrastructure with stronger isolation guarantees than standard containers.
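Selecting the Kata runtime is done per pod via a RuntimeClass. The handler name below is an assumption; it must match whatever runtime handler is configured in containerd or CRI-O on your nodes:

```yaml
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: kata
handler: kata                 # must match the handler name configured in the node's CRI runtime
---
apiVersion: v1
kind: Pod
metadata:
  name: isolated-inference
spec:
  runtimeClassName: kata      # run this pod inside a Kata micro-VM
  containers:
  - name: inference
    image: nvcr.io/nvidia/tritonserver:24.08-py3
    resources:
      limits:
        nvidia.com/gpu: 1
```

Only pods that opt in via `runtimeClassName` pay the VM overhead; the rest of the cluster continues to run standard containers.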
While these projects introduce critical primitives, platform teams still need a way to standardize and operate them consistently across clusters.
In Rafay, this is achieved through Blueprints, versioned specifications that define cluster add-ons, policies, and configuration baselines. Blueprints act as the mechanism for turning upstream components into repeatable platform standards across GPU-enabled environments.
A GPU platform blueprint typically includes the GPU driver and operator stack, a DRA driver, a scheduling layer such as KAI, isolation runtimes such as Kata, and monitoring and policy add-ons.
Blueprints are versioned and continuously reconciled, allowing platform teams to roll out changes incrementally, detect and correct configuration drift, and enforce a single baseline across clusters and clouds.
This approach enables organizations to manage heterogeneous GPU fleets while maintaining a consistent operational model.
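The idea can be sketched as a versioned add-on baseline. The YAML below is purely illustrative and is not Rafay's actual blueprint schema; every field name and version number here is hypothetical:

```yaml
# Hypothetical sketch of a versioned GPU platform baseline.
# NOT the real Rafay Blueprint schema; names and versions are illustrative.
name: gpu-platform
version: v1.4.0
addons:
  - name: gpu-operator        # driver and toolkit lifecycle
  - name: nvidia-dra-driver   # declarative GPU allocation
  - name: kai-scheduler       # queues and gang scheduling
  - name: kata-runtime        # VM-isolated pods
policies:
  - gpu-quota-enforcement
  - drift-detection
```

The point is the shape, not the syntax: one versioned artifact pins the GPU stack, and reconciliation keeps every cluster on that baseline.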
The developments presented at KubeCon EU 2026 reflect a broader shift in GPU infrastructure within Kubernetes: allocation, scheduling, and isolation are becoming community-owned, scheduler-integrated primitives rather than vendor-specific add-ons.
For platform teams, the challenge is no longer just provisioning GPUs, but operationalizing them as governed infrastructure, spanning allocation, scheduling, and isolation across the Kubernetes control plane.
