Custom GPU Resource Classes in Kubernetes

July 10, 2025

In the modern era of containerized machine learning and AI infrastructure, GPUs are a critical and expensive asset. Kubernetes makes scheduling and isolation easier, but managing GPU utilization efficiently requires more than just assigning something like nvidia.com/gpu: 1 to a pod.

In this blog post, we will explore what custom GPU resource classes are, why they matter, and when to use them for maximum impact. Custom GPU resource classes are a powerful technique for fine-grained GPU management in multi-tenant, cost-sensitive, and performance-critical environments.

Before We Begin:

If you are new to GPU sharing approaches, we recommend reading the following introductory blogs: Demystifying Fractional GPUs in Kubernetes and Choosing the Right Fractional GPU Strategy.

What Are Custom GPU Resource Classes?

By default, Kubernetes exposes GPUs through a single resource name: nvidia.com/gpu. As an end user, you have no idea how the underlying GPU is set up and configured. For example, the GPU you are given may be any of the following:

Full exclusive GPUs
Time-sliced shared GPUs
MIG (Multi-Instance GPU) slices
Fractional (e.g., ¼) allocations
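
To make the ambiguity concrete, here is a minimal sketch of a pod that uses the default resource name. The pod name and image are placeholders chosen for illustration, not taken from a real cluster.

```yaml
# Hypothetical pod using the default GPU resource name.
apiVersion: v1
kind: Pod
metadata:
  name: gpu-default-example          # placeholder name
spec:
  restartPolicy: Never
  containers:
    - name: app
      image: registry.example.com/cuda-app:latest   # placeholder image
      resources:
        limits:
          # Extended resources are requested as whole numbers under limits.
          # Nothing here says whether the GPU is exclusive, time-sliced, or MIG-backed.
          nvidia.com/gpu: 1
```

Reading this manifest alone, neither the user nor a reviewer can tell which of the GPU types listed above the pod will actually land on.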

Custom resource classes allow administrators to define new GPU resource names that tell users exactly what they are getting. These names are configured in the GPU device plugin (typically via the NVIDIA GPU Operator) and let you expose multiple logical GPU types from the same physical hardware.

Some examples are shown below.

1. nvidia.com/gpu-time-slice

As the custom resource class name suggests, this is a time-sliced GPU.

2. nvidia.com/gpu-mig-1g.5gb

As the custom resource class name suggests, this is a MIG GPU instance using the 1g.5gb profile, i.e., one compute slice and 5 GB of memory.

3. nvidia.com/gpu-fraction-0.25

As the custom resource class name suggests, this is a fractional (0.25) GPU.

Why Do Custom Resource Classes Matter?

Customers sometimes ask us why custom resource classes matter. Here are some common reasons:

Better Scheduling and Workload Matching

Different workloads can have vastly different GPU requirements, and custom classes let each workload request the kind of GPU it actually needs. For example:

Dev notebooks or small inference tasks only need a fraction of a GPU.
Real-time inference needs isolated and predictable performance.
Training jobs require full, exclusive access.
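
The matching described above could look like the following sketch. It assumes the cluster already advertises the custom resource names used earlier in this post (nvidia.com/gpu-time-slice and nvidia.com/gpu-mig-1g.5gb); the pod names and images are placeholders.

```yaml
# Dev notebook: a time-sliced share is enough (resource name assumed from this post).
apiVersion: v1
kind: Pod
metadata:
  name: dev-notebook                 # placeholder name
spec:
  containers:
    - name: notebook
      image: registry.example.com/notebook:latest    # placeholder image
      resources:
        limits:
          nvidia.com/gpu-time-slice: 1
---
# Real-time inference: a MIG slice for isolated, predictable performance.
apiVersion: v1
kind: Pod
metadata:
  name: realtime-inference           # placeholder name
spec:
  containers:
    - name: server
      image: registry.example.com/inference:latest   # placeholder image
      resources:
        limits:
          nvidia.com/gpu-mig-1g.5gb: 1
---
# Training job: a full, exclusive GPU through the default resource name.
apiVersion: v1
kind: Pod
metadata:
  name: training-job                 # placeholder name
spec:
  restartPolicy: Never
  containers:
    - name: trainer
      image: registry.example.com/trainer:latest     # placeholder image
      resources:
        limits:
          nvidia.com/gpu: 1
```

Because each class is a separate extended resource, the scheduler only places a pod on a node that advertises the class the pod asks for.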

Enabling Multi-Tenancy

In shared environments such as internal ML platforms, GPU clouds, or research clusters, custom classes allow administrators to achieve the following:

Partition GPU usage across teams
Enforce resource quotas per class
Prevent one user from monopolizing all full GPUs
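
Per-class quotas can be sketched with a standard Kubernetes ResourceQuota. For extended resources, Kubernetes only accepts quota entries that use the requests. prefix. The namespace team-a, the quota name, and the numbers below are illustrative, and the time-slice resource name assumes the class from this post is advertised in your cluster.

```yaml
# Hypothetical per-team quota: at most 2 full GPUs and 8 time-sliced units.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: gpu-class-quota              # placeholder name
  namespace: team-a                  # placeholder namespace
spec:
  hard:
    requests.nvidia.com/gpu: "2"              # caps full, exclusive GPUs
    requests.nvidia.com/gpu-time-slice: "8"   # caps time-sliced units (name assumed from this post)
```

With a quota like this in every team namespace, no single team can claim all of the full GPUs, while cheaper shared units can be handed out more generously.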

Cost Optimization

As we all know, GPU costs add up quickly. Using full GPUs for lightweight jobs is inefficient. Custom classes enable the following:

Time-sliced sharing for low-duty jobs
MIG slices for sandbox or model testing
Fine-grained billing per resource type

Transparency and Observability

Custom resource names make GPU usage explicit in YAMLs and dashboards. For example, a pod spec that requests nvidia.com/gpu-mig-1g.5gb tells anyone reading it exactly which kind of GPU the workload runs on.
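
The same explicitness shows up on the cluster side. As a rough illustration, the fragment below sketches the allocatable section of a node's status, as it might appear in the output of kubectl get node -o yaml. The resource names are the ones used in this post, and all counts are invented for illustration rather than captured from a real node.

```yaml
# Illustrative fragment of a Node object; every value here is made up.
status:
  allocatable:
    cpu: "32"
    memory: 128Gi
    nvidia.com/gpu-time-slice: "4"     # e.g., one physical GPU shared 4 ways
    nvidia.com/gpu-mig-1g.5gb: "7"     # e.g., one physical GPU split into 7 MIG slices
```

A dashboard built on these fields can report usage per class instead of a single opaque GPU count.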

How to Set It Up

Custom resource classes need to be defined in the NVIDIA GPU Operator’s Helm values.yaml file, using an override of the device plugin configuration.

For example, a time-slicing override with 4 replicas exposes each physical GPU as 4 logical time-sliced units. Users can then request a time-sliced unit by naming the corresponding resource under resources.limits in their pod spec.
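
A minimal sketch of such an override and the matching pod request follows, under a few assumptions. It assumes a GPU Operator chart version that supports the devicePlugin.config values and the device plugin's sharing.timeSlicing format; the ConfigMap name time-slicing-config and the profile key any are placeholders. With renameByDefault set to true, the upstream device plugin advertises the shared resource as nvidia.com/gpu.shared. A name such as nvidia.com/gpu-time-slice depends on your device plugin or platform supporting custom names, so check which name your nodes actually report (for example with kubectl describe node) before using it.

```yaml
# values-override.yaml: a sketch under the assumptions above, not a verified production config
devicePlugin:
  config:
    create: true                 # let the chart create the ConfigMap
    name: time-slicing-config    # placeholder ConfigMap name
    default: any                 # profile key applied to nodes by default
    data:
      any: |-
        version: v1
        sharing:
          timeSlicing:
            renameByDefault: true    # advertise nvidia.com/gpu.shared instead of nvidia.com/gpu
            resources:
              - name: nvidia.com/gpu
                replicas: 4          # each physical GPU appears as 4 schedulable units
```

A pod could then request one time-sliced unit like this (pod name and image are again placeholders):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: time-sliced-example          # placeholder name
spec:
  restartPolicy: Never
  containers:
    - name: app
      image: registry.example.com/cuda-app:latest   # placeholder image
      resources:
        limits:
          nvidia.com/gpu.shared: 1   # swap in whichever resource name your nodes advertise
```

Keep in mind that time-slicing shares compute over time but does not isolate GPU memory between pods, which is one reason to keep it as a separate class from MIG slices and full GPUs.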

Conclusion

Custom GPU resource classes offer the flexibility, cost-efficiency, and isolation required for scalable and sustainable GPU operations in Kubernetes. Whether you’re a platform engineer, ML researcher, or infrastructure architect, adopting this pattern can dramatically improve your cluster’s GPU utilization and user experience.
