Running GPU Infrastructure on Kubernetes: What Enterprise Platform Teams Must Get Right
Scaling GPUs on Kubernetes is a governance problem, where utilization, cost control, and access define success.
Read Now
.png)
The GPU cloud market is evolving fast. At NVIDIA GTC 2026, one theme rang loud and clear: enterprises are no longer experimenting with AI, they are committing to it at scale. Training frontier models, fine-tuning domain-specific LLMs, and running large-scale inference workloads on NVIDIA gear require sustained, predictable access to high-end GPU infrastructure. That kind of commitment demands a billing model to match.
If you are running a GPU cloud business, you already know that a simple pay-as-you-go model doesn't cut it anymore. Your enterprise customers want options and your ability to offer those options is a direct competitive advantage. That's where Rafay comes in.
Rafay already supports on-demand and Monthly Recurring Charge (MRC) billing for GPU instances. With the upcoming Release 39, we are adding a third and critically important model: Reservations.
Together, these three models give you, the GPU cloud provider, the tools to serve every type of customer, from developers spinning up a quick experiment to enterprises running months-long model training workloads.
Consider a real-world scenario: a customer needs 16 NVIDIA H200 GPUs on bare-metal servers to complete a model training and fine-tuning project over six months. They can't afford to show up on day one and find capacity unavailable. They need a guarantee.
That's exactly what the Reservations feature delivers. Once you accept a reservation request, the customer is guaranteed access to that capacity for the agreed term. And importantly, billing begins immediately upon acceptance of the reservation regardless of whether instances are actively running. This mirrors how enterprise GPU commitments work in the real world, and gives you, the provider, predictable revenue from day one.
This feature was built in close consultation with GPU cloud providers and their customers. Here's what went into the design.

In the broader cloud world, "reservation" can sometimes be a soft commitment. In the GPU cloud, that ambiguity simply doesn't exist. When an enterprise has committed budget to secure NVIDIA H200 capacity for a six-month training run, they expect that capacity to be there, no exceptions.
Rafay's reservation system enforces this at the platform level. Reserved capacity is ring-fenced: if demand from on-demand or MRC instances would encroach on reserved capacity, the system prevents it. Your customers who have paid for a reservation will always have access to what they've been promised.
No two GPU cloud businesses manage inventory the same way, and Rafay's reservation system is designed to reflect that. As a provider, you're free to define the terms and intervals under which you accept reservations.
Reservations are configured across five key dimensions:
That last point is particularly powerful. If you offer two different VM SKUs backed by the same GPU type, a customer can hold a single reservation and draw capacity across both giving them operational flexibility without requiring multiple contracts.


Enterprise AI teams operate under strict cost controls. To support this, Rafay allows you to configure reservations so that customers cannot consume beyond their reserved allotment. This is an opt-in guardrail, but for customers who need hard budget ceilings on GPU spend especially those running multi-month training workloads it's an essential feature.

Rafay integrates with leading billing providers including Monetize360, Solvimon, and Amdocs, ensuring that information on on-demand charges, MRC subscriptions, and reservation commitments flow seamlessly into your existing invoicing and revenue operations workflows.
As NVIDIA continues to expand the capabilities of its GPU lineup from the Hopper generation to Blackwell, the demand for long-term, high-density GPU capacity is only going to grow. Enterprises making six- and seven-figure investments in AI infrastructure expect their cloud provider to support the kind of commercial structure that matches their planning horizon.
With Rafay's billing framework, you can meet customers where they are: flexible on-demand access for those still exploring, MRC contracts for teams with predictable workloads, and reservations for enterprises that need committed capacity to power their most critical AI initiatives.

Scaling GPUs on Kubernetes is a governance problem, where utilization, cost control, and access define success.
Read Now

GOpen-source momentum, driven in part by NVIDIA, is pushing GPUs into Kubernetes as native resources, with advances in allocation, scheduling, and isolation.
Read Now
.png)
A deep dive into OpenClaw as a gateway-centric AI runtime and how platform teams can deploy, secure, and scale it as a governed service on Kubernetes.
Read Now