Part 2: Self-Service Fractional GPU Memory with Rafay GPU PaaS
In Part-2, we show how you can provide users the means to select fractional GPU memory.
As demand for GPU-accelerated workloads soars across industries, cloud providers are under increasing pressure to offer flexible, cost-efficient, and isolated access to GPUs. While full GPU allocation remains the norm, it often leads to resource waste, especially for lightweight or intermittent workloads.
In the previous blog, we described the three primary technical approaches for fractional GPUs: MIG, time slicing, and custom schedulers. In this blog, we explore which of these approaches are most viable for offering fractional GPUs in a GPU-as-a-Service (GPUaaS) model, and evaluate their suitability for cloud providers serving end customers.
MIG is ideal for cloud providers for the following reasons:

Time Slicing can be an effective approach for the following reasons:

Custom schedulers are not ideal for cloud providers for the following reasons:

The table below provides a scoring matrix comparing the three approaches.
For GPU cloud providers delivering fractional access to external customers, MIG (Multi-Instance GPU) offers the best combination of:
If your hardware supports it (for example, A100, A30, or H100), MIG should be your default fractional strategy. Offer time slicing for budget-friendly plans, and use custom schedulers only in internal or cooperative environments where tenants trust one another.
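The recommendation above can be sketched as a small decision helper. This is a minimal illustration, not part of any Rafay or NVIDIA API: the `Strategy` enum, the `choose_strategy` function, its parameters, and the `MIG_CAPABLE_MODELS` set are all hypothetical names introduced here. The model set is a non-exhaustive assumption that should be checked against NVIDIA's MIG documentation, and the fallback to time slicing on non-MIG hardware is also an assumption rather than something this post prescribes.

```python
from enum import Enum


class Strategy(Enum):
    MIG = "mig"
    TIME_SLICING = "time-slicing"
    CUSTOM_SCHEDULER = "custom-scheduler"


# Assumption: illustrative, non-exhaustive set of MIG-capable GPU models.
MIG_CAPABLE_MODELS = {"A100", "A30", "H100"}


def choose_strategy(
    gpu_model: str,
    budget_plan: bool = False,
    internal_only: bool = False,
    use_custom_scheduler: bool = False,
) -> Strategy:
    """Pick a fractional GPU strategy following the guidance in this post."""
    if use_custom_scheduler:
        # Custom schedulers are reserved for internal or cooperative tenants.
        if not internal_only:
            raise ValueError("custom schedulers are for internal environments only")
        return Strategy.CUSTOM_SCHEDULER
    if budget_plan:
        # Budget-friendly plans trade isolation for lower cost.
        return Strategy.TIME_SLICING
    if gpu_model.upper() in MIG_CAPABLE_MODELS:
        # Default for external customers when the hardware supports MIG.
        return Strategy.MIG
    # Assumption: without MIG support, fall back to time slicing.
    return Strategy.TIME_SLICING


if __name__ == "__main__":
    print(choose_strategy("H100").value)
    print(choose_strategy("H100", budget_plan=True).value)
```

A provider could call a helper like this when rendering the options for a given SKU, so that customers only see choices the underlying hardware and plan tier can actually deliver.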

