Choosing the Right Fractional GPU Strategy for Cloud Providers

July 14, 2025
Mohan Atreya

As demand for GPU-accelerated workloads soars across industries, cloud providers are under increasing pressure to offer flexible, cost-efficient, and isolated access to GPUs. Full GPU allocation remains the norm, but it often wastes resources, especially for lightweight or intermittent workloads.

In the previous blog, we described the three primary technical approaches to fractional GPUs: MIG, time slicing, and custom schedulers. In this blog, we evaluate how well each approach suits cloud providers that offer fractional GPUs to end customers in a GPU-as-a-Service (GPUaaS) model.

Best Choice: MIG (Multi-Instance GPU)

MIG is ideal for cloud providers for the following reasons:

| Advantage | Benefit to Cloud Providers |
|---|---|
| Strong isolation | Prevents noisy neighbors and ensures SLA-grade stability |
| Predictable performance | Enables tiered plans with performance guarantees |
| Hardware-enforced partitioning | Ensures consistent resource separation across tenants |
| Supports metering and quotas | Maps well to billing per GPU instance / tenant |
| Kubernetes integration | Easy to expose via `nvidia.com/mig-<profile>` resources |
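
To make the Kubernetes integration concrete, here is a minimal sketch of how a tenant workload might request a MIG slice. It only builds a Pod manifest as a Python dictionary. The helper name `mig_pod_manifest`, the image name, and the `1g.5gb` profile are illustrative assumptions, and the sketch assumes the NVIDIA device plugin is configured so that each MIG profile is advertised as its own `nvidia.com/mig-<profile>` resource.

```python
import json


def mig_pod_manifest(name: str, image: str, mig_profile: str, count: int = 1) -> dict:
    """Build a Pod manifest that requests `count` MIG instances of one profile.

    `mig_profile` is the profile suffix, e.g. "1g.5gb" (an assumed example).
    """
    resource = f"nvidia.com/mig-{mig_profile}"
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "restartPolicy": "Never",
            "containers": [
                {
                    "name": "workload",
                    "image": image,
                    # Extended resources such as GPUs are requested under limits.
                    "resources": {"limits": {resource: count}},
                }
            ],
        },
    }


if __name__ == "__main__":
    # Hypothetical tenant job and image name, for illustration only.
    manifest = mig_pod_manifest("tenant-a-job", "example.com/tenant-a/train:latest", "1g.5gb")
    print(json.dumps(manifest, indent=2))
```

The printed JSON can be applied like any other manifest. Because each slice is a discrete, countable resource, the same resource name can also be used in per-tenant quotas.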

Runner-up: Time Slicing

Time slicing can be an effective approach for the following use cases:

  • Cost-effective shared plans
  • R&D and dev/test environments
  • Elastic compute for batch or exploratory ML

⚠️ Limitations

  • Weaker isolation: No resource partitioning at hardware level
  • Inconsistent performance under load
  • Harder to bill precisely per tenant
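
The performance limitation can be illustrated with a deliberately simplified model. The sketch below assumes the GPU is shared in equal round-robin time slices among the tenants that are active at a given moment, and it ignores context-switch overhead and memory contention. The function names are hypothetical, and real behavior depends on the driver and the workloads.

```python
def per_tenant_share(active_tenants: int) -> float:
    """Fraction of GPU time each active tenant gets under equal time slicing."""
    if active_tenants < 1:
        raise ValueError("need at least one active tenant")
    return 1.0 / active_tenants


def estimated_runtime(solo_seconds: float, active_tenants: int) -> float:
    """Estimated wall-clock time for a job that takes `solo_seconds` alone."""
    return solo_seconds / per_tenant_share(active_tenants)


if __name__ == "__main__":
    # The same 60-second job slows down as more neighbors become active.
    for n in (1, 2, 4):
        print(f"{n} active tenant(s): {estimated_runtime(60.0, n):.0f} s")
```

In this model, a job that takes 60 seconds alone takes about 240 seconds when four tenants are active. The tenant's experience depends on what its neighbors are doing, which is why performance guarantees and precise per-tenant billing are hard to offer with time slicing.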

Experimental: Custom Schedulers (e.g., KAI)

This approach is not ideal for cloud providers for the following reasons:

  • No hardware isolation or enforcement
  • Requires complex scheduler extensions
  • Difficult to align with metering and billing
  • Resource usage enforcement is manual or cooperative

Conclusion: MIG is the Most Production-Ready

The table below provides a scoring matrix comparing the three approaches.

| Criteria | MIG | Time Slicing | Custom Scheduler (KAI) |
|---|---|---|---|
| Performance Isolation | ✅ Strong | ⚠️ Weak | ⚠️ Soft |
| SLA-Friendly | ✅ Yes | ❌ Limited | ❌ Not suitable |
| Billing Accuracy | ✅ Per MIG profile | ⚠️ Difficult | ⚠️ Not enforceable |
| Multi-Tenancy Support | ✅ Excellent | ⚠️ Risk of contention | ⚠️ Manual only |
| Deployment Complexity | ⚠️ Moderate | ✅ Low | ❌ High |
| Hardware Support | ❌ Only MIG-capable GPUs (e.g., A100, A30, H100) | ✅ All NVIDIA GPUs | ✅ All NVIDIA GPUs |
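
The "Billing Accuracy" row is easiest to see with a small metering sketch. Because each MIG instance is a discrete unit, a bill can be computed from instance-hours per profile. The rates, profile names, and record layout below are hypothetical and exist only for illustration. They are not taken from any real price list.

```python
from dataclasses import dataclass
from typing import Dict, Iterable

# Hypothetical hourly rates in USD per MIG profile (illustrative values only).
HOURLY_RATE_USD: Dict[str, float] = {
    "1g.5gb": 0.50,
    "2g.10gb": 1.00,
    "3g.20gb": 1.50,
}


@dataclass(frozen=True)
class UsageRecord:
    tenant: str
    mig_profile: str
    hours: float  # how long the tenant held one instance of this profile


def bill_per_tenant(records: Iterable[UsageRecord]) -> Dict[str, float]:
    """Sum rate * hours for every usage record, grouped by tenant."""
    totals: Dict[str, float] = {}
    for record in records:
        rate = HOURLY_RATE_USD[record.mig_profile]
        totals[record.tenant] = totals.get(record.tenant, 0.0) + rate * record.hours
    return totals


if __name__ == "__main__":
    usage = [
        UsageRecord("tenant-a", "1g.5gb", 10.0),
        UsageRecord("tenant-a", "2g.10gb", 2.0),
        UsageRecord("tenant-b", "3g.20gb", 4.0),
    ]
    print(bill_per_tenant(usage))
```

With time slicing or a cooperative scheduler there is no equivalent fixed unit to meter, so a comparable calculation would have to rely on sampled utilization instead.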

For GPU cloud providers delivering fractional access to external customers, MIG (Multi-Instance GPU) offers the best combination of:

  • ✅ Security and isolation
  • ✅ Performance predictability
  • ✅ Quota and billing integration
  • ✅ Multi-tenant scalability

If your hardware supports it (for example, A100, A30, or H100), MIG should be your default fractional strategy. Use time slicing for budget-friendly plans, and use custom schedulers only in internal or cooperative environments.
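
The recommendation above can be written down as a small decision helper. This is one possible encoding, not a definitive policy. The function name, its parameters, and the fallback to time slicing on hardware without MIG are assumptions, and the set of MIG-capable models is a partial, illustrative list that should be checked against NVIDIA's documentation.

```python
# Partial, illustrative list of MIG-capable GPU models (an assumption).
MIG_CAPABLE = {"A100", "A30", "H100"}


def choose_strategy(
    gpu_model: str,
    budget_plan: bool = False,
    cooperative_internal: bool = False,
) -> str:
    """Pick a fractional GPU strategy following the guidance in this post."""
    if budget_plan:
        # Budget-friendly plans trade isolation for lower cost.
        return "time-slicing"
    if gpu_model.upper() in MIG_CAPABLE:
        # MIG is the default whenever the hardware supports it.
        return "mig"
    if cooperative_internal:
        # Custom schedulers only where tenants are internal or cooperative.
        return "custom-scheduler"
    # Assumed fallback for external tenants on hardware without MIG.
    return "time-slicing"


if __name__ == "__main__":
    print(choose_strategy("H100"))
    print(choose_strategy("H100", budget_plan=True))
    print(choose_strategy("T4", cooperative_internal=True))
```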
