The AI & Cloud-Native Infrastructure Blog

Stay up to date with the latest news and insights on AI and cloud-native infrastructure on Rafay's highly active blog.

Powering GPU Cloud Billing: Rafay + Monetize360 Integration

In the fast-evolving world of GPU cloud services and AI infrastructure, accurate, flexible, and real-time billing is no longer optional — it’s mission critical. That’s why Rafay has partnered with Monetize360 to deliver an end-to-end pricing, billing, and revenue management… Read More

Slash EKS Cluster Costs by 20-30% Instantly with AWS Graviton

If you’re running Kubernetes workloads on Amazon EKS backed by Intel-based instances, you’re leaving significant savings on the table. In this blog, we will look at how many Rafay customers have been able to immediately cut compute costs by ~20-30% with minimal… Read More

What Is a Sovereign Cloud and Why Does It Matter?

A sovereign cloud is a cloud computing solution that ensures data remains within a country’s borders and complies with local laws. By adhering to strict regulations, sovereign clouds provide enhanced security and data governance crucial for industries like government, healthcare,… Read More

Introduction to Slurm-The Backbone of HPC

This is part 1 of a blog series on Slurm. In this first part, we introduce some basic concepts about Slurm. We are not talking about the fictional soft drink from the world of Futurama. Instead, this blog is about Slurm (Simple… Read More

Self-Service Slurm Clusters on Kubernetes with Rafay GPU PaaS

In the previous blog, we discussed how Project Slinky bridges the gap between Slurm, the de facto job scheduler in HPC, and Kubernetes, the standard for modern container orchestration. Project Slinky and Rafay’s GPU Platform-as-a-Service (PaaS) combined provide enterprises and cloud… Read More

Project Slinky: Bringing Slurm Scheduling to Kubernetes

As high-performance computing (HPC) environments evolve, there’s an increasing demand to bridge the gap between traditional HPC job schedulers and modern cloud-native infrastructure. Project Slinky is an open-source project that integrates Slurm, the industry-standard workload manager for HPC, with Kubernetes, the de… Read More

Using Cilium as a Kubernetes Load Balancer: A Powerful Alternative to MetalLB

In Kubernetes, exposing services of type LoadBalancer in on-prem or bare-metal environments typically requires a dedicated "Layer 2" or "BGP-based" software load balancer—such as MetalLB. While MetalLB has been the go-to solution for this use case, recent advances in Cilium, a… Read More

Unlocking Sovereign AI: Rafay’s Role in NVIDIA and Accenture’s Strategic Alliance

In the rapidly evolving landscape of artificial intelligence (AI), nations and enterprises are increasingly prioritizing sovereignty—gaining control over their data, infrastructure, and AI capabilities. This shift is not merely about compliance; it's about fostering innovation, ensuring security, and maintaining cultural… Read More

Accelerating Sovereign AI: Rafay’s Strategic Integration with NVIDIA’s Enterprise AI Factory

In the rapidly evolving landscape of artificial intelligence (AI), enterprises and nations are increasingly prioritizing sovereignty—gaining control over their data, infrastructure, and AI capabilities. Recognizing this imperative, Rafay Systems has integrated its platform with NVIDIA's Enterprise AI Factory, a comprehensive… Read More

Cost Management for SageMaker AI: The Case for Strong Administrative Guardrails

Enterprises are increasingly leveraging Amazon SageMaker AI to empower their data science teams with scalable, managed machine learning (ML) infrastructure. However, without proper administrative controls, SageMaker AI usage can lead to unexpected cost overruns and significant waste. In large organizations… Read More

Simplifying AI Workload Delivery for Platform Teams in 2025

AI workloads are growing more complex by the day, and platform teams are under immense pressure to deliver them at scale—securely, efficiently, and with speed. Modern AI workloads require specialized hardware such as GPUs and TPUs to provide the computational… Read More

Get Started with BioContainers using Rafay

In this step-by-step guide, a bioinformatics data scientist will use Rafay's end-user portal to launch a well-resourced remote VM and run a series of BioContainers with Docker. Prerequisites: Access to Rafay's end-user self-service portal (i.e., Developer Hub)… Read More