AI & ML FAQs | Rafay AI Infrastructure Platform

GPU/AI/ML FAQs

Find answers to common questions about Rafay's infrastructure orchestration offerings for neoclouds, sovereign AI clouds, and enterprises, and learn how they can benefit you.

What does Rafay do or provide around AI/ML or cloud-native adoption?

Rafay provides infrastructure orchestration and workflow automation for enterprises, cloud providers, neoclouds, and sovereign AI clouds. The Rafay Platform delivers a Platform-as-a-Service (PaaS) experience that enables companies to create customized compute environments for developers and data scientists. Rafay’s platform enables faster development and deployment of new capabilities while maintaining necessary controls and guardrails. By simplifying the process of implementing complex platforms, Rafay reduces the need for large teams of experts. In essence, Rafay streamlines cloud-native and AI/ML adoption by offering a ready-to-use platform that balances speed, efficiency, and security for businesses.

Does Rafay offer a GPU PaaS?

Yes. Rafay provides infrastructure orchestration and workflow automation for cloud-native (Kubernetes) and AI use cases for enterprises, cloud providers, neoclouds, and sovereign AI clouds. Rafay helps companies deploy a Platform-as-a-Service (PaaS) experience that supports both CPU-only and GPU-accelerated compute environments. Platform teams can quickly set up and deliver customized self-service experiences for developers and data scientists, typically within days or weeks. This flexible platform allows end-users to easily access the computational resources they need, whether it’s standard CPU processing or more powerful GPU capabilities. Rafay’s solution streamlines the deployment and management of diverse computing environments, making it easier for organizations to support a wide range of applications, from standard software to complex AI/ML projects.

What does Rafay offer for ML workbenches?

Rafay provides curated ML workbenches that offer developers and data scientists an experience similar to Amazon SageMaker or Google Vertex AI, but at a more competitive price point. The platform includes out-of-the-box services such as Notebooks-as-a-Service, with pre-compiled environments featuring TensorFlow, PyTorch, and other popular libraries for immediate productivity. For those preferring a job-based model, Rafay offers Ray-as-a-Service, allowing data scientists to focus on their work without dealing with infrastructure complexities. Advanced teams can opt for a Kubeflow-based ML workbench, which manages pipelines, experiment tracking, and model repositories. These solutions enable data science teams to work efficiently with their preferred tools while Rafay handles the underlying infrastructure management.

What does Rafay offer for GenAI playgrounds?

Rafay provides a controlled, cost-effective Generative AI playground for organizations new to GenAI. This environment allows data scientists to train, tune, and serve GenAI models, enabling efficient experimentation and development without significant investment or infrastructure complexity. It’s ideal for businesses looking to explore GenAI capabilities while managing costs and maintaining control over their AI initiatives.

Who uses Rafay's platform for AI/ML initiatives?

Rafay’s AI/ML platform is utilized by various organizations, particularly in the financial services sector. We’re also collaborating with major GPU vendors for specialized use cases. A notable public example of a company using our AI/GPU stack is MoneyGram, a global leader in cross-border P2P payments and money transfers.

How does Rafay’s platform accelerate time-to-value for AI/ML projects?

Without Rafay, platform teams typically spend multiple years and need large teams of experts to build complex platforms internally. With Rafay, platform teams can deliver a finely tuned PaaS experience to internal users in weeks.

How does Rafay ensure compliance and governance for enterprise AI initiatives?

Rafay applies its proven governance and control features, originally developed for cloud-native projects, to AI/GPU initiatives. These capabilities include blueprinting, access management, chargebacks, and auditing/logging. This approach ensures that enterprises can maintain compliance and control over their AI projects, just as they do with other cloud-native initiatives. By leveraging these established features, Rafay helps organizations accelerate AI adoption while maintaining the necessary governance standards, ultimately leading to increased revenues and lower total cost of ownership for both cloud-native and AI/ML projects.

How does Rafay's platform streamline AI/ML infrastructure management for enterprise adoption?

Rafay enables enterprise platform teams to deliver a PaaS experience for GPU resources, both on-premises and in the cloud. The platform offers a cost-effective alternative to services like Amazon SageMaker or Google Vertex AI, providing ML workbenches with similar functionality. Rafay’s self-service model and hierarchical experience sharing allow platform teams to selectively offer compute and ML workbench experiences to different teams, optimizing access to expensive GPU resources. Additionally, the platform includes chargeback capabilities to ensure fair cost allocation among internal teams. This comprehensive approach simplifies AI/ML infrastructure management, accelerating enterprise adoption while maintaining cost control and resource efficiency.

Does Rafay provide AI/ML workbenches and other tooling?

Yes, Rafay offers a comprehensive suite of AI/ML tools. The platform provides out-of-the-box workbenches based on Kubeflow and KubeRay, delivered as fully managed services. This allows users to access sophisticated AI/ML platforms without dealing with infrastructure complexities. Additionally, Rafay includes a low-code/no-code framework that enables partners to rapidly develop and deploy specialized AI solutions such as verticalized agents, co-pilots, and document translation services. This combination of ready-to-use workbenches and a flexible development framework streamlines the adoption and customization of AI/ML tools for various enterprise needs, accelerating time-to-market for new AI capabilities.

Is GPU virtualization supported?

Yes. The Rafay Platform supports three GPU sharing modes that operators can offer to tenants in self-service: full passthrough (one physical GPU per workload, optimal for large training runs), NVIDIA MIG (Multi-Instance GPU) partitioning (up to seven isolated MIG instances per A100 or H100, each with dedicated memory and compute), and time-slicing (multiple workloads sharing a GPU in time-multiplexed fashion, suited for lower-intensity inference or development workloads). Operators configure which sharing modes are available per SKU through PaaS Studio; tenants select the appropriate GPU size from the catalog without needing to understand the underlying partitioning mechanism. Security and compute isolation between MIG instances is enforced at the NVIDIA hardware level; chargeback data is collected per MIG instance or per time-slice allocation for granular cost attribution across tenants and business units.
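
The three sharing modes can be pictured with a small Python sketch. This is only an illustration, not Rafay's API or PaaS Studio configuration: the GpuSku class, its fields, and the SKU names are hypothetical. The one figure taken from the answer above is the limit of seven MIG instances per A100 or H100.

```python
from dataclasses import dataclass
from enum import Enum


class SharingMode(Enum):
    PASSTHROUGH = "passthrough"    # one physical GPU per workload
    MIG = "mig"                    # hardware-partitioned GPU instances
    TIME_SLICING = "time_slicing"  # time-multiplexed sharing


# From the text above: up to seven MIG instances per A100 or H100.
MAX_MIG_INSTANCES = 7


@dataclass(frozen=True)
class GpuSku:
    """Hypothetical catalog entry; names and fields are illustrative only."""
    name: str
    mode: SharingMode
    shares_per_gpu: int  # workloads that can share one physical GPU

    def __post_init__(self):
        if self.shares_per_gpu < 1:
            raise ValueError("shares_per_gpu must be at least 1")
        if self.mode is SharingMode.PASSTHROUGH and self.shares_per_gpu != 1:
            raise ValueError("passthrough dedicates a whole GPU to one workload")
        if self.mode is SharingMode.MIG and self.shares_per_gpu > MAX_MIG_INSTANCES:
            raise ValueError("MIG supports at most 7 instances per GPU")


def workload_capacity(sku: GpuSku, physical_gpus: int) -> int:
    """Number of workloads a pool of physical GPUs can host under this SKU."""
    return sku.shares_per_gpu * physical_gpus
```

Under these assumptions, a pool of eight GPUs offered as a seven-way MIG SKU could host 56 isolated workloads, while the same pool offered as passthrough hosts eight.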

How does Rafay solve for chargeback and billing?

Rafay offers a comprehensive solution for chargebacks and billing. The platform collects granular chargeback information on resource usage, which can be easily exported to customers’ existing billing systems for further processing and distribution. Rafay allows for customizable chargeback group definitions to align with organizational structures or projects. Both group definition and data collection can be carried out programmatically, enabling efficient and accurate billing processes.
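
As a rough illustration of programmatic chargeback processing, the Python sketch below aggregates usage records by chargeback group and writes a CSV that a billing system could ingest. The record fields, group names, SKU names, and rates are all hypothetical; the actual export format should be taken from Rafay's documentation.

```python
import csv
import io
from collections import defaultdict

# Hypothetical usage records; field names and rates are illustrative only.
USAGE = [
    {"group": "fraud-ml", "sku": "gpu-small", "hours": 10.0, "rate_per_hour": 1.50},
    {"group": "fraud-ml", "sku": "gpu-full", "hours": 2.0, "rate_per_hour": 12.00},
    {"group": "search", "sku": "gpu-small", "hours": 4.0, "rate_per_hour": 1.50},
]


def chargeback_by_group(records):
    """Sum cost (hours x hourly rate) per chargeback group, sorted by group."""
    totals = defaultdict(float)
    for record in records:
        totals[record["group"]] += record["hours"] * record["rate_per_hour"]
    return {group: round(cost, 2) for group, cost in sorted(totals.items())}


def to_csv(totals):
    """Render the per-group totals as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["group", "cost"])
    for group, cost in totals.items():
        writer.writerow([group, f"{cost:.2f}"])
    return buffer.getvalue()


if __name__ == "__main__":
    print(to_csv(chargeback_by_group(USAGE)), end="")
```

Grouping on a single key keeps the example short; a real chargeback group could map to a team, project, or business unit.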

How is Rafay different from Run:ai?

Run:ai focuses on providing fractional/virtualized GPU consumption and a proprietary scheduler optimized for AI/GenAI workloads, replacing the default Kubernetes scheduler. Rafay, however, provides a more comprehensive platform that manages the full lifecycle of underlying Kubernetes clusters and environments. Rafay offers an out-of-the-box experience to deploy and consume Run:ai on Rafay’s GPU PaaS, while also providing its own GPU virtualization and AI-friendly Kubernetes scheduler for customers preferring a single-vendor solution. Essentially, Rafay can either complement Run:ai’s offerings or provide a standalone solution that covers similar functionalities along with broader infrastructure management capabilities, giving customers flexibility in their AI infrastructure choices.

Does Rafay support NVIDIA NIMs/NIM?

Yes, Rafay supports NVIDIA NIM (NVIDIA Inference Microservices). NIM is NVIDIA’s proprietary solution for delivering packaged inferencing capabilities. It comes pre-configured with NVIDIA’s in-house models and has been optimized for use with a wide range of open-source models, including Meta’s Llama variants. While NIM is often viewed as an alternative to the open-source KServe package, Rafay’s platform supports both NIM and KServe. This flexibility allows customers to choose their preferred inference endpoint and deploy it effortlessly on GPU instances using the Rafay platform. By supporting multiple inferencing solutions, Rafay enables organizations to leverage the most suitable tools for their specific AI/ML needs while maintaining a consistent and manageable infrastructure.

Why consider Rafay's solution over AWS SageMaker or Google Vertex AI?

While AWS SageMaker and Google Vertex AI offer fully managed services, Rafay’s Kubernetes and Kubeflow-based MLOps solution provides distinct advantages. It offers vendor agnosticism, allowing deployment across various cloud providers or on-premises, thus avoiding vendor lock-in. Rafay’s approach enables greater customizability, giving users more control over their infrastructure and workloads. It can also be more cost-efficient, as managing your own Kubernetes clusters allows for optimized resource utilization. This combination of flexibility, control, and potential cost savings makes Rafay’s solution appealing for organizations seeking a tailored and adaptable MLOps environment that can evolve with their specific needs and infrastructure preferences.

How does Rafay's solution fit into existing AWS/Google Cloud workflows?

Rafay’s MLOps platform is designed to seamlessly integrate with existing cloud ecosystems, including AWS and Google Cloud. The solution supports integration with various cloud services, allowing organizations to leverage their current investments and workflows. Rafay’s platform excels in hybrid and multi-cloud environments, providing a unified interface to manage MLOps workflows consistently across different infrastructures. This approach enables businesses to maintain their existing cloud relationships while gaining the added benefits of Rafay’s flexible, vendor-agnostic platform. By bridging the gap between different cloud environments, Rafay allows organizations to optimize their MLOps processes without disrupting established workflows, offering a smooth transition and enhanced capabilities for AI/ML initiatives.

Will managing Kubernetes and Kubeflow add complexity compared to fully managed services?

While Kubernetes and Kubeflow management can be complex, Rafay’s platform is specifically designed to simplify these processes. The solution addresses potential complexity in three key ways:

User-Friendly Interface: Rafay provides an intuitive UI and automation tools that significantly reduce the complexity typically associated with Kubernetes.
Managed Kubernetes Service: The platform offers managed Kubernetes services that handle cluster provisioning, scaling, and maintenance, allowing teams to focus on developing models rather than managing infrastructure.
Expert Support: Rafay provides comprehensive support and documentation to help teams navigate any challenges, effectively reducing the learning curve.

This approach enables organizations to harness the power and flexibility of Kubernetes and Kubeflow without the added complexity.

What about the cost? Are there hidden expenses in managing our own infrastructure?

Rafay aims to provide transparent and potentially cost-saving solutions for managing AI/ML infrastructure. The platform addresses cost concerns in three key areas:

Transparent Pricing: Rafay offers clear pricing models without hidden fees that can be associated with fully managed services.
Cost Control: By managing your own infrastructure through Rafay, you can optimize resource usage and avoid over-provisioning, potentially leading to significant cost savings.
Avoiding Vendor Premiums: Fully managed services often come with a premium for convenience. Rafay enables you to balance convenience and cost effectively.

This approach allows organizations to have greater control over their infrastructure costs while still benefiting from the ease of use provided by Rafay’s platform.

What's Rafay's stance on support and reliability compared to established providers?

Rafay is committed to providing enterprise-grade support and reliability, comparable to established providers like AWS and Google. The platform offers dedicated support teams to assist with any issues, ensuring minimal downtime and quick resolutions. Rafay’s technology stack is built on mature, widely adopted open-source technologies like Kubernetes and Kubeflow, which are trusted across the industry. This foundation provides a robust and reliable infrastructure for AI/ML workloads. Additionally, Rafay’s focus on MLOps allows for specialized support that may not be available with more generalized cloud providers. By combining proven technologies with dedicated, specialized support, Rafay aims to deliver a reliable and well-supported platform that meets the high standards expected in enterprise environments.

How do Rafay's GPU PaaS and MLOps offerings benefit an AWS sales team?

Rafay’s offerings complement AWS services in two key ways, benefiting both customers and AWS sales teams. For customers using SageMaker and Bedrock, Rafay enhances AWS’s ecosystem with additional cloud-native and Kubernetes management capabilities. For customers hesitant to use SageMaker or Bedrock, Rafay provides a similar experience that can be fully deployed within AWS accounts, addressing concerns about cost or data exposure. Crucially, Rafay’s solutions drive direct compute consumption on AWS, contributing to customers’ Enterprise Discount Program (EDP) commitments. This helps AWS sales teams meet their targets and potentially expand future EDPs, making Rafay a valuable partner in the AWS ecosystem that can increase overall AWS usage and revenue.

What integrations does Rafay support, and what SLA and roadmap information is available?

Rafay is designed to integrate with existing cloud, AI, security, identity, and developer ecosystems. The platform supports infrastructure automation, GitOps workflows, enterprise identity providers, observability tools, Kubernetes environments, and leading AI frameworks and services, allowing organizations to operationalize AI infrastructure without replacing existing tools and processes.

Rafay also integrates with popular cloud-native, AI, and enterprise technologies, including NVIDIA AI software, infrastructure automation tools, monitoring platforms, identity providers, and developer workflows.

Support SLAs vary by deployment model and customer requirements. Customers should contact Rafay for the latest SLA options, support plans, and product roadmap information.

Does Rafay offer customizable solutions and professional services for industry and compliance needs?

Yes. Rafay supports custom environment templates, custom SKUs, white-labeled portals, API-based portal embedding, self-hosted and air-gapped deployments, third-party application onboarding, and professional services to help customers tailor workflows, compute packages, AI services, policies, and marketplace experiences to specialized requirements.

Rafay enables organizations to package compute and application environments as reusable SKUs and templates. These can include Kubernetes clusters, virtual clusters, bare-metal servers, SLURM clusters, NVIDIA NIM services, Run:ai, NVIDIA Cloud Functions, custom AI applications, and AI Blueprints. Customers can define provisioning logic, inputs, outputs, icons, documentation, policies, TTLs, quotas, billing metadata, and access rights.

What hardware/software vendor partnerships are critical for Rafay's AI platform, and what customization is possible?

Rafay’s AI infrastructure ecosystem includes close alignment with NVIDIA and a growing network of infrastructure, OEM, systems integration, and software partners. Rafay supports NVIDIA GPU infrastructure and NVIDIA AI software, including NVIDIA NIM, NVIDIA NeMo, NVIDIA Cloud Functions, NVIDIA AI Enterprise software, NVIDIA-certified systems, and NVIDIA-validated reference architectures.

Rafay is also aligned with key infrastructure and solution partners such as Cisco, Dell, Penguin Solutions, Accenture, and others that help organizations design, deploy, and operationalize AI infrastructure at scale. For example, Rafay works with Cisco AI PODs to help transform validated AI infrastructure into self-service GPU cloud platforms, and collaborates with systems integrators such as Accenture to support enterprise and sovereign AI deployments.

At the software layer, Rafay supports integrations with ecosystem tools such as Run:ai, Ray, Kubeflow, Jupyter, SLURM, and other AI, MLOps, and cloud-native technologies. This allows customers to bring their preferred hardware, software, AI tools, and operational workflows into a governed platform model.

Rafay is positioned as the orchestration, governance, and consumption layer across these environments. Customers can customize compute SKUs, AI service catalogs, white-labeled portals, application templates, billing metadata, policies, quotas, identity controls, and deployment models to fit their infrastructure strategy, partner ecosystem, compliance requirements, and commercial AI services.

What are Rafay's deployment requirements and air-gapped capabilities?

Rafay supports multiple deployment models, including fully managed SaaS, self-hosted, and air-gapped deployments. The platform can be deployed across public clouds, private data centers, sovereign cloud environments, and highly regulated environments while providing a consistent self-service and governance experience.

For self-hosted deployments, organizations require dedicated infrastructure that meets Rafay's sizing and high-availability recommendations. SaaS deployments primarily require secure outbound connectivity between managed environments and the Rafay platform. Rafay also supports a range of enterprise infrastructure options, including NVIDIA-certified systems and modern x86-64 or qualified ARM-based environments for AI and GPU cloud use cases.

Because infrastructure, networking, and hardware requirements vary by deployment model and use case, organizations should consult Rafay's deployment documentation or work with a Rafay solutions architect to determine the appropriate configuration.

Can Rafay provide specific ROI examples or case studies?

Yes. Rafay's materials cite customer outcomes such as 63% lower cloud costs, 4x more frequent deployments, 76% lower MTTR, 20 to 25% of developer time regained, and platform teams reduced to small teams managing large-scale infrastructure. The TELUS case study also describes Rafay's role in powering a sovereign, developer-ready AI Studio with self-service provisioning, governance, usage metering, and marketplace-style AI services.

How does Rafay differentiate itself from competitors, hyperscalers, and in-house builds?

Rafay differentiates by combining the following capabilities into one platform:

Self-service GPU and AI service consumption
GPU orchestration and utilization management
Multi-tenancy and governance
SKU management
Billing, chargeback, and usage visibility
AI application and inference service delivery
Kubernetes lifecycle management
NVIDIA integrations
Air-gapped and sovereign deployment options
White-labeled cloud provider experiences

Rafay is positioned beyond basic Kubernetes management because it helps organizations operationalize, govern, and monetize AI infrastructure as a platform.

Hyperscaler services such as EKS, AKS, and GKE provide managed Kubernetes within specific clouds. Rafay provides a cross-environment orchestration and consumption layer that can manage Kubernetes, GPU resources, self-service workflows, governance, chargeback, and AI services across public clouds, private data centers, sovereign environments, and air-gapped deployments. While NVIDIA provides accelerated hardware and AI software components, Rafay operationalizes those components into self-service, multi-tenant platforms.

Building an in-house platform requires teams to assemble Kubernetes management, GPU scheduling, multi-tenancy, RBAC, quotas, cost attribution, self-service portals, billing, templates, AI service catalogs, observability, and compliance controls. Rafay reduces that build burden by providing these capabilities as a platform, helping teams accelerate time-to-market and avoid long-term platform debt.

What is the typical implementation timeline, and how does Rafay assist with migration from other platforms?

Rafay supports rapid time-to-value, with customer and partner examples of launching self-service GPU clouds in days and production-ready GPU cloud offerings in under six weeks, depending on the deployment scenario.

However, actual implementation timelines vary based on factors such as infrastructure readiness, security and compliance requirements, air-gapped deployment needs, integrations, service catalog complexity, and available internal resources. As a result, implementation timelines and professional services requirements should be assessed and scoped based on the customer's specific environment, deployment model, and business objectives.

Additionally, Rafay supports onboarding and managing existing Kubernetes environments, applying standardized blueprints and policies, integrating CI/CD and GitOps workflows, deploying AI services such as NVIDIA NIM and Run:ai, and creating repeatable environment templates for AI and cloud-native workloads. Migration works best when it is treated as part of adopting Rafay's broader self-service consumption and governance model, rather than as a standalone Kubernetes migration.

Does Rafay offer free trials, POC programs, or sandbox environments?

Rafay can support proofs of concept or pilot evaluations, but prospects should confirm current POC options directly with Rafay.

How is the Rafay Platform priced, and what are the available pricing tiers and payment terms?

Rafay pricing depends on deployment model, scale, support requirements, use cases, and whether the customer needs SaaS, self-hosted, air-gapped, GPU orchestration, AI services, or professional services. Prospects should contact Rafay for a tailored quote.

The Rafay Platform supports multiple consumption and deployment models, including SaaS, self-hosted, air-gapped, GPU PaaS, Kubernetes management, SKU management, and AI services. Specific tiers, service levels, GPU/CPU fees, and usage-based costs should be confirmed with Rafay Sales. Payment terms, subscription models, long-term contract commitments, and cancellation policies should be confirmed with Rafay's sales or legal team.

What professional services, training programs, and onboarding does Rafay offer?

Rafay provides services to help customers implement the Rafay Platform, set up orchestration agents, onboard environments, create custom compute and application SKUs, operationalize GPU PaaS offerings, and support cloud provider or enterprise teams during deployment. The platform also provides enablement through documentation, customer success, and professional services. Prospects should contact Rafay for current training options.

A typical Rafay onboarding process includes platform deployment or SaaS setup, network and identity integration, orchestration agent installation, environment and SKU configuration, Kubernetes or GPU resource onboarding, governance policy definition, self-service portal setup, usage metering configuration, documentation, and customer success or service delivery support. Exact onboarding steps vary by SaaS, self-hosted, air-gapped, enterprise, or cloud provider deployment model.

Can Rafay help with cost optimization, and what reporting and analytics does it provide?

Yes. Rafay helps optimize costs by improving resource utilization, reducing idle GPU and CPU capacity, enforcing quotas, supporting TTLs and schedules, enabling cost visibility, and providing usage data for chargeback and billing. Rafay's materials cite customer outcomes, including lower cloud costs and improved infrastructure efficiency.

Rafay improves cost optimization through GPU pooling, GPU slicing, quota enforcement, TTLs, schedules, SKU-based provisioning, usage dashboards, billing APIs, chargeback groups, cost reports, and per-tenant usage tracking. This helps organizations identify underutilized resources, attribute costs to teams or tenants, reduce waste, and make GPU and CPU consumption more predictable.
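
Two of the mechanisms named above, TTLs and quota enforcement, reduce to simple checks. The Python sketch below shows the general idea only; the function names and the policy shape are assumptions for illustration and do not describe how Rafay implements these controls.

```python
from datetime import datetime, timedelta, timezone


def is_expired(created_at: datetime, ttl: timedelta, now: datetime) -> bool:
    """An environment is reclaimable once its time-to-live has elapsed."""
    return now >= created_at + ttl


def within_quota(gpus_in_use: int, gpus_requested: int, gpu_quota: int) -> bool:
    """A request is admitted only if it keeps the tenant at or under its quota."""
    return gpus_in_use + gpus_requested <= gpu_quota


if __name__ == "__main__":
    created = datetime(2025, 1, 1, 9, 0, tzinfo=timezone.utc)
    now = datetime(2025, 1, 1, 18, 0, tzinfo=timezone.utc)
    # A hypothetical 8-hour TTL on a notebook created at 09:00 has lapsed by 18:00.
    print(is_expired(created, timedelta(hours=8), now))
    # A tenant using 6 of 8 GPUs cannot be granted 4 more.
    print(within_quota(gpus_in_use=6, gpus_requested=4, gpu_quota=8))
```

Reclaiming expired environments and rejecting over-quota requests are what keep idle GPU capacity from accumulating unnoticed.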

Rafay provides reporting and analytics for resource usage, GPU inventory, allocation, SKU usage, instance activity, user attribution, health status, uptime, failed instances, tenant usage, cost allocation, billing metrics, token consumption, and infrastructure utilization. Usage data can be consumed through dashboards, APIs, CSV exports, or integrated with external finance, billing, SIEM, monitoring, or FinOps systems.

How does Rafay help with Kubernetes cluster lifecycle management?

Kubernetes lifecycle management is one foundation of the Rafay Platform and part of its broader AI infrastructure consumption layer. Rafay supports Kubernetes lifecycle management across public cloud, private cloud, bare metal, and edge environments, including provisioning, upgrades, policy enforcement, blueprinting, fleet operations, add-on management, drift detection, centralized visibility, secure access, and lifecycle automation for large cluster fleets.

How does the Rafay Platform ensure governance, compliance, and standardization?

Rafay enforces governance through enterprise SSO, RBAC, role- and project-based access controls, network policies, resource quotas, cluster policies, OPA Gatekeeper integrations, audit logging, zero-trust access, standardized blueprints, GitOps workflows, and policy-driven environment templates.

The platform also provides hierarchical multi-tenancy and tenant isolation for secure governance at scale. Together, these controls help organizations standardize infrastructure delivery while maintaining security, compliance, and operational consistency. Rafay is SOC 2 Type II compliant, with annual audits performed by independent auditors.

How does the Rafay Platform enable self-service for developers and data scientists?

Rafay enables developers and data scientists to launch approved infrastructure and AI environments through web portals, APIs, CLIs, and catalog-based SKUs. Users can request GPU-powered workspaces, clusters, inference services, notebooks, model environments, and application stacks without manual tickets, while platform teams retain control through policies, quotas, RBAC, and audit logs.

Developers and data scientists can get started through self-service portals, APIs, CLIs, pre-approved SKUs, curated environment templates, NVIDIA NIM templates, Jupyter notebooks, Ray, Kubeflow, Run:ai, virtual clusters, GPU slices, and marketplace-style catalogs. Rafay reduces setup friction by automating infrastructure, networking, quotas, RBAC, and dependency management behind the scenes.

What are the key features and capabilities of Rafay's AI Token Factory?

Key Token Factory capabilities include:

Token-metered usage
Multi-tenant AI service delivery
Governance and isolation
OpenAI-compatible APIs for developers
Elastic GPU scaling for inference workloads
Usage tracking for chargeback and billing
White-label API portals
Integration with billing and finance systems

What is Rafay's AI Token Factory, and how does it monetize AI services?

Rafay's AI Token Factory converts GPU inference infrastructure into governed, token-metered AI services. Instead of exposing raw GPUs or clusters, organizations can publish model APIs that developers and applications consume. Token usage is tracked, governed, and attributed, enabling providers to monetize AI capabilities as services rather than simply selling compute capacity.

This shifts the economic model of AI consumption from infrastructure-based pricing to service-based pricing. Rather than charging for GPU hours or cluster access, organizations can meter and bill based on token consumption, supporting usage-based business models such as model-as-a-service, inference-as-a-service, AI marketplaces, customer billing, and internal chargeback.
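
The difference between billing GPU hours and billing tokens can be made concrete with a short Python sketch. Everything here is hypothetical: the TokenUsage record, the model name, the tenant names, and the per-million-token prices are placeholders chosen for the arithmetic, not Token Factory's data model or pricing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenUsage:
    """Hypothetical metering record for one tenant and one model."""
    tenant: str
    model: str
    input_tokens: int
    output_tokens: int


# Hypothetical prices in USD per 1,000,000 tokens.
PRICES = {"example-model": {"input": 0.50, "output": 1.50}}


def bill_by_tenant(usages, prices):
    """Charge each tenant for the tokens it consumed, not for GPU time."""
    totals = {}
    for usage in usages:
        price = prices[usage.model]
        cost = (
            usage.input_tokens * price["input"]
            + usage.output_tokens * price["output"]
        ) / 1_000_000
        totals[usage.tenant] = totals.get(usage.tenant, 0.0) + cost
    return {tenant: round(cost, 6) for tenant, cost in totals.items()}


if __name__ == "__main__":
    usages = [
        TokenUsage("tenant-a", "example-model", 2_000_000, 1_000_000),
        TokenUsage("tenant-b", "example-model", 400_000, 100_000),
    ]
    print(bill_by_tenant(usages, PRICES))
```

With these placeholder prices, tenant-a owes 2.50 and tenant-b owes 0.35, regardless of how many GPUs served the requests or how long they ran.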

The greatest value comes from enabling organizations to package and deliver AI through APIs while maintaining governance and consumption visibility. Common use cases include developer-facing AI platforms, tenant-specific model APIs, sovereign AI services, and consumption-based AI offerings for enterprises, telecom providers, and neocloud operators.

How does Rafay enable and orchestrate AI factories?

Rafay helps organizations build and run AI factories by providing the operational layer that turns raw GPU infrastructure into secure, governed, self-service AI platforms. It helps organizations expose GPUs, clusters, AI tools, model services, and inference APIs through catalogs and SKUs while enforcing multi-tenancy, quotas, policies, usage visibility, and cost attribution.

Rafay provides the operating layer for AI factories. NVIDIA and infrastructure vendors provide accelerated hardware and AI software components; Rafay makes that infrastructure consumable, governable, scalable, and monetizable through orchestration, multi-tenancy, self-service access, policy enforcement, usage visibility, and service catalogs.

What are the key capabilities, solutions, and types of services that can be launched with the Rafay Platform?

Rafay provides the operational layer organizations need to deliver self-service AI infrastructure and cloud services at scale. Core capabilities include multi-tenancy, governance, policy enforcement, infrastructure orchestration, cost visibility, chargeback, identity and access controls, and lifecycle management for AI and cloud-native environments.

Using Rafay, organizations can package infrastructure into consumable services such as GPU-as-a-Service, Kubernetes-as-a-Service, SLURM-as-a-Service, AI Models-as-a-Service, inference services, developer workspaces, AI infrastructure platforms, and marketplace-style AI offerings. These services can be delivered through self-service portals, APIs, and catalogs while maintaining governance, security, and operational control.

Who is the target audience for the Rafay Platform?

Rafay is built for enterprises, GPU cloud providers, NeoClouds, sovereign AI cloud operators, telcos, and platform teams. The strongest fit is organizations that need to operate AI and cloud-native infrastructure at scale while enforcing governance, isolation, cost controls, self-service access, and consumption-based monetization.

What are the general benefits of using the Rafay Platform?

The Rafay Platform helps organizations:

Accelerate infrastructure consumption
Improve developer productivity
Increase GPU and CPU utilization
Enforce governance
Reduce operational overhead
Enable cost attribution
Enable monetizable AI services

Rafay also helps infrastructure teams provide a cloud-like experience without forcing them to build a full platform internally.

What core problems does the Rafay Platform address for businesses?

Rafay addresses the gap between owning infrastructure and making it usable. Many organizations invest heavily in GPUs, Kubernetes, and cloud infrastructure, but teams still face manual provisioning, ticket queues, fragmented governance, low utilization, and limited cost visibility. Rafay solves this by turning compute infrastructure into self-service, policy-governed environments that developers, data scientists, and tenants can consume on demand.

What is the Rafay Platform?

The Rafay Platform helps enterprises, NeoClouds, cloud service providers, telcos, and sovereign AI clouds turn GPU and CPU infrastructure into consumable AI platforms, self-service cloud environments, and revenue-generating services. From the moment infrastructure is available, whether GPUs, CPUs, bare metal, virtual machines, Kubernetes clusters, or HPC environments, Rafay makes it productive, governed, and ready for consumption.

The platform provides the operational layer needed to package infrastructure into governed services that developers, data scientists, enterprise teams, and external customers can consume on demand. For enterprises, Rafay enables developer self-service, AI workload delivery, Kubernetes and GPU lifecycle management, governance, cost visibility, and policy enforcement across hybrid environments.

For NeoClouds, telcos, cloud service providers, and sovereign AI clouds, Rafay provides the platform foundation to launch differentiated AI services, monetize compute investments, and deliver cloud-like customer experiences with multi-tenancy, billing, chargeback, SKU management, white-label portals, Token Factory for token-metered AI delivery, and SLURM-as-a-Service for elastic HPC.

In practical terms, Rafay helps organizations move up the stack from managing infrastructure to delivering AI services, from allocating hardware to monetizing consumption, and from operating clusters to running production-grade AI platforms.

What is the relationship between AI Token Factory and a Token Delivery Network?

AI Token Factory is a platform capability that helps enable a Token Delivery Network.

A Token Delivery Network describes the distributed architecture for delivering AI inference from the best available endpoint. Rafay Token Factory provides capabilities for exposing model APIs, metering token usage, enforcing tenant controls, and supporting monetization across distributed GPU infrastructure.

How does Rafay help operators move from GPU infrastructure to AI services?

Rafay helps operators transform GPU infrastructure into self-service AI platforms with governance, multi-tenancy, metering, catalogs, API access, and monetization workflows.

With Rafay, operators can package compute and AI services into SKUs, deploy model endpoints, expose self-service APIs, enforce RBAC and policy controls, monitor usage, and support token-metered consumption. This helps organizations move from managing raw GPU infrastructure to delivering AI services that developers and customers can consume directly.

How does a Token Delivery Network support AI monetization?

A Token Delivery Network supports AI monetization by turning model access into a measurable service.

Instead of selling only GPU hours or infrastructure access, operators can expose AI models, agents, and applications through APIs. Usage can be measured at the token level, packaged into SKUs, assigned to tenants or customers, and connected to billing, chargeback, or consumption-based pricing models.
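
As a rough illustration of how token-level usage could roll up into per-tenant charges, the sketch below sums metered tokens against a price list. The SKU names, prices, and the `UsageEvent` record are hypothetical and are not taken from the Rafay Platform or its APIs.

```python
from collections import defaultdict
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class UsageEvent:
    tenant: str
    sku: str            # hypothetical SKU name
    input_tokens: int
    output_tokens: int


# Hypothetical price list: price per 1,000 tokens, by SKU.
PRICE_PER_1K = {
    "chat-small": {"input": Decimal("0.10"), "output": Decimal("0.30")},
    "chat-large": {"input": Decimal("0.50"), "output": Decimal("1.50")},
}


def charges_by_tenant(events):
    """Sum token charges per tenant from a sequence of usage events."""
    totals = defaultdict(Decimal)  # Decimal() starts each tenant at 0
    for e in events:
        price = PRICE_PER_1K[e.sku]
        cost = (Decimal(e.input_tokens) * price["input"]
                + Decimal(e.output_tokens) * price["output"]) / Decimal(1000)
        totals[e.tenant] += cost
    return dict(totals)


events = [
    UsageEvent("team-a", "chat-small", input_tokens=2000, output_tokens=1000),
    UsageEvent("team-a", "chat-large", input_tokens=1000, output_tokens=2000),
    UsageEvent("team-b", "chat-small", input_tokens=500, output_tokens=500),
]
print(charges_by_tenant(events))
```

`Decimal` is used instead of floats so that small per-token prices add up without rounding drift, which matters once the totals feed billing or chargeback.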

Is a Token Delivery Network only for edge AI?

A Token Delivery Network is not limited to edge AI. A TDN can span centralized data centers, cloud regions, sovereign data centers, enterprise private clouds, neocloud GPU environments, and programmable edge locations.

The core idea is not that every inference request must run at the edge. The core idea is that inference should run from the best available endpoint based on proximity, performance, policy, sovereignty, capacity, and cost.
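
One way to picture "best available endpoint" is as a filter followed by a score. The sketch below is a simplified assumption, not Rafay's routing logic: sovereignty and capacity are treated as hard filters, and latency and cost are combined with illustrative weights. The endpoint names, fields, and weights are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Endpoint:
    name: str
    region: str                # jurisdiction the endpoint runs in
    latency_ms: float          # measured round-trip latency to the caller
    free_capacity: float       # fraction of capacity available, 0.0 to 1.0
    cost_per_1k_tokens: float


def pick_endpoint(endpoints, allowed_regions, latency_weight=1.0, cost_weight=100.0):
    """Return the lowest-scoring eligible endpoint, or None if none qualify."""
    # Hard filters: the region must be permitted and capacity must be available.
    eligible = [
        e for e in endpoints
        if e.region in allowed_regions and e.free_capacity > 0.0
    ]
    if not eligible:
        return None
    # Soft trade-off: lower latency and lower cost both reduce the score.
    return min(
        eligible,
        key=lambda e: latency_weight * e.latency_ms + cost_weight * e.cost_per_1k_tokens,
    )


endpoints = [
    Endpoint("edge-madrid", "eu", latency_ms=12, free_capacity=0.0, cost_per_1k_tokens=0.40),
    Endpoint("dc-frankfurt", "eu", latency_ms=35, free_capacity=0.6, cost_per_1k_tokens=0.30),
    Endpoint("cloud-virginia", "us", latency_ms=20, free_capacity=0.9, cost_per_1k_tokens=0.20),
    Endpoint("dc-paris", "eu", latency_ms=28, free_capacity=0.3, cost_per_1k_tokens=0.45),
]
best = pick_endpoint(endpoints, allowed_regions={"eu"})
print(best.name if best else "no eligible endpoint")
```

In this example the closest edge site is skipped because it has no free capacity, and the lower-latency US endpoint is skipped because it falls outside the allowed region, which is the point the paragraph above makes: proximity is one input among several.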

Who needs a Token Delivery Network?

Token Delivery Networks are relevant for telcos, neoclouds, sovereign cloud providers, enterprises, and platform operators that need to deliver AI inference across distributed GPU infrastructure.

These organizations may need to support low-latency inference, regional or sovereign AI services, internal AI platforms, model-as-a-service offerings, AI marketplaces, or token-metered API consumption. A TDN helps them turn distributed GPU capacity into consumable AI services.

What open-source and proprietary components make up the Rafay stack?

Rafay's platform is built predominantly on open-source components with a focused set of proprietary layers. Open-source components include Kubernetes, Prometheus, OpenTelemetry, Slurm, Terraform and OpenTofu, Helm, Harbor, MLflow, Argo Workflows, vLLM, NVIDIA GPU Operator, Calico or Cilium, CoreDNS, Keycloak, and SPIFFE/SPIRE. The proprietary components are the Rafay controller, middleware, workflow engine, metering engine, PaaS Studio SKU designer, and Token Factory. No proprietary data formats are used — models are stored in ONNX, SafeTensors, and HuggingFace formats; pipelines use Argo or Kubeflow YAML; infrastructure uses Terraform and Helm — which means operators are not locked into Rafay-specific artifacts if they choose to migrate in the future.

How do Rafay and systems integrator or OEM partners divide the solution?

Rafay delivers the software platform: orchestration, multi-tenancy, governance, self-service portals, metering, billing integration, and AI service delivery. Systems integrator and hardware OEM partners deliver the physical foundation — servers, networking, storage hardware, rack integration, and facility-level services — along with regional professional services and field support. The two layers combine into a single solution where Rafay operationalizes the partner's infrastructure into a self-service, multi-tenant AI platform. In the Build-Validate-Operate-Transfer methodology, Rafay leads the software platform across all four phases while the SI provides hardware integration, local professional services, and on-site hands during the Operate and Transfer phases.

Can I bring my own corporate identity provider to Rafay?

Yes. Rafay federates with external corporate identity providers via SAML 2.0 and OIDC, including Okta, Azure Active Directory, Keycloak, Google Workspace, GitHub, Microsoft, and SailPoint, as well as any SAML 2.0-compliant provider. Users authenticate through their existing identity provider with MFA enforced at that layer, so Rafay does not manage credentials directly. Identity provider group membership maps automatically to platform roles, so access governance stays driven by the organization's existing directory rather than requiring manual role assignment in Rafay. Rafay's IAM layer supports a multi-tier hierarchy that includes organization, project, and workspace levels, with fine-grained RBAC and predefined or custom roles assignable at each level.
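
The group-to-role mapping can be pictured as a simple lookup applied to the groups an identity provider asserts at login. The sketch below is an assumption for illustration only: the group names, scope strings, and role names are hypothetical, and it does not represent Rafay's configuration format.

```python
# Hypothetical mapping from IdP group names to (scope, role) pairs.
GROUP_ROLE_MAP = {
    "platform-admins": ("organization", "org-admin"),
    "ml-research": ("project:research", "project-admin"),
    "ml-interns": ("workspace:research/sandbox", "read-only"),
}


def roles_from_groups(idp_groups):
    """Translate IdP group membership into platform role assignments.

    Unknown groups are ignored, so access is granted only through
    groups that an operator has explicitly mapped.
    """
    return sorted(GROUP_ROLE_MAP[g] for g in idp_groups if g in GROUP_ROLE_MAP)


# Groups as they might arrive in an OIDC "groups" claim or a SAML attribute.
claims = {"sub": "alice@example.com", "groups": ["ml-research", "ml-interns", "finance"]}
print(roles_from_groups(claims["groups"]))
```

Because roles are derived from the directory at login, removing a user from a group in the identity provider removes the corresponding access without a separate change on the platform side.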

What compute form factors can I consume as a Rafay tenant?

Rafay lets tenants consume bare metal servers, GPU-accelerated VMs, managed Kubernetes clusters, Slurm HPC clusters, and containers from a single self-service catalog. Bare metal provides dedicated GPU servers with no virtualization overhead, which is optimal for large-scale training workloads. GPU-accelerated VMs provide isolated compute with live migration and snapshot capabilities. Managed Kubernetes clusters are provisioned on demand with full administrative rights scoped to the tenant's environment. Tenants can select the form factor that fits each workload — and can use multiple form factors within the same tenancy — without needing to request access through a platform team.

Can I white-label the Rafay customer portal under my own brand?

Yes. The Rafay platform supports full white-label customization, so GPU cloud providers and neoclouds can present the self-service portal entirely under their own brand. Logo, colors, domain name, and product name are configurable per white-labeled partner. Language, number format, currency display, and unit systems are configurable at the tenant level. This delivers a branded, hyperscaler-style self-service experience to end customers without building a portal from scratch.

How does multi-tenancy work for neoclouds sharing a GPU fleet across customers?

Rafay enforces hard multi-tenancy so many customers can share one GPU fleet without sharing blast radius, data, or network access. Network isolation is implemented with per-tenant VRF and VLAN for north-south traffic and InfiniBand PKEY isolation via NVIDIA UFM for east-west GPU traffic. Each bare metal tenant receives a dedicated provisioning head node, and storage uses dedicated namespaces, access zones, and per-tenant bucket policies. Operators govern the entire fleet from a single control plane while every tenant remains cleanly separated — with their own quotas, RBAC policies, and performance isolation that prevents noisy-neighbor effects.

How does Rafay improve GPU utilization and margins for neoclouds and GPU cloud providers?

Rafay increases GPU utilization by enabling shared, fractional GPU consumption through NVIDIA MIG partitioning and time-slicing, which allow multiple workloads or tenants to share physical GPUs that would otherwise sit idle between large training jobs. Quota-based allocation and self-service provisioning keep more of the fleet active at any given time, reducing the stranded capacity that drives down utilization rates. On the margin side, Rafay helps providers move up the value stack from commodity GPU-hour rental toward token-metered AI services — which command stronger price points and higher margins than raw compute. Providers can offer foundation, compute, and AI SKUs additively on the same fleet, shifting their revenue mix toward AI services without re-platforming.

What service levels does Rafay support for a production AI cloud?

Rafay supports enterprise service-level agreements covering platform availability, incident response and resolution times, and security patching, with prioritized severity tiers for production environments. During the Operate phase of the Build-Validate-Operate-Transfer methodology, Rafay runs and maintains the platform under agreed SLA targets while the operator's team shadows for handover — providing a live, SLA-backed operating period rather than a handover on day one. Specific availability targets and incident response commitments are defined per engagement based on the operator's requirements and the scale of the deployment.

How does Rafay help federate multiple AI gigafactory sites across regions?

Rafay manages multiple geographically distributed AI gigafactory sites from a single logical control plane, enabling operators to treat a multi-site GPU fleet as one governed resource pool. The platform provides a federated resource view, cross-site workload scheduling, federated identity and policy management, secure site interconnection, and per-site data sovereignty enforcement, with multi-provider interoperability through open APIs. Each site maintains its own sovereignty boundary and can operate independently while being orchestrated centrally — supporting federated European AI gigafactory models where different national sites must remain jurisdictionally separate. Operators can manage the full fleet, including tenant onboarding, node lifecycle, and capacity allocation, from one control plane without collapsing the per-site isolation that sovereignty requirements demand.

What hardware does Rafay's platform support?

Rafay's platform is hardware-agnostic by design and supports current and future NVIDIA GPU generations. Validated configurations include NVIDIA GB200 and GB300 NVL72 rack-scale systems, HGX B100, B200, and B300 NVL8 multi-GPU servers, and RTX PRO 6000 servers. ARM and x86 CPU partitions are managed through a unified control plane, with ARM64-native OS images and GPU drivers deployed where required. On the storage side, Rafay integrates with WekaFS, DDN A3I, VAST Data, Ceph, Dell PowerScale and ObjectScale, Lightbits, and NVMe-oF through native APIs. For networking, Rafay supports NVIDIA Cumulus, Cisco, Netris, and Aviz for switching, and Palo Alto Networks, Fortinet, Juniper, and VyOS for firewalls.

How does Rafay keep data and operations within national or EU sovereignty boundaries?

Rafay enforces data residency and operational sovereignty by running entirely on the operator's own premises, within their own facilities and jurisdiction. The Rafay controller, all operational tooling, telemetry, logs, and configuration data run locally — nothing is routed through Rafay-hosted infrastructure or leaves the sovereign boundary as a condition of platform operation. Data residency is enforced architecturally rather than by policy alone: because there is no required external connectivity, tenant data cannot leave the environment even in the event of a misconfiguration. EU-based systems integrator partners provide in-region delivery, support, and professional services, keeping the full operational chain within the required geographic boundary. Rafay holds SOC 2 certification and provides a structured EU Cloud Sovereignty self-assessment mapped to the European Commission framework.

How does Rafay align with ENS and EU sovereignty frameworks?

Rafay supports the security measures defined by ENS (Esquema Nacional de Seguridad) Nivel Alto across confidentiality, integrity, availability, and traceability. This includes encryption at rest and in transit, immutable audit logs, high-availability and disaster recovery capabilities, and complete audit trails for tenant and administrative activity. Rafay holds SOC 2 certification and provides a structured EU Cloud Sovereignty self-assessment mapped to the European Commission framework. EU and Spanish regulatory certifications are pursued in collaboration with EU-based systems integrator partners who provide the regional delivery and support infrastructure needed to satisfy in-country requirements.

How quickly can a neocloud go from raw GPUs to billable services with Rafay?

A neocloud can typically reach its first billable AI services in approximately six to eight weeks using the Rafay Platform, compared to the many months a from-scratch platform build would require. Rafay provides the multi-tenant operating layer, self-service portal, SKU design tools, metering engine, and billing integration that neoclouds would otherwise need to build themselves. Once the platform is running, operators can expand their service catalog — adding new GPU SKUs, inference endpoints, or AI service tiers — without re-platforming. The accelerated timeline means neoclouds can begin generating token-metered revenue while their infrastructure is still scaling, rather than waiting for a complete build-out.

What is the Build-Validate-Operate-Transfer (BVOT) methodology?

Build-Validate-Operate-Transfer is Rafay's structured methodology for deploying and handing over a production AI cloud to an operator's team. The Build phase installs and integrates the full software stack and delivers runbooks and as-built diagrams. Validate confirms functional and failure-scenario behavior, including a complete shutdown and startup cycle, before any production traffic runs. During Operate, Rafay runs the platform under agreed service levels while the operator's team shadows and learns every Level 1, 2, and 3 procedure — covering tenant onboarding, node lifecycle, upgrades, and incident response. Transfer formally hands over all operational artifacts, trains the team, and requires demonstrated independent operation before acceptance is signed off. The methodology is designed so that sovereign and enterprise operators can self-sustain the platform without ongoing Rafay involvement.

Is Rafay NVIDIA AI Cloud Ready and NVIDIA-validated?

Yes. Rafay is an NVIDIA AI Cloud Ready validated ISV and is compliant with the NVIDIA Cloud Partner (NCP) software reference architecture. Rafay is qualified for NVIDIA offtake demand models, supports API-driven consumption of infrastructure, and enables service-based delivery of AI workloads including NVIDIA NIM and NeMo. The Rafay Platform supports NVIDIA GB200 and GB300 NVL72 rack-scale systems, HGX B100, B200, and B300 NVL8 multi-GPU servers, and RTX PRO 6000 servers, and is designed to support future NVIDIA hardware generations. Rafay integrates natively with NVIDIA Base Command Manager (BCM) for bare metal provisioning and NVIDIA Unified Fabric Manager (UFM) for InfiniBand isolation, and powers reference deployments across the NVIDIA ecosystem.

Can the Rafay Platform run fully air-gapped with no external connectivity?

Yes. The Rafay Platform supports fully air-gapped deployment with zero required external connectivity, making it suitable for classified, regulated, and sovereign environments where outbound internet access is prohibited. Every external dependency is replaced by an on-premises equivalent: software updates via local mirror registries, license validation via a local license server, identity via an on-premises Keycloak provider, DNS via CoreDNS, time via a local NTP or PTP server, and certificates via on-premises PKI. After initial setup, no outbound connections are required for day-to-day platform operation, upgrades, or tenant management. This architecture enforces data residency and operational isolation at the infrastructure level rather than relying on policy controls alone.

Why are telcos positioned to lead Token Delivery Networks?

Telcos have the physical substrate that TDNs require: distributed networks, metro data centers, fiber, edge locations, enterprise relationships, and trusted positions with regulated customers. As inference workloads move closer to users and sovereign requirements tighten, telcos are naturally positioned to become AI service delivery platforms. The missing layer is the software to govern, meter, and monetize AI services from that distributed infrastructure — which Rafay provides.

What is a Token Hub?

A Token Hub is a storefront or marketplace where developers, enterprises, or applications can discover, access, and acquire token-based AI services.

A Token Hub can present available models, inference APIs, agents, or AI applications as consumable services. Through a Token Hub, operators can package AI services into SKUs, control access by tenant or customer, meter usage by token, and connect consumption to billing, chargeback, or monetization workflows.

What is token-metered AI service delivery?

Token-metered AI service delivery is a consumption model where AI services are exposed as APIs and every generated output token is measured, attributed, and priced — replacing the GPU-hour as the primary unit of commercial exchange. Instead of selling raw compute access, operators deliver governed model endpoints where usage is tracked at the token level for billing, tenant chargeback, quota enforcement, and cost attribution across business units or customers. This model benefits operators because it decouples revenue from raw GPU utilization: a well-governed token-metered service can be priced on value delivered rather than on compute consumed, supporting higher margins and more predictable revenue. It benefits tenants and end users because consumption is transparent and attributable, making AI spending auditable in the same way SaaS licensing is. Rafay's Token Factory platform provides the metering engine, API exposure layer, and billing integration that make token-metered delivery operationally feasible at scale.

What is programmable edge in inferencing?

A programmable edge is a distributed compute environment where workloads and inference endpoints can be deployed dynamically based on real-time signals such as latency, cost, capacity, power availability, user location, policy, and sovereignty requirements.

For AI inference, the programmable edge allows model endpoints to run closer to users, devices, applications, or data sources when proximity improves performance, compliance, or cost efficiency. In a Token Delivery Network, programmable edge environments can become locations where AI services are deployed and consumed through APIs.

What is Rafay Token Factory?

Rafay Token Factory is part of the Rafay Platform that converts GPU inference infrastructure into governed, token-metered AI services exposed through APIs.

With Rafay Token Factory, organizations can deploy API-accessible model endpoints, track token-level usage, enforce tenant isolation and RBAC, package AI services through SKUs and catalogs, integrate with billing or chargeback systems, and deploy AI services across distributed infrastructure.

Rafay Token Factory helps operators move from raw GPU resale toward AI service monetization by making inference consumable through standardized APIs and measurable through token-based usage.

Why do Token Delivery Networks matter?

Token Delivery Networks matter because AI inference is becoming more distributed, latency-sensitive, and consumption-driven.

As AI moves into applications, browsers, agents, robots, industrial systems, and enterprise workflows, organizations need a way to deliver model responses from the best available endpoint. A TDN gives operators a framework for deploying AI services across distributed infrastructure while maintaining governance, usage visibility, and token-based economics.

What role does Rafay play in a Token Delivery Network?

Rafay provides the operational layer that turns distributed GPU infrastructure into a governed Token Delivery Network. Specifically, Rafay deploys and manages model inference endpoints across geographically distributed sites, exposes those endpoints as token-metered API services, enforces access control and tenant isolation across the delivery network, applies routing policies based on proximity, capacity, and data sovereignty constraints, and collects per-token usage data for billing and chargeback. Without this operational layer, a distributed GPU fleet is a collection of hardware sites — Rafay is what makes it a coherent AI service delivery network. Operators using Rafay to build a TDN do not need to build their own routing, metering, policy enforcement, or billing infrastructure; those capabilities are built into the platform.

How is a Token Delivery Network different from a CDN?

A Content Delivery Network, or CDN, caches and delivers static or pre-generated content. A Token Delivery Network coordinates real-time AI inference, where tokens are generated dynamically by models running on GPU infrastructure.

The difference is that CDNs deliver content that already exists, while TDNs support AI responses that must be generated at request time. In a TDN, tokens become the unit of AI service consumption, and applications access models through API endpoints deployed across distributed infrastructure.

What is a Token Delivery Network?

A Token Delivery Network, or TDN, is a distributed AI service architecture that delivers model responses from the best available inference endpoint based on proximity, performance, policy, sovereignty, capacity, and cost.

A TDN helps applications consume AI services from the right location without requiring developers to manually choose or manage the underlying GPU infrastructure. Instead of treating inference as a centralized service, a Token Delivery Network enables AI responses to be served from distributed model endpoints across data centers, cloud regions, sovereign environments, or programmable edge locations.

What is serverless inferencing?

Serverless inference allows teams to deploy and run AI models without provisioning or managing underlying infrastructure. Instead of configuring clusters or managing GPUs, developers interact with simple APIs that scale automatically based on demand.

Rafay turns GPU infrastructure into on-demand inference services—eliminating operational friction and accelerating time to production.

What is an AI inference platform?

An AI inference platform is a scalable environment for deploying and managing AI models in production. It handles request routing, GPU allocation, scaling, monitoring, and performance optimization. In enterprise environments, inference platforms are critical for supporting token factories that must generate tokens reliably and efficiently at scale.

What is an inference engine in AI?

An inference engine is the system that runs a trained AI model to generate predictions or text in real time. In large language models, the inference engine processes input tokens and produces output tokens. Its efficiency directly impacts response speed, scalability, and cost per token.

How does LLM token generation work?

LLM token generation works by tokenizing an input prompt, running it through a trained neural network, and predicting the next most probable token. This process repeats sequentially until the full response is produced. Each new token is influenced by the tokens that came before it, which allows models to generate coherent text.
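
The sequential loop described above can be sketched with a toy stand-in for the model. The lookup table here is hypothetical and conditions only on the previous token, whereas a real LLM computes next-token probabilities with a neural network over the whole preceding sequence. The decoding strategy shown is greedy (always take the most probable token), which is one of several strategies in use.

```python
# Toy "model": for each token, the probabilities of the token that follows.
NEXT_TOKEN_PROBS = {
    "<s>": {"the": 0.9, "a": 0.1},
    "the": {"model": 0.7, "token": 0.3},
    "model": {"generates": 0.8, "</s>": 0.2},
    "generates": {"tokens": 0.6, "text": 0.4},
    "tokens": {"</s>": 1.0},
    "text": {"</s>": 1.0},
}


def generate(prompt_tokens, max_new_tokens=10):
    """Greedy decoding: repeatedly append the most probable next token.

    Assumes prompt_tokens is non-empty.
    """
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        probs = NEXT_TOKEN_PROBS.get(tokens[-1])
        if not probs:                # no known continuation for this token
            break
        next_token = max(probs, key=probs.get)
        if next_token == "</s>":     # end-of-sequence marker stops generation
            break
        tokens.append(next_token)
    return tokens


print(generate(["<s>", "the"]))
# ['<s>', 'the', 'model', 'generates', 'tokens']
```

Each pass through the loop produces exactly one token, which is why output length drives both response time and per-token cost.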

What is a token in AI?

A token in AI is a unit of text that a language model processes. Instead of reading full words or sentences, AI models break text into smaller pieces called tokens, which can be whole words, parts of words, punctuation, or symbols. Large language models generate responses one token at a time, and token counts determine context limits, performance, and cost.

What is an AI token factory?

An AI Token Factory is the operating layer that transforms GPU infrastructure into governed, consumable AI services.

Instead of exposing raw GPUs or unmanaged clusters, organizations deliver production-ready model APIs that are:

Token-metered for transparent usage tracking
Multi-tenant with strict isolation and RBAC
Quota-controlled to prevent runaway spend
Governed by policy and compliance guardrails
Monetizable through usage-based billing

Serverless inference is how models are delivered. A Token Factory is how they are scaled, controlled, and turned into repeatable services.

In other words, an AI Token Factory is a system designed to generate, process, and manage large volumes of AI model tokens at scale. It combines model serving, orchestration, and optimized inference infrastructure to efficiently convert compute resources into high-throughput token generation for production AI applications.
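
To make "quota-controlled" concrete, the sketch below shows a minimal per-tenant token budget that rejects a request once it would exceed the limit. The `TokenQuota` class and its numbers are hypothetical and are meant only to illustrate the idea, not to describe how Rafay implements quotas.

```python
class TokenQuota:
    """Tracks token consumption for one tenant against a fixed budget."""

    def __init__(self, limit):
        self.limit = limit
        self.used = 0

    def try_consume(self, tokens):
        """Record usage and return True, or return False if it would exceed the limit."""
        if tokens < 0:
            raise ValueError("tokens must be non-negative")
        if self.used + tokens > self.limit:
            return False
        self.used += tokens
        return True

    @property
    def remaining(self):
        return self.limit - self.used


quota = TokenQuota(limit=10_000)
print(quota.try_consume(6_000))   # True
print(quota.try_consume(5_000))   # False: would exceed the budget
print(quota.remaining)            # 4000
```

A rejected request leaves the counter unchanged, so a tenant that hits the limit can still make smaller requests that fit within the remaining budget.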

Is Rafay suitable for development, testing, or proof-of-concept use cases?

Yes. Rafay is commonly used to deliver Kubernetes-as-a-Service for development, testing, and production workloads. While we don’t provide a generic sandbox, teams often evaluate Rafay through guided demos or scoped proof-of-concept discussions that reflect their intended environment, tooling, and governance requirements.

How can I see Rafay working in a real Kubernetes environment?

The best way to see Rafay in action is through a guided Kubernetes platform demo. These demos are based on real customer architectures and use cases — not disposable sandbox clusters. During the walkthrough, we demonstrate how teams use Rafay to provision, govern, and operate Kubernetes across cloud, on-prem, or GPU infrastructure.

What’s the difference between a Kubernetes sandbox and a guided demo?

A Kubernetes sandbox is typically a single, isolated cluster with limited configuration and no real governance, identity, or multi-tenant context. A guided Rafay demo shows how Kubernetes is delivered and operated in real-world environments — including self-service provisioning, access controls, policy enforcement, and multi-tenant usage. This provides a more realistic view of how teams evaluate and run Kubernetes at scale.

Do you offer a free Kubernetes sandbox or trial?

Rafay no longer offers a self-serve Kubernetes sandbox or free trial. While we previously offered a lightweight sandbox experience, it no longer reflected how customers use Rafay in real production environments. Today, we focus on guided demos and architecture-based walkthroughs that more accurately demonstrate the platform’s capabilities.

Who uses AI factories?

AI factories are used by enterprises, cloud service providers, and sovereign AI clouds that need to scale AI workloads efficiently, maximize GPU utilization, and deliver AI as a production service rather than isolated projects. You can see how Rafay worked with Canadian telecommunications provider Telus in this case study.

Is Rafay an AI factory?

Rafay is not a GPU manufacturer or model provider. Rafay provides an infrastructure orchestration and consumption platform that enables organizations to operate AI factories by turning AI infrastructure into a governed, self-service platform. Learn more about AI factories here: https://rafay.co/ai-and-cloud-native-blog/what-is-an-ai-factory

What role does Rafay play in AI factories?

Rafay provides the control plane for AI factories, handling orchestration, multi-tenancy, governance, and self-service access to AI infrastructure across cloud, on-prem, and sovereign environments.

How can I see the Kubernetes management offering of the Rafay Platform?

The best way to see the Rafay Platform's Kubernetes management offering is to schedule a conversation with us. We offer guided Kubernetes platform demos based on real customer environments, not generic sandboxes. We will tailor the walkthrough to your specific use cases and needs.

Why doesn’t Rafay offer a generic Kubernetes sandbox?

Rafay is designed to operate Kubernetes the way it runs in the real world — with governance, multi-tenancy, identity, and production-grade infrastructure. A generic, self-serve sandbox wouldn’t reflect how the platform actually works or the value it delivers. Instead, we offer guided demos and architecture-based walkthroughs that show how teams use Rafay to deliver Kubernetes-as-a-Service in production environments.

How does Rafay ensure compliance and governance for enterprise AI initiatives?

Who can use self-service compute?

Self-service compute is designed for enterprises and service providers alike. It can benefit teams across various departments, from IT to development. Anyone needing flexible computing resources can leverage this solution.

Is self-service compute secure?

Yes, self-service compute platforms incorporate robust security measures. These include access controls, encryption, and compliance with industry standards. Your data and resources are protected throughout the process.

What are the benefits of self-service compute?

Self-service compute through the Rafay platform delivers four concrete benefits for enterprises and GPU cloud providers. First, it eliminates the infrastructure ticketing bottleneck: developers and data scientists can provision GPU environments in approximately 30 seconds rather than waiting days or weeks for manual provisioning. Second, it improves GPU utilization by reducing the idle time that accumulates when resources sit waiting for manual allocation — quotas and automated reclamation keep more of the fleet active. Third, it enforces governance automatically: every resource deployment is governed by pre-defined quotas, RBAC policies, and chargeback rules without requiring platform team involvement in each transaction. Fourth, it scales the platform team's leverage — a small infrastructure team can support hundreds of active tenants because the catalog model replaces per-request work with one-time SKU design.

How does self-service compute work?

Rafay's self-service compute works through a governed SKU catalog that platform operators define and tenants consume. A platform team uses PaaS Studio to design compute SKUs — specifying instance type, GPU model, storage options, networking configuration, and associated quota limits — and publishes them to the DevHub portal. Tenants browse the catalog, select the compute type they need, and deploy it; provisioning is automated end-to-end through Rafay's workflow engine, typically completing in approximately 30 seconds for cluster-level resources. Every deployment is governed automatically: quota checks prevent over-consumption, RBAC ensures tenants only see resources they are authorized to use, and chargeback data is collected per deployment for cost attribution. The platform team retains full visibility into utilization across all tenants without managing individual requests.
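
The governance checks described above (RBAC, quota, chargeback) can be sketched as a single deploy function. Everything in this example is an assumption made for illustration: the SKU names, rates, and the `Tenant` structure are hypothetical and do not reflect PaaS Studio, DevHub, or any Rafay API.

```python
from dataclasses import dataclass, field


@dataclass
class Tenant:
    name: str
    allowed_skus: set          # RBAC: SKUs this tenant may see and deploy
    gpu_quota: int             # maximum GPUs the tenant may hold at once
    gpus_in_use: int = 0
    chargeback: list = field(default_factory=list)


# Hypothetical catalog: SKU name -> GPUs consumed and hourly rate.
CATALOG = {
    "gpu-vm-1x": {"gpus": 1, "hourly_rate": 2.50},
    "k8s-cluster-8x": {"gpus": 8, "hourly_rate": 18.00},
}


def deploy(tenant, sku_name):
    """Apply RBAC and quota checks, then record a chargeback entry."""
    if sku_name not in CATALOG or sku_name not in tenant.allowed_skus:
        raise PermissionError(f"{tenant.name} is not authorized for {sku_name}")
    sku = CATALOG[sku_name]
    if tenant.gpus_in_use + sku["gpus"] > tenant.gpu_quota:
        raise RuntimeError(f"{sku_name} would exceed {tenant.name}'s GPU quota")
    tenant.gpus_in_use += sku["gpus"]
    tenant.chargeback.append({"sku": sku_name, "hourly_rate": sku["hourly_rate"]})
    return tenant.gpus_in_use


team = Tenant("team-a", allowed_skus={"gpu-vm-1x", "k8s-cluster-8x"}, gpu_quota=8)
print(deploy(team, "gpu-vm-1x"))      # within quota
try:
    deploy(team, "k8s-cluster-8x")    # 1 + 8 GPUs exceeds the quota of 8
except RuntimeError as err:
    print(err)
```

The checks run before any state changes, so a rejected request consumes no quota and produces no chargeback record.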

What is self-service compute?

Self-service compute is a model where developers, data scientists, and platform engineers provision GPU and CPU resources on demand through a portal or API — without filing tickets, waiting for infrastructure team intervention, or navigating manual approval workflows. In the Rafay platform, self-service compute is delivered through the DevHub portal, which presents a governed catalog of pre-approved compute SKUs — bare metal servers, GPU-accelerated VMs, managed Kubernetes clusters, and Slurm partitions — that tenants can deploy in approximately 30 seconds. Each resource request is automatically governed by the tenant's assigned quota, RBAC policies, and organizational chargeback rules, so self-service speed does not come at the cost of governance or cost control. Platform teams define what is available and at what limits; tenants consume within those bounds without requiring hand-holding.

How do I get started with AI Infrastructure?

Getting started with AI infrastructure on the Rafay Platform begins with a guided architecture review rather than a generic sign-up flow. Rafay's team works with prospective customers through a structured discovery process to understand GPU fleet composition, target workload types (training, inference, or both), multi-tenancy requirements, and integration needs with existing identity, billing, and storage systems. From there, Rafay proposes a deployment architecture and conducts a guided demo or proof-of-concept walkthrough using a live environment that reflects how the platform operates in production — not a simplified sandbox. For organizations ready to deploy, Rafay uses the Build-Validate-Operate-Transfer (BVOT) methodology: a structured program that installs and validates the platform, operates it under SLA while your team learns it, and formally transfers operational ownership with documented runbooks and certified personnel. To start the process, contact Rafay's sales team at rafay.co to schedule an architecture review.

Is AI Infrastructure scalable?

Yes, AI infrastructure is designed to scale. Organizations can expand compute, storage, and networking resources as their data and processing needs grow, so they can adapt to changing demand without disruption.

What is AI infrastructure management?

AI infrastructure management is the practice of turning GPUs, compute, AI platforms, and related resources into governed, self-service services that can be consumed, managed, and scaled efficiently. It combines infrastructure provisioning with governance, multi-tenancy, usage controls, and service delivery.

How does AI Infrastructure work?

AI infrastructure works by combining GPU compute, high-speed networking, distributed storage, and an orchestration layer that schedules workloads across those resources. In a production AI environment, GPU servers are connected via high-bandwidth, low-latency fabrics — typically NVIDIA InfiniBand or RoCE for GPU-to-GPU communication — and attached to high-throughput storage systems optimized for large model checkpoint and dataset I/O. The orchestration layer (Kubernetes, Slurm, or both) schedules training and inference workloads across the GPU fleet, manages resource allocation and queuing, and enforces multi-tenant isolation between teams or customers. Rafay adds the governance, self-service, and operations layer on top of this physical stack: automating cluster provisioning, enforcing quota and RBAC policies, collecting chargeback data, managing the software lifecycle, and exposing compute through developer-facing portals and APIs — so platform teams can operate a production AI cloud without building all of those capabilities from scratch.
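
To make the scheduling role of the orchestration layer more concrete, here is a deliberately simplified Python sketch of first-fit placement of jobs onto GPU nodes, with jobs that do not fit left in a queue. It is a toy model under stated assumptions: the schedule function, node names, and job names are hypothetical, and production schedulers such as those in Kubernetes and Slurm also account for priorities, topology, preemption, and tenant isolation.

```python
from collections import deque


def schedule(nodes, jobs):
    """First-fit placement of jobs onto nodes with enough free GPUs.

    nodes: dict mapping node name -> free GPU count (mutated in place)
    jobs:  iterable of (job_name, gpus_needed) in submission order
    Returns (placements, still_queued).
    """
    placements = {}
    queued = deque()
    for job, need in jobs:
        # Pick the first node that has enough free GPUs for this job.
        node = next((n for n, free in nodes.items() if free >= need), None)
        if node is None:
            queued.append((job, need))  # no node has room, so the job waits
        else:
            nodes[node] -= need
            placements[job] = node
    return placements, list(queued)


# Illustrative fleet and workload (all names and sizes are made up).
nodes = {"gpu-node-a": 8, "gpu-node-b": 4}
jobs = [("train-llm", 8), ("finetune", 4), ("inference", 2)]
placements, queued = schedule(nodes, jobs)
```

Here the two larger jobs consume every GPU in the fleet, so the third job stays queued until capacity is released.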

How do I get started with GPU orchestration?

Getting started with GPU orchestration on the Rafay Platform involves a short discovery process to match the deployment model to your environment. Rafay supports GPU orchestration across on-premises bare metal, colocation, sovereign cloud, and hyperscaler environments — and the right starting point depends on your GPU fleet, intended workload types, and whether you are building for a single organization or for multi-tenant service delivery. The quickest path to a working environment is a guided demo or architecture walkthrough with Rafay's team, which covers live platform operation and is designed around your specific infrastructure rather than a generic product tour. For organizations with hardware already available, a proof-of-concept deployment can typically be stood up within days. Contact Rafay at rafay.co to schedule a discovery call and architecture review.