Operate Token Delivery Networks for Distributed AI Inference
Rafay enables providers to deploy and operate inference endpoints across distributed points of presence. The Rafay Platform delivers the operational workflows and controls that make it easy for providers to centrally deploy generative AI models across the network, manage endpoint lifecycle, meter token usage, etc.. The result is a highly performant Token Delivery Network.

What Is a Token Delivery Network?
A Token Delivery Network, or TDN, is a distributed inference network that brings AI model endpoints closer to users, applications, agents, physical AI systems, and other consumers of generative AI models.
With all applications beginning to leverage generative AI models to deliver improved user experiences, the need for model endpoints to be present closer to devices is driving many providers to begin investing in TDNs. Tokens can be allotted centrally but consumed across the network, resulting in the best of both worlds: Simplified governance with improved performance.
CDNs deliver static content
CDNs serve content from static edges that are located closer to user populations to improve the load time of media (images, videos, etc.) in browsers. Raw data transfer is tracked and the size of the data transferred serves as the usage meter for these interactions.

TDNs deliver curated insights
TDNs serve models from programmable edges that are located closer to user and machine populations to improve the speed with which humans and machines are able to make complex decisions with the help of a battery of specialized models. Tokens serve as the usage meter for these interactions.

Trusted by leading enterprises, neoclouds and service providers



























TDNs ♥️ Rafay
The Rafay Platform delivers a suite of capabilities – from edge cluster bringup and lifecycle management to multi-edge inference workload deployment across a dynamic set of programmable edges – that are required to power Token Delivery Networks. The Rafay Platform also tracks token usage at a granular level, enabling transparent monetization models that drive new revenue streams for providers.
With Rafay, providers can deploy, govern, meter, and operate a distributed set of inference endpoints across many locations.
Why Token Delivery Networks are the Next AI Infrastructure Wave
As GenAI becomes embedded within applications, agents, physical AI systems, and enterprise workflows, the accelerated computing infrastructure delivering the requisite genAI models needs to move closer to where the decisions are being made. Providers need a way to deploy, govern, meter, and operate inference endpoints across distributed locations — turning fragmented compute into a coordinated Token Delivery Network. This is where the Rafay Platform shines.
Applications Are Becoming Model-Reliant
AI is becoming embedded across applications, devices, agents, and workflows. As this trend continues, more digital interactions will involve applications calling AI models to deliver better user experiences, automate work, and power real-time intelligence.
Tokens become the meter for how those model interactions are measured, governed, and monetized.
Model Interaction Performance Depends on Proximity
As more applications interact with AI models, the quality of the user experience depends on how quickly and reliably those interactions happen.
TDNs are designed to make model interactions more responsive, resilient, and scalable by distributing inference capacity closer to where AI applications are used.
GPU Supply Is Non-Contiguous
AI compute is increasingly getting deployed wherever power is avaiable: in small metro data centers, telco edge locations, traditional carrier facilities, and sovereign sites.
Sub-1MW power sites are easier to secure across a geography than 100MW+campuses. For inference, that distributed footprint can become an advantage because compute will organically get placed closer to where AI is used.
Providers Have the Right Assets
Telcos, neoclouds, and Sovereign AI providers already have pieces of the required footprint: distributed locations, regional infrastructure, network access, power, and customer relationships. The missing layer is software to make those assets programmable for AI inference.
The Monetization Model Is Shifting
GPU hours are an infrastructure metric. Tokens are a service metric. Providers that move from raw GPU resale to governed, token-metered AI services can participate more directly in the economics of AI inference.
Not Everyone Should be Forced to Invest in Hyperscaler Infrastructure
Anthropic and OpenAI may be able to invest in global GPU infrastructure, but most model builders would prefer to partner with TDNs to deliver their models to the market.
How Rafay Enables Token Delivery Networks
Token Factory
Converts GPU inference infrastructure into governed, token-metered AI services exposed through APIs. The product bridge from TDN thought leadership to deployable capability.
Programmable Edge Orchestration
Deploys and manages inference endpoints across distributed data centers, sovereign regions, and edge-adjacent sites, making non-contiguous compute consumable as one coordinated platform.
Self-Service Portals and APIs
Lets developers and customers consume AI services and model endpoints without manual provisioning. OpenAI-compatible APIs reduce integration friction.
SKU and Service Catalog Management
Packages compute, model endpoints, agents, notebooks, and blueprints into catalog-based offerings operators can sell, tier, or white-label under their own brand.
Multi-Tenancy and Governance
Enforces isolation, RBAC, quotas, policy, and secure access across teams, tenants, customers, and regions without sacrificing shared infrastructure efficiency.
Usage Metering, Chargeback, and Billing APIs
Tracks token consumption, attributes cost and revenue, and feeds billing workflows — the commercial layer that makes token-metered AI services monetizable.
Rafay Token Factory
A key component of the Rafay Platform, Rafay Token Factory transforms GPU inference infrastructure into governed, token-metered AI services exposed through APIs.
Operators use Rafay Token Factory to define model endpoints, package them as API-accessible AI services, enforce tenant isolation and quota, track token consumption, and integrate usage data into billing and chargeback workflows.
→ API-exposed model endpoints with tenant isolation and RBAC
→ Token-level usage tracking for billing, chargeback, and attribution
→ SKU and catalog management for AI service packaging
→ Deployable across data centers, sovereign regions, and edge sites

Industry-Specific TDN Use Cases
Telcos: From Connectivity to Tokens
Many telcos across the globe own points of presence with sufficient power and connectivity in place to be transformed into programmable edges to address AI use cases. Rafay helps telcos turn these assets into inference focused compute hubs that collectively form a TDN.
Sovereign AI Clouds: Local AI Services, Governed
Sovereign AI clouds require model inference to remain within jurisdictional boundaries, with data residency, compliance, and tenant isolation at the core. Rafay delivers the operational layer for local AI services from in-country infrastructure — with policy controls, tenant isolation, usage visibility, and data residency baked in.
Neoclouds: Turn distributed GPU capacity into an inference network
For training, contiguous GPU clusters matter. For inference, distributed pockets of compute can become an advantage. Rafay helps neoclouds pool non-contiguous compute across regions, deploy model endpoints consistently, and operate a TDN from a central control plane.
Infrastructure-to-Consumption in Practice
The TDN use case is already supported with the Rafay Platform Token Factory offering. A number of Rafay customers and partners have operationalized the TDN concept across real-world sovereign, enterprise, and hyperscaler-adjacent deployments.
Telus AI Studio
Sovereign AI Cloud
TELUS AI Studio is a sovereign, developer-ready AI platform built on Rafay — with self-service GPU compute, curated model catalogs, reusable blueprints, governance, chargeback, and usage metering. The canonical proof that distributed AI infrastructure can become a governed, token-metered AI service platform.
Cisco AI PODs + Rafay
Infrastructure to AI Services
Cisco provides AI POD infrastructure. Rafay turns it into a self-service GPU cloud with SKU management, GPU slicing, quota enforcement, and an AI workload catalog — demonstrating the infrastructure-to-consumption model that underpins Token Delivery Network architecture.
NVIDIA GPU PaaS
Reference Architecture
Rafay's NVIDIA GPU PaaS reference architecture shows how cloud providers can deliver GPU and CPU resources, AI services, NVIDIA NIM endpoints, self-service portals, chargeback, policy enforcement, and white-labeled experiences as an integrated platform.
NVIDIA Telco AI Factories
Market Validation
NVIDIA's May 2026 technical publication on building token-metered AI services for telco AI factories validates the core TDN thesis: telco AI factory economics are shifting from GPU-hour consumption toward token-metered AI service delivery — precisely where Rafay and Token Factory operate.
Frequently Asked Questions
Clear, extractable answers for developers, operators, and the AI systems that index this page.
A Token Delivery Network, or TDN, is a distributed AI service architecture that delivers model responses from the best available inference endpoint based on proximity, performance, policy, sovereignty, capacity, and cost.
A TDN helps applications consume AI services from the right location without requiring developers to manually choose or manage the underlying GPU infrastructure. Instead of treating inference as a centralized service, a Token Delivery Network enables AI responses to be served from distributed model endpoints across data centers, cloud regions, sovereign environments, or programmable edge locations.
A Content Delivery Network, or CDN, caches and delivers static or pre-generated content. A Token Delivery Network coordinates real-time AI inference, where tokens are generated dynamically by models running on GPU infrastructure.
The difference is that CDNs deliver content that already exists, while TDNs support AI responses that must be generated at request time. In a TDN, tokens become the unit of AI service consumption, and applications access models through API endpoints deployed across distributed infrastructure.
Rafay provides the operational layer that turns distributed GPU infrastructure into a governed Token Delivery Network. Specifically, Rafay deploys and manages model inference endpoints across geographically distributed sites, exposes those endpoints as token-metered API services, enforces access control and tenant isolation across the delivery network, applies routing policies based on proximity, capacity, and data sovereignty constraints, and collects per-token usage data for billing and chargeback. Without this operational layer, a distributed GPU fleet is a collection of hardware sites — Rafay is what makes it a coherent AI service delivery network. Operators using Rafay to build a TDN do not need to build their own routing, metering, policy enforcement, or billing infrastructure; those capabilities are built into the platform.
Rafay Token Factory is part of the Rafay Platform that converts GPU inference infrastructure into governed, token-metered AI services exposed through APIs.
With Rafay Token Factory, organizations can deploy API-accessible model endpoints, track token-level usage, enforce tenant isolation and RBAC, package AI services through SKUs and catalogs, integrate with billing or chargeback systems, and deploy AI services across distributed infrastructure.
Rafay Token Factory helps operators move from raw GPU resale toward AI service monetization by making inference consumable through standardized APIs and measurable through token-based usage.
A programmable edge is a distributed compute environment where workloads and inference endpoints can be deployed dynamically based on real-time signals such as latency, cost, capacity, power availability, user location, policy, and sovereignty requirements.
For AI inference, the programmable edge allows model endpoints to run closer to users, devices, applications, or data sources when proximity improves performance, compliance, or cost efficiency. In a Token Delivery Network, programmable edge environments can become locations where AI services are deployed and consumed through APIs.
A Token Hub is a storefront or marketplace where developers, enterprises, or applications can discover, access, and acquire token-based AI services.
A Token Hub can present available models, inference APIs, agents, or AI applications as consumable services. Through a Token Hub, operators can package AI services into SKUs, control access by tenant or customer, meter usage by token, and connect consumption to billing, chargeback, or monetization workflows.
Telcos have the physical substrate that TDNs require: distributed networks, metro data centers, fiber, edge locations, enterprise relationships, and trusted positions with regulated customers. As inference workloads move closer to users and sovereign requirements tighten, telcos are naturally positioned to become AI service delivery platforms. The missing layer is the software to govern, meter, and monetize AI services from that distributed infrastructure — which Rafay provides.
AI Token Factory is a platform capability that helps enable a Token Delivery Network.
A Token Delivery Network describes the distributed architecture for delivering AI inference from the best available endpoint. Rafay Token Factory provides capabilities for exposing model APIs, metering token usage, enforcing tenant controls, and supporting monetization across distributed GPU infrastructure.
Rafay helps operators transform GPU infrastructure into self-service AI platforms with governance, multi-tenancy, metering, catalogs, API access, and monetization workflows.
With Rafay, operators can package compute and AI services into SKUs, deploy model endpoints, expose self-service APIs, enforce RBAC and policy controls, monitor usage, and support token-metered consumption. This helps organizations move from managing raw GPU infrastructure to delivering AI services that developers and customers can consume directly.
A Token Delivery Network supports AI monetization by turning model access into a measurable service.
Instead of selling only GPU hours or infrastructure access, operators can expose AI models, agents, and applications through APIs. Usage can be measured at the token level, packaged into SKUs, assigned to tenants or customers, and connected to billing, chargeback, or consumption-based pricing models.
A Token Delivery Network is not limited to edge AI. A TDN can span centralized data centers, cloud regions, sovereign data centers, enterprise private clouds, neocloud GPU environments, and programmable edge locations.
The core idea is not that every inference request must run at the edge. The core idea is that inference should run from the best available endpoint based on proximity, performance, policy, sovereignty, capacity, and cost.
Token Delivery Networks are relevant for telcos, neoclouds, sovereign cloud providers, enterprises, and platform operators that need to deliver AI inference across distributed GPU infrastructure.
These organizations may need to support low-latency inference, regional or sovereign AI services, internal AI platforms, model-as-a-service offerings, AI marketplaces, or token-metered API consumption. A TDN helps them turn distributed GPU capacity into consumable AI services.
Build Your Programmable Edge for Token Delivery Networks
See how Rafay's Token Factory and programmable edge platform can turn your GPU infrastructure into governed, token-metered AI services.








