Quickly launch and easily manage production-grade Kubernetes clusters for AI and machine learning applications at scale with Rafay
SUNNYVALE, Calif., March 17, 2022 — Rafay Systems, the leading platform provider for Kubernetes Operations, announced the expansion of the industry’s only turnkey solution for operating Kubernetes clusters with GPU support at scale by adding powerful new metrics and dashboards for deeper visibility into GPU health and performance.
The Rafay Kubernetes Operations Platform (KOP) now features a fully integrated GPU Resource Dashboard that visualizes critical GPU metrics so developers and operations teams can seamlessly monitor, operate, and improve performance for GPU-based container workloads – all from one unified platform.
Kubernetes has rapidly become the preferred orchestration layer for enterprises that need to provision and operate GPU-enabled AI and machine learning applications in the cloud and at edge/remote locations.
According to 2022 Gartner® Emerging Technologies: Edge Technologies Offer Strong Area of Opportunity — Adopter Survey Findings*, “The primary objectives for respondent organizations investing in and adopting edge technologies are to improve employees productivity (41%) and automate business processes (39%). This aligns with existing Gartner research (see Emerging Technologies: Use-Case Patterns in Edge AI) that edge AI is being used to improve business processes, delivering automation and productivity gains that translate into measurable ROI, such as cost savings.”*
However, as enterprises rapidly increase the number of AI and machine learning workloads, they must address challenges such as visibility and monitoring to prevent significant delays in application deployment and the wasted costs of idle or underperforming GPUs in their clusters.
For example, a factory that increasingly relies upon real-time video detection applications powered by AI needs a standardized approach for cross-functional teams to manage the IT infrastructure and applications. The following challenges often result in operational fragility and a lack of repeatability that hinder productivity:
- Flawed or overly restrictive access and visibility for developers and operational personnel that need GPU metrics on demand to tune and optimize GPU workloads.
- The struggle of hiring or training a team of experts and spending months to develop, operate and maintain a customized monitoring infrastructure to scrape and centrally aggregate GPU metrics.
- The complexity of developing and maintaining an integration with corporate single sign-on (SSO) systems to provide role-based access to metrics and dashboards.
- Accounting for the organization’s GPU-enabled workloads that are developed and maintained by external entities (e.g., partners and ISVs). These entities also need visibility into GPU metrics to ensure the workloads are performing optimally.
Rafay KOP solves these challenges by providing enterprises and trusted external entities with a zero-touch experience for automated, centralized aggregation of critical GPU operational metrics across the entire fleet of Kubernetes clusters. Rafay’s Zero-Trust Access Service with SSO integration enables seamless role-based access, ensuring that only authorized developers, external partners and operational personnel can securely view GPU metrics from the console.
The new GPU Resource Dashboard, which streamlines the orchestration of GPU-based container workloads, is fully integrated into the Rafay KOP. Teams can also take advantage of many additional benefits of the SaaS platform today, including:
- AI/ML Application Deployment Automation: Rafay KOP allows organizations to avoid spending months or years developing a custom platform just to provision and manage GPU-enabled Kubernetes clusters for bare metal, virtualized and cloud environments.
- AI/ML Cluster and Workload Standardization and Consistency: Rafay KOP’s Cluster Blueprints standardize and govern cluster and workload configurations across a fleet. Enterprises can detect, be notified of, and/or block configuration changes to Kubernetes clusters.
Unleash the power of AI and machine learning applications at the edge with Rafay KOP: https://rafay.co/start/
*Gartner, “Emerging Technologies: Edge Technologies Offer Strong Area of Opportunity — Adopter Survey Findings,” by Danielle Casey, Vibha Chitkara, 1 March 2022.
GARTNER is a registered trademark and service mark of Gartner, Inc. and/or its affiliates in the U.S. and internationally and is used herein with permission. All rights reserved.
About Rafay Systems
Rafay Systems offers the industry’s first Kubernetes Operations Platform to help enterprises maximize the value of containerized applications that deliver today’s business innovation. With Rafay’s unified platform, teams can operate modern application infrastructure at scale across public clouds, data centers, and the edge. A full suite of turnkey services helps streamline deploying apps across multiple environments and delivers enterprise-grade control and governance to application deployment workflows. This breakthrough approach brings a new and much-needed operations mindset to the increasingly outdated Kubernetes Management market. With the Rafay Kubernetes Operations Platform, platform teams enjoy centralized visibility, management and automation across once-disparate processes and systems, resulting in the improved delivery of modern applications. Rafay’s growing customer roster includes clients such as Verizon, SonicWall and Guardant Health. For more information, please visit www.rafay.co.