GPU-enabled AI/ML at the Edge using Kubernetes

Over the last few years, we have seen an explosion in the number of Internet-connected smart devices, both in homes and in businesses. For example, Internet-connected video surveillance cameras are now common in retail stores, and factories and industrial complexes have standardized on video-based defect detection and classification for quality control.

The typical architecture so far has been to stream all of this data to a central location in the cloud and process it there with machine learning algorithms to drive recommendations, decisions, and outcomes. Unfortunately, this approach is not practical at scale because of three fundamental issues:

Issue 1: Congestion

Backhaul networks were not designed to handle this kind of throughput. As a result, these smart devices constantly struggle with congestion, which degrades quality of service and leads to a poor user experience.
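A rough back-of-the-envelope calculation shows how quickly a shared uplink can saturate. The sketch below is illustrative only: the camera count, per-stream bitrate, and uplink capacity are assumed numbers, not measurements from any real deployment.

```python
def aggregate_mbps(num_cameras: int, mbps_per_stream: float) -> float:
    """Total upstream bandwidth needed to stream every camera to the cloud."""
    return num_cameras * mbps_per_stream


def uplink_utilization(required_mbps: float, uplink_mbps: float) -> float:
    """Fraction of the uplink consumed; above 1.0 the link is oversubscribed."""
    return required_mbps / uplink_mbps


# Assumed values for illustration: 200 cameras at 4 Mbps each on a 500 Mbps uplink.
required = aggregate_mbps(num_cameras=200, mbps_per_stream=4.0)
utilization = uplink_utilization(required, uplink_mbps=500.0)
print(f"required={required:.0f} Mbps, utilization={utilization:.0%}")
# prints: required=800 Mbps, utilization=160%
```

With these assumed numbers the site needs more upstream bandwidth than the link can carry, so streams would have to be dropped, delayed, or degraded.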

Issue 2: Latency and Delays

If the network distance between the source and destination is high, the quality of service can be significantly degraded. For example, a 5-10 second delay before an anomaly is detected is too slow for applications that need to react in real time.

Issue 3: High Operating Costs

The central service needs constant resource increases to scale and handle the ever-increasing number of source devices. At scale, this can become a very expensive proposition for the provider. For example, does it always make sense to aggregate a large number of high-bandwidth video streams and put them through ML algorithms powered by a bank of GPUs?

Edge ML to the Rescue

A sensible and practical architectural alternative emerged a few years back that addresses all of these issues. With this approach, a good portion of the ML-based business logic is shifted from the central location closer to the source. This architectural shift is possible primarily because of the portability of containers (i.e., they can run anywhere) and the ability to orchestrate containers using Kubernetes clusters operating at the edge.

Performing machine learning closer to the device allows high-throughput data to be processed locally. This approach can dramatically improve performance and user experience because the latency-heavy round trip to the cloud is eliminated for decisions made locally, and the business risk due to backhaul congestion is largely avoided.

Instead of having to invest in a large bank of expensive, high-end GPUs at a central location, organizations can utilize lower-end, significantly cheaper GPUs at the edge for machine learning tasks.

Note that the edge devices can still send consolidated data to the cloud. However, the bulk of the data is processed locally, with the ability to respond in real time. This approach is a game-changer for these services, allowing them to deliver transformative and differentiated use cases for customers.

These next-generation, edge ML-based applications operate in the field: for example, in hospitals monitoring heart rate and glucose levels, and in assisted living facilities monitoring falls using cameras and motion sensors. These applications can literally save lives by analyzing the data at the edge and making real-time decisions.

Key Requirements

Adopting the public cloud required organizations to adopt and implement very different processes and tools. In a similar manner, implementing a distributed “ML at the edge” application deployment architecture will require teams to factor in a number of new requirements.

So, what will it take for your application to adopt this “edge ML” powered architecture?

For an environment like this, at steady state, you will be operating hundreds or thousands of remote Kubernetes clusters in isolated security domains (i.e., operating behind firewalls). Let us look at some of the foundational requirements that you need to address.

1. Automated and Remote Cluster Lifecycle Management

You need the means to remotely provision and manage the lifecycle of a fleet of Kubernetes clusters (e.g., scale, repair, upgrade) without requiring privileged access to these secure environments.

2. Environments and Separation of Duties

You need the means to group your clusters into logically isolated environments with role-based access control for administrator access.
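As a minimal sketch of the idea, assuming a simple in-memory model, the snippet below groups clusters into environments and checks whether a user's role in that environment permits an action. The environment names, cluster names, users, roles, and actions are all hypothetical; a real platform would back this with its own identity and RBAC systems.

```python
from typing import Dict, Set, Tuple

# Hypothetical environments, each owning a set of clusters.
ENVIRONMENTS: Dict[str, Set[str]] = {
    "retail-prod": {"store-001", "store-002"},
    "factory-dev": {"factory-007"},
}

# Hypothetical role assignments, scoped per (user, environment).
ROLE_BINDINGS: Dict[Tuple[str, str], str] = {
    ("alice", "retail-prod"): "admin",
    ("bob", "factory-dev"): "read-only",
}

# Hypothetical actions that each role may perform.
ROLE_PERMISSIONS: Dict[str, Set[str]] = {
    "admin": {"view", "upgrade", "delete"},
    "read-only": {"view"},
}


def is_allowed(user: str, cluster: str, action: str) -> bool:
    """Allow an action only if the user's role in the cluster's environment permits it."""
    for environment, clusters in ENVIRONMENTS.items():
        if cluster in clusters:
            role = ROLE_BINDINGS.get((user, environment))
            return role is not None and action in ROLE_PERMISSIONS[role]
    return False  # Unknown clusters are denied by default.


print(is_allowed("alice", "store-001", "upgrade"))   # True
print(is_allowed("bob", "factory-007", "upgrade"))   # False
print(is_allowed("alice", "factory-007", "view"))    # False
```

Because roles are bound to an environment rather than to individual clusters, adding a cluster to an environment automatically brings it under the same access rules.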

3. Standardization and Consistency

You need the means to keep the fleet of Kubernetes clusters standardized and consistent. This means that the entire fleet of clusters needs to have the correct and required versions of software add-ons deployed.

For GPU-enabled edge deployments, it is critical to deploy and use the correct version of software add-ons such as NVIDIA’s GPU Operator for Kubernetes.
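One way to reason about fleet consistency is to compare what each cluster reports against a single desired baseline. The sketch below assumes each cluster can report its installed add-on versions as a simple mapping; the add-on names, version strings, and cluster names are placeholders made up for illustration.

```python
from typing import Dict, Optional, Tuple

# Hypothetical fleet-wide baseline: add-on name -> required version.
DESIRED: Dict[str, str] = {"gpu-operator": "1.2.0", "metrics-agent": "0.9.0"}

# Hypothetical versions reported by each cluster.
REPORTED: Dict[str, Dict[str, str]] = {
    "store-001": {"gpu-operator": "1.2.0", "metrics-agent": "0.9.0"},
    "store-002": {"gpu-operator": "1.1.0", "metrics-agent": "0.9.0"},
    "factory-007": {"gpu-operator": "1.2.0"},
}

Drift = Dict[str, Dict[str, Tuple[str, Optional[str]]]]


def find_drift(desired: Dict[str, str], reported: Dict[str, Dict[str, str]]) -> Drift:
    """Return, per non-compliant cluster, each add-on as (wanted, installed or None)."""
    drift: Drift = {}
    for cluster, installed in reported.items():
        diffs = {
            addon: (wanted, installed.get(addon))
            for addon, wanted in desired.items()
            if installed.get(addon) != wanted
        }
        if diffs:
            drift[cluster] = diffs
    return drift


for cluster, diffs in find_drift(DESIRED, REPORTED).items():
    print(cluster, diffs)
```

Here store-002 is flagged for running an older GPU add-on and factory-007 for missing the metrics agent entirely, while store-001 matches the baseline and is not reported.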

4. Zero-Trust Access to Remote Clusters

Your administrators need the means to be able to securely access the remote Kubernetes clusters to debug and diagnose issues without requiring any form of inbound access to their network.

You will need an immutable audit trail of all kubectl activity by authorized users to demonstrate compliance.
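A common building block for a tamper-evident audit trail is a hash chain, in which every record's hash covers the previous record's hash. The sketch below is one possible illustration, not a description of how any particular product works; the record fields, user names, cluster name, and namespace are assumptions. Note that a hash chain only makes tampering detectable; actual immutability also needs write-once or access-controlled storage.

```python
import hashlib
import json
from typing import Dict, List

GENESIS = "0" * 64  # Placeholder "previous hash" for the first record.


def _digest(prev_hash: str, entry: Dict[str, str]) -> str:
    """Hash the entry together with the previous record's hash."""
    payload = json.dumps(entry, sort_keys=True).encode("utf-8")
    return hashlib.sha256(prev_hash.encode("utf-8") + payload).hexdigest()


def append_entry(log: List[dict], user: str, cluster: str, command: str) -> None:
    """Append a record whose hash chains to the record before it."""
    entry = {"user": user, "cluster": cluster, "command": command}
    prev_hash = log[-1]["hash"] if log else GENESIS
    log.append({"entry": entry, "hash": _digest(prev_hash, entry)})


def verify(log: List[dict]) -> bool:
    """Recompute the chain; any edited or reordered record breaks it."""
    prev_hash = GENESIS
    for record in log:
        if record["hash"] != _digest(prev_hash, record["entry"]):
            return False
        prev_hash = record["hash"]
    return True


audit_log: List[dict] = []
append_entry(audit_log, "alice", "store-001", "kubectl get pods -n vision")
append_entry(audit_log, "bob", "store-001", "kubectl logs deploy/detector -n vision")
print(verify(audit_log))  # True

# Simulate someone rewriting history after the fact.
audit_log[0]["entry"]["command"] = "kubectl delete ns vision"
print(verify(audit_log))  # False
```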

5. Multi Cluster App Deployments and Operations

Your application team needs the means to deploy and operate their containerized applications on the fleet of clusters without requiring inbound access via kubectl.

6. Centralized Policy Management

Your security team will need the means to implement and enforce security policies across the entire fleet of Kubernetes clusters. This is a critical requirement because these clusters are operating in a remote, hostile environment.
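To make the idea of a centrally defined policy concrete, here is a minimal sketch that checks a pod manifest, represented as a plain dictionary, against two example rules. The rules and pod names are assumptions chosen for illustration; hostNetwork and securityContext.privileged are standard Kubernetes pod spec fields. In practice, rules like these are typically enforced inside each cluster by an admission controller or policy engine rather than by a standalone script.

```python
from typing import Dict, List


def policy_violations(pod: Dict) -> List[str]:
    """Return human-readable violations for a pod manifest (empty list if compliant)."""
    violations: List[str] = []
    spec = pod.get("spec", {})

    # Example rule 1: pods must not share the node's network namespace.
    if spec.get("hostNetwork", False):
        violations.append("hostNetwork is enabled")

    # Example rule 2: no container may run in privileged mode.
    for container in spec.get("containers", []):
        security = container.get("securityContext") or {}
        if security.get("privileged", False):
            violations.append(f"container {container.get('name')} is privileged")

    return violations


# Hypothetical manifests for illustration.
compliant_pod = {"spec": {"containers": [{"name": "detector"}]}}
risky_pod = {
    "spec": {
        "hostNetwork": True,
        "containers": [{"name": "debug", "securityContext": {"privileged": True}}],
    }
}

print(policy_violations(compliant_pod))  # []
print(policy_violations(risky_pod))
```

Defining the rules once and evaluating them identically everywhere is what keeps a fleet of hundreds of clusters from drifting into inconsistent security postures.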

7. Visibility and Metrics

Your operations team needs critical operating metrics from the remote clusters to be aggregated centrally so that they can remotely monitor the health of the clusters. In addition to cluster metrics, for ML-based clusters powered by GPUs, they need GPU metrics to be aggregated centrally as well. All of this data needs to be accessible to them via easy-to-use dashboards.
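As a rough sketch of central aggregation, assume each edge cluster forwards per-GPU samples to a central service, which rolls them up into one summary row per cluster for a dashboard. The sample fields and values below are invented for illustration and do not correspond to any specific exporter's metric names.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List

# Hypothetical per-GPU samples received from edge clusters.
samples: List[dict] = [
    {"cluster": "store-001", "gpu": 0, "util_pct": 70.0, "temp_c": 64.0},
    {"cluster": "store-001", "gpu": 1, "util_pct": 90.0, "temp_c": 71.0},
    {"cluster": "factory-007", "gpu": 0, "util_pct": 35.0, "temp_c": 55.0},
]


def summarize(rows: List[dict]) -> Dict[str, dict]:
    """Roll per-GPU samples up into one summary per cluster."""
    by_cluster: Dict[str, List[dict]] = defaultdict(list)
    for row in rows:
        by_cluster[row["cluster"]].append(row)
    return {
        cluster: {
            "gpus": len(group),
            "avg_util_pct": mean(r["util_pct"] for r in group),
            "max_temp_c": max(r["temp_c"] for r in group),
        }
        for cluster, group in by_cluster.items()
    }


for cluster, summary in summarize(samples).items():
    print(cluster, summary)
```

A per-cluster summary like this is the kind of data a fleet dashboard could chart or alert on, for example to spot GPUs that are running hot or sitting idle.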

As you can see, there is a lot to consider, develop, operate, and maintain for an edge ML-powered architecture. To help, a number of organizations use a Kubernetes Operations Platform to fast-track implementation of this transformative architecture for their applications.

This article was originally published in JAXenter.
