The Kubernetes Current Blog

How Platform Teams Can Return Autonomy to Developers

DevOps was created to give developers the autonomy they needed in order to iterate quickly and create great software. But it hasn’t scaled to meet the demands of today’s businesses.

Combining development and operations into many small, agile teams led to transformational rates of iteration that turbocharged business innovation. But it also duplicated skills across those teams. As these unintended effects mounted, they produced wasted cycles and cognitive load that delayed development rather than accelerating it.

Nowhere was this more evident than in the management of infrastructure. In order to get their projects off the ground, DevOps teams needed to acquire their own cloud infrastructure, manage it, secure it, debug it, fix problems with it, and constantly justify its very existence when they really should have been designing, coding, testing, iterating, and deploying great software.

Today, it’s not just developers who depend on quickly provisioning and efficiently managing cloud infrastructure – it’s also data scientists testing models, researchers and engineers running simulations, and more. The overhead associated with the ongoing operation of their cloud infrastructure robs them of the autonomy they need in order to use it to do innovative things. This, in turn, blocks them from moving the ball forward for the business.

Platform teams are meant to centralize infrastructure management practices in order to bring that autonomy back – it’s why many leading IT organizations are creating them. In a recent report, “Top Strategic Technology Trends for 2024: Platform Engineering,” Gartner said that platform teams exist to build “curated self-service platforms that improve developer experience, delivery speed, process agility and business value.” With the right tools, platform engineers can build “paved roads” that enable anyone who depends on rapid access to cloud infrastructure “as a service” to move faster safely.

Let’s explore how that looks in practice.

Cognitive Overload, Explained

Every team is different, but surveys indicate that developers and other cloud users can spend anywhere between 20% and 60% of their time managing infrastructure. That is time spent learning and running infrastructure operations rather than on their core work, whatever it may be.

Siloed teams tend to create siloed infrastructure, each with its own configuration. This wastes resources (aka $$$) because utilization rates are low. These teams also have to contend with version control, compliance, or security issues that can impact performance and availability. And since development and test infrastructure rarely matches production environments, the differences between the two often cause long delays when moving anything from the former to the latter.

Organizations sometimes try to reduce or prevent these issues by centralizing cloud infrastructure management in a single operations team. But if provisioning and management processes aren’t designed with self-service workflows for developers in mind, developers get frustrated as they file tickets and wait for overworked ops engineers to wade through long queues of requests. This visual illustrates the pain that can be introduced by these kinds of processes.

Steps needed to provision developer infrastructure via a classical approach

The Way Forward

Everyone is clearly better off when the teams that need rapid access to dependable, proven, hardened cloud infrastructure can get it – for all their development, testing, or production needs. Here is what platform teams need to build in order to get there:

  • Turnkey self-service workflows with built-in automation and governance.
    Developers should get one-click access to the cloud infrastructure they need, when they need it, for whatever use case drives that need. The workflows must automate any governance checks prior to provisioning, so the business is assured that the infrastructure is safe, up to date, and within budget.
  • Rapid Kubernetes provisioning capabilities with cluster- or namespace-as-a-service.
    Every modernizing IT organization is leaning into containers, and Gartner predicts that 90% of ALL global organizations will be running containerized applications in production by 2027. And since Kubernetes is rapidly becoming the default choice for container deployments, making K8s infrastructure available through self-service processes is critical.
  • Rapid provisioning of environments, with guaranteed safety.
    Cloud infrastructure isn’t just containers. Developers and other cloud users often need other services – databases, cloud storage, cache services, and more – to be connected to their containers in order to create viable landing zones for their applications. These as-a-service provisioning workflows must also be templatized and automated.
  • Tools to monitor and troubleshoot their applications’ performance.
    Once infrastructure is deployed, it must be monitored. Platform teams should provide optimized tools for this purpose that developers want to use, so that they don’t feel the need to use their own decentralized solutions. Engineers tasked with operational duties also benefit from centralized monitoring for all cloud infrastructure in use.
  • Tools to estimate cost of requested infrastructure, before it is provisioned and over time.
    Don’t lose sight of costs! Budgets aren’t getting any bigger, and no team should be wondering how their cloud expenses are tracking against their budget. Giving developers, operations, and executives greater transparency into where financial resources are going gives everyone confidence that those resources are being spent intelligently.
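
To make the first and last items in the list above a little more concrete, here is a minimal sketch of a pre-provisioning check that estimates the cost of a request and runs governance rules before anything is created. It is only an illustration under stated assumptions: the `ClusterRequest` type, the node-size catalog, the prices, and the approved Kubernetes versions are all hypothetical values, not taken from any real platform or from Rafay's product.

```python
from dataclasses import dataclass

# Hypothetical catalog: price per node per hour, in US cents.
HOURLY_RATE_CENTS = {"small": 10, "medium": 20, "large": 40}
# Hypothetical list of versions the platform team has approved.
APPROVED_K8S_VERSIONS = {"1.29", "1.30"}
# Rough number of hours in a month, used for the estimate.
HOURS_PER_MONTH = 730


@dataclass(frozen=True)
class ClusterRequest:
    """A developer's self-service request (hypothetical shape)."""
    team: str
    node_size: str
    node_count: int
    k8s_version: str


def estimate_monthly_cost_cents(req: ClusterRequest) -> int:
    """Estimate monthly cost from a flat per-node hourly rate."""
    return HOURLY_RATE_CENTS[req.node_size] * req.node_count * HOURS_PER_MONTH


def governance_violations(req: ClusterRequest, remaining_budget_cents: int) -> list[str]:
    """Return a list of reasons to reject the request; empty means approved."""
    violations = []
    if req.k8s_version not in APPROVED_K8S_VERSIONS:
        violations.append(f"Kubernetes version {req.k8s_version} is not approved")
    if req.node_size not in HOURLY_RATE_CENTS:
        violations.append(f"node size {req.node_size!r} is not in the catalog")
        return violations  # cost cannot be estimated without a known size
    cost = estimate_monthly_cost_cents(req)
    if cost > remaining_budget_cents:
        violations.append(
            f"estimated {cost} cents/month exceeds remaining budget "
            f"of {remaining_budget_cents} cents"
        )
    return violations


if __name__ == "__main__":
    request = ClusterRequest("ml-research", "medium", 3, "1.30")
    problems = governance_violations(request, remaining_budget_cents=50_000)
    print("approved" if not problems else problems)
```

In a real workflow, the catalog, budgets, and approved versions would come from whatever systems the platform team already maintains, and an approved request would be handed to the provisioning automation rather than printed.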

Building these capabilities isn’t easy – in fact, it takes a LOT of work. But while every emerging platform team needs to deliver these capabilities to their constituents, none of them should do it alone or start from scratch. Rafay has built a Cloud Automation Platform specifically for platform teams, so they can build these capabilities out for their organizations in order to give developers and other cloud users the autonomy they need to move faster safely.

How Businesses Benefit

“We were able to fast-track our drug discovery … we were able to discover new drugs and get it to the market in a much faster pace.”

Ra Singh, Head of Cloud & DataBase DevOps @ Regeneron Pharmaceuticals

When developers, data scientists, researchers, and other key cloud users are provided with the autonomy to use the cloud infrastructure they need, when they need it, a virtuous cycle is created that leads to significant downstream benefits for the business. No longer saddled with learning new technologies, managing the health of their compute resources, or waiting for beleaguered ops teams to provision their infrastructure, cloud users are free to focus on the projects at hand and iterate quickly, with the confidence that the foundation of their work is built on solid, proven, and continuously maintained technologies.

When these technologies can be accessed at will through self-service processes, the iteration rate naturally increases. This, in turn, leads to greater levels of experimentation, managed failure risk, greater business agility, and fresh innovations.

Automation and pre-approved templates can dramatically reduce wait times

Through its Cloud Automation Platform, Rafay has helped customers like Verizon and Genentech realize the benefits of greater autonomy, along with the control and efficiency that platform and operations teams need in order to deliver that autonomy safely. Companies have used Rafay to build self-service workflows that have increased deployment rates by a factor of 4, while simultaneously reducing compute use to keep costs down.

If you’re interested in learning more about how Rafay Systems can help you provide this kind of autonomy to the teams in your organization that depend on cloud infrastructure the most, please reach out!
