Platform Engineering: what is it and why do you need it?
As a global DevOps community, we have pretty much nailed the agile process, but one thing we didn’t plan for was the cognitive load that all the software involved places on teams. Enter the Platform Engineer: a master of DevOps who understands exactly how to leverage a best-in-class software stack for optimal outcomes. They are the MacGyvers of DevOps. And simply put, you need one. In this article, we cover everything from the basics of being a Platform Engineer to the benefits of hiring one for your team.
Before diving deep into Platform engineering, let's talk about the history of DevOps in a few sentences.
DevOps, as we all know, promised us agility.
It promised the ability to deliver high-quality software autonomously, without depending on a human factor during delivery, while ensuring everything stays secure and compliant.
We did a lot to accomplish DevOps and become agile.
Monoliths are now broken into tiny pieces that interplay (Microservices, Microfrontends, Mono-Repo, Multi-Repo).
We use IaC to manage cloud resources and enjoy the benefits of git, and we rely on third-party software of all kinds (SaaS, OSS, Cloud Services, etc.) to write less code and focus on business logic.
Production & Staging are no longer the only two types of environments; today we have dozens of them (QA, Security, Pre-Prod, Dev-Env, Single/Multi-Tenant) to shift-left everything away from production as much as possible.
So much work has been put in over the past years to reach what we know today as agile.
While we did amazing things, we reached a point where everything became very complex. Don't get me wrong; these transformations are crucial to continued innovation.
But, when we think about the developers who need to interact with these new tools, technologies, methodologies & processes in their everyday lives, it's easy to see the problem - Cognitive Load.
Platform Engineers to the rescue. In this article, we will go through the role of a platform engineer, how they come into play within the DevOps cycle, the benefits of Platform engineers, and more.
To demonstrate the cognitive load created on the developers, let's think of a few use cases developers might confront in their everyday job.
When a developer needs to implement a new feature, they need to know a few things beforehand – which microservice APIs do they need to interact with? Who owns those microservices? How can they add a new cloud resource with Terraform or Pulumi, how would they roll out a new version, and what environments are involved in the process?
In a regular organization with DevOps innovation and advancement but no Platform team, the process of finding this information can be cumbersome.
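To make the questions above concrete, here is a minimal sketch of the kind of lookup a platform team could offer so that "who owns this service?" and "which service exposes this API?" become one call instead of a hunt. Everything in it is an assumption for illustration: the `Service` and `ServiceCatalog` names, the fields, and the sample services are hypothetical and not taken from any specific product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Service:
    """One catalog entry; all field names are hypothetical examples."""
    name: str
    owner_team: str
    apis: tuple[str, ...] = ()
    environments: tuple[str, ...] = ()


class ServiceCatalog:
    """In-memory stand-in for a catalog a platform team would maintain."""

    def __init__(self) -> None:
        self._services: dict[str, Service] = {}

    def register(self, service: Service) -> None:
        self._services[service.name] = service

    def owner_of(self, service_name: str) -> str:
        # Raises KeyError if the service was never registered.
        return self._services[service_name].owner_team

    def services_exposing(self, api: str) -> list[str]:
        # Names of all services that list the given API, sorted for stable output.
        return sorted(s.name for s in self._services.values() if api in s.apis)


catalog = ServiceCatalog()
catalog.register(Service("payments", "team-billing",
                         ("POST /charge",), ("dev", "staging", "production")))
catalog.register(Service("invoices", "team-billing",
                         ("GET /invoice",), ("dev", "production")))

print(catalog.owner_of("payments"))               # team-billing
print(catalog.services_exposing("GET /invoice"))  # ['invoices']
```

A real catalog would be backed by a database or by metadata files kept next to each service, but the developer-facing questions stay the same.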
The Internal Developer Platform (IDP) was introduced to reduce this cognitive load. The IDP is maintained and managed by the Platform team and is used by the developers within the company.
The IDP is a self-service layer between the developers and the underlying infrastructure, technologies, tools & processes. The platform engineer is responsible for ensuring this layer puts the developers on the Golden Path to get what they want.
Platform engineers must master DevOps while deeply understanding the developer's needs to make sure interacting with the IDP feels natural to the developers.
Platform Engineers in the DevOps Cycle
In the diagram below, we can see a layered representation of DevOps within every modern company.
As "You build it, you own it" is a common practice adopted by many tech companies and part of every developer's philosophy, developers need high familiarity with the different DevOps layers to bring software from zero (Code) to hero (Production).
Let's explore the different use cases where developers "Interact with DevOps."
During development, developers interact with many moving parts besides the IDE.
A typical developer will be highly familiar with the following aspects of software:
Cloud resources & IaC (Add, Modify, Delete)
Microservices (Ownership, Structure, Metadata)
3rd party SaaS (Infra, DevTools, Observability, Process Management)
For example, a developer may need to add a token (of a 3rd party SaaS) to a Microservice.
This simple use case touches several components within the DevOps landscape; providing a one-click experience that abstracts the DevOps work away from the developer requires strong DevOps skills.
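As a rough illustration of what such a one-click action might do behind the scenes, the sketch below stores the token in a secret store, wires only a reference to it into the service's environment, and records who asked for it. The `Platform` class, the `secret://` reference format, and the `acme` SaaS name are all hypothetical; a real platform would call an actual secrets manager and deployment tooling instead of Python dictionaries.

```python
from dataclasses import dataclass, field


@dataclass
class Platform:
    """Hypothetical self-service layer; dictionaries stand in for real backends."""
    secrets: dict[str, str] = field(default_factory=dict)
    service_env: dict[str, dict[str, str]] = field(default_factory=dict)
    audit_log: list[str] = field(default_factory=list)

    def add_saas_token(self, service: str, saas: str, token: str,
                       requested_by: str) -> str:
        """Store the token and expose it to the service by reference only."""
        if service not in self.service_env:
            raise KeyError(f"unknown service: {service}")
        secret_ref = f"secret://{service}/{saas}"
        self.secrets[secret_ref] = token               # raw value lives only here
        env_var = f"{saas.upper()}_TOKEN"
        self.service_env[service][env_var] = secret_ref
        self.audit_log.append(f"{requested_by} added {env_var} to {service}")
        return env_var


platform = Platform(service_env={"payments": {}})
env_var = platform.add_saas_token("payments", "acme", "s3cr3t", requested_by="dana")

print(env_var)                          # ACME_TOKEN
print(platform.service_env["payments"]) # {'ACME_TOKEN': 'secret://payments/acme'}
```

The developer sees a single action; the validation, secret handling, and audit trail are the parts the platform team owns.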
Ok, the code is ready to ship. Now what?
DevOps worked hard to ensure every line of code that reaches production is bulletproof. They did it by building a robust CI/CD pipeline to provide a high-quality supply chain of code delivery.
Platform engineers must ensure every developer has an easy way to initiate the code delivery process, troubleshoot the pipeline if something goes wrong, and operate the process independently.
This covers everything from merging a PR to the main branch, through the build process, unit tests, on-demand DevEnvs, Regression testing, Canary, and Feature flags, up to a 100% production rollout.
As the software delivery process becomes, well, a jungle, Platform engineering teams will need to give developers an easy way to control the delivery pipeline while making sure they do not step on a mine.
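One way to picture "control without stepping on a mine" is a pipeline runner that executes the stages in order, stops at the first failure, and tells the developer exactly where it stopped. The sketch below is a simplified assumption, not a real CI/CD system: the `run_pipeline` function and the stage callables are hypothetical placeholders for real build, test, and rollout steps.

```python
from typing import Callable

# A stage is a display name plus a callable that returns True on success.
Stage = tuple[str, Callable[[], bool]]


def run_pipeline(stages: list[Stage]) -> tuple[bool, list[str]]:
    """Run stages in order and stop at the first failing one.

    Returns (succeeded, trace), where trace lists the completed stages and,
    on failure, ends with the failing stage marked "(failed)".
    """
    trace: list[str] = []
    for name, step in stages:
        if not step():
            trace.append(f"{name} (failed)")
            return False, trace
        trace.append(name)
    return True, trace


# Placeholder stages; the regression stage fails on purpose.
stages: list[Stage] = [
    ("build", lambda: True),
    ("unit tests", lambda: True),
    ("regression tests", lambda: False),
    ("canary", lambda: True),
    ("100% production", lambda: True),
]

ok, trace = run_pipeline(stages)
print(ok)     # False
print(trace)  # ['build', 'unit tests', 'regression tests (failed)']
```

Because later stages never run after a failure, a broken regression suite cannot reach canary or production, and the trace gives the developer a starting point for troubleshooting.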
Service Maturity, Quality & Security
Delivering high-quality software today is not an easy task. A developer must take care of many aspects, from tests (of all kinds) to security, misconfigurations, compliance aspects, operational aspects, and much more.
Each of these "Maturity aspects" relies on dozens or even hundreds of different tools & technologies.
A few small examples: using Snyk to scan OSS packages, running a Python linter to validate code, regression tests, secrets detection within the code, requiring a Jira ticket number in the PR title, end-to-end tests, etc.
Platform teams should embed the Maturity Readiness model of services in a consolidated way, thus reducing the fragmentation of test results and giving the developer a Score for their service maturity, including the breakdown of the Score result.
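A consolidated score with a breakdown can be as simple as a weighted pass rate over the individual checks. The sketch below assumes each tool's result has already been reduced to pass or fail; the `Check` class, the weights, and the sample results are hypothetical and only illustrate the idea.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Check:
    """Result of one maturity check; weight expresses relative importance."""
    name: str
    passed: bool
    weight: int = 1


def maturity_score(checks: list[Check]) -> tuple[int, dict[str, str]]:
    """Return a 0-100 score (weighted pass rate) and a per-check breakdown."""
    total = sum(c.weight for c in checks)
    if total == 0:
        return 0, {}
    earned = sum(c.weight for c in checks if c.passed)
    breakdown = {c.name: "pass" if c.passed else "fail" for c in checks}
    return round(100 * earned / total), breakdown


# Hypothetical results for one service.
checks = [
    Check("OSS dependency scan", passed=True, weight=3),
    Check("lint", passed=True, weight=1),
    Check("secrets detection", passed=False, weight=3),
    Check("ticket number in PR title", passed=True, weight=1),
    Check("end-to-end tests", passed=True, weight=2),
]

score, breakdown = maturity_score(checks)
print(score)                           # 70
print(breakdown["secrets detection"])  # fail
```

The single number gives a quick signal, while the breakdown tells the developer which check to fix first.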
Code is not the only thing developers do in their everyday job.
On-Call is a common practice in many companies. When you are on call, you need to be highly familiar with many aspects of the software being developed and maintained.
You need to know the service dependencies, troubleshoot services that are not within your responsibility, understand the underlying infrastructure well enough to identify issues not caused by the application, and master all the different DevTools available to you to troubleshoot elegantly (Observability tools, Production Debugging, Exception Management tools, Incident Response).
As software got fragmented, troubleshooting tools also got fragmented.
Platform teams need to provide developers with a unified DevPortal that helps them troubleshoot intelligently and focus on the issue itself, rather than wandering in the cumbersome forest of software, thus reducing the cognitive load around the DevOnCall duty.
Why are Platform Engineers Important?
Speed up new developer onboarding
Platform engineers are looking to provide a natural and intuitive self-service experience for developers. This aim is the same whether the dev is a company veteran or a new starter.
The focus on providing a visual representation of the entire development lifecycle means that onboarding is a faster, easier, and smoother process. There’s dramatically less time before the new dev can confidently perform their “first commit” to production.
Through the developer portal, platform teams are looking to make the process of managing the software development life cycle as smooth and easy as possible. Enhancing DevEx and reducing friction is critical to attracting and retaining software engineering talent.
Often, a key focus for the platform team is the creation of a central hub for developers to pool their knowledge. By gathering together and sharing source code, services, APIs, and other existing assets for reuse, developers can create a collaborative culture, learn more efficiently, and ultimately be more productive.
Resolve incidents faster
Platform engineering teams provide continual visibility over services and their owners. This visibility allows SREs, operations, and product teams to see each service’s digital footprint. Connecting the relevant teams and individuals faster reduces incident resolution times and, with the right integrations, allows engineering teams to take end-to-end ownership.
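To show how visibility over services and owners can shorten an incident, here is a small sketch that walks a dependency map from the failing service and lists every downstream dependency with its owning team. The `dependencies_with_owners` function, the dependency map, and the team names are hypothetical; a real portal would pull this data from its catalog.

```python
from collections import deque


def dependencies_with_owners(service: str,
                             depends_on: dict[str, list[str]],
                             owners: dict[str, str]) -> list[tuple[str, str]]:
    """Breadth-first walk over dependencies, nearest ones first.

    Returns (dependency, owner) pairs; unknown owners are reported as "unknown".
    """
    seen = {service}
    queue = deque([service])
    result: list[tuple[str, str]] = []
    while queue:
        current = queue.popleft()
        for dep in depends_on.get(current, []):
            if dep not in seen:  # guards against cycles and duplicates
                seen.add(dep)
                result.append((dep, owners.get(dep, "unknown")))
                queue.append(dep)
    return result


# Hypothetical dependency map and ownership data.
depends_on = {
    "checkout": ["payments", "cart"],
    "payments": ["ledger"],
    "cart": [],
}
owners = {"payments": "team-billing", "cart": "team-shop", "ledger": "team-finance"}

contacts = dependencies_with_owners("checkout", depends_on, owners)
print(contacts)
# [('payments', 'team-billing'), ('cart', 'team-shop'), ('ledger', 'team-finance')]
```

During an incident on `checkout`, the on-call developer immediately knows which teams to involve, in order of how close their services are to the failure.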
Drive adaptive governance
There’s always been a clash between the developer’s desire for autonomy and agility, and the business necessity for governance and control. The sweet spot in the middle is an adaptive governance model. The platform engineering team can help achieve this approach by codifying the necessary security, cost, and compliance policies for cloud infrastructure management.
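Codifying policies usually means expressing each rule as a small, testable function that is evaluated before a resource is created. The sketch below is one possible shape under stated assumptions: the `ResourceRequest` fields, the policy functions, and the cost limit are hypothetical examples, not the API of any real policy engine.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class ResourceRequest:
    """A developer's request for a cloud resource; all fields are hypothetical."""
    name: str
    encrypted: bool
    estimated_monthly_cost_usd: float
    tags: dict[str, str]


# A policy returns a violation message, or None when the request complies.
Policy = Callable[[ResourceRequest], Optional[str]]


def require_encryption(request: ResourceRequest) -> Optional[str]:
    return None if request.encrypted else "encryption at rest is required"


def max_monthly_cost(limit_usd: float) -> Policy:
    def policy(request: ResourceRequest) -> Optional[str]:
        if request.estimated_monthly_cost_usd > limit_usd:
            return f"estimated cost exceeds {limit_usd:.0f} USD per month"
        return None
    return policy


def require_tag(tag: str) -> Policy:
    def policy(request: ResourceRequest) -> Optional[str]:
        return None if tag in request.tags else f"missing required tag: {tag}"
    return policy


def evaluate(request: ResourceRequest, policies: list[Policy]) -> list[str]:
    """Run every policy and collect the violations, in policy order."""
    results = (policy(request) for policy in policies)
    return [message for message in results if message is not None]


policies = [require_encryption, max_monthly_cost(500.0), require_tag("owner")]
request = ResourceRequest("reports-bucket", encrypted=False,
                          estimated_monthly_cost_usd=120.0,
                          tags={"owner": "team-data"})

violations = evaluate(request, policies)
print(violations)  # ['encryption at rest is required']
```

Developers keep their autonomy because a compliant request passes without waiting for a review, while the business keeps control because the rules live in one reviewed, versioned place.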
Platform Engineers vs. Site Reliability Engineers
If we look closely, we see that platform engineers and SREs are pretty different. Platform engineering is about building a platform to support software development through its entire lifecycle, making the experience seamless for developers. But a DevPortal (like Backstage) also includes operational capabilities (metric systems, runbook management systems, alerting systems, etc.), so it also serves SREs.
The Development team writes the business logic for the software. At the same time, the SRE adds on operational automation leveraging the primitives exposed by the platform team, improving precisely what metrics are collected, how they are alerted on, what actions are automatically taken when an alert is triggered, and so forth.
In the end, a platform engineering team and SREs are categorically different: the former is a layer of the stack while the latter is a role. Yet both are necessary, and it is the two teams working together that enables success in your software modernization journey.
Platform engineers are a cardinal building block for every innovative company. So in a way, platform engineers are an enabler for DevOps innovation; they allow the business to be cutting edge while not compromising on the initial promise of DevOps, if you recall – agility.