Platform engineering is becoming a core focus, tasked with ensuring a good developer experience to grow productivity and retention. To do this, internal developer portals have become a basic requirement. This article covers the whys and hows of internal developer portals and how they affect developer experience and productivity.
1. Introduction: platform engineering
Modern application development means greater velocity but also complexity, with microservices, containers and different DevOps tools and cloud providers. Today we are at a point where even DevOps leaders are challenged in managing engineering operations.
“We have GitHub, Jenkins, a Jfrog Artifactory and DataDog as the base tools, a variety of build and security tools such as blackduck or snyk, as well as multiple cloud vendors and many more. You can go through our corridors with a shopping cart and load DevOps tools. Developers can’t deal with all these tools, no one can” (Shlomi Benita, CyberArk).
This has led to an increased focus on the developer experience and platform engineering.
According to Gartner, “platform engineering implements reusable tools and self-service capabilities with automated infrastructure operations, improving the developer experience and productivity”. It relies on reusable, configurable application components and services, and it benefits users through standardized tools, components and automated processes.
This is part of a realization that a poor developer experience and a developer-DevOps interaction based on “ticket ops” aren’t sustainable. At some point, the price of slower deployments, lower productivity and developer frustration becomes too high.
“One of the greatest selling points we used to get the formal management OK to work on an internal developer platform was plotting lead time to change on a graph over time. We showed what lead time to change would be without the IDP and what it would be with an IDP. Over time, we presented our estimation that lead time to change would grow worse without an IDP. This convinced management” (Shlomi Benita, CyberArk).
Or, as stated by Paul Delory, VP Analyst at Gartner: “Platform engineering emerged in response to the increasing complexity of modern software architectures. Today, non-expert end users are often asked to operate an assembly of complicated arcane services. To help end users, and reduce friction for the valuable work they do, forward-thinking companies have begun to build operating platforms that sit between the end user and the backing services on which they rely”.
2. Internal developer portals as drivers of platform engineering
The first step towards a platform engineering approach is the Internal Developer Portal. As Gartner explains: “IDPs provide a curated set of tools, capabilities and processes. They are selected by subject matter experts and packaged for easy consumption by development teams. The goal is a frictionless, self-service developer experience that offers the right capabilities to enable developers and others to produce valuable software with as little overhead as possible. The platform should increase developer productivity, along with reducing the cognitive load. The platform should include everything development teams need and present it in whatever manner fits best with the team’s preferred workflow”.
Another argument for platform engineering is that some developers lack the basic knowledge required to thrive in a “you build it, you own it” world. This is the case, for instance, when an organization moves away from monolithic or on-premises products, which isn’t easy for developers. In this case, internal developer portals allow developers to consume DevOps resources in a self-serve mode, with a product-like experience. They also provide developers with the guardrails they need.
IDPs also greatly reduce the day-to-day load on both DevOps and developer teams, by offering a layer of abstraction that can reduce developer cognitive load. When everything is abstracted away, all developers need to care about is coding, git and using the self-service functions in the IDP (which would then interface with whatever tools DevOps are using, from K8S to git, Terraform, Jenkins, etc.).
3. What is an internal developer portal, and who needs it?
According to Puppet’s state of DevOps report, internal developer platforms are one of the three things that set mature engineering organizations apart (the other two are integrated security and automated change management).
What are mature engineering organizations? Depending on the organization you’re in, that may be a thorny question. Let’s consider engineering team size instead, and see how the size of the organization affects the developer experience, developer productivity and the ability of developers and DevOps to work closely together. Consider the following diagram, which charts engineering org size against the need for developer self-service tools:
The larger the organization, the more likely it is to require an internal developer portal (although maturity doesn’t necessarily equal size…):
- With 1-30 employees, everything works. If the developers know ops, they can do ops too, and if they don’t, DevOps can easily serve them. In any case, most requests can be answered by the watercooler and access to tribal knowledge is a desktop away.
- Organizations that are up to 100 employees can usually make do with GitOps, the direct use of devtools and more, as well as a healthy dose of ticketops. It isn’t optimal but it can still work. You can read about this here.
- At some point, organizations use CI tools, such as Jenkins, to allow developers self-service, but these solutions tend to break when used extensively.
- Above 1500 developers? Any solution other than an internal developer platform, whether built or bought, won’t work.
Compliance is also a big driver of complexity, since SOC2, for example, requires proper permission management for infrastructure. This may result in even more tickets.
But the reasons for an internal developer portal go beyond sheer organization size. There are qualitative and quantitative benefits to an internal developer portal.
“To succeed you need an IDP. We’re a large company. Not all developers “know” cloud. If we tell them “here’s the cloud, with all its configurations, work with the cloud’s SDK, use these APIs” they would not know what to do. So it’s not just education, it’s also about having a platform and it’s the guardrails in it” (Guy Brodny, CyberArk).
4. The five principles for a good platform engineering approach
Internal developer portals should let developers self-serve their infrastructure needs by consuming DevOps assets. Developers should also be able to scaffold, deploy, browse, operate and access all the services that should be available to them.
Here are five principles to follow:
#1 Product-like and decoupled
Within the IDP, developers should get a product-like experience that is UI-based and simple to consume. Forms should be clear and simple to use. The tools within the IDP should be decoupled from the infrastructure so that infrastructure can be changed without changing the developer experience.
#2 Compliant and secure by design
Working with platform engineering tools should ensure and support compliance, testing, quality and security. This includes role-based access control as well as the ability to add manual approvals when necessary.
#3 A central place for everything developers need
Developers should access a central place which contains docs, tools, standards, templates, infrastructure and cloud resources. Views should reflect the live state of the different DevOps assets managed within the organization and be customizable per team or developer. All developers should have access to the system, based on their role, since it contains the DevOps state of the engineering world.
#4 Allow self service in the broadest way possible
Self-service should go beyond microservice scaffolding and provide for anything developers need to do: provision, terminate and perform day 2 operations on any asset exposed (microservice or not) in the software catalog, within the policies and guardrails you’ve set.
#5 Built for machines as well as humans
Machines should have IDP interfaces similar to those presented to humans, so that they can trigger DevOps flows and access the relevant software catalog data for DevOps automations and pipelines.
5. What makes an internal developer portal?
Internal developer portals vary across engineering organizations - depending on the areas where developers need abstraction, the code base, the demands from DevOps teams, the developer backgrounds (are they accustomed to cloud-based microservices, or were they working on on-premises monoliths?), as well as the engineering culture and processes, and the tools in use. Yet they have many common denominators.
An initial definition is a platform that offers an abstraction layer (commonly called a software catalog) and an ability for developers to perform self-service actions against the assets represented in the catalog.
The software catalog is there to provide simple answers to complex DevOps-related questions, questions that usually require developers to wander around dozens of different tools and require deep tribal knowledge.
Self-service is there to reduce the burden for DevOps and developer teams. The internal developer platform makes tools, services, and knowledge available to all, freeing developers to code. The IDP should have role-based access control to provide proper access to data according to the engineer’s role.
Here are the three main parts:
5.1 The software catalog
The software catalog should be much more than a microservice catalog, since the complexity resides in the fact that there are multiple entities within the infrastructure.
The software catalog is a visibility layer to the infrastructure and the software deployed over it. An ideal software catalog should show the entire ecosystem surrounding the SDLC: CI/CD flows, dev environments, pipelines, deployments and anything cloud.
The main question that the software catalog needs to answer is “what is deployed where” - and the structure of the catalog depends on what is needed to answer that question, which varies across organizations and can even change for a given organization over time.
The software catalog should help engineering teams quickly answer the following questions:
- What’s the current running version in production for a given service?
- Who owns this microservice, and which API routes does it expose?
- Which Kubernetes clusters exist in which cloud environment?
- Why did this deploy fail?
- Who is on-call?
- Is this version production-ready?
- What are the DORA metrics for a given team, service or developer?
Since answering these questions is also a function of “what is deployed where” and the relative complexity of the “what” and the “where”, there is no ideal structure for a software catalog.
Let’s look at a common case, which would require a unified view of (1) Service, (2) Environment, (3) Deployed Service and (4) Deployment. Taking these 4 elements, in this use case, will provide a clear understanding of each service’s maturity and readiness and a detailed view of every service’s lifecycle from the first commit to many deployments running across different environments. Let’s look at the definitions for this case.
A service can be a microservice, software monolith or any other software architecture.
An environment is any production, staging, QA, DevEnv, on-demand or any other environment type.
A deployed service is a representation of the current “live” version of a service running in a specific environment. It will include references to the service, environment and deployment, as well as real-time information such as status, uptime and any other relevant metadata.
A deployment could be described as an object representing a CD job. It includes the version of the deployed service and a link to the job itself. Unlike other objects, the deployment is an immutable item in the software catalog. It is important to keep it immutable to ensure the catalog remains a consistent source of truth.
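The four definitions above can be pictured as a small data model. The following is a minimal sketch only: the class layout, the field names (such as owner, kind and job_url), the sample values and the running_version helper are illustrative assumptions rather than the schema of any particular portal. The deployment is modeled as a frozen dataclass to reflect the requirement that it stays immutable.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Service:
    name: str
    owner: str  # hypothetical field: the owning team


@dataclass
class Environment:
    name: str
    kind: str  # e.g. "production", "staging", "qa"


@dataclass(frozen=True)
class Deployment:
    """Immutable record of a single CD job run."""
    service: str
    environment: str
    version: str
    job_url: str  # link to the CD job (placeholder URLs below)


@dataclass
class DeployedService:
    """The current live state of one service in one environment."""
    service: Service
    environment: Environment
    deployment: Deployment
    status: str = "healthy"


def running_version(catalog: List[DeployedService],
                    service: str, environment: str) -> Optional[str]:
    """Answer "what is deployed where" for one service and environment."""
    for deployed in catalog:
        if deployed.service.name == service and deployed.environment.name == environment:
            return deployed.deployment.version
    return None


checkout = Service("checkout", owner="payments-team")
production = Environment("production", kind="production")

first = Deployment("checkout", "production", "1.4.2", "https://ci.example.com/jobs/101")
live = DeployedService(checkout, production, first)
catalog = [live]

# A new rollout never edits the old deployment record;
# the deployed service simply points at a new one.
second = Deployment("checkout", "production", "1.5.0", "https://ci.example.com/jobs/102")
live.deployment = second

print(running_version(catalog, "checkout", "production"))
```

Keeping the history as separate immutable records while only the deployed service changes is one way to keep the catalog a consistent source of truth.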
However, in some cases, answering the “what is deployed where” question can be far more complex, depending on the environment. Consider an environment that requires mapping, beyond the four elements of (1) Service, (2) Environment, (3) Deployed Service and (4) Deployment, the following as well: (5) Namespace, (6) Cluster and (7) Cloud account, as well as (8) System and (9) Product Unit. In this case, answering questions requires mapping all nine elements.
The software catalog should be live. Once you define the basics, the software catalog integrates with your development lifecycle through its data sources (K8S exporter, Terraform, GitHub app, Jenkins, etc.), immediately presenting the data you need. It should also have a minimal infrastructure footprint, with complete decoupling between the self-service UI and the underlying infrastructure. A graph view showing dependencies is also required.
5.2 A builder based approach
Different organizations have different needs and architectures, and as a result, they require different software catalogs to visualize and represent their SDLC.
It is advisable to begin with schema definitions for software catalog assets, so that it will be easy to build the catalog. A schema is the basic building block - a blueprint. It represents assets that can be managed in the internal developer portal:
- Databases, and more.
Blueprints support any number of properties, and would typically contain the parts and properties of your infrastructure that you want to manage and track. As shown above, they are the only way of mapping what needs to be mapped in the “what is deployed where” view. In the next step, entities that are mapped to the blueprint schema are created.
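One way to picture a blueprint and the entities created from it is the short Python sketch below. Everything in it is an assumption made for illustration: the dictionary layout, the property names (region, node_count, cloud_account) and the validate_entity helper do not describe the format of any specific portal product.

```python
from typing import Any, Dict, List

# A blueprint: an identifier plus a schema mapping each property name
# to its expected type and whether it is required.
CLUSTER_BLUEPRINT: Dict[str, Any] = {
    "identifier": "cluster",
    "properties": {
        "region": {"type": str, "required": True},
        "node_count": {"type": int, "required": True},
        "cloud_account": {"type": str, "required": False},
    },
}


def validate_entity(blueprint: Dict[str, Any], properties: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the entity fits the blueprint."""
    errors: List[str] = []
    schema = blueprint["properties"]
    for name, rule in schema.items():
        if name not in properties:
            if rule["required"]:
                errors.append(f"missing required property: {name}")
            continue
        if not isinstance(properties[name], rule["type"]):
            errors.append(f"wrong type for property: {name}")
    for name in properties:
        if name not in schema:
            errors.append(f"unknown property: {name}")
    return errors


good = {"region": "eu-west-1", "node_count": 5}
bad = {"region": "eu-west-1", "node_count": "5", "owner": "platform"}

print(validate_entity(CLUSTER_BLUEPRINT, good))
print(validate_entity(CLUSTER_BLUEPRINT, bad))
```

Validating entities against the blueprint at creation time keeps the catalog consistent as more data sources are added.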
Imagine taking Kubernetes data and mapping it to entities, to form the software catalog, putting data in the relevant entity. You can manage the representation of a running cluster, see a table with all of your namespaces and see which services are deployed in each namespace. Another use case is tracking authentication and authorization resources. You’ll have the ability to see a Service Account (an identity for processes that run in a Pod), and both the pods where it is being used and the relevant rules and policies.
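The Kubernetes mapping described above could look roughly like this. The sketch works on plain dictionaries shaped like simplified Pod objects (metadata.name, metadata.namespace and spec.serviceAccountName are real Pod fields; using the "app" label to identify the service is a common convention that is assumed here). The sample pods and the two helper functions are hypothetical.

```python
from collections import defaultdict
from typing import Any, Dict, List

# Sample data standing in for what an exporter would read from a cluster.
pods: List[Dict[str, Any]] = [
    {"metadata": {"name": "checkout-7d9f", "namespace": "shop", "labels": {"app": "checkout"}},
     "spec": {"serviceAccountName": "checkout-sa"}},
    {"metadata": {"name": "checkout-8k2p", "namespace": "shop", "labels": {"app": "checkout"}},
     "spec": {"serviceAccountName": "checkout-sa"}},
    {"metadata": {"name": "search-1a2b", "namespace": "shop", "labels": {"app": "search"}},
     "spec": {}},
    {"metadata": {"name": "billing-9z", "namespace": "finance", "labels": {"app": "billing"}},
     "spec": {"serviceAccountName": "billing-sa"}},
]


def services_by_namespace(pods: List[Dict[str, Any]]) -> Dict[str, List[str]]:
    """Build the "which services run in which namespace" table."""
    table = defaultdict(set)
    for pod in pods:
        meta = pod["metadata"]
        service = meta.get("labels", {}).get("app")
        if service:
            table[meta["namespace"]].add(service)
    return {namespace: sorted(services) for namespace, services in table.items()}


def pods_by_service_account(pods: List[Dict[str, Any]]) -> Dict[str, List[str]]:
    """Show which pods use each Service Account."""
    usage = defaultdict(list)
    for pod in pods:
        # Pods that do not name a service account run as "default".
        account = pod["spec"].get("serviceAccountName", "default")
        usage[account].append(pod["metadata"]["name"])
    return dict(usage)


print(services_by_namespace(pods))
print(pods_by_service_account(pods))
```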
5.3 The importance of self-service
Gartner, in its “Software Engineering Leader’s Guide to Improving Developer Experience” report (requires a subscription), highlights the importance of developer self-service: “Developer self-service has an inherent benefit to bringing consistency and repeatability to otherwise disparate processes and error-prone manual handoffs. The goal of self-service is to ensure developers have an experience that makes ‘the right thing to do, the intuitive thing to do.’ For example, the ability to self-serve pre-vetted open-source libraries from a trusted component catalog improves governance, as well as developer experience.”
The key to success is a product mindset: the internal developer portal should be as easy to use as any software product, and built with the same consideration for the user journey that an “ordinary” user of such a product would get. This means that it should be easy for developers to self-serve, but also that abstractions should be used in a way that is useful and helps developers make the right decisions.
In this respect, it’s important to ensure that self-service extends beyond microservice scaffolding and lets developers provision, terminate and perform day 2 operations on any asset exposed in the software catalog, within the policies, manual approvals and guardrails within the organization, including pre-defined templates. Let them provision a dev env, request permission for an S3 bucket or add a secret to a microservice.
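As a rough illustration of self-service within guardrails, the sketch below gates each action by role and optionally routes it to a manual approval. The action names, roles and the request_action function are hypothetical and only mirror the examples in the paragraph above; a real portal would also execute the action and record who approved it.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet


@dataclass(frozen=True)
class Action:
    name: str
    allowed_roles: FrozenSet[str]
    needs_approval: bool = False


# Hypothetical action definitions (policies set by the platform team).
ACTIONS: Dict[str, Action] = {
    "provision_dev_env": Action("provision_dev_env", frozenset({"developer", "devops"})),
    "add_secret": Action("add_secret", frozenset({"developer", "devops"})),
    "request_s3_permission": Action(
        "request_s3_permission", frozenset({"developer", "devops"}), needs_approval=True
    ),
    "terminate_cluster": Action("terminate_cluster", frozenset({"devops"}), needs_approval=True),
}


def request_action(action_name: str, role: str) -> str:
    """Apply the guardrails and return the resulting state of the request."""
    action = ACTIONS.get(action_name)
    if action is None:
        return "rejected: unknown action"
    if role not in action.allowed_roles:
        return "rejected: role not permitted"
    if action.needs_approval:
        return "pending approval"
    return "running"


print(request_action("provision_dev_env", "developer"))
print(request_action("request_s3_permission", "developer"))
print(request_action("terminate_cluster", "developer"))
```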
5.4 More than a Kubernetes abstraction
One of the challenges of platform engineering is to find the right abstraction for your developer-customers. The right abstraction differs between organizations and even between developers in the same organization.
Every organization is a snowflake and has its own processes and workflows. Some organizations develop production-critical products while others develop a "nice to have" developer tool product. Some organizations have developers who are familiar with the bits and bytes of Kubernetes and developers who do not even want to hear about it. There are organizations that are more focused on big-data technologies, while others focus on front-end technologies, and the list goes on. That’s why tools that create an abstraction layer above Kubernetes but cannot be customized for different organizations and personas are useless.
Most likely you will not find an off-the-shelf abstraction layer that fits your organization's specific needs. One developer portal can be amazing for one company but a disaster for another.
Developer portals must be customizable to the exact needs of the different developers in your organization. Without these customization capabilities, a developer portal will either be too complex for the developers and not safe for the DevOps teams, or it will put the developers in "golden cages" and decrease the velocity of the R&D teams.
In addition, the DevOps ecosystem does not consist just of Kubernetes. There are git repos, runbooks, cloud and identity providers, CI/CD pipelines, tickets, on-call, observability tools, and a lot more.
A developer portal must encapsulate all the moving parts in the DevOps ecosystem of your organization. Just Kubernetes is far from enough.
5.5 Internal developer portals vs GitOps
GitOps lets developers get things done using code changes in git. Developers can deploy a microservice, provision cloud resources, manage environments, configurations and more. While this does offer a good developer experience, it doesn’t replace the need for a developer portal.
There are several reasons why:
- Files associated with GitOps are distributed across the codebase, making it difficult to determine what needs to be changed and creating a risk of significant outages.
- DevOps and developer properties exist in the same file, making it difficult for developers to tell what needs to change.
- A proliferation of tool requests which create many tickets.
- Cases where Git doesn’t reflect the state of the world, which may lead developers to make the wrong decisions.
- Lack of clarity when there are many repositories and configuration files.
An internal developer portal is a decoupled interface on top of GitOps, ensuring developer inputs are validated and that developers take the golden path. Developers won’t need to make sense of GitOps files. Typically, basic and recurring operations will be performed through a developer portal. Pre-defined self-service actions will reduce the number of pull requests. This works best when you consolidate self-service actions: if a change requires several file edits in GitOps, try to expose it as a single self-service action in the internal developer portal.
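To make the consolidation idea concrete, here is a minimal sketch of one self-service action that validates its inputs and then plans every GitOps file change in a single step. The repository layout (services/<name>/values.yaml and hpa.yaml), the limits and the plan_scale_change function are invented for illustration; a real action would go on to open a pull request with these changes.

```python
import re
from typing import Dict

# Guardrails, assumed here: lowercase service names and a bounded replica count.
NAME_PATTERN = re.compile(r"[a-z][a-z0-9-]{1,30}")
MAX_REPLICAS = 20


def plan_scale_change(service: str, replicas: int) -> Dict[str, str]:
    """Validate the developer's input, then return every file edit as one change set."""
    if not NAME_PATTERN.fullmatch(service):
        raise ValueError(f"invalid service name: {service!r}")
    if not 1 <= replicas <= MAX_REPLICAS:
        raise ValueError(f"replicas must be between 1 and {MAX_REPLICAS}")
    return {
        f"services/{service}/values.yaml": f"replicaCount: {replicas}\n",
        f"services/{service}/hpa.yaml": (
            f"minReplicas: {replicas}\nmaxReplicas: {replicas * 2}\n"
        ),
    }


for path, content in plan_scale_change("checkout", 3).items():
    print(path)
    print(content)
```

The developer fills in two fields; which files those fields touch stays a platform concern.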
5.6 Internal developer portals vs CI tools
Using Jenkins for developer self-service can work well, but there are many known visibility, compliance and other issues that stem from the inherent openness of Jenkins. At some point the speed and flexibility can turn into a mess.
Jenkins isn’t built for self-service, for a variety of reasons:
- It is stateless, so it’s difficult to track the changes made and extend actions.
- It has a limited set of UI components, so forms lack input validation (both regex and validation against third parties), inputs populated from third-party data (e.g. a drop-down list of all the S3 buckets available to the user) and if-this-then-that features.
- It is also tightly coupled, which makes changes more difficult.
All this creates a bad developer experience, a high potential for mistakes and compliance and security issues.
6. Benefits of an internal developer platform
“Our KPI is the percentage of developers that entered the developer portal at least once in the past week. Today, it stands at around 40%. The more actions we enable … the more people use it” (Lior Rabin, monday.com).
How do you know an internal developer portal has reached the mark? A good internal developer portal should reduce the burden on developers of dealing with the complexity of modern software. The result should be just one place for developers to go for anything related to environments, deployments, software, ETL, databases - and a simple way for them to perform the self-service actions they need. When the burden of complexity is reduced, software quality, maturity, security and stability should improve, along with developer productivity and satisfaction. It will also free DevOps to focus on building infra and automations, which will probably make them much happier too.
Don’t let the “golden path” enabled through the developer portal turn into a “golden cage”. Developers can’t be forced to always use the developer portal. A good IDP should provide a golden path for 98% of use cases, so that it achieves the benefits associated with it. But there will always be the 2% of use cases that require work outside the developer portal, and IDPs should embrace that rather than prevent it. It is important that even if an engineer did not use the golden path for a self-service action, the result of that action still appears in the IDP through the ingestion of data from the engineering infrastructure. This is done through an exporter that populates the software catalog with data coming from other systems (e.g. K8S), even if the changes were not made through the internal developer portal.
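The exporter idea can be sketched as a reconciliation step: whatever the live systems report overwrites what the catalog believes, regardless of how the change was made. The data shapes (service name mapped to running version) and the sync_catalog function below are simplifying assumptions, not a description of any actual exporter.

```python
from typing import Dict, List


def sync_catalog(catalog: Dict[str, str], live_state: Dict[str, str]) -> List[str]:
    """Update the catalog in place from live state and describe what changed."""
    changes: List[str] = []
    for service, version in sorted(live_state.items()):
        if catalog.get(service) != version:
            changes.append(f"{service}: {catalog.get(service)} -> {version}")
            catalog[service] = version
    # Services that no longer exist in the live systems are dropped from the catalog.
    for service in sorted(set(catalog) - set(live_state)):
        changes.append(f"{service}: removed")
        del catalog[service]
    return changes


catalog = {"checkout": "1.4.2", "search": "2.0.0", "legacy": "0.9.0"}
# State reported by the cluster, including a hotfix applied outside the portal.
live_state = {"checkout": "1.5.0", "search": "2.0.0", "billing": "1.0.0"}

for change in sync_catalog(catalog, live_state):
    print(change)
```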
6.1 Quantitative benefits
Twitter defines the success of its platform engineering team through its developer velocity. By using an internal developer portal, it expects to double it.
Today, “we start by looking for velocity,” said Nick Tornow, platform lead at Twitter. “We define that as the number of features an engineer can deliver in a unit of time, and we want to double that by the end of 2023.”
Many industry experts estimate that the use of internal developer portals can reduce the number of tickets received by DevOps by up to 80%, freeing valuable DevOps time.
Besides developer velocity, you can also measure developer productivity, DevOps productivity, a reduction in cloud costs as well as even grander goals such as a reduction of technical debt or reduced downtime. DORA metrics can also improve, as well as MTTR and the time it takes to onboard a new developer.
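Two of the DORA metrics, lead time for changes and deployment frequency, are simple to compute once commit and deployment timestamps are available. The sketch below uses made-up timestamps and hypothetical helper names, and it assumes lead time is measured from commit to production deployment.

```python
from datetime import datetime
from statistics import median
from typing import List, Tuple

Change = Tuple[datetime, datetime]  # (committed at, deployed at)

# Invented sample data covering one week.
changes: List[Change] = [
    (datetime(2023, 1, 2, 9, 0), datetime(2023, 1, 2, 15, 0)),   # 6 hours
    (datetime(2023, 1, 3, 10, 0), datetime(2023, 1, 4, 10, 0)),  # 24 hours
    (datetime(2023, 1, 5, 8, 0), datetime(2023, 1, 5, 20, 0)),   # 12 hours
]


def median_lead_time_hours(changes: List[Change]) -> float:
    """Median time from commit to deployment, in hours."""
    return median(
        (deployed - committed).total_seconds() / 3600 for committed, deployed in changes
    )


def deployments_per_week(changes: List[Change], period_days: int) -> float:
    """Deployment frequency, normalized to a seven-day week."""
    return len(changes) * 7 / period_days


print(median_lead_time_hours(changes))
print(deployments_per_week(changes, period_days=7))
```

Tracking these numbers before and after the portal is introduced gives the kind of trend line that the lead time to change argument relies on.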
An often overlooked benefit of developer portals is that they can set standards for service maturity by providing developers with guardrails for deploying right and standards for developer productivity. Those same elements can also be measured per service, developer or team. Service maturity can be many things, but you can use the internal developer portal to define it as a mixture of production readiness, quality, security and compliance. You can then score all services, by team, developer and more, and immediately see which services or resources are up to par and which aren’t.
“A feature we’re thinking of for the future is to score a service’s readiness level. If a service is not updated with the packages that we have defined as core, if on GitHub it doesn’t have configurations that are required, if the service doesn’t exist in all our geo regions - we can calculate a score. A score will let us warn the service owners that their service is below par. If a service drops below a certain threshold, we may want to block deployments to production or something similar” (Lior Rabin, monday.com).
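A readiness score of the kind described in the quote could be computed along these lines. The three checks come from the quote, but the weights, the threshold and the function names are assumptions chosen purely for illustration.

```python
from typing import Dict

# Hypothetical weights for the three checks mentioned in the quote (they sum to 100).
CHECK_WEIGHTS: Dict[str, int] = {
    "core_packages_up_to_date": 40,
    "required_github_config": 30,
    "deployed_in_all_regions": 30,
}
BLOCK_THRESHOLD = 60  # assumed cut-off below which production deployments are blocked


def readiness_score(results: Dict[str, bool]) -> int:
    """Sum the weights of the checks that passed; missing checks count as failed."""
    return sum(weight for check, weight in CHECK_WEIGHTS.items() if results.get(check, False))


def may_deploy_to_production(results: Dict[str, bool]) -> bool:
    return readiness_score(results) >= BLOCK_THRESHOLD


checkout_results = {
    "core_packages_up_to_date": True,
    "required_github_config": True,
    "deployed_in_all_regions": False,
}
print(readiness_score(checkout_results), may_deploy_to_production(checkout_results))
```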
6.2 Qualitative benefits
Changing the culture to shift-left, making developers’ lives easier, freeing DevOps to work on more strategic projects, all make the company a better place for engineering teams. All this can be achieved through the adoption of an internal developer portal.
7. The future of developer portals
This article has mostly concerned itself with how humans interact with internal developer portals. There is a future beyond this: machines will also interact with internal developer portals. Consider this example: it’s Black Friday and we don’t want any changes to an e-commerce site. The CI/CD pipeline can query the IDP to check the “Lock Status” of service deployments to production. What this example tells us is that IDPs will evolve to become the central location that reflects the actual state of the entire architecture and software (permissions, secrets, owners, etc.) and where all the do’s and don’ts are documented and enforced. In this future, IDPs will also present machines with interfaces similar to those presented to humans, to trigger DevOps flows.
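The Black Friday check might look like the following from the pipeline’s side. This is a sketch under stated assumptions: the catalog is a plain dictionary standing in for data a pipeline would fetch from the portal’s API, and the field names (locked, reason) and the deployment_allowed function are hypothetical.

```python
from typing import Any, Dict

# Stand-in for software catalog data fetched from the portal.
CATALOG: Dict[str, Dict[str, Any]] = {
    "storefront": {"locked": True, "reason": "Black Friday freeze"},
    "internal-docs": {"locked": False, "reason": ""},
}


def deployment_allowed(catalog: Dict[str, Dict[str, Any]],
                       service: str, environment: str) -> bool:
    """Gate a pipeline step on the service's lock status in the catalog."""
    if environment != "production":
        return True  # in this sketch the lock only applies to production
    entry = catalog.get(service)
    if entry is None:
        return False  # conservative choice: services unknown to the catalog are blocked
    return not entry["locked"]


for service in ("storefront", "internal-docs"):
    verdict = "deploy" if deployment_allowed(CATALOG, service, "production") else "blocked"
    print(service, verdict)
```

Because the pipeline asks the portal rather than keeping its own freeze list, the lock is defined and enforced in one place.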
Welcome to the future of DevOps - Platform Engineering.