There are many good reasons to adopt an internal developer platform. An internal developer portal reduces developers’ cognitive load and paves the way for a culture of high-quality software, in terms of operations, production readiness and maturity. A developer portal also lets developers act on their own, using self-service actions to scaffold a service, get temporary access to a cloud resource or add a secret to a service.
When you build an internal developer platform, you bring deep context to every engineer in the organization. This context is powerful. It tells engineers what lives where, who’s responsible for it, what maturity level it’s at and more. It often maps and shows microservice and entity dependencies that were previously scattered across too many DevOps tools or “cataloged” in a huge CSV file.
What many people miss is that this context is valuable not only to engineers but also to machines.
Workflow automation: the missing part of the internal developer platform stack
Workflow automation is a major pillar of how internal developer portals will be used in the future. Here the IDP isn’t used only by developers to track service maturity and production readiness, or to run self-service actions. It also becomes part of machine-driven workflows.
In this model, DevOps or infrastructure engineers automate DevOps workflows by integrating with the IDP’s API, since the portal is the single source of truth for the state of software in the organization. “Single source of truth” isn’t marketing fluff here: the internal developer portal is a real-time reflection of the state of microservices, deployments, clouds and everything in between.
Here’s how we see this stack: workflow automation sits alongside the software catalog, service maturity and developer self-service.
Human Stakeholders in an Internal Developer Platform
Developers: The primary users of an internal developer portal are the organization’s developers. They rely on the portal for documentation, code repositories, developer self-service actions and the other resources they need for their work.
Engineering management: Engineering managers are also stakeholders in an internal developer portal, since they want to make sure developers have the resources they need to be productive and efficient.
DevOps and Platform Engineering: The software catalog in an internal developer portal isn’t just a microservice catalog; it also covers resources. It includes the infrastructure and the software deployed on it, and reflects the entire ecosystem surrounding the software development life cycle: dev environments, CI/CD, pipelines, deployments and cloud resources. The catalog also shows KPIs in the context of a given service, its deployments and the environments it runs on. Because DevOps and platform teams are responsible for the internal developer portal, they are also primary users: they need to see and manage the many moving parts of their software ecosystem.
Machines as developer portal consumers: CI/CD jobs, cron jobs, weekly cleanup scripts for permissions or environments, pipelines, SRE automations and runbooks
By accessing the developer portal, machines get a single API backed by a real-time software catalog. This gives them the context they need to make automated decisions. Based on software catalog data, machines can query the internal developer platform and fail CI jobs, auto-terminate resources or run a CD flow. This is a better option than glue code or ad-hoc databases, since the margin for error is much lower.
How the software catalog API is used by machines for automations
To explain how machines use the software catalog’s API, let’s first explain what’s in a software catalog and why it matters.
While some internal developer portals focus on cataloging microservices, the right approach is to go beyond microservices, so that the software catalog shows the entire ecosystem surrounding the software development lifecycle.
An ideal software catalog contains it all: CI/CD flows, dev environments, pipelines, deployments and anything cloud. It is the one place that shows the context of every software element and the dependencies between them, including:
Microservice quality metrics
Service readiness metrics
Permissions to cloud resources
Service performance metrics
And much more
The software catalog is a hub for all software metadata as well as live deployment data. It unifies previously siloed DevOps metadata and keeps it up to date as developers execute self-service actions.
This is where it gets interesting for machines. To illustrate, let’s look at five ways machines can use the internal developer portal’s API.
Use case number 1 - Fail a build if a microservice doesn’t meet quality requirements, as reported by the internal developer portal API
Developer portals unify data about the quality of a service under development. This information was previously spread across many different developer tools and CI pipelines. The developer portal can even represent it as a single metric, calculated from KPIs, rules and checks.
For instance, if you want to verify that tests passed in CI, that the service has an owner and that the code scanning check for the service passed, you can define a scorecard with those three rules.
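As a minimal sketch, a scorecard like this can be modeled as a list of rules, each a predicate over a service's catalog data, with the level determined by how many rules pass. The field names (`ci_tests_passed`, `owner`, `code_scan_passed`) and the Bronze/Silver/Gold thresholds are hypothetical; real portals define their own rule syntax and levels.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Rule:
    name: str
    check: Callable[[dict], bool]


# Hypothetical rules mirroring the three checks above; the field names
# are assumptions about how the catalog stores this data.
RULES = [
    Rule("CI tests passed", lambda s: s.get("ci_tests_passed") is True),
    Rule("Has an owner", lambda s: bool(s.get("owner"))),
    Rule("Code scan passed", lambda s: s.get("code_scan_passed") is True),
]

# Hypothetical levels: minimum number of passing rules for each level,
# ordered from highest to lowest.
LEVELS = [("Gold", 3), ("Silver", 2), ("Bronze", 1)]


def failing_rules(service: dict) -> list[str]:
    """Names of the rules this service does not pass."""
    return [rule.name for rule in RULES if not rule.check(service)]


def scorecard_level(service: dict) -> str:
    """Collapse the individual rule results into a single level."""
    passed = sum(rule.check(service) for rule in RULES)
    for level, required in LEVELS:
        if passed >= required:
            return level
    return "No level"
```

For a service with an owner and passing tests but a failed code scan, `scorecard_level` returns `"Silver"` and `failing_rules` returns `["Code scan passed"]`, which is exactly the kind of single, explainable metric a machine can act on.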
Let’s see how machines can use this data. In this case, imagine your developer portal is queried within the CI workflow. You can check a single metric and make sure it meets a predefined threshold (or fail the build), for example:
if quality_result.status != "Gold":
    sys.exit("Quality level is below Gold, failing the build")
Checking a single metric is better than writing a query like this:
if tests_results.score != 100 and jira.tickets.count < 5 and x and y and z:
More importantly, a single metric unifies data and gives the whole organization one consistently defined measure, accessible from anywhere, without having to think about the individual KPIs, how to combine them or which tools hold the underlying metadata. You set the policy and define the calculation once.
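Wiring the single-metric check into a CI step might look like the minimal sketch below. The endpoint URL, the bearer-token header, the `quality_level` field and the ordering of levels are all assumptions; substitute whatever your portal's API actually exposes. Only the standard library is used, so the step needs no extra dependencies.

```python
import json
import sys
import urllib.request

# Hypothetical portal endpoint and level ordering (lowest to highest).
PORTAL_URL = "https://portal.example.com/api/entities/{service}"
LEVELS = ["Bronze", "Silver", "Gold"]


def fetch_entity(service: str, token: str) -> dict:
    """Fetch a service's catalog entity as JSON from the portal."""
    request = urllib.request.Request(
        PORTAL_URL.format(service=service),
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


def quality_gate(entity: dict, minimum: str = "Gold") -> bool:
    """True if the entity's quality level is at or above `minimum`."""
    level = entity.get("quality_level")
    return level in LEVELS and LEVELS.index(level) >= LEVELS.index(minimum)


def run_ci_check(service: str, token: str) -> int:
    """Exit code for the CI step: 0 passes the build, 1 fails it."""
    entity = fetch_entity(service, token)
    if not quality_gate(entity):
        print(f"{service}: quality level {entity.get('quality_level')!r} "
              "is below the required Gold level", file=sys.stderr)
        return 1
    return 0
```

A CI job could then run `sys.exit(run_ci_check("payments", os.environ["PORTAL_TOKEN"]))`; the service name and the `PORTAL_TOKEN` variable are illustrative. Keeping the gate as a pure function of the entity makes the policy easy to test without calling the portal.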
Use case number 2 - Query the developer portal API and revert a microservice if service KPIs are below a certain threshold
The developer portal unifies data about how a service behaves in production, collecting information spread across many observability, monitoring and tracking tools. You can represent service health as a single metric calculated in the developer portal from the results of its rules and checks.
For example, if a new version of a service performs poorly in production, you can revert it automatically (optionally requiring manual approval for major versions).
One way to do this is to subscribe to the developer portal’s events and trigger a version-revert flow when the service’s health is marked as “critical”.
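The decision logic of such a subscriber can be sketched as follows. The event fields and health values are hypothetical, versions are assumed to be plain `MAJOR.MINOR.PATCH` strings, and the rollback itself would be carried out by your CD tool; this function only decides whether to ignore the event, revert right away or wait for manual approval.

```python
from dataclasses import dataclass


@dataclass
class HealthEvent:
    # Hypothetical shape of a "health changed" event from the portal.
    service: str
    health: str            # e.g. "ok", "degraded", "critical"
    current_version: str   # version now running, e.g. "2.0.0"
    previous_version: str  # version to roll back to


def is_major_bump(old: str, new: str) -> bool:
    """True if the major component of a semantic version increased."""
    return int(new.split(".")[0]) > int(old.split(".")[0])


def handle_health_event(event: HealthEvent) -> str:
    """Return "ignore", "revert" or "await_approval" for a health event."""
    if event.health != "critical":
        return "ignore"
    # Major-version rollbacks are riskier, so ask a human first.
    if is_major_bump(event.previous_version, event.current_version):
        return "await_approval"
    return "revert"
```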
Use case number 3 - Terminate an ephemeral environment after TTL is met, based on the software catalog API
When you let developers self-provision environments from the developer portal, consider adding a TTL input so you know when to terminate each environment.
With a TTL input in place, you can subscribe to the developer portal’s event mechanism and, once the TTL reaches 0, trigger an automation that terminates the environment.
Many companies maintain ad-hoc databases to record when their developers’ ephemeral resources were created. This is error-prone and labor-intensive. Instead, you can let the developer portal automate the process.
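The check behind a "TTL reached 0" event (or behind a scheduled cleanup job, if your portal doesn't emit such an event) boils down to comparing creation time plus TTL with the current time. This minimal sketch assumes a hypothetical `Environment` entity with `created_at` and `ttl` properties; the termination step itself is left to your infrastructure tooling.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Environment:
    # Hypothetical catalog entity for a self-provisioned dev environment.
    name: str
    created_at: datetime  # timezone-aware creation timestamp
    ttl: timedelta        # the TTL the developer chose when provisioning

    def expires_at(self) -> datetime:
        return self.created_at + self.ttl


def expired_environments(envs: list[Environment], now: datetime) -> list[str]:
    """Names of environments whose TTL has run out at `now`."""
    return [env.name for env in envs if now >= env.expires_at()]
```

Passing `now` in explicitly, rather than calling `datetime.now()` inside the function, keeps the check deterministic and easy to test.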
Use case number 4 - Fail build if deployments to production are locked due to special events
Have you ever seen this Slack message in your engineering channel?
@here hey guys :) Please don't deploy to production until Monday next week!
Engineering teams often decide to freeze deployments to production for all kinds of reasons, usually to minimize the risk of deploying at the wrong time, when no one will be around to fix things if they go wrong.
A Slack message is not a solution. Actually enforcing a deployment lock means making sure that pipelines targeting locked services fail.
Your developer portal can include a lock indicator on specific environments. Failing a build is then as simple as checking whether lock == true.
You can also send an automated Slack message to let people know that deployments are locked :)
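A pipeline step that enforces the lock can be sketched as below. It assumes the target environment's catalog entity has already been fetched and carries a hypothetical boolean `locked` property and an optional `lock_reason`; the Slack notification is not included.

```python
import sys


def check_deploy_lock(environment: dict) -> tuple[bool, str]:
    """Return (allowed, message) for a deployment to `environment`.

    `environment` is the catalog entity for the target environment, with
    a hypothetical `locked` flag and optional `lock_reason`.
    """
    if environment.get("locked", False):
        reason = environment.get("lock_reason", "no reason given")
        return False, f"Deployments to {environment['name']} are locked: {reason}"
    return True, f"Deployments to {environment['name']} are allowed"


def lock_gate(environment: dict) -> int:
    """Exit code for the pipeline step; non-zero fails the build."""
    allowed, message = check_deploy_lock(environment)
    print(message, file=sys.stdout if allowed else sys.stderr)
    return 0 if allowed else 1
```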
Use case number 5 - Force merge a bug fix (bypassing tests) in case of an outage
“Can you remove the enforcement in GitHub so we can skip tests?! We need to deliver this bug fix now and we cannot lose valuable time!” Have you heard this? We have…
Sometimes you need to push the red button and skip tests to get a bug fix into production. This usually requires the Git provider’s admin to bypass the test requirement. An internal developer portal can handle this with a self-service action that requests a “Force Merge” on a service (optionally requiring manual approval from a team member or team lead).
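The approval side of such a self-service action can be sketched as follows. The request fields and the one-approval default are assumptions, and the actual bypass-and-merge call to the Git provider is deliberately left out, since it depends on the provider's API.

```python
from dataclasses import dataclass, field


@dataclass
class ForceMergeRequest:
    # Hypothetical payload of a "Force Merge" self-service action.
    service: str
    pull_request: int
    requested_by: str
    approvers: set[str] = field(default_factory=set)


def approve(request: ForceMergeRequest, approver: str) -> None:
    """Record an approval; requesters cannot approve their own request."""
    if approver == request.requested_by:
        raise ValueError("requester cannot approve their own force merge")
    request.approvers.add(approver)


def can_force_merge(request: ForceMergeRequest, required_approvals: int = 1) -> bool:
    """True once enough distinct team members have approved."""
    return len(request.approvers) >= required_approvals
```

Storing approvers as a set means the same person approving twice still counts once.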
Types of workflow automations through an internal developer portal API
I hope the five examples above got you thinking. I’m sure there are many more such cases.
There are two ways machines can interact with a developer portal and perform automated tasks. The first is plain vanilla API queries against the software catalog. The second is event subscriptions to changes in the software catalog.
API Queries against the software catalog
The data in the software catalog is held as a graph, because software components are related to and depend on one another. It’s therefore important that the API supports graph queries, since often this is the only way to answer deep questions about microservices.
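A typical graph question is "which services are affected, directly or transitively, if this one goes down?". The sketch below answers it with a breadth-first search over reversed dependency edges. The hard-coded `DEPENDS_ON` dictionary is a hypothetical stand-in for what a catalog API would return; a portal with native graph queries would do this traversal server-side.

```python
from collections import deque

# Hypothetical catalog relations: service -> services it depends on.
DEPENDS_ON = {
    "checkout": ["payments", "cart"],
    "payments": ["ledger"],
    "cart": ["inventory"],
    "ledger": [],
    "inventory": [],
    "admin-ui": ["ledger"],
}


def dependents_of(target: str, depends_on: dict[str, list[str]]) -> set[str]:
    """All services that depend on `target`, directly or transitively."""
    # Invert the edges once so we can walk from a service to its dependents.
    reverse: dict[str, list[str]] = {}
    for service, deps in depends_on.items():
        for dep in deps:
            reverse.setdefault(dep, []).append(service)

    seen: set[str] = set()
    queue = deque([target])
    while queue:
        current = queue.popleft()
        for parent in reverse.get(current, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    seen.discard(target)  # a dependency cycle could lead back to the target
    return seen
```

With this data, `dependents_of("ledger", DEPENDS_ON)` returns `{"payments", "admin-ui", "checkout"}`: `checkout` never references `ledger` directly, but reaches it through `payments`.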
Event subscription to the software catalog
Because software keeps changing, so does the data in the software catalog. An event can be anything: a TTL reaching 0, a change of service owner or a microservice’s performance metric turning poor. In any of these cases, you can run an automated workflow once the developer portal fires an event such as on_x_change(e).
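The subscription model itself can be illustrated with a minimal in-process dispatcher. This is only a stand-in: real portals typically deliver events over webhooks or a message queue, and the event type names and payload fields used here are hypothetical.

```python
from collections import defaultdict
from typing import Callable

Handler = Callable[[dict], None]


class CatalogEvents:
    """A minimal in-process stand-in for a portal's event subscriptions."""

    def __init__(self) -> None:
        self._handlers: defaultdict[str, list[Handler]] = defaultdict(list)

    def on(self, event_type: str, handler: Handler) -> None:
        """Subscribe `handler` to events of `event_type`."""
        self._handlers[event_type].append(handler)

    def emit(self, event_type: str, payload: dict) -> int:
        """Call every handler subscribed to `event_type`; return how many ran."""
        handlers = self._handlers.get(event_type, [])
        for handler in handlers:
            handler(payload)
        return len(handlers)
```

For example, `events.on("owner_changed", notify_new_owner)` would run a (hypothetical) `notify_new_owner` workflow each time the portal reports an ownership change, while event types nobody subscribed to are simply ignored.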
In conclusion, integrating machines with an internal developer portal can bring significant business value to an organization. Through the developer portal’s API, machines benefit from the context the software catalog provides and can make automated decisions based on it. For example, a machine can use the developer portal to fail a build when quality requirements aren’t met, or to automatically terminate resources to reduce costs and meet organizational security and compliance standards. Overall, this integration helps improve the efficiency and effectiveness of the software development process.