GitOps has a lot going for it, but it can also be challenging from the developer experience perspective. In this blog, we’ll discuss the factors that affect the developer experience (and DevOps outcomes) and how internal developer portals can keep what’s good about GitOps while providing a smoother developer experience.
GitOps is about managing application and infrastructure configurations using Git as a declarative single source of truth. It applies application development best practices to infrastructure automation, from version control through CI/CD and environment management. It leverages the benefits of Git while interacting with anything from code to cloud.
While GitOps is great, it has some downsides. One, as GitLab notes, is the need for collaboration and the need to prevent “cowboy engineering” while introducing a “change by committee” element to infrastructure. Others relate to the developer experience or, in simpler terms, getting developers to code. While GitOps reduces tickets and supports some developer self-service, it still creates cognitive load.
In this blog, we’ll describe how GitOps impacts the developer experience. When we talk about the developer experience, we’re focusing on simpler, clearer workflows with less potential for mistakes and less cognitive load for developers. This isn’t to say that GitOps is wrong - it certainly is a great practice - but you can make it even better by adding a developer experience layer on top. To do this, we’ll need to begin by discussing what’s not that great about GitOps.
GitOps and the developer experience
One of the core ideas behind internal developer portals is to let developers focus on their code, and not worry about the operations side or spend too much time waiting on tickets, pull requests and more. By focusing on core coding, developers become more productive, are less distracted by the cognitive load and complexities of the DevOps tool stack, and are faster to onboard and to resolve on-call issues. The catch-all term for this is developer experience.
GitOps certainly lets developers get things done using code changes in Git. Developers can:
Deploy a microservice
Provision cloud resources
Manage configurations and validate them
From the DevOps point of view, performing these changes through GitOps has incredible benefits. From the developer point of view, it is a better experience than waiting on tickets, but it’s not that simple to do.
A distributed code base and fragmented file types
In many cases, files associated with GitOps operations are distributed across the codebase. They may be YAML, JSON, IaC, or any other format, in a monorepo or across many repos. This creates a steep learning curve for many developers. Mistakes can easily happen when developers are required to edit many files across various repos.
Developers may also find it difficult to find their way around the entire ecosystem of GitOps files. In some cases, a file hierarchy can make it even harder to apply proper changes. Helm charts, for instance, can have dependencies, called subcharts, with their own values and templates. When you have several values files for each Helm application, set in a specific hierarchy, it becomes difficult for developers not to deviate from the golden path. In all those cases, mistakes carry a significant cost: modifying the wrong value or file can cause a serious application outage.
ArgoCD is an excellent example. In ArgoCD it is common to use the App of Apps pattern, where ArgoCD manages a main application whose responsibility is to provision and manage microservices and applications in a Kubernetes cluster. This is a classic example of making it easier to provision new apps at the cost of increasing developer cognitive load. Even though the YAML file for a new application is well-structured and has a set format, every app needs to inherit some of its configuration from values files. These files can add complexity by overriding one another or by conflicting with each other. When that happens, it can be difficult to determine the exact resulting state of an app.
It can also be difficult to determine which values file caused a change in the deployed application. One more critical issue: if all apps inherit values from one single file, a change in that file could have far-reaching effects on the cluster and even cause faults or service downtime.
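To make the override problem concrete, here is a minimal sketch of layered values files being merged, with a record of which file set each final value. It only approximates how Helm combines values (later files win and nested maps are merged); details such as null handling are ignored. The file names and keys are made up for illustration and the values are shown as plain dictionaries rather than parsed YAML.

```python
from typing import Any


def merge_values(
    layers: list[tuple[str, dict[str, Any]]]
) -> tuple[dict[str, Any], dict[str, str]]:
    """Deep-merge layers in order (later wins); record which layer set each leaf."""
    merged: dict[str, Any] = {}
    origin: dict[str, str] = {}

    def apply(target: dict[str, Any], source: dict[str, Any], layer: str, prefix: str) -> None:
        for key, value in source.items():
            path = f"{prefix}.{key}" if prefix else key
            if isinstance(value, dict):
                if not isinstance(target.get(key), dict):
                    # A mapping replaces a scalar (or a missing key).
                    target[key] = {}
                    origin.pop(path, None)
                apply(target[key], value, layer, path)
            else:
                if isinstance(target.get(key), dict):
                    # A scalar replaces a mapping: forget provenance below it.
                    for stale in [p for p in origin if p.startswith(path + ".")]:
                        del origin[stale]
                target[key] = value
                origin[path] = layer

    for layer_name, values in layers:
        apply(merged, values, layer_name, "")
    return merged, origin


# Hypothetical values files for one application, lowest priority first.
layers = [
    ("values.yaml", {"replicaCount": 1, "image": {"tag": "1.0.0", "pullPolicy": "IfNotPresent"}}),
    ("values-prod.yaml", {"replicaCount": 3, "image": {"tag": "1.4.2"}}),
]
merged, origin = merge_values(layers)
for path, layer in sorted(origin.items()):
    print(f"{path} <- {layer}")
```

Even with two files and three keys, the answer to "where did this value come from?" needs a tool. With several layers per application, doing this in your head is where mistakes start.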
Lack of separation of powers (DevOps and developers)
Within the GitOps world, a YAML or JSON file may contain an unhealthy mixture of properties - some are the responsibility of DevOps, and others are the responsibility of developers. DevOps will modify boilerplate repeated across multiple microservices, and developers will modify values and application-related properties (such as path, port or version). This combination may cause chaos for DevOps, who want to preserve structure and order, while the developer experience can be challenging and frustrating, since even the simplest actions require a context switch for the developer.
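One way to picture the split is a check that compares a configuration before and after a change and flags any edited property that is not developer-owned. The sketch below is illustrative only: the set of developer-owned keys and the configuration layout are assumptions, not something prescribed by GitOps or any specific tool.

```python
from typing import Any

_MISSING = object()

# Hypothetical list of properties that developers are expected to edit.
DEVELOPER_OWNED = {"image.tag", "service.port", "ingress.path"}


def flatten(config: dict[str, Any], prefix: str = "") -> dict[str, Any]:
    """Turn nested mappings into dotted paths, e.g. {"a": {"b": 1}} -> {"a.b": 1}."""
    flat: dict[str, Any] = {}
    for key, value in config.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def changed_paths(before: dict[str, Any], after: dict[str, Any]) -> list[str]:
    """Return the dotted paths whose value was added, removed or modified."""
    old, new = flatten(before), flatten(after)
    return sorted(
        path
        for path in old.keys() | new.keys()
        if old.get(path, _MISSING) != new.get(path, _MISSING)
    )


def outside_developer_scope(before: dict[str, Any], after: dict[str, Any]) -> list[str]:
    """Return changed paths that are not in the developer-owned set."""
    return [path for path in changed_paths(before, after) if path not in DEVELOPER_OWNED]


before = {
    "image": {"tag": "1.0.0"},
    "service": {"port": 8080},
    "resources": {"limits": {"cpu": "500m"}},
}
after = {
    "image": {"tag": "1.1.0"},
    "service": {"port": 8080},
    "resources": {"limits": {"cpu": "2"}},
}
print(outside_developer_scope(before, after))
```

Here the version bump is within the developer's remit, while the CPU limit change would be flagged for DevOps review.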
Time to merge pull requests
When everything is managed through code, the number of pull requests increases dramatically.
As time goes by, the number of repositories (or branches) grows with every environment and application you add. When you multiply that by the number of microservices you own, you get a proliferation of repositories, each with its own pull requests, opened either through automation or by a developer.
In this case, just managing the mapping of which team should approve which pull requests, based on which files were changed, quickly becomes challenging.
To improve that, you sometimes want to allow automatic approval for specific use cases. But because the number of GitOps use cases is significant, it’s tough to manage an effective approval process across a big organization, especially if you are under SOC2 or FedRAMP compliance, whose change control policies limit bulk pull request approvals.
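The mapping from changed files to approving teams can be sketched as a small rule table. This is a simplified illustration, not how any particular Git provider resolves code owners: the patterns, team names and fallback team are hypothetical, and matching uses Python's `fnmatch`, where `*` also matches `/`.

```python
from fnmatch import fnmatchcase

# Hypothetical rules: (file pattern, team that must approve).
APPROVAL_RULES = [
    ("charts/*", "platform-team"),
    ("services/payments/*", "payments-team"),
    ("*.tf", "infra-team"),
]


def required_approvers(
    changed_files: list[str],
    rules: list[tuple[str, str]] = APPROVAL_RULES,
    fallback: str = "devops-team",
) -> set[str]:
    """Collect every team whose pattern matches at least one changed file."""
    approvers: set[str] = set()
    for path in changed_files:
        teams = {team for pattern, team in rules if fnmatchcase(path, pattern)}
        # Files that no rule covers still need someone to approve them.
        approvers |= teams or {fallback}
    return approvers


pull_request_files = [
    "services/payments/values-prod.yaml",
    "infra/network/main.tf",
    "README.md",
]
print(sorted(required_approvers(pull_request_files)))
```

A three-file pull request already needs three teams. Every new repository, environment or file type adds rows to this table, which is why the mapping becomes hard to maintain at scale.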
Differences between Git and the actual state of the world
Git is a declarative source of truth and is made for manual editing and resolving conflicts. But as the production environment often changes through the CI process, production reality can say one thing while Git says something else. On the DevOps side, understanding why this is the case isn’t simple, and fixing it may be quite a hassle for the team that owns the GitOps process. Similarly, on the developer side, if Git leads a developer to assume a production reality that doesn’t exist, the end result will be a lot of wasted work and lower developer productivity.
For example, a configuration derived from the Git files specifies that a microservice should have a minimum of X containers, but in order to resolve a production error, the DevOps team scaled up the microservice and did not update the Git files. In this case, the value in Git does not reflect the actual state of the infrastructure.
This will result in a constant barrage of questions, support issues, and other interruptions for the DevOps team.
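The scaling example can be expressed as a simple drift check that compares the state declared in Git with the state observed in production. This is a sketch under assumptions: both states are given as plain dictionaries, and the service names, field names and numbers are invented. In practice the live state would come from the cluster and the desired state from rendered manifests.

```python
from typing import Any, NamedTuple


class Drift(NamedTuple):
    service: str
    field: str
    in_git: Any
    live: Any


def find_drift(
    desired: dict[str, dict[str, Any]], live: dict[str, dict[str, Any]]
) -> list[Drift]:
    """Report every declared field whose live value differs from the value in Git."""
    drifts: list[Drift] = []
    for service, spec in desired.items():
        actual = live.get(service, {})
        for field, wanted in spec.items():
            found = actual.get(field)
            if found != wanted:
                drifts.append(Drift(service, field, wanted, found))
    return drifts


# Hypothetical state: Git says 2 replicas, but someone scaled production to 6.
desired_state = {
    "payments": {"minReplicas": 2, "image": "payments:1.4.2"},
    "search": {"minReplicas": 3},
}
live_state = {
    "payments": {"minReplicas": 6, "image": "payments:1.4.2"},
    "search": {"minReplicas": 3},
}
for drift in find_drift(desired_state, live_state):
    print(f"{drift.service}.{drift.field}: Git says {drift.in_git}, production has {drift.live}")
```

Surfacing this difference somewhere developers can see it is what removes the need to ask the DevOps team which value is the real one.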
Lack of clarity
The theory is that GitOps will provide high visibility since all intended states exist in Git, where you can see the state of affairs. This is true but only if you’re not experiencing repository proliferation. In most cases you’ll have many repositories and many configuration files. This will make it difficult to answer questions relating to microservice deployments, since the Git repository changes will be difficult to make sense of.
GitOps is not for everyone
Engineers come in different shapes and sizes, and with different levels of expertise. Some developers are DevOps-oriented and have no problem interacting directly with infrastructure. Others are more application-oriented and don’t know much about DevOps or aren’t excited about it. For the latter group, GitOps isn’t smooth sailing, especially if they used to work on on-premises software and have only now begun moving to the cloud.
GitOps is here to stay for many good reasons, but the problem of developer experience and cognitive load remains. One solution is to create a simplified interface over GitOps to reduce developers’ cognitive load and reduce some of the issues with scaling GitOps over large engineering environments.
Developer Portals were introduced to our lives a few years ago by Spotify, whose Internal Developer Portal was released as an open-source project called Backstage.
The origin story for Backstage is that as DevOps matures, the developer stack may become more sophisticated but also more complex and fragmented. Developer Portals simplify developers’ lives by offering a software catalog: a single pane of glass that includes everything they need to operate daily. This reduces cognitive load and the need to track tribal knowledge. Some of that knowledge can be GitOps knowledge: GitOps enables automation and efficiency but also requires expertise. A developer portal can allow the beneficial use of GitOps while abstracting away its complexities, letting users perform self-service actions without issues and with the proper guardrails in place.
In Port, self-service actions let developers provision, terminate and perform day 2 operations on any asset exposed in the software catalog (microservice or not), within the policies and guardrails you’ve set. Developers can also provision a dev environment, request permission for an S3 bucket or add a secret to a microservice. Once the developer acts, the Developer Portal automatically initiates a GitOps process via a commit to the relevant repository.
How GitOps works better with Internal Developer Portals
An internal developer portal puts a user interface on top of GitOps. A user interface keeps developers on the golden path and has input validation built in. This makes mistakes less likely and helps developers get what they need with little cognitive load. They aren’t exposed to the Git files and don’t need to consider them.
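As an illustration of what built-in input validation means, the sketch below checks a self-service request before anything is written to Git. The request shape, the naming rule, the environment list and the replica limits are all assumptions made up for this example. A real portal would define them in its action schema.

```python
import re
from dataclasses import dataclass

# Hypothetical guardrails for a "scale service" action.
ALLOWED_ENVIRONMENTS = ("dev", "staging", "production")
SERVICE_NAME = re.compile(r"[a-z][a-z0-9-]{1,38}[a-z0-9]")
MAX_REPLICAS = 20


@dataclass(frozen=True)
class ScaleRequest:
    service: str
    environment: str
    replicas: int


def validate(request: ScaleRequest) -> list[str]:
    """Return a list of human-readable problems; an empty list means the input is valid."""
    errors: list[str] = []
    if not SERVICE_NAME.fullmatch(request.service):
        errors.append("service must be lowercase letters, digits and hyphens")
    if request.environment not in ALLOWED_ENVIRONMENTS:
        errors.append(f"environment must be one of {', '.join(ALLOWED_ENVIRONMENTS)}")
    if not 1 <= request.replicas <= MAX_REPLICAS:
        errors.append(f"replicas must be between 1 and {MAX_REPLICAS}")
    return errors


print(validate(ScaleRequest("payments-api", "staging", 3)))
print(validate(ScaleRequest("Payments_API", "prod", 0)))
```

The developer sees the errors in the form, before a commit exists, instead of discovering them in a failed pipeline or a rejected pull request.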
The best approach is hybrid. You want basic and recurring operations to be done via the developer portal, and more complex changes that require human input to be made directly in Git. You do need to ensure that changes made by a human are also reflected in the developer portal.
Developer portals are also a way to skip the pull request wait - the pre-defined self service actions that developers can do in the portal can actually transform three pull requests into one click.
Most self-service actions limit the freedom of developers to make mistakes - and this ensures that GitOps files become (relatively) error-proof. Additionally, by providing advanced role-based access control, you can define who can approve what, or even skip approvals in some cases.
An internal developer portal tracks changes. As a result, error messages can indicate which self-service action or software catalog change caused an issue. This makes troubleshooting simpler, and developers no longer need to interact directly with the GitOps files.
This can’t happen all on its own. Here are some best practice recommendations for using GitOps with developer portals:
Make sure that GitOps changes and their state are reflected in the internal developer portal, whether the changes happened through self-service actions or through a hybrid approach (in which case manual steps must also be reflected in the developer portal).
Consolidate self-service actions: if a GitOps operation needs several file changes, try to expose it as a single self-service action in the internal developer portal.
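A consolidated action can be sketched as one function that takes a handful of inputs and produces every file change the operation needs, ready to be written in a single commit. The repository layout, file names and fields below are hypothetical, and JSON is used only to keep the example free of third-party dependencies.

```python
import json


def plan_new_service(name: str, team: str, port: int) -> dict[str, str]:
    """Return all file changes for onboarding a service, as path -> file content."""
    application = {"name": name, "source": f"apps/{name}", "valuesFile": "values.json"}
    values = {"image": {"tag": "latest"}, "service": {"port": port}}
    catalog_entry = {"identifier": name, "owner": team, "type": "microservice"}
    return {
        f"apps/{name}/application.json": json.dumps(application, indent=2),
        f"apps/{name}/values.json": json.dumps(values, indent=2),
        f"catalog/{name}.json": json.dumps(catalog_entry, indent=2),
    }


# One form submission in the portal yields three coordinated file changes.
changes = plan_new_service(name="payments-api", team="payments-team", port=8080)
for path in sorted(changes):
    print(path)
```

Because the three files are generated together from the same inputs, they cannot disagree with each other, which is the main benefit of consolidating the action.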