The software catalog is a central metadata store for everything application-related in your engineering organization: CI/CD metadata, cloud resources, Kubernetes objects, services, and more. As described in Chapter 2, its building blocks are blueprints and relations.
More than a static inventory
Far from being just a “flat” repository for static metadata (e.g., ownership, logs), the software catalog is continuously updated and enriched with context based on your specific data model. Software catalogs deliver value to the dev organization in several key ways:
- Help developers answer critical questions (e.g., “What is the version of the checkout service in staging vs. production?”)
- Drive ownership and accountability. Port syncs with your identity provider so that software ownership reflects your actual team structure.
- Offer a “single pane of glass” into your services and applications
Let’s dive in.
Data models are driven by the needs of your “personas”
Imagine that you’re the platform engineering team designing the software catalog for your organization. How would you go about it?
You’d probably start by thinking about the different personas that will be using the software catalog. Developers on the team would likely want to abstract away, say, Kubernetes complexity – for them, the ideal view would include production services, vulnerabilities, and minimal Kubernetes detail. In contrast, DevOps users would want to understand the infrastructure more deeply, including cost and resource tradeoffs.
The point is that there’s no “one size fits all” answer to what data goes inside the catalog and the views that different team members will use. It depends on the user personas and their needs.
This is where different data models come in.
Base data models and extensions
Several base data models serve as foundational frameworks for structuring software catalogs. These models are designed to answer critical questions about common infrastructure and applications.
Here are some of the most common base models:
- Classic (aka SDLC) Model: This model encompasses three primary components: Service, Environment (a deployment target such as staging or production), and Running Service (the live version of a service in a specific environment). Its goal is to make it easy to understand the interdependencies in the infrastructure and how the SDLC tech stack comes together. This helps answer questions such as how cloud resources are connected to the test environment in which a given service runs, all within a larger domain and system.
- C4 Model: Port uses an adaptation of the Backstage C4 Model, which provides a hierarchical approach to visualize software architectures built around "Context, Containers, Components, and Code." Context reveals the software catalog's broader position in the ecosystem, Containers identify major components, Components delve into internal structures, and Code showcases low-level details.
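To make the classic (SDLC) model concrete, here is a minimal sketch of its blueprints and relations as plain Python dicts. The blueprint, property, and entity names are illustrative assumptions, not Port's exact schema; the point is that a running service relates a service to an environment, which is what lets the catalog answer questions like the staging-vs-production version question from earlier.

```python
# A minimal sketch of the classic (SDLC) data model as plain Python dicts.
# Blueprint and property names are illustrative, not Port's exact schema.

blueprints = {
    "service": {
        "properties": {"language": "string", "repo_url": "url"},
        "relations": {},
    },
    "environment": {
        "properties": {"type": "string"},  # e.g. "staging", "production"
        "relations": {},
    },
    "running_service": {
        # The live instance of a service in a specific environment.
        "properties": {"version": "string"},
        "relations": {"service": "service", "environment": "environment"},
    },
}

# Example entities: a hypothetical checkout service in two environments.
entities = [
    {"blueprint": "service", "id": "checkout", "language": "Go"},
    {"blueprint": "environment", "id": "staging", "type": "staging"},
    {"blueprint": "environment", "id": "production", "type": "production"},
    {"blueprint": "running_service", "id": "checkout-staging",
     "service": "checkout", "environment": "staging", "version": "1.4.0"},
    {"blueprint": "running_service", "id": "checkout-prod",
     "service": "checkout", "environment": "production", "version": "1.3.2"},
]

def versions_by_environment(entities, service_id):
    """Answer: which version of a service runs in each environment?"""
    return {
        e["environment"]: e["version"]
        for e in entities
        if e["blueprint"] == "running_service" and e["service"] == service_id
    }

print(versions_by_environment(entities, "checkout"))
# {'staging': '1.4.0', 'production': '1.3.2'}
```

In a real catalog these relations are stored in a graph, so the same traversal also works in the other direction (from an environment to everything running in it).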
And here are some extensions to base data models:
Primarily for DevOps
- Kubernetes (K8s): This expands the data model to represent all K8s data around infrastructure and deployment, using Kubernetes objects such as Pods, Deployments, Services, and ConfigMaps to define system state and management.
- API Catalog: Adding API data where each API endpoint is part of the catalog, alongside its authentication, formats, usage guidelines, versioning, deprecation status, and documentation. This can support API route tracking and health monitoring.
- Cloud Resources: Expanding the model to encompass the entire technology stack involves representing both software components and the underlying cloud resources that support them. This approach provides a unified view of the software's technical context within the broader cloud environment.
- CI/CD: Including information about CI/CD pipelines and related tools augments the data model's scope. This offers a complete representation of end-to-end development and deployment workflows, enabling efficient management of software release processes.
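As a rough illustration of the Kubernetes extension above, an exporter might flatten a few K8s objects into catalog entities. The field names below are simplified assumptions for the sketch; a real exporter would read full objects from the Kubernetes API.

```python
# Illustrative sketch: mirroring Kubernetes objects into catalog entities.
# Field names are simplified assumptions, not a real exporter's schema.

k8s_objects = [
    {"kind": "Deployment", "name": "checkout", "namespace": "prod",
     "replicas": 3, "image": "registry.local/checkout:1.3.2"},
    {"kind": "Service", "name": "checkout", "namespace": "prod",
     "ports": [443]},
]

def to_entities(objects):
    """Key each raw K8s object by a stable kind/namespace/name identifier."""
    return {
        f'{o["kind"].lower()}/{o["namespace"]}/{o["name"]}': o
        for o in objects
    }

catalog = to_entities(k8s_objects)
print(sorted(catalog))
# ['deployment/prod/checkout', 'service/prod/checkout']
```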
For all developers
- Packages & Libraries: Extending the model to include packages and libraries facilitates improved software dependency management. This is crucial for maintaining software integrity and security by tracking and overseeing dependencies effectively.
- Vulnerabilities: Integrating security vulnerability information into the data model enables the identification and management of vulnerabilities present in software components or packages, bolstering security measures and risk mitigation.
- Incident Management: Integrating Incident Management information, such as data from tools like PagerDuty or OpsGenie, extends the data model to handle incidents, outages, and response processes. This inclusion provides a comprehensive view of how the software ecosystem responds to and recovers from unexpected events, contributing to overall reliability and rapid issue resolution. It also provides the ability to track historic incidents and create an internal knowledge base of incidents and their resolution.
- Alerts: Incorporating alerts into the data model provides timely insights into system performance, security, and health. This proactive feature empowers teams to take swift actions, ensuring a stable and reliable software ecosystem.
- Tests: Expanding the model to encompass tests, their status, and associated metadata creates a centralized view of testing efforts. This aids in monitoring testing progress, identifying bottlenecks, and promoting efficient quality assurance.
- Feature Flags: Bringing in data from external feature flag management systems allows for controlled and visible management of application features. This fosters an iterative, data-driven approach to feature deployment, enhancing flexibility and adaptability.
- Misconfigurations: Addressing misconfigurations by integrating them into the model helps prevent security vulnerabilities, performance issues, and operational inefficiencies. This ensures the software's operational health and stability.
- FinOps: Adding FinOps cloud cost data into your portal will instantly map it to developers, teams, microservices, systems, and domains. This simplifies the data, letting you easily break down costs by service, team, customer, or environment – helping FinOps, DevOps, and platform engineering teams efficiently manage costs and optimize spending without spending hours on basic reporting.
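To show what one such extension looks like in practice, here is a hedged sketch of the vulnerabilities extension: a "vulnerability" blueprint whose entities relate back to a service, which lets the catalog roll vulnerabilities up per owner. The identifiers and severity values are hypothetical.

```python
# Illustrative sketch of one extension: "vulnerability" entities that
# relate back to a service. Identifiers and severities are hypothetical.

vulnerabilities = [
    {"identifier": "CVE-A", "severity": "high", "service": "checkout"},
    {"identifier": "CVE-B", "severity": "low",  "service": "checkout"},
    {"identifier": "CVE-C", "severity": "high", "service": "billing"},
]

def high_severity_per_service(vulns):
    """Count high-severity vulnerabilities per related service."""
    counts = {}
    for v in vulns:
        if v["severity"] == "high":
            counts[v["service"]] = counts.get(v["service"], 0) + 1
    return counts

print(high_severity_per_service(vulnerabilities))
# {'checkout': 1, 'billing': 1}
```

The same relation-then-aggregate pattern applies to most of the extensions above, whether the related entities are incidents, test runs, or cloud cost line items.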
The software catalog should be stateful and updated in real time
There are two important final considerations in the design of your software catalog.
First, your data model should support a stateful representation of the environment. For example, the running service in the classic model (see above) reflects the real world, where “live” services are deployed to several environments, such as development, staging, or production. This is critical for providing context. (Again: your code isn’t your app.)
Second: the software catalog should sync in real time with data sources to provide a live graph of your services and applications. For example, integrating with CI/CD tools keeps your software catalog up to date by pushing updates directly from your pipelines.
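A hedged sketch of what such a push might look like: a CI/CD step that, after each deploy, upserts the matching running-service entity over HTTP. The endpoint path and payload shape here are assumptions for illustration, not any specific product's API.

```python
# Sketch of a CI/CD step keeping the catalog current: after each deploy,
# upsert the matching running-service entity. The endpoint path and
# payload shape are assumptions, not a specific product's API.
import json
import urllib.request

def build_deployment_entity(service, environment, version):
    """Build the entity payload describing this deploy (pure function)."""
    return {
        "identifier": f"{service}-{environment}",
        "blueprint": "running_service",
        "properties": {"version": version},
        "relations": {"service": service, "environment": environment},
    }

def report_deployment(base_url, token, entity):
    """PUT the entity to a (hypothetical) catalog API after a deploy."""
    req = urllib.request.Request(
        f"{base_url}/entities/{entity['identifier']}",  # assumed route
        data=json.dumps(entity).encode(),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="PUT",
    )
    return urllib.request.urlopen(req)  # caller handles the response
```

Because the deploy job is the system of record for what just shipped, pushing from the pipeline keeps the catalog's graph live without polling.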