This document describes the architecture of the Contain Platform using the
principles of the C4 model. The C4 model helps communicate software
architecture at different levels of abstraction, making it understandable for
various audiences.
We use the C4 levels to "zoom in" on the platform's architecture:
Level 1: System Context: Shows how the platform fits into a typical
organization, its users, and its high-level system interactions.
Level 2: Architectural Views: Decomposes the platform into its largest
parts from different perspectives.
Level 3: Components: Zooms into individual sub-systems to show their key
components.
The System Context Overview diagram provides a simplified, high-level view of
how the Contain Platform fits into a typical customer's environment. It shows
the key players and systems involved, treating the platform as a single "black
box" to clarify its main purpose and interactions.
Application Observability Service
Application Observability is an add-on service.
While we utilize an observability system for managing and monitoring the
platform itself, we also provide a fully managed observability service for
customer applications.
Diagram Explanation
Developer: Builds and operates the customer's business applications and
uses the Application Observability Service to improve application
performance and stability.
Customer: Owns and operates business applications and can use the
Application Observability Service to monitor them.
Customer Applications: The new, value-creating applications and
services that run on the platform.
Contain Platform: The platform is shown as a set of high-level
systems:
Contain Base: A managed environment for running containerized
applications.
Application Observability: Provides insights into the health and
performance of applications.
Managed Services: Offers a range of services to support
applications.
Contain Operations: Manages, monitors, and maintains the platform.
Contain Platform Engineers: Develop and extend the platform.
This diagram focuses on how a customer's development team interacts with the
Contain Platform and their own systems. It shows the key relationships between
developers, the applications they build, and the platform services they consume.
Diagram Explanation
Developer: Builds and operates business applications on the Contain
Platform.
Customer Applications: New, value-creating applications and services
built by the customer.
Customer Systems: Existing customer systems that the platform
integrates with, such as SIEM, monitoring tools, and other applications.
Customer IdP: The customer's Identity Provider (e.g., Entra ID,
Keycloak) for user authentication.
Contain Platform: The system itself, composed of:
Contain Base: Provides a secure, managed environment for running
applications.
Application Observability: Provides insights into the health and
performance of customer applications.
This diagram shows the system context from our perspective, illustrating how our
internal teams manage and develop the platform. It highlights the separation
between our operational responsibilities and the customer's environment.
Diagram Explanation
Contain Operations: The team responsible for the 24/7 monitoring,
management, and maintenance of the platform's health and security.
Contain Platform Engineer: The team responsible for developing new
features and extending the capabilities of the platform.
Customer Applications: The applications and services built by the
customer, which run on the platform.
Contain Platform: The system itself, composed of:
Contain Base: The managed environment where customer applications
run.
Platform Observability: Our internal system for monitoring the
health of the core platform components.
Application Observability: The service that provides observability
for customer applications.
Infrastructure: The underlying cloud or on-premise resources (e.g.,
servers, networking) where the platform is deployed.
Level 2 diagrams "zoom in" on the Contain Platform to show its high-level
structure. We can view the platform's architecture from different perspectives.
This first view decomposes the platform into its major logical sub-systems,
showing how they work together to provide the platform's capabilities.
Diagram Explanation
Management Plane: The central control point that manages the lifecycle
of the clusters and offers services to them.
Workload Plane: One or more Kubernetes clusters where customer
applications are deployed and run.
Observability Plane: Systems that collect and analyze telemetry
(metrics, logs, traces). This includes both our own internal
observability systems and the Application
Observability add-on service.
Managed Services: A suite of managed services like databases and object
storage.
This second view shows the platform's architecture from the perspective of its
physical cluster topology. The platform is composed of several specialized
Kubernetes clusters that work together.
Diagram Explanation
Workload Cluster(s): The clusters where applications run. These run
the Contain Base service and other managed services.
Application Observability (if enabled): A service that provides
a monitoring stack for customer applications.
Management Services: Various services used by the clusters (image
registry, IdP, etc.).
Managed Services: A suite of managed services like databases and object
storage.
This third view shows the platform's architecture from the perspective of its
physical infrastructure in a single datacenter. It illustrates how a workload
cluster is deployed within an isolated network environment on the underlying
servers or virtual machines.
In reality, the cluster will span multiple datacenters, Availability Zones,
or Availability Cells. For the sake of simplicity, this view focuses on a
single datacenter.
Diagram Explanation
Infrastructure Layer: The physical or virtual servers that host the
platform.
Network Segment / Security Group: An isolated network environment for
each cluster, providing security and separation.
Workload Cluster: Decomposed into its control plane and worker nodes.
Control Plane: A set of three nodes for high availability.
Worker Nodes: A minimum of three nodes (usually 6 or more for
production environments) that run the customer applications.
Level 3 diagrams zoom into a specific sub-system to show its internal
components. These are useful for understanding the details of a particular
capability.
This diagram details the major logical systems inside the Workload
Plane, which consists of one or more Kubernetes clusters. It shows the Kubernetes
Control Plane, the GitOps controller that manages the cluster state, and where
customer applications run.
Contain Base is our fully managed and
production-ready Kubernetes service that serves as the central orchestration
engine for containerized applications. It's a curated, "batteries-included"
distribution that extends upstream Kubernetes with a set of best-in-class,
open-source components, pre-integrated and hardened by our team.
This diagram shows the key components that are included in every cluster as part
of the Contain Base service.
Diagram Explanation
Automation & Delivery: Using GitOps to manage the cluster state.
Security & Governance: Enforcing policies and managing secrets and
certificates.
Networking: Controlling ingress traffic and managing DNS.
Operations & Resilience: Providing backup/restore capabilities and
resource metrics.
CSI, CNI, CPI: Providing storage, networking, and cloud provider
integration.
The Management Plane provides a set of centralized, shared services that are
used by the workload clusters and the operations team. These services are
essential for the security, automation, and management of the entire platform.
This diagram shows the key services that make up the Management Plane.
Diagram Explanation
Identity Provider (IdP): A centralized service for managing user and
service authentication.
Container Registry: A private registry for storing and distributing
container images, with integrated vulnerability scanning.
Git Server: The source of truth for the GitOps-driven cluster and
application configuration.
DNS Provider: Manages the DNS records for services running in the
workload clusters.
Secret Store: A secure, centralized location for managing secrets and
other sensitive data.
PKI Service: Manages the Public Key Infrastructure, including issuing
and renewing certificates.
Image Scanner: Scans container images for known vulnerabilities.
The platform is built on a GitOps workflow, where Git is the single source of
truth for both application and infrastructure configuration. This diagram
illustrates the end-to-end flow, from a developer pushing code to a new version
of the application running in the cluster.
Diagram Explanation
A developer pushes code changes to a Git repository.
The push triggers a CI/CD pipeline that builds and tests the application,
then pushes a new container image to the registry.
The pipeline updates a configuration file in the Git repository with the
new image tag.
The GitOps controller (Flux), running in the cluster, detects the change
in the repository.
Flux applies the new configuration to the cluster, which then pulls the
new image and deploys the updated application.
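The reconciliation at the heart of this workflow can be sketched as a simple loop: compare the desired state declared in Git with the state observed in the cluster, and apply the difference. The following is a toy Python model of that idea, not Flux's actual implementation; the dictionaries stand in for manifests in Git and resources applied to the cluster:

```python
def reconcile(git_repo, cluster):
    """One pass of a GitOps-style reconciliation loop (illustrative only).

    git_repo: resource name -> desired manifest (the source of truth in Git)
    cluster:  resource name -> manifest currently applied to the cluster
    Returns the names of resources that were (re)applied.
    """
    changed = []
    for name, desired in git_repo.items():
        if cluster.get(name) != desired:   # new resource or drift detected
            cluster[name] = desired        # stand-in for 'kubectl apply'
            changed.append(name)
    return changed


# Example: the CI pipeline has bumped the image tag in Git.
git = {"app-deployment": {"image": "registry.example/app:v2"}}
live = {"app-deployment": {"image": "registry.example/app:v1"}}
print(reconcile(git, live))  # -> ['app-deployment']
print(reconcile(git, live))  # -> [] (cluster now matches Git)
```

Running the loop a second time applies nothing, which is the key property of the GitOps model: reconciliation is idempotent, and the cluster converges on whatever Git declares.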
The platform provides a centralized observability solution based on the Grafana
LGTM stack (Loki for logs, Grafana for visualization, Tempo for traces, and
Mimir for metrics). When the Application Observability
Service is enabled for a workload cluster, telemetry
agents are deployed to collect and forward data to the central observability
plane.
This diagram illustrates the flow of telemetry data from an instrumented
application to the observability platform, where a developer can analyze it.
Observability Plane
The Observability Plane covers all services related to observability. While
the diagram shows one system, in reality it consists of multiple
observability stacks. For instance, to avoid resource contention, we deploy
our own internal stack for monitoring the platform itself. For more
information, see Observability.
Diagram Explanation
Developer: Instruments their application using OpenTelemetry SDKs and
uses Grafana to view dashboards, logs, traces, and metrics.
Customer Application: The customer's application is instrumented with
OpenTelemetry to generate telemetry data (logs, metrics, traces), which is
either pulled by collectors or pushed to them.
Telemetry Collectors: Collect the telemetry data either by scraping
customer applications or by receiving pushes from them.
Observability Plane: Dedicated clusters hosting the LGTM stack, which
receives, stores, and visualizes the telemetry data.
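The two collection modes above can be illustrated with a toy collector. This is a simplified Python model, not the real OpenTelemetry Collector API; `scrape` stands in for a Prometheus-style pull and `receive_push` for an OTLP-style push:

```python
class TelemetryCollector:
    """Toy collector illustrating pull vs. push collection (illustrative only)."""

    def __init__(self):
        self.buffer = []  # telemetry records bound for the Observability Plane

    def scrape(self, app):
        """Pull mode: the collector polls the application's metrics endpoint."""
        self.buffer.extend(app.expose_metrics())

    def receive_push(self, records):
        """Push mode: the application sends telemetry to the collector."""
        self.buffer.extend(records)

    def flush(self):
        """Forward buffered records to the central LGTM stack, then clear."""
        shipped, self.buffer = self.buffer, []
        return shipped


class App:
    """Hypothetical instrumented application exposing a metrics endpoint."""

    def expose_metrics(self):
        return [{"name": "http_requests_total", "value": 42}]


collector = TelemetryCollector()
collector.scrape(App())                                      # pull
collector.receive_push([{"name": "span", "trace": "abc"}])   # push
print(len(collector.flush()))  # -> 2
```

Either way, the collector decouples the application from the Observability Plane: the application only ever talks to a local collector, which batches and forwards to the central stack.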
While direct access to the Kubernetes API can be made available, we encourage
teams to interact with the Contain Platform primarily through the GitOps
workflow for deployments and the observability platform for monitoring. This
approach provides a more secure, auditable, and collaborative environment.
Direct access may not always be available, depending on the specific security
and compliance requirements of the organization.
When direct access is required, the platform authenticates users via our Managed
Identity Provider (IdP), which can optionally federate with the organization's
existing IdP (like Entra ID or Okta).
This diagram shows a simplified, high-level view of the authentication flow from
a user's perspective.
Diagram Explanation
User Authenticates to Cluster: The user initiates an action against
the Workload Cluster (e.g., via kubectl or a UI).
Cluster Redirects to Managed IdP: The cluster redirects the user's
authentication request to our Managed IdP.
Managed IdP Federates (Optional): If configured, our IdP federates
with the customer's own IdP, allowing the user to authenticate with their
corporate credentials.
User Receives Token: After successful authentication, the user
receives a token that the client can use.
User Accesses Cluster: The user's client uses the token to securely
access the Workload Cluster.
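The flow above can be condensed into a small sketch. The `IdP` class and `authenticate` function below are hypothetical toy stand-ins (a real flow uses browser redirects and signed JWTs); what the sketch preserves is the key point that the Managed IdP always issues the final token, even when credentials are verified upstream:

```python
class IdP:
    """Toy identity provider (illustrative, not a real OIDC implementation)."""

    def __init__(self, name, users):
        self.name = name
        self.users = users  # username -> set of group names

    def verify(self, username):
        """Check credentials and return the authenticated identity."""
        if username not in self.users:
            raise PermissionError(f"{username} unknown to {self.name}")
        return {"sub": username, "groups": sorted(self.users[username])}


def authenticate(username, managed_idp, customer_idp=None):
    """Federate to the customer IdP when one is configured, otherwise
    authenticate against the Managed IdP directly. Either way, the
    Managed IdP issues the token the cluster accepts."""
    identity = (customer_idp or managed_idp).verify(username)
    return {"iss": managed_idp.name, **identity}  # stand-in for a signed ID token


corp = IdP("entra-id", {"alice": {"platform-admins"}})
managed = IdP("managed-idp", {})
token = authenticate("alice", managed, customer_idp=corp)
print(token["iss"], token["groups"])  # -> managed-idp ['platform-admins']
```

Note that the issuer is always the Managed IdP, so the cluster only ever needs to trust one token issuer regardless of where the user's credentials actually live.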
Once a user is authenticated, the platform uses Kubernetes Role-Based Access
Control (RBAC) to determine what actions they are allowed to perform.
Authorization is based on the group claims present in the user's ID Token (JWT),
which is provided by the Identity Provider. These groups are then mapped to
Kubernetes Roles or ClusterRoles via RoleBindings or
ClusterRoleBindings.
This diagram illustrates how the Kubernetes API server uses the information in a
user's ID token to make an authorization decision.
Diagram Explanation
User Makes Request: An authenticated user makes a request to the
Kubernetes API, presenting their ID Token.
API Checks RBAC Policies: The API server extracts the user's identity
and group membership from the token. It then checks the stored RBAC
policies to find any roles bound to that user or their groups.
Allow or Deny: Based on the permissions defined in the associated
roles, the API server either allows or denies the user's request.
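The check described above can be sketched as follows. This is a deliberately simplified model of RBAC (real Kubernetes RBAC also distinguishes namespaces, ClusterRoles, apiGroups, and resourceNames); the role and binding structures are toy stand-ins for Role and RoleBinding objects:

```python
def authorize(token, role_bindings, roles, verb, resource):
    """Simplified sketch of a Kubernetes-style RBAC decision.

    token:         decoded ID token with 'sub' and 'groups' claims
    role_bindings: list of {'subject': ..., 'role': ...} (toy RoleBindings)
    roles:         role name -> set of allowed (verb, resource) pairs
    """
    # The request is attributed to the user and every group they belong to.
    subjects = {token["sub"], *token.get("groups", [])}
    for binding in role_bindings:
        if binding["subject"] in subjects:
            if (verb, resource) in roles[binding["role"]]:
                return True  # some bound role grants the permission
    return False  # deny by default


roles = {"pod-reader": {("get", "pods"), ("list", "pods")}}
bindings = [{"subject": "dev-team", "role": "pod-reader"}]
token = {"sub": "alice", "groups": ["dev-team"]}
print(authorize(token, bindings, roles, "list", "pods"))    # -> True
print(authorize(token, bindings, roles, "delete", "pods"))  # -> False
```

The deny-by-default return at the end mirrors RBAC's additive model: permissions can only be granted by a role, never subtracted, so a request with no matching role binding is always rejected.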
By default, the platform's backup and recovery solution covers the
Kubernetes resources and configuration within your cluster. However, it does
not automatically back up the data stored within your application's
Persistent Volumes (PVs). To ensure your application data is protected, you
must explicitly use our managed backup service for your stateful
applications. Read more about this here.
The platform provides a robust backup and recovery solution using Velero, an
open-source tool for safely backing up and restoring resources in a Kubernetes
cluster. Velero runs in each workload cluster and is configured to store backups
in a dedicated, S3-compatible object storage bucket located in the Management
Plane.
This diagram illustrates the high-level backup and restore flow.
Diagram Explanation
Backup Flow
Backup Initiated: An automated backup schedule or a manual trigger
initiates a backup job.
Velero Queries API: Velero queries the Kubernetes API server to get a
list of the resources to be backed up.
Velero Backs Up Resources: Velero creates a backup file of the
Kubernetes objects (Deployments, Services, etc.).
Velero Backs Up PVs: For any Persistent Volumes (PVs), Velero
performs a file-system level backup of the data within the volume.
Backup Stored in S3: The Kubernetes object backup and the PV
file-system backup are stored in the S3-compatible object storage bucket
in the Management Plane.
Restore Flow
Restore Initiated: An operator initiates a restore job, specifying
the backup to restore from.
Velero Retrieves Backup: Velero retrieves the Kubernetes object
backup and the PV file-system backup from the S3 bucket.
Velero Restores Resources: Velero restores the Kubernetes objects and
creates new Persistent Volumes. Velero then restores the file-system data
into the new volumes, bringing the application back to its previous
state.
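The two flows above can be summarized in a small sketch. This is a toy model, not Velero's API; the `include_pv_data` flag mirrors the default behaviour described in the note above, where Kubernetes objects are always captured but PV data is only backed up when the managed backup service is explicitly used:

```python
def backup(cluster, object_store, backup_name, include_pv_data=False):
    """Toy backup flow: snapshot Kubernetes objects, and optionally PV data,
    into an S3-like object store (illustrative only, not Velero's API)."""
    snapshot = {"objects": dict(cluster["objects"])}
    if include_pv_data:
        # Stand-in for Velero's file-system level backup of each volume.
        snapshot["pv_data"] = dict(cluster["pv_data"])
    object_store[backup_name] = snapshot


def restore(object_store, backup_name):
    """Toy restore flow: rebuild objects and PV contents from a stored backup."""
    snapshot = object_store[backup_name]
    return {
        "objects": dict(snapshot["objects"]),
        # Empty if PV data was never captured -- the gap the note above warns about.
        "pv_data": dict(snapshot.get("pv_data", {})),
    }


store = {}
cluster = {"objects": {"deploy/app": "v1"}, "pv_data": {"pv-1": b"rows"}}
backup(cluster, store, "nightly", include_pv_data=True)
restored = restore(store, "nightly")
print(restored["pv_data"]["pv-1"])  # -> b'rows'
```

Running `backup` without `include_pv_data=True` restores the objects but an empty `pv_data`, which is exactly the failure mode the admonition at the top of this section warns about for stateful applications.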