This document describes the architecture of the Contain Platform using the
principles of the C4 model. The C4 model helps communicate software
architecture at different levels of abstraction, making it understandable for
various audiences.
We use the C4 levels to "zoom in" on the platform's architecture:
Level 1: System Context: Shows how the platform fits into a typical
organization, its users, and its high-level system interactions.
Level 2: Architectural Views: Decomposes the platform into its largest
parts from different perspectives.
Level 3: Components: Zooms into individual sub-systems to show their key
components.
The System Context Overview diagram provides a simplified, high-level view of
how the Contain Platform fits into a typical customer's environment. It shows
the key players and systems involved, treating the platform as a single "black
box" to clarify its main purpose and interactions.
Application Observability Service
Application Observability is an add-on service.
While we utilize an observability system for managing and monitoring the
platform itself, we also provide a fully managed observability service for
customer applications.
Diagram Explanation
Developer: Builds and operates the customer's business applications and
uses the Application Observability Service to improve application
performance and stability.
Customer: Owns and operates business applications and can use the
Application Observability Service to monitor them.
Customer Applications: The new, value-creating applications and
services that run on the platform.
Contain Platform: The platform is shown as a set of high-level
systems:
Contain Base: A managed environment for running containerized
applications.
Application Observability: Provides insights into the health and
performance of applications.
Managed Services: Offers a range of services to support
applications.
Contain Operations: Manages, monitors, and maintains the platform.
Contain Platform Engineers: Develop and extend the platform.
This diagram focuses on how a customer's development team interacts with the
Contain Platform and their own systems. It shows the key relationships between
developers, the applications they build, and the platform services they consume.
Diagram Explanation
Developer: Builds and operates business applications on the Contain
Platform.
Customer Applications: New, value-creating applications and services
built by the customer.
Customer Systems: Existing customer systems that the platform
integrates with, such as SIEM, monitoring tools, and other applications.
Customer IdP: The customer's Identity Provider (e.g., Entra ID,
Keycloak) for user authentication.
Contain Platform: The system itself, composed of:
Contain Base: Provides a secure, managed environment for running
applications.
Application Observability: Provides insights into the health and
performance of customer applications.
This diagram shows the system context from our perspective, illustrating how our
internal teams manage and develop the platform. It highlights the separation
between our operational responsibilities and the customer's environment.
Diagram Explanation
Contain Operations: The team responsible for the 24/7 monitoring,
management, and maintenance of the platform's health and security.
Contain Platform Engineer: The team responsible for developing new
features and extending the capabilities of the platform.
Customer Applications: The applications and services built by the
customer, which run on the platform.
Contain Platform: The system itself, composed of:
Contain Base: The managed environment where customer applications
run.
Platform Observability: Our internal system for monitoring the
health of the core platform components.
Application Observability: The service that provides observability
for customer applications.
Infrastructure: The underlying cloud or on-premise resources (e.g.,
servers, networking) where the platform is deployed.
Level 2 diagrams "zoom in" on the Contain Platform to show its high-level
structure. We can view the platform's architecture from different perspectives.
This first view decomposes the platform into its major logical sub-systems,
showing how they work together to provide the platform's capabilities.
Diagram Explanation
Management Plane: The central control point that manages the lifecycle
of the clusters and offers services to them.
Workload Plane: One or more Kubernetes clusters where customer
applications are deployed and run.
Observability Plane: Systems that collect and analyze telemetry
(metrics, logs, traces). This includes both our own internal
observability systems and the Application
Observability add-on service.
Managed Services: A suite of managed services like databases and object
storage.
This second view shows the platform's architecture from the perspective of its
physical cluster topology. The platform is composed of several specialized
Kubernetes clusters that work together.
Diagram Explanation
Workload Cluster(s): The clusters where applications run. These run
the Contain Base service and other managed services.
Application Observability (if enabled): A service that provides
a monitoring stack for customer applications.
Management Services: Various services used by the clusters (image
registry, IdP, etc.).
Managed Services: A suite of managed services like databases and object
storage.
This third view shows the platform's architecture from the perspective of its
physical infrastructure in a single datacenter. It illustrates how a workload
cluster is deployed within an isolated network environment on the underlying
servers or virtual machines.
In reality, the cluster will span multiple datacenters, Availability Zones,
or Availability Cells. For the sake of simplicity, this view focuses on a
single datacenter.
Diagram Explanation
Infrastructure Layer: The physical or virtual servers that host the
platform.
Network Segment / Security Group: An isolated network environment for
each cluster, providing security and separation.
Workload Cluster: Decomposed into its control plane and worker nodes.
Control Plane: A set of three nodes for high availability.
Worker Nodes: A minimum of three nodes (usually 6 or more for
production environments) that run the customer applications.
Level 3 diagrams zoom into a specific sub-system to show its internal
components. These are useful for understanding the details of a particular
capability.
This diagram details the major logical systems inside the Workload
Plane, which consists of one or more Kubernetes clusters. It shows the Kubernetes
Control Plane, the GitOps controller that manages the cluster state, and where
customer applications run.
Contain Base is our fully managed and
production-ready Kubernetes service that serves as the central orchestration
engine for containerized applications. It's a curated, "batteries-included"
distribution that extends upstream Kubernetes with a set of best-in-class,
open-source components, pre-integrated and hardened by our team.
This diagram shows the key components that are included in every cluster as part
of the Contain Base service.
Diagram Explanation
Automation & Delivery: Using GitOps to manage the cluster state.
Security & Governance: Enforcing policies and managing secrets and
certificates.
Networking: Controlling ingress traffic and managing DNS.
Operations & Resilience: Providing backup/restore capabilities and
resource metrics.
CSI, CNI, CPI: Providing storage, networking, and cloud provider
integration.
The Management Plane provides a set of centralized, shared services that are
used by the workload clusters and the operations team. These services are
essential for the security, automation, and management of the entire platform.
This diagram shows the key services that make up the Management Plane.
Diagram Explanation
Identity Provider (IdP): A centralized service for managing user and
service authentication.
Container Registry: A private registry for storing and distributing
container images, with integrated vulnerability scanning.
Git Server: The source of truth for the GitOps-driven cluster and
application configuration.
DNS Provider: Manages the DNS records for services running in the
workload clusters.
Secret Store: A secure, centralized location for managing secrets and
other sensitive data.
PKI Service: Manages the Public Key Infrastructure, including issuing
and renewing certificates.
Image Scanner: Scans container images for known vulnerabilities.
The platform is built on a GitOps workflow, where Git is the single source of
truth for both application and infrastructure configuration. This diagram
illustrates the end-to-end flow, from a developer pushing code to a new version
of the application running in the cluster.
Diagram Explanation
A developer pushes code changes to a Git repository.
The push triggers a CI/CD pipeline that builds and tests the application,
then pushes a new container image to the registry.
The pipeline updates a configuration file in the Git repository with the
new image tag.
The GitOps controller (Flux), running in the cluster, detects the change
in the repository.
Flux applies the new configuration to the cluster, which then pulls the
new image and deploys the updated application.
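The reconciliation at the heart of this workflow can be sketched as a simple loop: compare the desired state declared in Git with the state observed in the cluster, and apply the difference. The following is a toy Python model of that idea, not Flux's actual implementation; the dictionaries stand in for manifests in Git and resources applied to the cluster:

```python
def reconcile(git_repo, cluster):
    """One pass of a GitOps-style reconciliation loop (illustrative only).

    git_repo: resource name -> desired manifest (the source of truth in Git)
    cluster:  resource name -> manifest currently applied to the cluster
    Returns the names of resources that were (re)applied.
    """
    changed = []
    for name, desired in git_repo.items():
        if cluster.get(name) != desired:   # new resource or drift detected
            cluster[name] = desired        # stand-in for 'kubectl apply'
            changed.append(name)
    return changed


# Example: the CI pipeline has bumped the image tag in Git.
git = {"app-deployment": {"image": "registry.example/app:v2"}}
live = {"app-deployment": {"image": "registry.example/app:v1"}}
print(reconcile(git, live))  # -> ['app-deployment']
print(reconcile(git, live))  # -> [] (cluster now matches Git)
```

Running the loop a second time applies nothing, which is the key property of the GitOps model: reconciliation is idempotent, and the cluster converges on whatever Git declares.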
The platform provides a centralized observability solution based on the Grafana
LGTM stack (Loki for logs, Grafana for visualization, Tempo for traces, and
Mimir for metrics). When the Application Observability
Service is enabled for a workload cluster, telemetry
agents are deployed to collect and forward data to the central observability
plane.
This diagram illustrates the flow of telemetry data from an instrumented
application to the observability platform, where a developer can analyze it.
Observability Plane
The Observability Plane covers all services related to observability. While
the diagram shows one system, in reality it consists of multiple
observability stacks. For instance, to avoid resource contention, we deploy
our own internal stack for monitoring the platform itself. For more
information, see Observability.
Diagram Explanation
Developer: Instruments their application using OpenTelemetry SDKs and
uses Grafana to view dashboards, logs, traces, and metrics.
Customer Application: The customer's application is instrumented with
OpenTelemetry to generate telemetry data (logs, metrics, traces), which is
either pulled by collectors or pushed to them.
Telemetry Collectors: Collect the telemetry data either by scraping
customer applications or by receiving pushes from them.
Observability Plane: Dedicated clusters hosting the LGTM stack, which
receives, stores, and visualizes the telemetry data.
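The two collection modes above can be illustrated with a toy collector. This is a simplified Python model, not the real OpenTelemetry Collector API; `scrape` stands in for a Prometheus-style pull and `receive_push` for an OTLP-style push:

```python
class TelemetryCollector:
    """Toy collector illustrating pull vs. push collection (illustrative only)."""

    def __init__(self):
        self.buffer = []  # telemetry records bound for the Observability Plane

    def scrape(self, app):
        """Pull mode: the collector polls the application's metrics endpoint."""
        self.buffer.extend(app.expose_metrics())

    def receive_push(self, records):
        """Push mode: the application sends telemetry to the collector."""
        self.buffer.extend(records)

    def flush(self):
        """Forward buffered records to the central LGTM stack, then clear."""
        shipped, self.buffer = self.buffer, []
        return shipped


class App:
    """Hypothetical instrumented application exposing a metrics endpoint."""

    def expose_metrics(self):
        return [{"name": "http_requests_total", "value": 42}]


collector = TelemetryCollector()
collector.scrape(App())                                      # pull
collector.receive_push([{"name": "span", "trace": "abc"}])   # push
print(len(collector.flush()))  # -> 2
```

Either way, the collector decouples the application from the Observability Plane: the application only ever talks to a local collector, which batches and forwards to the central stack.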
While direct access to the Kubernetes API can be made available, we encourage
teams to interact with the Contain Platform primarily through the GitOps
workflow for deployments and the observability platform for monitoring. This
approach provides a more secure, auditable, and collaborative environment.
Direct access may not always be available, depending on the specific security
and compliance requirements of the organization.
When direct access is required, the platform authenticates users via our Managed
Identity Provider (IdP), which can optionally federate with the organization's
existing IdP (like Entra ID or Okta).
This diagram shows a simplified, high-level view of the authentication flow from
a user's perspective.
Diagram Explanation
User Authenticates to Cluster: The user initiates an action against
the Workload Cluster (e.g., via kubectl or a UI).
Cluster Redirects to Managed IdP: The cluster redirects the user's
authentication request to our Managed IdP.
Managed IdP Federates (Optional): If configured, our IdP federates
with the customer's own IdP, allowing the user to authenticate with their
corporate credentials.
User Receives Token: After successful authentication, the user
receives a token that the client can use.
User Accesses Cluster: The user's client uses the token to securely
access the Workload Cluster.
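The flow above can be condensed into a small sketch. The `IdP` class and `authenticate` function below are hypothetical toy stand-ins (a real flow uses browser redirects and signed JWTs); what the sketch preserves is the key point that the Managed IdP always issues the final token, even when credentials are verified upstream:

```python
class IdP:
    """Toy identity provider (illustrative, not a real OIDC implementation)."""

    def __init__(self, name, users):
        self.name = name
        self.users = users  # username -> set of group names

    def verify(self, username):
        """Check credentials and return the authenticated identity."""
        if username not in self.users:
            raise PermissionError(f"{username} unknown to {self.name}")
        return {"sub": username, "groups": sorted(self.users[username])}


def authenticate(username, managed_idp, customer_idp=None):
    """Federate to the customer IdP when one is configured, otherwise
    authenticate against the Managed IdP directly. Either way, the
    Managed IdP issues the token the cluster accepts."""
    identity = (customer_idp or managed_idp).verify(username)
    return {"iss": managed_idp.name, **identity}  # stand-in for a signed ID token


corp = IdP("entra-id", {"alice": {"platform-admins"}})
managed = IdP("managed-idp", {})
token = authenticate("alice", managed, customer_idp=corp)
print(token["iss"], token["groups"])  # -> managed-idp ['platform-admins']
```

Note that the issuer is always the Managed IdP, so the cluster only ever needs to trust one token issuer regardless of where the user's credentials actually live.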
Once a user is authenticated, the platform uses Kubernetes Role-Based Access
Control (RBAC) to determine what actions they are allowed to perform.
Authorization is based on the group claims present in the user's ID Token (JWT),
which is provided by the Identity Provider. These groups are then mapped to
Kubernetes Roles or ClusterRoles via RoleBindings or
ClusterRoleBindings.
This diagram illustrates how the Kubernetes API server uses the information in a
user's ID token to make an authorization decision.
Diagram Explanation
User Makes Request: An authenticated user makes a request to the
Kubernetes API, presenting their ID Token.
API Checks RBAC Policies: The API server extracts the user's identity
and group membership from the token. It then checks the stored RBAC
policies to find any roles bound to that user or their groups.
Allow or Deny: Based on the permissions defined in the associated
roles, the API server either allows or denies the user's request.
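The check described above can be sketched as follows. This is a deliberately simplified model of RBAC (real Kubernetes RBAC also distinguishes namespaces, ClusterRoles, apiGroups, and resourceNames); the role and binding structures are toy stand-ins for Role and RoleBinding objects:

```python
def authorize(token, role_bindings, roles, verb, resource):
    """Simplified sketch of a Kubernetes-style RBAC decision.

    token:         decoded ID token with 'sub' and 'groups' claims
    role_bindings: list of {'subject': ..., 'role': ...} (toy RoleBindings)
    roles:         role name -> set of allowed (verb, resource) pairs
    """
    # The request is attributed to the user and every group they belong to.
    subjects = {token["sub"], *token.get("groups", [])}
    for binding in role_bindings:
        if binding["subject"] in subjects:
            if (verb, resource) in roles[binding["role"]]:
                return True  # some bound role grants the permission
    return False  # deny by default


roles = {"pod-reader": {("get", "pods"), ("list", "pods")}}
bindings = [{"subject": "dev-team", "role": "pod-reader"}]
token = {"sub": "alice", "groups": ["dev-team"]}
print(authorize(token, bindings, roles, "list", "pods"))    # -> True
print(authorize(token, bindings, roles, "delete", "pods"))  # -> False
```

The deny-by-default return at the end mirrors RBAC's additive model: permissions can only be granted by a role, never subtracted, so a request with no matching role binding is always rejected.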
By default, the platform's backup and recovery solution covers the
Kubernetes resources and configuration within your cluster. However, it does
not automatically back up the data stored within your application's
Persistent Volumes (PVs). To ensure your application data is protected, you
must explicitly use our managed backup service for your stateful
applications. Read more about this here.
The platform provides a robust backup and recovery solution using Velero, an
open-source tool for safely backing up and restoring resources in a Kubernetes
cluster. Velero runs in each workload cluster and is configured to store backups
in a dedicated, S3-compatible object storage bucket located in the Management
Plane.
This diagram illustrates the high-level backup and restore flow.
Diagram Explanation
Backup Flow
Backup Initiated: An automated backup schedule or a manual trigger
initiates a backup job.
Velero Queries API: Velero queries the Kubernetes API server to get a
list of the resources to be backed up.
Velero Backs Up Resources: Velero creates a backup file of the
Kubernetes objects (Deployments, Services, etc.).
Velero Backs Up PVs: For any Persistent Volumes (PVs), Velero
performs a file-system level backup of the data within the volume.
Backup Stored in S3: The Kubernetes object backup and the PV
file-system backup are stored in the S3-compatible object storage bucket
in the Management Plane.
Restore Flow
Restore Initiated: An operator initiates a restore job, specifying
the backup to restore from.
Velero Retrieves Backup: Velero retrieves the Kubernetes object
backup and the PV file-system backup from the S3 bucket.
Velero Restores Resources: Velero restores the Kubernetes objects and
creates new Persistent Volumes. Velero then restores the file-system data
into the new volumes, bringing the application back to its previous
state.
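The two flows above can be summarized in a small sketch. This is a toy model, not Velero's API; the `include_pv_data` flag mirrors the default behaviour described in the note above, where Kubernetes objects are always captured but PV data is only backed up when the managed backup service is explicitly used:

```python
def backup(cluster, object_store, backup_name, include_pv_data=False):
    """Toy backup flow: snapshot Kubernetes objects, and optionally PV data,
    into an S3-like object store (illustrative only, not Velero's API)."""
    snapshot = {"objects": dict(cluster["objects"])}
    if include_pv_data:
        # Stand-in for Velero's file-system level backup of each volume.
        snapshot["pv_data"] = dict(cluster["pv_data"])
    object_store[backup_name] = snapshot


def restore(object_store, backup_name):
    """Toy restore flow: rebuild objects and PV contents from a stored backup."""
    snapshot = object_store[backup_name]
    return {
        "objects": dict(snapshot["objects"]),
        # Empty if PV data was never captured -- the gap the note above warns about.
        "pv_data": dict(snapshot.get("pv_data", {})),
    }


store = {}
cluster = {"objects": {"deploy/app": "v1"}, "pv_data": {"pv-1": b"rows"}}
backup(cluster, store, "nightly", include_pv_data=True)
restored = restore(store, "nightly")
print(restored["pv_data"]["pv-1"])  # -> b'rows'
```

Running `backup` without `include_pv_data=True` restores the objects but an empty `pv_data`, which is exactly the failure mode the admonition at the top of this section warns about for stateful applications.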