Observability
In modern distributed systems, failures are inevitable. When something goes wrong, the key to a swift recovery is not just monitoring for known failure modes but having the ability to ask questions about your system's behavior. This is the essence of observability.
The Contain Platform is built on a robust observability foundation designed to provide deep insights into the health and performance of both the platform itself and the applications running on it. This foundation is key to providing a stable, secure, and reliable environment for your applications.
The Three Pillars of Observability
Through our managed services, the Contain Platform provides a complete picture of your system's behavior by delivering on the three pillars of observability.
- Metrics: Numerical measurements aggregated over time (e.g., CPU usage, request rates). Metrics are ideal for dashboards, trend analysis, and triggering alerts.
- Logs: Timestamped text records of discrete events. Logs are invaluable for debugging, providing detailed, event-specific context.
- Traces: A trace represents the end-to-end journey of a single request as it moves through the services in your application. Traces are essential for pinpointing bottlenecks in a microservices architecture.
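To make the distinction concrete, here is a minimal sketch in plain Python (standard library only) of a single request producing all three signals. The names `handle_request`, `request_counter`, `log_lines`, and `Span` are hypothetical and are not part of the Contain Platform or any SDK. A real application would typically use an instrumentation library rather than in-memory structures like these.

```python
import json
import time
import uuid
from collections import Counter
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical in-memory stand-ins for the three signal types.
request_counter = Counter()  # metric: count aggregated per (route, status)
log_lines = []               # logs: one JSON record per discrete event


@dataclass
class Span:
    """One timed step of a request; spans sharing a trace_id form a trace."""
    trace_id: str
    name: str
    start: float = field(default_factory=time.monotonic)
    duration_s: Optional[float] = None

    def end(self) -> None:
        self.duration_s = time.monotonic() - self.start


def handle_request(route: str) -> Span:
    trace_id = uuid.uuid4().hex          # identifies this request end to end
    span = Span(trace_id, f"GET {route}")
    status = 200                         # placeholder for real application work
    span.end()

    # Metric: cheap to aggregate, carries no per-request detail.
    request_counter[(route, status)] += 1

    # Log: event-specific context, tagged with the trace ID for correlation.
    log_lines.append(json.dumps({
        "ts": time.time(),
        "level": "info",
        "msg": "request handled",
        "route": route,
        "status": status,
        "trace_id": trace_id,
    }))
    return span
```

Carrying the same `trace_id` in the log record and in the span is what later allows moving from a log line to the trace it belongs to.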
A Dual-Plane Architecture
To ensure security, stability, and a clear separation of concerns, our observability architecture is split into two distinct, isolated planes: the Application Observability Plane and the Platform Observability Plane. We refer to these planes collectively as the Observability Plane.
The Application Observability Plane
The Application Observability Plane is the dedicated, managed service that you, our customer, can use. When you subscribe to our Application Observability Service, we provision a dedicated, isolated observability environment for you.
This plane is where you send the telemetry (metrics, logs, and traces) from your own applications. It includes your own dedicated Grafana instance, giving your teams a single place to visualize, query, and alert on the performance of your own services. Because this environment is fully isolated, your data is kept separate from other tenants, and your usage cannot impact our operational monitoring (or vice versa).
Paid Service
The Application Observability Plane is a paid, add-on service. The Platform Observability Plane is part of our core platform management and is included with all deployments.
The Platform Observability Plane
The Platform Observability Plane is our internal, centralized system used exclusively by our operations teams. It collects health and performance telemetry from the underlying infrastructure and core platform components across all clusters.
This plane is what allows us to meet our operational responsibilities. It powers the alerting and dashboards our teams use 24/7 to ensure the health, security, and reliability of the platform. You do not have direct access to this system, but you benefit from the stability it enables.
Key Benefits of Our Solution
By providing a managed, "batteries-included" observability stack, we help you:
- Troubleshoot Faster: Quickly move from detecting a problem (a metric alert) to understanding its context (related logs) and root cause (the corresponding trace) in a single, unified interface.
- Improve System Reliability: Set up proactive alerts on your application's key performance indicators to detect and address issues before they impact your users.
- Optimize Application Performance: Identify and resolve performance bottlenecks in your distributed services by analyzing trace data.
- Reduce Operational Toil: We handle the entire lifecycle of the observability stack, including scaling, maintenance, and upgrades, so your teams can focus on instrumenting your applications and gaining insights from the data.
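The troubleshooting flow in the first bullet (metric alert, then related logs, then the corresponding trace) can be sketched roughly as follows. This is an illustration over in-memory data, not how the platform or Grafana implements correlation. `LogRecord`, `SpanRecord`, `should_alert`, `investigate`, and the 5% threshold are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class LogRecord:
    ts: float        # event time, in seconds
    level: str
    message: str
    trace_id: str


@dataclass
class SpanRecord:
    trace_id: str
    service: str
    duration_ms: float


def should_alert(errors: int, total: int, threshold: float = 0.05) -> bool:
    """Step 1 (metric): fire when the error rate exceeds the threshold."""
    return total > 0 and errors / total > threshold


def investigate(logs, spans, window_start, window_end):
    """Steps 2 and 3: from the alert window to the slowest span per failing trace."""
    # Step 2 (logs): error records inside the alert window give the trace IDs.
    failing = {
        r.trace_id
        for r in logs
        if r.level == "error" and window_start <= r.ts <= window_end
    }
    # Step 3 (traces): within each failing trace, pick the slowest span.
    findings = {}
    for trace_id in failing:
        trace = [s for s in spans if s.trace_id == trace_id]
        if trace:
            findings[trace_id] = max(trace, key=lambda s: s.duration_ms)
    return findings
```

The sketch relies on logs and spans sharing a trace ID. Without that shared identifier, the jump from a log line to its trace has to be done by timestamp and guesswork.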
Getting Started
Platform-level telemetry is collected and managed by us automatically. To gain insights into your own applications, subscribe to the Application Observability Service.