Developer Troubleshooting Guide¶
This guide provides a structured approach to troubleshooting common issues you might encounter when deploying and running applications on the Contain Platform. By following these steps, you can quickly diagnose and solve most problems yourself.
The recommended approach is to troubleshoot "top-down": start with the deployment system (GitOps) and work your way down to your specific application pods and their network connections.
Access Levels and Tooling
This guide provides troubleshooting steps using kubectl and flux for
users with direct command-line access to the cluster.
However, for security reasons, you may not have direct access to the Kubernetes API. In this case, your primary troubleshooting tools will be the dashboards and log viewers available in your dedicated Grafana instance. Where applicable, this guide will include notes on how to find similar information in Grafana.
Recommended Tool: k9s
While this guide uses kubectl for all examples, we highly recommend using
k9s, a powerful terminal-based UI for Kubernetes.
It provides an interactive and much more efficient way to perform many of
the checks described below.
Step 1: Is GitOps Healthy?¶
Every change to your application starts in Git. If the GitOps controller (Flux) is unable to apply the state from your repository, your changes will never reach the cluster.
Check the Git Source¶
First, verify that Flux can connect to and read from your Git repository.
# Check all Git sources managed by Flux
flux get sources git -A
# Check a specific source in the netic-gitops-system namespace
flux get source git flux-system -n netic-gitops-system
You should see True under the READY column and a message showing the latest
commit hash.
- Common Problems
  - READY: False: This often indicates an issue with authentication. Verify that the deploy key or credentials Flux is using are correct and have not expired.
  - Error Message: The STATUS column will provide a detailed error, such as "authentication failed" or "could not resolve host".
Check the Kustomization¶
Next, check if Flux was able to apply the manifests from your repository path to the cluster.
# Check all Kustomizations managed by Flux
flux get kustomizations -A
# Check a specific application's Kustomization
flux get kustomization hello-app -n netic-gitops-system
You should see True under the READY column.
- Common Problems
  - READY: False: This means Flux failed to apply the manifests. The STATUS column will provide the error.
  - "path not found": The path specified in your Kustomization resource does not exist in the Git repository.
  - "invalid YAML": One of your Kubernetes manifest files has a syntax error.
  - "admission request denied": Your manifests violate a security policy. The error message will typically point to the specific policy rule.
Forcing a Reconciliation¶
If you want to trigger an immediate sync without waiting for the next interval, you can force Flux to reconcile.
# Reconcile the Git source
flux reconcile source git flux-system -n netic-gitops-system
# Reconcile a specific Kustomization
flux reconcile kustomization hello-app -n netic-gitops-system
Step 2: Is My Application Deployed Correctly?¶
If Flux is healthy, the next step is to check the Kubernetes Deployment
resource. This resource is responsible for managing your application's Pods.
Troubleshooting in Grafana
If you do not have kubectl access, you can find information about your
Deployments and Pods in your Grafana instance. Look for a dashboard related
to Kubernetes workload or namespace health, which will typically show the
status of your deployments and pods.
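If you do have command-line access, a quick way to check your Deployments is with get (using the hello-app namespace from the earlier examples):

```shell
# List Deployments and their replica status
kubectl get deployments -n hello-app
```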
Look at the output columns:
- READY: Shows how many replicas are ready vs. the desired number (e.g., 2/2). If this is not at the desired state, there's a problem with your pods.
- UP-TO-DATE: The number of replicas that have been updated to the latest version.
- AVAILABLE: The number of replicas that are available to serve traffic.
If the READY count is not what you expect, use describe to get more details
and events.
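A sketch of the describe command, assuming a Deployment named hello-app in the hello-app namespace:

```shell
# Show detailed status, conditions, and recent events for the Deployment
kubectl describe deployment hello-app -n hello-app
```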
Scroll to the bottom to see the Events section. This will often tell you
exactly why a deployment is failing (e.g., it couldn't pull the container
image).
Step 3: Are My Pods Running? (Troubleshooting Pods)¶
If your Deployment is not healthy, the problem is almost always with the
Pods it's trying to create.
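To list the Pods and their current state (again assuming the hello-app namespace):

```shell
# List pods; the STATUS column shows the current state of each pod
kubectl get pods -n hello-app
```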
Look at the STATUS column for any pods that are not Running.
- Common Pod Statuses
  - Pending: The pod has been accepted by Kubernetes, but it can't be scheduled to run on a node yet.
    - Reason: Often due to insufficient resources (CPU or memory). Use kubectl describe pod <pod-name> to see the exact reason in the Events section.
  - ImagePullBackOff: Kubernetes could not pull the container image.
    - Reasons: A typo in the image name or tag in your Deployment.yaml, the image does not exist, or the cluster does not have the correct credentials to pull from your private registry. Check the Events in kubectl describe pod ... for the specific error.
  - CrashLoopBackOff: The container is starting, crashing, and then being restarted by Kubernetes in a loop.
    - Reason: This is an application-level problem. Your application is exiting with an error shortly after it starts. The next step is to check the logs.
Checking Pod Events¶
If the pod status is not Running and the reason isn't immediately obvious, the
next place to look is the event stream for your namespace. Events provide a
chronological log of what the Kubernetes system is doing, such as scheduling
decisions, pulling images, or failing health checks.
The easiest way to view events is with k9s, sorted by the last time the event
occurred (Shift + L).
Alternatively, you can use kubectl to get a sorted list of events:
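For example, assuming the hello-app namespace used elsewhere in this guide:

```shell
# List events in the namespace, sorted by when they last occurred
kubectl get events -n hello-app --sort-by=.lastTimestamp
```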
Step 4: What Are My Pods Saying? (Inspecting Logs)¶
If your pod is in a CrashLoopBackOff state or is Running but not behaving
correctly, the logs are the best place to find the root cause.
Viewing Logs in Grafana
If you do not have kubectl access, you can view your application's logs
directly within your Grafana instance. Use the "Explore" feature and select
the "Loki" data source to query logs for your specific application and
namespace.
# Get the logs from a running pod
kubectl logs <pod-name> -n hello-app
# Stream the logs in real-time
kubectl logs -f <pod-name> -n hello-app
If your pod is in CrashLoopBackOff, the container might be crashing too
quickly to view the logs. Use the -p or --previous flag to get the logs from
the last time the container terminated. This is often where the critical error
message is found.
Advanced Loki Searching in Grafana
When searching for logs in Grafana (Loki), your initial results are just raw
text. To make them more searchable, you can use the unpack filter, which
will parse structured log lines (like JSON) and extract key-value pairs that
you can then filter on.
Start with a basic unpack:
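A minimal sketch of such a query, assuming your logs carry a namespace label (label names may differ in your Loki setup):

```logql
{namespace="hello-app"} | unpack
```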
If your logs are in JSON format, you can chain a json filter to color-code
and properly parse them:
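For example (again assuming a namespace label; adjust to your own label scheme):

```logql
{namespace="hello-app"} | unpack | json
```

Once parsed, you can filter on the extracted fields, for example by appending a label filter such as `| level="error"`.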
Keep in mind that stack traces are often broken into individual log lines, which can affect how you search for errors.
Step 5: Can My Pods Communicate? (Network Troubleshooting)¶
A common issue in multi-service applications is a lack of network connectivity
between pods. On the Contain Platform, this is almost always due to a missing
NetworkPolicy.
Least Privilege Networking by Default¶
By default, our platform uses a default-deny network model. This means
that all traffic between pods is denied unless it is explicitly allowed
by a NetworkPolicy resource.
If your application is trying to connect to another service (like a database or
another microservice) and is timing out, a missing NetworkPolicy is the most
likely cause.
Checking Network Policies¶
You can see which policies are active in a namespace with get.
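For example, assuming the hello-app namespace:

```shell
# List the NetworkPolicy resources active in the namespace
kubectl get networkpolicy -n hello-app
```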
Allowing Traffic¶
To allow traffic, you need to create a NetworkPolicy that selects your pod and
defines the allowed ingress (incoming) or egress (outgoing) traffic.
For example, this policy allows ingress traffic to your hello-app pods from
pods in the ingress-system namespace (where the ingress controller runs).
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-ingress-to-hello-app
  namespace: hello-app
spec:
  # This policy applies to pods with the label 'app: hello-app'
  podSelector:
    matchLabels:
      app: hello-app
  policyTypes:
    - Ingress
  ingress:
    - from:
        # Allow traffic from any pod in a namespace with this label
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: ingress-system
To debug network issues, you can use a debug container with tools like curl or
ping to test connectivity from inside your application's pod.
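As a sketch, once you have a shell in a debug container (see "Attaching a Debug Container" below), you can test connectivity to another service. The service name and port here are hypothetical; substitute your own:

```shell
# Inside the debug container: test DNS resolution and TCP connectivity
# ('backend' and port 8080 are placeholders for your target service)
dig backend.hello-app.svc.cluster.local
curl -sv --max-time 5 http://backend.hello-app.svc.cluster.local:8080/
# If DNS resolves but the connection times out, a missing NetworkPolicy
# is the most likely cause.
```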
Advanced Troubleshooting Techniques¶
The steps above will solve the majority of common deployment issues. However, sometimes you need to dig deeper. The following techniques are for more complex or unusual problems.
Verifying External Secrets¶
If your application depends on secrets from the central OpenBao vault but is
failing to start with errors related to missing credentials, you should verify
that the ExternalSecret resource is working correctly.
Check the status of all external secrets in your namespace:
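A sketch of the check, assuming the hello-app namespace from earlier examples:

```shell
# List ExternalSecret resources and their sync status
kubectl get externalsecrets -n hello-app
```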
Look for a STATUS that indicates the secret has been successfully synced. If you
see an error, get more details with describe:
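For example, where <external-secret-name> is a placeholder for your ExternalSecret resource:

```shell
# Show detailed status and events for the ExternalSecret
kubectl describe externalsecret <external-secret-name> -n hello-app
```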
The Events section will often tell you if there was an issue connecting to the
vault or if the specified secret path could not be found.
Attaching a Debug Container¶
Sometimes, logs are not enough and you need an interactive shell inside your application's running environment to test network connections, inspect the filesystem, or run diagnostic tools. The most powerful way to do this is by attaching an ephemeral debug container to your existing pod.
Permissions Required
This feature requires edit access to the namespace, which is typically
only granted on sandbox or development clusters.
First, verify that you have the required permissions:
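One way to sketch this check is with kubectl auth can-i against the ephemeral-containers subresource (the exact verb your cluster checks may vary):

```shell
# Check whether you may add ephemeral (debug) containers to pods
kubectl auth can-i patch pods/ephemeralcontainers -n <NAMESPACE>
```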
If the command returns yes, you can attach a debug container to your running
pod using the kubectl debug command. This command will start a new container
within your existing pod, sharing the same network and process namespaces.
# Find the latest debug image tag here:
# https://github.com/neticdk/kubernetes-debug-image/pkgs/container/kubernetes-debug-image
kubectl debug -n <NAMESPACE> -it \
--image=ghcr.io/neticdk/kubernetes-debug-image:v0.0.13 \
--profile=restricted \
<POD-NAME>
The debug image includes common network diagnostic tools such as curl, ping, netcat, and dig.