Developer Troubleshooting Guide

This guide provides a structured approach to troubleshooting common issues you might encounter when deploying and running applications on the Contain Platform. By following these steps, you can quickly diagnose and solve most problems yourself.

The recommended approach is to troubleshoot "top-down": start with the deployment system (GitOps) and work your way down to your specific application pods and their network connections.

Access Levels and Tooling

This guide provides troubleshooting steps using kubectl and flux for users with direct command-line access to the cluster.

However, for security reasons, you may not have direct access to the Kubernetes API. In this case, your primary troubleshooting tools will be the dashboards and log viewers available in your dedicated Grafana instance. Where applicable, this guide will include notes on how to find similar information in Grafana.

Recommended Tool: k9s

While this guide uses kubectl for all examples, we highly recommend using k9s, a powerful terminal-based UI for Kubernetes. It provides an interactive and much more efficient way to perform many of the checks described below.


Step 1: Is GitOps Healthy?

Every change to your application starts in Git. If the GitOps controller (Flux) is unable to apply the state from your repository, your changes will never reach the cluster.

Check the Git Source

First, verify that Flux can connect to and read from your Git repository.

# Check all Git sources managed by Flux
flux get sources git -A

# Check a specific source (here: flux-system) in the netic-gitops-system namespace
flux get source git flux-system -n netic-gitops-system

You should see True under the READY column and a message showing the latest commit hash.

Common Problems

  • READY: False: This often indicates an authentication issue. Verify that the deploy key or credentials Flux is using are correct and have not expired.
  • Error message: The STATUS column will provide a detailed error, such as "authentication failed" or "could not resolve host".
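For more detail than the flux CLI prints, you can inspect the underlying GitRepository resource directly; its status conditions and events usually contain the full error. This sketch assumes the source and namespace names used in the examples above:

```shell
# Inspect the GitRepository resource behind the Flux source
# (names are illustrative; substitute your own source and namespace)
kubectl describe gitrepository flux-system -n netic-gitops-system
```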

Check the Kustomization

Next, check if Flux was able to apply the manifests from your repository path to the cluster.

# Check all Kustomizations managed by Flux
flux get kustomizations -A

# Check a specific application's Kustomization
flux get kustomization hello-app -n netic-gitops-system

You should see True under the READY column.

Common Problems

  • READY: False: Flux failed to apply the manifests. The STATUS column will provide the error.
  • "path not found": The path specified in your Kustomization resource does not exist in the Git repository.
  • "invalid YAML": One of your Kubernetes manifest files has a syntax error.
  • "admission request denied": Your manifests violate a security policy. The error message will typically point to the specific policy rule.

Forcing a Reconciliation

If you want to trigger an immediate sync without waiting for the next interval, you can force Flux to reconcile.

# Reconcile the Git source
flux reconcile source git flux-system -n netic-gitops-system

# Reconcile a specific Kustomization
flux reconcile kustomization hello-app -n netic-gitops-system
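Reconciling a Kustomization on its own applies whatever revision Flux last fetched. If you have just pushed a commit, you can fetch and apply in a single step with the --with-source flag:

```shell
# Fetch the latest commit from Git and apply it in one command
flux reconcile kustomization hello-app -n netic-gitops-system --with-source
```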

Step 2: Is My Application Deployed Correctly?

If Flux is healthy, the next step is to check the Kubernetes Deployment resource. This resource is responsible for managing your application's Pods.

Troubleshooting in Grafana

If you do not have kubectl access, you can find information about your Deployments and Pods in your Grafana instance. Look for a dashboard related to Kubernetes workload or namespace health, which will typically show the status of your deployments and pods.

# Check deployments in a specific namespace
kubectl get deployments -n hello-app

Look at the output columns:

  • READY: Shows how many replicas are ready vs. the desired number (e.g., 2/2). If this is not at the desired state, there's a problem with your pods.
  • UP-TO-DATE: The number of replicas that have been updated to the latest version.
  • AVAILABLE: The number of replicas that are available to serve traffic.

If the READY count is not what you expect, use describe to get more details and events.

kubectl describe deployment hello-app -n hello-app

Scroll to the bottom to see the Events section. This will often tell you exactly why a deployment is failing (e.g., it couldn't pull the container image).
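If the Events section points at a stuck or failed rollout, kubectl's rollout subcommands give a quick view of progress and of recent revisions:

```shell
# Wait for (or report on) the current rollout
kubectl rollout status deployment/hello-app -n hello-app

# List previous revisions, e.g. to find a known-good one to roll back to
kubectl rollout history deployment/hello-app -n hello-app
```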


Step 3: Are My Pods Running? (Troubleshooting Pods)

If your Deployment is not healthy, the problem is almost always with the Pods it's trying to create.

kubectl get pods -n hello-app

Look at the STATUS column for any pods that are not Running.

Common Pod Statuses

  • Pending: The pod has been accepted by Kubernetes, but it cannot yet be scheduled onto a node.
    Reason: Often insufficient resources (CPU or memory). Use kubectl describe pod <pod-name> to see the exact reason in the Events section.
  • ImagePullBackOff: Kubernetes could not pull the container image.
    Reasons: A typo in the image name or tag in your Deployment manifest, the image does not exist, or the cluster does not have the correct credentials to pull from your private registry. Check the Events in kubectl describe pod ... for the specific error.
  • CrashLoopBackOff: The container starts, crashes, and is then restarted by Kubernetes in a loop.
    Reason: This is an application-level problem: your application exits with an error shortly after it starts. The next step is to check the logs.
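To see the waiting reason (such as ImagePullBackOff) for every pod at once, a custom-columns query can help. This is a sketch: the JSONPath expressions assume single-container pods, so only the first container is shown.

```shell
# Show each pod's phase, restart count, and waiting reason (first container only)
kubectl get pods -n hello-app -o custom-columns=\
'NAME:.metadata.name,PHASE:.status.phase,RESTARTS:.status.containerStatuses[0].restartCount,REASON:.status.containerStatuses[0].state.waiting.reason'
```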

Checking Pod Events

If the pod status is not Running and the reason isn't immediately obvious, the next place to look is the event stream for your namespace. Events provide a chronological log of what the Kubernetes system is doing, such as scheduling decisions, pulling images, or failing health checks.

The easiest way to view events is with k9s, sorted by the last time the event occurred (Shift + L).

k9s -n <NAMESPACE> -c events

Alternatively, you can use kubectl to get a sorted list of events:

kubectl get events -n <NAMESPACE> -o wide --sort-by='.lastTimestamp'
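Because routine Normal events can drown out the interesting ones, it is often quicker to filter to warnings only:

```shell
# Show only Warning events, most recent last
kubectl get events -n <NAMESPACE> --field-selector type=Warning --sort-by='.lastTimestamp'
```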

Step 4: What Are My Pods Saying? (Inspecting Logs)

If your pod is in a CrashLoopBackOff state or is Running but not behaving correctly, the logs are the best place to find the root cause.

Viewing Logs in Grafana

If you do not have kubectl access, you can view your application's logs directly within your Grafana instance. Use the "Explore" feature and select the "Loki" data source to query logs for your specific application and namespace.

# Get the logs from a running pod
kubectl logs <pod-name> -n hello-app

# Stream the logs in real-time
kubectl logs -f <pod-name> -n hello-app

If your pod is in CrashLoopBackOff, the container might be crashing too quickly to view the logs. Use the -p or --previous flag to get the logs from the last time the container terminated. This is often where the critical error message is found.

# Get logs from the PREVIOUS, terminated container
kubectl logs -p <pod-name> -n hello-app
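When a Deployment runs several replicas, fetching logs pod by pod is tedious. You can select pods by label instead; the label below is illustrative and should match whatever labels your pods actually carry:

```shell
# Tail recent logs from all pods matching a label, across all containers
kubectl logs -l app=hello-app -n hello-app --all-containers --tail=100 --since=10m
```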

Advanced Loki Searching in Grafana

When searching for logs in Grafana (Loki), your initial results are just raw text. To make them more searchable, you can use the unpack parser, which will parse structured log lines (like JSON) and extract key-value pairs that you can then filter on.

Start with a basic unpack:

{cluster_id="<cluster_id>"} |= `` | unpack

If your logs are in JSON format, you can chain the json parser to color-code and properly parse them:

{cluster_id="<cluster_id>"} |= `` | unpack | json

Keep in mind that stack traces are often broken into individual log lines, which can affect how you search for errors.
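Once unpack or json has extracted key-value pairs, you can filter on them directly with a label filter. As a sketch, assuming your application emits JSON logs with a level field (adjust the field name to your own log schema):

```logql
{cluster_id="<cluster_id>"} |= `` | unpack | json | level = `error`
```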


Step 5: Can My Pods Communicate? (Network Troubleshooting)

A common issue in multi-service applications is a lack of network connectivity between pods. On the Contain Platform, this is almost always due to a missing NetworkPolicy.

Least Privilege Networking by Default

By default, our platform uses a default-deny network model. This means that all traffic between pods is denied unless it is explicitly allowed by a NetworkPolicy resource.

If your application is trying to connect to another service (like a database or another microservice) and is timing out, a missing NetworkPolicy is the most likely cause.

Checking Network Policies

You can see which policies are active in a namespace with get.

kubectl get networkpolicy -n hello-app

Allowing Traffic

To allow traffic, you need to create a NetworkPolicy that selects your pod and defines the allowed ingress (incoming) or egress (outgoing) traffic.

For example, this policy allows ingress traffic to your hello-app pods from pods in the ingress-system namespace (where the ingress controller runs).

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-ingress-to-hello-app
  namespace: hello-app
spec:
  # This policy applies to pods with the label 'app: hello-app'
  podSelector:
    matchLabels:
      app: hello-app
  policyTypes:
  - Ingress
  ingress:
  - from:
    # Allow traffic from any pod in a namespace with this label
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: ingress-system
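Since the default-deny model covers outgoing traffic as well, connections your application initiates also need a policy. As a sketch, this hypothetical policy lets the hello-app pods reach a database in a database namespace on port 5432 (the namespace name and port are illustrative):

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-hello-app-to-database
  namespace: hello-app
spec:
  podSelector:
    matchLabels:
      app: hello-app
  policyTypes:
  - Egress
  egress:
  - to:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: database
    ports:
    - protocol: TCP
      port: 5432
```

Note that once a pod is selected by an egress policy, all other egress is blocked, so you may also need a rule allowing DNS (UDP port 53) to the cluster's DNS service.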

To debug network issues, you can use a debug container with tools like curl or ping to test connectivity from inside your application's pod.


Advanced Troubleshooting Techniques

The steps above will solve the majority of common deployment issues. However, sometimes you need to dig deeper. The following techniques are for more complex or unusual problems.

Verifying External Secrets

If your application depends on secrets from the central OpenBao vault but is failing to start with errors related to missing credentials, you should verify that the ExternalSecret resource is working correctly.

Check the status of all external secrets in your namespace:

kubectl get externalsecret -n <NAMESPACE>

Look for a STATUS that indicates the secret has been successfully synced. If you see an error, get more details with describe:

kubectl describe externalsecret <EXTERNAL-SECRET-NAME> -n <NAMESPACE>

The Events section will often tell you if there was an issue connecting to the vault or if the specified secret path could not be found.
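A healthy ExternalSecret produces an ordinary Kubernetes Secret (its name is set by spec.target.name and often matches the ExternalSecret's name). Checking that this Secret exists and contains the expected keys closes the loop:

```shell
# Verify the synced Secret exists
kubectl get secret <SECRET-NAME> -n <NAMESPACE>

# describe lists the Secret's keys and sizes without printing the values
kubectl describe secret <SECRET-NAME> -n <NAMESPACE>
```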

Attaching a Debug Container

Sometimes, logs are not enough and you need an interactive shell inside your application's running environment to test network connections, inspect the filesystem, or run diagnostic tools. The most powerful way to do this is by attaching an ephemeral debug container to your existing pod.

Permissions Required

This feature requires edit access to the namespace, which is typically only granted on sandbox or development clusters.

First, verify that you have the required permissions:

kubectl auth can-i create pods/ephemeralcontainers -n <NAMESPACE>

If the command returns yes, you can attach a debug container to your running pod using the kubectl debug command. This command will start a new container within your existing pod, sharing the same network and process namespaces.

# Find the latest debug image tag here:
# https://github.com/neticdk/kubernetes-debug-image/pkgs/container/kubernetes-debug-image

kubectl debug -n <NAMESPACE> -it \
  --image=ghcr.io/neticdk/kubernetes-debug-image:v0.0.13 \
  --profile=restricted \
  <POD-NAME>

This will give you a shell inside your pod's environment, equipped with common debugging tools like curl, ping, netcat, and dig.
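From the debug shell you can exercise the same network paths your application uses. The service names below are illustrative; substitute the services your application actually talks to:

```shell
# Inside the debug container: resolve a service name, probe a TCP port,
# and hit an HTTP endpoint
dig my-database.database.svc.cluster.local
nc -vz my-database.database.svc.cluster.local 5432
curl -sv http://another-service.hello-app.svc.cluster.local:8080/
```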