GKE CrashLoopBackOff: Logs, Exit Codes, Causes, and Fixes
CrashLoopBackOff is not the error. It is Kubernetes telling you that a container keeps crashing and it is spacing out the restart attempts to avoid hammering a failing process. The actual cause is always something else: a missing environment variable, an out-of-memory kill, a liveness probe that fires too early, or any number of application-level failures. Your job is to find that underlying cause and fix it. This page gives you a clear, step-by-step workflow to do exactly that, starting with reading the right logs and exit codes, then working through each common root cause with the exact commands and YAML fixes you need.
If you are new to Kubernetes or GKE, do not panic when you see CrashLoopBackOff. The name sounds alarming, but it is simply a restart state. Every CrashLoopBackOff has a concrete, fixable cause. By the end of this page, you will know how to find it in under five minutes.
CrashLoopBackOff in simple terms
Think of CrashLoopBackOff like a car that will not start. You turn the key, the engine tries to start, then stalls. You wait a moment and try again. Each time it fails, you wait a little longer before the next attempt. Kubernetes does the same thing with your container: it starts the container, the container crashes, and Kubernetes waits before trying again. That waiting period is the “backoff.”
When you run kubectl get pods and see a pod in CrashLoopBackOff, you are seeing
two important pieces of information:
- STATUS: CrashLoopBackOff means the container has crashed and Kubernetes is waiting before the next restart attempt
- RESTARTS: N tells you how many times the container has been restarted so far (a high number means the crash has been happening for a while)
The container is not currently running. It is sitting in a wait period. That is why
kubectl logs POD_NAME on its own often shows nothing useful: the current
container instance has not started yet. You need --previous to see the logs from
the last crash.
CrashLoopBackOff is the symptom, not the disease. Think of it like a fever: it tells you something is wrong, but not what is wrong. The exit code and the previous logs are the actual diagnosis.
How CrashLoopBackOff works
Exponential backoff timing
After each crash, Kubernetes applies exponential backoff before restarting the container. The timing works like this:
- First crash: wait 10 seconds before restart
- Second crash: wait 20 seconds
- Third crash: wait 40 seconds
- Fourth crash: wait 80 seconds
- Fifth crash: wait 160 seconds
- Sixth crash and beyond: wait 5 minutes (the cap)
Once the backoff reaches 5 minutes, it stays at 5 minutes for every subsequent restart attempt. Kubernetes will keep trying indefinitely. It never gives up. If you fix the underlying problem and the container then runs for 10 minutes without crashing, the backoff delay resets to its initial value. The RESTARTS count itself is not reset for the lifetime of the pod.
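The schedule can be reproduced with a few lines of shell. This is a minimal sketch that assumes the kubelet defaults (10-second initial delay, doubling after each crash, 300-second cap). backoff_delay is a hypothetical helper name, not a kubectl command.

```shell
# backoff_delay N: print the wait in seconds before the restart that follows crash N.
# Assumption: kubelet defaults of 10 s initial delay, doubling, capped at 300 s.
backoff_delay() {
  crash=$1
  delay=10
  i=1
  # Double once per earlier crash, stopping as soon as the cap is reached.
  while [ "$i" -lt "$crash" ] && [ "$delay" -lt 300 ]; do
    delay=$((delay * 2))
    i=$((i + 1))
  done
  if [ "$delay" -gt 300 ]; then
    delay=300
  fi
  echo "$delay"
}
```

Calling it for crashes 1 through 7 prints 10, 20, 40, 80, 160, 300, 300, which is useful when you want to estimate how long a pod with a given restart count has been failing.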
Restart policy
The pod’s restartPolicy controls whether Kubernetes restarts a crashed container.
Most workloads use Always (the default for Deployments and StatefulSets), which
means Kubernetes always restarts a container that exits. Jobs and CronJobs use
OnFailure or Never. If your container exits cleanly (exit code 0)
and the restart policy is Always, Kubernetes still restarts it. A process that keeps
exiting cleanly goes through the same backoff and shows CrashLoopBackOff, even though nothing crashed.
Why kubectl logs --previous matters
During the backoff window, the current container has not started yet. Running
kubectl logs POD_NAME returns nothing or an error. The flag
--previous tells kubectl to fetch logs from the last terminated
container, the one that actually crashed. This is nearly always where the root
cause is visible: an exception stack trace, a “file not found” error, a “connection refused”
message, or a segfault.
Where to find the exit code and reason
Run kubectl describe pod POD_NAME and look for the Last State
section under the container entry. It shows:
- Reason: a human-readable label like OOMKilled, Error, or Completed
- Exit Code: the numeric code the process returned (0, 1, 137, 143, etc.)
- Started / Finished: timestamps showing how long the container ran before it crashed
The exit code and reason together are your first diagnostic data point. A container that ran for 0 seconds before crashing has a different problem than one that ran for 30 minutes. The exit code narrows the cause category. The pod status (CrashLoopBackOff) tells you nothing about why. The Last State section does.
If Started and Finished timestamps are identical (the container ran for 0 seconds), the process crashed at startup. If the container ran for minutes or hours before crashing, look for memory leaks, connection pool exhaustion, or timeout-driven failures instead of configuration errors.
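If you prefer one line per container over scanning the describe output, the same fields can be pulled with a jsonpath query. The sketch below wraps it in a hypothetical helper named last_state. The field paths come from the pod status (containerStatuses[].lastState.terminated), and the columns are empty for a container that has never terminated.

```shell
# last_state POD NAMESPACE: print, per container, a tab-separated line with
# name, restart count, last termination reason, exit code, start time, finish time.
last_state() {
  kubectl get pod "$1" -n "$2" -o jsonpath='{range .status.containerStatuses[*]}{.name}{"\t"}{.restartCount}{"\t"}{.lastState.terminated.reason}{"\t"}{.lastState.terminated.exitCode}{"\t"}{.lastState.terminated.startedAt}{"\t"}{.lastState.terminated.finishedAt}{"\n"}{end}'
}
```

Run it as last_state POD_NAME NAMESPACE. Comparing the last two columns shows at a glance whether the container died at startup or after running for a while.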
When to use this guide
This page helps when you see any of these situations:
- Pod status shows CrashLoopBackOff in kubectl get pods
- The RESTARTS column keeps increasing
- Your application starts but immediately exits
- The pod runs for a few seconds then gets killed
- Events show “Back-off restarting failed container”
- A liveness or startup probe keeps killing your container
- You deployed a new image and the pod will not stabilise
If the pod status shows ImagePullBackOff instead of CrashLoopBackOff,
the container image cannot be pulled. That is a different problem entirely. See the
ImagePullBackOff section below for the fix.
If the pod status shows Pending and never transitions to Running or
CrashLoopBackOff, the issue is scheduling. The cluster cannot find a node with
enough resources. That is outside the scope of this guide.
Fast triage: 5-minute workflow
Follow these five steps in order. By step 5, you will know which category the crash falls into and can jump to the matching fix section below.
This workflow is like diagnosing a car that will not start. Step 1: check the dashboard warning lights (pod status). Step 2: pop the bonnet and look at the engine (describe). Step 3: check the error log from the last drive (previous logs). Step 4: ask a mechanic what they noticed (events). Step 5: decide whether it is a fuel, electrical, or engine problem (classify).
Step 1: List the pods and confirm the status
kubectl get pods -n NAMESPACE
Look at the STATUS and RESTARTS columns. A high restart count means the crash has been happening for a while. Note the exact pod name for the next steps.
Step 2: Describe the pod
kubectl describe pod POD_NAME -n NAMESPACE
Focus on three sections in the output:
- Last State (under each container): the exit code and reason from the last crash
- Events (at the bottom): Kubernetes-level events like image pull errors, probe failures, and OOM kills
- Containers, Restart Count: confirms which container is crashing in multi-container pods
Step 3: Read the previous logs
# Single-container pod
kubectl logs POD_NAME -n NAMESPACE --previous
# Multi-container pod: specify the crashing container
kubectl logs POD_NAME -n NAMESPACE -c CONTAINER_NAME --previous
This is almost always where you find the actual error. Look for exception stack traces, “connection refused” errors, “file not found” messages, or “permission denied” lines.
Step 4: Check recent events
kubectl get events -n NAMESPACE --sort-by='.lastTimestamp'
Events show cluster-level context that logs miss: failed image pulls, failed scheduling attempts, node pressure events, and probe failures.
Step 5: Classify the root cause
Based on steps 2–4, the crash falls into one of these categories:
- Application crash (exit code 1). The logs show an unhandled exception, missing config, or startup failure. Jump to application crashes.
- OOMKilled (exit code 137). The describe output shows Reason: OOMKilled. Jump to OOMKilled fixes.
- Probe failure. Events show “Liveness probe failed” or “Readiness probe failed”. Jump to probe failures.
- Image issue. Events show “Failed to pull image” or “ImagePullBackOff”. Jump to image issues.
- Dependency failure. Logs show “connection refused,” “timeout,” or “DNS resolution failed”. Jump to dependency failures.
- Graceful shutdown failure (exit code 143). Container received SIGTERM but did not handle it before SIGKILL. Jump to SIGTERM handling.
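The classification step can be written down as a small lookup. This is only a sketch: classify_crash and the labels it prints are hypothetical, and it looks at the exit code and reason alone, so events and logs still need to be read for probe, image, and dependency failures.

```shell
# classify_crash EXIT_CODE [REASON]: map the Last State fields to a triage category.
classify_crash() {
  code=$1
  reason=${2:-}
  # The Reason field is the stronger signal: OOMKilled always means a memory kill.
  if [ "$reason" = "OOMKilled" ]; then
    echo "oomkilled"
    return
  fi
  case "$code" in
    0)       echo "clean exit: check restartPolicy and liveness probe" ;;
    1)       echo "application crash" ;;
    126|127) echo "bad entrypoint or command" ;;
    137)     echo "SIGKILL: check OOM and liveness probe events" ;;
    143)     echo "SIGTERM: check shutdown handling, evictions, and drains" ;;
    *)       echo "unclassified: read previous logs and events" ;;
  esac
}
```

For example, classify_crash 137 OOMKilled prints oomkilled, while classify_crash 1 Error prints application crash.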
Reading exit codes
The exit code from the previous container run is your first diagnostic data point. You can
find it in the kubectl describe pod output under Last State, Exit
Code.
- Exit code 0. The container exited successfully. It did not crash. If the pod still shows CrashLoopBackOff, either the restartPolicy is Always and the container is being restarted after a clean exit, or a liveness probe is failing and forcing a restart even though the app completed normally.
- Exit code 1. Application error. An uncaught exception, a missing configuration file, a failed database connection on startup, or any unhandled error that caused the process to call exit(1). Read --previous logs for the specific error.
- Exit code 126. The command specified in the container entrypoint was found but is not executable. Common when a shell script is missing the executable bit or the binary format does not match the container architecture (e.g. running an ARM image on an AMD64 node).
- Exit code 127. The command was not found. The entrypoint or command in the Dockerfile or pod spec refers to a binary that does not exist in the container image. Check the image contents with docker run --rm -it IMAGE sh.
- Exit code 137 (OOMKilled). The container exceeded its memory limit and was killed by the Linux kernel OOM killer, or it received SIGKILL from Kubernetes (e.g. after a failed liveness probe exceeded the grace period). Check the Reason field: OOMKilled confirms a memory issue.
- Exit code 143. The container received SIGTERM and shut down. This is normal during rolling updates, node drains, and scale-down events. If it appears in a CrashLoopBackOff cycle, something is sending SIGTERM repeatedly. A container that ignores SIGTERM is killed by SIGKILL after the termination grace period expires and then reports exit code 137 instead.
- Exit code 255. The process exited with a generic or out-of-range failure status (for example exit(-1)), and the logs are the only way to tell why. Signal-driven crashes show up under their own codes: a segmentation fault is normally 139 (128 + 11, SIGSEGV) and an abort is 134 (128 + 6, SIGABRT), while an unrecovered Go panic exits with code 2.
Exit codes above 128 indicate the process was killed by a signal. The signal number is the exit code minus 128. Exit code 137 = 128 + 9 (SIGKILL). Exit code 143 = 128 + 15 (SIGTERM). This formula helps you identify unexpected signals quickly.
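The arithmetic is simple enough to script. The sketch below uses a hypothetical helper, signal_from_exit_code, and assumes Linux signal numbers 1 to 64, so codes such as 255 are deliberately not treated as signals.

```shell
# signal_from_exit_code CODE: print the signal number for exit codes 129-192.
# Returns 1 and prints nothing when the code does not correspond to a signal.
signal_from_exit_code() {
  if [ "$1" -gt 128 ] && [ "$1" -le 192 ]; then
    echo $(( $1 - 128 ))
  else
    return 1
  fi
}
```

signal_from_exit_code 137 prints 9 and signal_from_exit_code 143 prints 15. Passing the result to kill -l, as in kill -l "$(signal_from_exit_code 137)", prints the signal's name (KILL in bash).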
Fixing application crashes (exit code 1)
Exit code 1 means the application itself crashed during startup or shortly after. The previous logs contain the specific error. Start here:
kubectl logs POD_NAME -n NAMESPACE --previous
Missing environment variables
The application reads a required environment variable that is not set in the pod spec. The logs typically show “undefined,” “KeyError,” “env var X not set,” or a NullPointerException when the code tries to use the value.
Fix by adding the variable to the container spec:
containers:
  - name: my-app
    env:
      - name: DATABASE_URL
        valueFrom:
          secretKeyRef:
            name: db-credentials
            key: url
      - name: APP_ENV
        valueFrom:
          configMapKeyRef:
            name: app-config
            key: environment
Check which environment variables the pod currently has (this only works while the container is briefly running):
kubectl exec POD_NAME -n NAMESPACE -- env
Failed database connection
The app tries to connect to a database on startup and fails because the hostname is wrong, the Cloud SQL Auth Proxy sidecar has not started yet, or credentials are incorrect. If the sidecar starts after the main container, the connection attempt fails before the proxy is ready.
See Cloud SQL Connection Refused for the full diagnosis. For sidecar ordering, consider adding a startup probe or an init container that waits for the database port to become available.
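A wait loop of the following shape is one way to implement the "wait for the database port" idea, whether it runs in an init container or at the top of the entrypoint. It is a sketch under assumptions: the image must ship an nc that supports -z, wait_for_port is a hypothetical helper, and the host, port, and timeout in the usage note are placeholders.

```shell
# wait_for_port HOST PORT [TIMEOUT_SECONDS]: poll once per second until the TCP port
# accepts connections. Returns 1 if the port is still closed after the timeout.
wait_for_port() {
  wfp_host=$1
  wfp_port=$2
  wfp_timeout=${3:-60}
  wfp_elapsed=0
  until nc -z "$wfp_host" "$wfp_port" 2>/dev/null; do
    if [ "$wfp_elapsed" -ge "$wfp_timeout" ]; then
      echo "timed out waiting for $wfp_host:$wfp_port" >&2
      return 1
    fi
    sleep 1
    wfp_elapsed=$((wfp_elapsed + 1))
  done
}
```

With a Cloud SQL Auth Proxy sidecar listening on localhost, wait_for_port 127.0.0.1 5432 60 would block for up to a minute before the application starts. The bounded timeout matters: a loop that waits forever hides the failure instead of surfacing it in the logs.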
Missing files or configuration
A mounted ConfigMap, Secret, or volume is missing or has the wrong path. The logs show “file not found” or “no such file or directory.” Verify volumes are mounted correctly:
# Check volume mounts in the pod spec
kubectl describe pod POD_NAME -n NAMESPACE | grep -A 5 "Mounts:"
# Verify the file exists inside the container (if it is briefly running)
kubectl exec POD_NAME -n NAMESPACE -- ls -la /path/to/config/
Do not delete and recreate the pod to “start fresh.” Deleting the pod erases the previous logs and restart history you need for diagnosis. Investigate first, then fix the Deployment or StatefulSet spec. Kubernetes will roll out new pods automatically.
Permission errors calling GCP APIs
GKE pods use Workload Identity or the node service account for GCP API calls. A missing IAM binding causes “permission denied” or “forbidden” errors that crash the application on startup. Check which service account the pod is using:
# Check the Kubernetes service account
kubectl get pod POD_NAME -n NAMESPACE -o jsonpath='{.spec.serviceAccountName}'
# Check whether Workload Identity is configured on the KSA
kubectl describe serviceaccount KSA_NAME -n NAMESPACE
If the annotation iam.gke.io/gcp-service-account is missing, the pod falls
back to the node service account. See Permission Denied Errors for the full fix.
Fixing OOMKilled (exit code 137)
OOMKilled means the container exceeded its memory limit and the Linux kernel killed it. This is one of the most common causes of CrashLoopBackOff on GKE.
Confirm the OOM kill
# Check the termination reason
kubectl describe pod POD_NAME -n NAMESPACE | grep -A 3 "Last State:"
# Check actual memory usage across pods
kubectl top pods -n NAMESPACE
If the Reason field shows OOMKilled, the fix is to increase the
memory limit, reduce the application’s memory consumption, or both.
Set appropriate resource limits
# deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  selector:
    matchLabels:
      app: my-app            # Required: must match the template labels below
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: my-app
          image: my-image:latest
          resources:
            requests:
              memory: "256Mi"   # Minimum reserved for the container
              cpu: "100m"
            limits:
              memory: "512Mi"   # Hard ceiling. OOMKilled if exceeded
              cpu: "500m"
# Apply the updated deployment
kubectl apply -f deployment.yaml
# Or patch directly without editing the file
kubectl set resources deployment my-app \
  -c my-app \
  --limits=memory=512Mi,cpu=500m \
  --requests=memory=256Mi,cpu=100m
Think of requests and limits like booking a hotel room. The request is the room you reserved: it is guaranteed to be there when you arrive. The limit is the maximum room size the hotel will allow. If you try to move into a suite that exceeds your booking class, the hotel kicks you out (OOMKilled).
Sizing guidelines
Requests set the guaranteed minimum. The scheduler uses requests to decide which node to place the pod on.
Limits set the hard ceiling. The container is killed if it exceeds the memory limit.
Set memory limits to at least 1.5–2× the typical peak usage to give headroom for load spikes, garbage collection bursts, and JVM metaspace growth.
If the OOM kills happen gradually (the container runs for minutes before being killed), suspect a memory leak. Profile the application or check for unbounded caches.
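As a worked example of the 1.5–2× guideline, the sketch below turns an observed peak (for instance from kubectl top pods) into a suggested limit range. suggest_memory_limit is a hypothetical helper, and the input is assumed to be a whole number of Mi.

```shell
# suggest_memory_limit PEAK_MI: print the 1.5x and 2x limits, in Mi, for a peak in Mi.
suggest_memory_limit() {
  peak=$1
  low=$(( (peak * 3 + 1) / 2 ))   # 1.5x, rounded up to the next whole Mi
  high=$(( peak * 2 ))            # 2x
  echo "${low}Mi ${high}Mi"
}
```

A container that peaks at 300Mi gives suggest_memory_limit 300, which prints 450Mi 600Mi. Treat the result as a starting point and revisit it once you have usage data under real load.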
On GKE Standard clusters, Metrics Server is installed automatically, so
kubectl top pods works immediately. On
Autopilot
clusters, Kubernetes manages resource limits automatically based on the requests you
set. If you are on Autopilot and see OOMKilled, increase the resource requests. Autopilot
adjusts limits accordingly.
Liveness and readiness probe failures
Kubernetes kills a container if its liveness probe fails consecutively. This can cause CrashLoopBackOff even when the application itself is healthy but temporarily slow or still starting up. Readiness probe failures do not kill the container, but they remove it from the Service, which can cause cascading failures if all replicas become unready simultaneously.
Detect probe failures
# Check for probe failure events
kubectl describe pod POD_NAME -n NAMESPACE | grep -E "(Liveness|Readiness|Startup) probe"
# Check current probe configuration
kubectl get pod POD_NAME -n NAMESPACE -o jsonpath='{.spec.containers[0].livenessProbe}' | python -m json.tool
If you see “Liveness probe failed” in the events, the probe is killing the container. The application may be perfectly healthy but too slow to respond within the probe timeout.
Common probe mistakes
- Probe fires before the app is ready. The initialDelaySeconds is shorter than the application’s startup time. The probe runs, gets no response, and kills the container repeatedly.
- Timeout too short. The probe timeoutSeconds is 1 second, but the health endpoint takes 2 seconds under load. The probe fails even though the app is running.
- Wrong port or path. The probe targets port 8080 but the app listens on 3000, or the probe path is /health but the app serves /healthz.
- failureThreshold: 1. A single slow response kills the container. Always set failureThreshold to at least 3.
Liveness probe failures look identical to application crashes in kubectl get pods.
Both show CrashLoopBackOff with a rising restart count. The only way to tell them apart is
kubectl describe pod: look for “Liveness probe failed” in the Events section.
If you see it, the probe is killing a healthy container. Do not start debugging your
application code until you have ruled out probes.
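That check can be reduced to a filter over the describe output. The sketch below assumes the standard kubelet event wording ("Liveness probe failed", "Startup probe failed"). probe_or_crash is a hypothetical helper that reads the text on standard input.

```shell
# probe_or_crash: read `kubectl describe pod` output on stdin and print "probe"
# if a liveness or startup probe failure appears in it, otherwise "application".
probe_or_crash() {
  if grep -q -E "(Liveness|Startup) probe failed"; then
    echo "probe"
  else
    echo "application"
  fi
}
```

Use it as kubectl describe pod POD_NAME -n NAMESPACE | probe_or_crash. Readiness failures are left out on purpose, because they remove the pod from the Service but do not restart the container.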
Fix the probe configuration
# Use a startupProbe for slow-starting applications
startupProbe:
  httpGet:
    path: /healthz
    port: 8080
  failureThreshold: 30   # 30 attempts × 10s = 5 minutes to start
  periodSeconds: 10
# The livenessProbe only begins after the startupProbe succeeds
livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 0   # startupProbe already handled the delay
  periodSeconds: 10
  timeoutSeconds: 5
  failureThreshold: 3      # Require 3 consecutive failures before killing
readinessProbe:
  httpGet:
    path: /ready
    port: 8080
  periodSeconds: 5
  timeoutSeconds: 3
  failureThreshold: 3
If your application has a variable startup time (common with JVM-based apps, large model
loading, or database migrations), use a startupProbe instead of relying on a
large initialDelaySeconds on the liveness probe. The startup probe disables
the liveness probe entirely until the application signals it is ready. This avoids both
premature kills and unnecessarily long delays before liveness checking begins.
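The startup budget is just failureThreshold multiplied by periodSeconds, so it is easy to check a probe against a measured startup time before deploying. Both helper names below are hypothetical, and the calculation ignores initialDelaySeconds and the probe's own timeout.

```shell
# startup_budget_seconds FAILURE_THRESHOLD PERIOD_SECONDS:
# print roughly how long a startupProbe tolerates failures before the container is killed.
startup_budget_seconds() {
  echo $(( $1 * $2 ))
}

# startup_fits FAILURE_THRESHOLD PERIOD_SECONDS STARTUP_SECONDS:
# succeed if the measured startup time fits inside that budget.
startup_fits() {
  [ "$3" -le "$(startup_budget_seconds "$1" "$2")" ]
}
```

startup_budget_seconds 30 10 prints 300, matching the five-minute budget in the configuration above. startup_fits 3 10 45 fails, which is the situation where a 45-second startup is killed by a probe that only allows 30 seconds.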
Dependency failures
If the previous logs show “connection refused,” “connection timed out,” “name resolution failed,” or “no route to host,” the container started successfully but crashed because it could not reach a dependency. This could be a database, an external API, a message queue, or another microservice.
A dependency failure is like arriving at work and finding the office door locked. You (the container) started up fine, but you cannot do your job because something you depend on is unavailable. The fix is not in your code; it is in whatever is behind the locked door.
Common dependency failure patterns
Cloud SQL Auth Proxy not ready. The main container starts before the Auth Proxy sidecar is listening. Add retry logic or an init container that waits for the proxy port. See Cloud SQL Connection Refused.
DNS not resolving. The container tries to reach a service by hostname but cluster DNS (kube-dns or Cloud DNS on GKE) cannot resolve it, or the hostname is wrong. Check DNS resolution inside the pod:
# Test DNS resolution from inside the pod
kubectl exec POD_NAME -n NAMESPACE -- nslookup SERVICE_NAME.NAMESPACE.svc.cluster.local
# If the pod is crashing too fast, run a temporary debug pod
kubectl run dns-test --rm -it --image=busybox --restart=Never -- nslookup SERVICE_NAME.NAMESPACE.svc.cluster.local
Network policy blocking traffic. A NetworkPolicy in the namespace may be blocking egress to the dependency. Check:
kubectl get networkpolicies -n NAMESPACE
External API unreachable. The pod needs to reach an external endpoint but lacks internet egress. On private GKE clusters, nodes do not have external IP addresses by default. You need Cloud NAT configured for the subnet. See Private GKE Clusters for the full setup.
SIGTERM handling (exit code 143)
Exit code 143 means the container received SIGTERM (signal 15) and shut down. SIGTERM is
sent during rolling updates, node drains, and scale-down events. If the container does not
handle SIGTERM and shut down within the terminationGracePeriodSeconds (default
30 seconds), Kubernetes sends SIGKILL to force-stop it.
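When the container's entrypoint is a shell script, SIGTERM has to be trapped and passed on to the real process, otherwise the application never sees it. The wrapper below is a minimal sketch of that pattern. run_with_sigterm_handling and on_term are hypothetical names, and exiting with status 0 after a requested shutdown is a design choice rather than a requirement.

```shell
# on_term: runs when the wrapper receives SIGTERM. Forwards the signal to the
# application, waits for it to finish, then exits cleanly.
on_term() {
  echo "SIGTERM received, stopping child $child_pid" >&2
  kill -TERM "$child_pid" 2>/dev/null
  wait "$child_pid" 2>/dev/null
  exit 0
}

# run_with_sigterm_handling COMMAND [ARGS...]: start the application in the
# background so the shell stays free to handle signals while it waits.
run_with_sigterm_handling() {
  trap on_term TERM
  "$@" &
  child_pid=$!
  wait "$child_pid"   # returns early when a trapped signal arrives
}
```

An entrypoint script would end with a line such as run_with_sigterm_handling my-server --port 8080, where the command is a placeholder. If the script has nothing to clean up, finishing with exec "$@" is simpler: the application replaces the shell and receives SIGTERM directly.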
If exit code 143 appears in a CrashLoopBackOff cycle (not during a deployment), something is sending SIGTERM to the container unexpectedly. Possible causes:
- A liveness probe failure (Kubernetes sends SIGTERM before SIGKILL)
- The node is under memory pressure and evicting pods
- A preemptible or Spot VM was reclaimed
- A cluster autoscaler is draining the node
Check events for eviction or drain messages:
kubectl get events -n NAMESPACE --sort-by='.lastTimestamp' | grep -i -E "(evict|drain|preempt)"
ImagePullBackOff alongside CrashLoopBackOff
Some pods show ImagePullBackOff rather than CrashLoopBackOff. This means the
container image cannot be pulled. This is a completely different problem. The container never
started, so there are no application logs to read.
Diagnose the image pull failure
# Check the events for the specific pull error
kubectl describe pod POD_NAME -n NAMESPACE | grep -A 5 "Events:"
# Verify the image exists in Artifact Registry
gcloud artifacts docker images list REGION-docker.pkg.dev/PROJECT/REPO
# Check which service account the GKE node pool uses for image pulls
gcloud container clusters describe CLUSTER_NAME \
--zone=ZONE \
--format="value(nodeConfig.serviceAccount)"
Common image pull failure causes
Image tag does not exist. The tag was never pushed, was overwritten, or you have a typo. Verify with gcloud artifacts docker images list.
Registry permissions. GKE nodes need roles/artifactregistry.reader on the Artifact Registry repository to pull images. Grant it to the node service account:
gcloud artifacts repositories add-iam-policy-binding REPO \
--location=REGION \
--member="serviceAccount:NODE_SA@PROJECT.iam.gserviceaccount.com" \
--role="roles/artifactregistry.reader"
Private registry without imagePullSecrets. If you are pulling from a registry outside GCP, the pod needs an imagePullSecret configured.
Using :latest without imagePullPolicy: Always. Kubernetes defaults the pull policy to Always for :latest when the policy is unset, but if it is set to IfNotPresent and the node has an older version of :latest cached, Kubernetes uses the stale cache. Set imagePullPolicy: Always or use immutable tags (e.g. the git SHA).
Never use :latest tags in production. Two deployments can run different
versions of the same tag, and rollbacks become impossible because there is no immutable
reference to roll back to. Use the git commit SHA or a build number as the tag instead.
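A check like the following can run in CI to reject manifests that rely on :latest. It is a sketch with a hypothetical helper, has_pinned_tag, and it only inspects the reference string: it cannot prove a tag is immutable, only that it is explicit and not latest.

```shell
# has_pinned_tag IMAGE: succeed if IMAGE has a digest or an explicit tag other than "latest".
has_pinned_tag() {
  ref=$1
  case "$ref" in
    *@sha256:*) return 0 ;;   # digest references always point at one exact image
  esac
  name=${ref##*/}             # drop registry host and path (the host may contain a port)
  case "$name" in
    *:latest) return 1 ;;
    *:*)      return 0 ;;
    *)        return 1 ;;     # no tag at all means :latest is implied
  esac
}
```

has_pinned_tag REGION-docker.pkg.dev/PROJECT/REPO/my-app:3f9c2ab succeeds, while the same path ending in :latest, or with no tag at all, fails.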
Init container failures
If an init container fails, the main containers never start, and the pod may enter CrashLoopBackOff. Init containers run sequentially before the main containers and must complete successfully.
# Check init container status
kubectl describe pod POD_NAME -n NAMESPACE | grep -A 10 "Init Containers:"
# Get logs from a failed init container
kubectl logs POD_NAME -n NAMESPACE -c INIT_CONTAINER_NAME
Common init container failures: a database migration that fails, a secret-fetching init container that lacks IAM permissions, or a network-check init container that cannot reach a dependency.
Init container logs are not returned by a plain kubectl logs POD_NAME. You must
explicitly name the init container with -c to get its logs, and add --previous if it
has already been restarted. If you skip this step, you may miss the real failure entirely.
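To avoid looking up each init container name by hand, the names can be read from the pod spec and looped over. init_container_logs is a hypothetical helper. It relies on the .spec.initContainers field and prints nothing for a pod without init containers.

```shell
# init_container_logs POD NAMESPACE: print the logs of every init container in order.
init_container_logs() {
  for name in $(kubectl get pod "$1" -n "$2" -o jsonpath='{.spec.initContainers[*].name}'); do
    echo "== init container: $name =="
    kubectl logs "$1" -n "$2" -c "$name"
  done
}
```

Run it as init_container_logs POD_NAME NAMESPACE. Because init containers run sequentially, the last one with output is usually the one that failed.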
GKE-specific considerations
Some CrashLoopBackOff causes are specific to GKE or more common on GKE than on other Kubernetes platforms.
Workload Identity misconfiguration. The Workload Identity binding between the Kubernetes service account (KSA) and the GCP service account (GSA) is missing or incorrect. The pod authenticates as the wrong identity and gets “permission denied” on every GCP API call.
Autopilot resource adjustments. On GKE Autopilot, Kubernetes automatically adjusts resource limits. If your requests are too low, Autopilot may set limits that are still insufficient for peak load.
Node pool machine type too small. On Standard clusters, if the node pool uses small machine types (e.g. e2-micro), system pods consume a large fraction of available resources, leaving little for your workloads.
GKE version skew. If the cluster control plane and node pools are on different Kubernetes minor versions, API incompatibilities can cause unexpected behaviour in admission controllers or mutating webhooks. Keep node pools within one minor version of the control plane. See Upgrading GKE Clusters Safely.
Common beginner mistakes
Reading logs from the current container instead of the previous one. During the backoff window, the current container has not started. Always use --previous to get logs from the last crash.
Setting memory limits too close to baseline usage. A limit only slightly above baseline triggers OOMKilled under any load spike. Set limits with a significant buffer above typical peak, at least 1.5–2× peak usage.
Not setting resource requests at all. Without requests, Kubernetes does not reserve CPU or memory for the pod. Under node pressure, the pod is evicted first, producing CrashLoopBackOff on an otherwise healthy application.
Using failureThreshold: 1 on liveness probes. A single slow probe response kills the container. Always set failureThreshold to at least 3.
Assuming CrashLoopBackOff is a Kubernetes bug. CrashLoopBackOff is almost always an application-level or configuration-level issue. Kubernetes is doing its job correctly. The container is the thing that needs fixing.
Deleting and recreating the pod instead of investigating. Deleting the pod erases the previous logs and restart history. Investigate first, then fix the Deployment or StatefulSet spec. The new pods will roll out automatically.
Using :latest tags without imagePullPolicy: Always. Nodes cache images. Kubernetes defaults the pull policy to Always for :latest when the policy is unset, but if it is set to IfNotPresent and the node already has the old version cached, the pod runs stale code even after you push a new :latest. Use immutable tags (e.g. the git commit SHA) instead.
Summary
- CrashLoopBackOff is a restart management state, not the actual error. Your job is to find and fix the root cause.
- Use kubectl logs POD_NAME --previous to read crash logs from the last terminated container. The current container has not started during the backoff window.
- Use kubectl describe pod to find the exit code, termination reason, events, and probe configuration.
- Exit code 1 means an application crash. Read the previous logs for the exception or configuration error.
- Exit code 137 (OOMKilled) means the container exceeded its memory limit. Increase the limit or reduce memory usage.
- Exit code 0 with CrashLoopBackOff means the process is exiting cleanly and being restarted under restartPolicy: Always, or a liveness probe is forcing restarts.
- Liveness probes with low failureThreshold or short timeoutSeconds can kill healthy but slow containers.
- Use a startupProbe for applications with variable startup times to prevent premature liveness kills.
- On GKE, check Workload Identity bindings, Autopilot resource adjustments, and node pool sizing as additional root causes.
Frequently asked questions
What exactly is CrashLoopBackOff and why does the backoff time keep increasing?
CrashLoopBackOff means a container is crashing repeatedly and Kubernetes is applying exponential backoff before restarting it. The backoff starts at 10 seconds, doubles with each restart (20s, 40s, 80s...), and caps at 5 minutes. Kubernetes does this to avoid hammering a failing system. The container will keep trying to restart until you fix the underlying problem. CrashLoopBackOff is not the error itself. It is the restart management state.
How do I see the logs from a pod that keeps crashing before I can read them?
Use kubectl logs POD_NAME --previous to see the logs from the last terminated container. If the pod is in the middle of the backoff wait, the current container has not started yet, so --previous is the only way to get the most recent crash logs. You can also use kubectl describe pod POD_NAME to see the last exit code and termination reason without needing the logs.
My pod shows OOMKilled in the exit reason. What causes this and how do I fix it?
OOMKilled (exit code 137) means the Linux kernel killed the container because it exceeded its configured memory limit. Fix it by increasing the memory limit in the container spec, or by reducing the application's memory usage. The request sets the minimum reserved; the limit is the ceiling. Set limits to at least 1.5–2× typical peak usage to give headroom for bursts.
A liveness probe is killing my container before it finishes starting up. How do I fix this?
Increase the initialDelaySeconds on the liveness probe to give the application time to start before the probe begins. Set failureThreshold to at least 3 so a single slow response does not kill the container. If startup is highly variable, consider adding a separate startupProbe which disables the liveness probe until the startup check passes.
What is the difference between CrashLoopBackOff and ImagePullBackOff?
CrashLoopBackOff means the container image was pulled successfully and the container started, but then crashed. ImagePullBackOff means Kubernetes could not pull the container image at all. The image tag does not exist, the registry is unreachable, or the node lacks permission to pull from the registry. Check kubectl describe pod to see which state your pod is in and follow the matching fix.
How do I tell whether a CrashLoopBackOff is caused by a probe failure or an application crash?
Run kubectl describe pod POD_NAME and check the Events section. If the events show "Liveness probe failed" or "Startup probe failed" messages, a probe is killing the container. "Readiness probe failed" only removes the pod from the Service and does not restart it. If the events only show "Back-off restarting failed container," the application is crashing on its own. Also check the Last State section: exit code 137 with reason OOMKilled points to a memory issue, while exit code 1 with reason Error points to an application-level crash.