Fix Cloud Run "Container Failed to Start" (PORT, Logs, Image, Secrets)

The error “Container failed to start. Failed to start and then listen on the port defined by the PORT environment variable” means Cloud Run deployed your container, but the container either crashed, hung, or never opened a network listener before the startup timeout expired. Cloud Run could not route traffic to it, so the revision failed.

Most of the time the cause is simple: the container is not reading the PORT environment variable, is binding to 127.0.0.1 instead of 0.0.0.0, or is crashing on startup because of a missing dependency. Start with the fastest checks below before working through the deeper causes.

This page primarily covers startup failures, where the container never became reachable. It also addresses adjacent failures, such as image pull errors and secret mount failures, that surface with similar error messages. Runtime crashes (the container starts but later fails under traffic) are covered briefly for comparison.

Simple explanation

Think of Cloud Run like a restaurant host seating a new chef. The host (Cloud Run) walks the chef (your container) to the kitchen and says “start cooking.” But before any orders go to that kitchen, the host checks: is the chef actually at the stove and ready to take orders? If the chef never shows up at the counter, the host sends all orders to a different kitchen instead.

When you deploy a Cloud Run service, Cloud Run starts your container and waits for it to become reachable. Four things must happen for the revision to succeed:

  1. The ingress container must start without crashing.
  2. It must read the PORT environment variable that Cloud Run injects (default 8080).
  3. It must bind its listener to 0.0.0.0 (all interfaces), not 127.0.0.1 (loopback).
  4. It must begin accepting TCP connections before the startup timeout expires.

If the container exits, hangs, or never opens a listener on PORT, Cloud Run marks the revision as failed and surfaces the “container failed to start” error. No traffic is ever sent to a failed revision.

Key takeaway

Cloud Run does not care what your app does internally. It only checks one thing: “is something listening on the port I told you to use?” If the answer is no after the timeout, the revision fails.

Fastest fix checklist

Before you dig deeper

Work through these checks in order. Most Cloud Run startup failures are resolved by one of the first four items, often in under five minutes.

  1. Run the image locally. Use docker run -e PORT=8080 -p 8080:8080 IMAGE and confirm the app responds on http://localhost:8080. If it crashes locally, fix that first.

  2. Verify the app reads PORT at runtime. Search your code for hardcoded port numbers like 3000, 5000, or 8000. Replace them with a read from the PORT environment variable.

  3. Bind to 0.0.0.0, not 127.0.0.1. Cloud Run routes traffic from outside the container. A loopback-only listener is unreachable.

  4. Check the container port, entrypoint, and command. If you overrode the container port in the Cloud Run config, make sure it matches what the app actually listens on. Verify the entrypoint and command are correct in your Dockerfile.

  5. Check Logs Explorer. Filter on resource.type="cloud_run_revision" and look at the first log lines from the failed revision. The crash reason is almost always there.

  6. Confirm the image is Linux x86-64. If you build on Apple Silicon, use --platform=linux/amd64. An ARM image fails with “exec format error.”

  7. Verify secrets and environment variables. Missing secrets or env vars that the app requires at startup will crash the process before it listens.

  8. Check image pull permissions. The Cloud Run service agent needs roles/artifactregistry.reader on the Artifact Registry repository.

How Cloud Run startup works

Ingress container vs sidecars

A Cloud Run service has one ingress container that receives HTTP traffic. You can optionally add sidecar containers for tasks like logging or proxying, but only the ingress container must listen on PORT. The “container failed to start” error refers specifically to the ingress container.

Revision readiness

Each deployment creates a new revision. Cloud Run starts the revision’s container and waits for it to accept TCP connections on the configured port. Once the container is reachable, the revision is marked Ready and traffic shifts to it. If the container never becomes reachable within the startup timeout, the revision is marked as failed.

Why port binding and startup timing matter

Cloud Run does not inspect your application code. It only knows whether the container is listening on the expected port. Think of it like a phone call: Cloud Run dials the port number and waits for someone to pick up. If nobody answers before the timeout, Cloud Run hangs up and marks the call as failed.

If your app takes 30 seconds to load an ML model but the startup timeout is 10 seconds, Cloud Run kills the container before it finishes. The logs may show a healthy startup that simply stopped.
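
The "dial and wait for an answer" check can be approximated locally. The following is a minimal Python sketch, not Cloud Run's actual implementation; the helper name wait_for_port and its parameters are assumptions made for illustration. It retries a TCP connection until one succeeds or a deadline passes.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout_s: float, interval_s: float = 0.2) -> bool:
    """Return True once a TCP connection to (host, port) succeeds, False on timeout."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval_s):
                return True
        except OSError:
            # Nothing is listening yet; wait briefly and try again.
            time.sleep(interval_s)
    return False
```

For example, after starting the image with docker run -e PORT=8080 -p 8080:8080 IMAGE, a False result from wait_for_port("127.0.0.1", 8080, 10) suggests the app never opened its listener within ten seconds.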

Startup probes, liveness probes, and readiness probes

Think of probes as health checks at a hospital. A startup probe is the admission check (“are you awake yet?”). A liveness probe is the daily checkup (“are you still alive?”). A readiness probe is the discharge check (“are you ready to go home?”). Each serves a different purpose:

  • Startup probe: Runs during container startup, repeating until it first succeeds. It tells Cloud Run “the container is still starting, keep waiting.” If the startup probe keeps failing past the timeout, the container is killed.

  • Liveness probe: Runs continuously after startup. If it fails repeatedly, Cloud Run restarts the container. Use this to detect deadlocks or hung processes.

  • Readiness probe: Not applicable to Cloud Run (used in GKE). Cloud Run uses TCP port readiness instead.

Watch out

If you configure a startup probe with a path that returns errors or a timeout that is too short, the container may be perfectly healthy but Cloud Run kills it anyway. A misconfigured probe is one of the hardest startup failures to spot because nothing in your app is actually broken.

How to tell startup failure vs runtime crash

This page primarily targets startup failures, where the revision never became ready. Runtime crashes (the revision deployed but requests fail later) need a different diagnostic approach.

| | Startup failure | Runtime crash |
|---|---|---|
| When it happens | During deployment, before any traffic | After deployment, while serving requests |
| Revision status | Ready = False | Ready = True (but requests fail) |
| Typical symptoms | Deploy fails, “container failed to start” | Intermittent 503, container restarts |
| Where to look | Revision conditions, startup logs | Request logs, error logs at runtime |
| Common causes | PORT, image arch, missing secrets | OOM, unhandled exceptions, downstream failures |

# Check revision conditions (startup failure diagnosis)
gcloud run revisions describe REVISION_NAME \
  --region=REGION \
  --format=yaml

# Look for: conditions[type=Ready].status = False
# The "message" field explains why startup failed

# Check request logs (runtime crash diagnosis)
gcloud logging read \
  'resource.type="cloud_run_revision" AND resource.labels.service_name="my-service" AND severity>=ERROR' \
  --project=PROJECT_ID \
  --limit=20 \
  --format="table(timestamp, textPayload, jsonPayload.message)"
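
If you prefer to script the revision check, the Ready condition can be pulled out of the describe output. This is a rough Python sketch that assumes you ran the describe command with --format=json instead of yaml; the function name ready_condition and the sample payload are hypothetical and heavily abbreviated.

```python
import json


def ready_condition(revision: dict) -> tuple[str, str]:
    """Return (status, message) of the Ready condition, or ("Unknown", "") if absent."""
    for cond in revision.get("status", {}).get("conditions", []):
        if cond.get("type") == "Ready":
            return cond.get("status", "Unknown"), cond.get("message", "")
    return "Unknown", ""


# Hypothetical, abbreviated output of `gcloud run revisions describe ... --format=json`.
sample = json.loads("""
{"status": {"conditions": [
  {"type": "Ready", "status": "False",
   "message": "Container failed to start."}
]}}
""")

status, message = ready_condition(sample)
print(status, "-", message)
```

A status of "False" points to a startup failure; "True" combined with failing requests points to a runtime crash.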

Common causes and fixes

App is not listening on PORT

What it means: Cloud Run injects a PORT environment variable (default 8080). Your application must read this value at runtime and listen on it. If the app hardcodes a different port, Cloud Run cannot reach it.

What it looks like: The exact error message is “Container failed to start. Failed to start and then listen on the port defined by the PORT environment variable.” The container may appear to start normally in the logs but Cloud Run still rejects the revision.

How to check it: Search your code for hardcoded port numbers. Run the container locally with docker run -e PORT=9999 -p 9999:9999 IMAGE and confirm it listens on port 9999, not a hardcoded value.
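
The code search can be roughly automated. The sketch below is a heuristic only, under the assumption that the ports named earlier (3000, 5000, 8000) are the ones worth flagging; the function name find_hardcoded_ports is made up for this example, and it will miss ports set in config files or built dynamically.

```python
import re

# Ports the text above calls out as common hardcoded values (an illustrative list).
SUSPECT_PORTS = ("3000", "5000", "8000")
_PORT_RE = re.compile(r"(?<!\d)(" + "|".join(SUSPECT_PORTS) + r")(?!\d)")


def find_hardcoded_ports(source: str) -> list[tuple[int, str]]:
    """Return (line_number, line) pairs that mention a suspect port without PORT."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        # Skip lines that already read the PORT environment variable.
        if _PORT_RE.search(line) and "PORT" not in line:
            hits.append((lineno, line.strip()))
    return hits
```

Feed it the contents of your server entry file; any hit is a line worth reviewing by hand.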

How to fix it:

# Python (Flask)
import os
from flask import Flask

app = Flask(__name__)

if __name__ == "__main__":
    port = int(os.environ.get("PORT", 8080))
    app.run(host="0.0.0.0", port=port)

// Node.js (Express)
const express = require('express');
const app = express();

const port = parseInt(process.env.PORT) || 8080;
app.listen(port, '0.0.0.0', () => {
  console.log(`Listening on port ${port}`);
});
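
// Go
package main

import (
  "log"
  "net/http"
  "os"
)

func main() {
  // Cloud Run injects PORT; fall back to 8080 for local runs.
  port := os.Getenv("PORT")
  if port == "" {
    port = "8080"
  }
  // An empty host binds all interfaces; log.Fatal surfaces bind errors in the logs.
  log.Fatal(http.ListenAndServe(":"+port, nil))
}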

App is binding to 127.0.0.1 instead of 0.0.0.0

What it means: 127.0.0.1 is the loopback address. Picture a shop that locks its front door and only serves people already inside. That is what binding to 127.0.0.1 does: the server only accepts connections from within the container itself. Cloud Run routes traffic from outside the container, so a loopback-only listener is unreachable.

What it looks like: The same “container failed to start” PORT error. The app logs may show it started successfully and is listening, but Cloud Run still cannot connect.

How to check it: Search your code and framework config for 127.0.0.1, localhost, or host: "127.0.0.1". Some frameworks (like Flask's built-in development server) default to loopback.

How to fix it: Change the bind address to 0.0.0.0 in your server configuration. In Express, pass '0.0.0.0' as the second argument to app.listen(). In Flask, set host="0.0.0.0". In Go's http.ListenAndServe, use ":PORT" (no IP prefix binds all interfaces).

Warning

Some frameworks bind to 127.0.0.1 by default in development mode. Always verify the bind address in your production startup command, not just your dev config.
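
To sanity-check a configured bind address before deploying, something like the following Python sketch may help. The function name is_loopback_only is an assumption for illustration; it classifies only IP literals and the name localhost, and cannot resolve other hostnames.

```python
import ipaddress


def is_loopback_only(bind_host: str) -> bool:
    """Return True if bind_host only accepts connections from inside the container."""
    if bind_host.lower() == "localhost":
        return True
    if bind_host == "":
        # An empty host (as in Go's ":8080") binds all interfaces.
        return False
    try:
        return ipaddress.ip_address(bind_host).is_loopback
    except ValueError:
        # Not an IP literal (for example a hostname); cannot decide here.
        return False
```

A True result for the host in your production startup command means Cloud Run would be unable to reach the listener.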

Wrong container port, command, or entrypoint

What it means: The container port configured in Cloud Run does not match the port the application actually listens on, or the entrypoint/command in the Dockerfile does not start the correct process.

What it looks like: The PORT error, or the container exits immediately with a non-zero exit code. If the entrypoint is wrong, the logs may show “file not found” or “permission denied.”

How to check it:

# Check the configured container port
gcloud run services describe my-service \
  --region=REGION \
  --format="value(spec.template.spec.containers[0].ports[0].containerPort)"

# Check the current entrypoint and command overrides
gcloud run services describe my-service \
  --region=REGION \
  --format="yaml(spec.template.spec.containers[0].command, spec.template.spec.containers[0].args)"

# Inspect the Dockerfile entrypoint in the image
docker inspect IMAGE --format='{{json .Config.Entrypoint}} {{json .Config.Cmd}}'

How to fix it: Make sure the container port in Cloud Run matches what the app listens on, or remove the override so it defaults to PORT. Verify your Dockerfile CMD or ENTRYPOINT runs the correct binary.

App crashes before binding to the port

What it means: The process starts but exits before it opens a listener. Cloud Run waits for the startup timeout, then marks the container as failed.

What it looks like: Logs show the app starting, then an exception or error message, then the container exits. Common culprits: missing environment variable, Python import error, failed database connection during init.

How to check it: Open Logs Explorer and filter on the failed revision. Look at the very first log lines. The crash reason is usually there.

# Read the earliest logs from the failed revision
gcloud logging read \
  'resource.type="cloud_run_revision" AND resource.labels.revision_name="my-service-00042-abc"' \
  --project=PROJECT_ID \
  --limit=30 \
  --order=asc \
  --format="table(timestamp, textPayload, jsonPayload.message)"

How to fix it: Fix the crash. Common fixes:

  • Add missing environment variables with gcloud run services update --set-env-vars.
  • Install missing Python/Node packages in the Dockerfile.
  • Add retry logic for database connections during init. See Cloud SQL Connection Refused for connection failures to Cloud SQL.
  • Use structured logging so crash details are easier to find in Logs Explorer.
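
Two of these fixes, checking required environment variables and logging in a structured form, can be combined into a fail-fast startup check. The following is a sketch under assumptions: the variable names in REQUIRED_ENV and all function names are hypothetical. It relies on writing one JSON object per line to stdout with severity and message fields, which Cloud Logging can parse as a structured entry.

```python
import json
import os
import sys

# Hypothetical variable names; replace with whatever your app requires.
REQUIRED_ENV = ("DATABASE_URL", "API_KEY")


def missing_env(required, environ=os.environ) -> list[str]:
    """Return the required variable names that are unset or empty."""
    return [name for name in required if not environ.get(name)]


def log_structured(severity: str, message: str, **fields) -> str:
    """Write one JSON log line to stdout and return it."""
    line = json.dumps({"severity": severity, "message": message, **fields})
    print(line, flush=True)
    return line


def check_startup_env(required=REQUIRED_ENV, environ=os.environ) -> None:
    """Exit with a clear error before the server starts if configuration is missing."""
    missing = missing_env(required, environ)
    if missing:
        log_structured("CRITICAL", "missing required environment variables", missing=missing)
        sys.exit(1)
```

Calling check_startup_env() as the first statement of your entry point makes the crash reason appear as a single, searchable log line rather than a stack trace from deep inside a library.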

ARM / Apple Silicon / wrong image architecture

What it means: This is like putting a Blu-ray disc into a DVD player. Cloud Run runs on x86-64 (amd64) infrastructure. A container image built for ARM (such as on an Apple Silicon Mac without specifying the platform) contains binaries in the wrong “format” and cannot execute on Cloud Run at all.

What it looks like: The error message is usually exec format error. The container exits immediately with no application logs.

How to check it:

# Check the architecture of a local image
docker inspect IMAGE --format='{{.Architecture}}'

# Check a remote image in Artifact Registry
gcloud artifacts docker images describe \
  REGION-docker.pkg.dev/PROJECT/REPO/IMAGE:TAG \
  --show-build-details

How to fix it:

# Build for the correct platform explicitly
docker build --platform=linux/amd64 -t my-image:latest .

# Or build a multi-arch image
docker buildx build --platform=linux/amd64,linux/arm64 -t my-image:latest --push .

Tip

If you use a CI/CD pipeline like Cloud Build, the build runs on x86-64 by default. The architecture mismatch problem mainly affects local builds on Apple Silicon.
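
The architecture check can also be scripted, for instance as a pre-deploy guard. This Python sketch assumes you pass it one entry from the JSON array that docker inspect IMAGE prints; the function name runs_on_cloud_run and the sample payload are hypothetical.

```python
import json


def runs_on_cloud_run(image_config: dict) -> bool:
    """Return True if the image metadata reports linux/amd64."""
    return (
        image_config.get("Os") == "linux"
        and image_config.get("Architecture") == "amd64"
    )


# Hypothetical, abbreviated `docker inspect IMAGE` output (a JSON array).
inspect_output = '[{"Os": "linux", "Architecture": "arm64"}]'
image = json.loads(inspect_output)[0]
print(runs_on_cloud_run(image))  # False: this sample is an ARM image
```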

Artifact Registry image pull or permission problem

What it means: Cloud Run cannot pull the container image from Artifact Registry. The deployment fails before the container even starts.

What it looks like: The error references an image pull failure. The revision never starts and no application logs appear.

How to check it: Verify the image URL is correct and the tag exists. Check that the Cloud Run service agent has read access to the registry.

# Find the Cloud Run service agent
gcloud projects describe PROJECT_ID \
  --format="value(projectNumber)"
# Service agent: service-PROJECT_NUMBER@serverless-robot-prod.iam.gserviceaccount.com

# Grant image pull access
gcloud artifacts repositories add-iam-policy-binding REPO_NAME \
  --location=REGION \
  --member="serviceAccount:service-PROJECT_NUMBER@serverless-robot-prod.iam.gserviceaccount.com" \
  --role="roles/artifactregistry.reader"

# Verify the image tag exists
gcloud artifacts docker images list \
  REGION-docker.pkg.dev/PROJECT/REPO/IMAGE \
  --include-tags

How to fix it: Grant roles/artifactregistry.reader to the Cloud Run service agent. For cross-project registries, the grant must be in the registry’s project. Double-check the image URL for typos. See Permission Denied Errors for broader IAM troubleshooting.

Secret Manager access failure

What it means: The Cloud Run service mounts a secret from Secret Manager, but the service account lacks roles/secretmanager.secretAccessor. The container fails before the application process starts.

What it looks like: The container exits immediately with no application logs. The revision conditions mention a permission error related to Secret Manager.

How to check it:

# Check which service account the Cloud Run service uses
gcloud run services describe my-service \
  --region=REGION \
  --format="value(spec.template.spec.serviceAccountName)"

# Check if the service account has secret accessor role
gcloud secrets get-iam-policy SECRET_NAME \
  --format="table(bindings.role, bindings.members)"

# Grant access if missing
gcloud secrets add-iam-policy-binding SECRET_NAME \
  --member="serviceAccount:my-sa@PROJECT_ID.iam.gserviceaccount.com" \
  --role="roles/secretmanager.secretAccessor"

How to fix it: Grant roles/secretmanager.secretAccessor to the runtime service account on each secret the service references. See IAM AccessDenied Errors for diagnosing service account permission issues.
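
When a service references several secrets, checking each policy by eye gets tedious. The sketch below assumes the policy was fetched with gcloud secrets get-iam-policy SECRET_NAME --format=json; the function name has_role and the sample policy are illustrative. It checks only direct bindings on the secret, so access inherited from the project level or granted through a group is not detected.

```python
def has_role(policy: dict, member: str, role: str) -> bool:
    """Return True if member is bound directly to role in an IAM policy document."""
    for binding in policy.get("bindings", []):
        if binding.get("role") == role and member in binding.get("members", []):
            return True
    return False


# Hypothetical policy, shaped like `gcloud secrets get-iam-policy SECRET_NAME --format=json`.
policy = {
    "bindings": [
        {
            "role": "roles/secretmanager.secretAccessor",
            "members": ["serviceAccount:my-sa@PROJECT_ID.iam.gserviceaccount.com"],
        }
    ]
}

runtime_sa = "serviceAccount:my-sa@PROJECT_ID.iam.gserviceaccount.com"
print(has_role(policy, runtime_sa, "roles/secretmanager.secretAccessor"))
```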

Cloud SQL, private service, or VPC connectivity during startup

What it means: The application tries to connect to a private resource (Cloud SQL via private IP, an internal API, a Redis instance) during startup. Without Serverless VPC Access or the built-in Cloud SQL connector, the connection times out and the app crashes before binding to PORT.

What it looks like: The app logs show a connection timeout or “connection refused” to an internal IP address, followed by a crash. The startup takes the full timeout period before failing.

How to check it: Check whether your app connects to any private resource during init. Verify the Cloud Run service has a VPC connector or Cloud SQL instance attached.

# Check VPC connector and Cloud SQL instances on the service
gcloud run services describe my-service \
  --region=REGION \
  --format="yaml(spec.template.metadata.annotations)"

# Attach a Cloud SQL instance (connects via Unix socket)
gcloud run services update my-service \
  --region=REGION \
  --add-cloudsql-instances=PROJECT_ID:REGION:INSTANCE_NAME

# Or attach a VPC connector for private IP access
gcloud run services update my-service \
  --region=REGION \
  --vpc-connector=my-connector \
  --vpc-egress=private-ranges-only

How to fix it: For Cloud SQL, use the built-in connector or a VPC connector. For other private resources, configure Serverless VPC Access. Add retry logic for database connections so a transient failure during cold start does not crash the container. See Cloud SQL Connection Refused for detailed connection troubleshooting.
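
One way to add the retry logic is exponential backoff around the connection call. This is a minimal Python sketch, not a drop-in for any particular database driver: retry_with_backoff and its parameters are assumptions, and real drivers usually raise their own exception types, which you would need to add to the except clause.

```python
import time


def retry_with_backoff(connect, attempts=5, base_delay_s=0.5, max_delay_s=8.0, sleep=time.sleep):
    """Call connect() until it succeeds, doubling the wait after each failure."""
    delay = base_delay_s
    for attempt in range(1, attempts + 1):
        try:
            return connect()
        except (ConnectionError, TimeoutError):
            if attempt == attempts:
                # Out of retries: re-raise so the failure shows up in the logs.
                raise
            sleep(delay)
            delay = min(delay * 2, max_delay_s)
```

Keep the total retry time well inside the startup timeout; otherwise the retries themselves can cause the container to be killed before it listens on PORT.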

Why this catches people off guard

Your app might connect to a database fine when you run it locally, because your machine can reach the database directly. Cloud Run containers start in an isolated network. Without a VPC connector or Cloud SQL instance attached, the container has no route to private resources. The connection hangs until the startup timeout kills the container.

Startup probe / timeout / health check issue

What it means: The container starts correctly, but the startup probe fails or the startup timeout expires before the app finishes initialising. Cloud Run kills the container even though it was on track to start successfully.

What it looks like: Logs show a healthy startup (loading models, warming caches) that simply stops. No crash, no error, just silence when the timeout fires.

How to check it:

# Check the current startup probe and timeout configuration
gcloud run services describe my-service \
  --region=REGION \
  --format="yaml(spec.template.spec.containers[0].startupProbe, spec.template.spec.containerConcurrency, spec.template.metadata.annotations['run.googleapis.com/startup-cpu-boost'])"

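# Check the request timeout (this is separate from the startup window,
# which is governed by the startup probe settings)
gcloud run services describe my-service \
  --region=REGION \
  --format="value(spec.template.spec.timeoutSeconds)"

How to fix it: Give services that need more init time (loading ML models, large dependency trees) a longer startup window by increasing the startup probe failureThreshold and periodSeconds; the container gets roughly failureThreshold × periodSeconds to start listening. Note that --timeout sets the request timeout, not the startup window. Enable startup CPU boost to give the container additional CPU during init:

# Allow about 300 seconds (30 attempts x 10 seconds) with a TCP startup probe
# (confirm the flag syntax with: gcloud run services update --help)
gcloud run services update my-service \
  --region=REGION \
  --startup-probe=tcpSocket.port=8080,periodSeconds=10,failureThreshold=30,timeoutSeconds=5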

# Enable startup CPU boost
gcloud run services update my-service \
  --region=REGION \
  --cpu-boost
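
To reason about whether a probe configuration leaves enough time, the arithmetic can be sketched as follows. This is an approximation that assumes the startup window is the initial delay plus periodSeconds multiplied by failureThreshold; the function names are invented for this example.

```python
def startup_window_s(period_s: int, failure_threshold: int, initial_delay_s: int = 0) -> int:
    """Approximate seconds a container has to pass its startup probe."""
    return initial_delay_s + period_s * failure_threshold


def probe_allows(init_time_s: float, period_s: int, failure_threshold: int,
                 initial_delay_s: int = 0) -> bool:
    """Return True if an app needing init_time_s should pass before the probe gives up."""
    return init_time_s <= startup_window_s(period_s, failure_threshold, initial_delay_s)
```

With the earlier example of an app that needs 30 seconds to load a model, probe_allows(30, 10, 1) is False, while raising failureThreshold to 3 makes probe_allows(30, 10, 3) True.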

Symptom to cause map

How to use this table

Find the symptom that matches what you see. The “Likely cause” column points you to the right section above, and “What to check first” gives you the fastest next step.

| Symptom | Likely cause | What to check first |
|---|---|---|
| “Failed to start and then listen on the port defined by the PORT environment variable” | App not reading PORT or binding to 127.0.0.1 | App startup code, bind address |
| Revision never becomes Ready | Any startup failure (PORT, crash, timeout) | Revision conditions: gcloud run revisions describe |
| Exit code 1 in logs | App crash (missing env var, import error, dependency failure) | Earliest log lines from the revision |
| “exec format error” | ARM image on x86-64 infrastructure | Image architecture: docker inspect |
| Image pull failure | Wrong image URL, missing tag, or IAM permission gap | Image URL, service agent permissions on Artifact Registry |
| Works locally but fails on Cloud Run | Hardcoded port, missing env vars, ARM image, private network dependency | Run locally with -e PORT=8080, check env config |
| Intermittent 503 after successful deploy | Runtime crash (OOM, unhandled exception, downstream failure) | Request logs, Cloud Run metrics |
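
For a first pass over a log excerpt, the two symptoms in this table that come with an exact message can be matched mechanically. This Python sketch uses only the strings quoted on this page; likely_cause and SYMPTOM_MAP are names made up for illustration, and anything without a literal message still needs a human to read the logs.

```python
# Lowercased message fragment -> likely cause, taken from the symptom table.
SYMPTOM_MAP = (
    ("exec format error", "ARM image on x86-64 infrastructure"),
    ("listen on the port defined by the port", "App not reading PORT or binding to 127.0.0.1"),
)


def likely_cause(log_text: str) -> str:
    """Return the first matching cause, or a pointer to the earliest log lines."""
    lowered = log_text.lower()
    for needle, cause in SYMPTOM_MAP:
        if needle in lowered:
            return cause
    return "Unknown: read the earliest log lines from the revision"
```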

When to use this page

This page solves these situations:

  • You deployed a Cloud Run service and got “Container failed to start.”
  • You see “The user-provided container failed to start and listen on the port defined by the PORT=8080 environment variable within the allocated timeout.”
  • Your Cloud Run revision shows Ready = False after deployment.
  • The container works locally but fails on Cloud Run.
  • You see “exec format error” in your Cloud Run logs.
  • The deploy fails with an image pull error.

When this is actually a different problem

If the revision deployed successfully (Ready = True) but requests are failing, you have a runtime problem, not a startup problem.

Common mistakes

  1. Hardcoding the port in application code. The single most common cause of Cloud Run startup failures. Do not assume 8080; read the port from the PORT environment variable at runtime.

  2. Building for ARM on Apple Silicon. Always build with --platform=linux/amd64 when deploying to Cloud Run. The “exec format error” message does not mention architecture, which makes this hard to diagnose if you do not know to look for it.

  3. Connecting to a database in the startup path without retry logic. A transient connection failure during cold start crashes the container and triggers another cold start, creating a loop. Add retry with backoff for all startup-time connections.

  4. Assuming the startup timeout is long enough. The default startup timeout may be too short for apps that load large models, warm caches, or compile templates. Increase the timeout and enable startup CPU boost.

  5. Using the default service account without granting it access to secrets or registries. The default Compute Engine service account does not automatically have Secret Manager or cross-project Artifact Registry access. Use a purpose-specific service account with the exact roles needed.

  6. Not checking the logs. The crash reason is almost always in the first few log lines of the failed revision. Open Logs Explorer before guessing.

Cloud Run container failed to start vs runtime crash

A startup failure and a runtime crash need different fixes. Here is the practical difference:

  • Startup failure: The deployment itself fails. No traffic reaches the new revision. The previous revision (if any) keeps serving. Fix the container and redeploy.

  • Runtime crash: The deployment succeeds and the revision serves traffic, but the container crashes under certain conditions. Traffic may still reach the crashing revision, causing 503 errors for users.

Why this matters

A startup failure is usually safe: if a previous revision exists, it keeps serving, so no users see errors. A runtime crash is urgent: users are getting 503s right now. Know which one you have before you start debugging.

To diagnose a startup failure, check the revision conditions. To diagnose a runtime crash, check request logs and error rates in Cloud Run monitoring.

# Startup failure: check revision conditions
gcloud run revisions describe REVISION_NAME \
  --region=REGION \
  --format="yaml(status.conditions)"

# Runtime crash: check error logs during traffic
gcloud logging read \
  'resource.type="cloud_run_revision" AND resource.labels.service_name="my-service" AND severity>=ERROR' \
  --project=PROJECT_ID \
  --limit=20

Frequently asked questions

How do I fix "Failed to start and then listen on the port defined by the PORT environment variable"?

Your application must read the port number from the PORT environment variable at runtime and listen on that value. Cloud Run injects PORT into every container (default 8080). If your app listens on a hardcoded port like 3000 or 5000, Cloud Run cannot route traffic to it. Fix your code to use os.environ.get("PORT", 8080) in Python, process.env.PORT in Node.js, or os.Getenv("PORT") in Go. Then bind to 0.0.0.0, not 127.0.0.1.

Why does my container work locally but fail on Cloud Run?

Four common causes: (1) the container listens on a hardcoded port instead of the PORT environment variable, (2) required environment variables or secrets are not configured in the Cloud Run service, (3) the container tries to reach a private VPC resource without Serverless VPC Access configured, (4) the container was built for ARM (Apple Silicon) and cannot run on Cloud Run x86-64 infrastructure. Run the image locally with docker run -e PORT=8080 -p 8080:8080 IMAGE to simulate what Cloud Run does.

Can Apple Silicon cause the "container failed to start" error?

Yes. If you build a Docker image on an Apple Silicon Mac (M1/M2/M3/M4) without specifying the target platform, the image contains ARM binaries. Cloud Run runs on x86-64 (amd64) infrastructure, so ARM images fail immediately with "exec format error." Always build with docker build --platform=linux/amd64 when targeting Cloud Run.

How do I know whether this is a startup failure or a runtime crash?

Check the revision status. Run gcloud run revisions describe REVISION --region=REGION --format=yaml and look at the conditions section. If the Ready condition shows status: False, the container never started. If the revision is Ready but requests return 503 or 500, the container started but is crashing at runtime. Startup failures block deployment; runtime crashes happen after traffic begins flowing.

Can secrets, Cloud SQL, or private networking cause "container failed to start"?

Yes. If the service account lacks roles/secretmanager.secretAccessor, secrets mounted via Secret Manager prevent the container from starting. A Cloud SQL connection attempt during init that fails (missing roles/cloudsql.client or no VPC connector for private IP) can crash the process before it binds to PORT. Private dependencies that require Serverless VPC Access will time out during startup if the connector is not configured.

Last verified: 27 March 2026. Cloud services change frequently. Verify details against official documentation before making infrastructure decisions.