Fix Cloud Run "Container Failed to Start" (PORT, Logs, Image, Secrets)
The error “Container failed to start. Failed to start and then listen on the port defined by the PORT environment variable” means Cloud Run deployed your container, but the container crashed, hung, or never opened a network listener before the startup timeout expired. Cloud Run could not route traffic to it, so the revision failed.
Most of the time the fix is straightforward. The container is not reading the PORT
environment variable, is binding to 127.0.0.1 instead of 0.0.0.0,
or is crashing on startup due to a missing dependency. Start with the fastest checks below
before working through the deeper causes.
This page primarily covers startup failures, where the container never became reachable. It also addresses adjacent false positives like image pull errors and secret mount failures that surface with similar error messages. Runtime crashes (the container starts but later fails under traffic) are covered briefly for comparison.
Simple explanation
Think of Cloud Run like a restaurant host seating a new chef. The host (Cloud Run) walks the chef (your container) to the kitchen and says “start cooking.” But before any orders go to that kitchen, the host checks: is the chef actually at the stove and ready to take orders? If the chef never shows up at the counter, the host sends all orders to a different kitchen instead.
When you deploy a Cloud Run service, Cloud Run starts your container and waits for it to become reachable. Four things must happen for the revision to succeed:
- The ingress container must start without crashing.
- It must read the PORT environment variable that Cloud Run injects (default 8080).
- It must bind its listener to 0.0.0.0 (all interfaces), not 127.0.0.1 (loopback).
- It must begin accepting TCP connections before the startup timeout expires.
If the container exits, hangs, or never opens a listener on PORT, Cloud Run
marks the revision as failed and surfaces the “container failed to start” error. No traffic
is ever sent to a failed revision.
Cloud Run does not care what your app does internally. It only checks one thing: “is something listening on the port I told you to use?” If the answer is no after the timeout, the revision fails.
Fastest fix checklist
Work through these checks in order. Most Cloud Run startup failures are resolved by one of the first four items, often in under five minutes.
- Run the image locally. Use docker run -e PORT=8080 -p 8080:8080 IMAGE and confirm the app responds on http://localhost:8080. If it crashes locally, fix that first.
- Verify the app reads PORT at runtime. Search your code for hardcoded port numbers like 3000, 5000, or 8000. Replace them with a read from the PORT environment variable.
- Bind to 0.0.0.0, not 127.0.0.1. Cloud Run routes traffic from outside the container. A loopback-only listener is unreachable.
- Check the container port, entrypoint, and command. If you overrode the container port in the Cloud Run config, make sure it matches what the app actually listens on. Verify the entrypoint and command are correct in your Dockerfile.
- Check Logs Explorer. Filter on resource.type="cloud_run_revision" and look at the first log lines from the failed revision. The crash reason is almost always there.
- Confirm the image is Linux x86-64. If you build on Apple Silicon, use --platform=linux/amd64. An ARM image fails with “exec format error.”
- Verify secrets and environment variables. Missing secrets or env vars that the app requires at startup will crash the process before it listens.
- Check image pull permissions. The Cloud Run service agent needs roles/artifactregistry.reader on the Artifact Registry repository.
How Cloud Run startup works
Ingress container vs sidecars
A Cloud Run service has one ingress container that receives HTTP traffic.
You can optionally add sidecar containers for tasks like logging or proxying, but only the
ingress container must listen on PORT. The “container failed to start” error
refers specifically to the ingress container.
Revision readiness
Each deployment
creates a new revision. Cloud Run starts the revision’s container and waits for it to
accept TCP connections on the configured port. Once the container is reachable, the revision
is marked Ready and traffic shifts to it. If the container never becomes
reachable within the startup timeout, the revision is marked as failed.
Why port binding and startup timing matter
Cloud Run does not inspect your application code. It only knows whether the container is listening on the expected port. Think of it like a phone call: Cloud Run dials the port number and waits for someone to pick up. If nobody answers before the timeout, Cloud Run hangs up and marks the call as failed.
If your app takes 30 seconds to load an ML model but the startup timeout is 10 seconds, Cloud Run kills the container before it finishes. The logs may show a healthy startup that simply stopped.
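The "dial the port and wait" check can be approximated locally. The sketch below is only an illustration of the idea, not Cloud Run's actual implementation: the function name wait_for_port and its parameters are made up for this example, and it simply retries a TCP connect until a deadline passes.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout_s: float, interval_s: float = 0.2) -> bool:
    """Return True if a TCP connection to host:port succeeds before timeout_s elapses."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            # A successful connect means something is listening on the port,
            # which is the only signal this sketch looks at.
            with socket.create_connection((host, port), timeout=interval_s):
                return True
        except OSError:
            # Nothing is listening yet (refused or timed out); wait and retry.
            time.sleep(interval_s)
    return False
```

After starting the image with docker run -e PORT=8080 -p 8080:8080 IMAGE, calling wait_for_port("localhost", 8080, timeout_s=10) gives a rough idea of whether the app opens its listener within the time you expect it to have.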
Startup probes, liveness probes, and readiness probes
Think of probes as health checks at a hospital. A startup probe is the admission check (“are you awake yet?”). A liveness probe is the daily checkup (“are you still alive?”). A readiness probe is the discharge check (“are you ready to go home?”). Each serves a different purpose:
Startup probe: Runs during container startup, repeating at its configured interval until it succeeds. It tells Cloud Run “the container is still starting, keep waiting.” If the probe keeps failing until its failure threshold is exhausted, the container is killed.
Liveness probe: Runs continuously after startup. If it fails repeatedly, Cloud Run restarts the container. Use this to detect deadlocks or hung processes.
Readiness probe: Not applicable to Cloud Run (used in GKE). Cloud Run uses TCP port readiness instead.
If you configure a startup probe with a path that returns errors or a timeout that is too short, the container may be perfectly healthy but Cloud Run kills it anyway. A misconfigured probe is one of the hardest startup failures to spot because nothing in your app is actually broken.
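One way to keep an HTTP startup probe honest is to open the listener immediately and report readiness separately. The following is a minimal sketch using only the Python standard library. The /healthz path, the ready flag, and the start_server helper are hypothetical names chosen for this example, and the service would need an HTTP startup probe configured to request that same path.

```python
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Set this once slow initialisation (model load, cache warm-up) has finished.
ready = threading.Event()


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/healthz":  # hypothetical probe path
            # 503 tells the probe "still starting"; 200 tells it "started".
            status = 200 if ready.is_set() else 503
        else:
            status = 404
        self.send_response(status)
        self.end_headers()

    def log_message(self, fmt, *args):
        pass  # keep probe traffic out of the logs in this sketch


def start_server(port: int) -> ThreadingHTTPServer:
    """Start listening on all interfaces right away; readiness is reported via /healthz."""
    server = ThreadingHTTPServer(("0.0.0.0", port), HealthHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

With this shape, a probe that hits the wrong path sees 404 and a probe that fires too early sees 503, which makes a misconfigured probe easier to tell apart from a slow startup when you test locally.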
How to tell startup failure vs runtime crash
This page primarily targets startup failures, where the revision never became ready. Runtime crashes (the revision deployed but requests fail later) need a different diagnostic approach.
| | Startup failure | Runtime crash |
|---|---|---|
| When it happens | During deployment, before any traffic | After deployment, while serving requests |
| Revision status | Ready = False | Ready = True (but requests fail) |
| Typical symptoms | Deploy fails, “container failed to start” | Intermittent 503, container restarts |
| Where to look | Revision conditions, startup logs | Request logs, error logs at runtime |
| Common causes | PORT, image arch, missing secrets | OOM, unhandled exceptions, downstream failures |
# Check revision conditions (startup failure diagnosis)
gcloud run revisions describe REVISION_NAME \
--region=REGION \
--format=yaml
# Look for: conditions[type=Ready].status = False
# The "message" field explains why startup failed
# Check request logs (runtime crash diagnosis)
gcloud logging read \
'resource.type="cloud_run_revision" AND resource.labels.service_name="my-service" AND severity>=ERROR' \
--project=PROJECT_ID \
--limit=20 \
--format="table(timestamp, textPayload, jsonPayload.message)"
Common causes and fixes
App is not listening on PORT
What it means: Cloud Run injects a PORT environment variable
(default 8080). Your application must read this value at runtime and listen on
it. If the app hardcodes a different port, Cloud Run cannot reach it.
What it looks like: The exact error message is “Container failed to start. Failed to start and then listen on the port defined by the PORT environment variable.” The container may appear to start normally in the logs but Cloud Run still rejects the revision.
How to check it: Search your code for hardcoded port numbers. Run the
container locally with docker run -e PORT=9999 -p 9999:9999 IMAGE and confirm
it listens on port 9999, not a hardcoded value.
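The code search can be partly automated. The script below is a rough sketch: the regular expression and the find_hardcoded_ports name are assumptions made for this example, the pattern covers only a few common spellings (listen(3000), port=5000, ":8000"), and it will miss some cases and flag some harmless ones.

```python
import re

# Matches a few common ways a literal port ends up in source code, such as
# listen(3000), port=5000, or ":8000". Illustrative only, not exhaustive.
HARDCODED_PORT = re.compile(
    r"""(?:listen\(\s*|port\s*[=:]\s*|["']:)(\d{2,5})\b""",
    re.IGNORECASE,
)


def find_hardcoded_ports(source: str) -> list[tuple[int, int]]:
    """Return (line_number, port) pairs for lines that contain a literal port."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for match in HARDCODED_PORT.finditer(line):
            hits.append((lineno, int(match.group(1))))
    return hits
```

A literal used as the fallback in a PORT lookup, such as os.environ.get("PORT", 8080), is not reported, because the number does not directly follow listen( or port=. Treat any hit as a line to review by hand, not as proof of a bug.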
How to fix it:
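# Python (Flask)
import os

from flask import Flask

app = Flask(__name__)

if __name__ == "__main__":
    # Cloud Run injects PORT; fall back to 8080 for local runs.
    port = int(os.environ.get("PORT", 8080))
    # Bind to all interfaces so traffic from outside the container can reach the app.
    app.run(host="0.0.0.0", port=port)

// Node.js (Express)
const express = require('express');
const app = express();
// Cloud Run injects PORT; fall back to 8080 for local runs.
const port = parseInt(process.env.PORT, 10) || 8080;
app.listen(port, '0.0.0.0', () => {
  console.log(`Listening on port ${port}`);
});

// Go
package main

import (
	"log"
	"net/http"
	"os"
)

func main() {
	port := os.Getenv("PORT")
	if port == "" {
		port = "8080" // default for local runs
	}
	// An address with no host part (":8080") binds all interfaces.
	// log.Fatal surfaces a failed bind in the logs instead of exiting silently.
	log.Fatal(http.ListenAndServe(":"+port, nil))
}
App is binding to 127.0.0.1 instead of 0.0.0.0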
What it means: 127.0.0.1 is the loopback address. Picture a
shop that locks its front door and only serves people already inside. That is what binding
to 127.0.0.1 does: the server only accepts connections from within the container
itself. Cloud Run routes traffic from outside the container, so a loopback-only listener is
unreachable.
What it looks like: The same “container failed to start” PORT error. The app logs may show it started successfully and is listening, but Cloud Run still cannot connect.
How to check it: Search your code and framework config for 127.0.0.1, localhost, or host: "127.0.0.1". Some frameworks (like Flask's built-in development server) default to loopback.
How to fix it: Change the bind address to 0.0.0.0 in your server configuration. In Express, pass '0.0.0.0' as the second argument to app.listen(). In Flask, set host="0.0.0.0". In Go's http.ListenAndServe, use ":PORT" with the port value substituted (no IP prefix binds all interfaces).
Some frameworks bind to 127.0.0.1 by default in development mode. Always
verify the bind address in your production startup command, not just your dev config.
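A startup guard can catch a loopback bind before deployment. This is a small sketch using Python's ipaddress module. The function name reachable_from_outside is hypothetical, and it only understands IP literals, the empty string, and the name localhost. Any other hostname raises ValueError.

```python
import ipaddress


def reachable_from_outside(bind_host: str) -> bool:
    """Return False when bind_host only accepts connections from inside the container."""
    if bind_host == "":
        # In Python's socket API an empty host means "all interfaces".
        return True
    if bind_host == "localhost":
        # localhost resolves to a loopback address.
        return False
    # Raises ValueError for anything that is not an IPv4 or IPv6 literal.
    address = ipaddress.ip_address(bind_host)
    return not address.is_loopback
```

Calling this with the host value your production start command passes to the server, and failing fast when it returns False, turns a silent deployment failure into an explicit error message in the first log lines.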
Wrong container port, command, or entrypoint
What it means: The container port configured in Cloud Run does not match the port the application actually listens on, or the entrypoint/command in the Dockerfile does not start the correct process.
What it looks like: The PORT error, or the container exits immediately with a non-zero exit code. If the entrypoint is wrong, the logs may show “file not found” or “permission denied.”
How to check it:
# Check the configured container port
gcloud run services describe my-service \
--region=REGION \
--format="value(spec.template.spec.containers[0].ports[0].containerPort)"
# Check the current entrypoint and command overrides
gcloud run services describe my-service \
--region=REGION \
--format="yaml(spec.template.spec.containers[0].command, spec.template.spec.containers[0].args)"
# Inspect the Dockerfile entrypoint in the image
docker inspect IMAGE --format='{{json .Config.Entrypoint}} {{json .Config.Cmd}}'
How to fix it: Make sure the container port in Cloud Run matches what the app listens on, or remove the override so it defaults to PORT. Verify your Dockerfile CMD or ENTRYPOINT runs the correct binary.
App crashes before binding to the port
What it means: The process starts but exits before it opens a listener. Cloud Run waits for the startup timeout, then marks the container as failed.
What it looks like: Logs show the app starting, then an exception or error message, then the container exits. Common culprits: missing environment variable, Python import error, failed database connection during init.
How to check it: Open Logs Explorer and filter on the failed revision. Look at the very first log lines. The crash reason is usually there.
# Read the earliest logs from the failed revision
gcloud logging read \
'resource.type="cloud_run_revision" AND resource.labels.revision_name="my-service-00042-abc"' \
--project=PROJECT_ID \
--limit=30 \
--order=asc \
--format="table(timestamp, textPayload, jsonPayload.message)"
How to fix it: Fix the crash. Common fixes:
- Add missing environment variables with gcloud run services update --update-env-vars. (--set-env-vars also works, but it replaces the whole set and removes any variables you do not list.)
- Install missing Python/Node packages in the Dockerfile.
- Add retry logic for database connections during init. See Cloud SQL Connection Refused for connection failures to Cloud SQL.
- Use structured logging so crash details are easier to find in Logs Explorer.
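Two of the fixes above, missing environment variables and structured logging, can be combined into a fail-fast check that runs before the server starts. The sketch below assumes hypothetical variable names (DATABASE_URL, API_KEY) and relies on Cloud Logging picking up the severity and message fields from a JSON line written to standard error.

```python
import json
import sys

REQUIRED_VARS = ("DATABASE_URL", "API_KEY")  # hypothetical names for this example


def missing_env_vars(environ, required=REQUIRED_VARS):
    """Return the required variable names that are unset or empty."""
    return [name for name in required if not environ.get(name)]


def log_startup_error(missing, stream=sys.stderr):
    """Write one JSON log line carrying a severity and a readable message."""
    entry = {
        "severity": "ERROR",
        "message": "Missing required environment variables: " + ", ".join(missing),
    }
    print(json.dumps(entry), file=stream, flush=True)
```

At the top of the entry point, something like missing = missing_env_vars(os.environ) followed by log_startup_error(missing) and sys.exit(1) when the list is non-empty puts the crash reason in the very first log line of the failed revision.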
ARM / Apple Silicon / wrong image architecture
What it means: This is like putting a Blu-ray disc into a DVD player. Cloud Run runs on x86-64 (amd64) infrastructure. A container image built for ARM (such as on an Apple Silicon Mac without specifying the platform) contains binaries in the wrong “format” and cannot execute on Cloud Run at all.
What it looks like: The error message is usually exec format error.
The container exits immediately with no application logs.
How to check it:
# Check the architecture of a local image
docker inspect IMAGE --format='{{.Architecture}}'
# Check a remote image in Artifact Registry
gcloud artifacts docker images describe \
REGION-docker.pkg.dev/PROJECT/REPO/IMAGE:TAG \
--show-build-details
How to fix it:
# Build for the correct platform explicitly
docker build --platform=linux/amd64 -t my-image:latest .
# Or build a multi-arch image
docker buildx build --platform=linux/amd64,linux/arm64 -t my-image:latest --push .
If you use a CI/CD pipeline like Cloud Build, the build runs on x86-64 by default. The architecture mismatch problem mainly affects local builds on Apple Silicon.
Artifact Registry image pull or permission problem
What it means: Cloud Run cannot pull the container image from Artifact Registry. The deployment fails before the container even starts.
What it looks like: The error references an image pull failure. The revision never starts and no application logs appear.
How to check it: Verify the image URL is correct and the tag exists. Check that the Cloud Run service agent has read access to the registry.
# Find the Cloud Run service agent
gcloud projects describe PROJECT_ID \
--format="value(projectNumber)"
# Service agent: service-PROJECT_NUMBER@serverless-robot-prod.iam.gserviceaccount.com
# Grant image pull access
gcloud artifacts repositories add-iam-policy-binding REPO_NAME \
--location=REGION \
--member="serviceAccount:service-PROJECT_NUMBER@serverless-robot-prod.iam.gserviceaccount.com" \
--role="roles/artifactregistry.reader"
# Verify the image tag exists
gcloud artifacts docker images list \
REGION-docker.pkg.dev/PROJECT/REPO/IMAGE \
--include-tags
How to fix it: Grant roles/artifactregistry.reader to the Cloud Run service agent. For cross-project registries, the grant must be in the registry’s project. Double-check the image URL for typos. See Permission Denied Errors for broader IAM troubleshooting.
Secret Manager access failure
What it means: The Cloud Run service mounts a secret from Secret Manager,
but the service account lacks roles/secretmanager.secretAccessor. The container
fails before the application process starts.
What it looks like: The container exits immediately with no application logs. The revision conditions mention a permission error related to Secret Manager.
How to check it:
# Check which service account the Cloud Run service uses
gcloud run services describe my-service \
--region=REGION \
--format="value(spec.template.spec.serviceAccountName)"
# Check if the service account has secret accessor role
gcloud secrets get-iam-policy SECRET_NAME \
--format="table(bindings.role, bindings.members)"
# Grant access if missing
gcloud secrets add-iam-policy-binding SECRET_NAME \
--member="serviceAccount:my-sa@PROJECT_ID.iam.gserviceaccount.com" \
--role="roles/secretmanager.secretAccessor"
How to fix it: Grant roles/secretmanager.secretAccessor to the runtime service account on each secret the service references. See IAM AccessDenied Errors for diagnosing service account permission issues.
Cloud SQL, private service, or VPC connectivity during startup
What it means: The application tries to connect to a private resource
(Cloud SQL via private IP, an internal API, a Redis instance) during startup. Without
Serverless VPC Access or the
built-in Cloud SQL connector, the connection times out and the app crashes before binding
to PORT.
What it looks like: The app logs show a connection timeout or “connection refused” to an internal IP address, followed by a crash. The startup takes the full timeout period before failing.
How to check it: Check whether your app connects to any private resource during init. Verify the Cloud Run service has a VPC connector or Cloud SQL instance attached.
# Check VPC connector and Cloud SQL instances on the service
gcloud run services describe my-service \
--region=REGION \
--format="yaml(spec.template.metadata.annotations)"
# Attach a Cloud SQL instance (connects via Unix socket)
gcloud run services update my-service \
--region=REGION \
--add-cloudsql-instances=PROJECT_ID:REGION:INSTANCE_NAME
# Or attach a VPC connector for private IP access
gcloud run services update my-service \
--region=REGION \
--vpc-connector=my-connector \
--vpc-egress=private-ranges-only
How to fix it: For Cloud SQL, use the built-in connector or a VPC connector. For other private resources, configure Serverless VPC Access. Add retry logic for database connections so a transient failure during cold start does not crash the container. See Cloud SQL Connection Refused for detailed connection troubleshooting.
Your app might connect to a database fine when you run it locally, because your machine can reach the database directly. Cloud Run containers start in an isolated network. Without a VPC connector or Cloud SQL instance attached, the container has no route to private resources. The connection hangs until the startup timeout kills the container.
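The retry logic mentioned above can be as small as the helper below. It is a sketch under stated assumptions: connect_with_retry is a made-up name, connect stands for whatever zero-argument function opens your connection, and only OSError subclasses (refused connections, timeouts) are retried. Database drivers often raise their own exception types, which you would need to add.

```python
import time


def connect_with_retry(connect, attempts=5, base_delay_s=0.5, sleep=time.sleep):
    """Call connect() until it succeeds, doubling the wait after each failure."""
    for attempt in range(attempts):
        try:
            return connect()
        except OSError as exc:
            if attempt == attempts - 1:
                raise  # out of attempts: let the real error reach the logs
            delay = base_delay_s * (2 ** attempt)
            print(f"connect failed ({exc}); retrying in {delay:.1f}s")
            sleep(delay)
```

Keep the total wait (here at most 0.5 + 1 + 2 + 4 seconds with the defaults) well inside the startup window, otherwise the retries themselves can push the container past the timeout. Retries help with transient failures only. They do not replace the VPC connector or Cloud SQL attachment when the container has no route at all.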
Startup probe / timeout / health check issue
What it means: The container starts correctly, but the startup probe fails or the startup timeout expires before the app finishes initialising. Cloud Run kills the container even though it was on track to start successfully.
What it looks like: Logs show a healthy startup (loading models, warming caches) that simply stops. No crash, no error, just silence when the timeout fires.
How to check it:
# Check the current startup probe and timeout configuration
gcloud run services describe my-service \
--region=REGION \
--format="yaml(spec.template.spec.containers[0].startupProbe, spec.template.spec.containerConcurrency, spec.template.metadata.annotations['run.googleapis.com/startup-cpu-boost'])"
# Check the request timeout (this is separate from the startup window,
# which the startup probe settings control)
gcloud run services describe my-service \
--region=REGION \
--format="value(spec.template.spec.timeoutSeconds)"
How to fix it: Give services that need more init time (loading ML models, large dependency trees) a longer startup window by raising the startup probe's timeoutSeconds, failureThreshold, and periodSeconds. Note that the --timeout flag sets the request timeout, so raising it does not by itself give the container longer to start. Enable startup CPU boost to give the container additional CPU during init:
# Raise the request timeout to 300 seconds (applies to requests, not to container startup)
gcloud run services update my-service \
--region=REGION \
--timeout=300
# Enable startup CPU boost
gcloud run services update my-service \
--region=REGION \
--cpu-boost
Symptom to cause map
Find the symptom that matches what you see. The “Likely cause” column points you to the right section above, and “What to check first” gives you the fastest next step.
| Symptom | Likely cause | What to check first |
|---|---|---|
| “Failed to start and then listen on the port defined by the PORT environment variable” | App not reading PORT or binding to 127.0.0.1 | App startup code, bind address |
| Revision never becomes Ready | Any startup failure (PORT, crash, timeout) | Revision conditions: gcloud run revisions describe |
| Exit code 1 in logs | App crash (missing env var, import error, dependency failure) | Earliest log lines from the revision |
| “exec format error” | ARM image on x86-64 infrastructure | Image architecture: docker inspect |
| Image pull failure | Wrong image URL, missing tag, or IAM permission gap | Image URL, service agent permissions on Artifact Registry |
| Works locally but fails on Cloud Run | Hardcoded port, missing env vars, ARM image, private network dependency | Run locally with -e PORT=8080, check env config |
| Intermittent 503 after successful deploy | Runtime crash (OOM, unhandled exception, downstream failure) | Request logs, Cloud Run metrics |
When to use this page
This page solves these situations:
- You deployed a Cloud Run service and got “Container failed to start.”
- You see “The user-provided container failed to start and listen on the port defined by the PORT=8080 environment variable within the allocated timeout.”
- Your Cloud Run revision shows Ready = False after deployment.
- The container works locally but fails on Cloud Run.
- You see “exec format error” in your Cloud Run logs.
- The deploy fails with an image pull error.
When this is actually a different problem
If the revision deployed successfully (Ready = True) but requests are failing, you have a runtime problem, not a startup problem. Here are the right pages for related issues:
- Permission errors during API calls (not startup): Permission Denied Errors or IAM AccessDenied Errors
- Database connection failures at runtime: Cloud SQL Connection Refused
- Cloud Functions deployment or runtime failures: Debugging Cloud Functions Failures
- Choosing between Cloud Run and Cloud Functions: Cloud Run vs Cloud Functions
- Setting up monitoring and alerts: Monitoring Cloud Run
Common mistakes
- Hardcoding the port in application code. The single most common cause of Cloud Run startup failures. Read the port from the PORT environment variable at runtime. Do not assume 8080. Read it dynamically.
- Building for ARM on Apple Silicon. Always build with --platform=linux/amd64 when deploying to Cloud Run. The “exec format error” message does not mention architecture, which makes this hard to diagnose if you do not know to look for it.
- Connecting to a database in the startup path without retry logic. A transient connection failure during cold start crashes the container and triggers another cold start, creating a loop. Add retry with backoff for all startup-time connections.
- Assuming the startup timeout is long enough. The default startup timeout may be too short for apps that load large models, warm caches, or compile templates. Increase the timeout and enable startup CPU boost.
- Using the default service account without granting it access to secrets or registries. The default Compute Engine service account does not have Secret Manager or cross-project Artifact Registry access by default. Use a purpose-specific service account with the exact roles needed.
- Not checking the logs. The crash reason is almost always in the first few log lines of the failed revision. Open Logs Explorer before guessing.
Cloud Run container failed to start vs runtime crash
A startup failure and a runtime crash need different fixes. Here is the practical difference:
Startup failure: The deployment itself fails. No traffic reaches the new revision. The previous revision (if any) keeps serving. Fix the container and redeploy.
Runtime crash: The deployment succeeds and the revision serves traffic, but the container crashes under certain conditions. Traffic may still reach the crashing revision, causing 503 errors for users.
A startup failure is safe: no users see errors because the old revision keeps serving. A runtime crash is urgent: users are getting 503s right now. Know which one you have before you start debugging.
To diagnose a startup failure, check the revision conditions. To diagnose a runtime crash, check request logs and error rates in Cloud Run monitoring.
# Startup failure: check revision conditions
gcloud run revisions describe REVISION_NAME \
--region=REGION \
--format="yaml(status.conditions)"
# Runtime crash: check error logs during traffic
gcloud logging read \
'resource.type="cloud_run_revision" AND resource.labels.service_name="my-service" AND severity>=ERROR' \
--project=PROJECT_ID \
--limit=20
Frequently asked questions
How do I fix "Failed to start and then listen on the port defined by the PORT environment variable"?
Your application must read the port number from the PORT environment variable at runtime and listen on that value. Cloud Run injects PORT into every container (default 8080). If your app listens on a hardcoded port like 3000 or 5000, Cloud Run cannot route traffic to it. Fix your code to use os.environ.get("PORT", 8080) in Python, process.env.PORT in Node.js, or os.Getenv("PORT") in Go. Then bind to 0.0.0.0, not 127.0.0.1.
Why does my container work locally but fail on Cloud Run?
Four common causes: (1) the container listens on a hardcoded port instead of the PORT environment variable, (2) required environment variables or secrets are not configured in the Cloud Run service, (3) the container tries to reach a private VPC resource without Serverless VPC Access configured, (4) the container was built for ARM (Apple Silicon) and cannot run on Cloud Run x86-64 infrastructure. Run the image locally with docker run -e PORT=8080 -p 8080:8080 to simulate what Cloud Run does.
Can Apple Silicon cause the "container failed to start" error?
Yes. If you build a Docker image on an Apple Silicon Mac (M1/M2/M3/M4) without specifying the target platform, the image contains ARM binaries. Cloud Run runs on x86-64 (amd64) infrastructure, so ARM images fail immediately with "exec format error." Always build with docker build --platform=linux/amd64 when targeting Cloud Run.
How do I know whether this is a startup failure or a runtime crash?
Check the revision status. Run gcloud run revisions describe REVISION --region=REGION --format=yaml and look at the conditions section. If the Ready condition shows status: False, the container never started. If the revision is Ready but requests return 503 or 500, the container started but is crashing at runtime. Startup failures block deployment; runtime crashes happen after traffic begins flowing.
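If you check revisions often, the same decision can be scripted. The sketch below assumes you saved the describe output as JSON (for example with --format=json) and that it has the status.conditions list shown in the YAML output. The classify_revision name and its return strings are made up for this example.

```python
def classify_revision(revision: dict) -> str:
    """Classify a revision from the Ready condition in its describe output."""
    conditions = revision.get("status", {}).get("conditions", [])
    ready = next((c for c in conditions if c.get("type") == "Ready"), None)
    if ready is None:
        return "unknown: no Ready condition found"
    status = ready.get("status")
    if status == "True":
        # The container started; failing requests point to a runtime problem.
        return "ready: if requests still fail, treat it as a runtime problem"
    if status == "False":
        # The message field explains why startup failed.
        return "startup failure: " + ready.get("message", "no message reported")
    return "unknown: Ready condition is " + str(status)
```

Load the saved file with json.load and pass the resulting dictionary to classify_revision. A result that starts with "startup failure" sends you to the revision conditions and startup logs. A result that starts with "ready" sends you to request logs and metrics.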
Can secrets, Cloud SQL, or private networking cause "container failed to start"?
Yes. If the service account lacks roles/secretmanager.secretAccessor, secrets mounted via Secret Manager prevent the container from starting. A Cloud SQL connection attempt during init that fails (missing roles/cloudsql.client or no VPC connector for private IP) can crash the process before it binds to PORT. Private dependencies that require Serverless VPC Access will time out during startup if the connector is not configured.