What Is Cloud Run? Serverless Containers on Google Cloud
Cloud Run is Google Cloud’s managed serverless platform for running containers. You package your application into a container image, push it to Google Cloud, and Cloud Run handles everything else: load balancing, scaling, TLS, and the underlying servers. You pay only for the CPU and memory used while your service is actually handling requests. This guide explains what Cloud Run is, how it works, and when it is (and is not) the right tool for your workload.
Cloud Run in simple terms
Imagine you have a web app inside a Docker container sitting on your laptop. Cloud Run takes that container and runs it in Google’s infrastructure. No servers to rent, no OS to patch, no load balancer to configure. When someone sends a request to your app, Cloud Run starts a container instance to handle it. When requests stop, the instances shut down and you stop paying.
The key idea: you bring the container, Cloud Run brings everything else. Your job is to make sure the container listens on the right port and does not store anything important on local disk. Google’s job is to run it reliably and scale it automatically.
A Cloud Run service is like a self-checkout queue that opens new lanes automatically when customers arrive and closes them when it goes quiet. You pay per lane only while it is in use. The trade-off: opening a brand-new lane takes a moment, so the first customer in a newly opened lane waits a bit longer. That delay is the cold start.
Not sure which GCP compute service to use? If your app fits in a container and responds to HTTP requests, Cloud Run is the right default. Switch to something else only when you hit a specific limitation.
What is Cloud Run?
Cloud Run is a fully managed, serverless container platform on Google Cloud. Here is what that means in practice:
- Fully managed: no VMs, no OS patching, no capacity planning. Google manages the underlying infrastructure.
- Serverless: you do not provision or configure servers. Resources appear automatically when needed.
- Container-based: you deploy a standard container image, not code in a specific language or runtime format.
- Request-driven: Cloud Run starts instances when requests arrive and stops them when traffic drops.
- Scales to zero: if no requests come in, all instances stop and compute billing stops with them.
- Stateless: each instance is ephemeral. Nothing written to local disk survives a restart.
Cloud Run runs stateless HTTP services. If your application responds to HTTP requests and does not depend on local disk state between requests, Cloud Run is almost certainly the simplest way to run it on Google Cloud.
How Cloud Run works
Here is the full lifecycle from code to live service:
- You write your app and a Dockerfile. The app must listen on the port given by the PORT environment variable.
- You build the container image and push it to Artifact Registry, Google’s managed container registry.
- You deploy the image to Cloud Run, choosing a region and configuration options.
- Cloud Run provisions a Google-managed HTTPS endpoint for your service.
- When a request arrives, Cloud Run routes it to an available container instance.
- If no instance is ready, Cloud Run starts a new one. That startup delay is the cold start.
- Each instance can handle multiple requests at the same time (default concurrency: 80 per instance).
- When all instances are at capacity, Cloud Run starts additional instances automatically.
- When load drops, surplus instances are stopped after a short cooldown window.
- At zero traffic, all instances stop and compute billing drops to zero.
The scaling is continuous and automatic. You do not set up autoscaling rules or configure a load balancer. Cloud Run’s control plane manages all of it. See Cloud Run scaling behaviour for a detailed breakdown of how concurrency, minimum instances, and cold starts interact.
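The relationship between load, concurrency, and instance count in the steps above can be sketched as simple arithmetic. This is a simplified model, not Cloud Run's actual autoscaler, which also weighs factors such as CPU utilisation. The function name and the `max_instances=100` default are illustrative assumptions.

```python
import math


def instances_needed(in_flight: int, concurrency: int = 80,
                     min_instances: int = 0, max_instances: int = 100) -> int:
    """Estimate how many instances serve `in_flight` simultaneous requests.

    Simplified model: one instance per `concurrency` requests, clamped
    between the configured minimum and maximum instance counts.
    """
    if concurrency < 1:
        raise ValueError("concurrency must be at least 1")
    wanted = math.ceil(in_flight / concurrency)
    return max(min_instances, min(wanted, max_instances))
```

Under this model, 80 simultaneous requests fit on one instance, the 81st forces a second instance to start, and zero requests with `min_instances=0` means zero instances.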
Key Cloud Run concepts
Before you deploy your first service, these terms are worth understanding:
Service: the top-level resource. A service has a stable URL and manages the lifecycle of its revisions. When you deploy an update, you create a new revision.
Revision: an immutable snapshot of a deployment with a specific image, configuration, and resource settings. You can split traffic between revisions for gradual rollouts.
Container image: the packaged application stored in Artifact Registry. Cloud Run pulls the image when starting a new instance. Smaller images mean faster cold starts.
PORT: Cloud Run sets a PORT environment variable (default 8080). Your app must listen on this exact port, not a hardcoded value.
Concurrency: how many requests a single instance handles at once. The default is 80. Higher concurrency means fewer instances needed but more memory pressure per instance.
Cold start: the delay when Cloud Run must start a new instance from scratch, caused by pulling the image and initialising the app. Typically 200ms to several seconds depending on image size and startup logic.
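Minimum instances: the number of instances kept warm at all times. Setting this to 1 removes the cold start that follows a period of zero traffic for latency-sensitive services, at the cost of always paying for that one warm instance. Extra instances started during a traffic spike still go through a cold start.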
Maximum instances: a cap on how many instances Cloud Run can start. Useful for protecting downstream services (a database, for example) from being overwhelmed during a traffic spike.
Stateless design: instances can be stopped at any time. Local filesystem writes do not persist. Use Cloud Storage for files and Cloud SQL or Firestore for data.
Think of a Cloud Run revision like a sealed package. Once deployed, it never changes. If you update your app, Cloud Run creates a new package (revision) and moves traffic to it, either all at once (the default) or gradually if you split traffic between revisions. If something goes wrong, you point traffic back to the previous package instantly. Nothing inside either package has been touched.
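To make the traffic-splitting idea concrete, here is a minimal sketch of percentage-based routing between revisions. It only illustrates the concept; Cloud Run does this routing for you, and the revision names and helper functions below are hypothetical.

```python
import random


def pick_revision(split: dict[str, int], rng: random.Random) -> str:
    """Choose a revision for one request according to percentage weights."""
    if sum(split.values()) != 100:
        raise ValueError("traffic percentages must add up to 100")
    names = list(split)
    weights = [split[name] for name in names]
    return rng.choices(names, weights=weights, k=1)[0]


def roll_back(split: dict[str, int], previous: str) -> dict[str, int]:
    """Return a new split that sends all traffic to `previous`."""
    return {name: (100 if name == previous else 0) for name in split}


# Hypothetical gradual rollout: 90% stays on the old revision.
rollout = {"my-service-v1": 90, "my-service-v2": 10}
```

Rolling back is just another split: `roll_back(rollout, "my-service-v1")` sends every request to the old revision without changing either revision.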
When to use Cloud Run
Cloud Run works well for most HTTP-driven, stateless workloads. Common use cases:
- REST APIs: public or internal APIs that respond to HTTP requests.
- Web applications: full-stack apps, server-rendered pages, or single-page app back-ends.
- Webhooks: receiving events from third-party services like GitHub, Stripe, or Twilio.
- Internal tools: dashboards, admin UIs, or data export services used within a company.
- Lightweight microservices: individual services in a larger microservices architecture.
- Event-driven HTTP endpoints: services that respond to Pub/Sub push subscriptions or Eventarc events.
- Scheduled batch jobs: using Cloud Run Jobs triggered by Cloud Scheduler.
The sweet spot for Cloud Run is variable traffic. If your service sees bursts of requests in the morning and almost none overnight, Cloud Run scales down to zero overnight and costs you nothing during those idle hours. A VM keeps billing whether it is busy or idle.
If you want to get hands-on immediately, the deploying your first Cloud Run service guide walks through everything from writing a Dockerfile to a live HTTPS URL.
When Cloud Run is not the right choice
Cloud Run is not universal. A few workload types consistently fit better elsewhere:
| Workload | Use Cloud Run? | Better alternative |
|---|---|---|
| Stateless HTTP API, variable traffic | Yes, ideal fit | Cloud Run is the right tool |
| Long-running background daemon | No | Compute Engine VM or GKE workload |
| Batch job with defined start and end | Use Jobs | Cloud Run Jobs or Cloud Batch |
| Stateful app that writes to local disk | No | Compute Engine with persistent disk |
| GPU-accelerated ML inference | Limited | Compute Engine with GPU, or GKE with GPU nodes |
| Kubernetes-native workloads (CRDs, operators) | No | GKE |
| App requiring host-level OS access | No | Compute Engine |
Cloud Run vs other GCP compute options
Choosing between Cloud Run, Compute Engine, Cloud Functions, and GKE is one of the most common questions beginners face. Here is a practical breakdown of each.
Cloud Run vs Compute Engine
Compute Engine gives you a full virtual machine with complete OS access, persistent local disk, and your choice of installed software. Cloud Run gives you none of that. You get a container runtime and nothing more.
- Choose Cloud Run when your workload is stateless, HTTP-driven, and you want zero server management.
- Choose Compute Engine when you need persistent local disk, GPU access, custom kernel modules, or a long-running background process with no HTTP entrypoint.
- Simple rule: if it fits in a container and responds to HTTP, start with Cloud Run. If it does not, use a VM.
See the Cloud Run vs Compute Engine comparison for a deeper look.
Cloud Run vs Cloud Functions
Cloud Functions is a functions-as-a-service platform. You write a single function and Google handles the runtime and surrounding infrastructure. Cloud Run requires you to build and manage a container image, but gives you full control over the runtime environment.
- Choose Cloud Run for full applications, custom runtimes, larger codebases, or when you need precise control over dependencies.
- Choose Cloud Functions for small, single-purpose event handlers where you want minimal boilerplate.
- Simple rule: if you already have a container or a web framework, use Cloud Run. If you are writing a small event handler from scratch, Cloud Functions involves less overhead.
See the full Cloud Run vs Cloud Functions comparison.
Cloud Run vs GKE
GKE is Google’s managed Kubernetes service. It runs containers like Cloud Run does, but with full Kubernetes control: custom networking, sidecars, persistent volumes, and cluster-level configuration. That power comes with significant operational complexity.
- Choose Cloud Run when you want managed scaling with no Kubernetes knowledge required, and your service is stateless HTTP.
- Choose GKE when you need Kubernetes-native features: custom resource definitions, service meshes, persistent storage, or workloads that do not fit the HTTP request model.
- Simple rule: Cloud Run is the right default. Use GKE when you have a specific requirement that Cloud Run cannot satisfy.
See the GKE vs Cloud Run comparison, or the broader choosing between Cloud Run, GKE, and VMs guide.
Container requirements
Cloud Run does not care what language or framework your app uses, but it does have a few hard requirements:
Listen on PORT: Cloud Run sets a PORT environment variable (default 8080). Your app must listen on that port. Do not hardcode port 8080; read PORT from the environment so it works correctly if the value ever changes.
Be stateless: instances can stop at any time. Do not write anything to local disk that you need to survive a restart. Use Cloud Storage for files, and Cloud SQL or Firestore for database data.
Respond within the request timeout: the maximum request timeout is 3600 seconds. For longer-running tasks, use Cloud Run Jobs or Cloud Batch.
Start fast: slow container startup makes cold starts worse. Use small base images, minimise dependencies, and defer any non-essential initialisation work. The container images in GCP guide covers how to build lean, fast-starting images.
The PORT requirement is the most common first mistake. If you
hardcode a port like 8080 and Cloud Run sets PORT to something
different, your container may start, but Cloud Run finds nothing listening on
the expected port, so the deployment typically fails and requests never reach
your app. Always read PORT dynamically from the environment.
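A minimal sketch of reading PORT dynamically, using only Python's standard library. The handler and function names are illustrative, and a real service would normally use a web framework and a production server rather than `http.server`.

```python
import os
from http.server import BaseHTTPRequestHandler, HTTPServer


def get_port(default: int = 8080) -> int:
    """Read the port from the PORT environment variable, falling back to 8080."""
    return int(os.environ.get("PORT", default))


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = b"ok\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def make_server() -> HTTPServer:
    # Bind to 0.0.0.0 so traffic from outside the container can reach the app.
    return HTTPServer(("0.0.0.0", get_port()), Handler)


# In the container entrypoint you would call: make_server().serve_forever()
```

The fallback to 8080 only exists so the same code runs on a laptop where PORT is not set.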
Simple deployment example
This is the core deployment command. It pulls a container image from Artifact
Registry, deploys it to Cloud Run in us-central1, and makes it
publicly accessible over HTTPS:
# Deploy a container from Artifact Registry
gcloud run deploy my-service \
--image=us-central1-docker.pkg.dev/PROJECT_ID/my-repo/my-app:latest \
--region=us-central1 \
--platform=managed \
--allow-unauthenticated
The --allow-unauthenticated flag makes the service public. Anyone
can call it without a Google identity token. For internal services, remove this
flag and callers must present a valid ID token.
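For a private service, the caller passes the ID token in the Authorization header as a bearer token. The sketch below only builds the request; obtaining the token is out of scope here (during local testing, `gcloud auth print-identity-token` is one way to get one). The function name and the URL in the usage note are placeholders.

```python
import urllib.request


def build_private_request(url: str, id_token: str) -> urllib.request.Request:
    """Build a request that carries an ID token as a bearer credential."""
    request = urllib.request.Request(url)
    request.add_header("Authorization", f"Bearer {id_token}")
    return request
```

You would then send it with `urllib.request.urlopen(build_private_request("https://my-service-example.a.run.app/", token))`, where the URL is your own service's URL.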
You can also pass configuration at deploy time:
# Deploy with resource limits and environment variables
gcloud run deploy my-service \
--image=us-central1-docker.pkg.dev/PROJECT_ID/my-repo/my-app:latest \
--region=us-central1 \
--set-env-vars="DB_HOST=10.0.0.5,APP_ENV=production" \
--memory=512Mi \
--cpu=1
# Get the live URL after deployment
gcloud run services describe my-service \
--region=us-central1 \
--format="value(status.url)"
For a complete step-by-step walkthrough including Dockerfile writing, image building, and first deployment, see deploying your first Cloud Run service.
Secrets, config, and service identity
Use plain environment variables for non-sensitive configuration like feature flags, log levels, and hostnames. For secrets (database passwords, API keys, service credentials), use Secret Manager.
Never hardcode secrets as environment variables in your Cloud Run service definition. Those values are stored in plaintext and visible to anyone with access to the service configuration. Use Secret Manager mounts instead.
# Set plain environment variables at deploy time
gcloud run deploy my-service \
--image=IMAGE \
--region=us-central1 \
--set-env-vars="APP_ENV=production,LOG_LEVEL=info"
# Mount a secret as an environment variable
gcloud run deploy my-service \
--image=IMAGE \
--region=us-central1 \
--set-secrets="DB_PASSWORD=my-db-password:latest"
# Mount a secret as a file
gcloud run deploy my-service \
--image=IMAGE \
--region=us-central1 \
--set-secrets="/secrets/key.json=my-service-key:latest"
Each Cloud Run service runs as a service account. That service account must have
roles/secretmanager.secretAccessor on each secret it needs. GCP
injects the secret value at instance startup and your application reads it as a
normal environment variable or file path.
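On the application side, reading a mounted secret needs no special client library. A minimal sketch, assuming the secret arrives either as the DB_PASSWORD variable or at the /secrets/key.json path used in the commands above; the helper name is hypothetical.

```python
import os
from pathlib import Path
from typing import Optional


def read_secret(env_name: Optional[str] = None,
                file_path: Optional[str] = None) -> str:
    """Return a secret from an environment variable or a mounted file."""
    if env_name and env_name in os.environ:
        return os.environ[env_name]
    if file_path and Path(file_path).is_file():
        return Path(file_path).read_text().strip()
    raise RuntimeError("secret not available; check the service configuration")
```

For example, `read_secret(env_name="DB_PASSWORD")` or `read_secret(file_path="/secrets/key.json")`. Failing loudly at startup when the secret is missing is usually easier to debug than a connection error later.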
For a full explanation of how Cloud Run handles authentication and authorisation, including the difference between public and private services, see the Cloud Run security model.
Pricing and scaling basics
Cloud Run’s pricing model is one of its biggest advantages: you only pay for what you actually use.
Scale to zero: if your service handles no requests, it scales down to zero instances and compute billing stops entirely. A hobby project or low-traffic internal tool can cost nothing overnight.
Billed while handling requests: you are charged for the CPU and memory used during request processing, rounded up to the nearest 100 ms. Idle time between requests is free by default.
Minimum instances cost money at all times: setting --min-instances=1 keeps one container warm permanently. This eliminates cold starts but means you pay for that instance 24/7, even at zero traffic.
Concurrency affects instance count: at the default concurrency of 80, a single instance handles 80 simultaneous requests. Reduce concurrency for CPU-heavy work and increase it for I/O-bound work that spends most of its time waiting.
Max instances protect downstream services: capping max instances prevents a traffic spike from overwhelming a downstream database or API that cannot scale as fast as Cloud Run can.
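One way to pick a maximum instance count is to work backwards from the downstream limit. The sketch below assumes each instance opens a fixed-size database connection pool; the function name and the numbers in the usage note are illustrative, not recommendations.

```python
def max_instances_for_db(db_connection_limit: int,
                         pool_size_per_instance: int,
                         reserved: int = 0) -> int:
    """Largest instance count whose pools fit under the database limit.

    `reserved` leaves headroom for migrations, admin tools, and other clients.
    """
    usable = db_connection_limit - reserved
    if pool_size_per_instance < 1 or usable < pool_size_per_instance:
        raise ValueError("connection limit too small for even one instance")
    return usable // pool_size_per_instance
```

With a limit of 100 connections, 10 held in reserve, and a pool of 5 per instance, this gives a cap of 18 instances.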
# Keep 1 instance warm at all times (eliminates cold starts)
gcloud run services update my-service \
--region=us-central1 \
--min-instances=1
# Set a maximum instance count to cap costs and protect dependencies
gcloud run services update my-service \
--region=us-central1 \
--max-instances=50
For a low-traffic internal tool, cold starts are usually acceptable and minimum instances is unnecessary overhead. For a customer-facing API where the first request of the day must be fast, minimum instances is worth the cost.
For more detail on cost controls, budget alerts, right-sizing, and avoiding surprise bills, see Cloud Run cost optimisation.
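The billing arithmetic can be sketched as follows, assuming active time is rounded up to the next 100 ms. The price parameters are deliberately left as inputs: they are placeholders, not real Cloud Run prices, and the sketch ignores per-request fees and any free tier.

```python
import math


def billable_seconds(active_ms: int) -> float:
    """Round active time up to the next 100 ms and return it in seconds."""
    return math.ceil(active_ms / 100) * 100 / 1000


def estimate_cost(active_ms: int, vcpus: float, memory_gib: float,
                  price_per_vcpu_second: float,
                  price_per_gib_second: float) -> float:
    """Estimate compute cost for one stretch of active instance time."""
    seconds = billable_seconds(active_ms)
    per_second = vcpus * price_per_vcpu_second + memory_gib * price_per_gib_second
    return seconds * per_second


# An always-on minimum instance is active for every second of the month.
SECONDS_PER_30_DAY_MONTH = 30 * 24 * 60 * 60
```

A 230 ms burst of activity is billed as 300 ms under this model, while one warm minimum instance accrues 2,592,000 instance-seconds over a 30-day month regardless of traffic.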
Cold starts and minimum instances
Cold starts are the biggest Cloud Run trade-off. When your service has been idle and a request arrives, Cloud Run starts a new instance from scratch. Depending on image size and application startup time, that can add 200ms to several seconds of latency to the first request.
Cold starts can take seconds, not just milliseconds. For a customer-facing
API that goes idle overnight, the first request of the morning may feel broken
to users. If your service must always respond quickly, set
--min-instances=1.
The factors that make cold starts worse:
- Large container images (more to pull from the registry)
- Runtimes with slow initialisation (Python, Ruby, or JVM-based apps)
- Apps that do expensive setup work at startup (loading large ML models, opening many connections)
The factors that reduce cold start impact:
- Minimum instances set to 1, keeping one warm instance always ready
- Small, lean container images built from minimal base images
- Deferring non-essential setup until after the server is ready to accept requests
See Cloud Run scaling behaviour for a full breakdown of how cold starts, minimum instances, and concurrency interact in production.
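Deferring non-essential setup can be as simple as loading an expensive resource on first use instead of at import time. A minimal sketch, where the "model" and the request handler are stand-ins rather than a real framework:

```python
import functools
import time


@functools.lru_cache(maxsize=None)
def get_model() -> dict:
    """Load the expensive resource once, the first time it is needed."""
    time.sleep(0.05)  # stand-in for slow work such as loading a large model
    return {"name": "demo-model"}


def handle_request(path: str) -> str:
    # Health checks and cheap routes never trigger the expensive load.
    if path == "/healthz":
        return "ok"
    return f"prediction from {get_model()['name']}"
```

The trade-off is that the first request needing the resource pays the loading cost instead of the startup phase, so this suits setup that only some requests need.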
Observability and networking
Cloud Run automatically sends logs to Cloud Logging. You can view request logs, container logs, and audit logs without any setup. For metrics, Cloud Monitoring provides request count, latency, container instance count, and CPU/memory utilisation out of the box. The Cloud Run monitoring guide covers how to use these effectively in production.
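Anything the container writes to stdout or stderr ends up in Cloud Logging. Writing one JSON object per line, with `severity` and `message` fields, generally lets Cloud Logging treat the entry as structured. The helper below is a minimal sketch with a hypothetical name; the extra field in the usage note is arbitrary.

```python
import json
import sys


def log(severity: str, message: str, **fields) -> str:
    """Write one JSON log line to stdout and return it."""
    entry = {"severity": severity, "message": message, **fields}
    line = json.dumps(entry)
    print(line, file=sys.stdout, flush=True)
    return line
```

For example, `log("ERROR", "payment failed", order_id=1234)` emits a single line that can be filtered by severity or by the extra field.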
By default, Cloud Run services cannot reach resources inside a VPC, such as a Cloud SQL instance on a private IP or a Redis cache. To access private VPC resources, configure a Serverless VPC Access connector or Direct VPC egress.
Common beginner mistakes
Storing state on the container’s local filesystem. Cloud Run instances are ephemeral. Any file written to the local filesystem is gone when the instance stops. Use Cloud Storage for files and Cloud SQL or Firestore for database data.
Ignoring cold starts for latency-sensitive services. When traffic drops to zero, the next request waits for a new instance to start. For a cold container that wait can run to several seconds, not just milliseconds. Set minimum instances to 1 if the first request of the day must be fast.
Not understanding public vs authenticated access. Cloud Run requires a valid Google identity token by default. A public website or API returns HTTP 403 to anonymous visitors until you explicitly set --allow-unauthenticated. Conversely, do not set that flag on an internal service that should not be publicly accessible.
Using Cloud Run for indefinite background processes. Cloud Run services are HTTP servers. A background worker with no incoming requests, or a job running longer than 3600 seconds, needs Cloud Run Jobs, Cloud Batch, or a Compute Engine VM.
Deploying bloated container images that start slowly. Cold start time grows with image size and with the amount of work done at startup. Avoid large base images, unused packages, and expensive initialisation work at startup. Keep images lean and defer non-essential setup until after the server is ready to accept requests.
Summary
- Cloud Run runs stateless containers with no VM management and scales to zero at no cost
- You deploy a container image from Artifact Registry; Cloud Run handles load balancing, scaling, and TLS
- Billed for CPU and memory during request handling only, rounded up to the nearest 100 ms
- Cold starts add latency when no warm instance exists; set min-instances=1 to eliminate them
- Maximum request timeout is 3600 seconds (not suitable for indefinite background processes)
- Container must listen on the PORT env variable and follow stateless design principles
- Use Secret Manager for credentials, not hardcoded environment variables
Frequently asked questions
What is Cloud Run used for?
Cloud Run is used for stateless HTTP workloads: REST APIs, web apps, webhooks, internal tools, and lightweight microservices. It handles variable traffic well, scaling up when load increases and down to zero when idle. It is a good fit whenever you want to run a containerised app without managing servers, VMs, or load balancers.
Is Cloud Run the same as Cloud Functions?
No. Cloud Run runs a container image you build and control. You define the runtime, the web framework, and all dependencies. Cloud Functions is a functions-as-a-service product where you write a single function and Google handles the rest. Cloud Run gives more control and is better for full applications. Cloud Functions is simpler for small, single-purpose event handlers.
Does Cloud Run need Docker?
Cloud Run needs a container image, which is most commonly built with Docker. You write a Dockerfile, build the image, push it to Artifact Registry, and deploy it to Cloud Run. You can also use Cloud Build to build images without installing Docker locally. The container must listen on the PORT environment variable that Cloud Run sets.
When should I use Cloud Run instead of Compute Engine?
Use Cloud Run when your workload is HTTP-driven, stateless, and has variable or unpredictable traffic. Cloud Run removes all VM management and scales to zero at no cost. Use Compute Engine when you need persistent local state, GPU access, custom kernel modules, a long-running daemon with no HTTP entrypoint, or requests exceeding 3600 seconds. For stateless APIs and web apps, Cloud Run is almost always the simpler and cheaper choice.
Can Cloud Run run background jobs?
Not with Cloud Run services, which are HTTP servers. For batch work with a defined start and end, use Cloud Run Jobs. You deploy the same kind of container but it runs to completion rather than waiting for HTTP requests. For indefinitely running background workers, use a Compute Engine VM or a GKE workload. Cloud Run Jobs can be triggered manually, on a schedule with Cloud Scheduler, or from Workflows.
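A job container simply does its work and exits. When a job runs several tasks in parallel, each task can read the CLOUD_RUN_TASK_INDEX and CLOUD_RUN_TASK_COUNT environment variables to decide which share of the work is its own. The sketch below assumes the work is a plain list of items; the function name is hypothetical.

```python
import os
from typing import Optional, Sequence


def task_slice(items: Sequence, task_index: Optional[int] = None,
               task_count: Optional[int] = None) -> list:
    """Return the items this task should process, striped across all tasks."""
    if task_index is None:
        task_index = int(os.environ.get("CLOUD_RUN_TASK_INDEX", 0))
    if task_count is None:
        task_count = int(os.environ.get("CLOUD_RUN_TASK_COUNT", 1))
    if not 0 <= task_index < task_count:
        raise ValueError("task_index must be in the range [0, task_count)")
    # Task 0 takes items 0, n, 2n, ...; task 1 takes 1, n+1, ... and so on.
    return list(items[task_index::task_count])
```

Outside Cloud Run the defaults make the function behave as a single task that processes everything, which keeps local testing simple.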