What Is Cloud Run? Serverless Containers on Google Cloud

Cloud Run is Google Cloud’s managed serverless platform for running containers. You package your application into a container image, push it to a registry such as Artifact Registry, and Cloud Run handles everything else: load balancing, scaling, TLS, and the underlying servers. By default, you pay only for the CPU and memory allocated while your service is actually handling requests. This guide explains what Cloud Run is, how it works, and when it is (and is not) the right tool for your workload.

Cloud Run in simple terms

Imagine you have a web app inside a Docker container sitting on your laptop. Cloud Run takes that container and runs it in Google’s infrastructure. No servers to rent, no OS to patch, no load balancer to configure. When a request arrives and no instance is running, Cloud Run starts a container instance to handle it. When requests stop, idle instances shut down and you stop paying.

The key idea: you bring the container, Cloud Run brings everything else. Your job is to make sure the container listens on the right port and does not store anything important on local disk. Google’s job is to run it reliably and scale it automatically.

Analogy

A Cloud Run service is like a self-checkout queue that opens new lanes automatically when customers arrive and closes them when it goes quiet. You pay per lane only while it is in use. The trade-off: opening a brand-new lane takes a moment, so the first customer in a newly opened lane waits a bit longer. That delay is the cold start.

Tip

Not sure which GCP compute service to use? If your app fits in a container and responds to HTTP requests, Cloud Run is the right default. Switch to something else only when you hit a specific limitation.

What is Cloud Run?

Cloud Run is a fully managed, serverless container platform on Google Cloud. Here is what that means in practice:

  • Fully managed: no VMs, no OS patching, no capacity planning. Google manages the underlying infrastructure.
  • Serverless: you do not provision or configure servers. Resources appear automatically when needed.
  • Container-based: you deploy a standard container image, not code in a specific language or runtime format.
  • Request-driven: Cloud Run starts instances when requests arrive and stops them when traffic drops.
  • Scales to zero: if no requests come in and no minimum instances are configured, all instances stop and compute billing stops with them.
  • Stateless: each instance is ephemeral. Nothing written to local disk survives a restart.

Cloud Run runs stateless HTTP services. If your application responds to HTTP requests and does not depend on local disk state between requests, Cloud Run is almost certainly the simplest way to run it on Google Cloud.

How Cloud Run works

Here is the full lifecycle from code to live service:

  1. You write your app and a Dockerfile. The app must listen on the port given by the PORT environment variable.
  2. You build the container image and push it to Artifact Registry, Google’s managed container registry.
  3. You deploy the image to Cloud Run, choosing a region and configuration options.
  4. Cloud Run provisions a Google-managed HTTPS endpoint for your service.
  5. When a request arrives, Cloud Run routes it to an available container instance.
  6. If no instance is ready, Cloud Run starts a new one. That startup delay is the cold start.
  7. Each instance can handle multiple requests concurrently (default maximum: 80 per instance).
  8. When all instances are at capacity, Cloud Run starts additional instances automatically.
  9. When load drops, surplus instances are stopped after a short cooldown window.
  10. At zero traffic, and with no minimum instances configured, all instances stop and compute billing drops to zero.

The scaling is continuous and automatic. You do not set up autoscaling rules or configure a load balancer. Cloud Run’s control plane manages all of it. See Cloud Run scaling behaviour for a detailed breakdown of how concurrency, minimum instances, and cold starts interact.
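The scaling steps above come down to simple arithmetic. The sketch below is a rough model under stated assumptions, not Cloud Run's actual autoscaler, which also takes CPU utilisation into account. It assumes instances are added only when in-flight requests exceed the per-instance concurrency limit. The function `estimate_instances` is hypothetical.

```python
import math
from typing import Optional


def estimate_instances(in_flight: int, concurrency: int = 80,
                       min_instances: int = 0,
                       max_instances: Optional[int] = None) -> int:
    """Rough estimate of instances needed for `in_flight` simultaneous requests.

    Simplified model: Cloud Run's real autoscaler also weighs CPU utilisation.
    """
    if concurrency < 1 or in_flight < 0:
        raise ValueError("concurrency must be >= 1 and in_flight >= 0")
    # Each instance holds at most `concurrency` requests at once.
    needed = math.ceil(in_flight / concurrency)
    # Minimum instances stay up even with zero traffic.
    needed = max(needed, min_instances)
    # Maximum instances cap scale-out; extra requests may be queued or rejected.
    if max_instances is not None:
        needed = min(needed, max_instances)
    return needed


print(estimate_instances(0))    # 0: scaled to zero
print(estimate_instances(200))  # 3: ceil(200 / 80)
print(estimate_instances(200, concurrency=10, max_instances=5))  # 5: capped
```

Lowering concurrency for CPU-heavy work raises the instance count for the same load. That is the trade-off the concurrency setting controls.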

Key Cloud Run concepts

Before you deploy your first service, these terms are worth understanding:

  • Service: the top-level resource. A service has a stable URL and manages the lifecycle of its revisions. When you deploy an update, you create a new revision.

  • Revision: an immutable snapshot of a deployment with a specific image, configuration, and resource settings. You can split traffic between revisions for gradual rollouts.

  • Container image: the packaged application stored in Artifact Registry. Cloud Run pulls the image when starting a new instance. Smaller images mean faster cold starts.

  • PORT: Cloud Run sets a PORT environment variable (default 8080). Your app must listen on this exact port, not a hardcoded value.

  • Concurrency: how many requests a single instance handles at once. The default is 80. Higher concurrency means fewer instances needed but more memory pressure per instance.

  • Cold start: the delay when Cloud Run must start a new instance from scratch, caused by pulling the image and initialising the app. Typically 200ms to several seconds depending on image size and startup logic.

  • Minimum instances: the number of instances kept warm at all times. Setting this to 1 means the first request after an idle period is served without a cold start, at the cost of always paying for that one warm instance. Traffic bursts that need extra instances can still trigger cold starts.

  • Maximum instances: a cap on how many instances Cloud Run can start. Useful for protecting downstream services (a database, for example) from being overwhelmed during a traffic spike.

  • Stateless design: instances can be stopped at any time. Local filesystem writes do not persist. Use Cloud Storage for files and Cloud SQL or Firestore for data.

Analogy

Think of a Cloud Run revision like a sealed package. Once deployed, it never changes. If you update your app, Cloud Run creates a new package (revision) and, by default, sends all traffic to it. You can also shift traffic to it gradually. If something goes wrong, you can point traffic back to the previous package at once. Nothing inside either package has been touched.
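Traffic splitting between revisions is percentage-based. The toy model below illustrates the idea. It checks that a split adds up to 100 and maps each request to a revision through a stable bucket from 0 to 99. It is only an illustration: Cloud Run's router does not assign requests this way, and the revision names are hypothetical.

```python
import hashlib


def validate_split(split: dict[str, int]) -> None:
    """Reject splits whose percentages are negative or do not total 100."""
    if any(p < 0 for p in split.values()):
        raise ValueError("percentages must be non-negative")
    total = sum(split.values())
    if total != 100:
        raise ValueError(f"percentages sum to {total}, expected 100")


def pick_revision(split: dict[str, int], request_id: str) -> str:
    """Toy router: map a request to a revision via a stable 0-99 bucket.

    Illustration only; Cloud Run does not route by request ID like this.
    """
    validate_split(split)
    bucket = int(hashlib.sha256(request_id.encode()).hexdigest(), 16) % 100
    cumulative = 0
    for revision, percent in split.items():
        cumulative += percent
        if bucket < cumulative:
            return revision
    raise AssertionError("unreachable: percentages sum to 100")


canary = {"my-service-00001": 90, "my-service-00002": 10}    # gradual rollout
rollback = {"my-service-00001": 100, "my-service-00002": 0}  # point traffic back
print(pick_revision(rollback, "req-42"))  # my-service-00001
```

Rolling back is just a different split. Neither revision's contents change, which is the point of the sealed-package analogy.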

When to use Cloud Run

Cloud Run works well for most HTTP-driven, stateless workloads. Common use cases:

  • REST APIs: public or internal APIs that respond to HTTP requests.
  • Web applications: full-stack apps, server-rendered pages, or single-page app back-ends.
  • Webhooks: receiving events from third-party services like GitHub, Stripe, or Twilio.
  • Internal tools: dashboards, admin UIs, or data export services used within a company.
  • Lightweight microservices: individual services in a larger microservices architecture.
  • Event-driven HTTP endpoints: services that respond to Pub/Sub push subscriptions or Eventarc events.
  • Scheduled batch jobs: using Cloud Run Jobs triggered by Cloud Scheduler.

Tip

The sweet spot for Cloud Run is variable traffic. If your service sees bursts of requests in the morning and almost none overnight, Cloud Run scales down to zero overnight and you pay nothing for compute during those idle hours. A VM keeps billing whether it is busy or idle.

If you want to get hands-on immediately, the deploying your first Cloud Run service guide walks through everything from writing a Dockerfile to a live HTTPS URL.

When Cloud Run is not the right choice

Cloud Run is not universal. A few workload types consistently fit better elsewhere:

Workload | Use Cloud Run? | Better alternative
Stateless HTTP API, variable traffic | Yes, ideal fit | Cloud Run is the right tool
Long-running background daemon | No | Compute Engine VM or GKE workload
Batch job with defined start and end | Use Jobs | Cloud Run Jobs or Cloud Batch
Stateful app that writes to local disk | No | Compute Engine with persistent disk
GPU-accelerated ML inference | Limited | Compute Engine with GPU, or GKE with GPU nodes
Kubernetes-native workloads (CRDs, operators) | No | GKE
App requiring host-level OS access | No | Compute Engine

Cloud Run vs other GCP compute options

Choosing between Cloud Run, Compute Engine, Cloud Functions, and GKE is one of the most common questions beginners face. Here is a practical breakdown of each.

Cloud Run vs Compute Engine

Compute Engine gives you a full virtual machine with complete OS access, persistent local disk, and your choice of installed software. Cloud Run gives you none of that VM-level control. You decide what goes into the container image, and Google runs it.

  • Choose Cloud Run when your workload is stateless, HTTP-driven, and you want zero server management.
  • Choose Compute Engine when you need persistent local disk, GPU access, custom kernel modules, or a long-running background process with no HTTP entrypoint.
  • Simple rule: if it fits in a container and responds to HTTP, start with Cloud Run. If it does not, use a VM.

See the Cloud Run vs Compute Engine comparison for a deeper look.

Cloud Run vs Cloud Functions

Cloud Functions (renamed Cloud Run functions in 2024) is a functions-as-a-service platform. You write a single function and Google handles the runtime and surrounding infrastructure. Cloud Run requires you to build and manage a container image, but gives you full control over the runtime environment.

  • Choose Cloud Run for full applications, custom runtimes, larger codebases, or when you need precise control over dependencies.
  • Choose Cloud Functions for small, single-purpose event handlers where you want minimal boilerplate.
  • Simple rule: if you already have a container or a web framework, use Cloud Run. If you are writing a small event handler from scratch, Cloud Functions involves less overhead.

See the full Cloud Run vs Cloud Functions comparison.

Cloud Run vs GKE

GKE is Google’s managed Kubernetes service. It runs containers like Cloud Run does, but with full Kubernetes control: custom networking, persistent volumes, and cluster-level configuration. That power comes with significant operational complexity.

  • Choose Cloud Run when you want managed scaling with no Kubernetes knowledge required, and your service is stateless HTTP.
  • Choose GKE when you need Kubernetes-native features: custom resource definitions, service meshes, persistent storage, or workloads that do not fit the HTTP request model.
  • Simple rule: Cloud Run is the right default. Use GKE when you have a specific requirement that Cloud Run cannot satisfy.

See the GKE vs Cloud Run comparison, or the broader choosing between Cloud Run, GKE, and VMs guide.

Container requirements

Cloud Run does not care what language or framework your app uses, but it does have a few hard requirements:

  • Listen on PORT: Cloud Run sets a PORT environment variable (default 8080). Your app must listen on that port on all network interfaces (0.0.0.0), not only on localhost. Do not hardcode port 8080. Read PORT from the environment so the app keeps working if the value ever changes.

  • Be stateless: instances can stop at any time. Do not write anything to local disk that you need to survive a restart. Use Cloud Storage for files, Cloud SQL or Firestore for database data.

  • Respond within the request timeout: the default request timeout is 300 seconds, and the maximum you can configure is 3600 seconds. For longer-running tasks, use Cloud Run Jobs or Cloud Batch.

  • Start fast: slow container startup makes cold starts worse. Use small base images, minimise dependencies, and defer any non-essential initialisation work. The container images in GCP guide covers how to build lean, fast-starting images.

Note

The PORT requirement is the most common first mistake. If you hardcode a port like 8080 and Cloud Run sets PORT to something different, your app never listens on the port Cloud Run expects. The new revision then fails its startup check, and the deployment reports that the container failed to start and listen on the port. Always read PORT dynamically from the environment.
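Here is a minimal Python sketch of the PORT rule that uses only the standard library. It reads PORT, falls back to 8080 for local runs, and binds to 0.0.0.0. The handler and function names are illustrative. A production app would more typically run a web framework behind a server such as gunicorn.

```python
import os
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        body = b"Hello from Cloud Run\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def get_port() -> int:
    # Read the port Cloud Run injects; fall back to 8080 for local runs.
    return int(os.environ.get("PORT", "8080"))


def make_server(port: int) -> ThreadingHTTPServer:
    # Bind to 0.0.0.0 (all interfaces), not 127.0.0.1, so requests
    # routed into the container can reach the app.
    return ThreadingHTTPServer(("0.0.0.0", port), HelloHandler)


if __name__ == "__main__":
    server = make_server(get_port())
    print(f"Listening on port {server.server_address[1]}")
    server.serve_forever()
```

ThreadingHTTPServer handles each request on its own thread. This lets one instance serve several requests at once, which is what Cloud Run's concurrency setting assumes.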

Simple deployment example

This is the core deployment command. It pulls a container image from Artifact Registry, deploys it to Cloud Run in us-central1, and makes it publicly accessible over HTTPS:

# Deploy a container from Artifact Registry
gcloud run deploy my-service \
  --image=us-central1-docker.pkg.dev/PROJECT_ID/my-repo/my-app:latest \
  --region=us-central1 \
  --platform=managed \
  --allow-unauthenticated

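The --allow-unauthenticated flag makes the service public, so anyone can call it without a Google identity token. For internal services, leave the flag out. Callers must then present a valid Google-signed ID token and hold the Cloud Run Invoker role (roles/run.invoker) on the service. Leaving the flag out of a later deploy does not revoke public access that was already granted. Use --no-allow-unauthenticated for that.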
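To call a private service, the caller sends its ID token as a bearer token in the Authorization header. The sketch below only builds the HTTP requests without sending them, so the shape is easy to see. It assumes the caller itself runs on Google Cloud, where the metadata server's identity endpoint issues an ID token for a given audience (the receiving service's URL). The function names and the example URL are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import Request

METADATA_IDENTITY_URL = (
    "http://metadata.google.internal/computeMetadata/v1/"
    "instance/service-accounts/default/identity"
)


def identity_token_request(audience: str) -> Request:
    """Ask the metadata server for an ID token (only works on Google Cloud)."""
    query = urlencode({"audience": audience})
    # The metadata server rejects requests without this header.
    return Request(f"{METADATA_IDENTITY_URL}?{query}",
                   headers={"Metadata-Flavor": "Google"})


def authorized_request(service_url: str, id_token: str) -> Request:
    """Call a private Cloud Run service, presenting the ID token as a bearer token."""
    return Request(service_url, headers={"Authorization": f"Bearer {id_token}"})


# On Google Cloud you would send both, for example:
#   from urllib.request import urlopen
#   token = urlopen(identity_token_request(url)).read().decode()
#   body = urlopen(authorized_request(url, token)).read()
```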

You can also pass configuration at deploy time:

# Deploy with resource limits and environment variables
gcloud run deploy my-service \
  --image=us-central1-docker.pkg.dev/PROJECT_ID/my-repo/my-app:latest \
  --region=us-central1 \
  --set-env-vars="DB_HOST=10.0.0.5,APP_ENV=production" \
  --memory=512Mi \
  --cpu=1

# Get the live URL after deployment
gcloud run services describe my-service \
  --region=us-central1 \
  --format="value(status.url)"

For a complete step-by-step walkthrough including Dockerfile writing, image building, and first deployment, see deploying your first Cloud Run service.

Secrets, config, and service identity

Use plain environment variables for non-sensitive configuration like feature flags, log levels, and hostnames. For secrets (database passwords, API keys, service credentials), use Secret Manager.

Danger

Never hardcode secrets as environment variables in your Cloud Run service definition. Those values are stored in plaintext and visible to anyone with access to the service configuration. Use Secret Manager mounts instead.

# Set plain environment variables at deploy time
gcloud run deploy my-service \
  --image=IMAGE \
  --region=us-central1 \
  --set-env-vars="APP_ENV=production,LOG_LEVEL=info"

# Mount a secret as an environment variable
gcloud run deploy my-service \
  --image=IMAGE \
  --region=us-central1 \
  --set-secrets="DB_PASSWORD=my-db-password:latest"

# Mount a secret as a file
gcloud run deploy my-service \
  --image=IMAGE \
  --region=us-central1 \
  --set-secrets="/secrets/key.json=my-service-key:latest"

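Each Cloud Run service runs as a service account. This is the Compute Engine default service account unless you set a dedicated one with --service-account. That service account must have roles/secretmanager.secretAccessor on each secret it needs. Secrets exposed as environment variables are resolved when an instance starts, and secrets mounted as files are fetched when the file is read. Either way, your application reads them as a normal environment variable or file path.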
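On the application side, the two Secret Manager options from the commands above look like this. The sketch assumes the names used in those commands (DB_PASSWORD and /secrets/key.json). The helper names are hypothetical. It reads the mounted file when the value is needed rather than caching it at import time, so a file mounted with `latest` should pick up a rotated value.

```python
import json
import os
from pathlib import Path


def db_password() -> str:
    """Secret exposed as an environment variable (resolved at instance startup)."""
    value = os.environ.get("DB_PASSWORD")
    if not value:
        # Fail fast with a clear message instead of connecting with an empty password.
        raise RuntimeError("DB_PASSWORD is not set; check the --set-secrets mapping")
    return value


def service_key(path: str = "/secrets/key.json") -> dict:
    """Secret mounted as a file; read it when needed rather than at import time."""
    return json.loads(Path(path).read_text(encoding="utf-8"))
```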

For a full explanation of how Cloud Run handles authentication and authorisation, including the difference between public and private services, see the Cloud Run security model.

Pricing and scaling basics

Cloud Run’s pricing model is one of its biggest advantages: you only pay for what you actually use.

  • Scale to zero: if your service handles no requests, it scales down to zero instances and compute billing stops entirely. A hobby project or low-traffic internal tool can cost nothing overnight.

  • Billed while handling requests: you are charged for the CPU and memory allocated while requests are being processed, rounded up to the nearest 100ms. Idle time between requests is free by default.

  • Minimum instances cost money at all times: setting --min-instances=1 keeps one container warm permanently. This avoids cold starts after idle periods but means you pay for that instance 24/7, even at zero traffic.

  • Concurrency affects instance count: at the default concurrency of 80, a single instance can handle up to 80 simultaneous requests. Reduce concurrency for CPU-heavy work and increase it for I/O-bound work that spends most of its time waiting.

  • Max instances protect downstream services: capping max instances prevents a traffic spike from overwhelming a downstream database or API that cannot scale as fast as Cloud Run can.
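Request-based billing explains why concurrency matters for cost. Here is a simplified calculation of billable time for one instance. It assumes overlapping requests are billed once, because the instance is either busy or idle, and it rounds the total up to the next 100ms. Real bills also depend on CPU and memory rates, free tiers, and the billing mode, and none of these are modelled here. The function name is hypothetical.

```python
import math


def billable_ms(requests: list[tuple[int, int]]) -> int:
    """Billable time in ms for one instance, given (start_ms, end_ms) request intervals.

    Simplified model: time when at least one request is in flight counts once,
    and the total is rounded up to the next 100 ms.
    """
    busy = 0
    current_start = current_end = None
    for start, end in sorted(requests):
        if end < start:
            raise ValueError("request ends before it starts")
        if current_end is None or start > current_end:
            # A gap: close the previous busy period and open a new one.
            if current_end is not None:
                busy += current_end - current_start
            current_start, current_end = start, end
        else:
            # Overlapping request: extend the current busy period.
            current_end = max(current_end, end)
    if current_end is not None:
        busy += current_end - current_start
    return math.ceil(busy / 100) * 100


print(billable_ms([(0, 250), (0, 250), (0, 250)]))      # 300: served concurrently
print(billable_ms([(0, 250), (250, 500), (500, 750)]))  # 800: served one after another
```

Three 250ms requests handled at the same time cost the same instance time as one. The same three handled back to back cost almost three times as much.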

# Keep 1 instance warm at all times (avoids cold starts after idle periods)
gcloud run services update my-service \
  --region=us-central1 \
  --min-instances=1

# Set a maximum instance count to cap costs and protect dependencies
gcloud run services update my-service \
  --region=us-central1 \
  --max-instances=50

Note

For a low-traffic internal tool, cold starts are usually acceptable and minimum instances is unnecessary overhead. For a customer-facing API where the first request of the day must be fast, minimum instances is worth the cost.

For more detail on cost controls, budget alerts, right-sizing, and avoiding surprise bills, see Cloud Run cost optimisation.

Cold starts and minimum instances

Cold starts are the biggest Cloud Run trade-off. When your service has been idle and a request arrives, Cloud Run starts a new instance from scratch. Depending on image size and application startup time, that can add 200ms to several seconds of latency to the first request.

Warning

Cold starts can take several seconds, not just a few hundred milliseconds. For a customer-facing API that goes idle overnight, the first request of the morning may feel broken to users. If your service must always respond quickly, set --min-instances=1.

The factors that make cold starts worse:

  • Large container images (more to pull from the registry)
  • Runtimes with slow initialisation, such as Python or Ruby apps with heavy imports and JVM-based apps
  • Apps that do expensive setup work at startup (loading large ML models, opening many connections)

The factors that reduce cold start impact:

  • Minimum instances set to 1, keeping one warm instance always ready
  • Small, lean container images built from minimal base images
  • Deferring non-essential setup until after the server is ready to accept requests
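One way to defer setup in Python is to build expensive objects on first use rather than at import time. Because a single instance may serve several requests concurrently, the build is guarded by a lock. In this sketch, `load_model` is a hypothetical stand-in for any slow initialisation, such as a large ML model or a connection pool.

```python
import threading
import time
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class Lazy(Generic[T]):
    """Build a value on first access, exactly once, even under concurrent requests."""

    def __init__(self, factory: Callable[[], T]) -> None:
        self._factory = factory
        self._value: Optional[T] = None
        self._ready = False
        self._lock = threading.Lock()

    def get(self) -> T:
        if not self._ready:              # fast path once initialised
            with self._lock:
                if not self._ready:      # double-check inside the lock
                    self._value = self._factory()
                    self._ready = True
        return self._value  # type: ignore[return-value]


def load_model() -> dict:
    # Hypothetical stand-in for slow startup work.
    time.sleep(0.1)
    return {"weights": [0.1, 0.2, 0.3]}


model = Lazy(load_model)  # cheap: nothing is loaded at import time


def handle_request() -> int:
    # The first request to need the model pays the loading cost; later ones reuse it.
    return len(model.get()["weights"])
```

This lets the server start accepting traffic sooner. However, it moves the cost onto the first request that needs the object, so it helps most for setup that only some requests use.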

See Cloud Run scaling behaviour for a full breakdown of how cold starts, minimum instances, and concurrency interact in production.

Observability and networking

Cloud Run automatically sends logs to Cloud Logging. You can view request logs, container logs, and audit logs without any setup. For metrics, Cloud Monitoring provides request count, latency, container instance count, and CPU/memory utilisation out of the box. The Cloud Run monitoring guide covers how to use these effectively in production.
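Anything your container writes to stdout or stderr ends up in Cloud Logging. If you write one JSON object per line, Cloud Logging parses it as a structured entry, and a `severity` field sets the log level. Here is a minimal standard-library sketch. The `log` helper is hypothetical, and the extra fields are only examples.

```python
import json
import sys
from typing import Any, TextIO


def log(severity: str, message: str, stream: TextIO = sys.stdout, **fields: Any) -> None:
    """Write one JSON log line; Cloud Logging reads `severity` and `message`."""
    entry = {"severity": severity, "message": message, **fields}
    # One object per line: a multi-line JSON document would not be parsed as one entry.
    stream.write(json.dumps(entry) + "\n")
    stream.flush()


log("INFO", "order processed", order_id="A123", duration_ms=42)
log("ERROR", "payment provider timed out", provider="example")
```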

By default, Cloud Run services cannot reach resources on private IPs inside a VPC, such as a Cloud SQL instance on a private IP or a Memorystore for Redis cache. To access private VPC resources, configure Direct VPC egress or a Serverless VPC Access connector.

Common beginner mistakes

  1. Storing state on the container’s local filesystem. Cloud Run instances are ephemeral. Any file written to the local filesystem is gone when the instance stops. Use Cloud Storage for files and Cloud SQL or Firestore for database data.

  2. Ignoring cold starts for latency-sensitive services. When traffic drops to zero, the next request waits for a new instance to start, which can take several seconds for a cold container. Set minimum instances to 1 if the first request of the day must be fast.

  3. Not understanding public vs authenticated access. Cloud Run requires a valid Google identity token by default. A public website or API returns HTTP 403 to anonymous visitors until you explicitly set --allow-unauthenticated. Conversely, do not set that flag on an internal service that should not be publicly accessible.

  4. Using Cloud Run for indefinite background processes. Cloud Run services are HTTP servers. A background worker with no incoming requests, or a job running longer than 3600 seconds, needs Cloud Run Jobs, Cloud Batch, or a Compute Engine VM.

  5. Deploying bloated container images that start slowly. Cold start time grows with image size and with the amount of work done at startup. Avoid large base images, unused packages, and expensive initialisation work at startup. Keep images lean and defer non-essential setup until after the server is ready to accept requests.
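For mistake 1, a common pattern is to treat local disk as scratch space only. Write to a temporary file, hand the result to durable storage, and clean up before the request ends. In Cloud Run the writable filesystem is held in memory, so leftover files also eat into the instance's memory limit. In this sketch, `upload` is a hypothetical callback standing in for a Cloud Storage client call.

```python
import os
import tempfile
from typing import Callable


def process_and_persist(data: bytes, upload: Callable[[str], str]) -> str:
    """Transform data in a scratch file, persist it elsewhere, then delete the file.

    `upload` is a hypothetical callback (for example, a wrapper around a
    Cloud Storage client) that copies the file to durable storage and
    returns its location.
    """
    fd, path = tempfile.mkstemp(suffix=".out")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data.upper())  # stand-in for real processing
        return upload(path)        # the durable copy lives outside the instance
    finally:
        # Scratch files use instance memory on Cloud Run; remove them promptly.
        os.remove(path)
```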

Frequently asked questions

What is Cloud Run used for?

Cloud Run is used for stateless HTTP workloads: REST APIs, web apps, webhooks, internal tools, and lightweight microservices. It handles variable traffic well, scaling up when load increases and down to zero when idle. It is a good fit whenever you want to run a containerised app without managing servers, VMs, or load balancers.

Is Cloud Run the same as Cloud Functions?

No. Cloud Run runs a container image you build and control. You define the runtime, the web framework, and all dependencies. Cloud Functions is a functions-as-a-service product where you write a single function and Google handles the rest. Cloud Run gives more control and is better for full applications. Cloud Functions is simpler for small, single-purpose event handlers.

Does Cloud Run need Docker?

Cloud Run needs a container image, which is most commonly built with Docker. You write a Dockerfile, build the image, push it to Artifact Registry, and deploy it to Cloud Run. You can also use Cloud Build to build images without installing Docker locally. The container must listen on the port given by the PORT environment variable that Cloud Run sets.

When should I use Cloud Run instead of Compute Engine?

Use Cloud Run when your workload is HTTP-driven, stateless, and has variable or unpredictable traffic. Cloud Run removes all VM management and scales to zero at no cost. Use Compute Engine when you need persistent local state, GPU access, custom kernel modules, a long-running daemon with no HTTP entrypoint, or requests exceeding 3600 seconds. For stateless APIs and web apps, Cloud Run is almost always the simpler and cheaper choice.

Can Cloud Run run background jobs?

Not with Cloud Run services, which are HTTP servers. For batch work with a defined start and end, use Cloud Run Jobs. You deploy the same kind of container but it runs to completion rather than waiting for HTTP requests. For indefinitely running background workers, use a Compute Engine VM or a GKE workload. Cloud Run Jobs can be triggered manually, on a schedule with Cloud Scheduler, or from Workflows.
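Inside a Cloud Run job, each task can work out which slice of the work belongs to it from the CLOUD_RUN_TASK_INDEX and CLOUD_RUN_TASK_COUNT environment variables that Cloud Run sets. Here is a sketch of splitting a list of work items across tasks. The item names are hypothetical, and the defaults let the script run locally as a single task.

```python
import os
from typing import Sequence, TypeVar

T = TypeVar("T")


def my_share(items: Sequence[T], task_index: int, task_count: int) -> list[T]:
    """Return the items this task should process (round-robin split)."""
    if not 0 <= task_index < task_count:
        raise ValueError("task_index must be in [0, task_count)")
    return [item for i, item in enumerate(items) if i % task_count == task_index]


def main() -> None:
    # Cloud Run sets these for each task; default to a single task for local runs.
    index = int(os.environ.get("CLOUD_RUN_TASK_INDEX", "0"))
    count = int(os.environ.get("CLOUD_RUN_TASK_COUNT", "1"))
    files = [f"input-{n:03d}.csv" for n in range(10)]  # hypothetical work items
    for name in my_share(files, index, count):
        print(f"task {index}/{count} processing {name}")
    # Exiting with status 0 marks the task as succeeded; a non-zero exit marks it failed.


if __name__ == "__main__":
    main()
```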

Last verified: 22 March 2026. Cloud services change frequently. Verify details against official documentation before making infrastructure decisions.