Cloud Run vs GKE vs Compute Engine: Which Should You Use in GCP?

GCP gives you three main ways to run workloads: Cloud Run, GKE, and Compute Engine. The short answer: start with Cloud Run for new stateless HTTP services, use GKE when you need Kubernetes features specifically, and use Compute Engine when you need full OS control or cannot containerise your workload. Teams often choose wrong because they reach for what is familiar rather than what fits. This page gives you a decision framework so you can pick the right option the first time.

Simple explanation

If this is your first time choosing a GCP compute platform, here is the plain-English version:

Cloud Run — You package your app in a container and hand it to Google. Google handles servers, scaling, load balancing, and TLS certificates. You pay only when your app is actually handling requests. Best for web APIs, webhooks, and HTTP services that do not store data locally.
GKE (Google Kubernetes Engine) — You run a managed Kubernetes cluster. Kubernetes is an orchestration system that manages containers across multiple machines, with advanced features like multi-container pods, persistent storage, and traffic management. More control than Cloud Run, but significantly more to manage.
Compute Engine — You get a full virtual machine. You choose the operating system, install software, configure networking, and manage updates yourself. Maximum control, maximum responsibility. Best for legacy apps, Windows workloads, and anything that cannot run in a container.

Analogy

Think of it like food service. Cloud Run is like ordering delivery: you describe what you want, someone else handles all the cooking and logistics, and you only pay per order. GKE is like renting a commercial kitchen: you have full control over what gets cooked and how, but you staff and manage the kitchen yourself. Compute Engine is like buying the entire restaurant building: you own everything, you control everything, and you are responsible for the plumbing.

At a glance

	Cloud Run	GKE	Compute Engine
Best for	Stateless HTTP and gRPC services	Container orchestration with Kubernetes features	Full OS control, legacy apps, non-containerisable workloads
Operational overhead	Very low	Medium to high	High
Scaling model	Automatic, per-request	HPA + node autoscaler	Manual or managed instance groups
Scales to zero	Yes	No (node pools persist)	No
Stateful workloads	No	Yes (PersistentVolumes)	Yes (persistent disks)
HTTP services	Yes, natively	Yes, via Ingress	Yes, manually configured
Background workers	Limited (Cloud Run Jobs)	Yes	Yes
Sidecars / multi-container	Limited (sidecar containers, no full pod model)	Yes (pods)	No
OS-level control	No	Node OS access in Standard mode	Full
Non-HTTP protocols	gRPC and WebSockets yes; raw TCP/UDP ingress no	Yes	Yes
Kubernetes required	No	Yes	No
Pricing model	Per request (CPU + memory during handling)	Per node (always on) + cluster fee	Per VM-second while running
Typical team fit	Small teams, no Kubernetes expertise needed	Teams with Kubernetes experience	Teams managing servers or doing lift-and-shift
Main downside	Cold starts; stateless containers only	Operational complexity and cost floor	Manual management of everything

How this decision works

The decision is not about picking the most powerful option. It is about picking the simplest option that meets your actual requirements. Each step up in complexity adds genuine operational burden. Only take that step when a specific workload requirement demands it.

Tip

Before choosing a platform, ask one question first: is the workload stateless and HTTP-driven? If yes, start with Cloud Run. You can always migrate to GKE using the same container image later if you hit a genuine limitation.

Frame the full choice around these questions:

Can it be containerised? If your workload requires a custom kernel module, Windows Server, specific hardware drivers, or an OS configuration that cannot be captured in a container image, you need a VM. See Compute Engine Overview for what a VM gives you.
Is it stateless? A stateless service handles each request independently. It stores nothing locally between requests and any instance can serve any request. Cloud Run requires this. If your service writes to local disk or maintains in-memory session state across requests, it is not stateless. Read more in Stateless vs Stateful Services.
Is it HTTP or gRPC? Cloud Run is built around HTTP and gRPC. If clients must reach your workload over raw TCP, UDP, or another non-HTTP protocol, Cloud Run is not a good fit; use GKE or Compute Engine.
Do you need Kubernetes features specifically? GKE is the right answer when you need features only Kubernetes provides. Sidecars are additional containers running alongside your main app in the same pod, commonly used for log collectors and service mesh proxies. Init containers run setup tasks to completion before the main app starts. PersistentVolumes are storage volumes attached to pods that survive restarts, used for self-hosted databases and caches. A service mesh like Istio adds mTLS and advanced traffic management between services. If you just need to run containers, Cloud Run does that without Kubernetes overhead. See What Is Kubernetes for background.
How much operational complexity can your team sustain? GKE requires expertise in cluster management, node pools, Kubernetes RBAC, version upgrades, and networking. Cloud Run requires none of that. A small team without dedicated platform engineering capacity should start on Cloud Run.
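The statelessness question above can be made concrete with a small sketch. The class names and the plain dict standing in for an external store (such as a database or cache) are hypothetical, chosen only for illustration. Two instances of the stateful handler disagree about the same user, while two instances of the stateless handler agree because the state lives outside them.

```python
from typing import Dict, MutableMapping


class StatefulInstance:
    """Keeps a visit counter in process memory (not stateless)."""

    def __init__(self) -> None:
        self._visits: Dict[str, int] = {}

    def handle(self, user_id: str) -> int:
        self._visits[user_id] = self._visits.get(user_id, 0) + 1
        return self._visits[user_id]


class StatelessInstance:
    """Keeps nothing locally; every read and write goes to a shared store."""

    def __init__(self, store: MutableMapping[str, int]) -> None:
        self._store = store  # hypothetical stand-in for an external service

    def handle(self, user_id: str) -> int:
        self._store[user_id] = self._store.get(user_id, 0) + 1
        return self._store[user_id]


# Two instances behind a load balancer, each receiving one request.
a, b = StatefulInstance(), StatefulInstance()
print(a.handle("u1"), b.handle("u1"))  # 1 1 (each instance has its own count)

shared: Dict[str, int] = {}
c, d = StatelessInstance(shared), StatelessInstance(shared)
print(c.handle("u1"), d.handle("u1"))  # 1 2 (any instance can serve any request)
```

If your service behaves like the first class, it will give inconsistent answers as soon as a platform runs more than one instance or replaces an instance.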

When to use Cloud Run

Cloud Run is the right default for new services on GCP. You deploy a container image and Cloud Run handles load balancing, TLS termination, and scaling automatically. There is no cluster to configure, no OS to patch, and no idle cost when traffic drops to zero.
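As a rough illustration of what "deploy a container image" asks of your code, here is a minimal sketch of an HTTP service using only the Python standard library. It assumes the platform passes the listening port in the PORT environment variable, with 8080 as the fallback. The handler name and response text are hypothetical.

```python
import os
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Answers every GET with a fixed body and keeps no state between requests."""

    def do_GET(self) -> None:
        body = b"hello from a stateless container\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def make_server(port: int) -> HTTPServer:
    # Bind to all interfaces so the platform can route traffic to the container.
    return HTTPServer(("0.0.0.0", port), HelloHandler)


def main() -> None:
    # Read the port from the environment instead of hard-coding it.
    port = int(os.environ.get("PORT", "8080"))
    make_server(port).serve_forever()
```

Calling main() from the container entrypoint starts the server. Everything else (load balancing, TLS, scaling) happens outside the process.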

Note

The main trade-off is the cold start: when your service has been idle and a new request arrives, Cloud Run must boot a fresh container instance from scratch. This adds anywhere from a few hundred milliseconds to several seconds on that first request. For latency-sensitive services, set minimum instances to 1 to keep a warm instance ready. Learn more in Cloud Run Scaling Behaviour.
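One way to apply the minimum-instances setting is through the gcloud CLI. The sketch below only builds the argument list for `gcloud run services update` with the `--min-instances` flag; the service name and region are hypothetical placeholders, and the helper itself is illustrative rather than part of any SDK.

```python
from typing import List


def min_instances_command(service: str, region: str, min_instances: int = 1) -> List[str]:
    """Build the gcloud argv that keeps `min_instances` warm for a service."""
    if min_instances < 0:
        raise ValueError("min_instances must be zero or greater")
    return [
        "gcloud", "run", "services", "update", service,
        f"--region={region}",
        f"--min-instances={min_instances}",
    ]


# Hypothetical service name and region, for illustration only.
print(" ".join(min_instances_command("my-api", "europe-west1")))
```

Passing the list to subprocess.run(..., check=True) would execute it. Keep in mind that a warm instance is billed while idle, so this trades some cost for lower first-request latency.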

Ideal workloads

REST APIs and GraphQL endpoints with variable or unpredictable traffic
Webhooks and event handlers triggered by Pub/Sub, Eventarc, or third-party services
Public-facing web applications with stateless rendering
Internal microservices communicating over HTTP or gRPC
Scheduled jobs and batch tasks via Cloud Run Jobs
Short-lived processing: image resizing, PDF generation, data transformation

When not to choose Cloud Run

Your service writes to local disk and expects that data to persist — Cloud Run instances are ephemeral
You need Kubernetes pod features such as init containers or automatic service mesh sidecar injection, which require GKE; Cloud Run supports sidecar containers but not the full pod model
Your workload is a long-running daemon with no HTTP interface
Clients need to reach your service over non-HTTP protocols, such as raw TCP or UDP
Request processing exceeds 3600 seconds — Cloud Run has a hard maximum timeout

Concrete examples

User authentication API. Stateless, HTTP, low to moderate traffic, variable load. Cloud Run scales to zero overnight and wakes on demand, costing nothing at idle.
Image thumbnail service. Triggered by a Pub/Sub message when a file lands in Cloud Storage, resizes the image and writes the result back. Scales to zero between bursts.
Webhook receiver. Listens for payment provider events and writes records to Cloud SQL. No persistent local state, variable traffic, minimal deployment overhead.
Internal reporting API. Called by an admin dashboard a few hundred times per day. Scales to zero completely to avoid idle VM costs.
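For the image thumbnail example, the request your service receives is a Pub/Sub push envelope rather than the file itself. The sketch below shows one way to pull the bucket and object name out of that envelope. It assumes the Cloud Storage notification uses the JSON payload format, in which the base64-encoded `message.data` field decodes to an object resource with `bucket` and `name` keys. The function name is hypothetical.

```python
import base64
import json
from typing import Any, Dict, Tuple


def parse_storage_event(envelope: Dict[str, Any]) -> Tuple[str, str]:
    """Return (bucket, object_name) from a Pub/Sub push request body."""
    message = envelope["message"]
    # Pub/Sub push delivers the payload base64-encoded in message.data.
    payload = json.loads(base64.b64decode(message["data"]).decode("utf-8"))
    # Assumption: JSON payload format, so the payload is the object resource.
    return payload["bucket"], payload["name"]


# Hand-built envelope for illustration; real ones arrive as the POST body.
example_payload = json.dumps({"bucket": "uploads", "name": "photos/cat.png"})
example_envelope = {
    "message": {
        "data": base64.b64encode(example_payload.encode("utf-8")).decode("ascii"),
        "messageId": "1",
    },
    "subscription": "projects/example/subscriptions/thumbnails",
}
print(parse_storage_event(example_envelope))  # ('uploads', 'photos/cat.png')
```

The handler holds nothing between requests: it reads the event, does the work, writes the result to Cloud Storage, and returns.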

For security details on how Cloud Run handles identity and access, see Cloud Run Security Model. For a full overview of the service, see Cloud Run Overview. For a direct comparison with Compute Engine specifically, see Cloud Run vs Compute Engine.

When to use GKE

GKE is Google’s managed Kubernetes service. Google operates the Kubernetes control plane, so you do not have to run etcd, the API server, or control plane upgrades yourself. In Standard mode you still manage node pools, node version upgrades, networking configuration, RBAC policies, and pod security.

GKE is worth that overhead when you genuinely need what Kubernetes uniquely provides. If the main reason you are considering GKE is that your app runs in a container, that is not sufficient. Cloud Run also runs containers with far less operational burden.

Warning

GKE Standard clusters carry a cluster management fee of roughly $0.10 per cluster-hour (around $74/month), plus always-on node costs. A two-node cluster running a stateless API that Cloud Run handles equally well is paying a significant operational and financial tax for no benefit. Only choose GKE when you have a concrete need for a Kubernetes feature — not just containers.

Ideal workloads

Applications that need multi-container pods: a service mesh proxy (Envoy) or log collector (Fluentd) running as a sidecar, or an init container that runs setup before the main app starts
Stateful workloads requiring PersistentVolumes — self-hosted databases, message queues, and distributed caches where pod restarts must not lose data
Workloads using non-HTTP protocols: raw TCP, UDP, or custom binary protocols
ML training jobs with GPU node pools and custom node affinity rules
Teams already running Kubernetes manifests that would require significant rework to migrate away
Architectures using a service mesh (Istio, Cloud Service Mesh) for mTLS, traffic splitting, or detailed observability

When not to choose GKE

You are building new stateless HTTP services — Cloud Run is simpler and cheaper
Your team has no Kubernetes experience and no plan to develop it
You are a small team without a dedicated infrastructure engineer
Your main reason for choosing GKE is familiarity with Kubernetes from a previous role

Concrete examples

Service mesh microservices. Ten services communicating over mTLS with Istio routing policies. Requires sidecar injection per pod, which only works in Kubernetes.
Self-hosted Kafka or Redis. Needs PersistentVolumes so data survives pod restarts. Kubernetes StatefulSets handle this correctly; Cloud Run cannot.
ML training pipeline. Requires GPU node pools, custom tolerations, and node affinity rules. GKE gives direct control over node configuration.
Migrating a large existing Kubernetes deployment. The team has hundreds of Helm charts and years of Kubernetes manifests. GKE Autopilot reduces node management while preserving the existing deployment model.
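To make the PersistentVolume example more tangible, here is a minimal sketch of a StatefulSet manifest built as a Python dict and printed as JSON, which kubectl accepts alongside YAML. The workload name, image, mount path, and storage size are hypothetical placeholders. The sketch also assumes a headless Service with the same name already exists, since `serviceName` refers to it.

```python
import json
from typing import Any, Dict


def stateful_set(name: str, image: str, mount_path: str,
                 storage: str = "10Gi", replicas: int = 3) -> Dict[str, Any]:
    """Build a StatefulSet whose pods each get their own PersistentVolumeClaim."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "StatefulSet",
        "metadata": {"name": name},
        "spec": {
            "serviceName": name,  # assumed headless Service giving pods stable DNS names
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [{
                        "name": name,
                        "image": image,
                        # Mount the per-pod volume declared in volumeClaimTemplates.
                        "volumeMounts": [{"name": "data", "mountPath": mount_path}],
                    }],
                },
            },
            # One claim per replica; the data outlives pod restarts.
            "volumeClaimTemplates": [{
                "metadata": {"name": "data"},
                "spec": {
                    "accessModes": ["ReadWriteOnce"],
                    "resources": {"requests": {"storage": storage}},
                },
            }],
        },
    }


# Hypothetical values, for illustration only.
print(json.dumps(stateful_set("redis", "redis:7", "/data"), indent=2))
```

The `volumeClaimTemplates` block is the part with no counterpart in a stateless service: each replica gets a named volume that follows it across restarts.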

If you are choosing between GKE Autopilot and GKE Standard, see GKE Autopilot vs Standard. For a direct head-to-head comparison of GKE and Cloud Run, see GKE vs Cloud Run.

When to use Compute Engine VMs

Compute Engine gives you a full virtual machine. You choose the OS, install packages, configure services, and manage everything yourself. It is the highest-overhead option, but also the most flexible. There is nothing you cannot run on a VM if you are prepared to manage it.

Lift-and-shift means moving an existing on-premises application to the cloud with minimal changes. If an application was designed for a physical server and was never containerised, a VM is often the fastest path to getting it into GCP — even if the long-term goal is to containerise it later.

Tip

Compute Engine is often the right starting point for legacy migrations, not the permanent destination. Get the workload running on a VM first, then evaluate whether containerising it makes sense once it is stable in GCP.

Ideal workloads

Legacy applications that depend on specific OS configurations, kernel modules, or native libraries
Lift-and-shift migrations from on-premises infrastructure
Windows Server workloads that cannot run in a Linux container
Long-running daemons with no HTTP interface that run indefinitely
Applications managing their own distributed clustering: Cassandra, Elasticsearch, Redis Cluster
Workloads requiring specialised hardware: GPU types unsupported by Cloud Run, local NVMe SSDs, or custom networking cards

When not to choose Compute Engine

Your workload is a new stateless HTTP API — a VM adds management overhead for no benefit
Traffic is variable or unpredictable — a VM runs 24/7 and you pay whether it is busy or idle
Your team lacks capacity to manage OS patching, security hardening, and VM lifecycle
You want to containerise and modernise — starting on a VM slows that path down

Concrete examples

Legacy .NET Framework application. Built for IIS on Windows Server, uses COM components that cannot be containerised. Runs on a Windows Compute Engine VM.
On-premises PostgreSQL cluster. Being migrated to GCP with minimal changes before eventual modernisation. Running on Compute Engine VMs first reduces migration risk.
Elasticsearch cluster. Manages its own distributed state, requires low-latency local storage, and handles node discovery internally. Runs on VMs with local SSDs.
GPU video transcoding worker. Long-running, GPU-intensive, non-HTTP job. Uses a Compute Engine VM with a GPU attached and runs continuously.

For VM types, sizing, and configuration options, see Compute Engine Overview. For cost reduction strategies on VMs, see Compute Engine Cost Optimisation.

Cloud Run vs GKE vs Compute Engine: decision flow

Work through these in order. Stop at the first step that gives a clear answer. Most new workloads end at step 2.

1. Can the workload be containerised?
If no (it needs Windows Server, a custom kernel module, specific hardware drivers, or an OS configuration that cannot go in a container), use a Compute Engine VM.
2. Is it stateless and HTTP or gRPC?
If yes (it handles each request independently, stores no local state, and communicates over HTTP or gRPC), use Cloud Run. This is the right answer for the majority of new services.
3. Does it need PersistentVolumes, sidecars, or a service mesh?
If yes (it requires Kubernetes-native features that Cloud Run does not provide), use GKE.
4. Does it need non-HTTP protocols or indefinite background processing?
If yes (it uses raw TCP, UDP, or needs to run without serving HTTP), use GKE (if containerised) or Compute Engine (if not).
5. Still not sure?
Use Cloud Run. Migrate to GKE or Compute Engine only when you hit a specific limitation it cannot meet. You can move a Cloud Run service to GKE using the same container image with minimal code changes.
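The decision flow above can be sketched as a small function. This is an illustration of the ordering only, not an official tool: the `Workload` fields and the `choose_platform` name are hypothetical, and each flag is a judgement you make about your own workload.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    containerisable: bool
    stateless: bool
    http_or_grpc: bool
    needs_kubernetes_features: bool = False  # PersistentVolumes, sidecars, service mesh
    needs_non_http_or_daemon: bool = False   # raw TCP/UDP or indefinite background work


def choose_platform(w: Workload) -> str:
    """Walk the questions in order and stop at the first clear answer."""
    if not w.containerisable:
        return "Compute Engine"
    if w.stateless and w.http_or_grpc:
        return "Cloud Run"
    if w.needs_kubernetes_features:
        return "GKE"
    if w.needs_non_http_or_daemon:
        # Already known to be containerisable at this point, so GKE applies.
        return "GKE"
    # Still not sure: default to the simplest option.
    return "Cloud Run"


print(choose_platform(Workload(containerisable=True, stateless=True, http_or_grpc=True)))
print(choose_platform(Workload(containerisable=False, stateless=False, http_or_grpc=False)))
```

Because the checks run in order, earlier answers win. The function returns as soon as one question settles the choice, which is why most new stateless HTTP services never reach the Kubernetes questions.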

Common beginner mistakes

Choosing GKE just because the app is containerised. Running containers does not mean you need Kubernetes. Cloud Run runs containers with no cluster management, no RBAC configuration, and no node pool sizing. GKE is the right choice when you need Kubernetes features specifically — not just containers.
Putting a simple API on a VM. A VM running one stateless API is on 24/7, requires OS patching, and costs $25–$35/month minimum before storage or networking. The same API on Cloud Run costs a few dollars per month at low traffic and zero at idle. If the workload is stateless and containerisable, a VM adds overhead without benefit.
Choosing based on familiarity, not requirements. Teams comfortable with Kubernetes pick GKE for everything. Teams reluctant to containerise keep everything on VMs. Evaluate each workload against its actual requirements, not your team’s current comfort zone.
Confusing “needs to run all the time” with “must be on a VM”. A service that handles requests continuously does not need a VM. Cloud Run with minimum instances set to 1 is always warm. GKE keeps containers running indefinitely. The difference is about OS control, protocol support, and stateful storage — not uptime.
Treating the decision as permanent. A workload that starts simple enough for Cloud Run may grow to need GKE features. A VM-based workload may become containerisable. Revisit compute platform choices when requirements change rather than accepting early decisions as permanent constraints.

Real-world workload examples

Workload	Best fit	Why
Small REST API with variable traffic	Cloud Run	Stateless, HTTP, scales to zero overnight to eliminate idle cost
Internal admin tool (low request volume)	Cloud Run	Scales to zero, no idle cost, minimal operational overhead
Event-driven image processor	Cloud Run	Triggered by Pub/Sub, short-lived processing, scales with event volume
Legacy Windows application	Compute Engine	Requires Windows Server; cannot run in a Linux container
Self-hosted message broker (RabbitMQ)	GKE or Compute Engine	Needs persistent storage and long-running daemon behaviour; use GKE if containerised
ML model training job	Compute Engine (GPU VM) or GKE	Long-running, GPU-intensive, not HTTP-driven; GPU VMs give direct hardware access, and GKE GPU node pools suit pipelines already on Kubernetes
Long-running background worker	GKE or Compute Engine	Cloud Run services run in response to requests; indefinite background daemons need GKE Deployments or a VM
Microservices with a service mesh	GKE	Sidecar injection for Istio proxies requires Kubernetes pods; Cloud Run has no equivalent automatic injection

Cost considerations

The three platforms have fundamentally different billing models, so cost comparisons depend heavily on traffic patterns.

Cloud Run bills per request: CPU and memory used only while a request is being handled. At zero traffic, cost is zero unless minimum instances is set. For variable or low-traffic services, Cloud Run is almost always cheapest. A low-volume API processing 100,000 requests per day at 100ms average duration typically costs a few dollars per month.
GKE bills a cluster management fee of roughly $0.10 per cluster-hour (around $74/month) in both Standard and Autopilot modes. On top of that, GKE Standard bills per node (always on), while GKE Autopilot bills for pod resource requests instead of nodes. A minimal two-node cluster costs more than most Cloud Run workloads at low to moderate traffic. At high sustained load with committed use discounts on nodes, GKE can become cost-competitive.
Compute Engine bills per VM-second while the instance is running, regardless of whether it is doing any work. A small e2-medium VM running 24/7 costs roughly $25–$35/month before storage and networking. At very high and very steady traffic, flat VM billing can beat per-request Cloud Run pricing — but the crossover depends on request rate, average duration, memory allocation, and whether you use committed use discounts.
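The comparison above can be roughed out in a few lines. This is a back-of-the-envelope sketch, not a pricing tool: the Cloud Run unit rates and free-tier allowances below are illustrative assumptions that vary by region and change over time, the per-node price is taken from the VM range quoted above, and the model assumes each instance handles one request at a time, which overstates Cloud Run cost when concurrency is higher.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CloudRunRates:
    # Assumed illustrative values; replace with current figures for your region.
    usd_per_vcpu_second: float = 0.000024
    usd_per_gib_second: float = 0.0000025
    usd_per_million_requests: float = 0.40
    free_vcpu_seconds: float = 180_000
    free_gib_seconds: float = 360_000
    free_requests: float = 2_000_000


def cloud_run_monthly_cost(requests_per_day: float, avg_seconds: float,
                           vcpu: float = 1.0, memory_gib: float = 0.5,
                           days: int = 30,
                           rates: CloudRunRates = CloudRunRates()) -> float:
    """Estimate monthly cost when billed only while requests are handled."""
    requests = requests_per_day * days
    busy_seconds = requests * avg_seconds  # assumes one request per instance at a time
    cpu = max(0.0, busy_seconds * vcpu - rates.free_vcpu_seconds) * rates.usd_per_vcpu_second
    mem = max(0.0, busy_seconds * memory_gib - rates.free_gib_seconds) * rates.usd_per_gib_second
    req = max(0.0, requests - rates.free_requests) / 1_000_000 * rates.usd_per_million_requests
    return round(cpu + mem + req, 2)


def gke_standard_monthly_floor(node_monthly_usd: float, nodes: int = 2,
                               cluster_fee_hourly: float = 0.10,
                               hours: float = 730) -> float:
    """Always-on floor: node cost plus the cluster management fee."""
    return round(nodes * node_monthly_usd + cluster_fee_hourly * hours, 2)


low = cloud_run_monthly_cost(requests_per_day=100_000, avg_seconds=0.1)
steady = cloud_run_monthly_cost(requests_per_day=50 * 86_400, avg_seconds=0.1)
print(f"Cloud Run, 100k requests/day: ${low}")
print(f"Cloud Run, steady 50 requests/s: ${steady}")
print(f"GKE Standard floor, two $25 nodes: ${gke_standard_monthly_floor(25.0)}")
```

Under these assumptions the low-volume API lands at a few dollars per month, well below a $25–$35 VM, while the steady high-rate case costs far more than a single small VM. The exact crossover moves with every input, which is why real traffic data matters more than any fixed rule.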

Warning

Do not choose a compute platform based on cost estimates alone. Verify with real traffic data and GCP’s pricing calculator. The billing models are different enough that a workload that looks cheaper on Cloud Run at low traffic may look different at sustained high volume — and vice versa.

For detailed cost reduction strategies, see Cloud Run Cost Optimisation and Compute Engine Cost Optimisation.

Quick recommendations

Start with Cloud Run when you are building a new stateless HTTP service, your team does not have Kubernetes expertise, your traffic is variable or unpredictable, or you want the fastest path to a production deployment.
Choose GKE when you need multi-container pods with sidecars, PersistentVolumes for stateful workloads, a service mesh, GPU node pools, or you are running existing Kubernetes deployments that would be costly to rearchitect.
Choose Compute Engine when your workload cannot be containerised, you are doing a lift-and-shift migration, you need Windows Server, you need long-running daemons with no HTTP interface, or you need hardware that neither Cloud Run nor GKE supports.
Revisit the decision when a Cloud Run service starts needing stateful storage or sidecars (consider GKE), a VM-based workload gets containerised (consider Cloud Run or GKE), or your traffic pattern changes significantly enough to affect cost.

Frequently asked questions

What is the default choice for a new GCP service?

Cloud Run. It requires the least operational overhead, scales to zero at no cost, and handles most stateless HTTP workloads well. Start with Cloud Run and move to GKE or Compute Engine only when you hit a specific limitation: stateful storage, multi-container sidecars, non-HTTP protocols, or custom OS requirements. Most developers who start with GKE because it feels familiar end up with unnecessary operational complexity.

Is GKE overkill for most apps?

For most new services, yes. GKE makes sense when you genuinely need Kubernetes primitives: multi-container pods with sidecars, PersistentVolumes for stateful workloads, a service mesh like Istio, or GPU node pools for ML training. If the only reason you are considering GKE is that your app runs in a container, that is not a strong enough reason. Cloud Run runs containers too, with a fraction of the operational overhead.

Can Cloud Run replace Compute Engine VMs?

For stateless HTTP services, Cloud Run replaces VMs entirely and with less overhead. But Cloud Run cannot replace VMs when: the workload cannot be containerised (custom kernel, specific drivers, legacy app), you need Windows Server, you need full OS access, you need indefinitely running background daemons with no HTTP interface, or you need hardware that Cloud Run does not support.

When is Compute Engine better than Cloud Run?

Compute Engine is better when you need full OS control, are doing a lift-and-shift migration from on-premises, need Windows Server, need long-running daemons with no HTTP entrypoint, or need specialised hardware such as GPU types Cloud Run does not offer or local NVMe SSDs. Also consider VMs for workloads with very high and very steady traffic where per-second VM billing becomes cheaper than per-request Cloud Run billing.

Can I start on Cloud Run and migrate to GKE later?

Yes, and this is the recommended approach. Cloud Run uses standard OCI containers. The same container image runs on GKE with a Kubernetes Deployment and Service. You would add Kubernetes manifests and configure an Ingress or load balancer, but your application code stays the same. Starting on Cloud Run reduces early operational burden and gives you a clear path to GKE if you later hit a limitation.
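As a rough idea of what "add Kubernetes manifests" involves, the sketch below builds a Deployment and a Service for an existing container image and prints them as a JSON list that kubectl can apply. The service name and image path are hypothetical placeholders. The PORT environment variable is set explicitly on the assumption that the application reads its port from the environment, which on GKE you have to provide yourself.

```python
import json
from typing import Any, Dict, List


def deployment_and_service(name: str, image: str, port: int = 8080,
                           replicas: int = 2) -> List[Dict[str, Any]]:
    """Build the two objects needed to run one container image behind a Service."""
    labels = {"app": name}
    deployment = {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [{
                        "name": name,
                        "image": image,  # the same image you deployed before
                        "ports": [{"containerPort": port}],
                        # Assumption: the app reads PORT, so set it here.
                        "env": [{"name": "PORT", "value": str(port)}],
                    }],
                },
            },
        },
    }
    service = {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name},
        "spec": {
            "type": "ClusterIP",
            "selector": labels,
            "ports": [{"port": 80, "targetPort": port}],
        },
    }
    return [deployment, service]


def to_kubectl_json(objects: List[Dict[str, Any]]) -> str:
    # Wrap the objects in a List so a single file can hold both.
    return json.dumps({"apiVersion": "v1", "kind": "List", "items": objects}, indent=2)


# Hypothetical name and image path, for illustration only.
objs = deployment_and_service("auth-api", "REGION-docker.pkg.dev/PROJECT/REPO/auth-api:TAG")
print(to_kubectl_json(objs))
```

The application code and image stay untouched. What you take on is the configuration Cloud Run previously handled for you: replica counts, port wiring, and exposure through a Service plus an Ingress or load balancer.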

Last verified: 22 March 2026. Cloud services change frequently. Verify details against official documentation before making infrastructure decisions.
