Serverless vs Virtual Machines in GCP: Cost, Control, and Use Cases

Serverless vs virtual machines in GCP comes down to one question: do you want to manage infrastructure, or just deploy code? In Google Cloud, serverless means Cloud Run or Cloud Functions. Virtual machines means Compute Engine. For most new workloads, serverless is the faster, cheaper starting point. This page explains how each model works, when each one wins, and how to decide.

Simple explanation

Serverless means you give Google your code or container image. Google runs it on machines you never see. When requests arrive, compute spins up. When requests stop, compute shuts down. You do not pick a machine type, patch an OS, or plan capacity. You pay only while your code is running.

Virtual machines means you rent a full computer from Google. You choose the CPU, memory, disk, and operating system. The machine starts and stays running until you stop it. You manage everything on it: OS updates, software installation, monitoring, restarts. You pay for every second the machine is on, whether it is busy or idle.

Think of it this way

Serverless is like taking a taxi. You say where you want to go, someone else drives, and you pay only for the ride. When you are not riding, you pay nothing. You cannot customise the car or choose the route, but you also never change the oil.

A VM is like owning a car. You pick the model, maintain it, insure it, and park it. It is always available the moment you need it, and you can modify anything under the hood. But you pay for it whether you drive 10,000 km this month or leave it in the garage.

Default to serverless for stateless, bursty, or low-ops workloads. Choose VMs when you need OS control, persistent local state, special hardware, or always-on infrastructure where committed use discounts bring cost below per-request billing.
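The rule of thumb above can be sketched as a small decision helper. This is a minimal illustration, not an official decision tool: the `Workload` fields and the `recommend` function are hypothetical names that simply mirror the criteria listed in the paragraph.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Traits from the rule of thumb above; the field names are hypothetical."""
    needs_os_control: bool = False
    needs_persistent_local_state: bool = False
    needs_special_hardware: bool = False
    always_on_heavy_load: bool = False


def recommend(workload: Workload) -> tuple[str, list[str]]:
    """Return a starting-point recommendation plus the reasons behind it."""
    reasons = []
    if workload.needs_os_control:
        reasons.append("needs OS control")
    if workload.needs_persistent_local_state:
        reasons.append("needs persistent local state")
    if workload.needs_special_hardware:
        reasons.append("needs special hardware")
    if workload.always_on_heavy_load:
        reasons.append("always-on load where committed use discounts may win")
    if reasons:
        return "virtual machines", reasons
    # No VM-only requirement was found, so fall back to the default.
    return "serverless", ["no VM-only requirement found"]
```

Calling `recommend(Workload(needs_special_hardware=True))` returns a VM recommendation with one reason, while `recommend(Workload())` falls through to the serverless default.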

How it works

How serverless works

Provisioning. You deploy a container image (Cloud Run) or a function (Cloud Functions). There is no machine to configure. Google assigns compute resources when requests arrive.

Scaling. The platform adds instances automatically as traffic increases and removes them as traffic drops. Cloud Run can scale from zero to hundreds of instances without manual configuration. See Cloud Run scaling behaviour for details on how autoscaling decisions work.
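To get a feel for how instance count follows traffic, here is a rough back-of-the-envelope model. It assumes the number of in-flight requests is request rate times average latency, and that each instance handles a fixed number of concurrent requests. The `concurrency` and `max_instances` defaults are illustrative assumptions, and the real autoscaler weighs more signals than this sketch does.

```python
import math


def estimate_instances(requests_per_second: float,
                       avg_latency_ms: float,
                       concurrency: int = 80,
                       max_instances: int = 100) -> int:
    """Estimate how many instances a steady request rate would need."""
    if requests_per_second <= 0:
        return 0  # no traffic: the service scales to zero
    # Average number of requests being handled at any moment.
    in_flight = requests_per_second * avg_latency_ms / 1000
    needed = max(1, math.ceil(in_flight / concurrency))
    return min(max_instances, needed)
```

With these assumed defaults, 1,000 requests per second at 200ms each means 200 requests in flight, which fits on 3 instances.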

Billing. You are billed per request and per 100ms of CPU and memory consumed during request handling. When no requests are in flight, billing stops entirely.
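The billing rule above can be sketched as simple arithmetic. This is a simplified model under stated assumptions: durations are rounded up to the next 100ms increment, each request is billed on its own (ignoring requests that share an instance), and every price is a parameter you must supply from the current price list. The function names are hypothetical.

```python
import math


def billable_seconds(duration_ms: float, granularity_ms: int = 100) -> float:
    """Round a request's duration up to the next billing increment."""
    increments = math.ceil(duration_ms / granularity_ms)
    return increments * granularity_ms / 1000


def request_cost(duration_ms: float,
                 vcpu: float,
                 memory_gib: float,
                 price_per_vcpu_second: float,
                 price_per_gib_second: float,
                 price_per_request: float) -> float:
    """Cost of one request: CPU and memory time plus the per-request fee."""
    seconds = billable_seconds(duration_ms)
    compute = seconds * (vcpu * price_per_vcpu_second
                         + memory_gib * price_per_gib_second)
    return compute + price_per_request
```

A request that takes 130ms is billed as 0.2 seconds in this model, and a request that never arrives costs nothing.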

Operational responsibility. Google manages the servers, networking, load balancing, and TLS certificates. You manage your application code, container image, and environment variables.

When traffic drops to zero. All instances shut down. Cost drops to zero. The next request triggers a cold start. A new instance boots, typically in 100ms to 2 seconds depending on container size.

Long-running and background work. Cloud Run requests time out after at most 60 minutes. Cloud Functions (2nd gen) has the same 60-minute cap for HTTP functions. Persistent background processes like daemons and queue pollers do not work reliably because, by default, instances are only active during request handling. For scheduled or long-running batch work, use Cloud Run Jobs or Batch.

How virtual machines work

Provisioning. You create a VM with a specific machine type (CPU + memory), choose a boot disk image, configure networking, and start it. The machine boots and runs continuously until you stop or delete it.

Scaling. A single VM does not scale horizontally. To handle more traffic, you create managed instance groups with autoscaling rules. You define the scaling metric (CPU, load balancer utilisation, or a custom metric) and the min/max instance count.
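The idea behind a scaling rule can be sketched with a generic target-tracking calculation. This is a simplified illustration of the general technique, not the exact algorithm the Compute Engine autoscaler uses. The function and parameter names are hypothetical, and utilisation is expressed in whole percentages.

```python
import math


def desired_instances(current_instances: int,
                      observed_cpu_pct: float,
                      target_cpu_pct: float,
                      min_instances: int,
                      max_instances: int) -> int:
    """Size the group so average CPU moves back towards the target."""
    raw = math.ceil(current_instances * observed_cpu_pct / target_cpu_pct)
    # Keep the result inside the configured min/max instance count.
    return max(min_instances, min(max_instances, raw))
```

Four VMs running at 90% CPU against a 60% target would grow to 6 VMs, unless the configured maximum is lower.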

Billing. You are billed per second from the moment the VM starts until it stops, regardless of whether it handles any traffic. Committed use discounts (1-year or 3-year) reduce cost significantly for predictable workloads.

Operational responsibility. You manage the OS, patches, runtime dependencies, startup scripts, monitoring agents, and application restarts. Google manages the physical hardware and hypervisor.

When traffic drops to zero. The VM keeps running and billing continues. Autoscaling can reduce instance count, but the minimum is usually 1 unless you script shutdown to zero.

Long-running and background work. No time limits. A VM can run background daemons, queue pollers, cron jobs, database engines, or any long-lived process indefinitely.

Side-by-side comparison

Dimension	Serverless (Cloud Run / Cloud Functions)	Virtual Machines (Compute Engine)
Deployment unit	Container image or function code	Full OS image + application
Provisioning	Automatic. No machine configuration needed.	Manual. You choose machine type, disk, OS.
Scaling	Automatic, per-request, zero to thousands	Manual or autoscaler-based (managed instance groups)
Scale to zero	Yes. Instances stop when idle.	No. Minimum 1 VM unless you script shutdown.
Idle cost	Zero	Full VM cost continues
Billing model	Per request + per 100ms CPU/memory	Per second while running
OS access	None. Managed by Google.	Full root access, SSH
Persistent local state	No. Filesystem is ephemeral.	Yes. Persistent disks survive reboots.
Cold start	100ms–2s on first request after scale-from-zero	30–90s on initial boot (always warm after that)
Background processing	Not supported. Instances active only during requests.	Unlimited. Daemons, pollers, cron, anything.
Max request / runtime	60 min (Cloud Run); 60 min (Functions 2nd gen)	Unlimited
VPC / private networking	Via Serverless VPC Access connector	Full VPC member by default
Debugging / SSH	Logs and traces only	Full SSH, remote desktop, serial console
Best-fit workloads	APIs, webhooks, event processing, lightweight microservices	Databases, GPU workloads, always-on daemons, legacy apps
Worst-fit workloads	Background daemons, persistent state, GPU, custom kernel	Low-traffic APIs, bursty workloads, tiny scheduled jobs

For a deeper service-level comparison with concrete cost examples, see Cloud Run vs Compute Engine.

When to use serverless

Quick decision shortcut

If your workload is stateless, fits in a container, and does not need GPU or persistent local disk, start with Cloud Run. You can always move to a VM later if you hit a limit. Going the other direction (VM to serverless) is harder because it means removing OS dependencies you have already built around.

Public web API. A REST or gRPC API serving variable traffic. Cloud Run scales to zero between bursts and handles spikes without manual intervention.
Internal admin app. A dashboard or tool used a few times per day. Serverless costs nothing during the hours nobody uses it.
Webhook receiver. An endpoint that processes incoming events from Stripe, GitHub, or another service. Traffic is unpredictable and often low.
Queue worker (short tasks). Processing messages from Pub/Sub or Cloud Tasks where each task completes in seconds or minutes.
Nightly batch or cron job. A scheduled job that runs for a few minutes then stops. Use Cloud Run Jobs or Cloud Functions with Cloud Scheduler instead of keeping a VM running 24 hours for 5 minutes of work.
Lightweight microservices. Small, stateless services that do not need shared disk or OS-level access.
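As an illustration of the webhook and queue-worker cases, here is a minimal sketch of a container entry point that receives Pub/Sub push messages over HTTP using only the standard library. It assumes the standard push envelope with a base64-encoded `message.data` field and a listening port supplied through the `PORT` environment variable. `handle_task` is a hypothetical placeholder for your own short task.

```python
import base64
import json
import os
from http.server import BaseHTTPRequestHandler, HTTPServer


def decode_push_envelope(body: bytes) -> str:
    """Extract the UTF-8 payload from a Pub/Sub push request body."""
    envelope = json.loads(body)
    encoded = envelope["message"].get("data", "")
    return base64.b64decode(encoded).decode("utf-8")


def handle_task(payload: str) -> None:
    """Hypothetical placeholder for the real, short-lived unit of work."""
    print(f"processing: {payload}")


class PushHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = decode_push_envelope(self.rfile.read(length))
        except (KeyError, TypeError, ValueError, AttributeError):
            self.send_response(400)  # malformed envelope
            self.end_headers()
            return
        handle_task(payload)
        self.send_response(204)  # a success status acknowledges the message
        self.end_headers()


def serve() -> None:
    """Listen on the port the platform provides, defaulting to 8080."""
    port = int(os.environ.get("PORT", "8080"))
    HTTPServer(("", port), PushHandler).serve_forever()
```

Call `serve()` from the container's entry point. All work happens inside the request, which is what lets the platform scale the service to zero between messages.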

When to use virtual machines

Legacy app that needs OS access. Software requiring specific OS packages, kernel modules, or system-level configuration that cannot run inside a container.
GPU or custom driver workload. ML training, video encoding, or any workload that needs GPU, TPU, or specialised hardware drivers attached to the instance.
Self-hosted database. Running PostgreSQL, MySQL, or another database engine where you need persistent local disk with high IOPS and full control over configuration.
Always-on service with predictable heavy load. A service that runs near full utilisation around the clock. Committed use discounts on Compute Engine can cost less than per-request serverless billing at sustained high volume.
Background daemon or long-running process. A queue poller, message broker, or monitoring agent that must run continuously without time limits.
Software that cannot be containerised. Desktop applications, software with kernel-level dependencies, or licensed software tied to a specific machine.

When not to use serverless

Persistent background processes. Serverless instances only run during request handling. A daemon that polls a queue or keeps a connection pool alive will not work reliably.
Workloads that exceed runtime limits. If a single request takes more than 60 minutes, Cloud Run or Cloud Functions will terminate it. Use Compute Engine or Cloud Run Jobs for long-running batch work.
Workloads needing persistent local disk. The serverless filesystem is ephemeral. Files written during one request may be gone on the next. If your application writes data to local disk and expects it to persist, you need a VM or a managed storage service.
Latency-sensitive services that cannot tolerate cold starts. If your SLA requires sub-100ms response times and scale-from-zero cold starts of up to 2 seconds are unacceptable, either set minimum instances (which adds idle cost) or use an always-on VM.
GPU, TPU, or custom hardware. Cloud Functions cannot attach GPUs, and Cloud Run offers only limited GPU options in selected regions. TPUs, custom drivers, and other specialised hardware still need a VM.

Watch out for hidden cold start costs

Setting minimum instances to 1 eliminates cold starts but re-introduces idle cost. You are now paying for an always-warm instance even when no traffic arrives. This can still be cheaper than a full VM, but it is no longer “pay only when code runs.” Check your traffic patterns before committing to minimum instances.
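To check whether an always-warm instance is still cheaper than a VM, you can estimate its worst-case monthly cost with zero traffic. This sketch assumes a 30-day month and takes the idle prices as parameters, so no real price is baked in. The function names are hypothetical.

```python
SECONDS_PER_MONTH = 30 * 24 * 3600  # assumes a 30-day month


def warm_instance_monthly_cost(min_instances: int,
                               vcpu: float,
                               memory_gib: float,
                               idle_price_per_vcpu_second: float,
                               idle_price_per_gib_second: float) -> float:
    """Monthly cost of keeping instances warm when no traffic arrives."""
    per_second = (vcpu * idle_price_per_vcpu_second
                  + memory_gib * idle_price_per_gib_second)
    return min_instances * per_second * SECONDS_PER_MONTH


def cheaper_option(warm_cost: float, vm_monthly_cost: float) -> str:
    """Compare the warm-instance cost with an always-on VM's monthly cost."""
    if warm_cost < vm_monthly_cost:
        return "minimum instances"
    return "always-on VM"
```

Feed in the idle rates and the VM price for your region and machine size before drawing a conclusion. The comparison only holds when both options are sized for the same workload.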

When not to use virtual machines

Low-traffic APIs. Keeping a VM running 24/7 for an API that handles a few hundred requests per day wastes most of the compute budget on idle time.
Bursty or unpredictable workloads. VM autoscaling responds in minutes. Serverless scales in seconds. For flash sales, webhook bursts, or viral traffic, serverless handles spikes faster.
Tiny scheduled jobs. A cron job that runs for 5 minutes per day does not justify a VM running the other 23 hours and 55 minutes. Use Cloud Run Jobs, Batch, or Cloud Functions with Cloud Scheduler instead.
Stateless microservices with no OS requirements. If the app fits in a container and does not need persistent disk, SSH, or OS-level configuration, a VM adds operational overhead with no benefit.
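The waste in the tiny scheduled job case is easy to quantify. A minimal sketch, assuming the VM runs all day and the job is the only thing using it:

```python
MINUTES_PER_DAY = 24 * 60


def idle_fraction(busy_minutes_per_day: float) -> float:
    """Share of a 24-hour VM's running time spent doing no useful work."""
    return 1 - busy_minutes_per_day / MINUTES_PER_DAY
```

For a job that runs 5 minutes per day, `idle_fraction(5)` is about 0.9965, so roughly 99.65% of the paid VM time is idle.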

Cost trade-offs

Serverless wins on cost when traffic is low, bursty, or has long idle periods, because idle cost is zero. A Cloud Run service handling a few hundred thousand requests per month with modest resource needs costs far less than even a small always-on VM.

VMs can win on cost when utilisation is sustained and high. If a workload would keep a VM’s CPU consistently above roughly 40–50% utilisation, Compute Engine with committed use discounts often costs less than per-request serverless billing at the same volume. The exact break-even depends on request volume, average duration, and resource allocation. There is no single universal threshold.
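The break-even logic can be sketched as follows. Serverless cost grows with the fraction of time the workload is busy, while VM cost is flat, so the two lines cross at one utilisation level. This is a simplified model: it ignores per-request fees, assumes equivalent resources on both sides, and treats a committed use discount as a flat percentage. All names and prices are hypothetical inputs.

```python
def break_even_utilisation(serverless_price_per_busy_second: float,
                           vm_price_per_second: float,
                           committed_use_discount: float = 0.0) -> float:
    """Utilisation (0 to 1) above which the VM becomes the cheaper option."""
    effective_vm_price = vm_price_per_second * (1 - committed_use_discount)
    return effective_vm_price / serverless_price_per_busy_second


def cheaper_model(utilisation: float,
                  serverless_price_per_busy_second: float,
                  vm_price_per_second: float,
                  committed_use_discount: float = 0.0) -> str:
    """Pick the cheaper model for a given steady utilisation."""
    threshold = break_even_utilisation(serverless_price_per_busy_second,
                                       vm_price_per_second,
                                       committed_use_discount)
    return "virtual machine" if utilisation > threshold else "serverless"
```

If serverless compute costs twice as much per busy second as the VM costs per second, the break-even sits at 50% utilisation. A 20% discount on the VM moves it down to 40%. These inputs are made up to show the mechanics, so substitute current prices before relying on the result.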

Free tier reality check

Cloud Run has a free tier: 2 million requests, 360,000 GB-seconds of memory, and 180,000 vCPU-seconds per month. This covers most experimental and low-traffic production workloads at no cost. A small Compute Engine VM (e2-micro) costs around $7 per month running continuously, even at zero traffic. For side projects and internal tools, the Cloud Run free tier often means your compute bill is zero.
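You can check a workload against these allowances with a few lines of arithmetic. The three limits below come from the figures above. Everything else is a simplifying assumption: requests are billed one at a time, billing increments are ignored, and the function names are hypothetical.

```python
FREE_REQUESTS = 2_000_000
FREE_GB_SECONDS = 360_000
FREE_VCPU_SECONDS = 180_000


def free_tier_usage(requests: int,
                    avg_duration_ms: float,
                    vcpu: float,
                    memory_gb: float) -> dict:
    """Fraction of each monthly allowance a workload would consume."""
    busy_seconds = requests * avg_duration_ms / 1000
    return {
        "requests": requests / FREE_REQUESTS,
        "vcpu_seconds": busy_seconds * vcpu / FREE_VCPU_SECONDS,
        "gb_seconds": busy_seconds * memory_gb / FREE_GB_SECONDS,
    }


def fits_free_tier(requests: int,
                   avg_duration_ms: float,
                   vcpu: float,
                   memory_gb: float) -> bool:
    """True when every allowance is at or below 100% used."""
    usage = free_tier_usage(requests, avg_duration_ms, vcpu, memory_gb)
    return all(share <= 1 for share in usage.values())
```

For example, 300,000 requests per month at 200ms each on 1 vCPU and 0.5 GB use 15% of the request allowance, about a third of the vCPU-seconds, and under a tenth of the GB-seconds in this model.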

For detailed cost strategies, see Cloud Run cost optimisation and Compute Engine cost optimisation.

Common mistakes

Choosing VMs because they feel familiar. Teams experienced with VMs sometimes deploy stateless APIs to Compute Engine out of habit. This adds OS patching, capacity planning, and uptime responsibility for a workload that would run fine on Cloud Run with far less operational overhead.
Treating Cloud Run as a fit for every workload. Cloud Run is not a universal replacement for VMs. Background daemons, GPU workloads, persistent-state applications, and jobs exceeding 60 minutes need a different compute model.
Forgetting cold start implications. The first request to a Cloud Run service that has scaled to zero can take up to 2 seconds while a new instance boots. If your SLA is 200ms, either configure minimum instances or accept the occasional latency spike as a documented trade-off.
Using serverless for background daemons. A process that polls a queue or runs a long-lived connection server needs to stay alive between requests. Serverless instances shut down when idle. Use a VM or GKE for always-on background processes.
Paying for idle VMs for tiny scheduled jobs. Running a VM 24/7 for a report that generates once per night wastes about 99% of the compute budget. Use Cloud Run Jobs, Batch, or Cloud Functions with Cloud Scheduler instead.

Serverless vs virtual machines is the broadest compute model choice in GCP. If your question is more specific, these pages will get you to an answer faster:

Cloud Run vs Cloud Functions: choosing between GCP’s two serverless platforms. See Cloud Run vs Cloud Functions.
Cloud Run vs Compute Engine: a deeper, service-level comparison with concrete cost examples. See Cloud Run vs Compute Engine.
Kubernetes vs serverless: whether GKE makes sense compared to Cloud Run. See Kubernetes vs Serverless.
Containers vs VMs: the packaging model question, not the hosting model question. See Containers vs Virtual Machines.
Choosing between all three: Cloud Run, GKE, and Compute Engine side by side. See Choosing Between Cloud Run, GKE, and Compute Engine.

Frequently asked questions

What is the main difference between serverless and virtual machines?

With virtual machines, you rent a full server and manage the operating system, patches, and scaling yourself. The VM runs and bills continuously. With serverless, you deploy code or a container and Google manages all infrastructure. Cloud Run and Cloud Functions scale to zero when idle, so you only pay when your code is handling requests.

Is serverless always cheaper than a VM?

No. Serverless wins when traffic is low, bursty, or has long idle periods because idle cost is zero. When a workload runs at sustained high utilisation, a Compute Engine VM with a committed use discount can cost less than per-request serverless billing. The break-even depends on request volume, duration, and how steady the load is.

Should I use Cloud Run or Compute Engine for a new API?

Default to Cloud Run. It scales to zero, has a generous free tier, deploys in seconds, and requires no OS management. Move to Compute Engine only when you need persistent local disk, a specific OS, GPU access, or the workload runs at sustained high utilisation where committed use discounts make VMs cheaper.

Can serverless access private resources in a VPC?

Yes. Cloud Run and Cloud Functions connect to VPC resources through Serverless VPC Access connectors. You configure a connector in your VPC and the serverless service routes private traffic through it. This lets you reach Cloud SQL, Memorystore, and internal services without exposing them to the public internet.

What workloads should stay on VMs?

Workloads that need persistent local disk, custom OS or kernel configuration, GPU or TPU access, SSH debugging, always-on background daemons, or software that cannot be containerised. Licensed software tied to specific machine configurations is another common reason to stay on Compute Engine.

Last verified: 28 March 2026. Cloud services change frequently. Verify details against official documentation before making infrastructure decisions.