GCP Spot VMs Explained: Pricing, Preemption, and Use Cases

A Spot VM is a Compute Engine virtual machine that runs on spare Google Cloud capacity at a steep discount, typically 60–91% off the standard on-demand price. The tradeoff is that GCP can reclaim the VM at any time, with only 30 seconds of notice.

That tradeoff makes Spot VMs one of the most cost-effective options in GCP, but only for workloads that can tolerate sudden interruption. If your job can save its progress periodically and resume after being stopped, Spot VMs can dramatically reduce your compute bill. If your workload needs to stay running without interruption (a web server, a primary database, an interactive API), Spot VMs are the wrong choice.

This guide covers how Spot VMs work, how much they save, when to use them, when to avoid them, and how to design workloads that run safely on interruptible capacity.

Simple explanation

Google’s data centres have a large pool of servers. At any given moment, some of that capacity is not being used by customers who are paying full price. Rather than leave those servers idle, Google offers the spare capacity at a large discount through Spot VMs.

The catch is straightforward: if a full-price customer needs that capacity, Google takes it back. Your Spot VM gets a 30-second warning and then shuts down. You are not charged for the time after shutdown, but your workload is interrupted.

Analogy

Think of Spot VMs like standby seats on a flight. You get a much cheaper ticket, but if the flight fills up, your seat goes to a full-fare passenger. If your plans are flexible enough to handle that, the savings are significant. If you absolutely must be on that flight, buy a regular ticket.

How Spot VMs work

In Compute Engine, a Spot VM is created by setting the provisioning model to SPOT (the --provisioning-model=SPOT flag in gcloud). The VM runs exactly like a standard VM in every other respect (same machine types, same OS images, same networking) but it is subject to preemption.

What preemption means

Preemption is the process of GCP reclaiming a Spot VM. It can happen at any time, for any reason, without advance scheduling. When it happens:

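  1. Compute Engine sends the VM a preemption notice in the form of an ACPI G2 Soft Off signal. The guest operating system starts shutting down, which on Linux typically means running processes receive SIGTERM. Your application has approximately 30 seconds to save state and shut down gracefully.
  2. If the VM has not stopped within 30 seconds, Compute Engine sends an ACPI G3 Mechanical Off signal, the equivalent of cutting power. Processes that are still running get no further chance to clean up.
  3. The VM is then stopped or deleted, depending on your configuration:
     • STOP: The VM is stopped (TERMINATED state) but its persistent disk is preserved, and you can restart it later when capacity is available.
     • DELETE: The VM and its disks are deleted entirely.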

You control which behaviour applies by setting the instance-termination-action flag when you create the VM. For most workloads, STOP is the safer default because it preserves the disk.
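
The 30-second window is only useful if the application notices the shutdown. A minimal sketch of one way to do that in Python, assuming a Linux guest where the operating system delivers SIGTERM to running processes during shutdown. The names GracefulShutdown, run_job, and save_state are illustrative, not part of any GCP library.

```python
import signal
import threading


class GracefulShutdown:
    """Tracks whether a termination signal has been received."""

    def __init__(self):
        self._stop = threading.Event()

    def install(self):
        # Register for SIGTERM (sent during OS shutdown) and SIGINT (Ctrl+C).
        # signal.signal must be called from the main thread.
        signal.signal(signal.SIGTERM, self.handle)
        signal.signal(signal.SIGINT, self.handle)

    def handle(self, signum, frame):
        # Keep the handler tiny: only record that shutdown was requested.
        self._stop.set()

    @property
    def requested(self):
        return self._stop.is_set()


def run_job(items, shutdown, save_state):
    """Process items until done or until shutdown is requested.

    Returns the number of items completed. `save_state` is a hypothetical
    callback that persists progress to durable storage.
    """
    done = 0
    for item in items:
        if shutdown.requested:
            break
        # ... real work on `item` would go here ...
        done += 1
    save_state(done)
    return done
```

In a real job you would call install() once at startup and keep each unit of work well under 30 seconds, so the loop reaches its shutdown check before the VM is powered off.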

Important limitation

Spot VMs do not support automatic restart after preemption. When a Spot VM is reclaimed, it is either deleted or left stopped in the TERMINATED state, depending on the termination action. Nothing brings it back until you restart or recreate it yourself, or a Managed Instance Group recreates it. Spot VMs also do not support live migration: during host maintenance events they are terminated rather than moved to another host.

This means resilience must be designed into the workload itself. GCP will not automatically bring your Spot VM back. Your architecture needs to handle that.
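
One building block for that resilience is letting the workload find out, from inside the VM, whether it is being preempted. Compute Engine exposes a preempted value on the instance metadata server that reads TRUE or FALSE. The sketch below is one hedged way to query it from Python; the function names are illustrative, and the fetch parameter is injectable only so the logic can be exercised away from a real VM.

```python
import urllib.request

# Compute Engine metadata endpoint; the response body is TRUE or FALSE.
PREEMPTED_URL = (
    "http://metadata.google.internal/computeMetadata/v1/instance/preempted"
)


def fetch_metadata(url, timeout=2.0):
    """Read a metadata value. Only works from inside a Compute Engine VM."""
    request = urllib.request.Request(url, headers={"Metadata-Flavor": "Google"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8").strip()


def was_preempted(fetch=fetch_metadata):
    """Return True if the metadata server reports that this VM was preempted.

    Any network error is treated as "not known to be preempted".
    """
    try:
        return fetch(PREEMPTED_URL).upper() == "TRUE"
    except OSError:
        return False
```

A shutdown handler could call was_preempted() to distinguish a preemption from an ordinary stop and, for example, log the event or return its current task to a queue.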

Pricing and savings

Spot VM discounts typically range from 60% to 91% off the on-demand price for the same machine type. The exact discount depends on several factors:

  • Machine type: Less popular machine families and configurations tend to have higher discounts
  • Region and zone: Zones with more spare capacity offer better Spot pricing
  • Current demand: Spot pricing and availability fluctuate based on overall demand in a zone

No bidding model

Unlike AWS Spot Instances, GCP Spot VMs do not use a bidding model. The price is set by Google and the discount is applied automatically. You either get a Spot VM at the current discounted rate, or the request fails because no capacity is available. There is no auction and no bid price to manage, although Google can adjust Spot prices over time.

To put the savings in perspective: for a batch workload running 20 VMs for a week, the difference between on-demand and Spot pricing can easily be an 80% reduction in cost. For teams running large-scale data processing, ML training, or rendering pipelines, Spot VMs can make previously unaffordable compute jobs economically viable.
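
The arithmetic behind that claim can be sketched in a few lines. The hourly rate and discount below are assumptions chosen for illustration, not real GCP prices; substitute the current rates for your machine type and region.

```python
HOURS_PER_WEEK = 7 * 24


def weekly_cost(vm_count, hourly_rate, discount=0.0):
    """Cost of running vm_count VMs continuously for one week."""
    return vm_count * HOURS_PER_WEEK * hourly_rate * (1.0 - discount)


# Hypothetical numbers for illustration only, not real GCP prices.
ON_DEMAND_RATE = 0.20   # dollars per VM-hour (assumed)
SPOT_DISCOUNT = 0.80    # 80% off (assumed, within the 60-91% range)

on_demand = weekly_cost(20, ON_DEMAND_RATE)
spot = weekly_cost(20, ON_DEMAND_RATE, SPOT_DISCOUNT)
print(f"on-demand: ${on_demand:.2f}, spot: ${spot:.2f}, "
      f"saved: ${on_demand - spot:.2f}")
```

With these assumed inputs, 20 VMs cost $672.00 for the week on-demand and $134.40 on Spot. This ignores any work repeated after preemptions, which reduces the real saving.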

Spot pricing fits into the broader GCP pricing model alongside on-demand rates, sustained use discounts, and committed use discounts. Each serves a different use case. Spot is specifically for cost optimisation on interruptible work, not for steady guaranteed capacity.

When to use Spot VMs

Spot VMs work well for workloads that meet two criteria: they can tolerate interruption, and they can resume or retry without losing significant progress. Common examples:

  • Batch data processing: ETL pipelines, data transformations, and analytics jobs that process data in chunks and can checkpoint between chunks
  • CI/CD build and test runners: Build jobs that can be retried if interrupted. A failed build due to preemption is an inconvenience, not a disaster
  • Video rendering and transcoding: Frame-by-frame or segment-by-segment rendering where each unit of work is independent
  • ML model training: Training runs that save checkpoints periodically and can resume from the last saved epoch
  • Distributed worker pools: Systems where many workers pull tasks from a queue. If one worker is preempted, the task returns to the queue and another worker picks it up
  • Fault-tolerant analytics: Frameworks like Apache Spark or Dataflow that can redistribute work when a node fails
  • Non-critical dev and test environments: Development VMs where an occasional restart is acceptable

The common thread across all of these is that the loss of a single VM does not cause data loss, service downtime, or require the entire job to restart from scratch.

When not to use Spot VMs

Spot VMs are a poor fit for any workload where interruption causes an outage, data loss, or unacceptable user impact:

  • Primary databases: A preempted database instance risks data corruption or unavailability. Even with replication, using Spot for a primary is unnecessarily risky
  • User-facing web servers: A web server that disappears mid-request causes errors for real users. If you need stable serving capacity, use on-demand VMs with committed use discounts
  • Latency-sensitive single-instance workloads: Any service where one VM handles all traffic and a restart means downtime
  • Stateful services without checkpointing or failover: If the service holds important state in memory and has no way to persist it on shutdown, preemption means data loss
  • Anything that cannot tolerate a 30-second shutdown: Long-running transactions, real-time streaming with strict latency SLAs, or processes that cannot be safely interrupted mid-operation

The key question to ask

Before choosing Spot, ask: “What happens to this workload if the VM disappears right now?” If the answer involves user-visible downtime or lost data, use on-demand or committed instances instead. The discount is meaningless if a preemption causes an incident.

Spot VMs vs preemptible VMs vs standard VMs

Compute Engine offers three kinds of VM to compare here: standard (on-demand) VMs, Spot VMs, and the older preemptible VMs. This comparison covers the key differences. For a broader look at all GCP pricing tiers including sustained use discounts and committed use discounts, see the pricing models guide.

Feature | Standard (on-demand) | Spot VM | Preemptible VM (legacy)
Price | Full on-demand rate | 60–91% off on-demand | 60–91% off on-demand
Interruption risk | None (runs until you stop it) | Can be reclaimed at any time | Can be reclaimed at any time
Maximum runtime | Unlimited | No limit (runs as long as capacity is available) | 24 hours maximum, always terminated at 24h
Automatic restart | Supported (configurable) | Not supported | Not supported
Live migration | Supported | Not supported | Not supported
Availability guarantee | Standard SLA applies | No guarantee, depends on spare capacity | No guarantee, depends on spare capacity
Best for | Production services, databases, anything requiring uptime | Batch jobs, CI/CD, ML training, rendering | Legacy. Migrate to Spot VMs
Operational complexity | Low | Medium (requires checkpointing and retry logic) | Medium (same as Spot, plus forced 24h restarts)

For new workloads, choose between standard VMs and Spot VMs. Preemptible VMs are the legacy product and offer no advantages over Spot VMs. The preemptible and Spot VMs reference page covers the technical details of both.

How to design Spot workloads safely

Running on Spot VMs requires designing your workload to survive interruption. These patterns keep your jobs resilient without adding excessive complexity.

Checkpoint progress regularly

This is the most important pattern. Save your job’s progress to durable storage (Cloud Storage, a database, or a shared file system) at regular intervals. When the job restarts, whether on the same VM or a different one, it reads the last checkpoint and resumes from there instead of starting over. For a multi-hour batch job, checkpointing every 15 to 30 minutes means you lose at most that much work on preemption.

Analogy

Checkpointing is like saving your progress in a video game. If the power goes out, you do not restart the entire game. You reload from your last save point and pick up where you left off. Without save points, you lose everything.
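
A minimal sketch of the checkpoint-and-resume pattern in Python. It writes to a local file as a stand-in for durable storage such as Cloud Storage, and the job (summing a list) is a placeholder for real work. The function names and the stop_after parameter, which simulates a preemption, are illustrative assumptions.

```python
import json
import os
import tempfile


def load_checkpoint(path):
    """Return the saved state, or a fresh state if no checkpoint exists."""
    try:
        with open(path, "r", encoding="utf-8") as f:
            return json.load(f)
    except FileNotFoundError:
        return {"next_index": 0, "total": 0}


def save_checkpoint(path, state):
    """Write the checkpoint atomically so a crash never leaves a partial file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        json.dump(state, f)
    os.replace(tmp_path, path)  # atomic rename over the old checkpoint


def run(items, path, checkpoint_every=100, stop_after=None):
    """Sum items, resuming from the checkpoint at `path`.

    `stop_after` simulates a preemption after that many items in this run.
    """
    state = load_checkpoint(path)
    processed_this_run = 0
    for i in range(state["next_index"], len(items)):
        if stop_after is not None and processed_this_run >= stop_after:
            return None  # simulated preemption: uncheckpointed work is lost
        state["total"] += items[i]
        state["next_index"] = i + 1
        processed_this_run += 1
        if state["next_index"] % checkpoint_every == 0:
            save_checkpoint(path, state)
    save_checkpoint(path, state)
    return state["total"]
```

If the first call is interrupted after 250 items with checkpoint_every=100, the next call resumes from item 200, so only the 50 items processed since the last checkpoint are repeated.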

Make jobs idempotent

An idempotent job produces the same result whether it runs once or multiple times. If a job is preempted partway through writing results, re-running it from the last checkpoint should not create duplicates or corrupt data. This is especially important for jobs that write to databases or publish messages.
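
One common way to get idempotent writes is to key every result by a deterministic ID and upsert rather than append. The sketch below uses an in-memory SQLite database purely as a stand-in for whatever store the job writes to; the table and record IDs are made up for illustration.

```python
import sqlite3


def write_results(conn, results):
    """Upsert results keyed by a deterministic record ID.

    Replaying the same batch overwrites rows instead of duplicating them.
    """
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR REPLACE INTO results (record_id, value) VALUES (?, ?)",
            results,
        )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE results (record_id TEXT PRIMARY KEY, value INTEGER)")

batch = [("order-1", 10), ("order-2", 20)]
write_results(conn, batch)
write_results(conn, batch)  # simulated re-run after a preemption

row_count = conn.execute("SELECT COUNT(*) FROM results").fetchone()[0]
print(row_count)  # 2: the re-run did not create duplicates
```

The same idea applies to message publishing: attach a stable ID derived from the input so that consumers can discard repeats.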

Use work queues

Instead of assigning work directly to specific VMs, put work items into a queue (Cloud Tasks, Pub/Sub, or a custom queue). Workers pull tasks, process them, and acknowledge completion. If a worker is preempted mid-task, the unacknowledged task returns to the queue and another worker picks it up. This pattern makes your system naturally resilient to individual VM failures.
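
The acknowledge-or-redeliver behaviour can be illustrated with a toy in-memory queue. This is a simulation of the semantics only, not the Pub/Sub or Cloud Tasks API; AckQueue and its methods are hypothetical names.

```python
from collections import deque


class AckQueue:
    """Toy in-memory queue with acknowledge-or-redeliver semantics."""

    def __init__(self, tasks):
        self._ready = deque(tasks)
        self._in_flight = set()

    def pull(self):
        """Hand out the next task, or None if nothing is ready."""
        if not self._ready:
            return None
        task = self._ready.popleft()
        self._in_flight.add(task)
        return task

    def ack(self, task):
        """Mark a task as finished so it is never delivered again."""
        self._in_flight.discard(task)

    def expire_unacked(self):
        """Simulate the ack deadline passing: unacked tasks become ready again."""
        self._ready.extend(sorted(self._in_flight))
        self._in_flight.clear()


queue = AckQueue(["task-a", "task-b", "task-c"])
completed = []

# Worker 1 finishes one task, then is preempted in the middle of the next.
first = queue.pull()
completed.append(first)
queue.ack(first)
queue.pull()              # pulled "task-b" but never acknowledged it

queue.expire_unacked()    # "task-b" returns to the queue

# Worker 2 drains what is left.
while (task := queue.pull()) is not None:
    completed.append(task)
    queue.ack(task)

print(completed)
```

Every task completes exactly once here even though one worker disappeared mid-task. In a real system a task can occasionally be processed twice, which is why this pattern is usually paired with idempotent jobs.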

Spread across multiple zones

Spot availability varies by zone. If capacity tightens in one zone, all your Spot VMs in that zone may be preempted at once. Distributing your Spot VMs across multiple zones within a region reduces the chance of losing your entire fleet simultaneously.

Use mixed fleets for guaranteed baseline capacity

For workloads that need some minimum throughput at all times, run a small baseline of on-demand VMs and add Spot VMs for additional scale. The on-demand VMs guarantee that work keeps progressing even during periods of low Spot availability. The Spot VMs provide burst capacity at a fraction of the cost.
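
Sizing such a fleet is simple arithmetic. A rough sketch, assuming you pick the baseline size yourself and want the Spot portion spread as evenly as possible across zones; plan_fleet and the example numbers are illustrative.

```python
def plan_fleet(total_vms, baseline_vms, zones):
    """Split a fleet into an on-demand baseline and zone-spread Spot VMs.

    Returns (on_demand_count, {zone: spot_count}).
    """
    if not zones:
        raise ValueError("at least one zone is required")
    if not 0 <= baseline_vms <= total_vms:
        raise ValueError("baseline_vms must be between 0 and total_vms")
    spot_total = total_vms - baseline_vms
    per_zone, remainder = divmod(spot_total, len(zones))
    # The first `remainder` zones get one extra VM so the totals add up.
    spot_by_zone = {
        zone: per_zone + (1 if i < remainder else 0)
        for i, zone in enumerate(zones)
    }
    return baseline_vms, spot_by_zone


# Assumed inputs: 20 VMs in total, 3 of them on-demand, three zones.
on_demand, spot = plan_fleet(
    20, 3, ["us-central1-a", "us-central1-b", "us-central1-c"]
)
print(on_demand, spot)
```

With these inputs the 17 Spot VMs are split 6, 6, and 5 across the three zones, so losing one zone to preemption removes roughly a third of the Spot capacity rather than all of it.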

Use Managed Instance Groups for automatic replacement

A Managed Instance Group (MIG) automatically attempts to recreate preempted Spot VMs when capacity becomes available. This removes the need to manually monitor and restart instances. Combined with a startup script that resumes from the last checkpoint, a MIG-based Spot fleet can run large batch jobs with minimal manual intervention.

The strongest pattern

The most resilient Spot architecture combines several of these patterns: a work queue feeding tasks to a MIG of Spot VMs that checkpoint progress, spread across multiple zones, with a small on-demand baseline to guarantee minimum throughput. Each layer adds protection against a different failure mode.

Creating Spot VMs

Creating a Spot VM requires two flags beyond a normal VM creation command:

# Create a Spot VM that stops (preserves disk) on preemption
gcloud compute instances create my-spot-vm \
  --zone=us-central1-a \
  --machine-type=n2-standard-4 \
  --provisioning-model=SPOT \
  --instance-termination-action=STOP

The key flags:

  • --provisioning-model=SPOT tells Compute Engine to create a Spot VM instead of a standard VM
  • --instance-termination-action=STOP tells GCP to stop the VM on preemption rather than delete it. The persistent disk is preserved so you can restart later. Use DELETE instead if you do not need the disk after preemption

# List your current Spot VMs
gcloud compute instances list \
  --filter="scheduling.provisioningModel=SPOT"

Spot VMs in Managed Instance Groups

For batch workloads that need many Spot VMs, a Managed Instance Group is the standard approach. A MIG monitors the health of its instances and automatically attempts to recreate any that are preempted.

Capacity is not guaranteed

Recreation depends on capacity being available. If the zone is under heavy demand, the MIG may not be able to recreate all preempted instances immediately. It will keep trying, but there is no guarantee of when capacity will return. For this reason, MIGs work best for batch workloads where temporary reductions in fleet size are acceptable, and for fleets that span multiple zones.

# Create an instance template for a Spot VM fleet
gcloud compute instance-templates create spot-batch-template \
  --machine-type=n2-standard-4 \
  --provisioning-model=SPOT \
  --instance-termination-action=STOP \
  --metadata=startup-script='#!/bin/bash
    gsutil cp gs://my-bucket/job-script.py /tmp/job-script.py
    python3 /tmp/job-script.py'

# Create a MIG with 20 Spot VMs
gcloud compute instance-groups managed create spot-batch-group \
  --template=spot-batch-template \
  --size=20 \
  --zone=us-central1-a

Each time a VM in this group starts (or restarts after preemption), the startup script runs. If job-script.py is designed to load its last checkpoint and resume, the MIG effectively creates a self-healing batch fleet.

Common mistakes

  1. Using Spot VMs for critical always-on services. A Spot VM running a production web server or primary database will eventually be preempted. When it is, users experience downtime and you may lose data. The discount is not worth the operational risk. Use on-demand or committed use discounts for services that must stay available.

  2. Skipping checkpointing. A batch job that runs for 8 hours without saving progress and gets preempted at hour 7 loses all 7 hours of work. With 30-minute checkpoints, the worst case is 30 minutes of lost progress. Checkpointing is non-optional for any multi-hour Spot workload.

  3. Assuming low preemption rates mean safety. Some Spot VMs run for days or weeks without interruption. This creates a false sense of security. Preemption frequency depends on zone demand and can change at any time. Design as if preemption is inevitable and you will never be caught off guard.

  4. Running all Spot VMs in a single zone. When capacity tightens in one zone, all your Spot VMs in that zone can be preempted at the same time. Spreading across multiple zones within a region reduces the risk of losing your entire fleet in one event.

  5. Not monitoring retry costs and failure rates. Spot VMs save money on per-hour compute, but if your workload is preempted frequently and each preemption causes hours of rework, the effective cost may be higher than running on-demand. Track your actual preemption rate, job completion times, and wasted compute to verify that Spot is actually saving you money. This kind of cost visibility is a core part of FinOps practice.

  6. Expecting automatic restart after preemption. Spot VMs do not support automatic restart. After preemption, the VM stays in TERMINATED state until you restart it manually or a Managed Instance Group recreates it. If your workflow depends on the VM coming back on its own, you need a MIG or an external orchestrator.
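
The point in mistake 5 about retry costs can be made concrete with a small calculation. All inputs below are hypothetical, chosen only to show how rework can erase the discount; measure your own preemption rate and rework time before drawing conclusions.

```python
def effective_cost(useful_hours, hourly_rate, discount,
                   preemptions, rework_hours_each):
    """Total Spot spend once repeated work is included.

    Each preemption is assumed to waste `rework_hours_each` hours of
    already-paid compute that must be run again.
    """
    billed_hours = useful_hours + preemptions * rework_hours_each
    return billed_hours * hourly_rate * (1.0 - discount)


# Hypothetical inputs for illustration only, not measured data or real prices.
RATE = 0.20        # on-demand dollars per VM-hour (assumed)
DISCOUNT = 0.70    # Spot discount (assumed)
JOB_HOURS = 8      # useful work in the job

on_demand = JOB_HOURS * RATE
with_checkpoints = effective_cost(
    JOB_HOURS, RATE, DISCOUNT, preemptions=4, rework_hours_each=0.25
)
without_checkpoints = effective_cost(
    JOB_HOURS, RATE, DISCOUNT, preemptions=4, rework_hours_each=5.0
)
print(f"on-demand: {on_demand:.2f}, spot with checkpoints: "
      f"{with_checkpoints:.2f}, spot without: {without_checkpoints:.2f}")
```

Under these assumptions the checkpointed job costs about a third of on-demand, while the job that loses five hours per preemption ends up costing slightly more than on-demand despite the 70% discount.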

Frequently asked questions

What is the difference between Spot VMs and preemptible VMs?

Spot VMs replaced preemptible VMs as the current Compute Engine product for discounted, interruptible capacity. Both offer large savings and can be reclaimed by GCP at any time. The key difference is that preemptible VMs had a hard 24-hour maximum runtime and were always terminated at the 24-hour mark. Spot VMs have no maximum runtime limit and can run for days or weeks if capacity remains available, but can still be reclaimed at any moment. For all new workloads, use Spot VMs. Preemptible VMs are the legacy product.

Can Spot VMs be restarted automatically after preemption?

No. Spot VMs do not support automatic restart. When GCP reclaims a Spot VM, the instance is either stopped (TERMINATED state) or deleted, depending on your termination action setting. You must restart or recreate it manually, or use a Managed Instance Group, which will attempt to recreate preempted instances automatically when capacity becomes available again.

Are Spot VMs good for production workloads?

Spot VMs are good for production batch workloads that are designed to handle interruption. Examples include distributed data pipelines, ML training with checkpointing, and rendering jobs that can retry failed tasks. They are not suitable for production services that must stay continuously available, such as web servers, databases, or APIs, because GCP can reclaim the VM at any time with only 30 seconds of warning.

How much can I really save with Spot VMs?

Spot VM discounts typically range from 60% to 91% off on-demand pricing, depending on the machine type, region, and current capacity availability. The exact discount is not fixed and varies over time and by zone. For large batch workloads running many VMs, the savings can be substantial enough to make previously unaffordable compute jobs economically viable.

What workloads are safest on Spot VMs?

The safest workloads are ones that can be divided into small, independently resumable units of work: batch data processing, CI/CD build pipelines, video rendering, genomics analysis, ML training with periodic checkpointing, and distributed analytics jobs. The common thread is that if a single VM is preempted, the job loses at most a small amount of progress and can resume from a checkpoint or be retried by another worker.

Last verified: 27 March 2026. Cloud services change frequently. Verify details against official documentation before making infrastructure decisions.