GCP Spot VMs Explained: Pricing, Preemption, and Use Cases
A Spot VM is a Compute Engine virtual machine that runs on spare Google Cloud capacity at a steep discount, typically 60–91% off the standard on-demand price. The tradeoff is that GCP can reclaim the VM at any time, with only 30 seconds of notice.
That tradeoff makes Spot VMs one of the most cost-effective options in GCP, but only for workloads that can tolerate sudden interruption. If your job can save its progress periodically and resume after being stopped, Spot VMs can dramatically reduce your compute bill. If your workload needs to stay running without interruption (a web server, a primary database, an interactive API), Spot VMs are the wrong choice.
This guide covers how Spot VMs work, how much they save, when to use them, when to avoid them, and how to design workloads that run safely on interruptible capacity.
At a glance
- Cost: 60–91% cheaper than on-demand, varying by machine type and region
- Availability guarantee: None. GCP can reclaim the VM at any time
- Preemption warning: About 30 seconds of notice (delivered as an OS shutdown signal) before the VM is stopped or deleted
- Best for: Batch processing, CI/CD, rendering, ML training, fault-tolerant analytics
- Not suitable for: Databases, web servers, stateful services without failover
- Key design requirement: Workloads must checkpoint progress and handle restarts gracefully
Simple explanation
Google’s data centres have a large pool of servers. At any given moment, some of that capacity is not being used by customers who are paying full price. Rather than leave those servers idle, Google offers the spare capacity at a large discount through Spot VMs.
The catch is straightforward: if a full-price customer needs that capacity, Google takes it back. Your Spot VM gets a 30-second warning and then shuts down. You are not charged for the time after shutdown, but your workload is interrupted.
Think of Spot VMs like standby seats on a flight. You get a much cheaper ticket, but if the flight fills up, your seat goes to a full-fare passenger. If your plans are flexible enough to handle that, the savings are significant. If you absolutely must be on that flight, buy a regular ticket.
How Spot VMs work
In Compute Engine, a Spot VM is created by setting the --provisioning-model flag to SPOT. The VM uses the same machine types, OS images, and networking as a standard VM, but it is subject to preemption.
What preemption means
Preemption is the process of GCP reclaiming a Spot VM. It can happen at any time, for any reason, without advance scheduling. When it happens:
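- Compute Engine sends the VM a preemption notice as an ACPI G2 Soft Off signal. The guest OS starts a normal shutdown, which typically delivers SIGTERM to your processes, so your application has approximately 30 seconds to save state and shut down gracefully.
- If the VM has not stopped within 30 seconds, Compute Engine sends an ACPI G3 Mechanical Off signal, which powers the VM off regardless of what is still running.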
- The VM transitions to one of two states depending on your configuration. STOP: The VM is stopped but its persistent disk is preserved and you can restart it later when capacity is available. DELETE: The VM and its disks are deleted entirely.
You control which behaviour applies by setting the --instance-termination-action flag when you create the VM. For most workloads, STOP is the safer default because it preserves the disk.
Spot VMs do not support automatic restart after preemption. When a Spot VM is reclaimed, it is either deleted or left stopped (the TERMINATED state) until you restart it manually or a Managed Instance Group recreates it. Spot VMs also do not support live migration: during a host maintenance event they are terminated rather than moved to another host.
This means resilience must be designed into the workload itself. GCP will not automatically bring your Spot VM back. Your architecture needs to handle that.
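The shutdown window described above can be handled inside the workload itself. The following is a minimal sketch, assuming the guest OS passes the shutdown on to your process as SIGTERM; `process` and `save_state` are hypothetical callables that stand in for your job's real work and its persistence step.

```python
import signal
import threading

# Set when the OS asks the process to stop (for example during preemption).
stop_requested = threading.Event()


def handle_sigterm(signum, frame):
    """Record the shutdown request; keep the handler itself tiny."""
    stop_requested.set()


def install_handler():
    """Register the handler. Must be called from the main thread."""
    signal.signal(signal.SIGTERM, handle_sigterm)


def process_items(items, process, save_state):
    """Process items until finished or until a shutdown is requested.

    `process` and `save_state` are hypothetical callables supplied by the job.
    Returns the number of items completed in this run.
    """
    completed = 0
    for item in items:
        if stop_requested.is_set():
            break  # stop picking up new work once shutdown has begun
        process(item)
        completed += 1
    # Persist progress on both normal exit and shutdown. Keep this step
    # fast: the whole VM has roughly 30 seconds.
    save_state(completed)
    return completed
```

Call `install_handler()` once at startup, then run the main loop. The handler only sets a flag, so the slow work (saving state) happens in the main loop rather than inside the signal handler. Each work item should be short enough to finish comfortably within the shutdown window.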
Pricing and savings
Spot VM discounts typically range from 60% to 91% off the on-demand price for the same machine type. The exact discount depends on several factors:
- Machine type: Less popular machine families and configurations tend to have higher discounts
- Region and zone: Zones with more spare capacity offer better Spot pricing
- Current demand: Spot pricing and availability fluctuate based on overall demand in a zone
Unlike AWS Spot Instances, GCP Spot VMs do not use a bidding model. The price is set by Google and the discount is applied automatically. You either get a Spot VM at the current discounted rate, or the request fails because no capacity is available. Spot prices can change over time, but there is no auction and no bid to manage.
To put the savings in perspective: for a batch workload running 20 VMs for a week, the difference between on-demand and Spot pricing can easily be an 80% reduction in cost. For teams running large-scale data processing, ML training, or rendering pipelines, Spot VMs can make previously unaffordable compute jobs economically viable.
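The arithmetic behind that example can be sketched as follows. The hourly rate here is a made-up placeholder, not a real GCP price; substitute the current on-demand rate for your machine type and region.

```python
def fleet_cost(vm_count, hourly_rate, discount=0.0, hours=24 * 7):
    """Cost of running `vm_count` VMs for `hours` at a discounted hourly rate."""
    if not 0.0 <= discount < 1.0:
        raise ValueError("discount must be in the range [0, 1)")
    return vm_count * hours * hourly_rate * (1.0 - discount)


# Hypothetical on-demand rate in dollars per VM-hour (an assumption for
# illustration only, not a published price).
ON_DEMAND_RATE = 0.20

on_demand_cost = fleet_cost(20, ON_DEMAND_RATE)
spot_cost = fleet_cost(20, ON_DEMAND_RATE, discount=0.80)
savings = on_demand_cost - spot_cost
```

With these assumed numbers, 20 VMs for one week (168 hours each) cost $672.00 on-demand and about $134.40 at an 80% Spot discount, a saving of about $537.60. The calculation ignores any work that has to be redone after preemption.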
Spot pricing fits into the broader GCP pricing model alongside on-demand rates, sustained use discounts, and committed use discounts. Each serves a different use case. Spot is specifically for cost optimisation on interruptible work, not for steady guaranteed capacity.
When to use Spot VMs
Spot VMs work well for workloads that meet two criteria: they can tolerate interruption, and they can resume or retry without losing significant progress. Common examples:
- Batch data processing: ETL pipelines, data transformations, and analytics jobs that process data in chunks and can checkpoint between chunks
- CI/CD build and test runners: Build jobs that can be retried if interrupted. A failed build due to preemption is an inconvenience, not a disaster
- Video rendering and transcoding: Frame-by-frame or segment-by-segment rendering where each unit of work is independent
- ML model training: Training runs that save checkpoints periodically and can resume from the last saved epoch
- Distributed worker pools: Systems where many workers pull tasks from a queue. If one worker is preempted, the task returns to the queue and another worker picks it up
- Fault-tolerant analytics: Frameworks like Apache Spark or Dataflow that can redistribute work when a node fails
- Non-critical dev and test environments: Development VMs where an occasional restart is acceptable
The common thread across all of these is that the loss of a single VM does not cause data loss, service downtime, or require the entire job to restart from scratch.
When not to use Spot VMs
Spot VMs are a poor fit for any workload where interruption causes an outage, data loss, or unacceptable user impact:
- Primary databases: A preempted database instance risks data corruption or unavailability. Even with replication, using Spot for a primary is unnecessarily risky
- User-facing web servers: A web server that disappears mid-request causes errors for real users. If you need stable serving capacity, use on-demand VMs with committed use discounts
- Latency-sensitive single-instance workloads: Any service where one VM handles all traffic and a restart means downtime
- Stateful services without checkpointing or failover: If the service holds important state in memory and has no way to persist it on shutdown, preemption means data loss
- Anything that cannot tolerate a 30-second shutdown: Long-running transactions, real-time streaming with strict latency SLAs, or processes that cannot be safely interrupted mid-operation
Before choosing Spot, ask: “What happens to this workload if the VM disappears right now?” If the answer involves user-visible downtime or lost data, use on-demand or committed instances instead. The discount is meaningless if a preemption causes an incident.
Spot VMs vs preemptible VMs vs standard VMs
GCP offers three provisioning models for Compute Engine VMs. This comparison covers the key differences. For a broader look at all GCP pricing tiers including sustained use discounts and committed use discounts, see the pricing models guide.
| | Standard (on-demand) | Spot VM | Preemptible VM (legacy) |
|---|---|---|---|
| Price | Full on-demand rate | 60–91% off on-demand | 60–91% off on-demand |
| Interruption risk | None (runs until you stop it) | Can be reclaimed at any time | Can be reclaimed at any time |
| Maximum runtime | Unlimited | No limit (runs as long as capacity is available) | 24 hours maximum, always terminated at 24h |
| Automatic restart | Supported (configurable) | Not supported | Not supported |
| Live migration | Supported | Not supported | Not supported |
| Availability guarantee | Standard SLA applies | No guarantee, depends on spare capacity | No guarantee, depends on spare capacity |
| Best for | Production services, databases, anything requiring uptime | Batch jobs, CI/CD, ML training, rendering | Legacy. Migrate to Spot VMs |
| Operational complexity | Low | Medium (requires checkpointing and retry logic) | Medium (same as Spot, plus forced 24h restarts) |
For new workloads, choose between standard VMs and Spot VMs. Preemptible VMs are the legacy product and offer no advantages over Spot VMs. The preemptible and Spot VMs reference page covers the technical details of both.
How to design Spot workloads safely
Running on Spot VMs requires designing your workload to survive interruption. These patterns keep your jobs resilient without adding excessive complexity.
Checkpoint progress regularly
This is the most important pattern. Save your job’s progress to durable storage (Cloud Storage, a database, or a shared file system) at regular intervals. When the job restarts, whether on the same VM or a different one, it reads the last checkpoint and resumes from there instead of starting over. For a multi-hour batch job, checkpointing every 15 to 30 minutes means you lose at most that much work on preemption.
Checkpointing is like saving your progress in a video game. If the power goes out, you do not restart the entire game. You reload from your last save point and pick up where you left off. Without save points, you lose everything.
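A minimal sketch of the checkpoint-and-resume pattern is shown here. It writes to a local JSON file as a stand-in for durable storage such as Cloud Storage, and the job (summing chunks of numbers) is a placeholder for real work. The `preempt_after` parameter is a hypothetical hook that simulates preemption so the resume path can be exercised.

```python
import json
import os
import tempfile


def save_checkpoint(path, state):
    """Write state atomically so a crash mid-write never leaves a torn file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp_path, path)  # atomic rename on the same filesystem


def load_checkpoint(path):
    """Return the last saved state, or a fresh state on the first run."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return {"next_chunk": 0, "total": 0}


def run_job(chunks, path, checkpoint_every=2, preempt_after=None):
    """Sum all chunks, resuming from the checkpoint stored at `path`.

    `preempt_after` simulates preemption after that many chunks have been
    processed in this run; progress since the last checkpoint is lost.
    """
    state = load_checkpoint(path)
    processed = 0
    for i in range(state["next_chunk"], len(chunks)):
        if preempt_after is not None and processed >= preempt_after:
            return None  # simulated preemption
        state["total"] += sum(chunks[i])
        state["next_chunk"] = i + 1
        processed += 1
        if state["next_chunk"] % checkpoint_every == 0:
            save_checkpoint(path, state)
    save_checkpoint(path, state)
    return state["total"]
```

Running `run_job` a second time with the same `path` picks up at the first chunk after the last checkpoint rather than at chunk zero. In a real job the checkpoint interval would usually be time-based (for example every 15 to 30 minutes) rather than a chunk count.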
Make jobs idempotent
An idempotent job produces the same result whether it runs once or multiple times. If a job is preempted partway through writing results, re-running it from the last checkpoint should not create duplicates or corrupt data. This is especially important for jobs that write to databases or publish messages.
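One common way to get idempotent writes is to key each result on a stable identifier and upsert it, so replaying a unit of work overwrites its row instead of adding a second one. The sketch below uses SQLite from the Python standard library purely for illustration; the table and column names are hypothetical, and the same idea applies to any database that supports upserts.

```python
import sqlite3


def open_results(db_path=":memory:"):
    """Open the results store, creating the table if it does not exist."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS results ("
        "chunk_id INTEGER PRIMARY KEY, value INTEGER NOT NULL)"
    )
    return conn


def write_result(conn, chunk_id, value):
    """Upsert keyed on chunk_id: re-running a chunk replaces its row."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO results (chunk_id, value) VALUES (?, ?)",
            (chunk_id, value),
        )


def process_chunks(conn, chunks, start=0):
    """Process chunks from `start`; safe to call again after a restart."""
    for chunk_id in range(start, len(chunks)):
        write_result(conn, chunk_id, sum(chunks[chunk_id]))
```

If a job is preempted and restarts from an earlier checkpoint, some chunks are processed twice, but the table still ends up with exactly one row per chunk. A plain `INSERT` without a key would have produced duplicates.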
Use work queues
Instead of assigning work directly to specific VMs, put work items into a queue (Cloud Tasks, Pub/Sub, or a custom queue). Workers pull tasks, process them, and acknowledge completion. If a worker is preempted mid-task, the unacknowledged task returns to the queue and another worker picks it up. This pattern makes your system naturally resilient to individual VM failures.
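The pull, process, acknowledge cycle can be illustrated with a toy in-memory queue. This is not the Pub/Sub or Cloud Tasks API; `LeaseQueue` and its methods are hypothetical names, and managed queues typically return unacknowledged tasks after an acknowledgement deadline expires rather than through an explicit call like `release_worker`.

```python
import collections


class LeaseQueue:
    """Toy in-memory queue: a pulled task stays leased until it is acked."""

    def __init__(self, tasks):
        self._ready = collections.deque(tasks)
        self._leased = {}  # task -> worker currently holding it

    def pull(self, worker):
        """Lease the next task to `worker`, or return None if none is ready."""
        if not self._ready:
            return None
        task = self._ready.popleft()
        self._leased[task] = worker
        return task

    def ack(self, task):
        """Mark a task as finished so it is never redelivered."""
        self._leased.pop(task, None)

    def release_worker(self, worker):
        """Return a preempted worker's unacked tasks to the ready queue."""
        for task in [t for t, w in self._leased.items() if w == worker]:
            del self._leased[task]
            self._ready.append(task)

    def empty(self):
        return not self._ready and not self._leased
```

A task is only removed for good when it is acknowledged, so a worker that disappears mid-task cannot lose work. Because a task may be delivered more than once, this pattern works best when task handlers are idempotent.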
Spread across multiple zones
Spot availability varies by zone. If capacity tightens in one zone, all your Spot VMs in that zone may be preempted at once. Distributing your Spot VMs across multiple zones within a region reduces the chance of losing your entire fleet simultaneously.
Use mixed fleets for guaranteed baseline capacity
For workloads that need some minimum throughput at all times, run a small baseline of on-demand VMs and add Spot VMs for additional scale. The on-demand VMs guarantee that work keeps progressing even during periods of low Spot availability. The Spot VMs provide burst capacity at a fraction of the cost.
Use Managed Instance Groups for automatic replacement
A Managed Instance Group (MIG) automatically attempts to recreate preempted Spot VMs when capacity becomes available. This removes the need to manually monitor and restart instances. Combined with a startup script that resumes from the last checkpoint, a MIG-based Spot fleet can run large batch jobs with minimal manual intervention.
The most resilient Spot architecture combines several of these patterns: a work queue feeding tasks to a MIG of Spot VMs that checkpoint progress, spread across multiple zones, with a small on-demand baseline to guarantee minimum throughput. Each layer adds protection against a different failure mode.
Creating Spot VMs
Creating a Spot VM adds two flags to a normal VM creation command:

# Create a Spot VM that stops (preserves disk) on preemption
gcloud compute instances create my-spot-vm \
--zone=us-central1-a \
--machine-type=n2-standard-4 \
--provisioning-model=SPOT \
--instance-termination-action=STOP

The key flags:
- --provisioning-model=SPOT tells Compute Engine to create a Spot VM instead of a standard VM
- --instance-termination-action=STOP tells GCP to stop the VM on preemption rather than delete it. The persistent disk is preserved so you can restart later. Use DELETE instead if you do not need the disk after preemption

# List your current Spot VMs
gcloud compute instances list \
--filter="scheduling.provisioningModel=SPOT"

Spot VMs in Managed Instance Groups
For batch workloads that need many Spot VMs, a Managed Instance Group is the standard approach. A MIG monitors the health of its instances and automatically attempts to recreate any that are preempted.
Recreation depends on capacity being available. If the zone is under heavy demand, the MIG may not be able to recreate all preempted instances immediately. It will keep trying, but there is no guarantee of when capacity will return. For this reason, MIGs work best for batch workloads where temporary reductions in fleet size are acceptable, and for fleets that span multiple zones.
# Create an instance template for a Spot VM fleet
gcloud compute instance-templates create spot-batch-template \
--machine-type=n2-standard-4 \
--provisioning-model=SPOT \
--instance-termination-action=STOP \
--metadata=startup-script='#!/bin/bash
gsutil cp gs://my-bucket/job-script.py /tmp/job-script.py
python3 /tmp/job-script.py'
# Create a MIG with 20 Spot VMs
gcloud compute instance-groups managed create spot-batch-group \
--template=spot-batch-template \
--size=20 \
--zone=us-central1-a

Each time a VM in this group starts (or restarts after preemption), the startup script runs. If job-script.py is designed to load its last checkpoint and resume, the MIG effectively creates a self-healing batch fleet.
Common mistakes
Using Spot VMs for critical always-on services. A Spot VM running a production web server or primary database will eventually be preempted. When it is, users experience downtime and you may lose data. The discount is not worth the operational risk. Use on-demand or committed use discounts for services that must stay available.
Skipping checkpointing. A batch job that runs for 8 hours without saving progress and gets preempted at hour 7 loses all 7 hours of work. With 30-minute checkpoints, the worst case is 30 minutes of lost progress. Checkpointing is not optional for any multi-hour Spot workload.
Assuming low preemption rates mean safety. Some Spot VMs run for days or weeks without interruption. This creates a false sense of security. Preemption frequency depends on zone demand and can change at any time. Design as if preemption is inevitable and you will never be caught off guard.
Running all Spot VMs in a single zone. When capacity tightens in one zone, all your Spot VMs in that zone can be preempted at the same time. Spreading across multiple zones within a region reduces the risk of losing your entire fleet in one event.
Not monitoring retry costs and failure rates. Spot VMs save money on per-hour compute, but if your workload is preempted frequently and each preemption causes hours of rework, the effective cost may be higher than running on-demand. Track your actual preemption rate, job completion times, and wasted compute to verify that Spot is actually saving you money. This kind of cost visibility is a core part of FinOps practice.
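A rough way to check this is to count redone compute as part of the Spot bill and compare the total with the on-demand cost of the useful work alone. The sketch below is a simplified model under stated assumptions: the function names are hypothetical, the rate and discount are inputs you supply, and it ignores the cost of longer completion times.

```python
def effective_spot_cost(useful_hours, wasted_hours, on_demand_rate, discount):
    """Spot cost for a job, counting compute that had to be redone."""
    spot_rate = on_demand_rate * (1.0 - discount)
    return (useful_hours + wasted_hours) * spot_rate


def spot_is_cheaper(useful_hours, wasted_hours, on_demand_rate, discount):
    """Compare Spot (including rework) with on-demand (no rework)."""
    on_demand_cost = useful_hours * on_demand_rate
    spot_cost = effective_spot_cost(
        useful_hours, wasted_hours, on_demand_rate, discount
    )
    return spot_cost < on_demand_cost
```

For example, at an assumed 70% discount, a job with 100 useful VM-hours and 20 wasted VM-hours is still far cheaper on Spot, while the same job with 300 wasted VM-hours costs more than on-demand. In this model Spot stays cheaper as long as wasted hours divided by useful hours is below discount / (1 - discount), which is about 2.3 at a 70% discount.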
Expecting automatic restart after preemption. Spot VMs do not support automatic restart. After preemption, the VM stays in TERMINATED state until you restart it manually or a Managed Instance Group recreates it. If your workflow depends on the VM coming back on its own, you need a MIG or an external orchestrator.
Frequently asked questions
What is the difference between Spot VMs and preemptible VMs?
Spot VMs replaced preemptible VMs as the current Compute Engine product for discounted, interruptible capacity. Both offer large savings and can be reclaimed by GCP at any time. The key difference is that preemptible VMs had a hard 24-hour maximum runtime and were always terminated at the 24-hour mark. Spot VMs have no maximum runtime limit and can run for days or weeks if capacity remains available, but can still be reclaimed at any moment. For all new workloads, use Spot VMs. Preemptible VMs are the legacy product.
Can Spot VMs be restarted automatically after preemption?
No. Spot VMs do not support automatic restart. When GCP reclaims a Spot VM, the instance is either stopped (the TERMINATED state) or deleted, depending on your termination action setting. You must restart it manually or use a Managed Instance Group, which will attempt to recreate preempted instances automatically when capacity becomes available again.
Are Spot VMs good for production workloads?
Spot VMs are good for production batch workloads that are designed to handle interruption. Examples include distributed data pipelines, ML training with checkpointing, and rendering jobs that can retry failed tasks. They are not suitable for production services that must stay continuously available, such as web servers, databases, or APIs, because GCP can reclaim the VM at any time with only 30 seconds of warning.
How much can I really save with Spot VMs?
Spot VM discounts typically range from 60% to 91% off on-demand pricing, depending on the machine type, region, and current capacity availability. The exact discount is not fixed and varies over time and by zone. For large batch workloads running many VMs, the savings can be substantial enough to make previously unaffordable compute jobs economically viable.
What workloads are safest on Spot VMs?
The safest workloads are ones that can be divided into small, independently resumable units of work: batch data processing, CI/CD build pipelines, video rendering, genomics analysis, ML training with periodic checkpointing, and distributed analytics jobs. The common thread is that if a single VM is preempted, the job loses at most a small amount of progress and can resume from a checkpoint or be retried by another worker.