GCP High Availability: Design Systems That Survive Failures

High availability in GCP means designing your system so that when a VM crashes, a zone goes down for maintenance, or a database instance fails, your users never notice. Most production workloads achieve this by running across multiple zones within a single region, using health checks and load balancers to route traffic away from failures automatically.

What high availability actually means

A highly available system stays online and responsive even when parts of it break. Instead of relying on one server that must never fail, you run multiple copies of your application across separate failure domains. When one copy goes down, the others keep handling requests while the broken copy gets replaced.

In GCP, the most common failure domain is a zone. A zone is an isolated deployment area within a region, which you can think of as an independent data centre. Zones in the same region share a geographic area but are designed with independent power, cooling, and networking, so a failure in one zone should not take the others down with it.

Analogy

Think of zones as separate buildings in a business park. They share an address but have independent utilities. If one building loses power, the others stay running. A highly available system keeps copies of itself in multiple buildings so it never depends on any single one.

“Good enough” high availability for most production services means deploying across two or three zones in one region. This protects against the most common failures (hardware faults, zone maintenance windows, and single-zone outages) without the cost and complexity of multi-region. Multi-region deployment is a separate concern, covered in the Multi-Region Architectures guide.

How high availability works in GCP

High availability is not a single feature you enable. It is a combination of infrastructure patterns working together. Here is how each piece fits.

Zones vs regions

A region is a geographic location like us-central1 (Iowa) or europe-west1 (Belgium). Each region contains three or more zones, which are independent data centres like us-central1-a, us-central1-b, and us-central1-c. For a full explanation of how GCP organises its infrastructure, see Regions and Zones.

Failure domains

A failure domain is the blast radius of a single failure. A single VM is a failure domain. A zone is a larger failure domain. A region is the largest. High availability design means placing redundant copies across different failure domains so that no single failure takes everything down.

Quick decision

Ask yourself: “If this component disappears right now, does the whole system go down?” If the answer is yes, that component is a single point of failure and needs redundancy in a different zone.

Redundancy

Run at least two copies of every critical component, in different zones. For stateless services, this is straightforward: run more instances. For stateful services like databases, use managed HA modes that maintain a standby replica in a separate zone.
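The single-point-of-failure question can be applied mechanically to an inventory of components. The following is a minimal sketch, not a GCP API: the component names, the `placement` mapping, and the `find_single_points_of_failure` helper are all hypothetical, and the zone lists are assumed to come from your own deployment records.

```python
def find_single_points_of_failure(placement: dict[str, list[str]]) -> list[str]:
    """Return components that run in fewer than two distinct zones."""
    return sorted(
        name for name, zones in placement.items()
        if len(set(zones)) < 2
    )


# Hypothetical inventory: component name -> zones where a copy runs.
placement = {
    "web": ["us-central1-a", "us-central1-b", "us-central1-c"],
    "api": ["us-central1-a", "us-central1-b"],
    "database": ["us-central1-a"],
    "cache": ["us-central1-b", "us-central1-b"],  # two copies, same zone
}

print(find_single_points_of_failure(placement))
# → ['cache', 'database']
```

Note that the cache is flagged even though it has two copies: redundancy only counts when the copies sit in different failure domains.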

Health checks

A health check is a periodic probe that GCP runs against your instances. If an instance fails enough consecutive checks, it is removed from the load balancer and flagged for replacement. The health check should hit an HTTP endpoint that verifies the application can actually serve traffic, not just that the process is running.

Load balancing

A load balancer distributes incoming traffic across healthy instances and automatically stops sending requests to instances that fail health checks. GCP’s Global External HTTPS Load Balancer runs at the network edge, not in any single zone or region, so it is not itself a single point of failure.

Autohealing and rescheduling

When a managed instance group detects that an instance has failed its health check, it automatically deletes the unhealthy instance and creates a replacement. Cloud Run and GKE perform similar rescheduling for containers. You do not need to detect or fix failures manually.

Graceful degradation

A well-designed system does not return HTTP 500 because one dependency is slow. If a recommendations API times out, serve a default set of results. If a search index is temporarily unreachable, return cached results. Identify which features can degrade and define fallback behaviour for each.

Analogy

A restaurant that runs out of one dish does not close for the night. It crosses that item off the menu and keeps serving everything else. Your application should do the same when a non-critical dependency goes down.
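One way to express the fallback behaviour described above is a small wrapper that gives a dependency a time budget and returns a default when the budget is exceeded. This is a sketch under stated assumptions: `call_with_fallback`, the simulated recommendation functions, and the default list are hypothetical names invented for illustration, and a real service might rely on its HTTP client's own timeout instead of a thread pool.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout

# Shared worker pool so a slow dependency call does not block the request thread.
_pool = ThreadPoolExecutor(max_workers=4)


def call_with_fallback(fetch, fallback, timeout_s: float):
    """Run fetch(); return fallback if it raises or takes longer than timeout_s."""
    future = _pool.submit(fetch)
    try:
        return future.result(timeout=timeout_s)
    except FutureTimeout:
        future.cancel()  # best effort; a call that is already running keeps going
        return fallback
    except Exception:
        return fallback


DEFAULT_RECOMMENDATIONS = ["bestsellers", "new-arrivals"]


def slow_recommendations():
    time.sleep(0.6)  # simulates a dependency that has become slow
    return ["personalised-1", "personalised-2"]


def fast_recommendations():
    return ["personalised-1", "personalised-2"]


print(call_with_fallback(slow_recommendations, DEFAULT_RECOMMENDATIONS, timeout_s=0.2))
# → ['bestsellers', 'new-arrivals']
print(call_with_fallback(fast_recommendations, DEFAULT_RECOMMENDATIONS, timeout_s=0.2))
# → ['personalised-1', 'personalised-2']
```

The caller gets a usable answer within the time budget either way, which is the property that keeps one slow dependency from turning into an HTTP 500.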

Observability and alerting

You cannot fix what you cannot see. Use Cloud Monitoring to track error rates, latency percentiles, and instance health. Set up uptime checks against your public endpoints so you are alerted before users report an outage. Alert on symptoms (elevated error rates, rising latency) not just causes (CPU at 100%).
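To make "alert on symptoms" concrete, the sketch below evaluates one window of request data for the two symptoms named above. It is illustrative only: the `symptoms` and `percentile` helpers and the thresholds (1% errors, 500 ms at the 95th percentile) are assumptions, not Cloud Monitoring behaviour. In practice you would express the same conditions as alerting policies rather than application code.

```python
def percentile(values: list[float], pct: int) -> float:
    """Nearest-rank percentile of a non-empty list (pct is a whole number)."""
    ordered = sorted(values)
    rank = (pct * len(ordered) + 99) // 100  # integer ceiling of pct% of n
    return ordered[max(rank, 1) - 1]


def symptoms(status_codes, latencies_ms, max_error_rate=0.01, max_p95_ms=500.0):
    """Return the user-facing symptoms breached in this window."""
    breached = []
    error_rate = sum(code >= 500 for code in status_codes) / len(status_codes)
    if error_rate > max_error_rate:
        breached.append("error_rate")
    if percentile(latencies_ms, 95) > max_p95_ms:
        breached.append("p95_latency")
    return breached


# Hypothetical window: 3% server errors and a slow tail of requests.
print(symptoms([200] * 97 + [503] * 3, [120.0] * 90 + [900.0] * 10))
# → ['error_rate', 'p95_latency']
print(symptoms([200] * 100, [120.0] * 100))
# → []
```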

Failover testing

Your HA design is only as reliable as your last test. Intentionally shut down instances or simulate a zone failure in staging and verify that health checks detect the problem, the load balancer reroutes traffic, and replacements come up cleanly. If you have never tested failover, you do not know if it works.

Warning

An HA setup that has never been tested is not high availability. It is a hope. Schedule a failover test in staging before your first production launch, then repeat quarterly.

Single-zone vs multi-zone vs multi-region

Choosing between these deployment patterns is the most important high availability decision you make. Here is how they compare.

Pattern	Protects against	Complexity	Cost	When to use
Single-zone	Nothing. Any zone issue causes downtime	Minimal	Lowest	Dev/test environments, non-critical internal tools
Multi-zone (single region)	Hardware failures, zone maintenance, zone outages	Low	Moderate (small cross-zone traffic cost)	Most production workloads
Multi-region	Full regional outages, geographic latency	High (data replication, traffic management, split-brain risk)	Highest	Strict SLAs (above 99.99%), global user base, regulatory requirements

Start with multi-zone. Upgrade to multi-region only when you have a specific requirement that justifies the complexity. Most teams never need multi-region for high availability. They need it for disaster recovery or global latency.

Rule of thumb

If all your users are in one country and your SLA is 99.9% or 99.99%, multi-zone in a single region is almost certainly enough. Save multi-region for when the business case is clear and documented.

High-availability patterns by workload

Compute Engine with regional managed instance groups

A regional managed instance group (MIG) distributes VM instances across multiple zones in a region (three by default). Combined with autohealing, it replaces failed instances automatically. Pair it with autoscaling so that, when a zone fails and the remaining instances absorb its traffic, the group can add capacity in the healthy zones.

# Create a regional managed instance group across three zones
gcloud compute instance-groups managed create my-app-mig \
  --template=my-app-template \
  --size=3 \
  --region=us-central1 \
  --project=my-app-prod

# Configure autohealing with a health check
# (my-app-health-check must already exist; see "Configuring health checks correctly")
gcloud compute instance-groups managed update my-app-mig \
  --health-check=my-app-health-check \
  --initial-delay=120 \
  --region=us-central1 \
  --project=my-app-prod

Use --region instead of --zone to get zone-level redundancy. Set --initial-delay to at least twice your application startup time, or the MIG will replace instances that have not finished booting yet.

Common trap

If the initial delay is shorter than your app’s startup time, the MIG marks new instances as unhealthy and deletes them before they finish booting. This creates an infinite replacement loop. If you see instances being created and destroyed repeatedly, check this value first.
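Sizing matters as much as placement: when a zone fails, the surviving instances must carry the full load until autoscaling catches up. The arithmetic can be sketched as follows, assuming instances are spread evenly across zones and that `peak_instances` (a hypothetical input) is the number of instances needed to serve peak traffic.

```python
import math


def instances_needed(peak_instances: int, zones: int) -> int:
    """Total instances so that losing one zone still leaves peak capacity.

    Assumes instances are spread evenly across zones.
    """
    if zones < 2:
        raise ValueError("need at least two zones to survive a zone failure")
    per_zone = math.ceil(peak_instances / (zones - 1))
    return per_zone * zones


print(instances_needed(peak_instances=6, zones=3))
# → 9
print(instances_needed(peak_instances=6, zones=2))
# → 12
```

With three zones you provision 50% above peak, while with two zones you provision 100% above peak, which is one reason three-zone deployments are usually cheaper for the same resilience.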

Cloud Run

Cloud Run handles most HA concerns automatically. It deploys containers across multiple zones within a region. If a zone becomes unavailable, traffic shifts to instances in other zones. No infrastructure configuration needed.

# Deploy a Cloud Run service with minimum instances for HA
gcloud run deploy my-app \
  --image=us-central1-docker.pkg.dev/my-app-prod/my-repo/my-app:v1 \
  --region=us-central1 \
  --min-instances=2 \
  --max-instances=100 \
  --project=my-app-prod

Set --min-instances=2 for production services. This keeps warm instances available so a zone failure does not cause cold-start latency at the worst possible moment. For help choosing between Cloud Run and VMs, see Choosing Between Cloud Run, GKE, and VMs.

Tip

Cloud Run is the fastest path to multi-zone HA. If your workload fits in a stateless container, you get zone-level redundancy with zero infrastructure management. No MIGs, no health check configuration, no load balancer setup.

GKE regional clusters

A regional GKE cluster runs control plane nodes and worker nodes across three zones. If a zone fails, Kubernetes reschedules pods to nodes in healthy zones. Use pod disruption budgets to ensure enough replicas stay running during voluntary disruptions like node upgrades.

Warning

Always create GKE clusters with --region rather than --zone. A zonal GKE cluster has a single control plane node. If that zone goes down, you lose the ability to manage the cluster entirely, even if worker nodes in other zones are still running.

Databases: Cloud SQL, Firestore, and Spanner

Databases are the hardest component to make highly available because they hold state. Each managed database service handles HA differently.

Cloud SQL: Enable Regional availability to create a standby instance in a second zone. Failover is automatic and typically takes 60 to 120 seconds, during which active connections are dropped. Applications must implement connection retry logic. See Cloud SQL Overview and Backups and High Availability for configuration details.

# Create a Cloud SQL instance with high availability enabled
gcloud sql instances create my-app-db \
  --database-version=POSTGRES_15 \
  --tier=db-custom-2-8192 \
  --region=us-central1 \
  --availability-type=REGIONAL \
  --project=my-app-prod

The --availability-type=REGIONAL flag is what activates the standby. Without it, you have a single-zone database that can take your entire application offline.

Firestore: Multi-zone availability is built in for regional instances. No additional configuration needed.

Cloud Spanner: Designed for high availability from the ground up. Regional configurations replicate across three zones automatically. Multi-region configurations replicate across regions for the highest availability.

Analogy

Making compute highly available is like hiring extra staff. Making a database highly available is like keeping a synchronised backup of your filing cabinet in another building, updated in real time. The second problem is harder because the data must stay consistent across copies.

Configuring health checks correctly

Health checks are the mechanism that makes everything else work. A bad health check undermines your entire HA design.

Use an HTTP health check against a dedicated /health endpoint, not a TCP port check. An open TCP port only proves the process is listening. Your /health endpoint should verify that the application can actually serve traffic: the database connection is alive, critical caches are loaded, and internal dependencies are reachable.

# Create an HTTP health check
gcloud compute health-checks create http my-app-health-check \
  --port=8080 \
  --request-path=/health \
  --check-interval=10 \
  --timeout=5 \
  --healthy-threshold=2 \
  --unhealthy-threshold=3 \
  --project=my-app-prod

This configuration checks every 10 seconds, allows 5 seconds for a response, requires 2 consecutive passes to mark healthy, and 3 consecutive failures to mark unhealthy. Tuning these thresholds avoids flapping: a single slow response will not remove a healthy instance from rotation.
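These settings also determine how long a failed instance keeps receiving traffic. The estimate below is a rough sketch under a simplified model (a single prober sending probes at a fixed interval); the real probing system is more involved, so treat the numbers as an order-of-magnitude guide. The `detection_window_s` helper is hypothetical.

```python
def detection_window_s(check_interval_s, timeout_s, unhealthy_threshold):
    """Rough (best, worst) seconds from failure to being marked unhealthy.

    Best case: the instance fails just before a probe, so only the gaps
    between the consecutive failing probes elapse.
    Worst case: it fails just after a probe, and the final probe runs
    to its full timeout before counting as a failure.
    """
    best = (unhealthy_threshold - 1) * check_interval_s
    worst = unhealthy_threshold * check_interval_s + timeout_s
    return best, worst


print(detection_window_s(check_interval_s=10, timeout_s=5, unhealthy_threshold=3))
# → (20, 35)
```

Lowering the interval or the threshold shortens this window but makes flapping more likely, so the values are a trade-off rather than something to minimise.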

What to check in your /health endpoint

A good health check endpoint verifies: (1) the database connection pool has available connections, (2) critical caches are populated, and (3) the application can perform a basic read operation. Do not call external APIs from your health check. If an external dependency is slow, your health check becomes slow, and healthy instances get removed from rotation for a problem that is not theirs.
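A minimal sketch of such an endpoint, using only the Python standard library, might look like the following. The check names and the lambdas in `CHECKS` are placeholders (assumptions for illustration); a real service would replace them with probes of its own connection pool and caches. Port 8080 and the /health path mirror the health check definition above.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def evaluate_health(checks: dict) -> tuple[int, dict]:
    """Run each local check; return (HTTP status, per-check results)."""
    results = {}
    for name, check in checks.items():
        try:
            results[name] = bool(check())
        except Exception:
            results[name] = False  # a crashing check counts as a failure
    status = 200 if all(results.values()) else 503
    return status, results


# Hypothetical checks: local state only, no calls to external APIs.
CHECKS = {
    "db_pool_has_connections": lambda: True,
    "cache_populated": lambda: True,
    "basic_read_ok": lambda: True,
}


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/health":
            self.send_error(404)
            return
        status, body = evaluate_health(CHECKS)
        payload = json.dumps(body).encode()
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(port: int = 8080) -> None:
    """Start the health server; call this from your application's entry point."""
    HTTPServer(("", port), HealthHandler).serve_forever()
```

Returning 503 rather than 200 with an error body matters: the load balancer and the MIG act on the status code, not on the response text.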

When you need this

Who this is for: Anyone running a production service where downtime costs money, trust, or user experience. If an outage means lost revenue, missed SLAs, or pager alerts, you need high availability.

Typical production scenarios:

A customer-facing web application that must stay online during zone maintenance
An API backend serving mobile apps where downtime means a broken user experience
A payment processing service with contractual uptime commitments
An internal platform that blocks other teams when it goes down

When simpler architecture is enough: Development and staging environments, internal dashboards used during business hours, batch processing jobs that can retry on failure, and any workload where a few minutes of downtime is acceptable. Do not over-engineer. A single-zone deployment is fine when the cost of downtime is low.

Tip

Before adding HA complexity, estimate the actual cost of one hour of downtime for your service. If the answer is “not much,” a single-zone deployment with good backups is the right call. HA engineering time is better spent on the services where downtime genuinely hurts.

Common beginner mistakes

Running everything in a single zone. A single-zone deployment means any zone event (maintenance, outage, hardware fault) takes your service offline. Use a regional managed instance group or Cloud Run’s built-in multi-zone spreading.
Health checks that only verify the process is running. A TCP port check passes even when the application has lost its database connection and is returning 500 errors to every request. Your /health endpoint must verify the application can serve valid responses.
Leaving the database as a single point of failure. Cloud SQL without --availability-type=REGIONAL is a single-zone instance. A zone failure takes your database offline regardless of how many compute instances you have.
No retry logic or timeouts on outbound calls. When a dependency slows down, your service slows down with it unless you set explicit timeouts. When a database connection drops during failover, your application must retry with backoff instead of crashing.
Never testing failover. If you have not intentionally killed an instance and watched the system recover, you do not know if your HA design works. Test in staging before you rely on it in production.
Weak monitoring and alerting. Health checks automate recovery, but you still need to know when failures happen. Without monitoring and alerting on error rates and latency, you may not notice a partially degraded system until users complain.
Stateful bottlenecks. Storing session state on a single VM means losing that VM loses all active sessions. Use an external session store (Memorystore, Firestore) or design your application to be stateless so any instance can handle any request.

High availability vs disaster recovery

These two concepts are related but solve different problems. Most production systems need both.

	High Availability	Disaster Recovery
Handles	Routine failures: VM crash, zone outage	Rare, severe failures: region outage, data corruption
Failover	Automatic, seconds to a couple of minutes	Manual or semi-automatic, minutes to hours
Scope	Within a single region (multi-zone)	Across regions
Goal	Prevent downtime	Recover from downtime
Cost	Moderate	Variable, depends on RPO/RTO targets

HA keeps your service running during the failures that actually happen most often. DR gets it back after the failures that almost never happen but would be catastrophic. For a full breakdown of DR strategies, see Disaster Recovery Strategies.

Note

HA and DR are not interchangeable. A system can be highly available within a region and have no disaster recovery plan at all. If the entire region goes offline, HA alone will not save you. Build HA first (it covers the failures you are most likely to see), then layer in DR for the rare catastrophic scenarios.

Beginner checklist: pre-production HA review

Before launching a production service, walk through this checklist.

Compute is multi-zone. Regional MIG, Cloud Run, or regional GKE cluster. Not a single-zone deployment.
Database has HA enabled. Cloud SQL uses --availability-type=REGIONAL. Firestore or Spanner handles this automatically.
Health checks hit an HTTP endpoint that verifies the application can serve traffic, not just that a port is open.
A load balancer distributes traffic and removes unhealthy instances from rotation automatically.
Autoscaling is configured so remaining instances can absorb traffic when a zone goes down.
Timeouts and retries are set on all outbound calls: database connections, API calls, external dependencies.
Graceful degradation is defined. You know what the service returns when each dependency is unavailable.
Monitoring and alerting are active. Error rate, latency, and uptime check alerts are configured and routed to the right team.
Failover has been tested. You have intentionally killed an instance or simulated a zone failure and verified recovery works.
No single points of failure remain. Walk through the request path and confirm every component has redundancy or a defined fallback.

Frequently asked questions

What does high availability mean in GCP?

High availability means your system keeps serving users even when individual components fail. In GCP, this typically means running workloads across multiple zones, using health checks to detect failures, and relying on load balancers and managed services to route around problems automatically.

What SLA does GCP offer for multi-zone Compute Engine deployments?

GCP offers a 99.99% monthly uptime SLA for Compute Engine instances deployed across two or more zones in the same region, for example in a regional managed instance group. A single-zone deployment has a 99.9% SLA. The difference between 99.9% and 99.99% is roughly 43 minutes versus 4 minutes of allowed downtime in a 30-day month.
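The downtime figures follow directly from the SLA percentage. A short calculation, assuming a 30-day month (the `allowed_downtime_minutes` helper is illustrative, not part of any GCP tooling):

```python
def allowed_downtime_minutes(sla_percent: float, days: int = 30) -> float:
    """Minutes of downtime permitted per period at the given SLA."""
    total_minutes = days * 24 * 60
    return total_minutes * (100 - sla_percent) / 100


for sla in (99.9, 99.99):
    print(f"{sla}% -> {allowed_downtime_minutes(sla):.1f} minutes per 30 days")
# → 99.9% -> 43.2 minutes per 30 days
# → 99.99% -> 4.3 minutes per 30 days
```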

What is the difference between high availability and disaster recovery?

High availability handles routine failures like a VM crash or a zone outage with automatic failover, typically measured in seconds to a couple of minutes. Disaster recovery covers rare, severe events like an entire region going offline. HA keeps the service running during common problems. DR gets it back after a major event. Most production systems need both.

How long does Cloud SQL failover take in HA mode?

Cloud SQL automatic failover to a standby instance in a different zone typically completes in 60 to 120 seconds. During that window, active connections are dropped. Applications must implement connection retry logic with exponential backoff to handle the brief interruption without user-facing errors.
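Retry with exponential backoff can be sketched in a few lines. This is a generic illustration, not a Cloud SQL client API: `retry_with_backoff`, its parameters, and the simulated `flaky_query` are hypothetical, and `ConnectionError` stands in for whatever exception your database driver raises when a connection drops.

```python
import random
import time


def retry_with_backoff(operation, max_attempts=10, base_delay_s=1.0,
                       max_delay_s=30.0, retry_on=(ConnectionError,),
                       sleep=time.sleep):
    """Call operation(); on a retryable error wait up to base * 2**attempt, capped."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            delay = min(max_delay_s, base_delay_s * 2 ** attempt)
            sleep(random.uniform(0, delay))  # jitter avoids synchronised retries


# Simulated query that fails twice (as during a failover) and then succeeds.
attempts = {"count": 0}


def flaky_query():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("connection dropped during failover")
    return "row"


# sleep is stubbed out here so the example runs instantly.
print(retry_with_backoff(flaky_query, sleep=lambda s: None))
# → row
```

With these defaults the waits are capped at 1, 2, 4, 8, 16, then 30 seconds each, for an upper bound of roughly 150 seconds in total, which is chosen to span the failover window described above. Tune the limits to your own latency budget.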

Do I need multi-region for high availability?

Usually not. Multi-zone within a single region protects against the most common failures (hardware faults, zone maintenance, and zone-level outages) at low cost and complexity. Multi-region is needed only when you must survive an entire region going offline, or when you serve a global audience and need low latency everywhere.

Last verified: 26 March 2026. Cloud services change frequently. Verify details against official documentation before making infrastructure decisions.
