Cloud SQL Backups, PITR, and High Availability: What to Enable for Production

Cloud SQL gives you four tools for building a resilient database: automated backups, on-demand backups, point-in-time recovery (PITR), and high availability (HA). Each solves a different problem. None of them replaces the others. The hardest part is not the configuration. It is knowing which one you actually need.

Backups let you recover from catastrophic data loss. PITR lets you recover from a bad DELETE or a failed migration by rewinding to a precise moment. High availability keeps your instance online when a zone goes down, automatically promoting a warm standby in roughly 60 seconds. Read replicas handle read traffic, but they are not the same as HA, and assuming they are is one of the most common Cloud SQL configuration mistakes.

This page walks through all four mechanisms, explains what each one protects against, and helps you decide what to enable for your workload. If you are new to Cloud SQL itself, start with the Cloud SQL overview first.

What each mechanism actually does

Before getting into configuration, here is the plain-English version of each tool:

Automated backup: a daily snapshot of your entire database. If something goes badly wrong, you can restore to the state it was in at the last backup.
On-demand backup: a manual snapshot you trigger yourself. Useful before risky operations like schema changes or bulk updates. Persists indefinitely until you delete it.
Point-in-time recovery (PITR): recovery to any specific second within the log retention window. Not just “yesterday’s backup” but “yesterday at 14:27:43, before that migration ran.”
High availability (HA): a second copy of your instance in a different availability zone, kept in sync with every write to the primary. If the primary zone fails, Cloud SQL promotes the standby automatically. The standby is never directly accessible; it exists only to take over.
Read replica: a separate Cloud SQL instance that receives changes asynchronously from the primary. You can query it for reads. It does not fail over automatically if the primary fails.

Think of it like this

A backup is a safety net. PITR is a time machine. High availability is a co-pilot who takes the controls the moment something goes wrong. A read replica is a colleague who can handle some of your calls, but cannot take the wheel in an emergency.

What this page helps you decide

Whether automated backups alone are enough for your workload
When PITR is worth enabling, and what MySQL requires that PostgreSQL does not
When high availability justifies the extra cost
Why read replicas are not a substitute for HA, and what happens if you treat them as one
What a sensible production Cloud SQL setup looks like before go-live

How Cloud SQL recovery works end to end

These mechanisms form a layered recovery model, not four separate unrelated settings. Here is how they fit together.

Every day, Cloud SQL takes a full backup of your instance and stores it in Cloud Storage, replicated across regions for durability. That is your baseline recovery point. By default you keep seven of these; you can increase the count up to 365.

Between full backups, Cloud SQL can continuously archive logs: binary logs for MySQL, WAL (write-ahead log) for PostgreSQL. These logs fill the gap between backup snapshots and give you PITR. When you trigger a point-in-time restore, Cloud SQL takes the nearest full backup and replays the captured logs up to your chosen timestamp. The result is a new Cloud SQL instance at that exact state.
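As a rough mental model of that restore sequence (not a description of Cloud SQL's internal implementation), choosing the base backup amounts to picking the most recent backup taken at or before the target timestamp, after which the archived logs are replayed forward. The function name and the sample timestamps below are illustrative assumptions.

```shell
# pick_base_backup TARGET BACKUP_TS...
# Prints the latest backup timestamp that is not after TARGET.
# Assumes every timestamp is a UTC ISO 8601 string in the same format,
# so plain string ordering matches chronological ordering.
pick_base_backup() {
  pb_target="$1"
  shift
  printf '%s\n' "$@" \
    | LC_ALL=C sort \
    | awk -v t="$pb_target" '$0 <= t { best = $0 } END { if (best != "") print best }'
}

# Example: which daily backup would a restore to 14:27:43 on 7 March start from?
# pick_base_backup 2026-03-07T14:27:43Z \
#   2026-03-05T02:00:00Z 2026-03-06T02:00:00Z 2026-03-07T02:00:00Z 2026-03-08T02:00:00Z
```

Everything between the chosen backup and the target timestamp has to come from the logs, which is why a gap in log archiving limits how far PITR can reach.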

High availability is separate from both. It does not change how backups or PITR work. It provisions a standby instance in a different zone within the same region and keeps it synchronised in real time. When the primary zone fails, Cloud SQL detects this automatically and promotes the standby. The instance keeps the same IP address, so your application reconnects to the same endpoint without a configuration change; connections that were open at the time of the failover are dropped and must be re-established.

Read replicas sit outside this recovery stack entirely. They are useful for distributing read traffic across multiple instances, but they use asynchronous replication and do not provide automatic failover. For a broader view of availability patterns across GCP services, see the guide on designing highly available systems on GCP.

Automated backups

Cloud SQL performs one automated full backup per day during a configurable backup window. Backups are stored in Cloud Storage and replicated across regions for durability. Schedule the window during a low-traffic period to minimise any performance impact.

# Enable backups and set the backup window at instance creation
gcloud sql instances create my-db-instance \
  --database-version=POSTGRES_15 \
  --region=europe-west2 \
  --tier=db-custom-2-7680 \
  --backup-start-time=02:00 \
  --retained-backups-count=14

# Update backup settings on an existing instance
gcloud sql instances patch my-db-instance \
  --backup-start-time=02:00 \
  --retained-backups-count=14

# List available backups
gcloud sql backups list --instance=my-db-instance

Retention is by count, not by days

Setting --retained-backups-count=14 keeps the last 14 daily backups, roughly two weeks. Retention is not tied to the calendar, though: if the instance is paused or a backup job fails on a given day, no backup exists for that day and it does not count toward the 14. Plan your retention number with this in mind.
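Because retention is counted in backups, it can help to check how many successful automated backups actually exist rather than assuming one per day. The sketch below wraps gcloud sql backups list in a small helper; the function name is an assumption, and the filter relies on the status and type fields that the backup listing reports.

```shell
# count_successful_backups INSTANCE
# Prints the number of automated backups that finished successfully.
count_successful_backups() {
  gcloud sql backups list \
    --instance="$1" \
    --filter="status=SUCCESSFUL AND type=AUTOMATED" \
    --format="value(id)" \
    | wc -l | tr -d ' '
}

# Example:
# count_successful_backups my-db-instance
```

If the number is lower than the configured retention count on an instance that has been running for longer than that many days, some daily backups were skipped or failed.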

For PostgreSQL instances, enabling backups also automatically enables WAL archiving, which is required for PITR. For MySQL, PITR requires an additional flag covered in the next section.

On-demand backups

You can take a backup at any time. On-demand backups are not affected by the retained-backups-count limit; they persist until you delete them manually, which makes them useful as long-lived checkpoints.

# Create an on-demand backup immediately
gcloud sql backups create --instance=my-db-instance

# Describe a specific backup to confirm it succeeded
gcloud sql backups describe BACKUP_ID --instance=my-db-instance

# Delete a backup when it is no longer needed
gcloud sql backups delete BACKUP_ID --instance=my-db-instance

When to use on-demand backups

Take one immediately before a schema migration, a bulk data load or delete, or before promoting a read replica. This gives you a clean restore point that will not age out of the rolling window before you are confident the operation succeeded.
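One way to make this a habit is to wrap risky commands so they only run once the backup has been taken. The following is a minimal sketch: the wrapper name is hypothetical, and the command passed to it stands in for whatever migration tooling you use.

```shell
# backup_then_run INSTANCE COMMAND [ARGS...]
# Takes an on-demand backup and runs COMMAND only if the backup succeeded.
backup_then_run() {
  br_instance="$1"
  shift
  if gcloud sql backups create \
       --instance="$br_instance" \
       --description="pre-change $(date -u +%Y-%m-%dT%H:%M:%SZ)"; then
    "$@"
  else
    echo "backup failed; not running: $*" >&2
    return 1
  fi
}

# Example (the migration script is a placeholder):
# backup_then_run my-db-instance ./run-migration.sh
```

The description records when the checkpoint was taken, which makes it easier to find and delete once the change has been confirmed.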

Point-in-time recovery (PITR)

PITR lets you recover your database to any second within the log retention window. This is the right tool when the damage was not an infrastructure failure but a logical one: someone ran DELETE FROM orders WHERE status = 'pending' at the wrong time, or a migration introduced corrupted data and you need to rewind to before it ran.

The behaviour differs between database engines:

MySQL: PITR requires binary logging, which must be explicitly enabled with --enable-bin-log, ideally at instance creation. The logs only exist from the moment it is turned on, so enabling it on an existing instance later (gcloud sql instances patch accepts the same flag) cannot recover anything from before that point. See the MySQL on Cloud SQL guide for more on binary logging and replica configuration.
PostgreSQL: WAL archiving is enabled automatically whenever backups are turned on. No additional flag is needed.

MySQL users: enable binary logging at creation

If you create a MySQL instance without --enable-bin-log, PITR is not available. Turning binary logging on later only covers changes made from that point onward; it cannot recover data from before it was enabled. This is easy to miss because automated backups still work without it.

# MySQL: create an instance with binary logging enabled for PITR
gcloud sql instances create my-mysql-instance \
  --database-version=MYSQL_8_0 \
  --region=europe-west2 \
  --tier=db-n1-standard-2 \
  --enable-bin-log \
  --backup-start-time=02:00

# Restore to a specific point in time (creates a new instance, not an overwrite)
gcloud sql instances clone my-db-instance my-db-restored \
  --point-in-time=2026-03-07T14:30:00.000Z

PITR always creates a new Cloud SQL instance rather than overwriting the existing one. This is intentional. During incident response, you can restore and verify the recovered data while the production instance stays accessible. Once you have confirmed the state looks correct, update your application’s connection string to point at the restored instance, or export and re-import specific tables back into the primary.

Timestamps must be in UTC

During an incident it is easy to enter a local time by mistake. 14:30 BST is 13:30 UTC, so entering 14:30 UTC when you meant 14:30 BST restores to a point an hour later than intended, possibly after the damage was done. That is a frustrating and avoidable error.
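Letting a tool do the conversion removes the mental arithmetic. The helper below is a sketch that assumes GNU date (as found on most Linux systems; the BSD date shipped with macOS uses different options); the function name is hypothetical.

```shell
# to_pitr_utc "YYYY-MM-DD HH:MM:SS" OFFSET
# Converts a local wall-clock time with an explicit UTC offset
# (for example +0100 for BST) into the UTC format used with --point-in-time.
to_pitr_utc() {
  date -u -d "$1 $2" +%Y-%m-%dT%H:%M:%S.000Z
}

# Example: 14:30 BST on 7 July 2026
# to_pitr_utc "2026-07-07 14:30:00" +0100
# prints 2026-07-07T13:30:00.000Z
```

Passing the offset explicitly, rather than relying on the machine's configured timezone, keeps the result the same regardless of where the command is run.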

High availability in Cloud SQL

A Cloud SQL high-availability instance consists of a primary in one zone and a standby in a different zone within the same region. Every write to the primary is synchronously replicated to the standby before being acknowledged. The standby is always current, but this adds a small amount of write latency compared to a single-zone instance.

If the primary zone becomes unavailable, Cloud SQL detects this automatically and promotes the standby. Failover typically completes within 60 seconds. The instance keeps the same IP address, so your application reconnects without a configuration change, although connections that were open during the failover are dropped. The former standby becomes the new primary, and Cloud SQL provisions a replacement standby in another zone.

The standby is not accessible for reads. It does not serve queries. It exists solely to become the new primary during a zone failure. If you need to distribute read load across instances, that is what read replicas are for, and they are independent of HA.

HA does not protect your data from you

If someone runs DROP TABLE or a bad migration on the primary, that change replicates to the standby immediately. HA will not save you from it. For protection against logical errors, you need backups and PITR. HA and backups solve completely different problems.

Cost is roughly double that of a single-zone instance because you are running two instances (primary plus standby) with duplicated compute and storage. For workloads where zone downtime is unacceptable, that cost is easy to justify. For internal tools or non-critical databases, ZONAL is often fine.

# Create a high-availability (regional) instance
gcloud sql instances create my-ha-instance \
  --database-version=POSTGRES_15 \
  --region=europe-west2 \
  --tier=db-custom-2-7680 \
  --availability-type=REGIONAL \
  --backup-start-time=02:00

# Enable HA on an existing instance (requires a brief restart)
gcloud sql instances patch my-db-instance \
  --availability-type=REGIONAL

# Trigger a manual failover to test the behaviour
gcloud sql instances failover my-ha-instance

Test failover before you need it

The default availability type is ZONAL, a single instance with no standby. For production, set --availability-type=REGIONAL. Then run gcloud sql instances failover in a staging environment to see how your application behaves during the switchover. A real outage is not the time to discover a connection-handling issue.
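To put a number on the switchover, you can poll the database from a client while the failover runs and count how many probes fail. This is a minimal sketch: the function name is an assumption, and the probe can be any command that exits non-zero when the database is unreachable (pg_isready is used in the example, with a placeholder IP address).

```shell
# measure_outage MAX_POLLS INTERVAL_SECONDS PROBE_CMD [ARGS...]
# Runs PROBE_CMD MAX_POLLS times, sleeping INTERVAL_SECONDS between runs,
# and reports how many of those probes failed.
measure_outage() {
  mo_max="$1"
  mo_interval="$2"
  shift 2
  mo_failed=0
  mo_i=0
  while [ "$mo_i" -lt "$mo_max" ]; do
    if ! "$@" >/dev/null 2>&1; then
      mo_failed=$((mo_failed + 1))
    fi
    mo_i=$((mo_i + 1))
    sleep "$mo_interval"
  done
  echo "failed_polls=$mo_failed total_polls=$mo_max"
}

# Example: start polling in one terminal, trigger the failover in another.
# measure_outage 120 1 pg_isready -h 10.0.0.5 -p 5432
# gcloud sql instances failover my-ha-instance
```

With a one-second interval, the failed-poll count is a rough lower bound on the outage in seconds as seen by a fresh connection; an application with long-lived pooled connections may take longer to recover, which is exactly what the test is meant to reveal.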

Restoring from a backup

# Restore a backup to a different instance (safer for investigation)
gcloud sql backups restore BACKUP_ID \
  --restore-instance=my-db-restored \
  --backup-instance=my-db-instance

# Restore a backup to the same instance (overwrites current data immediately)
gcloud sql backups restore BACKUP_ID \
  --restore-instance=my-db-instance

Never restore to production while still diagnosing

Restoring to the same instance overwrites all current data immediately and irreversibly. During incident response, always restore to a new instance first. Verify it contains what you expect before touching the production instance. Overwriting production while still diagnosing can destroy evidence and make the situation significantly harder to recover from.

Backups, PITR, HA, and read replicas compared

These four features are frequently confused because they all relate to resilience. Here is a direct comparison by what each one actually does.

Automated backups

Purpose: data recovery from catastrophic loss.

Protects against instance deletion, data corruption, and accidental drops. Runs daily and is automatic once configured. Does not reduce downtime during a zone failure. On its own, it cannot undo a logical error without also losing everything written since the last backup.

Point-in-time recovery (PITR)

Purpose: granular recovery to a specific moment.

Protects against accidental deletions, bad migrations, and data corruption within the log window. Does not reduce infrastructure downtime. Does not help with read scaling. Requires enabling binary logs on MySQL. Restore creates a new instance, never an overwrite.

High availability (HA)

Purpose: reducing downtime during zone failure.

Provides automatic failover within roughly 60 seconds. Does not protect against accidental deletion or bad queries; those replicate to the standby immediately. Does not help with read scaling. Costs roughly double. Must be explicitly enabled with --availability-type=REGIONAL.

Read replicas: not a failover mechanism

Purpose: distributing read traffic.

Provides scalable read throughput across multiple instances. Does not provide automatic failover; promotion is manual and takes time. Does not protect against data loss. Works well alongside HA, not as a replacement for it.
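For reference, creating and promoting a replica are both explicit commands that someone has to run, which is what "manual" means in practice. The sketch below wraps them in helper functions whose names are assumptions; the instance names and region follow the examples used elsewhere on this page.

```shell
# create_read_replica PRIMARY REPLICA REGION
# Creates a read replica of PRIMARY named REPLICA in REGION.
create_read_replica() {
  gcloud sql instances create "$2" \
    --master-instance-name="$1" \
    --region="$3"
}

# promote_read_replica REPLICA
# Turns the replica into a standalone primary. Replication from the
# original primary stops, and the application must be repointed by hand.
promote_read_replica() {
  gcloud sql instances promote-replica "$1"
}

# Example:
# create_read_replica my-db-instance my-db-replica europe-west2
# promote_read_replica my-db-replica
```

Nothing in this flow happens automatically when the primary fails, and because replication is asynchronous the promoted replica may be missing the most recent writes.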

The key distinction: HA addresses availability, read replicas address read scaling, and backups and PITR address recoverability. A resilient production database uses all of them. For a wider view of recovery architecture in GCP, see the disaster recovery strategies guide.

When to use each option

The right configuration depends on what level of downtime and data loss your workload can tolerate. Here are four common scenarios:

Small internal tool or development environment: automated backups with the default 7-day retention are usually sufficient. ZONAL availability is fine. For PostgreSQL, PITR is free to enable since it is automatic when backups are on. Skip HA unless the team depends on this database heavily during the working day.
Production app with moderate uptime requirements: enable backups with 14 to 30 day retention, enable PITR (add --enable-bin-log at creation for MySQL), and consider HA. If a few minutes of downtime during a zone failure is covered by your SLA, ZONAL may still be acceptable, but be explicit about that decision.
Business-critical production database: enable HA, enable PITR, set backup retention to 30 days or more, take on-demand backups before major operations, and test your failover and restore processes before go-live. Set up alerts in Cloud Monitoring for backup failures and replication lag.
Read-heavy reporting workload: add one or more read replicas to offload analytics queries from the primary. Keep HA enabled on the primary if it is production-facing. Read replicas do not provide failover; they are for scaling, not resilience.

If you are still deciding whether Cloud SQL is the right choice for your use case, the choosing the right storage service guide covers that decision across Cloud SQL, Firestore, Bigtable, and others.

Common mistakes

Assuming a read replica provides automatic failover. A read replica does not take over if the primary fails. Promotion is manual and takes time. If your application depends on automatic failover, you need --availability-type=REGIONAL, not a read replica. This is the most common Cloud SQL HA misconfiguration.
Not enabling binary logging for MySQL PITR. Automated backups alone do not enable PITR on MySQL. Without --enable-bin-log, the closest you can recover to is the previous full backup. Set this flag at instance creation; enabling it later only captures changes from that point onward and cannot recover historical data.
Thinking HA replaces backups. HA protects against zone failure. It does not protect against accidental data deletion or a bad migration. Those changes replicate to the standby immediately. You still need backups and PITR for data recovery.
Restoring directly to the production instance while diagnosing an incident. Restoring to the same instance overwrites all current data immediately. Always restore to a new instance first when investigating. Overwriting production before you understand what happened can destroy evidence and make recovery harder.
Using local time instead of UTC for PITR timestamps. Cloud SQL PITR timestamps must be in UTC. During a stressful incident it is easy to enter a local time by mistake. Restoring to the wrong point because of a timezone error is common and frustrating.
Leaving backup retention at the default without reviewing it. Seven backups is fine for development, but limiting for production. If you discover a data problem more than a week after it occurred, a 7-backup window will not cover you. Review and increase retention when you first configure the instance, not after an incident.

Production checklist for Cloud SQL resilience

Before taking a Cloud SQL instance to production, work through this list. Some of these settings need a restart to change later, and none of them can protect data retroactively once an incident has started:

Automated backups are enabled with a window during off-peak hours
Backup retention count is set to 14 or more (review what your recovery window requires)
PITR is confirmed: for MySQL, verify --enable-bin-log was set at creation; for PostgreSQL, confirm backups are on
High availability is set to REGIONAL if zone downtime is unacceptable for this workload
A manual failover test has been run with gcloud sql instances failover to confirm the behaviour
A backup restore has been tested on a non-production instance to confirm the process works end to end
The team knows to take an on-demand backup before running migrations or bulk operations
Alerts are configured in Cloud Monitoring for backup failures and replication lag
Connections to the instance are secured via the Auth Proxy or private IP (see connecting to Cloud SQL securely)
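The configuration items in this checklist can be read back from the instance rather than taken on trust. The script below is a sketch, not an official tool: the function names are assumptions, the threshold of 14 mirrors the checklist above, and the field paths are those reported by gcloud sql instances describe. It does not cover the items that need a human, such as failover and restore tests.

```shell
# get_setting INSTANCE FIELD
# Prints a single field from the instance description.
get_setting() {
  gcloud sql instances describe "$1" --format="value($2)"
}

# audit_instance INSTANCE
# Prints one WARN line per checklist item that looks unset and
# returns non-zero if anything was flagged.
audit_instance() {
  au_cfg="settings.backupConfiguration"
  au_warnings=0

  case "$(get_setting "$1" "$au_cfg.enabled")" in
    [Tt]rue) ;;
    *) echo "WARN: automated backups are not enabled"
       au_warnings=$((au_warnings + 1)) ;;
  esac

  au_retained="$(get_setting "$1" "$au_cfg.backupRetentionSettings.retainedBackups")"
  if [ "${au_retained:-0}" -lt 14 ]; then
    echo "WARN: only ${au_retained:-0} backups retained"
    au_warnings=$((au_warnings + 1))
  fi

  # MySQL reports binaryLogEnabled; PostgreSQL reports pointInTimeRecoveryEnabled.
  au_logs="$(get_setting "$1" "$au_cfg.binaryLogEnabled")$(get_setting "$1" "$au_cfg.pointInTimeRecoveryEnabled")"
  case "$au_logs" in
    *[Tt]rue*) ;;
    *) echo "WARN: neither binary logging nor PITR is reported as enabled"
       au_warnings=$((au_warnings + 1)) ;;
  esac

  if [ "$(get_setting "$1" settings.availabilityType)" != "REGIONAL" ]; then
    echo "WARN: availability type is not REGIONAL"
    au_warnings=$((au_warnings + 1))
  fi

  [ "$au_warnings" -eq 0 ]
}

# Example:
# audit_instance my-db-instance || echo "review the warnings before go-live"
```

A warning is a prompt to make a deliberate decision rather than a failure in itself; ZONAL, for instance, is a valid choice for workloads that can tolerate zone downtime.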

Frequently asked questions

How does point-in-time recovery work in Cloud SQL?

PITR lets you restore a database to any second within the log retention window. For MySQL, binary logging must be explicitly enabled with --enable-bin-log, ideally at instance creation; without it, the closest you can recover to is the last full backup. For PostgreSQL, WAL archiving is automatic whenever backups are enabled. You specify a UTC timestamp and Cloud SQL replays the captured logs on top of the nearest full backup to reach that exact state. PITR always creates a new instance rather than overwriting the existing one.

What is the difference between a Cloud SQL HA standby and a read replica?

An HA standby sits in a different zone in the same region, uses synchronous replication, is not user-accessible, and fails over automatically within roughly 60 seconds if the primary zone goes down. A read replica is a separate Cloud SQL instance using asynchronous replication that you can query for reads — but it does not fail over automatically and must be manually promoted. Both can run simultaneously: the standby handles zone failure, the replica handles read scaling. They are complementary, not interchangeable.

How many automated backups should I retain?

The default is 7 (roughly one week). For most production workloads, 14 to 30 is a more practical choice. Retention is counted by number of backups, not calendar days: retaining 14 means you keep the last 14 daily backups. If the instance pauses or a backup job fails on a given day, that day does not count. On-demand backups are not included in this count and persist until you delete them manually.

Does high availability replace backups in Cloud SQL?

No. HA protects against zone failure by failing over to a standby — it does not help you recover from accidental data deletion, a bad migration, or corruption. If someone runs DROP TABLE on the primary, that change replicates immediately to the standby. You need backups and PITR for data recovery. HA and backups solve different problems and should both be enabled for production.

When should I enable high availability in Cloud SQL?

Enable HA for any database where unplanned downtime has a real business impact — production apps, customer-facing services, or anything with an SLA. It roughly doubles the instance cost because you are paying for both the primary and the standby. For internal tools, staging environments, or workloads where a short outage is acceptable, ZONAL (single-zone) is often fine.

Does HA protect against accidental deletion or a bad SQL query?

No. HA is for infrastructure failure, not logical errors. If you accidentally delete rows or run a destructive migration, the HA standby replicates that change immediately — it cannot save you from it. To recover from logical errors, you need PITR or a backup restore. HA and backups are complementary and both belong in a production setup.

Last verified: 23 March 2026. Cloud services change frequently. Verify details against official documentation before making infrastructure decisions.
