Rollbacks in Cloud Deploy: How to Revert Safely
When a deployment breaks production, your fastest path to recovery is a rollback. This page explains how rollbacks work in Cloud Deploy specifically: what happens when you run one, how to identify the right release to return to, and what a rollback can and cannot fix. Cloud Run and GKE examples are included for comparison, because the mechanics differ in ways that matter.
What is a rollback, exactly?
A rollback means reverting to a previous known-good version of your application after a bad deployment. It does not erase what happened. It redeploys an older, working version so users stop experiencing the problem while you investigate the root cause.
In Cloud Deploy, a rollback means creating a new rollout that deploys an earlier release to the affected target. The failed deployment stays in your pipeline history. Cloud Deploy does not treat this as an undo. It treats it as a forward deployment of older code, which means the same deployment steps run again, including verification jobs and approval gates.
Think of it like pushing a new git commit
When you discover a bug in a recent commit, you do not rewrite git history. You push a new commit that brings the code back to a working state, and the bad commit stays in the log. Cloud Deploy works the same way. A rollback is a new rollout pointing at an older release. The failed rollout stays on record and your pipeline history remains intact.
How rollbacks in Cloud Deploy work
Cloud Deploy organises deployments around three concepts: releases, rollouts, and targets. A release is a versioned snapshot of what you want to deploy. A target is a named deployment destination, such as the GKE cluster or Cloud Run location that serves production. A rollout is the act of deploying a release to a specific target.
When you roll back, you are telling Cloud Deploy: create a new rollout for a previous release, to this target, now. Cloud Deploy executes that rollout the same way it executes any other, running pre-deploy hooks, applying the deployment, and running any post-deploy verification jobs you have configured.
Because rollback is a new rollout, your pipeline history stays intact. You will see both the failed rollout and the rollback rollout recorded against the target, which is useful during post-incident review.
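For post-incident review, the sketch below prints both records for one target. It relies on the fact that a rollout belongs to a release. The failed rollout therefore sits under the bad release, and the rollback rollout sits under the older one. The pipeline, region and project defaults are this page's examples. The bad release name `api-release-v2-0-0` is hypothetical, and the `GCLOUD` override exists only so the function can be dry-run.

```shell
#!/usr/bin/env bash
# Sketch: list the rollouts recorded against one target for several releases.
# Assumptions: pipeline, region and project are this page's examples;
# set GCLOUD=echo to print the gcloud commands instead of running them.
set -euo pipefail

GCLOUD="${GCLOUD:-gcloud}"
PIPELINE="${PIPELINE:-api-pipeline}"
REGION="${REGION:-europe-west2}"
PROJECT="${PROJECT:-my-app-prod}"

# Usage: rollout_history TARGET RELEASE [RELEASE...]
rollout_history() {
  local target="$1" release
  shift
  for release in "$@"; do
    echo "== ${release} on ${target}"
    # A rollout belongs to a release, so query each release and keep only
    # the rollouts that were deployed to this target.
    "$GCLOUD" deploy rollouts list \
      --release="$release" \
      --delivery-pipeline="$PIPELINE" \
      --region="$REGION" \
      --project="$PROJECT" \
      --filter="targetId=${target}" \
      --format="table(name.basename(), state, createTime)"
  done
}

# Example (api-release-v2-0-0 is a hypothetical name for the bad release):
# rollout_history prod-target api-release-v2-0-0 api-release-v1-9-0
```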
If the target has requireApproval: true, the rollback rollout still needs to be approved before it proceeds. Having a named approver available during incidents is part of rollback readiness, not an afterthought. Document who can approve in your runbook.
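The sketch below shows where that setting lives. It writes a production target definition with approval enabled and registers it with `gcloud deploy apply`. Several details are assumptions for illustration: the Cloud Run location, the file name and the `GCLOUD` dry-run override. A GKE target would use a `gke` cluster reference instead of `run`.

```shell
#!/usr/bin/env bash
# Sketch: a production target that requires approval, applied with gcloud.
# Assumptions: a Cloud Run target in europe-west2 for project my-app-prod;
# set GCLOUD=echo to print the apply command instead of running it.
set -euo pipefail

GCLOUD="${GCLOUD:-gcloud}"
REGION="${REGION:-europe-west2}"
PROJECT="${PROJECT:-my-app-prod}"

# Write the target definition. requireApproval applies to every rollout on
# this target, including rollback rollouts.
write_prod_target() {
  local file="$1"
  cat > "$file" <<EOF
apiVersion: deploy.cloud.google.com/v1
kind: Target
metadata:
  name: prod-target
description: production
requireApproval: true
run:
  location: projects/${PROJECT}/locations/${REGION}
EOF
}

# Create or update the target in Cloud Deploy.
apply_target() {
  local file="$1"
  "$GCLOUD" deploy apply --file="$file" --region="$REGION" --project="$PROJECT"
}

# Example:
# write_prod_target prod-target.yaml && apply_target prod-target.yaml
```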
Cloud Deploy supports automated rollbacks triggered by Cloud Monitoring alerts. When post-deploy verification detects that error rates have crossed a configured threshold, Cloud Deploy can initiate a rollback without human intervention. This works best when paired with canary deployments, where problems surface at a small traffic slice before affecting all users.
When to roll back
Not every production problem calls for a rollback. These are the situations where rolling back is usually the right first move:
- Error rates spiked immediately after a new release reached the target
- Latency increased sharply after deployment and has not recovered
- Health checks started failing on the new version
- A bad configuration change or incorrect manifest reached production
- A canary or post-deploy verification step caught a regression before full rollout
- The failure is clearly caused by the code change, not by an upstream dependency or infrastructure issue
If the root cause is unclear, rolling back is still often the right call: it stops user impact while you investigate. But if the deployment involved database schema changes or external state mutations, assess the risk before acting. In those cases, a few minutes spent understanding the failure before choosing between rolling back and rolling forward can be worth it.
Rollback is operational recovery, not a fix. After rolling back, the underlying problem still exists in your codebase. Find it and fix it before deploying forward again.
How to roll back in Cloud Deploy
Start by identifying which release was last known good. List the pipeline's releases, newest first:
# List recent releases to identify the last known-good version
gcloud deploy releases list \
--delivery-pipeline=api-pipeline \
--region=europe-west2 \
--project=my-app-prod \
--sort-by=~createTime
Once you know the release name, roll the affected target back to that release:
# Roll the production target back to a specific earlier release.
# This creates a new rollout of that release on the target.
gcloud deploy targets rollback prod-target \
--delivery-pipeline=api-pipeline \
--release=api-release-v1-9-0 \
--region=europe-west2 \
--project=my-app-prod
Cloud Deploy executes the rollout immediately if the target has no approval requirement. If approval is required, the rollout waits in a pending state until an approver acts. After the rollout completes, check Cloud Monitoring to confirm that error rates and latency have returned to baseline. A technically successful rollback does not guarantee that all downstream effects have resolved, so watch your dashboards for at least 30 minutes.
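Before turning to dashboards, confirm that the rollback rollout itself finished. The sketch below polls the newest rollout of the rollback release on the target until that rollout reaches a terminal state. The state names are taken from the Cloud Deploy Rollout resource. The polling interval, the timeout and the `GCLOUD` override are assumptions to adjust for your own runbook.

```shell
#!/usr/bin/env bash
# Sketch: wait for the newest rollout of a release on a target to finish.
# Assumptions: pipeline/region/project are this page's examples; state names
# come from the Cloud Deploy Rollout resource; interval and timeout are
# arbitrary defaults. GCLOUD can be overridden for dry runs and tests.
set -euo pipefail

GCLOUD="${GCLOUD:-gcloud}"
PIPELINE="${PIPELINE:-api-pipeline}"
REGION="${REGION:-europe-west2}"
PROJECT="${PROJECT:-my-app-prod}"
POLL_SECONDS="${POLL_SECONDS:-15}"
MAX_POLLS="${MAX_POLLS:-80}" # about 20 minutes at the default interval

# Print the state of the most recent rollout of RELEASE to TARGET.
latest_rollout_state() {
  local release="$1" target="$2"
  "$GCLOUD" deploy rollouts list \
    --release="$release" \
    --delivery-pipeline="$PIPELINE" \
    --region="$REGION" \
    --project="$PROJECT" \
    --filter="targetId=${target}" \
    --sort-by=~createTime \
    --limit=1 \
    --format="value(state)"
}

# Return 0 when the rollout succeeds, 1 on a failure state or timeout.
wait_for_rollout() {
  local release="$1" target="$2" state i
  for ((i = 1; i <= MAX_POLLS; i++)); do
    state="$(latest_rollout_state "$release" "$target")"
    case "$state" in
      SUCCEEDED)
        echo "rollout succeeded"
        return 0 ;;
      FAILED | CANCELLED | APPROVAL_REJECTED | HALTED)
        echo "rollout ended in state ${state}" >&2
        return 1 ;;
      PENDING_APPROVAL)
        echo "waiting for an approver" ;;
      *)
        echo "current state: ${state:-unknown}" ;;
    esac
    sleep "$POLL_SECONDS"
  done
  echo "timed out waiting for the rollout" >&2
  return 1
}

# Example: wait_for_rollout api-release-v1-9-0 prod-target
```

A zero exit status only means the rollout finished. Whether users have recovered is still a question for your monitoring.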
Emergency rollback: targeting production directly
Cloud Deploy pipelines define a promotion sequence, typically dev, then staging, then production. In an emergency, you do not need to re-run earlier stages. Promoting the known-good release with the production target named explicitly bypasses the sequence:
gcloud deploy releases promote \
--release=api-release-v1-9-0 \
--delivery-pipeline=api-pipeline \
--to-target=prod-target \
--region=europe-west2 \
--project=my-app-prod
Cloud Deploy deploys to the named target without requiring promotion through dev or staging. This is the expected path for production incidents, not a workaround. Approval gates on the target still apply.
If your pipeline uses approval gates, have a named on-call approver available during every production deployment window. An emergency rollback that waits 20 minutes for an approver defeats the purpose of rolling back quickly. Document who can approve and how to reach them in your incident runbook.
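The approver's steps are short enough to keep in the runbook: find the rollout waiting for approval, then approve it. This sketch assumes this page's pipeline, region and project names and a `GCLOUD` override for dry runs. It also assumes the approver's account has permission to approve rollouts. The rollout ID `rollback-001` in the test is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: find rollouts of a release that are waiting for approval, then approve one.
# Assumptions: pipeline/region/project are this page's examples and the caller
# is allowed to approve rollouts; set GCLOUD=echo for a dry run.
set -euo pipefail

GCLOUD="${GCLOUD:-gcloud}"
PIPELINE="${PIPELINE:-api-pipeline}"
REGION="${REGION:-europe-west2}"
PROJECT="${PROJECT:-my-app-prod}"

# Print the IDs of rollouts of RELEASE that are pending approval.
pending_rollouts() {
  local release="$1"
  "$GCLOUD" deploy rollouts list \
    --release="$release" \
    --delivery-pipeline="$PIPELINE" \
    --region="$REGION" \
    --project="$PROJECT" \
    --filter="state=PENDING_APPROVAL" \
    --format="value(name.basename())"
}

# Approve one rollout by ID.
approve_rollout() {
  local release="$1" rollout="$2"
  "$GCLOUD" deploy rollouts approve "$rollout" \
    --release="$release" \
    --delivery-pipeline="$PIPELINE" \
    --region="$REGION" \
    --project="$PROJECT"
}

# Example:
# for id in $(pending_rollouts api-release-v1-9-0); do
#   approve_rollout api-release-v1-9-0 "$id"
# done
```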
What a rollback does not fix
Rolling back your application code does not reverse everything that happened during the bad deployment. These changes persist regardless of which version is running.
- Database schema changes. If your deployment ran a migration, that migration is still applied. The older code will run against the newer schema. This may be safe or it may not be. Assess this for your specific case before rolling back.
- Destructive data operations. Deleted rows, purged queues, or overwritten records are not restored by rolling back the application.
- Messages already processed. If your service consumed and acknowledged queue messages during the bad deployment window, those messages are gone. Rolling back does not replay them.
- External side effects. API calls made to third-party services, emails sent, or webhooks fired during the bad period are not reversed.
- Corrupted downstream state. If other services ingested bad data from your deployment, rolling back your service does not clean up what they already received.
This is why incident response does not end when the rollback completes. Understand the full scope of impact before declaring the incident resolved. In some cases, data remediation is needed alongside or instead of a code rollback.
Cloud Deploy vs Cloud Run vs GKE rollback
The three platforms handle rollback differently. Understanding the mechanics helps you work faster during an incident.
Cloud Deploy rollback
Creates a new rollout that deploys a previous release through your delivery pipeline. Approval gates and verification jobs still apply. Pipeline history is preserved. Use this when you want the rollback to go through the same controlled delivery process as any other deployment: full auditability, consistent process.
gcloud deploy targets rollback prod-target \
--delivery-pipeline=api-pipeline \
--release=api-release-v1-9-0 \
--region=europe-west2
Cloud Run rollback
Shifts traffic to a previous revision. No redeployment is needed because the revision already exists. There is no new build, image push or deployment, only a traffic configuration change, so the switch usually takes effect within seconds. If the older revision has scaled to zero, its first requests may still see cold starts while new instances start. See CI/CD pipelines for Cloud Run for how revision management fits into a broader delivery workflow.
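Choosing the revision is the step most easily gotten wrong under pressure. The helper functions below pick the second-newest revision and route all traffic to it. This is a sketch that assumes this page's service, region and project names. It also assumes the second-newest revision is the last known-good one. That assumption fails if several bad revisions were deployed, or if the newest revision was deployed without traffic. The `GCLOUD` override exists only so the functions can be dry-run.

```shell
#!/usr/bin/env bash
# Sketch: find the second-newest Cloud Run revision and send all traffic to it.
# Assumptions: service/region/project are this page's examples, and the
# second-newest revision is the last known-good one (check this first).
set -euo pipefail

GCLOUD="${GCLOUD:-gcloud}"
SERVICE="${SERVICE:-api-service}"
REGION="${REGION:-europe-west2}"
PROJECT="${PROJECT:-my-app-prod}"

# List revision names newest first and print the second one.
previous_revision() {
  "$GCLOUD" run revisions list \
    --service="$SERVICE" \
    --region="$REGION" \
    --project="$PROJECT" \
    --sort-by=~metadata.creationTimestamp \
    --format="value(metadata.name)" | sed -n '2p'
}

# Route 100% of traffic to the given revision.
shift_all_traffic() {
  local revision="$1"
  if [[ -z "$revision" ]]; then
    echo "no revision given" >&2
    return 1
  fi
  "$GCLOUD" run services update-traffic "$SERVICE" \
    --region="$REGION" \
    --project="$PROJECT" \
    --to-revisions="${revision}=100"
}

# Example: shift_all_traffic "$(previous_revision)"
```

Printing the chosen name and checking it against your runbook before calling `shift_all_traffic` costs seconds and avoids shifting traffic to another bad revision.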
# List revisions to find the last known-good one
gcloud run revisions list \
--service=api-service \
--region=europe-west2 \
--project=my-app-prod
# Shift 100% of traffic to a previous revision
gcloud run services update-traffic api-service \
--region=europe-west2 \
--to-revisions=api-service-00049-xyz=100 \
--project=my-app-prod
GKE rollback
Reverts a Kubernetes Deployment to a previous revision tracked in rollout history. Kubernetes keeps up to 10 revision history entries by default. The rollback triggers a rolling update back to the previous pod spec.
# View rollout history
kubectl rollout history deployment/api-service -n production
# Roll back to the previous revision
kubectl rollout undo deployment/api-service -n production
# Roll back to a specific revision
kubectl rollout undo deployment/api-service -n production --to-revision=3
# Watch rollback progress
kubectl rollout status deployment/api-service -n production
Set revisionHistoryLimit explicitly in your Deployment spec so that you always have enough history to roll back to. The default of 10 is usually enough; lowering it limits your options during an incident.
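The sketch below applies that advice to a live Deployment and makes `kubectl rollout history` readable during an incident. Patching `revisionHistoryLimit` does not touch the pod template, so it should not start a new rollout. The `kubernetes.io/change-cause` annotation is what fills the CHANGE-CAUSE column of the history output. The deployment name, namespace and `KUBECTL` dry-run override are assumptions. If Cloud Deploy manages the Deployment, set these in the manifest instead, or the next rollout may overwrite them.

```shell
#!/usr/bin/env bash
# Sketch: keep enough Deployment history and label each revision.
# Assumptions: deployment api-service in namespace production (this page's
# examples); set KUBECTL=echo to print the commands instead of running them.
set -euo pipefail

KUBECTL="${KUBECTL:-kubectl}"
NAMESPACE="${NAMESPACE:-production}"

# Set spec.revisionHistoryLimit; the field is outside the pod template,
# so changing it does not create a new revision.
set_history_limit() {
  local deployment="$1" limit="$2"
  if ! [[ "$limit" =~ ^[0-9]+$ ]]; then
    echo "limit must be a non-negative integer" >&2
    return 1
  fi
  "$KUBECTL" patch "deployment/${deployment}" -n "$NAMESPACE" \
    --type=merge -p "{\"spec\":{\"revisionHistoryLimit\":${limit}}}"
}

# Record why the current revision exists; kubectl rollout history shows this
# annotation in its CHANGE-CAUSE column.
record_change_cause() {
  local deployment="$1" cause="$2"
  "$KUBECTL" annotate "deployment/${deployment}" -n "$NAMESPACE" \
    "kubernetes.io/change-cause=${cause}" --overwrite
}

# Example:
# set_history_limit api-service 10
# record_change_cause api-service "api v1.9.0: known good"
```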
The practical difference: Cloud Deploy rollback goes through your delivery pipeline and is fully auditable. Cloud Run rollback is the fastest option when speed matters most. GKE rollback is the native Kubernetes path, appropriate when deploying directly to GKE without a Cloud Deploy pipeline. When Cloud Deploy manages your GKE deployments, prefer the Cloud Deploy rollback so your pipeline state stays consistent.
Common mistakes
Not practising rollbacks before an incident. The commands are straightforward when you are calm. Under pressure, they are not. Run a rollback drill in staging at least once a quarter, covering the full procedure: listing releases, creating the rollback rollout, checking approval status, and verifying recovery.
Not documenting the last known-good version. During an incident, you should not have to guess which release was stable. Keep a short note in your runbook, or use a label on the release itself, so the answer is obvious when you need it.
Assuming rollback fixes database or data issues. Rolling back application code does not reverse schema migrations, deleted records, or processed messages. Assess data impact before assuming a rollback fully resolves the incident.
Deleting Cloud Run revisions to reduce clutter. Revisions cost nothing when they receive no traffic. Deleting them removes your rollback options. Only clean up revisions that are many versions old, and keep the last two or three at minimum.
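If you do prune revisions, one cautious approach is to print delete commands for everything older than the newest few and review them by hand. This sketch assumes this page's service, region and project names, a `KEEP` setting of three and ordering by creation time. It does not check traffic, so before running anything, confirm with `gcloud run services describe` that none of the listed revisions is still receiving traffic.

```shell
#!/usr/bin/env bash
# Sketch: list Cloud Run revisions that are candidates for cleanup while
# keeping the newest KEEP revisions as rollback options.
# Assumptions: service/region/project are this page's examples; the script
# only prints delete commands so a person can review them first.
set -euo pipefail

GCLOUD="${GCLOUD:-gcloud}"
SERVICE="${SERVICE:-api-service}"
REGION="${REGION:-europe-west2}"
PROJECT="${PROJECT:-my-app-prod}"
KEEP="${KEEP:-3}"

# Print revision names, newest first, skipping the newest KEEP.
cleanup_candidates() {
  "$GCLOUD" run revisions list \
    --service="$SERVICE" \
    --region="$REGION" \
    --project="$PROJECT" \
    --sort-by=~metadata.creationTimestamp \
    --format="value(metadata.name)" | tail -n +"$((KEEP + 1))"
}

# Print (not run) a delete command for each candidate.
print_delete_commands() {
  local revision
  while IFS= read -r revision; do
    [[ -n "$revision" ]] || continue
    echo "gcloud run revisions delete ${revision} --region=${REGION} --project=${PROJECT}"
  done < <(cleanup_candidates)
}
```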
Redeploying the same broken version immediately after rolling back. A rollback stops user impact; it does not fix the bug. Understand what caused the failure before deploying forward again. Deploying the same broken version creates another incident.
Best practices for safer rollbacks
The goal is not to avoid rollbacks. It is to make them fast, predictable, and low-stress when you need one.
- Practise in staging. Run deliberate rollback drills so the procedure is familiar. Include the approval step if your pipeline requires it.
- Maintain runbooks. Each production service should document how to identify the last known-good release, how to initiate a rollback, and who can approve it. Keep these current.
- Use canary or blue/green to reduce blast radius. Canary deployments expose a bad release to a small percentage of traffic first, so you catch failures before they affect everyone. Blue/green deployments keep the previous version warm and ready for an instant switch. Both patterns reduce the urgency of a rollback by limiting initial impact.
- Pair rollbacks with monitoring. When a rollback completes, watch your dashboards actively for 30 to 60 minutes. Confirm that error rates and latency have returned to baseline before closing the incident.
- Use secure CI/CD practices. Rollbacks are easier when your pipeline enforces consistent, auditable delivery. Pipelines with proper security controls give you confidence that the release you are rolling back to is trustworthy.
Summary
- Cloud Deploy rollback creates a new rollout for a previous release. Pipeline history stays intact and approval gates still apply.
- Rollback is operational recovery, not root cause resolution. Fix the underlying problem before deploying forward again.
- Rolling back does not reverse database migrations, processed messages, or external side effects.
- Cloud Run rollback is a traffic shift to a previous revision: fast, with no redeploy required.
- GKE rollback uses kubectl rollout undo to revert to a previous Deployment revision.
- Practise rollbacks in staging, maintain runbooks, and know your approvers before an incident forces the decision.
- Canary and blue/green patterns reduce blast radius and make rollbacks less urgent when they happen.
Frequently asked questions
How do I roll back a release in Cloud Deploy?
List your recent releases with gcloud deploy releases list and identify the last known-good version. Then roll the affected target back to it with gcloud deploy targets rollback, passing that release name with --release. Cloud Deploy treats the rollback as a new forward deployment of an older release, so your pipeline history stays intact.
Is a Cloud Deploy rollback the same as undo?
No. Cloud Deploy rollback does not reverse what happened. It creates a new rollout that deploys a previous release to the affected target, and the failed rollout remains in your history. Think of it as redeploying an older known-good version, not erasing the bad deployment.
Does rollback skip approval gates?
No, not by default. If the target has requireApproval: true, a rollback rollout still needs approval before it proceeds. For emergency rollbacks, document who your approvers are and how to reach them. Approval is configured per target. If faster rollback matters more than the gate on a particular target, you can change that target's requireApproval setting, at the cost of the control the gate provides.
Can I roll back if my database schema changed?
Rolling back application code does not reverse database migrations. If your deployment ran schema changes, rolling back may leave your older code running against a newer schema. Assess whether this is safe for your specific case. In many situations, rolling forward with a fix is the safer path when data changes are involved.
Should I roll back or roll forward?
Roll back when the problem is isolated to the code change and reverting restores safe behaviour without data risk. Roll forward when the deployment involved schema migrations, the fix is quick and well-understood, or the rollback itself carries risk. The right call depends on the specific failure.