Managing Environments in GCP CI/CD: Dev, Staging, and Production

Managing environments in CI/CD means deciding how code flows from a developer’s machine to production, how each environment is configured differently, and how to stop a mistake in one place from reaching users. On GCP, the standard answer is one project per environment, with branch-based triggers that control exactly where each push goes.

This page explains the pattern, why it works, and the steps needed to configure it. It also covers the common shortcuts that look fine at first but create real problems later.

What environment management actually means

At its core, environment management is about four things: isolation, promotion, configuration, and safety.

Isolation means that what happens in dev cannot affect prod. Promotion means code travels through environments in a controlled sequence rather than jumping straight to production. Configuration means each environment gets the right values for its context. Safety means a human decision is required before anything reaches real users.

The three environments serve distinct purposes:

  • Development is where code gets written and integration-tested. It accepts frequent, unstable deployments. Speed matters more than reliability here.
  • Staging is the last check before production. It mirrors production closely enough that any problem visible there would also appear live. If staging is too small or too different from prod, it fails at its only job.
  • Production serves real users. Every change requires a deliberate decision, a controlled deployment, and ideally an approval step.

Analogy

Think of a recipe moving from a test kitchen through a taste-test kitchen to a restaurant. The test kitchen can be messy. That is the point. The taste-test kitchen is nearly identical to the restaurant setup so it catches any issue with the real equipment and portion sizes before a customer sees the dish. The restaurant is where it counts.

Why environment management matters

Teams that skip environment management do not avoid problems. They just discover them in production. Here are the specific failure modes that proper environment management prevents:

Analogy

Skipping environment isolation is like a surgeon practising a new technique directly on a patient rather than on a model first. The problem is not that something might go wrong. The problem is that when it does, there is no safe place for it to go wrong.

Config drift

Without a clear system, dev and prod end up configured differently over time. A database connection string, a cache timeout, a feature flag. Nobody documents these differences. Eventually something breaks in prod in a way that nobody can reproduce in dev because the environments have silently diverged.

Note

Config drift is invisible until it matters. It does not cause an immediate failure. It accumulates quietly over weeks or months, then surfaces as a bug that only happens in production and cannot be reproduced anywhere else.

Accidental production deployment

If production deploys automatically on every push to main, any developer can push a half-finished feature directly to users. A single rushed merge becomes a production incident.

Warning

Auto-deploying to production on every push to main is one of the most common setup mistakes on GCP. It feels efficient until the first time a broken commit goes live. Main should deploy to staging. Production needs a deliberate trigger.

Secrets leakage

If dev and prod share a project or a service account, a developer with broad dev access may have access to production secrets. A compromised dev credential then becomes a prod credential.

Schema mismatch

A database migration that runs fine in dev can fail or corrupt data in prod if staging was not tested with representative data and the same migration path. Staging must run the same scripts against a realistic dataset before prod does.

Unrealistic staging

A staging environment running on a micro-tier database with scale-to-zero Cloud Run will not surface cold-start latency, connection pool exhaustion, or memory pressure under realistic concurrency. You get false confidence before every release. See Dev vs Staging vs Production for the concrete configuration differences that matter.

Warning

Staging that is structurally too different from production catches code bugs but not configuration bugs, performance issues, or IAM errors. Those pass straight through and surface in front of users.

The recommended GCP pattern: one project per environment

The standard approach is three separate GCP projects: my-app-dev, my-app-staging, and my-app-prod. Each project has its own resources, IAM policy, and quotas, and its costs are reported separately even when the projects are linked to the same billing account. Give each project its own Terraform state as well. Nothing crosses project boundaries unless you explicitly configure it to.

What you get from separate projects

  • IAM isolation: developers can have broad editor access in dev but read-only or no direct access in prod. Production changes go through the pipeline, not through the console.
  • Blast radius reduction: a mistyped terraform destroy in dev cannot touch prod resources. They live in different projects with separate Terraform state files.
  • Billing clarity: you can see exactly what each environment costs. Dev is usually much cheaper than prod, and you can set separate budgets and alerts per project.
  • Quota separation: a load test or burst of CI builds in dev does not consume prod API quotas or Cloud Run concurrency limits.
  • Secrets separation: Secret Manager secrets live in each project. An identity in one project can read another project's secrets only if someone explicitly grants it a role there, so dev and prod secrets stay apart by default.
  • Audit clarity: Cloud Audit Logs per project means prod changes have their own clean audit trail, separate from dev and staging noise.

Tip

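If you use Terraform, keep each environment in a separate directory with its own remote state backend in Cloud Storage. A terraform apply in the dev directory then reads and writes only the dev state, so it cannot modify resources tracked in the prod state. Pair this with credentials that have no access to the prod project, so that a wrong project ID in the dev config fails instead of touching prod.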

How it works in practice

The overall flow follows a simple rule: code moves forward through environments, and each step requires more deliberation than the last.

Step-by-step release flow

  1. Feature branch: tests only. A developer opens a pull request. Cloud Build runs tests automatically on every push. Nothing gets deployed until tests pass and the PR is approved.
  2. Merge to main: automatic staging deploy. Once the PR merges, Cloud Build deploys the new image to the staging project. This is automatic and intentional. Continuous deployment to staging means there is always visibility into whether main is actually deployable.
  3. Staging: validate the release candidate. Automated smoke tests run against the staging URL. The team reviews behaviour manually if the change is significant. Logs and metrics confirm nothing regressed.
  4. Production: deliberate promotion. Promotion to production requires an explicit action. Options: push a version tag like v1.4.0, approve a Cloud Deploy rollout, or run a manual release trigger. No automatic push from main to prod.
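The four steps above amount to a routing rule from a pushed git ref to a pipeline action. The sketch below models that rule in Python so it can be read and tested on its own. It is not how Cloud Build evaluates triggers, and the action names and the strict vMAJOR.MINOR.PATCH tag pattern are assumptions made for the illustration.

```python
import re

# Assumed release tag format (v1.4.0 style); the real trigger may use a looser pattern.
RELEASE_TAG = re.compile(r"^refs/tags/v\d+\.\d+\.\d+$")


def pipeline_action(git_ref: str) -> str:
    """Return what the pipeline should do for a pushed git ref."""
    if RELEASE_TAG.match(git_ref):
        return "deploy-production"  # deliberate promotion via a version tag
    if git_ref == "refs/heads/main":
        return "deploy-staging"  # automatic staging deploy after merge
    if git_ref.startswith("refs/heads/"):
        return "test-only"  # feature branches never deploy
    return "ignore"  # any other ref, such as a non-release tag


for ref in ("refs/heads/feature/add-payments", "refs/heads/main", "refs/tags/v1.4.0"):
    print(ref, "->", pipeline_action(ref))
```

Note that no branch maps to production: only a release tag does, which is the property the flow depends on.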

Build once, promote the same artifact

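The Docker image is built once for each commit, tagged with the commit SHA, and pushed to Artifact Registry. The same image that passes staging is the one deployed to production. If you build a new image for the production deployment, you are deploying an artifact that was never tested in staging, even when the source commit is identical. Because a tag can be moved to point at a different image, the image digest (the sha256 value recorded for the image) is the strict guarantee that both environments ran identical software. The commit SHA tag is a convenient label for it.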
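A promotion guard makes this rule explicit: production may only receive the exact image reference that staging is running. The following is a minimal sketch, assuming the repository path used elsewhere on this page. Deployment, image_for_commit, and can_promote are hypothetical helpers, not part of any GCP library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Deployment:
    environment: str
    image: str  # full image reference, including the tag or digest


def image_for_commit(repository: str, short_sha: str) -> str:
    """Build the image reference for a commit; the tag is the short commit SHA."""
    return f"{repository}:{short_sha}"


def can_promote(staging: Deployment, candidate_image: str) -> bool:
    """Allow promotion only if the candidate is exactly what staging is running."""
    return staging.environment == "staging" and staging.image == candidate_image


repo = "europe-west2-docker.pkg.dev/my-app-staging/api/api"
tested = image_for_commit(repo, "a1b2c3d")
staging = Deployment("staging", tested)
print(can_promote(staging, tested))
print(can_promote(staging, image_for_commit(repo, "0f9e8d7")))
```

Comparing digests instead of tags would make the same check stricter, since a digest cannot be repointed.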

Analogy

Think of the image SHA like a serial number on a manufactured part. The part is tested during assembly. If it passes, that exact part, with that exact serial number, goes into the final product. Nobody manufactures a fresh replacement part for the customer and hopes it performs the same as the tested one.

Note

Cloud Build handles the build-and-test phase. Cloud Deploy handles controlled promotion with approval gates and a full release history. For simple setups, Cloud Build triggers alone are enough. For teams that want formal approvals and an audited delivery record, adding Cloud Deploy is the right next step.

Environment-specific configuration

Each environment needs different values: different database URLs, different service sizes, different resource limits. The way you manage this separation determines how maintainable your pipeline is over time.

Non-sensitive configuration: Cloud Build substitutions

For values that are not secrets, define them as substitution variables on each Cloud Build trigger. Each trigger (one per environment) passes different values for the same variable names. The same cloudbuild.yaml works across all environments because only the trigger-level values change:

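# Schematic trigger settings. The real trigger resource nests the branch or tag
# regex under its repository source block.
# Staging trigger: fires on push to main
trigger:
  branch: "^main$"
substitutions:
  _ENVIRONMENT: staging
  _PROJECT_ID: my-app-staging
  _IMAGE_PROJECT_ID: my-app-staging  # illustrative name: project that hosts the image repository
  _REGION: europe-west2

---

# Production trigger: fires on push of a tag starting with "v"
trigger:
  tag: "^v.*$"
substitutions:
  _ENVIRONMENT: production
  _PROJECT_ID: my-app-prod
  _IMAGE_PROJECT_ID: my-app-staging  # same repository, so prod deploys the image staging tested
  _REGION: europe-west2

The build file references these variables by name. No environment-specific values appear in the YAML committed to version control. The image path uses its own substitution (called _IMAGE_PROJECT_ID here, an illustrative name) so that every environment pulls from the one repository the image was pushed to. If the path used $_PROJECT_ID, the production trigger would look for the image in my-app-prod, where it was never pushed. Deploying across projects also requires giving the prod project's Cloud Run service agent read access to that repository.

steps:
  # Deploy the image that was already built for this commit; do not rebuild it.
  - name: 'gcr.io/google.com/cloudsdktool/cloud-sdk'
    args:
      - gcloud
      - run
      - deploy
      - api-service
      - --image=europe-west2-docker.pkg.dev/$_IMAGE_PROJECT_ID/api/api:$SHORT_SHA
      - --region=$_REGION
      - --set-env-vars=ENVIRONMENT=$_ENVIRONMENT
      - --project=$_PROJECT_ID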
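To see why one build file is enough, it can help to expand the placeholders by hand. The sketch below uses Python's string.Template as a stand-in for substitution. It only simulates the effect and is not how Cloud Build implements it. The argument list is a subset of the deploy step, and the two dictionaries repeat the trigger values shown earlier.

```python
from string import Template

# A subset of the deploy step arguments, with $_NAME placeholders left in place.
DEPLOY_ARGS = [
    "--region=$_REGION",
    "--set-env-vars=ENVIRONMENT=$_ENVIRONMENT",
    "--project=$_PROJECT_ID",
]

# Trigger-level values: the only place where the environments differ.
TRIGGERS = {
    "staging": {
        "_ENVIRONMENT": "staging",
        "_PROJECT_ID": "my-app-staging",
        "_REGION": "europe-west2",
    },
    "production": {
        "_ENVIRONMENT": "production",
        "_PROJECT_ID": "my-app-prod",
        "_REGION": "europe-west2",
    },
}


def render(args, substitutions):
    """Expand every placeholder in args using one trigger's substitution values."""
    return [Template(arg).substitute(substitutions) for arg in args]


for name, values in TRIGGERS.items():
    print(name, render(DEPLOY_ARGS, values))
```

The template list never changes. Adding a fourth environment would mean adding one more dictionary, which corresponds to one more trigger.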

Secrets: Secret Manager with consistent naming

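Each environment project holds its own secrets in Secret Manager. Name secrets identically across projects: database-url in my-app-dev and database-url in my-app-prod are separate secrets with separate values, but the same name. The pipeline uses the built-in $PROJECT_ID substitution, which resolves to the project the build runs in, to build the resource path. When each environment's trigger lives in that environment's project, the configuration is identical across all environments. If all triggers run in one shared CI project instead, use the trigger-level $_PROJECT_ID in the path: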

availableSecrets:
  secretManager:
    - versionName: projects/$PROJECT_ID/secrets/database-url/versions/latest
      env: 'DATABASE_URL'
    - versionName: projects/$PROJECT_ID/secrets/api-key/versions/latest
      env: 'API_KEY'

See Secrets in CI/CD Pipelines for the full availableSecrets syntax and the common mistake of passing secrets as substitution variables, which appear in build logs unredacted.
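The naming convention is easiest to see when the resource path is built by a small function: only the project changes between environments. A minimal sketch follows. The helper name secret_version is hypothetical, while the path format is the one used in the availableSecrets block.

```python
def secret_version(project_id: str, secret_name: str, version: str = "latest") -> str:
    """Return the Secret Manager resource path for one secret version."""
    return f"projects/{project_id}/secrets/{secret_name}/versions/{version}"


# Same secret name in every project; each path points at a separate secret.
for project in ("my-app-dev", "my-app-staging", "my-app-prod"):
    print(secret_version(project, "database-url"))
```

Passing an explicit version number instead of the default pins a deployment to a known secret value, at the cost of updating the number on each rotation.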

Infrastructure differences

Infrastructure should stay as similar as possible between staging and production, with only deliberate differences. Acceptable differences: smaller instance counts in staging, a one-tier-lower database, lower max-instances. These save cost without making staging useless as a pre-production gate.

Differences to avoid: scale-to-zero when prod does not scale to zero, missing IAM bindings that exist in prod, different network configurations, or a fundamentally different Cloud Run revision strategy. These stop staging from catching the issues it exists to catch.

Tip

A good rule of thumb: if you had to explain why staging is configured differently from prod, and you cannot give a clear cost or practical reason, the difference should probably not exist. Unexplained infrastructure differences are where staging stops being a reliable gate.
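The rule of thumb can be turned into a simple check: list every setting that differs between staging and prod, and flag the ones with no recorded reason. The sketch below assumes settings are available as plain dictionaries. The setting names and values are invented for the example and are not real Cloud Run or Cloud SQL field names.

```python
def unexplained_differences(staging: dict, prod: dict, justified: dict) -> list:
    """Return names of settings that differ without a recorded reason."""
    names = sorted(set(staging) | set(prod))
    return [
        name
        for name in names
        if staging.get(name) != prod.get(name) and not justified.get(name)
    ]


staging = {"max_instances": 5, "min_instances": 0, "db_size": "small", "vpc_connector": True}
prod = {"max_instances": 50, "min_instances": 2, "db_size": "large", "vpc_connector": True}

# Each deliberate difference carries its reason.
justified = {"max_instances": "cost", "db_size": "cost"}

print(unexplained_differences(staging, prod, justified))
```

Here min_instances is reported because staging scales to zero while prod does not and nobody wrote down why, which is one of the differences to avoid.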

Keeping staging and production in sync

Staging is only useful if it closely mirrors production. Environments drift apart over time unless you actively keep them aligned. This drift is slow and invisible until it matters.

Analogy

Staging drift is like a map that was accurate when it was printed but has not been updated since. You trust it because it used to be right. The moment you rely on it for a critical decision, you discover it is out of date.

What should be identical

  • The Docker image SHA being deployed
  • Database migration scripts, run in the same order
  • Terraform module versions and configuration structure (only variable values differ)
  • Application configuration shape: same environment variable names, different values
  • Service behaviour expectations: same concurrency settings, same health check paths, same startup behaviour
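The "same names, different values" rule for application configuration can be checked mechanically by comparing only the keys. This is a minimal sketch, assuming each environment's variables are available as a dictionary. The variable names and values are placeholders.

```python
def shape_mismatches(staging_env: dict, prod_env: dict) -> dict:
    """Compare variable names only; values are expected to differ."""
    staging_names, prod_names = set(staging_env), set(prod_env)
    return {
        "missing_in_staging": sorted(prod_names - staging_names),
        "missing_in_prod": sorted(staging_names - prod_names),
    }


staging_env = {"ENVIRONMENT": "staging", "DATABASE_URL": "staging-value"}
prod_env = {"ENVIRONMENT": "production", "DATABASE_URL": "prod-value", "API_KEY": "prod-value"}

print(shape_mismatches(staging_env, prod_env))
```

A non-empty result means one environment reads a variable the other never sets, which is how a code path ends up exercised for the first time in production.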

What may legitimately differ

  • Resource sizes: staging can use smaller Cloud SQL tiers and lower Cloud Run min-instances to reduce cost
  • Instance counts: staging does not need full production capacity
  • Data: staging uses anonymised or synthetic data, never real user data
  • On-call: staging does not need an on-call rotation

Warning

Periodically refresh staging data from an anonymised production snapshot. Tests against stale or synthetic data can miss query performance issues, missing index coverage, and edge cases that only appear with realistic data volumes. Never copy unmasked production data to any non-production environment.

When to use this pattern

The three-project, branch-based pattern is the right default for most teams building anything that serves real users. It fits well when:

  • The product has paying or dependent users
  • More than one engineer commits code to the repository
  • A production incident has a real cost in user trust, revenue, or on-call time
  • Infrastructure is managed with Terraform or another IaC tool
  • The application is subject to compliance or audit requirements

When a lighter setup may be acceptable

Personal experiments, prototypes with no real users, and internal tools used only by the team building them may not need the full three-environment setup. A single project with one environment is fine for code that has no impact if it breaks. As soon as real users depend on the service, or as soon as more than one person is committing code, separate the environments and separate the projects.

Tip

If you use GitHub Actions instead of Cloud Build, the same principles apply. Map environments to GitHub environment protection rules, use Workload Identity Federation per environment project, and enforce required reviewers before production deployments run.

Separate projects versus a shared project

Some teams start with a single GCP project and name resources by environment: api-dev, api-staging, api-prod all in the same project. This works briefly and then becomes a liability.

Warning

A shared project approach has five specific structural problems that naming conventions cannot fix:

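  • IAM is hard to scope cleanly to one environment. A developer granted Cloud Run admin at project level for dev work has that role on every service in the project, including prod. Per-resource bindings can narrow this, but each one has to be maintained by hand.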
  • Terraform state is harder to separate. One bad apply can touch resources in every environment.
  • Billing is lumped together. You cannot see what prod costs versus dev without custom labels and queries.
  • Quotas are shared. A load test against dev consumes prod API rate limits.
  • Secret Manager permissions are harder to scope correctly. A misconfigured binding can expose prod secrets to dev code.

The overhead of separate projects is worth it

Three projects instead of one means some additional setup: cross-project IAM for the CI service account, a slightly more complex Terraform directory structure, and separate billing associations. Once configured, this needs little ongoing maintenance beyond what a single shared project would. The isolation you get in exchange is structural, not procedural: bypassing it takes an explicit cross-project grant, not a slip in naming.

Note

Google’s own guidance recommends one project per environment for application workloads. The secure CI/CD pipeline model depends on this separation to enforce least-privilege access correctly.

Common beginner mistakes

  1. Auto-deploying to production on every push to main. The correct setup: main deploys to staging automatically, production requires a deliberate action. A tag push, a Cloud Deploy approval, or a manual trigger all work. A human must decide when each production deployment happens.

  2. Sharing a GCP project across environments. Putting dev and prod in one project makes IAM isolation very hard to maintain. A developer granted the Editor role at project level for dev work has it on every resource in that project, including prod. Separate projects fix this at the structural level.

  3. Rebuilding the Docker image for production. If your prod trigger builds a new image from the same commit, you are not deploying what staging tested. Build the image once, tag it with the commit SHA, and promote that exact SHA through all environments.

  4. Hardcoding project IDs in cloudbuild.yaml. A separate YAML file per environment creates duplicate configuration that diverges over time. Use substitution variables for all environment-specific values. One build file, different trigger-level values.

  5. Letting staging drift away from production. Staging running an older base image, a different database schema, or different IAM bindings will not catch the issues it exists to catch. Review staging configuration after major infrastructure changes, and periodically refresh the data.

  6. Giving humans direct write access to production infrastructure. Direct console changes bypass Terraform, leave no code review trail, and are hard to roll back consistently. Production changes go through the pipeline. Human access to the prod project should be read-only.

  7. Mixing secrets between environments. If a dev service account can read a prod Secret Manager secret, a compromised dev credential becomes a prod credential. Per-project secrets with per-project IAM bindings are the correct isolation pattern. See Secrets in CI/CD Pipelines for the setup.
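Mistake 4 is easy to catch automatically. The sketch below scans the text of a build file for literal project IDs and reports the line numbers. It is a hypothetical lint for illustration, not an existing Cloud Build feature, and the list of project IDs is the one used on this page.

```python
import re

PROJECT_IDS = ["my-app-dev", "my-app-staging", "my-app-prod"]


def hardcoded_project_ids(build_file_text: str, project_ids: list) -> list:
    """Return (line_number, project_id) pairs where an ID appears literally."""
    hits = []
    for number, line in enumerate(build_file_text.splitlines(), start=1):
        for project_id in project_ids:
            # Match the ID as a whole token, not as part of a longer name.
            pattern = rf"(?<![\w-]){re.escape(project_id)}(?![\w-])"
            if re.search(pattern, line):
                hits.append((number, project_id))
    return hits


good = "args:\n  - --project=$_PROJECT_ID\n"
bad = "args:\n  - --project=my-app-prod\n"

print(hardcoded_project_ids(good, PROJECT_IDS))
print(hardcoded_project_ids(bad, PROJECT_IDS))
```

Running a check like this in the feature-branch test step would stop a hardcoded ID before it reaches main.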

A practical example: Cloud Run service release

Here is a complete release path for a Cloud Run service, from a feature branch to production. See CI/CD Pipelines for Cloud Run for the detailed pipeline setup.

  1. Branch: feature/add-payments. Developer opens a PR. Cloud Build runs unit and integration tests. The PR cannot merge until tests pass and a reviewer approves.

  2. Merge to main. The staging Cloud Build trigger fires. It builds the Docker image, tags it with the commit SHA (for example, api:a1b2c3d), pushes to Artifact Registry in my-app-staging, and deploys to Cloud Run in the staging project.

  3. Staging validation. Automated smoke tests run against the staging Cloud Run URL. The team reviews manually if the change is significant. Logs confirm normal behaviour and no regressions.

  4. Release tag. A team member pushes a version tag: git tag v1.4.0 && git push --tags. The production Cloud Build trigger fires. It pulls the same image SHA from Artifact Registry and deploys to Cloud Run in my-app-prod.

  5. Production deploy. Cloud Run serves the new revision. Monitoring confirms success. If something goes wrong, rollback is a single command pointing Cloud Run traffic back to the previous revision.
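The rollback in step 5 can be sketched as a small helper that assembles the gcloud command, so it can be reviewed before anyone runs it. This is a minimal sketch. The service name, region, and project come from this page, while the revision name is a made-up placeholder for whichever revision was serving before the release.

```python
import shlex


def rollback_command(service: str, revision: str, region: str, project: str) -> list:
    """Build the gcloud command that sends all traffic to an earlier revision."""
    return [
        "gcloud", "run", "services", "update-traffic", service,
        f"--to-revisions={revision}=100",  # 100 percent of traffic to this revision
        f"--region={region}",
        f"--project={project}",
    ]


# "api-service-00041-abc" is a hypothetical previous revision name.
command = rollback_command("api-service", "api-service-00041-abc", "europe-west2", "my-app-prod")
print(shlex.join(command))
```

The helper only prints the command. Passing the list to subprocess.run would execute it, which is better left to the pipeline's service account than to a developer's own credentials.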

Tip

To add a formal approval gate before production, replace the git tag trigger with a Cloud Deploy pipeline. Cloud Build creates a Cloud Deploy release as its final step, and a team member approves promotion from staging to prod in the console or via API. See Cloud Deploy Overview for the setup.

Frequently asked questions

Should every environment be a separate GCP project?

Yes, for any real workload. Separate projects give you IAM isolation, separate billing, separate Terraform state, and separate quotas. A mistake in dev cannot affect prod resources because they live in entirely different projects. The overhead of three projects is low compared to the protection they give you.

Do small teams need a staging environment?

If the product has real users, yes. Even a solo developer benefits from a staging environment: it is the last place to catch configuration or infrastructure issues before users are affected. The cost of running a staging environment is low for most applications. The cost of skipping it shows up as avoidable production incidents.

Should production deploy automatically from main?

No. Main should deploy automatically to staging. Production should require a deliberate action: a git tag push, a Cloud Deploy approval gate, or a manual trigger. The cost of a one-click promotion is low; the cost of an accidental production deployment can be high.

How do I manage different secrets per environment?

Create the same secret name in Secret Manager in each environment project. The pipeline uses $PROJECT_ID to build the secret path, so the same cloudbuild.yaml works across all environments. Only the project changes; the secret name stays the same.

What should be the same between staging and production?

The Docker image SHA, the database migration scripts, the Terraform module configuration, and the application configuration structure. Differences should be deliberate and minimal: resource sizes, instance counts, and environment variable values. A staging environment that diverges significantly from production gives false confidence before each release.

Last verified: 25 March 2026. Cloud services change frequently. Verify details against official documentation before making infrastructure decisions.