How to Run Automated Tests in Cloud Build

Cloud Build runs each pipeline step inside a container, so you can use any language, test runner, or toolchain without managing build servers. When a test step fails, the pipeline stops immediately. The Docker image never builds and nothing gets deployed. This page explains how to structure test steps, run them in parallel, and fit testing cleanly into a CI pipeline.

What automated testing in Cloud Build actually means

In Cloud Build, automated testing is just a pipeline step that runs your test suite inside a container. There is no special “test mode.” A test step is a normal build step that happens to run pytest, Jest, or whatever your stack uses. What makes it a quality gate is the exit code: if tests fail, the test runner exits with a non-zero code (pytest and Jest both use 1 for failed tests), and Cloud Build halts the entire pipeline at that point.

Cloud Build has no concept of what a test is, so there is nothing to configure. You just need your test runner to exit with a non-zero code on failure, which every mainstream test framework does by default.
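To see that signal from the test runner's side, here is a minimal local sketch using Python's built-in unittest, chosen only because it needs no installs; pytest and Jest report failures the same way. It writes a hypothetical passing and failing test module to a temporary directory, runs each in a fresh process, and returns the exit code, which is the only thing the pipeline looks at.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Hypothetical test modules: one passes, one fails.
PASSING = """
import unittest


class Passing(unittest.TestCase):
    def test_addition(self):
        self.assertEqual(1 + 1, 2)
"""

FAILING = """
import unittest


class Failing(unittest.TestCase):
    def test_addition(self):
        self.assertEqual(1 + 1, 3)
"""


def run_suite(source: str) -> int:
    """Run a one-file test suite in a fresh process and return its exit code."""
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "test_example.py").write_text(source)
        result = subprocess.run(
            [sys.executable, "-m", "unittest", "discover", "-s", tmp],
            capture_output=True,  # keep the runner's report out of this script's output
        )
        return result.returncode


if __name__ == "__main__":
    print("passing suite:", run_suite(PASSING))  # passing suite: 0
    print("failing suite:", run_suite(FAILING))  # failing suite: 1
```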

Because every step runs in its own container, you get a clean, reproducible environment on every build. There is no state left over from a previous run. The same test that passes on your laptop will pass in Cloud Build — and the same one that fails locally will fail there too, assuming you use the same container image and dependencies.

Analogy

Think of the exit code as a security badge scanner at a factory gate. Exit code 0 means the badge scanned green and the worker gets through. Any other code means the scanner flashes red, the gate stays shut, and no one gets through until the issue is resolved. Cloud Build operates exactly the same way: green means continue, non-zero means stop.

How automated testing works in Cloud Build

Understanding the execution model helps you structure test pipelines that are both reliable and efficient.

Every step is a container

When Cloud Build runs a step, it pulls the container image you specify and executes the command inside it. For test steps, you typically use a public language image such as python:3.11, node:20, or golang:1.22, rather than one of the Google-maintained builders. Apart from the shared /workspace directory described below, the container has no memory of previous steps and no packages pre-installed beyond what the base image includes.

The /workspace directory is shared

All steps share one directory: /workspace. Your source code is checked out there before the first step runs. Dependencies installed to a path under /workspace in one step are available in the next. This is how multi-step pipelines work: install in step one, test in step two, build in step three.

A non-zero exit code stops everything

If any step’s container exits with a non-zero code, Cloud Build marks that step as failed and stops the pipeline. Subsequent steps do not run. This is the fail-fast behaviour you want from a CI system.

Watch out

The one escape hatch is allowFailure: true, which tells Cloud Build to continue even when a step fails. Never set this on test steps. It is only appropriate for optional steps like cache restoration, where failure is expected and acceptable. Using it on a test step means broken code can pass through the pipeline unchecked.
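The fail-fast rule and its allowFailure exception can be modelled in a few lines. This is a simplified sketch of sequential behaviour, not how Cloud Build is implemented: Step and run_pipeline are hypothetical names, each step is just a command whose exit code is checked, and the parallelism that waitFor adds is ignored.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class Step:
    id: str
    argv: list[str]
    allow_failure: bool = False  # mirrors allowFailure in the build config


def run_pipeline(steps: list[Step]) -> tuple[bool, list[str]]:
    """Run steps in order, stopping at the first non-zero exit unless allowed.

    Returns (succeeded, IDs of the steps that actually ran).
    """
    ran: list[str] = []
    for step in steps:
        ran.append(step.id)
        code = subprocess.run(step.argv).returncode
        if code != 0 and not step.allow_failure:
            return False, ran  # fail fast: later steps never start
    return True, ran


def exits_with(code: int) -> list[str]:
    """A stand-in step command that does nothing but exit with `code`."""
    return [sys.executable, "-c", f"raise SystemExit({code})"]


if __name__ == "__main__":
    pipeline = [
        Step("restore-cache", exits_with(1), allow_failure=True),  # cold cache: tolerated
        Step("run-tests", exits_with(1)),                           # failing tests: gate closes
        Step("build-image", exits_with(0)),                         # never reached
    ]
    print(run_pipeline(pipeline))  # (False, ['restore-cache', 'run-tests'])
```

Moving allow_failure=True onto the run-tests step in this model lets build-image run despite the failing tests, which is exactly the situation the warning above describes.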

Steps can run sequentially or in parallel

By default, steps run one after another: a step with no waitFor field waits for every step listed before it. Use the waitFor field to declare dependencies between steps. Two steps that both depend on the same completed step will start simultaneously. This lets you run unit tests and lint in parallel, cutting pipeline time without adding complexity. See Cloud Build Overview for a full explanation of how waitFor and step IDs work.

When to add automated tests to Cloud Build

Test steps are worth adding whenever the cost of a broken deployment is higher than the cost of running tests. In practice, that means almost always. Some specific situations where it pays off immediately:

Validating pull requests: Run tests on every PR before merge, so broken code is flagged before anyone spends time reviewing it and before it can reach your main branch.
Before building a Docker image: Image builds take time. Catching failures before the build step saves minutes per failed run, which adds up across a team.
Lint plus unit tests before deploy: Combining a lint step with unit tests in parallel before the Docker build is a common pattern for Cloud Run CI/CD pipelines.
Checking helper scripts and infrastructure code: If your team writes Python or bash scripts that support your infrastructure, test steps can validate those too, not just application code.
Separating fast tests from slow integration tests: Run unit tests on every push. Reserve slower integration tests for merges to main or release branches where the extra time is justified.

Tests belong before the Docker build

Without automated tests in CI, developers rely on manual testing, which is slow, inconsistent, and does not scale. But the order of steps matters as much as having them at all.

A pipeline that builds a Docker image first and then runs tests has the sequence backwards. Building an image for a typical Python or Node.js application often takes two to four minutes. If your tests are going to fail, you want that failure before you burn that time. Multiply a wasted three-minute build across dozens of commits per week from a small team and you are looking at meaningful lost time.

The principle is straightforward: fail at the cheapest possible step. Tests come before the image build. The image build comes before pushing to Artifact Registry. The push comes before deployment. Each step gates the next.

Analogy

A pipeline is like an assembly line with inspection stations. You do not ship a product and then inspect it. You inspect it at the earliest possible station, before any expensive work has been done downstream. If it fails inspection, it comes off the line immediately. Tests are your earliest and cheapest inspection station.

Running unit tests in Cloud Build

The pattern is the same for any language: one step installs dependencies, the next step runs tests. Both steps use the same container image. Files written to /workspace during install are available during the test step.

For a Python project using pytest:

steps:
  - name: 'python:3.11'
    entrypoint: pip
    args: ['install', '-r', 'requirements.txt', '--target', '/workspace/.deps']
    id: install-deps

  - name: 'python:3.11'
    entrypoint: python
    args: ['-m', 'pytest', 'tests/', '-v', '--tb=short']
    env:
      - 'PYTHONPATH=/workspace/.deps'
    waitFor: ['install-deps']
    id: run-tests

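The --target /workspace/.deps flag installs packages into a directory under /workspace, and the PYTHONPATH entry in the test step tells Python to look there. Because the step runs python -m pytest, pytest itself is also loaded from /workspace/.deps, so requirements.txt must list pytest (and any other tool a later step invokes the same way). The -v and --tb=short flags give you enough detail in the build log to understand what failed without pages of noise.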
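Each step is a separate container, so the test step finds the packages only because PYTHONPATH points at the shared directory. The sketch below reproduces that locally under stated assumptions: a temporary directory stands in for /workspace, a hand-written package named fake_dep (hypothetical) stands in for what pip install --target leaves behind, and each import attempt runs in a fresh interpreter, like a new step.

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import Optional


def can_import(module: str, deps_dir: Optional[str]) -> bool:
    """Try `import module` in a fresh interpreter, as a new build step would.

    When deps_dir is set it is passed as PYTHONPATH, like the step's
    env entry 'PYTHONPATH=/workspace/.deps'.
    """
    env = {key: value for key, value in os.environ.items() if key != "PYTHONPATH"}
    if deps_dir is not None:
        env["PYTHONPATH"] = deps_dir
    result = subprocess.run(
        [sys.executable, "-c", f"import {module}"],
        env=env,
        capture_output=True,
    )
    return result.returncode == 0


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workspace:
        deps = Path(workspace, ".deps")
        # Stand-in for what `pip install --target` leaves behind: an importable package.
        (deps / "fake_dep").mkdir(parents=True)
        (deps / "fake_dep" / "__init__.py").write_text("VERSION = '1.0'\n")

        print(can_import("fake_dep", None))       # False
        print(can_import("fake_dep", str(deps)))  # True
```

The files exist in both runs; what changes is whether the new process is told where to look, which is the same reason a missing PYTHONPATH entry makes a test step fail with an import error.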

For a Node.js project using Jest (the test flags below are Jest options; if you use Vitest, check its equivalents):

steps:
  - name: 'node:20'
    entrypoint: npm
    args: ['ci']
    id: install-deps

  - name: 'node:20'
    entrypoint: npm
    args: ['test', '--', '--ci', '--forceExit']
    waitFor: ['install-deps']
    id: run-tests

Tip

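npm ci installs exactly what the lockfile specifies, with no version drift between runs, and because it writes node_modules into the project directory under /workspace, the test step can see it. The --ci flag puts Jest into CI mode, in which new snapshots are not written automatically; a test without a stored snapshot fails instead. The --forceExit flag matters here: without it, Jest can hang on open handles after tests complete and stall the build until it hits the build timeout, which looks like a timeout rather than a test failure. Treat --forceExit as a workaround; running Jest with --detectOpenHandles helps you find the handle that keeps the process alive.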

If tests fail, Cloud Build marks the run-tests step as failed. The build stops there. The Docker image never builds and nothing gets deployed.

Running test and lint steps in parallel with waitFor

By default, steps run sequentially. Once the dependency install step is done, there is no reason to run unit tests and lint one after another: neither depends on the other. Use waitFor to express that both steps depend on install, and Cloud Build will start them simultaneously.

steps:
  - name: 'python:3.11'
    entrypoint: pip
    args: ['install', '-r', 'requirements.txt', '--target', '/workspace/.deps']
    id: install-deps

  # unit-tests and lint both depend only on install-deps — they run in parallel
  - name: 'python:3.11'
    entrypoint: python
    args: ['-m', 'pytest', 'tests/unit/', '-v', '--tb=short']
    env:
      - 'PYTHONPATH=/workspace/.deps'
    waitFor: ['install-deps']
    id: unit-tests

  - name: 'python:3.11'
    entrypoint: python
    args: ['-m', 'flake8', 'src/']
    env:
      - 'PYTHONPATH=/workspace/.deps'
    waitFor: ['install-deps']
    id: lint

  # Docker image builds only after both unit tests and lint pass
  - name: 'gcr.io/cloud-builders/docker'
    args: ['build', '-t', 'europe-west2-docker.pkg.dev/$PROJECT_ID/api/api:$SHORT_SHA', '.']
    waitFor: ['unit-tests', 'lint']
    id: build-image

The build-image step lists both unit-tests and lint in its waitFor. It will not start until both have succeeded. If either fails, the build halts and the image never builds.

Note

Use waitFor: ['-'] to start a step at the very beginning of the build without waiting for any previous step. This is useful for steps that are fully independent from the start, such as pulling a pre-built Docker layer from cache while your install step runs in parallel.

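How much parallelising saves depends on how long each check takes. Run one after the other, the checks take the sum of their durations; run in parallel, they take as long as the slowest one. With two checks of similar length, the combined check time drops by close to half. Faster pipelines mean faster feedback, which means less context-switching while developers wait for CI results.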
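The ordering rules can be checked with a small model that computes when each step would finish under unlimited parallelism. It assumes the semantics described above: no waitFor means wait for every earlier step, ['-'] means start immediately, and otherwise the step waits only for the listed IDs. BuildStep, finish_times, and the durations are hypothetical and only for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BuildStep:
    id: str
    seconds: float                        # illustrative duration
    wait_for: Optional[list[str]] = None  # None means the field is omitted


def finish_times(steps: list[BuildStep]) -> dict[str, float]:
    """Earliest finish time of each step, assuming unlimited parallelism."""
    finished: dict[str, float] = {}
    for index, step in enumerate(steps):
        if step.wait_for is None:
            deps = [earlier.id for earlier in steps[:index]]  # wait for everything before
        elif step.wait_for == ["-"]:
            deps = []                                          # start at the beginning
        else:
            deps = step.wait_for                               # wait only for listed IDs
        # A step starts when its last dependency finishes (KeyError if it names a later step).
        start = max((finished[dep] for dep in deps), default=0.0)
        finished[step.id] = start + step.seconds
    return finished


if __name__ == "__main__":
    sequential = [
        BuildStep("install-deps", 40),
        BuildStep("unit-tests", 60),
        BuildStep("lint", 50),
        BuildStep("build-image", 120),
    ]
    parallel = [
        BuildStep("install-deps", 40),
        BuildStep("unit-tests", 60, ["install-deps"]),
        BuildStep("lint", 50, ["install-deps"]),
        BuildStep("build-image", 120, ["unit-tests", "lint"]),
    ]
    print(finish_times(sequential)["build-image"])  # 270.0
    print(finish_times(parallel)["build-image"])    # 220.0
```

With these made-up durations, the two checks shrink from 110 seconds run back to back to 60 seconds in parallel, the length of the slower one; the install and image build steps are unaffected.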

Integration tests

Unit tests exercise code in isolation. Integration tests exercise code in combination with external dependencies: a database, a queue, an external API. They catch a different class of bugs and are slower to run, which is why they usually run separately from unit tests.

When integration tests are worth it: If your application talks to Cloud SQL and you want to verify that queries execute correctly against a real schema, integration tests give you that confidence. Unit tests with a mocked database do not.

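For tests against a Cloud SQL instance in a dedicated test project, run the Cloud SQL Auth Proxy as a background process within the step. The proxy authenticates as the build's service account, which needs the Cloud SQL Client role (roles/cloudsql.client) in the test project.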

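steps:
  - name: 'python:3.11'
    entrypoint: bash
    args:
      - -c
      - |
        # Stop at the first failing command, so an install error also fails the step
        set -e
        pip install -r requirements.txt --target /workspace/.deps
        # Assumes the cloud-sql-proxy binary is already in /workspace
        # (committed or downloaded by an earlier step); python:3.11 does not include it
        ./cloud-sql-proxy my-test-project:europe-west2:test-db &
        # Crude fixed wait for the proxy to start listening
        sleep 2
        # Run integration tests against the proxy on localhost; the step exits with pytest's code
        PYTHONPATH=/workspace/.deps python -m pytest tests/integration/ -v --tb=short
    secretEnv: ['DATABASE_PASSWORD']
    id: integration-tests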
availableSecrets:
  secretManager:
    - versionName: projects/$PROJECT_ID/secrets/test-db-password/versions/latest
      env: DATABASE_PASSWORD

The database password comes from Secret Manager via availableSecrets, not from a hardcoded value in the config file. See Secrets in CI/CD Pipelines for the full pattern. Build configs are committed to version control, so credentials must never appear in them.
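The fixed sleep 2 in the integration step is a guess at how long the proxy needs to start. A sturdier option is to poll the proxy's local port until it accepts connections. The helper below is a minimal standard-library sketch; wait_for_port is a hypothetical name, and the address a step would pass (for example 127.0.0.1:5432, typical for a PostgreSQL instance) is an assumption to check against your database engine and proxy flags.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0) -> bool:
    """Poll until something accepts TCP connections on host:port.

    Returns True as soon as a connection succeeds, or False once `timeout`
    seconds have passed without one.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            time.sleep(0.5)  # nothing listening yet; try again shortly
    return False
```

In the step, you might save this as a script and call it between starting the proxy and running pytest, exiting non-zero when it returns False, so the step fails with one clear cause instead of a wall of connection errors.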

Watch out

Always use a dedicated test database in a separate test project. Never point integration tests at a production or staging database. Create and clean up test data within the tests themselves. If an integration test run gets interrupted, it should not leave orphaned data in a shared environment.

If your integration tests are slow, consider running them on a separate trigger, for example only on pushes to main rather than on every branch push or pull request. Fast unit tests on every commit, slower integration tests on merges. See Managing Environments in CI/CD for how to structure triggers around different environments.

Caching dependencies between builds

Cloud Build starts completely clean on every run. There is no persistent disk between builds. The upside is isolation and reproducibility. The downside is that you reinstall dependencies from scratch every time, which can add 30 to 90 seconds to builds that install large dependency trees.

The standard workaround is to store installed packages in a GCS bucket and restore them at the start of each build. This is not a true build cache. It is a manual restore-and-save pattern, but it works well for package managers that support a cache directory flag.

steps:
  # Try to restore cached packages — allowFailure so the build continues on a cold start
  - name: 'gcr.io/cloud-builders/gsutil'
    args: ['cp', 'gs://my-app-build-cache/pip-cache.tar.gz', '/workspace/pip-cache.tar.gz']
    allowFailure: true
    id: restore-cache

  - name: 'python:3.11'
    entrypoint: bash
    args:
      - -c
      - |
        if [ -f /workspace/pip-cache.tar.gz ]; then
          tar xzf /workspace/pip-cache.tar.gz -C /workspace
        fi
        pip install -r requirements.txt --target /workspace/.deps --cache-dir /workspace/.pip-cache
    waitFor: ['restore-cache']
    id: install-deps

  # Save the updated cache for the next build
  - name: 'gcr.io/cloud-builders/gsutil'
    entrypoint: bash
    args:
      - -c
      - |
        # -C /workspace stores paths as .pip-cache/..., matching the restore step's tar -C /workspace
        tar czf /workspace/pip-cache.tar.gz -C /workspace .pip-cache
        gsutil cp /workspace/pip-cache.tar.gz gs://my-app-build-cache/pip-cache.tar.gz
    waitFor: ['install-deps']
    allowFailure: true
    id: save-cache

allowFailure: true on the restore step is intentional: there is no cache on the first run, so the gsutil cp command will fail. The build should continue regardless. The same flag on the save step means a cache write failure does not block the actual pipeline work.
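The restore step extracts with tar -C /workspace, which puts the cache back at /workspace/.pip-cache only if the archive stores its members under the relative name .pip-cache. The sketch below checks that round trip with Python's tarfile module, using two temporary directories as stand-ins for /workspace in two separate builds; save_cache, restore_cache, and round_trip are hypothetical names.

```python
import tarfile
import tempfile
from pathlib import Path


def save_cache(workspace: Path, archive: Path) -> None:
    """Archive workspace/.pip-cache, storing members as '.pip-cache/...'."""
    with tarfile.open(str(archive), "w:gz") as tar:
        # Without arcname, tarfile (like GNU tar) would store the full path minus its leading slash.
        tar.add(str(workspace / ".pip-cache"), arcname=".pip-cache")


def restore_cache(workspace: Path, archive: Path) -> None:
    """Extract into the workspace, like `tar xzf pip-cache.tar.gz -C /workspace`."""
    with tarfile.open(str(archive), "r:gz") as tar:
        if hasattr(tarfile, "data_filter"):
            tar.extractall(str(workspace), filter="data")  # safer extraction where supported
        else:
            tar.extractall(str(workspace))


def round_trip() -> bool:
    """Save a cache in one temporary 'build' and restore it in a second, clean one."""
    with tempfile.TemporaryDirectory() as first, tempfile.TemporaryDirectory() as second:
        build1, build2 = Path(first), Path(second)
        (build1 / ".pip-cache").mkdir()
        (build1 / ".pip-cache" / "example.whl").write_text("cached wheel")
        archive = build1 / "pip-cache.tar.gz"
        save_cache(build1, archive)
        restore_cache(build2, archive)
        restored = build2 / ".pip-cache" / "example.whl"
        return restored.exists() and restored.read_text() == "cached wheel"


if __name__ == "__main__":
    print(round_trip())  # True
```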

Warning

Caching speeds up dependency install time but does not make your tests run faster. If your test suite is slow, caching will not fix that. More importantly, a stale cache can mask dependency version issues, particularly if you cache installed packages rather than only pip's download cache: your pipeline might be running against an outdated package without you realising it. Periodically invalidate the cache by deleting the GCS file and verifying the pipeline still works from a clean install.
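One way to avoid restoring a cache built for an older dependency set is to name the cache object after a hash of requirements.txt, so changing the requirements automatically starts a fresh cache. This naming scheme is an assumption, not a Cloud Build feature; the sketch below only computes the name (cache_object_name is hypothetical), which the restore and save steps would then use in their gsutil cp paths.

```python
import hashlib
from pathlib import Path


def cache_object_name(requirements: Path, prefix: str = "pip-cache") -> str:
    """Name the cache archive after a hash of the requirements file.

    Any change to the requirements yields a new name, so the next build
    misses the cache and starts clean instead of reusing an old archive.
    """
    digest = hashlib.sha256(requirements.read_bytes()).hexdigest()[:16]
    return f"{prefix}-{digest}.tar.gz"
```

Old archives accumulate under this scheme; a lifecycle rule on the bucket that deletes objects after a set age is one way to clean them up.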

An alternative to GCS caching is to bake your dependencies into a custom builder image. Push a Docker image with all packages pre-installed to Artifact Registry, then use that image as your step’s name. This is more work to set up but more predictable than the restore-from-GCS pattern.

Coverage thresholds, flaky tests, and test artifacts

Coverage thresholds

A coverage threshold enforces a minimum percentage of code covered by tests. The mechanism is simple: run your coverage tool, check the output, and exit with code 1 if coverage falls below the threshold. Cloud Build treats that exit code as a failure and stops the pipeline.

  - name: 'node:20-alpine'
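    script: |
      # Assumes test:coverage runs Jest with the json-summary coverage reporter,
      # which writes coverage/coverage-summary.json
      npm run test:coverage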
      node -e "
        const c = require('./coverage/coverage-summary.json');
        const pct = c.total.lines.pct;
        if (pct < 80) {
          console.error('Coverage ' + pct + '% is below the 80% threshold');
          process.exit(1);
        }
        console.log('Coverage OK: ' + pct + '%');
      "
    id: test-coverage
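For a Python project the same gate can be written in Python. This sketch assumes a coverage.py JSON report (produced by running coverage json after the tests), whose totals object includes percent_covered; check_coverage is a hypothetical helper whose return value a step would pass to sys.exit. If you use pytest-cov, its --cov-fail-under option gives you the same behaviour without a custom script.

```python
import json
import sys
from pathlib import Path


def check_coverage(report: Path, threshold: float) -> int:
    """Return 0 if total coverage meets the threshold, 1 otherwise.

    Expects a coverage.py JSON report with a 'totals' object.
    """
    totals = json.loads(report.read_text())["totals"]
    pct = float(totals["percent_covered"])
    if pct < threshold:
        print(f"Coverage {pct:.1f}% is below the {threshold:.0f}% threshold", file=sys.stderr)
        return 1
    print(f"Coverage OK: {pct:.1f}%")
    return 0
```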

Warning

Set the threshold at a level your team can actually maintain. A 95% target that fails weekly because of legitimately untestable code (CLI entry points, error paths, generated code) trains engineers to work around it rather than respect it. Start at your current coverage level and raise it incrementally. A threshold that nobody takes seriously is worse than no threshold at all.

Flaky tests

Flaky tests pass sometimes and fail sometimes, usually due to timing issues, test isolation problems, or external dependencies that behave inconsistently. Do not use allowFailure: true broadly to hide them. This masks real failures and erodes confidence in the pipeline. Instead, quarantine known flaky tests in a separate suite that runs but does not block the build. Track them as technical debt and fix them. A pipeline that nobody trusts is worse than no pipeline.
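One lightweight way to quarantine with pytest is to keep the flaky tests' node IDs in a file (quarantine.txt here is hypothetical) and build two argument lists from it: the blocking suite skips those tests with pytest's --deselect option, and a separate step runs only them. The sketch produces args for a step with entrypoint: python. For the quarantine step not to block the build it has to be allowed to fail, so keep allowFailure: true on that step alone.

```python
from pathlib import Path


def load_quarantine(path: Path) -> list[str]:
    """Read pytest node IDs, one per line, skipping blank lines and # comments."""
    node_ids = []
    for raw in path.read_text().splitlines():
        line = raw.strip()
        if line and not line.startswith("#"):
            node_ids.append(line)
    return node_ids


def blocking_args(test_dir: str, quarantined: list[str]) -> list[str]:
    """Args for the gating step: the whole suite minus quarantined tests."""
    args = ["-m", "pytest", test_dir, "-v", "--tb=short"]
    for node_id in quarantined:
        args += ["--deselect", node_id]
    return args


def quarantine_args(quarantined: list[str]) -> list[str]:
    """Args for the non-blocking step: only the quarantined tests.

    Returns an empty list when nothing is quarantined, because pytest given
    no paths would collect every test it finds; skip the step instead.
    """
    if not quarantined:
        return []
    return ["-m", "pytest", *quarantined, "-v", "--tb=short"]
```

Keeping the list in a file makes the technical debt visible in code review: adding a test to quarantine and removing it once fixed are both explicit changes.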

Saving test results as artifacts

Save test result files to Cloud Storage so you can inspect them later without having to re-run the build:

artifacts:
  objects:
    location: 'gs://my-app-build-artifacts/$BUILD_ID/'
    paths:
      - 'test-results/*.xml'
      - 'coverage/lcov.info'

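JUnit XML files can be parsed by many reporting tools. For pytest, pass --junitxml=test-results/junit.xml; for Jest, add a reporter such as jest-junit. Having them stored against the build ID makes it straightforward to trace failures back to specific test runs. One caveat worth verifying for your setup: the artifacts upload happens after the build steps, and a failed step ends the build, so results from a failing run may never be uploaded. If you need them, copy the files to Cloud Storage from within the test step before it exits with the test runner's exit code.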
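To pull the failing tests out of a downloaded results file, the standard library's XML parser is enough. This sketch assumes the common JUnit layout (testcase elements with failure or error children, under a testsuite or testsuites root); failed_tests is a hypothetical helper that takes the file's contents as a string.

```python
import xml.etree.ElementTree as ET


def failed_tests(junit_xml: str) -> list[str]:
    """List 'classname::name' for every test case with a <failure> or <error> child.

    Works whether the root element is <testsuites> or a single <testsuite>.
    """
    root = ET.fromstring(junit_xml)
    failures = []
    for case in root.iter("testcase"):
        if case.find("failure") is not None or case.find("error") is not None:
            failures.append(f"{case.get('classname', '')}::{case.get('name', '')}")
    return failures
```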

Unit tests vs integration tests in Cloud Build

These are not competing approaches. They cover different things and belong at different points in the pipeline.

	Unit tests	Integration tests
What they test	Individual functions and classes in isolation	Code working with real external systems
Speed	Fast (seconds to low minutes)	Slower (minutes)
Dependencies required	None beyond your code	Database, APIs, other services
Run on	Every commit and pull request	Merges to main, release branches
Failure tells you	Logic error in your code	Integration problem with an external system
Cloud Build setup	Simple: one container	More complex: needs a proxy, secrets, and a test environment

A mature pipeline runs unit tests first on every push, fails fast if they break, then runs integration tests on a separate trigger or after the unit tests pass on main. This keeps PR feedback fast while still catching integration-level problems before they reach production. See Dev vs Staging vs Production for how environment separation reinforces this pattern.

Common mistakes

Running tests after building the Docker image. If tests fail, you have wasted the entire image build. Tests should block the image build, not follow it. Always put test steps first and use waitFor to make the Docker build step depend on them.
Not understanding step isolation. Each step gets a fresh container. Packages installed in step 1 do not exist in step 2 unless you wrote them to /workspace. If your test step cannot find dependencies, check where your install step wrote them and whether the test step can see that path.
Using allowFailure on test steps. allowFailure: true is appropriate for optional steps like cache restoration. It is never appropriate for test steps. If tests fail and the build continues anyway, your pipeline is lying to you.
Running tests without verbose output. A build log that says “3 tests failed” with no detail is not useful. Pass verbose flags (-v for pytest, --verbose for Jest) so failures are readable in the Cloud Build log without re-running the build locally.
Using production resources in integration tests. Pointing integration tests at a production database is a serious mistake: destructive tests can corrupt real data, and even read-only queries can generate unexpected load. Always use a dedicated test environment. For Cloud SQL, that means a separate test instance in a separate GCP project.
Putting too much logic into a single build step. A step that installs dependencies, runs migrations, seeds test data, runs tests, and generates coverage reports is hard to debug when it fails. Break complex logic into separate named steps. Each step should do one clear thing. The step IDs in your waitFor fields become self-documenting pipeline structure.

Frequently asked questions

Does Cloud Build stop the pipeline when a test fails?

Yes. Cloud Build treats any step that exits with a non-zero code as a failure. The pipeline stops immediately and nothing after that step runs — the Docker image never builds and nothing gets deployed. This is the fail-fast behaviour you want.

Can Cloud Build run test steps in parallel?

Yes. Use the waitFor field with step IDs. Two steps that both list the same waitFor dependency will run in parallel as soon as that dependency completes. A common pattern is to run unit tests and lint in parallel after the dependency install step finishes.

Where should I install dependencies in Cloud Build?

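In a dedicated install step that runs before your test steps. Steps share the /workspace directory, so files written there are available to later steps. Install dependencies to a directory under /workspace so they persist across steps: for pip, use --target as shown earlier; for npm, npm ci already writes node_modules into the project directory, which is under /workspace. A download cache (pip's --cache-dir, npm's --cache) speeds up installs but does not replace the installed packages themselves.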

Should I run tests before building a Docker image?

Always. Building a Docker image takes time. If your tests are going to fail, you want that failure to happen before you spend two or three minutes on an image build. Put test steps first, use waitFor to block the build step on them, and let the pipeline fail at the cheapest possible point.

Can I run integration tests in Cloud Build?

Yes, but they need more setup. Integration tests that require a database or external service typically use the Cloud SQL Auth Proxy as a background process within the step, connecting to a dedicated test database — never a production or staging one. Pull any credentials from Secret Manager rather than hardcoding them in the build config.

Last verified: 25 March 2026. Cloud services change frequently. Verify details against official documentation before making infrastructure decisions.