What Is Google Cloud Trace? Spans, Waterfalls, and Latency

When a request through your application is slow, the first question is: which part? Cloud Trace is Google Cloud’s distributed tracing service. It records the timeline of a request as it moves through your services, stores it as a set of spans, and renders it as a waterfall view so you can see exactly where time was spent. This page explains what Cloud Trace is, how it works, how to read the console, and when to use it instead of Cloud Monitoring, Cloud Logging, or Cloud Profiler.

The simple explanation

Cloud Trace is a storage and visualization layer for distributed traces in GCP. When a user makes a request to your application, that request touches multiple services: a load balancer, a Cloud Run instance, a database, maybe an external API. Cloud Trace records how long each step took, assembles those measurements into a timeline, and stores it so you can query it later.

The result is a waterfall view: a horizontal bar chart where each bar is one step (called a span), placed on a shared time axis. Wide bars are slow steps. That is the core insight Cloud Trace provides.

What Cloud Trace does

Cloud Trace gives you three things:

A searchable history of traces. Filter by endpoint, status code, latency range, or time window to find the specific slow request you are trying to diagnose.
A waterfall view for each trace. Shows every span as a bar on a shared timeline so you can see parent-child relationships and identify the slow step at a glance.
Latency distribution analysis. Aggregates latency across many traces to show p50, p95, and p99 percentiles and how those change over time. This is how you catch regressions before they escalate.

Cloud Trace does not replace Cloud Monitoring for dashboards and alerting, or Logs Explorer for raw event details. It works alongside them. The three tools answer different questions about the same system.

When to use Cloud Trace

Reach for Cloud Trace when you have a latency problem and need to find where time is going:

A Cloud Run endpoint that normally responds in 200ms is now taking 2 seconds
A GKE microservice chain where one downstream dependency is slow but you do not know which one
Intermittent latency spikes that are invisible in averages but visible in your p99 metric
A deploy that went out yesterday and latency has been slowly creeping up since
You want to know whether the slow step is in application code, a database query, a cache miss, or an external API call

When to reach for a different tool first

Cloud Trace is the right tool for latency questions. For other questions, start elsewhere:

For infrastructure health, uptime, and alerting: Cloud Monitoring
For detailed event investigation inside a service: Cloud Logging
For CPU or memory hotspots inside your application code: Cloud Profiler

Check Monitoring first

If you jump straight to Cloud Trace at the start of an incident, you may spend time in the wrong tool. Confirm the problem exists and which endpoint is affected in Cloud Monitoring first. Traces are for diagnosing a known latency problem, not for discovering that one exists.

How Cloud Trace works

Cloud Trace is built around two concepts: traces and spans.

Traces and spans

A trace represents the complete journey of one request through your system. Every service that touches the request shares the same trace ID, a globally unique identifier that ties all the pieces together.

A span is one unit of work within that journey: one service handling part of the request, one database query, one external API call. Each span records a start time, an end time, and a set of attributes (key-value metadata like HTTP method, URL, status code, database query text, or order ID).

Spans form a parent-child tree. When Service A calls Service B, the span for Service B is a child of the span for Service A. The waterfall view renders this tree as indented horizontal bars on a shared timeline.
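To make the trace and span model concrete, here is a minimal sketch of a span record and the parent-child tree it forms. The Span class, its field names, and the sample service names are illustrative assumptions, not the OpenTelemetry or Cloud Trace data types.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Span:
    """Illustrative span record; not the OpenTelemetry or Cloud Trace type."""
    name: str
    span_id: str
    parent_id: Optional[str]  # None marks the root span
    start_ms: int
    end_ms: int
    attributes: dict = field(default_factory=dict)

    @property
    def duration_ms(self) -> int:
        return self.end_ms - self.start_ms


def render_tree(spans):
    """Return one indented line per span, with children listed under their parent."""
    children = {}
    for span in spans:
        children.setdefault(span.parent_id, []).append(span)

    lines = []

    def walk(parent_id, depth):
        # Order siblings by start time, as a waterfall view would.
        for span in sorted(children.get(parent_id, []), key=lambda s: s.start_ms):
            lines.append(f"{'  ' * depth}{span.name} ({span.duration_ms}ms)")
            walk(span.span_id, depth + 1)

    walk(None, 0)
    return lines


# Hypothetical spans: Service A calls Service B, which runs one query.
spans = [
    Span("service-a: GET /orders", "a1", None, 0, 120, {"http.method": "GET"}),
    Span("service-b: lookup-user", "b1", "a1", 10, 50),
    Span("service-b: SELECT users", "b2", "b1", 15, 45),
]
tree_lines = render_tree(spans)
print("\n".join(tree_lines))
```

Every span carries its parent's ID, which is all that is needed to rebuild the tree after the spans arrive separately from different services.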

Context propagation

For spans from different services to join the same trace, the trace ID must travel with the request. This is done via HTTP headers. The standard is W3C Trace Context: the traceparent header carries the trace ID and the parent span ID. Each service reads the incoming header, creates its own child span, and passes an updated header with any outbound calls it makes.

When you use OpenTelemetry with an instrumented HTTP client, this propagation is handled automatically. For a deeper look at the underlying concepts, sampling strategies, and how context propagation works across service boundaries, see Distributed Tracing.
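For intuition about what the instrumented client does on your behalf, the sketch below parses a traceparent header and builds the header a service would send on an outbound call. The function names are hypothetical, and this is a simplified illustration of the W3C layout (version, trace ID, parent span ID, flags), not a replacement for the OpenTelemetry propagators.

```python
import re
import secrets

# traceparent layout per W3C Trace Context: version-traceid-parentid-flags
_TRACEPARENT = re.compile(
    r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$"
)


def parse_traceparent(header):
    """Split a traceparent header into its four fields."""
    match = _TRACEPARENT.match(header.strip())
    if match is None:
        raise ValueError(f"malformed traceparent: {header!r}")
    version, trace_id, parent_id, flags = match.groups()
    return {
        "version": version,
        "trace_id": trace_id,
        "parent_id": parent_id,
        "flags": flags,
    }


def child_traceparent(incoming):
    """Return (new_span_id, outbound_header) for a call made by this service.

    The trace ID is kept; the parent field is replaced by this service's
    own span ID so the downstream span becomes its child.
    """
    ctx = parse_traceparent(incoming)
    span_id = secrets.token_hex(8)  # 8 random bytes -> 16 hex characters
    outbound = f"{ctx['version']}-{ctx['trace_id']}-{span_id}-{ctx['flags']}"
    return span_id, outbound


# Sample header value; the IDs are arbitrary.
incoming = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
span_id, outbound = child_traceparent(incoming)
print(outbound)
```

The trace ID is unchanged from hop to hop while the span ID field changes at every service, which is what lets the backend stitch separately reported spans into one tree.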

Analogy

A trace is like tracking a package through a courier network. The package label (trace ID) travels with the parcel at every handoff. Each facility scans it and records the arrival and departure time. Cloud Trace is the system that assembles all those scans into a single timeline.

How traces reach Cloud Trace

Traces reach Cloud Trace through several paths, listed here from most recommended to lowest-level:

OpenTelemetry with OTLP (recommended for new setups)

OpenTelemetry is the vendor-neutral open standard for distributed tracing. Instrument your code with the OTel SDK, then export spans using the OTLP exporter. For GCP, you can route spans through an OpenTelemetry Collector configured with the googlecloud exporter, which handles authentication and delivery to Cloud Trace. This approach keeps your application code fully portable: if you switch backends or run locally with a Jaeger collector, you change the collector configuration, not your application code.

OpenTelemetry with the Cloud Trace exporter (direct GCP option)

For simpler GCP-only setups, the OpenTelemetry Cloud Trace exporter sends spans directly to Cloud Trace without needing a collector. In Python the class is CloudTraceSpanExporter; Go, Java, and Node.js have equivalent exporters under their own names. This is a valid, working option for applications that run entirely on GCP. It trades portability for simplicity.

Automatic collection from GCP services

Cloud Run, GKE, App Engine, Cloud Functions, and the Cloud Load Balancer automatically emit traces for incoming requests. No instrumentation required for this baseline data. These automatic traces show you request boundaries but not what happens inside your application code, which is why adding application-level spans matters.

Cloud Trace API (low-level)

You can write trace data directly using the Cloud Trace REST or gRPC API. This is the most GCP-specific option and ties your instrumentation to GCP. For new projects, the OTel path is the better starting point.

Note

GCP services like Cloud Run and GKE automatically propagate trace context using both W3C Trace Context and the legacy X-Cloud-Trace-Context header. If all your services run on GCP, some cross-service propagation works without any custom instrumentation. Adding OTel gives you richer span data, custom attributes, and full visibility inside your application code.
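If you ever need to bridge the two header formats by hand, the sketch below converts a legacy X-Cloud-Trace-Context value into a traceparent value. It assumes the legacy layout TRACE_ID/SPAN_ID;o=OPTIONS with a hex trace ID and a decimal span ID; the function name is hypothetical, and you should check the current header documentation before relying on this.

```python
import re

# Assumed legacy layout: TRACE_ID/SPAN_ID;o=OPTIONS
#   TRACE_ID: 32 hex characters, SPAN_ID: decimal integer, OPTIONS: 0 or 1
_LEGACY = re.compile(r"^([0-9a-f]{32})/(\d+)(?:;o=([01]))?$")


def legacy_to_traceparent(header):
    """Convert an X-Cloud-Trace-Context value to a W3C traceparent value."""
    match = _LEGACY.match(header.strip().lower())
    if match is None:
        raise ValueError(f"unrecognized X-Cloud-Trace-Context: {header!r}")
    trace_id, span_decimal, options = match.groups()
    span_int = int(span_decimal)
    if span_int >= 2**64:
        raise ValueError("span ID does not fit in 64 bits")
    span_hex = format(span_int, "016x")  # decimal -> 16 hex characters
    flags = "01" if options == "1" else "00"  # sampled flag
    return f"00-{trace_id}-{span_hex}-{flags}"


print(legacy_to_traceparent("105445aa7843bc8bf206b12000100000/1;o=1"))
```

The main difference to notice is the span ID encoding: decimal in the legacy header, fixed-width hex in traceparent.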

A simple request flow example

Here is what a single checkout request looks like as a trace. The user submits a form:

[ROOT]  POST /checkout  (load balancer)  1850ms
  └── [CHILD]  POST /checkout  (Cloud Run: api-service)  1820ms
        ├── [CHILD]  check-inventory  (Cloud Run: inventory-service)  1400ms
        │     └── [CHILD]  SELECT * FROM inventory  (Cloud SQL)  1350ms  ← bottleneck
        ├── [CHILD]  charge-card  (external payment API)  310ms
        └── [CHILD]  send-confirmation  (Pub/Sub publish)  40ms

Each row is a span. The load balancer span is the root. The Cloud Run spans are its descendants. The Cloud SQL query is a child of the inventory-service span (and a grandchild of the api-service span), and it takes 1350ms of the 1850ms total. That is where you look next, and you got there from the waterfall, not by guessing.

Without tracing, you would see a slow endpoint in your Cloud Monitoring dashboard and have to work backwards through logs from each service. With tracing, the bottleneck is visible at a glance.
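The same reading can be done numerically. The sketch below computes each span's self time (its duration minus the time spent in its direct children) for the checkout trace above. It assumes the children of a span run one after another without overlapping, which the example does not state; the short span labels are my own shorthand.

```python
# (name, parent, duration_ms) taken from the checkout trace above
SPANS = [
    ("lb: POST /checkout", None, 1850),
    ("api-service: POST /checkout", "lb: POST /checkout", 1820),
    ("check-inventory", "api-service: POST /checkout", 1400),
    ("SELECT * FROM inventory", "check-inventory", 1350),
    ("charge-card", "api-service: POST /checkout", 310),
    ("send-confirmation", "api-service: POST /checkout", 40),
]


def self_times(spans):
    """Map span name -> duration not accounted for by its direct children.

    Assumes sibling spans do not overlap in time.
    """
    child_total = {}
    for _, parent, duration in spans:
        if parent is not None:
            child_total[parent] = child_total.get(parent, 0) + duration
    return {
        name: duration - child_total.get(name, 0)
        for name, _, duration in spans
    }


times = self_times(SPANS)
bottleneck = max(times, key=times.get)

for name, ms in sorted(times.items(), key=lambda item: item[1], reverse=True):
    print(f"{ms:>5}ms  {name}")
```

Ranking by self time rather than total duration avoids blaming a parent span for time that was really spent in one of its children.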

Instrumenting a Python application

The setup below uses the OpenTelemetry SDK with the Cloud Trace exporter, a direct path that works without a collector and is a practical starting point for GCP-only applications. Install the required packages:

pip install opentelemetry-sdk \
            opentelemetry-exporter-gcp-trace \
            opentelemetry-instrumentation-flask

Configure the tracer provider at application startup:

from opentelemetry import trace
from opentelemetry.exporter.cloud_trace import CloudTraceSpanExporter
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor

# Register a provider that batches spans and sends them to Cloud Trace.
# The exporter reads the project and credentials from the environment.
provider = TracerProvider()
provider.add_span_processor(
    BatchSpanProcessor(CloudTraceSpanExporter())
)
trace.set_tracer_provider(provider)

tracer = trace.get_tracer(__name__)

# Placeholder values for illustration; in a real handler these come from the request
order_id = "ord-12345"
order_total = 49.99

# Create a custom span for a business operation
with tracer.start_as_current_span("process-order") as span:
    span.set_attribute("order.id", order_id)
    span.set_attribute("order.total_usd", order_total)
    # your business logic here

The BatchSpanProcessor buffers spans and exports them asynchronously, so your request handler is not blocked on the network call to Cloud Trace. The same OTel API works in Go, Java, and Node.js; only the exporter package changes. For Flask, the auto-instrumentation library (opentelemetry-instrumentation-flask) wraps HTTP handlers in spans automatically; FastAPI has its own equivalent package, opentelemetry-instrumentation-fastapi.

Tip

Add span attributes that capture business context. After starting a span, call span.set_attribute("order.id", order_id). When a customer reports a problem with a specific order, you can filter Cloud Trace by that attribute to find the exact trace instead of scanning logs by timestamp.

Reading the Cloud Trace console

Trace list

A table of recent traces sorted by time or latency. Each row shows the root span’s HTTP method, URL, total latency, span count, and status. Filter by URL pattern, status code, latency range, or time window. This is your starting point: find a representative slow request, then click into it.

Waterfall view

The waterfall renders the span tree as horizontal bars on a shared time axis. The x-axis is milliseconds from the start of the request. Each bar is one span. Child spans appear indented below their parent.

How to read it: look for the widest bars. A wide bar on a database query span means the database is the bottleneck. A gap between the start of a parent span and the start of its first child, or between one child ending and the next one starting, means time spent on something not captured as a span (network overhead, middleware, or uninstrumented code). The attributes panel on the right shows the HTTP method, URL, status code, database query, and any custom attributes you added.
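The gaps you look for by eye can also be computed. This sketch takes a parent span's interval and its children's intervals and returns the stretches of the parent that no child covers. The function name and the sample timings are hypothetical; the intervals are (start_ms, end_ms) pairs measured from the start of the request.

```python
def uncovered_gaps(parent, children):
    """Return (start, end) intervals inside `parent` not covered by any child.

    Children may overlap each other (parallel calls); anything outside the
    parent's own interval is clamped to it.
    """
    p_start, p_end = parent
    gaps = []
    cursor = p_start  # everything before the cursor is already accounted for
    for c_start, c_end in sorted(children):
        c_start = min(max(c_start, p_start), p_end)
        c_end = min(max(c_end, p_start), p_end)
        if c_start > cursor:
            gaps.append((cursor, c_start))
        cursor = max(cursor, c_end)
    if cursor < p_end:
        gaps.append((cursor, p_end))
    return gaps


# Hypothetical handler span of 500ms with two overlapping child calls.
parent_span = (0, 500)
child_spans = [(120, 300), (250, 400)]
gaps = uncovered_gaps(parent_span, child_spans)
print(gaps)  # [(0, 120), (400, 500)]
```

Here 220ms of the 500ms parent is not explained by any child span, which is the cue to add instrumentation or to look at a profiler for that service.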

Analogy

Think of the waterfall like a multi-track timeline in a video editor. Each track is a service. Each clip is a span. You can immediately see which clip is the longest, which ones run in parallel, and where there are gaps between takes. In Cloud Trace, the longest clip is your bottleneck.

Latency distribution

The latency distribution view aggregates data across many traces for a specific endpoint and shows a histogram: how many requests fell into each latency bucket. This reveals the shape of your latency, not just the average. A bimodal distribution (two humps) often means two distinct code paths with very different performance characteristics, which averages completely hide.

Why averages mislead

Imagine tracking commute times for a month. The average is about 27 minutes, which sounds fine. But the histogram shows that 80% of commutes take 18 minutes and 20% take 65 minutes. The average is not representative of any real commute. Latency distributions work the same way: if 1 in 5 requests takes 3 seconds and the rest take 100ms, the average is 680ms, a number that describes neither group and hides the 3-second requests entirely.
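To see the effect with numbers, the sketch below takes the case where 1 in 5 requests takes 3 seconds and assumes, purely for illustration, that the other requests take 100ms. It compares the mean with nearest-rank percentiles; the percentile helper is a simple hand-rolled version, not what Cloud Trace uses internally.

```python
import math
from statistics import mean


def percentile(values, pct):
    """Nearest-rank percentile: the smallest value with at least
    pct percent of the samples at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# 100 requests: 80 fast ones (assumed 100ms) and 20 slow ones (3000ms)
latencies_ms = [100] * 80 + [3000] * 20

avg = mean(latencies_ms)
p50 = percentile(latencies_ms, 50)
p95 = percentile(latencies_ms, 95)
p99 = percentile(latencies_ms, 99)

print(f"mean={avg}ms p50={p50}ms p95={p95}ms p99={p99}ms")
# mean=680ms p50=100ms p95=3000ms p99=3000ms
```

No request actually took 680ms. The median shows what most users experience and the high percentiles expose the slow group, which is why the distribution view is more useful than a single average.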

Comparing time periods and spotting regressions

Cloud Trace lets you compare latency distributions between two time windows: before and after a deploy, for example. If p99 moved from 800ms to 2 seconds after Wednesday’s release, the comparison view makes that shift immediately visible. The auto-analysis feature goes further: it scans a sample of traces and surfaces which spans are consistently slow across requests, instead of requiring you to click through individual traces manually.

How to know it is working

After deploying instrumentation, confirm each of these before relying on the data:

Traces appear in the Cloud Trace list for your project
A root span is visible for each incoming request
Child spans are present for database calls, cache lookups, or external API calls you instrumented
Span attributes (HTTP method, URL, status code) are populated
If you configured log correlation, a Logs tab appears within individual traces in the console

Empty waterfall

If you see a single root span with no child spans, your application-level instrumentation is not working. GCP services provide automatic boundary traces, but nothing inside. A waterfall with one bar means you are only seeing the outer shell of each request. Add application spans before trying to diagnose anything from the data.

If traces are not appearing at all, check that the service account running your application has the Cloud Trace Agent IAM role (roles/cloudtrace.agent), and that the GOOGLE_CLOUD_PROJECT environment variable is set correctly, especially for local development runs.

Jumping between traces and logs

Cloud Trace and Cloud Logging integrate when your structured logs include the trace correlation field. Set logging.googleapis.com/trace to projects/PROJECT_ID/traces/TRACE_ID in your structured JSON log output. Once linked, each trace in the Cloud Trace console shows a Logs tab listing the log entries emitted during that specific request.

The reverse also works: in Logs Explorer, log entries with a trace link show a trace icon. Clicking it jumps directly to the trace in Cloud Trace. You find the error in the logs, then immediately see the full request timeline that produced it. This cross-signal jump is one of the most practical things in the GCP observability suite. For how all these tools fit into a structured incident investigation, see Debugging Production Systems.
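As a concrete illustration of the correlation field described above, this sketch builds one structured log line that carries the trace link. The helper name, project ID, and IDs are hypothetical; it assumes your platform ingests JSON lines written to stdout as structured logs, and it also sets the optional logging.googleapis.com/spanId field so the entry can be tied to a specific span.

```python
import json


def trace_log_line(message, severity, project_id, trace_id, span_id=None):
    """Return a JSON log line linked to a trace (and optionally a span)."""
    entry = {
        "severity": severity,
        "message": message,
        # Correlation field: projects/PROJECT_ID/traces/TRACE_ID
        "logging.googleapis.com/trace": f"projects/{project_id}/traces/{trace_id}",
    }
    if span_id is not None:
        entry["logging.googleapis.com/spanId"] = span_id
    return json.dumps(entry)


# Hypothetical values; in practice the IDs come from the active span context.
line = trace_log_line(
    message="inventory query exceeded 1s",
    severity="WARNING",
    project_id="my-project",
    trace_id="4bf92f3577b34da6a3ce929d0e0e4736",
    span_id="00f067aa0ba902b7",
)
print(line)
```

In a real service you would typically wire this into a logging formatter or filter so every entry picks up the current trace ID automatically rather than passing it by hand.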

One field, two-way navigation

Adding logging.googleapis.com/trace to your structured log output is a single line of configuration in most frameworks. Once it is there, every log entry links to its trace and every trace shows its log entries. That bidirectional jump eliminates most of the manual searching during an incident.

Cloud Trace vs Cloud Monitoring vs Cloud Logging vs Cloud Profiler

Each tool answers a different question. Use them together, not as alternatives:

Tool	Best for	What it shows	Typical question it answers	Use it first when
Cloud Trace	Latency debugging across services	Request timeline, span durations, bottleneck location	Which service or step is slow?	A specific endpoint has elevated latency
Cloud Monitoring	Dashboards, alerting, system health	Metrics, uptime, SLO burn rate, alert history	Is the system healthy right now?	You need a dashboard or an alert to fire
Cloud Logging	Detailed event investigation	Log entries, errors, structured fields, query text	What exactly happened at this step?	You see an error and need the full details
Cloud Profiler	CPU and memory hotspots inside code	Flame graphs, function-level CPU and heap usage	Which function is burning all the CPU?	Trace shows app code is slow with no sub-spans

A common workflow: Cloud Monitoring fires an alert on elevated p99 latency, Cloud Trace pinpoints the slow span, Cloud Logging shows the error details for that request, and Cloud Profiler shows which function inside that slow span is consuming CPU. See also Monitoring Cloud Run and Monitoring GKE for platform-specific signal combinations.

Common beginner mistakes

Relying only on automatic GCP traces without adding application spans. Automatic traces show you request boundaries: how long each service took to handle the request end-to-end. They do not show what happened inside the service. Without custom spans, the waterfall has one wide bar and no information about which database query, cache lookup, or external call consumed the time.
Not linking logs to traces. Traces and logs are most valuable together. If you do not include the logging.googleapis.com/trace field in your structured log entries, you lose the ability to jump from a slow span to the logs it generated, and vice versa.
Looking at individual traces instead of the distribution. A single slow trace might be an anomaly: a cold start, a noisy neighbor, a one-off spike. The latency distribution view shows whether slowness is consistent, worsening, or clustered at a specific time. Check distribution trends before drawing conclusions from one example.
Skipping span attributes. A span that records only its duration tells you that something happened and how long it took, and nothing else. Add HTTP method, URL, status code, database query, and relevant business context like order ID or user ID. Attributes make traces searchable and dramatically speed up investigation when something goes wrong in production.
Using the Cloud Trace API directly instead of OpenTelemetry. Writing directly to the Cloud Trace API ties your instrumentation to GCP. If you want to run traces locally with Jaeger, or move to a hybrid environment, you have to re-instrument everything. OpenTelemetry gives you the same data with a portable API and a clear migration path.

Frequently asked questions

Does Cloud Trace work with OpenTelemetry?

Yes. Cloud Trace integrates with OpenTelemetry. You can instrument your code with the OTel SDK and send spans to Cloud Trace using the Cloud Trace exporter, or route them through an OpenTelemetry Collector using the OTLP pipeline. The OTel approach keeps your instrumentation vendor-neutral and portable across backends.

What is the difference between a trace and a log?

A trace tracks the structure and timing of one request as it moves across services. A log records what happened at a specific moment inside one service. Traces answer "where did the time go?" Logs answer "what exactly happened at that step?" They are most useful together: link your structured logs to traces using the logging.googleapis.com/trace field and you can jump between both signals in the GCP console.

Can Cloud Trace help with intermittent latency spikes?

Yes, this is one of its strongest use cases. Filter the trace list by high-latency traces for a specific endpoint, then compare slow traces against normal ones. Look for spans that are wide only in the slow traces. The latency distribution view also shows whether spikes cluster around a specific time or deployment, making regressions easy to spot.

Do I need custom spans or are automatic traces enough?

Automatic traces from GCP services show you request boundaries (how long each service took end-to-end). Without custom spans you cannot see inside a service: which database query ran, which cache lookup was slow, which external API call timed out. For real debugging you need application-level spans.

When should I use Cloud Profiler instead of Cloud Trace?

Use Cloud Trace to find which service or span is slow. Use Cloud Profiler to find which function inside that service is slow. If the waterfall shows a wide span on your application code with no obvious database or API call underneath it, that is the signal to open Cloud Profiler and look at CPU and wall-time data for that service.

Last verified: 25 March 2026. Cloud services change frequently. Verify details against official documentation before making infrastructure decisions.