What Is Google Cloud Trace? Spans, Waterfalls, and Latency
When a request through your application is slow, the first question is: which part? Cloud Trace is Google Cloud’s distributed tracing service. It records the timeline of a request as it moves through your services, stores it as a set of spans, and renders it as a waterfall view so you can see exactly where time was spent. This page explains what Cloud Trace is, how it works, how to read the console, and when to use it instead of Cloud Monitoring, Cloud Logging, or Cloud Profiler.
The simple explanation
Cloud Trace is a storage and visualization layer for distributed traces in GCP. When a user makes a request to your application, that request touches multiple services: a load balancer, a Cloud Run instance, a database, maybe an external API. Cloud Trace records how long each step took, assembles those measurements into a timeline, and stores it so you can query it later.
The result is a waterfall view: a horizontal bar chart where each bar is one step (called a span), placed on a shared time axis. Wide bars are slow steps. That is the core insight Cloud Trace provides.
What Cloud Trace does
Cloud Trace gives you three things:
- A searchable history of traces. Filter by endpoint, status code, latency range, or time window to find the specific slow request you are trying to diagnose.
- A waterfall view for each trace. Shows every span as a bar on a shared timeline so you can see parent-child relationships and identify the slow step at a glance.
- Latency distribution analysis. Aggregates latency across many traces to show p50, p95, and p99 percentiles and how those change over time. This is how you catch regressions before they escalate.
Cloud Trace does not replace Cloud Monitoring for dashboards and alerting, or Logs Explorer for raw event details. It works alongside them. The three tools answer different questions about the same system.
When to use Cloud Trace
Reach for Cloud Trace when you have a latency problem and need to find where time is going:
- A Cloud Run endpoint that normally responds in 200ms is now taking 2 seconds
- A GKE microservice chain where one downstream dependency is slow but you do not know which one
- Intermittent latency spikes that are invisible in averages but visible in your p99 metric
- A deploy that went out yesterday and latency has been slowly creeping up since
- You want to know whether the slow step is in application code, a database query, a cache miss, or an external API call
When to reach for a different tool first
Cloud Trace is the right tool for latency questions. For other questions, start elsewhere:
- For infrastructure health, uptime, and alerting: Cloud Monitoring
- For detailed event investigation inside a service: Cloud Logging
- For CPU or memory hotspots inside your application code: Cloud Profiler
If you jump straight to Cloud Trace at the start of an incident, you may spend time in the wrong tool. Confirm the problem exists and which endpoint is affected in Cloud Monitoring first. Traces are for diagnosing a known latency problem, not for discovering that one exists.
How Cloud Trace works
Cloud Trace is built around two concepts: traces and spans.
Traces and spans
A trace represents the complete journey of one request through your system. Every service that touches the request shares the same trace ID, a globally unique identifier that ties all the pieces together.
A span is one unit of work within that journey: one service handling part of the request, one database query, one external API call. Each span records a start time, an end time, and a set of attributes (key-value metadata like HTTP method, URL, status code, database query text, or order ID).
Spans form a parent-child tree. When Service A calls Service B, the span for Service B is a child of the span for Service A. The waterfall view renders this tree as indented horizontal bars on a shared timeline.
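The parent-child structure can be sketched in a few lines of plain Python. This is an illustrative model only, not Cloud Trace's storage format: the `Span` class, `build_tree`, `render`, and the sample span names and timings are all hypothetical. It assumes every non-root span's parent is present in the list.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Span:
    span_id: str
    parent_id: Optional[str]  # None marks the root span
    name: str
    start_ms: float
    end_ms: float
    attributes: dict = field(default_factory=dict)
    children: list = field(default_factory=list)

    @property
    def duration_ms(self) -> float:
        return self.end_ms - self.start_ms


def build_tree(spans):
    """Link each span to its parent and return the root span."""
    by_id = {s.span_id: s for s in spans}
    root = None
    for s in spans:
        if s.parent_id is None:
            root = s
        else:
            by_id[s.parent_id].children.append(s)
    return root


def render(span, depth=0):
    """Return the tree as indented lines, children ordered by start time."""
    lines = [f"{'  ' * depth}{span.name} ({span.duration_ms:.0f}ms)"]
    for child in sorted(span.children, key=lambda c: c.start_ms):
        lines.extend(render(child, depth + 1))
    return lines


# Spans arrive in arbitrary order; the IDs are what tie them together.
spans = [
    Span("b", "a", "service-b: handle", 20, 180),
    Span("a", None, "service-a: GET /items", 0, 200, {"http.method": "GET"}),
    Span("c", "b", "db: SELECT", 40, 160),
]
root = build_tree(spans)
print("\n".join(render(root)))
```

The indentation in the printed output corresponds to the indented bars in the waterfall view.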
Context propagation
For spans from different services to join the same trace, the trace ID must travel with the request. This is done via HTTP headers. The standard is W3C Trace Context: the traceparent header carries the trace ID and the parent span ID. Each service reads the incoming header, creates its own child span, and passes an updated header with any outbound calls it makes.
When you use OpenTelemetry with an instrumented HTTP client, this propagation is handled automatically. For a deeper look at the underlying concepts, sampling strategies, and how context propagation works across service boundaries, see Distributed Tracing.
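To make the header handling concrete, here is a minimal sketch of what a service does with traceparent, written in plain Python. It is a simplification of the W3C format (version, 32-hex trace ID, 16-hex parent span ID, flags), not the code OpenTelemetry runs, and the function names are hypothetical.

```python
import re
import secrets

# version - trace ID (32 hex) - parent span ID (16 hex) - flags (2 hex)
TRACEPARENT_RE = re.compile(
    r"^(?P<version>[0-9a-f]{2})-(?P<trace_id>[0-9a-f]{32})-"
    r"(?P<parent_id>[0-9a-f]{16})-(?P<flags>[0-9a-f]{2})$"
)


def parse_traceparent(header):
    """Split a traceparent header into its four fields, or return None."""
    match = TRACEPARENT_RE.match(header.strip())
    if match is None:
        return None
    fields = match.groupdict()
    # All-zero trace IDs and span IDs are treated as invalid.
    if set(fields["trace_id"]) == {"0"} or set(fields["parent_id"]) == {"0"}:
        return None
    return fields


def child_traceparent(incoming, new_span_id=None):
    """Header for an outbound call: same trace ID, this service's span as parent."""
    fields = parse_traceparent(incoming)
    if fields is None:
        raise ValueError("invalid traceparent header")
    span_id = new_span_id or secrets.token_hex(8)  # 8 random bytes -> 16 hex chars
    return f"{fields['version']}-{fields['trace_id']}-{span_id}-{fields['flags']}"


incoming = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
outgoing = child_traceparent(incoming, "b7ad6b7169203331")
print(outgoing)
```

The trace ID is unchanged between the incoming and outgoing headers. Only the span ID is replaced, which is what lets the backend attach the downstream span to the right parent.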
A trace is like tracking a package through a courier network. The package label (trace ID) travels with the parcel at every handoff. Each facility scans it and records the arrival and departure time. Cloud Trace is the system that assembles all those scans into a single timeline.
How traces reach Cloud Trace
Traces reach Cloud Trace through several paths, from most recommended to most low-level:
OpenTelemetry with OTLP (recommended for new setups)
OpenTelemetry is the vendor-neutral open standard for distributed tracing. Instrument your code with the OTel SDK, then export spans using the OTLP exporter. For GCP, you can route spans through an OpenTelemetry Collector configured with the googlecloud exporter, which handles authentication and delivery to Cloud Trace. This approach keeps your application code fully portable: if you switch backends or run locally with a Jaeger collector, you change the collector configuration, not your application code.
OpenTelemetry with the Cloud Trace exporter (direct GCP option)
For simpler GCP-only setups, the Cloud Trace exporter (CloudTraceSpanExporter in Python, with equivalent exporters for Go, Java, and Node.js) sends spans directly to Cloud Trace without needing a collector. This is a valid, working option for applications that run entirely on GCP. It trades portability for simplicity.
Automatic collection from GCP services
Cloud Run, GKE, App Engine, Cloud Functions, and the Cloud Load Balancer automatically emit traces for incoming requests. No instrumentation required for this baseline data. These automatic traces show you request boundaries but not what happens inside your application code, which is why adding application-level spans matters.
Cloud Trace API (low-level)
You can write trace data directly using the Cloud Trace REST or gRPC API. This is the most GCP-specific option and ties your instrumentation to GCP. For new projects, the OTel path is the better starting point.
GCP services like Cloud Run and GKE automatically propagate trace context using both W3C Trace Context and the legacy X-Cloud-Trace-Context header. If all your services run on GCP, some cross-service propagation works without any custom instrumentation. Adding OTel gives you richer span data, custom attributes, and full visibility inside your application code.
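The two header formats carry the same information in different shapes. The sketch below converts a legacy header to a traceparent value. It assumes the commonly documented legacy layout TRACE_ID/SPAN_ID;o=OPTIONS, with a 32-character hex trace ID, a decimal span ID, and o=1 meaning the request is traced. The function name is hypothetical, and a production service would normally let a propagator library do this.

```python
import re

# Assumed legacy layout: TRACE_ID[/SPAN_ID][;o=OPTIONS]
LEGACY_RE = re.compile(
    r"^(?P<trace_id>[0-9a-fA-F]{32})(?:/(?P<span_id>\d+))?(?:;o=(?P<options>\d+))?$"
)


def legacy_to_traceparent(header):
    """Convert an X-Cloud-Trace-Context value to a traceparent value, or None."""
    match = LEGACY_RE.match(header.strip())
    if match is None:
        return None
    span_id = int(match.group("span_id") or 0)
    # traceparent needs a non-zero span ID that fits in 64 bits.
    if span_id == 0 or span_id >= 2**64:
        return None
    trace_id = match.group("trace_id").lower()
    flags = "01" if match.group("options") == "1" else "00"
    # The legacy span ID is decimal; traceparent uses 16 hex characters.
    return f"00-{trace_id}-{span_id:016x}-{flags}"


print(legacy_to_traceparent("105445aa7843bc8bf206b12000100000/1;o=1"))
```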
A simple request flow example
Here is what a single checkout request looks like as a trace. The user submits a form:
[ROOT] POST /checkout (load balancer) 1850ms
└── [CHILD] POST /checkout (Cloud Run: api-service) 1820ms
    ├── [CHILD] check-inventory (Cloud Run: inventory-service) 1400ms
    │   └── [CHILD] SELECT * FROM inventory (Cloud SQL) 1350ms ← bottleneck
    ├── [CHILD] charge-card (external payment API) 310ms
    └── [CHILD] send-confirmation (Pub/Sub publish) 40ms

Each row is a span. The load balancer span is the root. The Cloud Run spans are children. The Cloud SQL query is a grandchild of the inventory service and takes 1350ms of a 1850ms total. That is where you look next. Not by guessing. The waterfall made it obvious.
Without tracing, you would see a slow endpoint in your Cloud Monitoring dashboard and have to work backwards through logs from each service. With tracing, the bottleneck is visible at a glance.
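One way to make "the bottleneck is visible" precise is self time: a span's duration minus the time covered by its direct children. The sketch below applies that to the checkout trace above. It assumes the children of each span run one after another without overlapping, and the tuple layout and `self_times` helper are hypothetical.

```python
# (name, duration_ms, children) tuples mirroring the checkout trace above
TRACE = ("POST /checkout (load balancer)", 1850, [
    ("POST /checkout (api-service)", 1820, [
        ("check-inventory (inventory-service)", 1400, [
            ("SELECT * FROM inventory (Cloud SQL)", 1350, []),
        ]),
        ("charge-card (external payment API)", 310, []),
        ("send-confirmation (Pub/Sub publish)", 40, []),
    ]),
])


def self_times(node):
    """Yield (name, self_time_ms) for a span and all of its descendants."""
    name, duration, children = node
    child_total = sum(child[1] for child in children)
    yield name, duration - child_total
    for child in children:
        yield from self_times(child)


# Rank spans by the time they spent in their own work, slowest first.
ranked = sorted(self_times(TRACE), key=lambda item: item[1], reverse=True)
for name, ms in ranked:
    print(f"{ms:>5}ms  {name}")
```

The Cloud SQL query ranks first with 1350ms of self time. The inventory service that wraps it contributes only 50ms of its own, which is why the wide bar on the service span is not the place to optimize.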
Instrumenting a Python application
The setup below uses the OpenTelemetry SDK with the Cloud Trace exporter, a direct path that works without a collector and is a practical starting point for GCP-only applications. Install the required packages:
pip install opentelemetry-sdk \
    opentelemetry-exporter-gcp-trace \
    opentelemetry-instrumentation-flask

Configure the tracer provider at application startup:
from opentelemetry import trace
from opentelemetry.exporter.cloud_trace import CloudTraceSpanExporter
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor

# Register a provider that batches spans and exports them to Cloud Trace
provider = TracerProvider()
provider.add_span_processor(
    BatchSpanProcessor(CloudTraceSpanExporter())
)
trace.set_tracer_provider(provider)
tracer = trace.get_tracer(__name__)

# Placeholder values for illustration; in a real handler these come from the request
order_id = "order-1234"
order_total = 59.90

# Create a custom span for a business operation
with tracer.start_as_current_span("process-order") as span:
    span.set_attribute("order.id", order_id)
    span.set_attribute("order.total_usd", order_total)
    # your business logic here

The BatchSpanProcessor buffers spans and exports them asynchronously, so your request handler is not blocked on the network call to Cloud Trace. The same OTel concepts and API shape carry over to Go, Java, and Node.js; only the exporter package changes. For Flask or FastAPI, the matching auto-instrumentation library (opentelemetry-instrumentation-flask or opentelemetry-instrumentation-fastapi) wraps HTTP handlers in spans automatically.
Add span attributes that capture business context. After starting a span, call span.set_attribute("order.id", order_id). When a customer reports a problem with a specific order, you can filter Cloud Trace by that attribute to find the exact trace instead of scanning logs by timestamp.
Reading the Cloud Trace console
Trace list
A table of recent traces sorted by time or latency. Each row shows the root span’s HTTP method, URL, total latency, span count, and status. Filter by URL pattern, status code, latency range, or time window. This is your starting point: find a representative slow request, then click into it.
Waterfall view
The waterfall renders the span tree as horizontal bars on a shared time axis. The x-axis is milliseconds from the start of the request. Each bar is one span. Child spans appear indented below their parent.
How to read it: look for the widest bars. A wide bar on a database query span means the database is the bottleneck. A gap between the start of a parent span and the start of its first child, or between one child ending and the next beginning, means time spent on something not captured as a span (network overhead, middleware, or uninstrumented code). The attributes panel on the right shows the HTTP method, URL, status code, database query, and any custom attributes you added.
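The layout rule behind the waterfall is simple enough to sketch as text: each bar's left offset and width are its start time and duration scaled to a shared axis. The example below is a rough approximation for illustration, not how the console renders. The `waterfall` and `leading_gap_ms` helpers and the sample spans are hypothetical.

```python
def waterfall(spans, width=40):
    """spans: list of (depth, name, start_ms, end_ms). Returns text rows."""
    total = max(end for _, _, _, end in spans)
    rows = []
    for depth, name, start, end in spans:
        left = round(start / total * width)                 # offset on the shared axis
        bar = max(1, round((end - start) / total * width))  # width proportional to duration
        label = "  " * depth + name                         # indent children under parents
        rows.append(f"{label:<28}|{' ' * left}{'#' * bar}")
    return rows


def leading_gap_ms(parent, children):
    """Time between the parent starting and its first child starting."""
    return min(child[2] for child in children) - parent[2]


spans = [
    (0, "GET /report", 0, 1000),
    (1, "load-user", 300, 400),
    (1, "db: SELECT report", 400, 950),
]
rows = waterfall(spans)
print("\n".join(rows))
print("uninstrumented lead-in:", leading_gap_ms(spans[0], spans[1:]), "ms")
```

In this sample the first child does not start until 300ms into the request. That blank stretch under the root bar is the kind of gap that points to work no span is covering.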
Think of the waterfall like a multi-track timeline in a video editor. Each track is a service. Each clip is a span. You can immediately see which clip is the longest, which ones run in parallel, and where there are gaps between takes. In Cloud Trace, the longest clip is your bottleneck.
Latency distribution
The latency distribution view aggregates data across many traces for a specific endpoint and shows a histogram: how many requests fell into each latency bucket. This reveals the shape of your latency, not just the average. A bimodal distribution (two humps) often means two distinct code paths with very different performance characteristics, which averages completely hide.
Imagine tracking commute times for a month. The average is about 27 minutes, which sounds fine. But the histogram shows that 80% of commutes take 18 minutes and 20% take 65 minutes. The average is not representative of any real commute. Latency distributions work the same way: if 1 in 5 requests takes 3 seconds and the rest take 100ms, the 680ms average hides the problem entirely.
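The same point can be checked numerically. The sketch below uses a made-up sample of 80 requests at 100ms and 20 at 3 seconds, and compares the mean with nearest-rank percentiles and a bucketed histogram. The bucket edges and helper names are assumptions for illustration; Cloud Trace computes its own buckets.

```python
import math
import statistics


def percentile(values, pct):
    """Nearest-rank percentile: the value at rank ceil(pct% of n)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def histogram(values, edges):
    """Count values in each [edges[i], edges[i+1]) bucket."""
    counts = [0] * (len(edges) - 1)
    for value in values:
        for i in range(len(edges) - 1):
            if edges[i] <= value < edges[i + 1]:
                counts[i] += 1
                break
    return counts


# Hypothetical bimodal sample: a fast path and a slow path
latencies_ms = [100] * 80 + [3000] * 20

print("mean:", statistics.mean(latencies_ms))
print("p50:", percentile(latencies_ms, 50))
print("p95:", percentile(latencies_ms, 95))
print("buckets:", histogram(latencies_ms, [0, 250, 500, 1000, 2000, 4000]))
```

No request in the sample takes anywhere near the mean. The histogram shows two separate humps, and p95 lands on the slow path.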
Comparing time periods and spotting regressions
Cloud Trace lets you compare latency distributions between two time windows: before and after a deploy, for example. If p99 moved from 800ms to 2 seconds after Wednesday’s release, the comparison view makes that shift immediately visible. The auto-analysis feature goes further: it scans a sample of traces and surfaces which spans are consistently slow across requests, instead of requiring you to click through individual traces manually.
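A before-and-after comparison boils down to computing the same percentiles over two windows and flagging the ones that moved. The following is a minimal sketch of that idea, not the console's comparison or auto-analysis logic. The samples, the 1.5x threshold, and the function names are all assumptions.

```python
import math


def percentiles(latencies_ms, points=(50, 95, 99)):
    """Nearest-rank percentiles for one time window."""
    ordered = sorted(latencies_ms)
    return {
        p: ordered[max(1, math.ceil(p * len(ordered) / 100)) - 1]
        for p in points
    }


def regressions(before_ms, after_ms, threshold=1.5):
    """Percentiles that grew by at least `threshold` times between windows."""
    before, after = percentiles(before_ms), percentiles(after_ms)
    return {
        p: (before[p], after[p])
        for p in before
        if after[p] >= threshold * before[p]
    }


# Hypothetical windows: only the slowest 2% of requests got worse after the deploy
before_deploy = [200] * 98 + [800] * 2
after_deploy = [200] * 98 + [2000] * 2

print(regressions(before_deploy, after_deploy))
```

Here p50 and p95 are unchanged and only p99 is flagged, moving from 800ms to 2000ms. A regression confined to the tail like this would not show up in an average.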
How to know it is working
After deploying instrumentation, confirm each of these before relying on the data:
- Traces appear in the Cloud Trace list for your project
- A root span is visible for each incoming request
- Child spans are present for database calls, cache lookups, or external API calls you instrumented
- Span attributes (HTTP method, URL, status code) are populated
- If you configured log correlation, a Logs tab appears within individual traces in the console
If you see a single root span with no child spans, your application-level instrumentation is not working. GCP services provide automatic boundary traces, but nothing inside. A waterfall with one bar means you are only seeing the outer shell of each request. Add application spans before trying to diagnose anything from the data.
If traces are not appearing at all, check that the service account running your application has the Cloud Trace Agent IAM role (roles/cloudtrace.agent), and that the GOOGLE_CLOUD_PROJECT environment variable is set correctly, especially for local development runs.
Jumping between traces and logs
Cloud Trace and Cloud Logging integrate when your structured logs include the trace correlation field. Set logging.googleapis.com/trace to projects/PROJECT_ID/traces/TRACE_ID in your structured JSON log output. Once linked, the Cloud Trace console shows a Logs tab within any trace that contains the log entries emitted during that specific request.
The reverse also works: in Logs Explorer, log entries with a trace link show a trace icon. Clicking it jumps directly to the trace in Cloud Trace. You find the error in the logs, then immediately see the full request timeline that produced it. This cross-signal jump is one of the most practical things in the GCP observability suite. For how all these tools fit into a structured incident investigation, see Debugging Production Systems.
Adding logging.googleapis.com/trace to your structured log output is a single line of configuration in most frameworks. Once it is there, every log entry links to its trace and every trace shows its log entries. That bidirectional jump eliminates most of the manual searching during an incident.
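As an illustration, here is a minimal sketch of a JSON log formatter that adds the correlation field, using only the standard library. The project ID, logger name, and the trace_id / span_id record attributes are hypothetical. The logging.googleapis.com/spanId key is an additional special field assumed here to link the entry to a specific span. In a real service the IDs would come from the active span context rather than being passed by hand.

```python
import json
import logging
import sys

PROJECT_ID = "my-project"  # hypothetical project ID


class TraceJsonFormatter(logging.Formatter):
    """Emit one JSON object per log line, with trace correlation fields."""

    def format(self, record):
        entry = {
            "severity": record.levelname,
            "message": record.getMessage(),
        }
        # trace_id and span_id are passed via `extra=` by the caller (assumption).
        trace_id = getattr(record, "trace_id", None)
        if trace_id:
            entry["logging.googleapis.com/trace"] = (
                f"projects/{PROJECT_ID}/traces/{trace_id}"
            )
        span_id = getattr(record, "span_id", None)
        if span_id:
            entry["logging.googleapis.com/spanId"] = span_id
        return json.dumps(entry)


handler = logging.StreamHandler(sys.stdout)
handler.setFormatter(TraceJsonFormatter())
logger = logging.getLogger("checkout")
logger.addHandler(handler)
logger.setLevel(logging.INFO)
logger.propagate = False

logger.info(
    "inventory lookup slow",
    extra={
        "trace_id": "4bf92f3577b34da6a3ce929d0e0e4736",
        "span_id": "00f067aa0ba902b7",
    },
)
```

Log lines emitted without a trace ID simply omit the field, so the formatter is safe to use for background jobs and startup messages as well.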
Cloud Trace vs Cloud Monitoring vs Cloud Logging vs Cloud Profiler
Each tool answers a different question. Use them together, not as alternatives:
| Tool | Best for | What it shows | Typical question it answers | Use it first when |
|---|---|---|---|---|
| Cloud Trace | Latency debugging across services | Request timeline, span durations, bottleneck location | Which service or step is slow? | A specific endpoint has elevated latency |
| Cloud Monitoring | Dashboards, alerting, system health | Metrics, uptime, SLO burn rate, alert history | Is the system healthy right now? | You need a dashboard or an alert to fire |
| Cloud Logging | Detailed event investigation | Log entries, errors, structured fields, query text | What exactly happened at this step? | You see an error and need the full details |
| Cloud Profiler | CPU and memory hotspots inside code | Flame graphs, function-level CPU and heap usage | Which function is burning all the CPU? | Trace shows app code is slow with no sub-spans |
A common workflow: Cloud Monitoring fires an alert on elevated p99 latency, Cloud Trace pinpoints the slow span, Cloud Logging shows the error details for that request, and Cloud Profiler shows which function inside that slow span is consuming CPU. See also Monitoring Cloud Run and Monitoring GKE for platform-specific signal combinations.
Common beginner mistakes
- Relying only on automatic GCP traces without adding application spans. Automatic traces show you request boundaries: how long each service took to handle the request end-to-end. They do not show what happened inside the service. Without custom spans, the waterfall has one wide bar and no information about which database query, cache lookup, or external call consumed the time.
- Not linking logs to traces. Traces and logs are most valuable together. If you do not include the logging.googleapis.com/trace field in your structured log entries, you lose the ability to jump from a slow span to the logs it generated, and vice versa.
- Looking at individual traces instead of the distribution. A single slow trace might be an anomaly: a cold start, a noisy neighbor, a one-off spike. The latency distribution view shows whether slowness is consistent, worsening, or clustered at a specific time. Check distribution trends before drawing conclusions from one example.
- Skipping span attributes. A span that records only its duration tells you that something happened and how long it took, and nothing else. Add HTTP method, URL, status code, database query, and relevant business context like order ID or user ID. Attributes make traces searchable and dramatically speed up investigation when something goes wrong in production.
- Using the Cloud Trace API directly instead of OpenTelemetry. Writing directly to the Cloud Trace API ties your instrumentation to GCP. If you want to run traces locally with Jaeger, or move to a hybrid environment, you have to re-instrument everything. OpenTelemetry gives you the same data with a portable API and a clear migration path.
Summary
- Cloud Trace is GCP’s distributed tracing service: it stores and visualizes the timeline of a request across services
- A trace is the full journey of one request; a span is one step within that journey; spans form a parent-child tree
- The waterfall view shows every span as a horizontal bar on a shared time axis; wide bars are slow operations
- OpenTelemetry is the recommended instrumentation approach; OTLP is the emerging standard export path; the Cloud Trace exporter is a simpler direct option for GCP-only setups
- Automatic traces from GCP services show request boundaries; application-level spans reveal what happens inside each service
- Link structured logs to traces using the logging.googleapis.com/trace field to jump between signals in the console
- The latency distribution and comparison views reveal regressions across many requests, not just individual slow traces
- Use Cloud Trace for latency; Cloud Monitoring for dashboards; Cloud Logging for event details; Cloud Profiler for CPU and memory hotspots
Frequently asked questions
Does Cloud Trace work with OpenTelemetry?
Yes. Cloud Trace integrates with OpenTelemetry. You can instrument your code with the OTel SDK and send spans to Cloud Trace using the Cloud Trace exporter, or route them through an OpenTelemetry Collector using the OTLP pipeline. The OTel approach keeps your instrumentation vendor-neutral and portable across backends.
What is the difference between a trace and a log?
A trace tracks the structure and timing of one request as it moves across services. A log records what happened at a specific moment inside one service. Traces answer "where did the time go?" Logs answer "what exactly happened at that step?" They are most useful together: link your structured logs to traces using the logging.googleapis.com/trace field and you can jump between both signals in the GCP console.
Can Cloud Trace help with intermittent latency spikes?
Yes, this is one of its strongest use cases. Filter the trace list by high-latency traces for a specific endpoint, then compare slow traces against normal ones. Look for spans that are wide only in the slow traces. The latency distribution view also shows whether spikes cluster around a specific time or deployment, making regressions easy to spot.
Do I need custom spans or are automatic traces enough?
Automatic traces from GCP services show you request boundaries (how long each service took end-to-end). Without custom spans you cannot see inside a service: which database query ran, which cache lookup was slow, which external API call timed out. For real debugging you need application-level spans.
When should I use Cloud Profiler instead of Cloud Trace?
Use Cloud Trace to find which service or span is slow. Use Cloud Profiler to find which function inside that service is slow. If the waterfall shows a wide span on your application code with no obvious database or API call underneath it, that is the signal to open Cloud Profiler and look at CPU and wall-time data for that service.