Distributed Tracing in GCP: Traces, Spans, and Context Propagation

A user clicks checkout. The request hits your API gateway, calls the order service, which calls the inventory service, which queries a database. The whole thing takes 4.2 seconds. Which step is slow? Logs from each service show what happened inside that service. Metrics show average latency per service. Neither tells you how the pieces connected for that one specific request. Distributed tracing does.

Distributed tracing tracks a single request as it moves through multiple services, recording each step as a span. All the spans for one request are grouped into a trace. The result is a timeline showing exactly which services were called, in what order, and how long each took.

Three things work together in Google Cloud but are not the same: distributed tracing is the concept, Cloud Trace is Google’s backend for storing and visualizing trace data, and OpenTelemetry is the vendor-neutral toolkit for generating that data. This page explains the concept and mental model. The Cloud Trace page covers the tool and setup.

Simple explanation

Each service your request touches creates a span. All those spans share one trace ID. The result is a complete picture of where the request went and how long it spent at each stop, rendered as a waterfall diagram where you can see the bottleneck at a glance.

Analogy

Think of a request as a parcel moving through a courier network. The entire delivery is the trace, one end-to-end journey with a unique tracking number (the trace ID). Each handoff is a span: picked up from sender, sorted at facility, loaded on truck, delivered. The tracking number travels with the parcel at every step, so you can assemble the complete timeline afterward and see exactly where the delay happened.

Why distributed tracing matters

Average latency metrics hide problems. If 99% of requests complete in 200ms but 1% take 10 seconds, the mean is still about 300ms and looks healthy. When a slow request happens, logs from ten different services tell you what each one did, but not how they interacted for that specific request.
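To make that arithmetic concrete, here is a small sketch using the same figures, scaled to a hypothetical batch of 1,000 requests. The batch size and the 1-second threshold for "slow" are illustrative assumptions, not values from any real system.

```python
from statistics import mean, median

# Hypothetical batch of 1,000 requests: 99% finish in 200 ms, 1% take 10 s.
latencies_ms = [200] * 990 + [10_000] * 10

avg_ms = mean(latencies_ms)        # pulled up only slightly by the slow tail
typical_ms = median(latencies_ms)  # what most requests experience
slow_share = sum(1 for ms in latencies_ms if ms > 1_000) / len(latencies_ms)

print(f"mean={avg_ms:.0f} ms, median={typical_ms:.0f} ms, slow={slow_share:.0%}")
```

The mean comes out at 298 ms, which would not trip most alerts, even though ten requests in this batch waited 10 seconds each.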

Distributed tracing solves both problems. It preserves the causal structure of a request: Service A called Service B, which called a database, which took 3.1 seconds. When you are working with microservices on GCP, tracing is often the only reliable way to answer “which service made this request slow?”

It also makes service dependencies visible. You can see which services a request depends on, how those dependencies compose, and what happens when one of them slows down or fails.

How distributed tracing works

The structure of a trace

Every trace starts when a request enters your system. The first service creates the root span. Each downstream service call creates a child span attached to its parent. Spans form a tree that mirrors the call graph of the request.

Here is what that looks like for the checkout example:

[ROOT] POST /checkout — api-gateway (4200ms)
  ├── [CHILD] process-order — order-service (3800ms)
  │     ├── [CHILD] check-inventory — inventory-service (3200ms)
  │     │     └── [CHILD] SELECT * FROM inventory — postgres (3100ms)  ← bottleneck
  │     └── [CHILD] reserve-items — inventory-service (120ms)
  └── [CHILD] log-audit — audit-service (90ms)

The Postgres query consumes 3100ms of a 4200ms request. That is your bottleneck. Without tracing, you would only know the total was slow.
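The same reading of the tree can be sketched in code. The example below rebuilds the checkout trace and ranks spans by self time, meaning a span's duration minus the time spent in its children. The Span class and walk helper are hypothetical names for illustration, and the subtraction assumes child spans run one after another rather than in parallel.

```python
from dataclasses import dataclass, field


@dataclass
class Span:
    name: str
    service: str
    duration_ms: int
    children: list["Span"] = field(default_factory=list)

    def self_time_ms(self) -> int:
        # Time spent in this span itself, assuming children run sequentially.
        return self.duration_ms - sum(c.duration_ms for c in self.children)


def walk(span):
    """Yield the span and all of its descendants, depth first."""
    yield span
    for child in span.children:
        yield from walk(child)


# The checkout trace from the diagram above.
checkout = Span("POST /checkout", "api-gateway", 4200, [
    Span("process-order", "order-service", 3800, [
        Span("check-inventory", "inventory-service", 3200, [
            Span("SELECT * FROM inventory", "postgres", 3100),
        ]),
        Span("reserve-items", "inventory-service", 120),
    ]),
    Span("log-audit", "audit-service", 90),
])

# The span with the largest self time is where the request actually waited.
bottleneck = max(walk(checkout), key=Span.self_time_ms)
print(bottleneck.name, bottleneck.self_time_ms())
```

Ranking by self time rather than total duration matters here: the root span is the longest bar, but almost all of its 4200ms is spent waiting on its children.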

What a span contains

Each span records:

Trace ID: shared by every span in the request, a 128-bit random identifier
Span ID: unique identifier for this span within the trace
Parent span ID: the span that triggered this work (empty on the root span)
Start time and end time: used to calculate duration and render the waterfall
Attributes: key-value metadata such as HTTP method, status code, database query, order ID, user ID
Status: OK, ERROR, or UNSET

Attributes are where the diagnostic value lives. A span that records http.method=POST, http.status_code=500, and db.statement=SELECT… tells you far more than one that only records duration.
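As a rough sketch, the fields listed above could be modeled like this. SpanRecord and new_span are hypothetical names, not part of any SDK. The 64-bit span ID width follows the W3C Trace Context format, and the order.id value is made up for illustration.

```python
import secrets
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SpanRecord:
    name: str
    trace_id: str                  # 32 hex chars = 128 bits, shared across the trace
    span_id: str                   # 16 hex chars = 64 bits, unique to this span
    parent_span_id: Optional[str]  # None on the root span
    start_ns: int
    end_ns: Optional[int] = None
    attributes: dict = field(default_factory=dict)
    status: str = "UNSET"

    def duration_ms(self) -> float:
        if self.end_ns is None:
            raise ValueError("span has not ended")
        return (self.end_ns - self.start_ns) / 1_000_000


def new_span(name, parent=None):
    """Start a root span, or a child that inherits the parent's trace ID."""
    return SpanRecord(
        name=name,
        trace_id=parent.trace_id if parent else secrets.token_hex(16),
        span_id=secrets.token_hex(8),
        parent_span_id=parent.span_id if parent else None,
        start_ns=time.time_ns(),
    )


root = new_span("POST /checkout")
child = new_span("process-order", parent=root)
child.attributes.update({"http.method": "POST", "order.id": "ord-123"})
child.end_ns = child.start_ns + 3_800_000_000  # pretend the work took 3.8 s
child.status = "OK"
```

A real SDK manages these fields for you. The point of the sketch is only that a span is a small record, and the parent span ID is what turns a flat list of records into a tree.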

The waterfall view

Cloud Trace renders traces as horizontal bars on a timeline. Each bar is one span. Nested bars show the parent-child relationship. You can immediately see which span is blocking the others.

Reading the waterfall

Think of a relay race. Each runner’s bar shows when they started and how long they ran. If one runner’s bar extends far to the right while all the others are short, that runner is your bottleneck. The trace waterfall is the same idea: each service has a bar, and the long bar is where your 4-second checkout is going.
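A plain-text version of the waterfall can be sketched in a few lines. The durations come from the checkout example, but the start offsets are assumed for illustration, since the example does not state them. The render_waterfall function is a hypothetical helper, not how Cloud Trace draws its view.

```python
# (depth, label, start_ms, duration_ms); start offsets are assumed for illustration.
SPANS = [
    (0, "POST /checkout", 0, 4200),
    (1, "process-order", 50, 3800),
    (2, "check-inventory", 100, 3200),
    (3, "SELECT * FROM inventory", 150, 3100),
    (2, "reserve-items", 3350, 120),
    (1, "log-audit", 3900, 90),
]


def render_waterfall(spans, width=60):
    """Return one text line per span, with a bar scaled to the trace length."""
    total_ms = max(start + dur for _, _, start, dur in spans)
    lines = []
    for depth, label, start, dur in spans:
        offset = round(start / total_ms * width)
        length = max(1, round(dur / total_ms * width))  # always show at least one cell
        bar = " " * offset + "#" * length
        lines.append(f"{'  ' * depth + label:<32}|{bar:<{width}}| {dur} ms")
    return lines


print("\n".join(render_waterfall(SPANS)))
```

Indentation shows nesting and bar length shows duration, so the database query stands out as the long bar underneath every parent that is waiting on it.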

Trace context and propagation

For a trace to connect across services, the trace ID and parent span ID must travel with every outgoing request. This is called context propagation. For HTTP calls, the context travels in request headers.

The current standard is W3C Trace Context, which defines two headers:

traceparent: carries the trace ID, parent span ID, and a flags byte. Format: 00-TRACE_ID-PARENT_SPAN_ID-FLAGS. Example: 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01
tracestate: carries optional vendor-specific metadata. Usually empty.

When the order service calls the inventory service, it reads its current span’s IDs, formats the traceparent header, and includes it in the outgoing HTTP request. The inventory service reads that header, creates a new child span with the correct parent, and continues the trace.

If you use OpenTelemetry, the SDK handles propagation automatically through instrumented HTTP clients. You do not construct the headers manually.
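Purely to illustrate what an instrumented client does on your behalf, here is a minimal sketch of reading a traceparent header and building the one sent downstream. The function names are hypothetical, and the parser is a simplification that only understands the version 00 layout described above.

```python
import re
import secrets

TRACEPARENT_RE = re.compile(
    r"^(?P<version>[0-9a-f]{2})-(?P<trace_id>[0-9a-f]{32})"
    r"-(?P<parent_id>[0-9a-f]{16})-(?P<flags>[0-9a-f]{2})$"
)


def parse_traceparent(header):
    """Split a traceparent header into its fields, or return None if malformed."""
    match = TRACEPARENT_RE.match(header.strip())
    if match is None:
        return None
    fields = match.groupdict()
    # The W3C spec treats all-zero IDs as invalid.
    if fields["trace_id"] == "0" * 32 or fields["parent_id"] == "0" * 16:
        return None
    fields["sampled"] = bool(int(fields["flags"], 16) & 0x01)
    return fields


def child_traceparent(incoming, span_id=None):
    """Build the header a service would send downstream for a new child span."""
    ctx = parse_traceparent(incoming)
    if ctx is None:
        raise ValueError("malformed traceparent")
    span_id = span_id or secrets.token_hex(8)
    # Same trace ID and flags; the new span's ID becomes the parent ID downstream.
    return f"00-{ctx['trace_id']}-{span_id}-{ctx['flags']}"


incoming = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
outgoing = child_traceparent(incoming, span_id="b7ad6b7169203331")
print(outgoing)
```

Only the middle ID changes between hops. The trace ID and the sampled flag are carried through untouched, which is what keeps every span attached to the same trace.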

Note

Google Cloud services also support the legacy X-Cloud-Trace-Context header for compatibility with older integrations. The W3C traceparent header is the recommended standard for new instrumentation. Both can coexist if you are integrating with services that still emit the legacy format.
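If you do need to bridge the two formats, the conversion is mostly string handling. The sketch below assumes the legacy header has the layout TRACE_ID/SPAN_ID;o=OPTIONS, with a 32-character hex trace ID, a decimal span ID, and o=1 meaning sampled. Treat that layout as an assumption to verify against the current Google Cloud documentation. The function name is hypothetical.

```python
import re

# Assumed legacy layout: TRACE_ID/SPAN_ID;o=OPTIONS (span ID in decimal).
LEGACY_RE = re.compile(
    r"^(?P<trace_id>[0-9a-fA-F]{32})(?:/(?P<span_id>\d+))?(?:;o=(?P<options>\d))?$"
)


def legacy_to_traceparent(value):
    """Convert an X-Cloud-Trace-Context value to a traceparent value, or None."""
    match = LEGACY_RE.match(value.strip())
    if match is None or match["span_id"] is None:
        return None  # without a span ID there is no parent to point at
    span_id = int(match["span_id"])
    if not 0 < span_id < 2**64:
        return None
    flags = "01" if match["options"] == "1" else "00"
    # traceparent wants the span ID as 16 lowercase hex characters.
    return f"00-{match['trace_id'].lower()}-{span_id:016x}-{flags}"


print(legacy_to_traceparent("105445aa7843bc8bf206b12000100000/1;o=1"))
```

In practice a propagator from your tracing SDK would do this for you. The sketch is only meant to show that the two headers carry the same three pieces of information.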

How this works in Google Cloud

When you implement distributed tracing in GCP, three layers work together:

Distributed tracing: the practice and concept, tracking requests across services by creating spans and connecting them into traces
OpenTelemetry: the vendor-neutral instrumentation toolkit, SDKs for your application code that create spans, propagate context, and export data to a backend
Cloud Trace: the GCP backend, stores span data, renders waterfall views, and surfaces latency outliers

Some Google Cloud services, including Cloud Run and GKE, can propagate trace context automatically at the platform level. This means a trace ID generated upstream can flow to downstream services without code changes.

That automatic propagation covers the routing of context. It does not create spans for your internal operations. Custom instrumentation with OpenTelemetry is still needed when you want spans for individual database queries, cache lookups, or external API calls; business-level attributes like user ID or order amount; or accurate visibility into code paths within a service rather than just between services.

Tip

Instrument with OpenTelemetry rather than the GCP-specific client libraries if you can. Switching backends is then mostly a matter of changing the exporter configuration. Your application code stays the same whether you export to Cloud Trace, Jaeger, or Zipkin.

Sampling strategies

Tracing every request in a high-traffic production system is expensive. Most tracing systems use sampling to record only a fraction of requests.

Common strategies:

Head-based sampling: the decision is made at the root span and travels with the traceparent flags. Simple and low-overhead, but you cannot target slow or error requests specifically.
Tail-based sampling: the decision is made after the entire trace is assembled, so you can capture 100% of slow or error traces while keeping only a small fraction of successful ones. This is often more useful in production because a slow request is never lost to a sampler that dropped it at the start, at the cost of buffering spans until the trace completes.
Fixed-rate sampling: sample a fixed percentage regardless of outcome, typically as a head-based decision. A 5% rate means roughly 1 in 20 requests is recorded.
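The difference between deciding up front and deciding afterward can be sketched as two small functions. This mirrors the general idea behind ratio-based samplers rather than any particular SDK's algorithm. The function names, the 1000ms threshold, and the 1% baseline are illustrative assumptions, and the head sampler assumes trace IDs are random hex strings.

```python
def head_sample(trace_id: str, rate: float) -> bool:
    """Deterministic fixed-rate decision derived from the trace ID.

    Every service that sees the same trace ID reaches the same answer,
    so a trace is either kept whole or dropped whole.
    """
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be between 0 and 1")
    # Treat the low 64 bits of the (random) trace ID as a uniform number.
    bucket = int(trace_id[-16:], 16)
    return bucket < rate * 2**64


def tail_keep(spans, slow_ms=1000, baseline_rate=0.01):
    """Decide after the trace is complete: keep all slow or failed traces."""
    root = next(s for s in spans if s["parent_span_id"] is None)
    if root["duration_ms"] >= slow_ms:
        return True
    if any(s["status"] == "ERROR" for s in spans):
        return True
    # Healthy, fast traces fall back to a small fixed rate.
    return head_sample(root["trace_id"], baseline_rate)
```

The head sampler can run before any work is done, which is why it is cheap. The tail sampler needs the finished spans as input, which is why it needs somewhere to hold them until the trace completes.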

Watch your costs

Cloud Trace charges per span ingested. At high request volume, 100% sampling can generate significant cost fast. Sampling everything is fine in development and staging, where volume is low. In production, start with a low head-based rate. Switch to tail-based sampling once you want complete visibility into errors and latency outliers without paying to record every healthy request.
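A quick way to see how fast span volume grows is to multiply it out. The traffic figures below are hypothetical, and the sketch deliberately stops at span counts rather than prices. Check the current Cloud Trace pricing page to turn a count into a cost.

```python
def spans_per_month(requests_per_second, spans_per_request, sample_rate, days=30):
    """Estimate how many spans are ingested per month at a given sampling rate."""
    seconds = days * 24 * 60 * 60
    return round(requests_per_second * spans_per_request * sample_rate * seconds)


# Hypothetical service: 1,000 requests/s, 6 spans per request (as in the checkout example).
for rate in (1.0, 0.05, 0.01):
    print(f"{rate:>5.0%}: {spans_per_month(1000, 6, rate):,} spans/month")
```

At these assumed numbers, 100% sampling produces about 15.5 billion spans a month, while a 5% rate produces about 778 million.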

When to use distributed tracing

Tracing is most useful when:

A request is slow and you do not know which service is responsible. The waterfall view tells you immediately.
A request fans out across multiple services. Monitoring Cloud Run or Monitoring GKE covers aggregate health. Tracing covers individual requests.
You need to understand service dependency chains. Which service calls which, and how does latency compound?
You are debugging an incident and logs alone are not enough. Link traces to your structured log entries using the trace ID and you can jump from a slow span directly to the logs it generated.
You are working with event-driven systems. Propagate trace context through message headers to trace async flows across services and queues.
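For the event-driven case, the idea is the same as with HTTP headers, except the context rides in message attributes. The sketch below uses an in-memory queue as a stand-in for a real broker. The publish and consume functions and the attribute layout are hypothetical, not a specific messaging API.

```python
import queue
import secrets


def publish(q, payload, trace_id, span_id, sampled=True):
    """Producer side: attach the current span's context as a message attribute."""
    flags = "01" if sampled else "00"
    q.put({
        "data": payload,
        "attributes": {"traceparent": f"00-{trace_id}-{span_id}-{flags}"},
    })


def consume(q):
    """Consumer side: continue the trace by parenting a new span on the producer's."""
    message = q.get()
    _, trace_id, parent_span_id, _ = message["attributes"]["traceparent"].split("-")
    return {
        "name": "handle-message",
        "trace_id": trace_id,              # same trace as the producer
        "span_id": secrets.token_hex(8),   # new span for the consumer's work
        "parent_span_id": parent_span_id,
    }


orders = queue.Queue()  # in-memory stand-in for a real message broker
publish(orders, {"order_id": "ord-123"},
        "4bf92f3577b34da6a3ce929d0e0e4736", "00f067aa0ba902b7")
consumer_span = consume(orders)
```

Because the consumer reuses the producer's trace ID, the asynchronous work shows up in the same waterfall as the request that triggered it, even if it runs much later.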

Tracing is less useful for simple single-service applications where logs or a profiler are sufficient, for aggregate trend analysis (use metrics for that), and for high-volume batch jobs where per-request tracing cost is prohibitive.

Distributed tracing vs logs vs metrics vs profiling

You get paged at 2am. Checkout is slow. Here is what each observability signal tells you:

The 2am incident scenario

Metrics show request rate is normal but p99 latency spiked 12 minutes ago, and only the order service is affected. That narrows your search.

Logs show no errors in the order service. Requests are completing, just slowly. Dead end with logs alone.

A trace shows the order service calling the inventory service, which calls Postgres, which is taking 3 seconds per query.

Profiling the inventory service reveals the query is doing a full table scan because an index was dropped in the last deploy.

Each signal answered a question the others could not.

More concisely:

Metrics: aggregate trends and patterns over time. Best for alerting, capacity planning, and dashboards.
Logs: detailed events and state within a service. Best for debugging what happened inside one service. Start in Logs Explorer.
Traces: request path and latency across services. Best for debugging slow or broken cross-service requests.
Profiling: CPU and memory usage inside a service at the code level. Best for finding hot spots in one process. See Cloud Profiler.

Distributed tracing vs Cloud Trace

Distributed tracing is the concept. Any system can implement it, and you can export spans to Jaeger, Zipkin, or other backends. Cloud Trace is Google Cloud’s specific product for storing, querying, and visualizing those spans. Using OpenTelemetry keeps your instrumentation portable regardless of which backend you choose.

For a workflow that pulls all four signals together during a live incident, see Debugging Production Systems.

Common beginner mistakes

Not propagating trace context between services. If Service A creates a trace but does not include traceparent when calling Service B, Service B starts a new unrelated trace. You end up with disconnected fragments. Use an OpenTelemetry-instrumented HTTP client and propagation happens automatically.
Setting sampling to 100% in production without considering cost. Cloud Trace charges per span ingested. At high request volume, sampling everything can generate significant cost quickly. Start with a low head-based rate and move to tail-based sampling when you need full coverage of errors and slow requests.
Not adding span attributes. A span with no attributes tells you something happened and how long it took. Nothing more. Add http.method, http.status_code, db.statement, and business context like order.id or user.id. That is where the diagnostic value comes from.
Confusing tracing with logging. Traces show structure and timing across services. Logs show events within a service. Link them by including the trace ID in your structured logs using the logging.googleapis.com/trace field and you can jump from a slow span straight to the log entries it generated.
Expecting traces to replace metrics or logs. Each signal answers a different question. Log-based metrics and Cloud Monitoring cover the aggregate view. Logs Explorer covers per-event detail. Tracing covers cross-service request flow. You need all three.
Instrumenting every internal operation in a high-throughput service. Span volume grows with request volume. Be selective, especially for operations that run thousands of times per second. Focus on operations where latency matters and where attributes will be useful when debugging.
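The log-linking advice above can be sketched as a small helper that emits one JSON log line. The logging.googleapis.com/trace field comes from the text. The projects/PROJECT_ID/traces/TRACE_ID value format and the companion logging.googleapis.com/spanId field are assumptions based on Cloud Logging's structured logging conventions, so verify them against the current documentation. The helper name, project ID, and order_id field are hypothetical.

```python
import json


def log_with_trace(message, project_id, trace_id, span_id, severity="INFO", **fields):
    """Build a structured log line that Cloud Logging can correlate with a trace."""
    entry = {
        "severity": severity,
        "message": message,
        # Assumed value format: projects/PROJECT_ID/traces/TRACE_ID
        "logging.googleapis.com/trace": f"projects/{project_id}/traces/{trace_id}",
        "logging.googleapis.com/spanId": span_id,
        **fields,
    }
    return json.dumps(entry)


line = log_with_trace(
    "inventory check slow",
    project_id="my-project",  # hypothetical project ID
    trace_id="4bf92f3577b34da6a3ce929d0e0e4736",
    span_id="00f067aa0ba902b7",
    severity="WARNING",
    order_id="ord-123",
)
print(line)
```

Writing lines like this to standard output is usually enough on platforms that ingest JSON logs automatically. The trace and span IDs would normally come from the active span rather than being passed in by hand.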

Frequently asked questions

What is the difference between a trace and a span?

A trace represents the complete end-to-end journey of one request through your system, identified by a globally unique trace ID. A span represents one unit of work within that journey: one service handling part of the request, one database query, one cache lookup. Every span in a trace shares the same trace ID, but each span has its own span ID and a parent span ID linking it back to the span that triggered it.

Is distributed tracing the same as Cloud Trace?

No. Distributed tracing is the practice of tracking requests across services using spans. Cloud Trace is Google Cloud’s backend product for storing and visualizing trace data. You can use distributed tracing and export to a different backend entirely. It is the same distinction as structured logging (the practice) versus Cloud Logging (the GCP product).

Do I need OpenTelemetry to use distributed tracing in Google Cloud?

Not strictly, but it is the recommended approach. OpenTelemetry is the vendor-neutral instrumentation toolkit most teams use. It handles span creation, context propagation, and exporting to Cloud Trace. Some GCP services automatically propagate trace context, but custom instrumentation with OpenTelemetry gives you richer spans, business-level attributes, and portability to other backends.

Does distributed tracing record every request?

Usually not. Production systems use sampling, recording only a fraction of requests to limit cost and overhead. A 5% sampling rate means you see roughly 1 in 20 requests. Tail-based sampling is more sophisticated: it records 100% of slow or error traces and only a small fraction of fast, successful ones. For development and staging, 100% sampling is fine.

When should I use tracing instead of logs?

Use logs to understand what happened inside one service. Use tracing to understand how a request moved across multiple services and where it slowed down. They are complementary, not alternatives. For a request that is slow but not generating errors, logs often show you nothing useful because no errors are being thrown. A trace will show you which service call took the time. Linking logs and traces using the trace ID in structured log entries gives you both perspectives at once.

Last verified: 19 March 2026. Cloud services change frequently. Verify details against official documentation before making infrastructure decisions.