GCP Metrics Explained: Gauge vs Delta vs Cumulative in Cloud Monitoring
Cloud Monitoring stores all observability data as time series: sequences of numeric measurements over time, each tagged with labels. Every metric has a kind — gauge, delta, or cumulative. That kind determines how to interpret values, build charts, and write alerts. Get it wrong and your alerts fire on incorrect conditions, or never fire at all.
What is a metric?
A metric is a named, numeric measurement that changes over time. Cloud Monitoring records these as time series: ordered sequences of data points, each with a timestamp and a value.
For example, CPU utilization on a VM is a metric. Every minute, Cloud Monitoring records “this VM’s CPU was at 43%.” Those data points accumulate into a time series you can chart, alert on, and query.
Every metric in Cloud Monitoring has three defining properties:
- Metric type: the unique identifier, like compute.googleapis.com/instance/cpu/utilization
- Metric kind: gauge, delta, or cumulative
- Value type: INT64, DOUBLE, BOOL, or DISTRIBUTION
The metric kind is the one most beginners overlook. It also causes the most broken alerts.
Open Metrics Explorer in the GCP Console (Monitoring → Metrics Explorer). Select any metric and the details panel shows its kind, value type, unit, and labels. This is the fastest way to check an unfamiliar metric before writing an alert or dashboard.
Why metric kind matters
Metric kind is not an implementation detail. It directly affects whether your monitoring works correctly.
- Bad alerts. Alert on a raw cumulative value and the condition becomes true as soon as the counter passes the threshold, then stays true. The alert fires almost immediately and keeps firing, paging you for something that was never actually wrong.
- Misleading charts. Chart a delta metric without proper alignment and you get jagged spikes or flat lines depending on UI defaults, not what actually happened.
- False confidence. A chart showing zero requests might mean no traffic, or it might mean the wrong time window is selected. Knowing the metric kind tells you which interpretation is correct.
A team sets an alert: fire when container.googleapis.com/container/cpu/core_usage_time exceeds 100. That metric is cumulative. It starts at 0 and climbs forever. The alert fires within minutes of deployment and never clears. On-call gets paged at 2am for a container that is running perfectly. This is not hypothetical — it is a routine mistake with cumulative metrics.
Understanding metric kinds makes you faster when writing alerting policies and building dashboards. You know immediately which aggregation to reach for.
How metrics work in GCP
When Cloud Monitoring collects a measurement, it stores it as a data point on a time series. Each time series is uniquely identified by its metric type plus its label values.
Here is how the pieces fit together:
- Metric descriptor: defines the metric’s type string, kind, value type, unit, and allowed labels. Every metric has exactly one descriptor. Inspect it in Metrics Explorer or via the metricDescriptors.get API.
- Time series: the stream of data points for one specific combination of metric type and label values. CPU utilization for vm-instance-a in us-central1-a is one time series. The same metric for vm-instance-b is a different time series.
- Labels: key-value pairs that identify which resource or dimension produced the data. Labels let you filter by service, region, response code class, and more.
- Value type: what kind of number the metric records. INT64 for integers, DOUBLE for floating-point, DISTRIBUTION for latency histograms.
- Metric kind: the semantic meaning of each data point. An instantaneous reading, a count over an interval, or a running total.
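To make the "metric type plus label values" identity rule concrete, here is a minimal in-memory sketch. It is not how Cloud Monitoring is implemented. The SeriesKey class, the record helper, and the label names instance and zone are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SeriesKey:
    """Identity of one time series: metric type plus its label values."""
    metric_type: str
    labels: tuple  # sorted (key, value) pairs, so the key is hashable


def series_key(metric_type, labels):
    return SeriesKey(metric_type, tuple(sorted(labels.items())))


# Hypothetical store: one list of (timestamp_s, value) points per series.
store = {}


def record(metric_type, labels, timestamp_s, value):
    store.setdefault(series_key(metric_type, labels), []).append((timestamp_s, value))


CPU = "compute.googleapis.com/instance/cpu/utilization"
VM_A = {"instance": "vm-instance-a", "zone": "us-central1-a"}  # assumed label names
VM_B = {"instance": "vm-instance-b", "zone": "us-central1-a"}

record(CPU, VM_A, 0, 0.43)
record(CPU, VM_A, 60, 0.47)   # same metric type and labels: same series
record(CPU, VM_B, 0, 0.12)    # different label value: a new series

print(len(store))                         # 2
print(len(store[series_key(CPU, VM_A)]))  # 2
```

Two points with identical labels land in the same series. Changing a single label value creates a second one.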
Cloud Monitoring stores all this data in Monarch, Google’s internal time-series database used to monitor Google’s own infrastructure. You interact with Monarch only through the Cloud Monitoring API, which is why the query model is time-series-first rather than SQL-style.
Gauge, Delta, and Cumulative compared
| Metric kind | What it means | Typical example | How to chart it | How to alert on it | Common mistake |
|---|---|---|---|---|---|
| Gauge | Current value at this instant | CPU utilization, memory usage, Pub/Sub backlog | Plot the raw value | Alert when value exceeds threshold | Assuming two readings imply a meaningful rate |
| Delta | Count of events in this interval | Request count, error count, messages published | Sum or rate per alignment period | Alert on rate or sum per window | Charting without specifying an alignment period |
| Cumulative | Running total since start | Total CPU core-seconds, total billable instance time | Apply Rate alignment first | Alert on rate of increase, not the raw value | Alerting on the raw value (it always increases) |
Gauge
A gauge measures a value at a specific point in time. The value can go up or down between readings. Each data point is independent: it tells you the state right now, not how it changed.
Verified GCP examples:
- compute.googleapis.com/instance/cpu/utilization: CPU as a fraction (0.0–1.0). Each point shows utilization at that moment.
- run.googleapis.com/container/memory/utilization: Cloud Run container memory as a fraction of the memory limit.
- pubsub.googleapis.com/subscription/num_undelivered_messages: current backlog size. This is a gauge because the backlog can grow or shrink at any time depending on whether your subscriber is keeping up.
- cloudsql.googleapis.com/database/cpu/utilization: Cloud SQL CPU usage right now.
Charting and alerting: Plot the raw value. Alert when it exceeds a threshold. No alignment conversion needed.
Common mistake: computing a “rate of change” by subtracting consecutive gauge readings. Gauges measure state, not flow. Two readings of 72% and 74% do not tell you how fast CPU is rising. For that, you need a metric specifically designed to measure CPU increase over time.
Delta
A delta metric counts events that occurred during a measurement interval. Each data point covers a fixed window of time and resets at the start of the next window. It does not accumulate across intervals.
Verified GCP examples:
- run.googleapis.com/request_count: requests handled during the interval. A value of 500 means 500 requests happened in that window.
- run.googleapis.com/request_latencies: latency distribution for requests in the interval (DISTRIBUTION value type).
- pubsub.googleapis.com/topic/send_message_operation_count: messages published to a topic during the interval.
Charting and alerting: In Metrics Explorer or an alert policy editor, set the per-series aligner to Sum for totals or Rate for per-second throughput. Always specify an explicit alignment period. Do not leave it on the UI default when writing alerts.
Common mistake: charting raw delta values without an alignment period. The UI may show spikes that are artefacts of interval length, not real traffic increases. If the reporting interval changes, the chart looks like a traffic spike even when nothing changed.
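The effect of an explicit alignment period on a delta metric can be sketched in plain Python. This is a simplified illustration of Sum and Rate alignment, not Cloud Monitoring's implementation. The function name align_delta and the request counts are made up.

```python
from collections import defaultdict


def align_delta(points, period_s):
    """Bucket delta points into fixed windows of period_s seconds.

    points: list of (interval_start_s, count) pairs.
    Returns {window_start_s: {"sum": total, "rate": total per second}}.
    """
    sums = defaultdict(int)
    for start_s, count in points:
        window_start = start_s // period_s * period_s
        sums[window_start] += count
    return {
        window: {"sum": total, "rate": total / period_s}
        for window, total in sorted(sums.items())
    }


# Ten one-minute delta points (illustrative request counts).
counts = [500, 520, 480, 510, 490, 600, 610, 590, 605, 595]
points = [(i * 60, c) for i, c in enumerate(counts)]

aligned = align_delta(points, period_s=300)
for window, values in aligned.items():
    print(window, values["sum"], round(values["rate"], 2))
# 0 2500 8.33
# 300 3000 10.0
```

Because the rate divides by the window length, it stays comparable if the underlying reporting interval changes. The raw per-point counts do not.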
Cumulative
A cumulative metric is a running total that only increases. It has a start time and grows over the lifetime of the process or resource.
Verified GCP examples:
- container.googleapis.com/container/cpu/core_usage_time: total CPU core-seconds used by a GKE container since it started. This is a cumulative DOUBLE.
- run.googleapis.com/container/billable_instance_time: total billable instance time for a Cloud Run service, accumulated since service creation.
Charting and alerting: Never chart or alert on the raw cumulative value. It slopes upward forever and tells you nothing useful on its own. Apply Rate alignment (per-series aligner ALIGN_RATE) to convert it to a per-second rate of increase. In PromQL, the rate() function does this automatically for counter metrics. In the alert policy editor, Cloud Monitoring flags cumulative metrics and prompts you to set an alignment period for exactly this reason.
Common mistake: setting an alert threshold on the raw cumulative value. The alert fires the moment the counter exceeds the threshold and never clears. This will page your on-call team immediately on every deployment. Apply ALIGN_RATE first, then compare the rate to a threshold.
A gauge is a speedometer: it shows what is happening right now. A cumulative metric is an odometer: it only goes up and shows the total distance since the counter started. To find your current speed from an odometer, you divide the distance change by the time elapsed. Rate alignment does exactly that calculation for you.
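The odometer arithmetic can be written out in a few lines. This is a simplified sketch of what rate alignment computes, with made-up sample values. Skipping counter resets is an assumption of this sketch, not a description of how Cloud Monitoring handles them.

```python
def rate_from_cumulative(points):
    """Convert (timestamp_s, running_total) points into per-second rates.

    points must be ordered oldest first. A pair where the total goes
    down is treated as a counter reset and skipped (sketch assumption).
    """
    rates = []
    for (t0, v0), (t1, v1) in zip(points, points[1:]):
        if t1 <= t0 or v1 < v0:
            continue
        rates.append((t1, (v1 - v0) / (t1 - t0)))
    return rates


# Illustrative CPU core-seconds, sampled once a minute.
samples = [(0, 0.0), (60, 30.0), (120, 90.0), (180, 96.0)]

rates = rate_from_cumulative(samples)
print(rates)  # [(60, 0.5), (120, 1.0), (180, 0.1)]

# The raw total only climbs, so a fixed threshold is eventually crossed
# and never clears. The rate rises and falls with actual usage.
print([total > 50 for _, total in samples])  # [False, False, True, True]
```

In the last sample the container is nearly idle at 0.1 cores, yet the raw total is still above the threshold.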
How to choose the right metric kind
When reading an unfamiliar metric or designing a custom one, ask these three questions:
- Can the value go up and down right now? Use Gauge. (CPU load, active connections, queue depth, memory usage)
- Does it count events that happened during an interval? Use Delta. (Requests per minute, errors per hour, bytes transferred this window)
- Is it a running total that only ever increases? Use Cumulative. (Total bytes sent since startup, total requests served since deployment)
If you are creating a custom metric, prefer cumulative for counters. Cumulative counters preserve full history even if your reporting interval changes, and OpenTelemetry maps its counter type to cumulative by default when exporting to Cloud Monitoring.
Always verify your assumption by checking the metric descriptor in Metrics Explorer before writing an alert. A metric named “count” might be cumulative, not delta.
Metric labels explained
Labels are key-value pairs attached to each time series. They split one metric into multiple streams: one per service, one per region, one per response code class.
For example, run.googleapis.com/request_count includes these labels:
- service_name: which Cloud Run service handled the request
- revision_name: which deployment revision
- response_code_class: 2xx, 4xx, or 5xx
- location: GCP region
When writing an alert condition, filter on labels to scope it precisely: “alert when 5xx request count for this specific service in this region exceeds 10 per minute,” not for all services in the project. Without label filters, you get aggregated noise that obscures which service is actually broken.
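A scoped filter of that kind can be assembled as a Monitoring filter string. The sketch below assumes that response_code_class is a metric label and that service_name and location are labels on the cloud_run_revision monitored resource. Confirm the split in the metric descriptor before relying on it. The helper build_filter and the service name checkout are hypothetical.

```python
def build_filter(metric_type, resource_type=None, metric_labels=None, resource_labels=None):
    """Join equality clauses into a Cloud Monitoring filter expression."""
    clauses = [f'metric.type = "{metric_type}"']
    if resource_type:
        clauses.append(f'resource.type = "{resource_type}"')
    for key, value in sorted((metric_labels or {}).items()):
        clauses.append(f'metric.labels.{key} = "{value}"')
    for key, value in sorted((resource_labels or {}).items()):
        clauses.append(f'resource.labels.{key} = "{value}"')
    return " AND ".join(clauses)


flt = build_filter(
    "run.googleapis.com/request_count",
    resource_type="cloud_run_revision",                 # assumed resource type
    metric_labels={"response_code_class": "5xx"},
    resource_labels={"service_name": "checkout",        # hypothetical service
                     "location": "us-central1"},
)
print(flt)
```

Dropping the two resource label clauses would match 5xx responses from every service in the project, which is the aggregated noise described above.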
Labels like user_id, session_id, or request_id create one time series per unique value. Millions of users means millions of time series: expensive to store and slow to query. Use labels only for bounded dimensions such as environment, region, service name, response code class, and feature flag name.
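A quick upper bound shows why one unbounded label dominates everything else. The distinct-value counts below are invented for illustration, and max_series is a hypothetical helper.

```python
from math import prod


def max_series(distinct_values_per_label):
    """Upper bound on time series for one metric: product of distinct label values."""
    return prod(distinct_values_per_label.values())


# Bounded dimensions only (illustrative counts).
bounded = {"environment": 3, "region": 10, "response_code_class": 5}

# The same metric with a per-user label added.
with_user_id = dict(bounded, user_id=2_000_000)

print(max_series(bounded))       # 150
print(max_series(with_user_id))  # 300000000
```

The real number is usually lower because not every combination occurs, but the bound grows multiplicatively with each label.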
Built-in metrics vs custom metrics
GCP services publish hundreds of metrics automatically. As soon as you use a service, its metrics appear in Cloud Monitoring with no instrumentation code required. Built-in metrics cover infrastructure: CPU, memory, disk I/O, network throughput, request counts, latency distributions.
What they do not cover is your application’s business logic.
Create a custom metric when:
- You need to track business events: orders placed, payments processed, jobs enqueued
- You need error rates for specific code paths not surfaced by infrastructure metrics
- You need feature usage counters or experiment exposure rates
- You need an application-level queue depth separate from the underlying infrastructure queue
Custom metric type strings start with custom.googleapis.com/.
The recommended instrumentation path is OpenTelemetry. It is the vendor-neutral standard for metrics, traces, and logs. Configure the Google Cloud Monitoring exporter and your metrics flow into Cloud Monitoring without coupling your code to a GCP-specific API. If you later need to export to a second backend, you add an exporter rather than rewriting instrumentation. See Cloud Monitoring Overview for how custom metrics fit into the broader observability picture.
For simple one-off use cases, you can also write directly to the Cloud Monitoring API using a client library. This is faster to prototype but ties your code to GCP.
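As a rough idea of what a direct write involves, the sketch below builds a request body in the shape of the REST timeSeries.create call for a cumulative INT64 counter. It only constructs and prints the JSON. It makes no API call and handles no authentication. The metric name orders_placed, the project ID, and the use of the global resource type are assumptions for illustration.

```python
import json
from datetime import datetime, timezone


def rfc3339(ts):
    """Format a UTC datetime the way the API expects timestamps."""
    return ts.strftime("%Y-%m-%dT%H:%M:%SZ")


def cumulative_point_body(metric_name, project_id, total, start, end, labels=None):
    """Build a timeSeries.create body with one cumulative INT64 point."""
    return {
        "timeSeries": [{
            "metric": {
                "type": f"custom.googleapis.com/{metric_name}",
                "labels": labels or {},
            },
            "resource": {"type": "global", "labels": {"project_id": project_id}},
            "metricKind": "CUMULATIVE",
            "valueType": "INT64",
            "points": [{
                # Cumulative points carry both the counter start and the sample time.
                "interval": {"startTime": rfc3339(start), "endTime": rfc3339(end)},
                # INT64 values are sent as strings in JSON.
                "value": {"int64Value": str(total)},
            }],
        }]
    }


body = cumulative_point_body(
    "orders_placed",          # hypothetical metric name
    "my-project",             # hypothetical project ID
    total=42,
    start=datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc),
    end=datetime(2024, 1, 1, 12, 5, 0, tzinfo=timezone.utc),
    labels={"environment": "prod"},
)
print(json.dumps(body, indent=2))
```

Every field here is GCP-specific, which is the coupling that the OpenTelemetry path avoids.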
When this knowledge matters
You need to understand metric kinds when:
- Writing alerting policies. The correct threshold and aggregation depend entirely on the metric kind. A wrong assumption means a broken or permanently-firing alert.
- Building dashboards. Charts need the right aligner. A cumulative metric without rate alignment shows a line that climbs forever.
- Debugging incidents. During an outage, knowing whether a spike is from a gauge or a delta metric changes how you interpret the chart. See Incident Response with Monitoring for the full workflow.
- Instrumenting applications. Choosing the wrong metric kind for a custom metric means every downstream alert and dashboard built on it will need rethinking.
- Reading service dashboards. Cloud Run, GKE, Pub/Sub, and Cloud SQL each use a mix of gauge, delta, and cumulative metrics. Without knowing which is which, you will misread the charts. See Debugging Production Systems for how metric kind affects root-cause analysis.
Real examples in common GCP services
For each example: what is the metric kind, and what does that mean for how you should use it?
Cloud Run
- run.googleapis.com/request_count: Delta. Counts requests in each measurement interval. To alert on error rate, filter by response_code_class=5xx and use Sum alignment. See Monitoring Cloud Run for a complete walkthrough.
- run.googleapis.com/request_latencies: Delta, DISTRIBUTION. Use percentile aggregation (p95, p99) to catch latency tail issues. Averaging a distribution discards the shape. Percentiles tell you what your slowest users actually experience.
- run.googleapis.com/container/memory/utilization: Gauge. Alert when utilization stays above 0.85 for more than a few minutes to catch memory pressure before OOM kills start.
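To see why averaging a latency distribution hides the tail, consider a coarse histogram. The bucket bounds and counts below are invented. The percentile estimate simply returns the upper bound of the bucket that contains the percentile, so Cloud Monitoring's own percentile aggregation will not return exactly these numbers.

```python
def percentile_from_buckets(bounds, counts, pct):
    """Upper bound of the bucket containing the pct-th percentile.

    bounds: ascending upper bounds (ms) of the finite buckets.
    counts: one count per finite bucket plus a final overflow bucket.
    """
    target = sum(counts) * pct / 100
    running = 0
    for upper, count in zip(bounds + [float("inf")], counts):
        running += count
        if running >= target:
            return upper
    return float("inf")


def approx_mean(bounds, counts):
    """Mean from bucket midpoints; the overflow bucket counts at its lower bound."""
    lowers = [0] + bounds
    uppers = bounds + [bounds[-1]]
    mids = [(lo + hi) / 2 for lo, hi in zip(lowers, uppers)]
    return sum(m * c for m, c in zip(mids, counts)) / sum(counts)


bounds = [50, 100, 200, 500, 1000]      # ms, illustrative
counts = [700, 200, 50, 30, 15, 5]      # 1000 requests in total

print(approx_mean(bounds, counts))                  # 66.75
print(percentile_from_buckets(bounds, counts, 50))  # 50
print(percentile_from_buckets(bounds, counts, 95))  # 200
print(percentile_from_buckets(bounds, counts, 99))  # 1000
```

A mean of roughly 67 ms looks healthy, while the slowest 1% of requests take up to a full second.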
GKE
- container.googleapis.com/container/cpu/core_usage_time: Cumulative. Apply Rate alignment to get CPU core usage per second. Chart it alongside kubernetes.io/container/cpu/request_cores (gauge) to see how close workloads are to their CPU requests. See Monitoring GKE for recommended dashboards and alert thresholds.
- kubernetes.io/container/memory/used_bytes: Gauge. Current memory in bytes. Alert when it approaches the container memory limit to catch memory leaks early.
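For a cumulative metric like the CPU counter above, an alert condition that compares the rate rather than the raw total might look like the following. The dictionary follows the shape of a threshold condition in the Cloud Monitoring AlertPolicy REST resource. The helper name, the k8s_container resource type, and the 0.8 cores threshold are assumptions, and the code only builds the structure without calling any API.

```python
import json


def rate_threshold_condition(metric_type, resource_type, threshold_per_s,
                             alignment_period_s=300, duration_s=300):
    """Threshold condition that applies ALIGN_RATE before comparing."""
    return {
        "displayName": f"Rate of {metric_type} above {threshold_per_s}/s",
        "conditionThreshold": {
            "filter": (f'metric.type = "{metric_type}" AND '
                       f'resource.type = "{resource_type}"'),
            "aggregations": [{
                "alignmentPeriod": f"{alignment_period_s}s",
                "perSeriesAligner": "ALIGN_RATE",   # total -> per-second rate
            }],
            "comparison": "COMPARISON_GT",
            "thresholdValue": threshold_per_s,
            "duration": f"{duration_s}s",           # must hold this long to fire
        },
    }


condition = rate_threshold_condition(
    "container.googleapis.com/container/cpu/core_usage_time",
    resource_type="k8s_container",   # assumed; check the metric descriptor
    threshold_per_s=0.8,             # core-seconds per second, i.e. 0.8 cores
)
print(json.dumps(condition, indent=2))
```

Core-seconds per second is simply cores in use, so the threshold reads directly against the CPU request.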
Pub/Sub
- pubsub.googleapis.com/subscription/num_undelivered_messages: Gauge. Current backlog size: how many messages are waiting right now. It can go up or down depending on whether your subscriber is keeping up. Alert when it exceeds your acceptable lag threshold.
- pubsub.googleapis.com/subscription/oldest_unacked_message_age: Gauge. Age in seconds of the oldest unacknowledged message. Often a more useful signal than raw backlog count. If messages are aging out, consumers are stalled.
Cloud SQL
- cloudsql.googleapis.com/database/cpu/utilization: Gauge. Alert when it stays above 0.8 for several minutes. A brief spike is normal during a query burst. Sustained high utilization points to an under-provisioned instance or a runaway query.
- cloudsql.googleapis.com/database/disk/bytes_used: Gauge. Monitor and alert with enough headroom before hitting the disk limit. A full disk causes writes to fail immediately.
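The difference between a brief spike and sustained high utilization on a gauge can be sketched as a duration check. This is an illustration of the idea behind an alert duration window, not Cloud Monitoring's evaluation logic. The function name and the sample readings are made up.

```python
def sustained_breach(points, threshold, duration_s):
    """True if the gauge stayed above threshold for at least duration_s.

    points: list of (timestamp_s, value) pairs, oldest first.
    """
    breach_start = None
    for ts, value in points:
        if value > threshold:
            if breach_start is None:
                breach_start = ts
            if ts - breach_start >= duration_s:
                return True
        else:
            breach_start = None   # any reading at or below threshold resets the window
    return False


# One-minute CPU utilization readings (illustrative).
brief_spike = [(0, 0.50), (60, 0.95), (120, 0.60), (180, 0.55), (240, 0.50), (300, 0.60)]
sustained = [(0, 0.82), (60, 0.85), (120, 0.90), (180, 0.88), (240, 0.86), (300, 0.91)]

print(sustained_breach(brief_spike, threshold=0.8, duration_s=300))  # False
print(sustained_breach(sustained, threshold=0.8, duration_s=300))    # True
```

No alignment conversion is involved because each gauge reading is already the value to compare.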
Metrics vs logs vs log-based metrics
Google Cloud observability works with three distinct data types. Knowing the difference prevents confusion about where to look for information.
Think of a hospital ward. The vital signs monitor showing heart rate every second is a metric: numeric, continuous, instant to query. The nurse’s written notes (“patient reported chest pain at 14:32”) are logs: detailed, text-based, specific to one event. A weekly report counting how many chest pain events occurred is a log-based metric: a number derived by counting log entries.
- Metrics: numeric time series. Fast to query, efficient to store, ideal for alerting on thresholds, rates, and distributions. Cannot tell you why something happened.
- Logs: structured or unstructured records of individual events. Tell you exactly what happened, including stack traces, request payloads, and context. Expensive to query at scale. Managed in Logs Explorer. See Structured Logging to make your logs filterable and useful for deriving metrics.
- Log-based metrics: metrics derived from log entries. A log-based metric counts how many log entries match a filter per minute, or extracts a value from a log field and tracks it as a distribution. This bridges the two systems: you get the alerting efficiency of metrics from events that only exist in logs. See Log-Based Metrics for how to create them.
Use metrics for ongoing monitoring and threshold-based alerting. Use logs when you need to understand a specific event in detail. Use log-based metrics when an event only exists in your application logs but you need to alert on its frequency.
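A counter-style log-based metric boils down to "count the entries that match a filter, per time window". The sketch below shows that idea on a handful of hand-written log entries. The entry fields and the helper name are hypothetical and do not mirror the Cloud Logging entry format.

```python
from collections import Counter


def count_matching_per_minute(entries, predicate):
    """Count log entries that satisfy predicate, bucketed by minute."""
    counts = Counter()
    for entry in entries:
        if predicate(entry):
            minute_start = entry["timestamp_s"] // 60 * 60
            counts[minute_start] += 1
    return dict(counts)


# Hand-written sample entries (illustrative only).
entries = [
    {"timestamp_s": 5, "severity": "ERROR", "message": "payment declined"},
    {"timestamp_s": 20, "severity": "INFO", "message": "order placed"},
    {"timestamp_s": 42, "severity": "ERROR", "message": "payment declined"},
    {"timestamp_s": 75, "severity": "ERROR", "message": "payment declined"},
    {"timestamp_s": 90, "severity": "ERROR", "message": "upstream timeout"},
]


def is_declined_payment(entry):
    return entry["severity"] == "ERROR" and "payment declined" in entry["message"]


declined = count_matching_per_minute(entries, is_declined_payment)
print(declined)  # {0: 2, 60: 1}
```

The result is an ordinary numeric time series, so the usual threshold and rate alerting applies to it.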
Common mistakes
- Alerting on a raw cumulative value. A cumulative metric grows forever. An alert that fires when the value exceeds 1000 will trigger immediately and never clear. Apply Rate alignment (ALIGN_RATE) to get the per-second rate of increase, then compare that rate to a threshold.
- Thinking Pub/Sub’s num_undelivered_messages is a delta. It is a gauge. It shows current backlog size, not the count of new messages delivered in an interval. Alert on the raw gauge value.
- Using high-cardinality label values. Labels like user_id or trace_id create one time series per unique value. Millions of users means millions of time series: significant cost and slow queries. Use bounded dimensions only.
- Averaging a distribution metric. Latency metrics like run.googleapis.com/request_latencies are DISTRIBUTION type. Averaging discards the shape. Use percentile aggregation (p50, p95, p99) instead, otherwise you will miss the tail latency affecting your slowest users.
- Copying metric examples without checking the descriptor. Before writing an alert, look up the metric kind in Metrics Explorer. A metric named “count” or “total” might be cumulative, not delta. The name alone does not tell you.
- Using logs when a metric would work. Running a Logs Explorer query every minute to count error occurrences is slower and more expensive than using a metric or log-based metric. Metric-based alerts are evaluated continuously and efficiently.
Summary
- Three metric kinds: gauge (current value), delta (change per interval), cumulative (running total)
- Every metric is identified by a type string like run.googleapis.com/request_count
- Cumulative metrics must use Rate alignment before charting or alerting. Never alert on the raw value
- pubsub.googleapis.com/subscription/num_undelivered_messages is a gauge, not a delta
- Always verify metric kind in Metrics Explorer before writing an alert or dashboard
- Labels filter and group time series. Keep cardinality low to control cost and query speed
- GCP services emit built-in metrics automatically. Custom metrics use custom.googleapis.com/
- OpenTelemetry is the recommended instrumentation path for custom metrics
Frequently asked questions
What is the difference between gauge, delta, and cumulative metrics in GCP?
A gauge measures a value right now (like CPU at 72%). A delta counts events during a measurement interval (like 500 requests in the last minute). A cumulative metric is a running total that only increases (like total requests since the process started). The metric kind determines which aggregation to use for charts and alerts.
How do I know which metric kind a Google Cloud metric uses?
Open Metrics Explorer in the GCP Console, select the metric, and check the metric details panel. You can also call the Cloud Monitoring API metricDescriptors.get endpoint to read the descriptor programmatically. Always check the kind before writing an alert. Do not assume.
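For the programmatic route, the sketch below builds the REST URL for metricDescriptors.get and reads the kind from a descriptor. It performs no HTTP request and no authentication. The project ID is hypothetical, and the sample descriptor is a hand-written stand-in for a real response, trimmed to the fields used here.

```python
def descriptor_url(project_id, metric_type):
    """REST URL for metricDescriptors.get (v3 API)."""
    return (f"https://monitoring.googleapis.com/v3/projects/{project_id}"
            f"/metricDescriptors/{metric_type}")


def needs_rate_alignment(descriptor):
    """Cumulative metrics should be converted to a rate before alerting."""
    return descriptor.get("metricKind") == "CUMULATIVE"


url = descriptor_url("my-project",  # hypothetical project ID
                     "compute.googleapis.com/instance/cpu/utilization")
print(url)

# Hand-written sample, not an actual API response.
sample_descriptor = {
    "type": "compute.googleapis.com/instance/cpu/utilization",
    "metricKind": "GAUGE",
    "valueType": "DOUBLE",
}
print(sample_descriptor["metricKind"], needs_rate_alignment(sample_descriptor))  # GAUGE False
```

A check like needs_rate_alignment could run in a review script that rejects alert definitions comparing a raw cumulative value to a threshold.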
Are request counts in Cloud Monitoring usually gauge or delta?
Request counts are usually delta metrics. run.googleapis.com/request_count, for example, reports the number of requests handled during each measurement interval. To get a per-second rate, apply Rate alignment in your chart or alert policy. To get a total per window, apply Sum alignment.
When should I create a custom metric?
Create a custom metric when built-in GCP metrics do not cover what you need to observe. Common cases include business events (orders placed, payments processed), application-level queue depths, feature usage counters, and error rates for specific code paths.
Should I use Metrics Explorer, PromQL, or the Monitoring API?
For interactive exploration in the console, use Metrics Explorer. It is the fastest way to browse and visualize metrics without writing a query. For writing alerts or dashboards with a query language, PromQL is available in Cloud Monitoring and is a good choice if you know it. For programmatic access from scripts or applications, use the Cloud Monitoring API or a client library (Python, Go, Java, etc.).