How to Create Alerts in Google Cloud Monitoring (GCP)

An alerting policy in Cloud Monitoring watches a metric continuously and tells you when something crosses a threshold, so you find out about service degradation before your users do. This page explains how alerting policies work, walks through creating one in the GCP Console, and provides correct Terraform examples for a Cloud Run error ratio alert and a p99 latency alert.

Cloud Monitoring alerts are built on the same metrics pipeline you use in dashboards. The difference is that a policy watches a metric on your behalf and creates an incident when a condition is met. After reading this page you will be able to create your first alert, configure notification channels, and avoid the calibration mistakes that lead to alert fatigue.

Simple explanation

An alerting policy has three parts:

Condition: which metric to watch, how to aggregate it, and what threshold triggers the alert
Duration (retest window): how long the condition must be continuously true before an incident is created. This is what separates a brief spike from a real problem.
Notification channel: where to send the alert. Options include email, Slack, PagerDuty, Pub/Sub, and webhook.

When the condition is met for the full duration, Cloud Monitoring creates an incident and sends a notification. When the condition clears, the incident closes automatically and optionally sends a second notification so your team knows the problem has resolved.

Analogy

An alerting policy is like a smoke detector. The sensor (condition) watches for smoke. The alarm only sounds if smoke persists (duration) rather than from a single puff. The siren and phone notification (channels) tell you when to respond. A detector that goes off every time you make toast is miscalibrated, and you will eventually disable it. Setting an appropriate duration is how you avoid that.

Warning

Setting duration to 0 on every alert means the policy fires from a single bad data point. Brief spikes in CPU, errors during a deployment rollout, and transient network issues are all normal and short-lived. For most threshold alerts, start with a duration of 2 to 5 minutes so the condition must persist before you are paged.

Why alerts matter

Dashboards tell you what is happening when you are looking. Alerts tell you what is happening when you are not. Without alerts, you learn about incidents from user reports. With well-configured alerts, you know about degradation before most users are affected, and you have the metric data and runbook link right in the notification to start investigating immediately.

The challenge is calibration. Too many alerts, especially noisy ones that fire from brief spikes, train people to ignore them. Effective alerting is specific, actionable, and infrequent enough that when it fires, people take it seriously. Everything on this page is aimed at that goal.

When to create alerts

Create alerting policies for conditions that require a human response. Common categories:

Error ratio: when 5xx responses exceed a percentage of total traffic on a production service
Latency degradation: when p99 request latency exceeds a threshold that affects user experience
Uptime failures: when an external health check cannot reach your service from multiple locations
Metric absence: when a service or job stops reporting metrics entirely, which is a common sign of a crash or deployment problem
Resource saturation: when CPU, memory, or disk utilization approaches limits
Quota usage: when GCP quota consumption reaches a high percentage, before requests start being rejected
Failed batch jobs: when a scheduled Cloud Run job or Dataflow pipeline does not complete successfully

If you are monitoring a Cloud Run service, see Monitoring Cloud Run for the specific metrics to prioritise. For GKE workloads, see Monitoring GKE.

How Cloud Monitoring alerts work

Condition

A condition specifies the metric to watch, the aggregation to apply, and the comparison threshold. Cloud Monitoring supports four condition types, covered in the next section. The condition is evaluated continuously against incoming metric data.

Alignment period

Before evaluating a condition, Cloud Monitoring resamples raw metric data into fixed time windows called the alignment period. For a 60-second alignment period, each data point represents the metric aggregated over the previous 60 seconds as a rate, sum, mean, or percentile depending on the aligner you choose. A shorter alignment period is more responsive but produces noisier results. 60 seconds is a reasonable default for most production alerts. See Metrics in GCP for how aligners and metric kinds interact.

Duration (retest window)

Duration is how long the condition must be continuously true before an incident is created. A duration of 0 fires the moment the condition is first met, which is useful for a complete service outage but noisy for most threshold alerts. A duration of 5 minutes (300s) means the condition must hold across five consecutive 60-second alignment windows. For most error rate and latency alerts, 2 to 5 minutes is a good starting point.

Analogy

The alignment period and duration solve different problems. Think of a speed camera on a highway. The camera takes a reading every few seconds (alignment period). The speeding fine is only issued if you exceed the limit for 30 consecutive seconds (duration). A single brief reading above the threshold does not count. Alerting works the same way: the alignment period controls how often you sample, and the duration controls how long you have to stay above the threshold before the policy actually fires.

Incident creation

When the condition is met for the full duration, Cloud Monitoring creates an incident and sends notifications to your configured channels. Incidents are tracked under Monitoring > Alerting > Incidents. See Incident Response with Monitoring for how to use incidents during an investigation.

Notification channels

Notification channels are configured separately from alerting policies and reused across many policies. This means you can update a Slack webhook token in one place and all policies pointing to that channel pick up the change. Supported channel types: email, Slack, PagerDuty, webhook, and Pub/Sub.

Incident close and auto-close

An incident closes automatically when the condition is no longer true. You can also close it manually. Cloud Monitoring can send a notification when an incident closes, called the “incident closed” notification. Enable it so your team knows when to stand down. If metric data stops arriving while an incident is open, Cloud Monitoring cannot tell whether the condition has cleared, so it closes the incident once the policy's auto-close duration has passed. The default auto-close duration is 7 days, and you can shorten it per policy.

Severity and documentation

Each alerting policy can have a severity level (critical, error, warning) and a documentation field. Use the documentation field to link to a runbook. A notification that says “p99 latency is above 2000ms, see https://runbook/api-latency for diagnosis steps” is far more useful than one that says “alert fired”. Fill this field in from the start.
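As a rough sketch of how the policy-level settings from the last two subsections might look in Terraform, the policy below sets a severity, a shorter auto-close window, and a runbook link. The resource name, the 30-minute auto-close value, and the runbook URL are illustrative assumptions, and the severity and alert_strategy arguments assume a reasonably recent version of the Google provider.

```hcl
resource "google_monitoring_alert_policy" "latency_with_context" {
  display_name = "api-service: p99 latency > 2000ms"
  combiner     = "OR"
  severity     = "WARNING" # one of CRITICAL, ERROR, WARNING

  conditions {
    display_name = "p99 request latency above 2000ms for 2 minutes"
    condition_threshold {
      filter = join(" AND ", [
        "resource.type=\"cloud_run_revision\"",
        "metric.type=\"run.googleapis.com/request_latencies\"",
        "resource.labels.service_name=\"api-service\""
      ])
      duration        = "120s"
      comparison      = "COMPARISON_GT"
      threshold_value = 2000 # milliseconds

      aggregations {
        alignment_period   = "60s"
        per_series_aligner = "ALIGN_PERCENTILE_99"
      }
    }
  }

  # Close open incidents after 30 minutes without data (assumed value;
  # the default is 7 days).
  alert_strategy {
    auto_close = "1800s"
  }

  # This text is included in the notification, so lead with the runbook.
  documentation {
    content   = "p99 latency is above 2000ms. Diagnosis steps: https://your-runbook-url/api-latency"
    mime_type = "text/markdown"
  }
}
```

No notification_channels are set here to keep the sketch self-contained; a real policy would reference at least one channel.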

Prerequisites

IAM permissions: you need the Monitoring Editor role (roles/monitoring.editor) or Monitoring Admin role (roles/monitoring.admin) to create alerting policies. Monitoring Viewer is read-only.
Metric data must exist: alerting policies can be created before metrics arrive, but you cannot validate the condition or threshold until data is flowing. Deploy your service and send some traffic before configuring production alerts.
Notification channels should be configured first: create channels before policies so you can reference them immediately. The GCP Console lets you create channels inline during policy creation, but doing it separately first makes Terraform easier.
Billing must be enabled: Cloud Monitoring alerting requires an active billing account. The free tier includes a reasonable number of alerting policies for most teams.
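If you manage IAM in Terraform, the role requirement above could be granted along these lines. This is a minimal sketch: the project_id variable and the group address are placeholders, and your organisation may prefer a narrower role or a different binding resource.

```hcl
variable "project_id" {
  description = "Project that hosts the alerting policies (placeholder)"
  type        = string
}

# Grants the Monitoring Editor role to a hypothetical SRE group so its
# members can create and edit alerting policies in this project.
resource "google_project_iam_member" "alerting_editor" {
  project = var.project_id
  role    = "roles/monitoring.editor"
  member  = "group:sre-team@example.com"
}
```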

Alert types and when to choose each

Metric threshold

What it is: fires when a metric value exceeds or falls below a threshold for the configured duration.

When to use it: for any continuously-emitted numeric metric: error count, latency percentile, CPU utilization, memory usage, request rate.

Example: alert when p99 request latency for your Cloud Run service exceeds 2000ms for 2 consecutive minutes.

Metric absence

What it is: fires when no data is received for a metric within a time window. The trigger is the absence of data, not a high value.

When to use it: for services or jobs that must report regularly. If a batch job normally writes a “job completed” metric every hour and stops doing so, absence alerting catches it. Also useful for detecting when a service has crashed entirely and is no longer emitting any metrics.

Example: alert if a Pub/Sub subscriber stops acknowledging messages and the subscription metric goes silent for 10 minutes.
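The Pub/Sub example could be expressed in Terraform roughly as follows, using a condition_absent block instead of condition_threshold. The subscription name orders-sub is hypothetical, and the choice of ack_message_count as the metric is an assumption. Some metrics report zero rather than going silent, so check the metric's behaviour in Metrics Explorer before relying on absence.

```hcl
resource "google_monitoring_alert_policy" "subscriber_silent" {
  display_name = "orders-sub: no acknowledgement data for 10 minutes"
  combiner     = "OR"

  conditions {
    display_name = "ack_message_count absent for 10 minutes"
    condition_absent {
      filter = join(" AND ", [
        "resource.type=\"pubsub_subscription\"",
        "metric.type=\"pubsub.googleapis.com/subscription/ack_message_count\"",
        "resource.labels.subscription_id=\"orders-sub\""
      ])

      # The incident opens only after 10 minutes with no data points.
      duration = "600s"

      aggregations {
        alignment_period   = "60s"
        per_series_aligner = "ALIGN_RATE"
      }
    }
  }
}
```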

Log-based metric alert

What it is: combines a log-based metric (a counter or distribution derived from log entries) with a threshold condition.

When to use it: for conditions that are clearest in logs: application error codes, specific warning patterns, or events that do not have a corresponding built-in metric. For example, alerting when a specific exception appears more than 10 times per minute.

Example: create a log-based metric that counts log entries matching severity=ERROR AND jsonPayload.code="PAYMENT_FAILED", then create a threshold alert on that metric.
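A minimal Terraform sketch of this two-step pattern might look like the following. The metric name payment_failed_count, the restriction to Cloud Run logs, and the threshold of 10 entries per minute are assumptions for illustration.

```hcl
# Step 1: a counter metric that increments for each matching log entry.
resource "google_logging_metric" "payment_failed" {
  name   = "payment_failed_count"
  filter = "resource.type=\"cloud_run_revision\" AND severity=ERROR AND jsonPayload.code=\"PAYMENT_FAILED\""

  metric_descriptor {
    metric_kind = "DELTA"
    value_type  = "INT64"
  }
}

# Step 2: a threshold alert on the user-defined metric.
resource "google_monitoring_alert_policy" "payment_failures" {
  display_name = "api-service: PAYMENT_FAILED more than 10 times per minute"
  combiner     = "OR"

  conditions {
    display_name = "PAYMENT_FAILED log entries above 10 per minute"
    condition_threshold {
      filter = join(" AND ", [
        "resource.type=\"cloud_run_revision\"",
        "metric.type=\"logging.googleapis.com/user/${google_logging_metric.payment_failed.name}\""
      ])
      duration        = "120s"
      comparison      = "COMPARISON_GT"
      threshold_value = 10

      aggregations {
        alignment_period     = "60s"
        per_series_aligner   = "ALIGN_DELTA" # entries counted per 60s window
        cross_series_reducer = "REDUCE_SUM"  # total across all revisions
      }
    }
  }
}
```

A newly created log-based metric can take a short while to become visible to Cloud Monitoring, so the policy may need a second apply if it is created in the same run as the metric.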

Uptime check alert

What it is: fires when an uptime check fails from a configured number of geographic locations.

When to use it: for external reachability monitoring. Catches DNS failures, expired SSL certificates, load balancer misconfigurations, and complete service outages that internal metrics cannot see. Require at least 2 locations to fail before alerting to avoid false positives from single-location network blips.

Example: alert when your /health endpoint fails from 2 or more of 3 configured locations for 1 minute.
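One way to sketch this in Terraform is an uptime check plus a policy that counts failing checker locations. The host api.example.com and the project_id variable are placeholders. The aggregation settings follow the pattern commonly used for check_passed alerts and should be confirmed against the policy the Console generates for your own check.

```hcl
variable "project_id" {
  description = "Project that owns the uptime check (placeholder)"
  type        = string
}

resource "google_monitoring_uptime_check_config" "health" {
  display_name     = "api-service /health"
  timeout          = "10s"
  period           = "60s"
  selected_regions = ["USA", "EUROPE", "ASIA_PACIFIC"]

  http_check {
    path         = "/health"
    port         = 443
    use_ssl      = true
    validate_ssl = true
  }

  monitored_resource {
    type = "uptime_url"
    labels = {
      project_id = var.project_id
      host       = "api.example.com"
    }
  }
}

resource "google_monitoring_alert_policy" "health_down" {
  display_name = "api-service: /health failing from 2+ locations"
  combiner     = "OR"

  conditions {
    display_name = "Uptime check failing from more than 1 location"
    condition_threshold {
      filter = join(" AND ", [
        "resource.type=\"uptime_url\"",
        "metric.type=\"monitoring.googleapis.com/uptime_check/check_passed\"",
        "metric.labels.check_id=\"${google_monitoring_uptime_check_config.health.uptime_check_id}\""
      ])
      duration        = "60s"
      comparison      = "COMPARISON_GT"
      threshold_value = 1 # more than one location reporting failure

      aggregations {
        alignment_period     = "1200s"
        per_series_aligner   = "ALIGN_NEXT_OLDER"
        cross_series_reducer = "REDUCE_COUNT_FALSE" # count failing locations
        group_by_fields      = ["resource.label.*"]
      }
    }
  }
}
```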

Step-by-step: create an alert in the Google Cloud Console

This walkthrough creates a metric threshold alert. The same flow applies to other condition types. Only the condition configuration step differs.

Step 1: Open alerting

In the GCP Console, navigate to Monitoring > Alerting. Click Create policy.

Step 2: Select a metric

Click Select a metric. Use the search box to find the metric you want to alert on. For a Cloud Run error alert, search for request_count and select the Cloud Run resource type. For a latency alert, search for request_latencies.

After selecting the metric, you can filter the time series. For a Cloud Run service, add a filter: resource.service_name = "your-service-name". This scopes the alert to one service rather than all Cloud Run services in the project.

Step 3: Configure aggregation

The aggregation section controls the alignment period and how data is reduced across multiple time series. For a latency p99 alert, change the aligner from “mean” to “99th percentile”. For an error count alert, use “rate” to get requests per second. The Cloud Monitoring overview explains how the metrics pipeline works in more detail.

Step 4: Set the threshold condition

Choose whether the condition fires when the value is above or below the threshold. Set the threshold value, for example 2000 for a latency alert measured in milliseconds. Check the metric’s unit before entering a number. This is where beginners most often set the wrong value.

Step 5: Set the duration

This is the retest window. The default is often 0 or 1 minute. For most production alerts, increase this to 2 to 5 minutes to reduce noise. Do not leave it at 0 unless the condition represents a complete outage.

Step 6: Add notification channels

Click Next to move to the notification step. Select existing channels or create new ones. Create at least two channels of different types (for example, email and Slack) so that a single-channel failure does not silence your alerts.

Step 7: Add documentation

In the Alert details section, add a display name and fill in the Documentation field. Include a runbook link or a plain-English description of what to check first. This text is included in the notification.

Step 8: Enable incident closure notifications

In the notification settings, enable “Notify when incident is closed”. This tells your team when the condition has resolved so they know when to stop investigating.

Step 9: Review and save

Click Save policy. The policy is active immediately. It will not fire until metric data arrives and the condition is met for the full duration.

Example: Cloud Run 5xx error ratio alert

Common mistake

Alerting on the raw count or rate of 5xx requests and calling it a “5% error rate” is mathematically wrong. A rate of 10 requests/second might be 10% of traffic at low load, or 0.1% during peak. The number alone tells you nothing without total request volume. To measure a true error ratio, you must divide 5xx requests by total requests using a denominator filter.

Cloud Monitoring’s condition_threshold block supports ratio conditions via denominator_filter. The numerator filter matches 5xx requests. The denominator filter matches all requests. Cloud Monitoring computes the ratio and compares it to the threshold (0.05 = 5%). Both sides use identical aggregations so the division is consistent.

resource "google_monitoring_alert_policy" "error_ratio_high" {
  display_name = "api-service: 5xx error ratio > 5%"
  combiner     = "OR"

  conditions {
    display_name = "5xx error ratio above 5% for 5 minutes"
    condition_threshold {
      # Numerator: rate of 5xx requests for api-service
      filter = join(" AND ", [
        "resource.type=\"cloud_run_revision\"",
        "metric.type=\"run.googleapis.com/request_count\"",
        "resource.labels.service_name=\"api-service\"",
        "metric.labels.response_code_class=\"5xx\""
      ])

      # Denominator: rate of all requests for api-service
      denominator_filter = join(" AND ", [
        "resource.type=\"cloud_run_revision\"",
        "metric.type=\"run.googleapis.com/request_count\"",
        "resource.labels.service_name=\"api-service\""
      ])

      duration        = "300s"  # 5 minutes
      comparison      = "COMPARISON_GT"
      threshold_value = 0.05    # 5% error ratio

      # Sum request rates across all revisions of api-service
      aggregations {
        alignment_period     = "60s"
        per_series_aligner   = "ALIGN_RATE"
        cross_series_reducer = "REDUCE_SUM"
        group_by_fields      = ["resource.labels.service_name"]
      }

      # Same aggregation for denominator so the ratio is consistent
      denominator_aggregations {
        alignment_period     = "60s"
        per_series_aligner   = "ALIGN_RATE"
        cross_series_reducer = "REDUCE_SUM"
        group_by_fields      = ["resource.labels.service_name"]
      }
    }
  }

  documentation {
    content   = "## api-service: 5xx error ratio above 5%\n\nMore than 5% of requests to api-service are returning 5xx errors.\n\nCheck recent deployments and review error logs:\nhttps://your-runbook-url/api-service-errors"
    mime_type = "text/markdown"
  }

  notification_channels = [google_monitoring_notification_channel.email.name]
}

Why 5 minutes? A brief spike during a deployment rollout can temporarily cause 5xx responses even when the deployment ultimately succeeds. A 5-minute duration means the alert only fires if errors persist, which is the signal worth responding to. For a more critical service, shorten this to 2 minutes (120s).

For more on the Cloud Run metrics involved here, see Monitoring Cloud Run.

Example: p99 latency alert

Average latency is misleading. A service where 90% of requests complete in 100ms and 10% take 5000ms will show an average of about 590ms (0.9 × 100 + 0.1 × 5000), which looks acceptable. The 10% of users experiencing 5-second responses will not think so. p99 is the latency that 99% of requests stay under, so it exposes the slow requests that an average hides and is a better proxy for tail-latency problems.

run.googleapis.com/request_latencies is a distribution metric, which means you can query any percentile from it. The ALIGN_PERCENTILE_99 aligner extracts the 99th percentile from each 60-second window.

resource "google_monitoring_alert_policy" "p99_latency_high" {
  display_name = "api-service: p99 latency > 2000ms"
  combiner     = "OR"

  conditions {
    display_name = "p99 request latency above 2000ms for 2 minutes"
    condition_threshold {
      filter = join(" AND ", [
        "resource.type=\"cloud_run_revision\"",
        "metric.type=\"run.googleapis.com/request_latencies\"",
        "resource.labels.service_name=\"api-service\""
      ])
      duration        = "120s"  # 2 minutes
      comparison      = "COMPARISON_GT"
      threshold_value = 2000    # milliseconds

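      # ALIGN_PERCENTILE_99 yields a p99 per revision; the reducer then takes
      # the p99 of those values, which approximates the service-wide p99.
      # For a percentile over the merged distribution, ALIGN_DELTA with
      # REDUCE_PERCENTILE_99 is an alternative worth evaluating.
      aggregations {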
        alignment_period     = "60s"
        per_series_aligner   = "ALIGN_PERCENTILE_99"
        cross_series_reducer = "REDUCE_PERCENTILE_99"
        group_by_fields      = ["resource.labels.service_name"]
      }
    }
  }

  documentation {
    content   = "## api-service: p99 latency above 2s\n\nThe 99th percentile request latency has exceeded 2000ms.\n\nCheck Cloud Trace for slow requests: https://your-runbook-url/api-service-latency"
    mime_type = "text/markdown"
  }

  notification_channels = [google_monitoring_notification_channel.email.name]
}

Choose your threshold based on your SLO or user-facing requirements. 2000ms is a reasonable starting point for an interactive API. Adjust lower if your service has tighter latency requirements. If this alert fires frequently, use distributed tracing to identify which operations are slow.

Notification channels and routing

Notification channels are configured separately from alerting policies in Monitoring > Alerting > Notification channels. Creating them separately means you can reuse one Slack channel across twenty policies without duplicating configuration.

Channel types

Email: the simplest option. Good for low-urgency alerts and as a fallback channel. Use a team distribution list rather than an individual email address.
Slack: suitable for visibility and team awareness. Not reliable as a sole channel for critical alerts since Slack can be unavailable during infrastructure incidents. Requires one-time OAuth authorization in the GCP Console.
PagerDuty: the right choice for on-call rotation management and escalation. Requires an integration key from your PagerDuty account. Use PagerDuty for critical alerts that require a guaranteed response.
Pub/Sub: sends notifications as JSON messages to a Pub/Sub topic. Use this to build custom notification logic, feed into ticketing systems, or trigger Cloud Functions for auto-remediation.
Webhook: sends an HTTP POST to a URL you control. Useful for integrating with systems not natively supported.

Redundancy best practice

Do not rely on a single notification channel for critical alerts. Configure at least two channels of different types, for example Slack for visibility and PagerDuty for on-call paging. If Slack is down during an incident, PagerDuty will still reach the on-call engineer. Email alone is not sufficient for critical alerts because it may be slow and is easy to miss.

# Create an email notification channel
gcloud beta monitoring channels create \
  --display-name="Team Alerts Email" \
  --type=email \
  --channel-labels=email_address=oncall@example.com

# Create a Slack notification channel
# Note: Slack channels require authorization via the GCP Console first
gcloud beta monitoring channels create \
  --display-name="Slack #incidents" \
  --type=slack \
  --channel-labels=channel_name="#incidents"

# List channels to retrieve their IDs for use in Terraform or policy definitions
gcloud beta monitoring channels list --project=my-app-prod

Note

Slack and PagerDuty channels require one-time authorization in the GCP Console under Monitoring > Alerting > Notification channels before they can receive alerts. The gcloud channel commands are currently under the beta surface.
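The Terraform examples on this page reference google_monitoring_notification_channel.email without defining it. A minimal sketch of that resource, together with a second channel of a different type for redundancy, might look like this. The pagerduty_service_key variable and the critical_channels local are hypothetical names, and the integration key would come from your PagerDuty account.

```hcl
variable "pagerduty_service_key" {
  description = "PagerDuty integration key (supply via a secret store or tfvars)"
  type        = string
  sensitive   = true
}

resource "google_monitoring_notification_channel" "email" {
  display_name = "Team Alerts Email"
  type         = "email"
  labels = {
    email_address = "oncall@example.com"
  }
}

resource "google_monitoring_notification_channel" "pagerduty" {
  display_name = "PagerDuty on-call"
  type         = "pagerduty"

  # Secrets go in sensitive_labels rather than labels.
  sensitive_labels {
    service_key = var.pagerduty_service_key
  }
}

# Two channels of different types, reusable across critical policies.
locals {
  critical_channels = [
    google_monitoring_notification_channel.email.name,
    google_monitoring_notification_channel.pagerduty.name,
  ]
}
```

A policy can then set notification_channels = local.critical_channels, so adding or rotating a channel happens in one place.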

How to test an alert safely

Creating an alert policy does not guarantee it works. A policy can be syntactically valid but point to the wrong metric, have a threshold that will never be reached, or reference a notification channel with an expired token. Test before you rely on it.

Tip

The fastest way to test a notification channel is the Send test notification button. In the GCP Console, go to Monitoring > Alerting > Notification channels, click the three-dot menu next to a channel, and select it. This sends a sample notification without creating an incident and confirms the channel is reachable and the token is valid.

Validate the condition in non-production

Lower the threshold temporarily in a non-production environment. For example, set the error ratio threshold to 0.001 (0.1%) instead of 0.05 (5%). Then send a few requests that will result in errors and confirm that an incident is created, notifications are sent to the correct channels, and the incident closes automatically when the condition clears.

Confirm the full incident lifecycle

Check that you receive both the incident-opened notification and the incident-closed notification. Confirm that the incident appears in Monitoring > Alerting > Incidents. Verify that the incident links to the correct metric chart and includes the documentation text you added.

Restore the original threshold before going to production

After validating, restore the correct threshold and confirm the policy is saved. A common mistake is to forget to restore the threshold after testing, which results in a permanently noisy alert.
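If the policy is managed in Terraform, one way to reduce the risk of leaving a test threshold in place is to make the threshold a variable whose default is the production value. The variable name below is an assumption; the sketch only shows the variable and how a policy would consume it.

```hcl
variable "error_ratio_threshold" {
  description = "5xx error ratio that opens an incident, as a fraction (0.05 = 5%)"
  type        = number
  default     = 0.05 # production value; override only in non-production

  validation {
    condition     = var.error_ratio_threshold > 0 && var.error_ratio_threshold < 1
    error_message = "error_ratio_threshold must be a fraction between 0 and 1."
  }
}

# Inside the policy's condition_threshold block:
#   threshold_value = var.error_ratio_threshold
```

A test run in a non-production workspace could then pass -var="error_ratio_threshold=0.001" to terraform apply, and the next apply without the override returns to the default.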

Common beginner mistakes

Alerting on a raw count or rate and calling it an error rate. A rate of 0.05 requests/second is not the same as a 5% error rate. To measure the true error ratio, divide the 5xx count by the total request count using a denominator filter. The Cloud Run 5xx error ratio example above shows the correct approach.
Setting duration to zero on every alert. A duration of 0 fires the moment the condition is true for even one data point. Brief CPU spikes, deployment rollout errors, and transient network issues are normal. Set a duration of 2 to 5 minutes for most threshold alerts to reduce noise from transient events.
Alerting on averages instead of percentiles. Average latency can look fine while the slowest 10% of users are experiencing severe slowness. Alert on p95 or p99 for latency to catch tail-latency problems that averages hide.
Not enabling incident closure notifications. By default you are notified when an incident opens but not when it resolves. Enable the “incident closed” notification so your team knows when to stand down.
Not adding a runbook link in the documentation field. A notification without context forces the responder to start from scratch. Even a one-sentence description of what to check first, included in the documentation field, saves significant time during an incident.
Relying on a single notification channel. If your only channel is Slack and Slack is unavailable during the same incident your alert is firing for, nobody gets notified. Configure at least two channels of different types.
Not managing policies as code. Console-created policies are hard to audit, reproduce in other environments, or roll back after a mistake. Use google_monitoring_alert_policy in Terraform from the start. You can export an existing policy to JSON with gcloud monitoring policies describe POLICY_ID --format=json as a starting point.
Not validating the policy after creation. A syntactically valid Terraform resource can still point to the wrong metric or have a threshold that will never be reached. Always test the full notification path before relying on a new alert in production.
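For a policy that already exists in the Console, one possible route into Terraform is an import block, which is available in Terraform 1.5 and later. The policy ID below is a placeholder and the resource name imported is arbitrary; treat this as a sketch of the workflow rather than a required procedure.

```hcl
# Adopt an existing Console-created policy into Terraform state.
import {
  to = google_monitoring_alert_policy.imported
  id = "projects/my-app-prod/alertPolicies/1234567890" # placeholder policy ID
}
```

Running terraform plan -generate-config-out=generated.tf with this block in place asks Terraform to draft a matching resource definition, which you can then review and tidy before applying.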

Metric threshold vs log-based metric vs uptime check

Choosing the wrong alert mechanism is one of the most common configuration errors. Here is a quick guide:

Mechanism	Best for	Limitation
Metric threshold	Continuously-emitted numeric metrics: error ratio, latency, CPU, memory, request rate	Requires a numeric metric to exist. Cannot capture conditions only visible in log content.
Log-based metric alert	Application-level events visible only in logs: specific error codes, payment failures, security events	Adds ingestion cost for log-based metrics. Requires log data to flow consistently. See Log-Based Metrics for setup.
Uptime check alert	External reachability: DNS failures, SSL expiry, load balancer issues, complete outages	Only tests HTTP reachability, not application health. A 200 response with a broken payment flow passes an uptime check. See Uptime Checks for details.
Metric absence	Services or jobs that must report regularly: batch jobs, data pipelines, health metrics	Requires knowing the normal reporting cadence. Can fire during expected maintenance windows.

For most production services, you want all four: a metric threshold for errors and latency, a log-based metric for critical application events, an uptime check for external reachability, and a metric absence alert to detect complete silence.

Frequently asked questions

What is an alerting policy in Cloud Monitoring?

An alerting policy combines a condition (which metric to watch and at what threshold), a duration (how long the condition must hold before firing), and notification channels (where to send the alert). When the condition holds for the full duration, Cloud Monitoring creates an incident and sends notifications to the configured channels. You can create policies in the GCP Console, via gcloud, or with the Terraform resource google_monitoring_alert_policy.

Which alert type should I start with?

Start with a metric threshold alert on request errors and latency. These are the most common alert types, they correspond directly to user experience, and Cloud Run and GKE emit the metrics automatically. Add uptime check alerts next, then metric absence alerts for jobs or services that must report regularly.

What is the difference between duration and alignment period?

The alignment period is how often Cloud Monitoring resamples the metric, for example computing the average or rate over each 60-second window. The duration is how long the condition must be continuously true before an incident is created. A 60s alignment period with a 300s duration means the resampled value must exceed the threshold for five consecutive 60-second windows before the alert fires.

Can I manage alerts as code?

Yes. Use the Terraform resource google_monitoring_alert_policy to define policies in code and check them into version control. You can also export an existing policy as JSON with: gcloud monitoring policies describe POLICY_ID --format=json. Managing alerts as code makes them auditable, reproducible across environments, and easy to roll back.

How do I test alerts without causing noise?

Use the "Send test notification" button in the GCP Console to verify that your notification channels are reachable without triggering a real incident. To validate the full condition-to-incident path, lower the threshold temporarily in a non-production environment and confirm that an incident is created, notifications are sent, and the incident closes automatically when the condition clears.

Last verified: 25 March 2026. Cloud services change frequently. Verify details against official documentation before making infrastructure decisions.