How to Create CloudWatch Alarms in AWS (Console + CLI Examples)

A CloudWatch alarm watches a metric and takes action when it crosses a threshold you set. Without alarms, you have to watch dashboards manually and hope someone notices a problem before users report it. With well-configured alarms, CloudWatch pages you automatically when something goes wrong. This guide covers how to create CloudWatch alarms in the AWS Console and with the CLI, how to configure SNS notifications, how to choose thresholds that won’t flood you with false positives, and the mistakes that make alarms unreliable.

Simple explanation

Before creating an alarm, it helps to understand the difference between three related CloudWatch concepts:

  • A metric is a time-series measurement. CPU usage at 2:00 PM, 2:05 PM, 2:10 PM. It is just data. CloudWatch collects hundreds of metrics automatically from AWS services with no setup required.
  • A dashboard visualizes those metrics in charts. CloudWatch Dashboards are useful for understanding trends, but someone still has to look at them.
  • An alarm watches a metric and does something when the value crosses a line you draw. No one has to be watching. The alarm fires on its own.

How CloudWatch alarms work

When you create an alarm, you define a chain of components that determine when and how it fires:

  • Metric — which specific measurement to watch (for example, CPUUtilization for a specific EC2 instance)
  • Statistic — how to aggregate data points within each period: Average, Sum, Minimum, Maximum, SampleCount, or a percentile like p99
  • Period — the time window for each data point, in seconds (60 = 1 minute, 300 = 5 minutes)
  • Threshold — the numeric value and comparison operator (greater than 80%, less than 10 GB, and so on)
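  • Evaluation periods — how many of the most recent data points CloudWatch evaluates. Unless you also set datapoints to alarm, all of them must breach, so this is the number of consecutive breaching periods required before the alarm fires; higher values reduce false positives from brief spikes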
  • Datapoints to alarm (M of N) — a flexible variant: M out of the last N periods must breach the threshold. Setting 3 out of 5 fires the alarm if 3 of the last 5 data points were over the threshold, even if they weren’t all consecutive.
  • Missing data treatment — what to do when no data arrives for a period (covered below)
  • Actions — what happens when the alarm transitions to a new state

The three alarm states

| State | Meaning | When it occurs |
| --- | --- | --- |
| OK | Metric is within the acceptable threshold | Normal operation; no threshold breach detected |
| ALARM | Threshold has been breached | Metric exceeded (or fell below) the threshold for the required evaluation periods |
| INSUFFICIENT_DATA | Not enough data to evaluate | New alarm before the first period completes, metric stopped reporting, or instance was stopped |

Actions fire on state transitions, not while an alarm stays in a state. To be notified when an incident starts and again when it resolves, configure actions for both the ALARM state and the OK state.

Note

M-of-N evaluation catches intermittent problems. If you set evaluation periods to 5 and datapoints to alarm to 3, the alarm fires when 3 of the last 5 data points breach the threshold, even if they were not consecutive. This catches recurring spikes that a strictly consecutive check might miss, while still ignoring a single isolated blip.
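
To make the M-of-N rule concrete, here is a minimal local sketch in bash. It only imitates the counting logic on a hard-coded list of integer samples; CloudWatch performs the real evaluation server-side, and the function names `count_breaches` and `alarm_state` are invented for this illustration. Missing data is not modeled.

```shell
#!/usr/bin/env bash
# Local illustration only; CloudWatch evaluates alarms server-side.

# count_breaches THRESHOLD N VALUE...
# Prints how many of the last N values are strictly greater than THRESHOLD.
count_breaches() {
  local threshold=$1 n=$2
  shift 2
  local values=("$@") count=0 v
  local start=$(( ${#values[@]} - n ))
  if (( start < 0 )); then start=0; fi
  for v in "${values[@]:start}"; do
    if (( v > threshold )); then count=$(( count + 1 )); fi
  done
  echo "$count"
}

# alarm_state THRESHOLD M N VALUE...
# Prints ALARM when at least M of the last N values breach, otherwise OK.
alarm_state() {
  local threshold=$1 m=$2 n=$3
  shift 3
  if (( $(count_breaches "$threshold" "$n" "$@") >= m )); then
    echo ALARM
  else
    echo OK
  fi
}

# CPU samples, oldest first: three of the last five exceed 80, but never three in a row.
samples=(85 70 90 72 88)
alarm_state 80 3 5 "${samples[@]}"   # 3 of 5: prints ALARM
alarm_state 80 3 3 "${samples[@]}"   # 3 consecutive: prints OK
```

The same samples produce different states depending on the window, which is the recurring-spike case the note describes.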

Missing data treatment

When no metric data arrives during a period, CloudWatch needs to know how to handle the gap. The four options are:

  • notBreaching — treat the missing period as within threshold. Good for sparse metrics like Lambda invocations that genuinely have quiet periods.
  • breaching — treat the missing period as a threshold violation. Use this when absence of data is itself a problem, such as an EC2 health check that should always be reporting.
  • ignore — keep the current alarm state unchanged during the gap.
  • missing — the alarm transitions to INSUFFICIENT_DATA.

Watch out

Choosing notBreaching for a metric that should always be reporting means the alarm goes silent if the data source disappears. For things like health checks and heartbeat metrics, set this to breaching so the alarm fires when data stops arriving.
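
As a sketch of the heartbeat case, the function below creates an alarm that treats missing data as breaching. The namespace `MyApp`, the metric `Heartbeat`, and the `WorkerId` dimension are hypothetical; it assumes your application publishes a value of 1 to that metric every minute. The `put-metric-alarm` flags themselves are standard.

```shell
#!/usr/bin/env bash
# create_heartbeat_alarm WORKER_ID TOPIC_ARN
# Hypothetical custom metric: namespace "MyApp", metric "Heartbeat", dimension "WorkerId".
create_heartbeat_alarm() {
  local worker_id=$1 topic_arn=$2
  aws cloudwatch put-metric-alarm \
    --alarm-name "MyApp-HeartbeatMissing-${worker_id}" \
    --alarm-description "No heartbeat from ${worker_id} in 2 of the last 3 minutes" \
    --namespace MyApp \
    --metric-name Heartbeat \
    --dimensions "Name=WorkerId,Value=${worker_id}" \
    --statistic Sum \
    --period 60 \
    --evaluation-periods 3 \
    --datapoints-to-alarm 2 \
    --threshold 1 \
    --comparison-operator LessThanThreshold \
    --treat-missing-data breaching \
    --alarm-actions "$topic_arn" \
    --ok-actions "$topic_arn"
}

# Example call (ARN reused from this guide's other examples):
# create_heartbeat_alarm worker-1 arn:aws:sns:us-east-1:123456789012:production-alerts
```

Because silent periods count as breaches, the alarm fires when the worker stops publishing, not only when it publishes a low value.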

When to use CloudWatch alarms

Almost every production AWS workload needs alarms. These are the most common scenarios and what each alarm is protecting against:

  • EC2 high CPU or status checks — sustained CPU above 80% may indicate a runaway process or an undersized instance. A failing status check means the instance or underlying hardware is unhealthy and needs immediate attention.
  • Lambda errors or throttles — any errors in a function that should be error-free, or throttles showing you’ve hit the concurrency limit and requests are being dropped. See Monitoring Lambda in AWS for function-level alerting strategies.
  • RDS low storage or connection pressure — storage exhaustion stops a database instance with no warning. Connection pressure approaching max_connections causes client errors as new connections are rejected.
  • ALB 5xx spikes — a spike in 5xx responses means either the load balancer itself or the backend targets behind it are returning errors. Alarm separately on ALB-level 5xx and target-level 5xx to distinguish load balancer issues from application issues.
  • SQS queue backlog — if the age of the oldest message climbs, your consumer is falling behind. This usually means a consumer is failing silently or scaling hasn’t kicked in.
  • Log-based patterns — you can create log-based metrics from CloudWatch Logs and alarm on them, which is useful for custom application errors that don’t map to a standard AWS metric.
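
For the log-based case, one possible shape is a metric filter plus an ordinary metric alarm on the metric it emits. In this sketch the filter name, the `MyApp/Logs` namespace, and the `AppErrorCount` metric are placeholders, and the pattern simply matches log events containing the term ERROR; adjust all of them to your application.

```shell
#!/usr/bin/env bash
# create_log_error_alarm LOG_GROUP TOPIC_ARN
# Names below (filter, namespace, metric) are illustrative placeholders.
create_log_error_alarm() {
  local log_group=$1 topic_arn=$2

  # Turn each log event containing the term ERROR into a data point of 1.
  aws logs put-metric-filter \
    --log-group-name "$log_group" \
    --filter-name "app-error-count" \
    --filter-pattern "ERROR" \
    --metric-transformations \
      metricName=AppErrorCount,metricNamespace=MyApp/Logs,metricValue=1

  # Alarm when any matching event appears within a 5-minute period.
  # The filter emits nothing when no event matches, so gaps are treated as healthy.
  aws cloudwatch put-metric-alarm \
    --alarm-name "MyApp-LogErrors" \
    --namespace MyApp/Logs \
    --metric-name AppErrorCount \
    --statistic Sum \
    --period 300 \
    --evaluation-periods 1 \
    --threshold 0 \
    --comparison-operator GreaterThanThreshold \
    --treat-missing-data notBreaching \
    --alarm-actions "$topic_arn"
}

# Example call (log group name is hypothetical):
# create_log_error_alarm /myapp/production arn:aws:sns:us-east-1:123456789012:production-alerts
```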

Choose the right alarm type

| Type | How it works | Best for |
| --- | --- | --- |
| Metric alarm | Compares a metric (or metric math expression) against a fixed threshold you define | Most alarms. Use this by default: CPU > 80%, errors > 0, storage < 10 GB. |
| Composite alarm | Combines the states of multiple alarms using AND, OR, NOT logic | Reducing noise when a single metric spiking isn’t enough to confirm a real incident. Also useful for suppressing notifications during maintenance windows. |
| Anomaly detection alarm | Learns the metric’s expected range from historical data and fires when it deviates significantly from the predicted band | Metrics with variable patterns: web traffic that spikes on weekdays, batch jobs with fluctuating volume, anything where a fixed threshold would generate constant false positives. |

Tip

Start with metric alarms for everything you can express as a fixed threshold. Add composite alarms once you find yourself getting paged on single-signal spikes that resolve on their own. Switch to anomaly detection when traffic patterns vary enough that no stable fixed threshold exists.

How to create a CloudWatch alarm in the AWS Console

The following steps create a metric alarm. If you’re new to CloudWatch, the CloudWatch overview explains how alarms fit into the broader service.

  1. Open CloudWatch. In the AWS Console, search for CloudWatch in the top search bar and open the service. Confirm you’re in the correct region. Alarms are region-scoped and don’t cross regions.

  2. Navigate to Alarms. In the left sidebar, under Alarms, click All alarms. Then click Create alarm.

  3. Select a metric. Click Select metric. You’ll see namespaces for every AWS service that reports to CloudWatch. Browse to the service you want (for example, EC2 > Per-Instance Metrics) or use the search box to find a specific metric. Select the metric row and click Select metric.

  4. Configure the metric and conditions. You’re now on the “Specify metric and conditions” screen.

    • Under Metric: choose your statistic (Average is appropriate for most continuous metrics; Sum is better for error counts and event-based metrics) and your period (5 minutes is a sensible default; 1 minute gives faster detection, though for EC2 it requires detailed monitoring, which is billed separately).
    • Under Conditions: choose Static threshold type (or Anomaly detection if you want a learned band). Set the comparison operator and enter the threshold value.

  5. Set evaluation logic. Expand Additional configuration:

    • Set Datapoints to alarm. For example, “2 out of 3” means two of the last three periods must breach the threshold before the alarm fires. This avoids false positives from brief spikes.
    • Set Missing data treatment. For most metrics, Treat missing data as missing (transitions to INSUFFICIENT_DATA) is a safe default. Choose Treat missing data as bad if absence of the metric is itself an alert condition.

    Click Next.

  6. Configure actions. Under Notification, select the alarm state that triggers the action (In alarm). Choose an existing SNS topic or create a new one. To also receive a recovery notification, click Add notification, choose the OK state, and select the same topic. You can also add EC2 actions (reboot, stop, recover), Auto Scaling policies, or Lambda invocations from this screen. Click Next.

  7. Name the alarm. Give the alarm a descriptive name that includes the service, metric, and environment. For example: EC2-HighCPU-prod-web-01. A clear name makes it obvious what’s wrong when the alarm fires at 2 AM. Add an optional description. Click Next.

  8. Review and create. Check the alarm configuration on the preview screen. Click Create alarm. The alarm will start in INSUFFICIENT_DATA state until the first evaluation period completes.

Important

Confirm your SNS subscription before moving on. If you created a new SNS topic with an email subscription, AWS sends a confirmation email immediately. The subscription stays inactive until you click the confirmation link. Alarms will appear to work in the console but no notification will be delivered. Check your spam folder if the email doesn’t arrive.

How to create a CloudWatch alarm with the AWS CLI

The CLI is ideal for scripting alarm creation, managing alarms across many resources at once, or wiring alarm setup into infrastructure automation. All examples use put-metric-alarm.

EC2 high CPU alarm

This fires if average CPU utilization stays above 80% for two consecutive 5-minute periods. Ten minutes of sustained high CPU is a more reliable signal than a single spike, and requiring two evaluation periods prevents false positives from brief bursts.

aws cloudwatch put-metric-alarm \
  --alarm-name "EC2-HighCPU-prod-web-01" \
  --alarm-description "CPU utilization above 80% for 10 minutes" \
  --namespace AWS/EC2 \
  --metric-name CPUUtilization \
  --dimensions Name=InstanceId,Value=i-0abc123def456789 \
  --statistic Average \
  --period 300 \
  --evaluation-periods 2 \
  --threshold 80 \
  --comparison-operator GreaterThanThreshold \
  --treat-missing-data breaching \
  --alarm-actions arn:aws:sns:us-east-1:123456789012:production-alerts \
  --ok-actions arn:aws:sns:us-east-1:123456789012:production-alerts

The --ok-actions flag points to the same SNS topic so you’re notified when CPU recovers, not just when it spikes. Setting --treat-missing-data breaching means if the instance stops reporting (for example, because it was terminated unexpectedly), the alarm fires rather than going silent.
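
Since the CLI's main advantage is applying one alarm definition across many resources, here is a hedged sketch of that pattern. It assumes your instances carry a tag named `Environment`, which is purely illustrative, and the function names are made up for this example. Missing-data handling is left at the default here; add --treat-missing-data if you want the behavior described above.

```shell
#!/usr/bin/env bash
# create_cpu_alarm INSTANCE_ID TOPIC_ARN
# Creates the same high-CPU alarm as above for one instance.
create_cpu_alarm() {
  local instance_id=$1 topic_arn=$2
  aws cloudwatch put-metric-alarm \
    --alarm-name "EC2-HighCPU-${instance_id}" \
    --alarm-description "CPU utilization above 80% for 10 minutes" \
    --namespace AWS/EC2 \
    --metric-name CPUUtilization \
    --dimensions "Name=InstanceId,Value=${instance_id}" \
    --statistic Average \
    --period 300 \
    --evaluation-periods 2 \
    --threshold 80 \
    --comparison-operator GreaterThanThreshold \
    --alarm-actions "$topic_arn" \
    --ok-actions "$topic_arn"
}

# create_cpu_alarms_for_env ENVIRONMENT TOPIC_ARN
# Assumes a hypothetical "Environment" tag on each instance.
create_cpu_alarms_for_env() {
  local env=$1 topic_arn=$2 id
  for id in $(aws ec2 describe-instances \
      --filters "Name=tag:Environment,Values=${env}" \
                "Name=instance-state-name,Values=running" \
      --query 'Reservations[].Instances[].InstanceId' \
      --output text); do
    create_cpu_alarm "$id" "$topic_arn"
  done
}

# Example call:
# create_cpu_alarms_for_env prod arn:aws:sns:us-east-1:123456789012:production-alerts
```

put-metric-alarm overwrites an existing alarm with the same name, so re-running the loop updates alarms rather than duplicating them.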

Lambda error rate alarm using metric math

Lambda’s raw Errors metric counts errors, but the rate matters more when invocation volume varies. This example uses metric math to compute errors divided by invocations and fires when the error rate exceeds 5%. See Monitoring Lambda in AWS for more Lambda-specific alerting patterns.

aws cloudwatch put-metric-alarm \
  --alarm-name "Lambda-HighErrorRate-process-orders" \
  --alarm-description "Lambda error rate above 5%" \
  --metrics '[
    {
      "Id": "errors",
      "MetricStat": {
        "Metric": {
          "Namespace": "AWS/Lambda",
          "MetricName": "Errors",
          "Dimensions": [{"Name": "FunctionName", "Value": "process-orders"}]
        },
        "Period": 300,
        "Stat": "Sum"
      },
      "ReturnData": false
    },
    {
      "Id": "invocations",
      "MetricStat": {
        "Metric": {
          "Namespace": "AWS/Lambda",
          "MetricName": "Invocations",
          "Dimensions": [{"Name": "FunctionName", "Value": "process-orders"}]
        },
        "Period": 300,
        "Stat": "Sum"
      },
      "ReturnData": false
    },
    {
      "Id": "error_rate",
      "Expression": "errors / invocations * 100",
      "Label": "Error Rate",
      "ReturnData": true
    }
  ]' \
  --threshold 5 \
  --comparison-operator GreaterThanThreshold \
  --evaluation-periods 1 \
  --alarm-actions arn:aws:sns:us-east-1:123456789012:production-alerts

RDS low free storage alarm

RDS reports FreeStorageSpace in bytes, not gigabytes. The threshold below uses the binary convention: 10 GiB = 10 × 1024³ = 10,737,418,240 bytes. Always check a metric’s unit before setting a threshold. Getting it wrong silently creates an alarm that never fires.

aws cloudwatch put-metric-alarm \
  --alarm-name "RDS-LowFreeStorage-prod-postgres" \
  --alarm-description "RDS free storage below 10 GB" \
  --namespace AWS/RDS \
  --metric-name FreeStorageSpace \
  --dimensions Name=DBInstanceIdentifier,Value=prod-postgres \
  --statistic Average \
  --period 300 \
  --evaluation-periods 3 \
  --threshold 10737418240 \
  --comparison-operator LessThanThreshold \
  --alarm-actions arn:aws:sns:us-east-1:123456789012:production-alerts

Unit trap

RDS, EBS, and several other services report storage metrics in bytes. Setting a threshold of 10 thinking it means 10 GB actually means 10 bytes. The alarm will never fire. Always check the CloudWatch metric documentation for the unit before writing the threshold value.
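
One way to avoid typing a long byte count by hand is to let the shell compute it. This small sketch uses bash integer arithmetic; the helper names are invented for illustration. It also shows that the binary (GiB) and decimal (GB) readings of "10 GB" differ, so decide which one you mean.

```shell
#!/usr/bin/env bash
# gib_to_bytes N: prints N GiB in bytes (1 GiB = 1024^3 bytes).
gib_to_bytes() { echo $(( $1 * 1024 ** 3 )); }

# gb_to_bytes N: prints N decimal GB in bytes (1 GB = 1000^3 bytes).
gb_to_bytes() { echo $(( $1 * 1000 ** 3 )); }

gib_to_bytes 10   # prints 10737418240
gb_to_bytes 10    # prints 10000000000
```

The result can be passed straight to the CLI, for example --threshold "$(gib_to_bytes 10)".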

Testing alarms before relying on them

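Use set-alarm-state to force an alarm into any state without affecting real metrics. This verifies that your SNS notification path works end to end before an actual incident. The forced state is temporary: at its next evaluation, the alarm returns to whatever state the metric data dictates.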

# Force the alarm into ALARM state to test the SNS notification
aws cloudwatch set-alarm-state \
  --alarm-name "EC2-HighCPU-prod-web-01" \
  --state-value ALARM \
  --state-reason "Testing alarm notification path"

# Reset it back to OK when done
aws cloudwatch set-alarm-state \
  --alarm-name "EC2-HighCPU-prod-web-01" \
  --state-value OK \
  --state-reason "Test complete"

# List all alarms currently in ALARM state
aws cloudwatch describe-alarms \
  --state-value ALARM \
  --query 'MetricAlarms[*].{Name:AlarmName,Metric:MetricName,State:StateValue}' \
  --output table

# Delete an alarm
aws cloudwatch delete-alarms \
  --alarm-names "EC2-HighCPU-prod-web-01"
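
After forcing a state change, it can help to confirm from the CloudWatch side that the action was attempted, independent of whether the message reached your inbox. A minimal sketch using describe-alarm-history follows; the function name is invented, and the exact wording of the history summaries is whatever CloudWatch returns.

```shell
#!/usr/bin/env bash
# recent_alarm_actions ALARM_NAME
# Lists the most recent action entries recorded for the alarm.
recent_alarm_actions() {
  aws cloudwatch describe-alarm-history \
    --alarm-name "$1" \
    --history-item-type Action \
    --max-items 5 \
    --query 'AlarmHistoryItems[].[Timestamp,HistorySummary]' \
    --output table
}

# Example call:
# recent_alarm_actions "EC2-HighCPU-prod-web-01"
```

If the history shows the action ran but nothing arrived, the problem is more likely on the SNS subscription side than in the alarm itself.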

Setting up notifications with SNS

CloudWatch alarms deliver notifications through Amazon SNS (Simple Notification Service). SNS acts as a pub/sub fanout layer: you create a topic, subscribe one or more endpoints to it, and then point alarms at the topic ARN. When an alarm fires, SNS delivers the notification to every active subscription on the topic.

# Create an SNS topic for alarm notifications
aws sns create-topic --name production-alerts \
  --query 'TopicArn' --output text
# Returns: arn:aws:sns:us-east-1:123456789012:production-alerts

# Subscribe an email address to the topic
aws sns subscribe \
  --topic-arn arn:aws:sns:us-east-1:123456789012:production-alerts \
  --protocol email \
  --notification-endpoint your-email@example.com
# You will receive a confirmation email — click the link to activate

# Subscribe a webhook (for Slack, PagerDuty, or OpsGenie)
aws sns subscribe \
  --topic-arn arn:aws:sns:us-east-1:123456789012:production-alerts \
  --protocol https \
  --notification-endpoint https://hooks.example.com/your/webhook/url

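Email subscriptions work well for low-urgency or informational alarms. For production incidents that need immediate human response, connect SNS to a tool like PagerDuty, OpsGenie, or a dedicated Slack channel via HTTPS subscription. Note that an HTTPS endpoint must confirm its subscription and understand the SNS message format; a plain Slack incoming webhook generally does neither, so Slack is usually reached through an intermediary such as a Lambda function or AWS's chat integration. The SNS messaging model page covers topic configuration and subscription filtering in more detail.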

Easy to miss

SNS email subscriptions don’t activate automatically. AWS sends a confirmation email as soon as you subscribe. Until you click the confirmation link, the subscription is pending and all alarm notifications are silently dropped. Alarms will show as firing in the console, but nothing is delivered. Always verify the subscription is confirmed (not pending) before treating any alarm as production-ready.
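
A quick way to check for unconfirmed subscriptions from the CLI is sketched below. It relies on SNS reporting the literal string PendingConfirmation in place of a subscription ARN until the endpoint confirms; the function name is invented for this example.

```shell
#!/usr/bin/env bash
# pending_subscriptions TOPIC_ARN
# Prints protocol and endpoint for every subscription still awaiting confirmation.
# Empty output means nothing on the topic is pending.
pending_subscriptions() {
  aws sns list-subscriptions-by-topic \
    --topic-arn "$1" \
    --query "Subscriptions[?SubscriptionArn=='PendingConfirmation'].[Protocol,Endpoint]" \
    --output text
}

# Example call:
# pending_subscriptions arn:aws:sns:us-east-1:123456789012:production-alerts
```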

Composite alarms: reducing alert noise

A composite alarm combines the states of multiple metric alarms using Boolean expressions (AND, OR, NOT). The most common use case: don’t page anyone when a single metric spikes in isolation. Require two or more related signals to be in ALARM simultaneously before notifying.

# Component alarm 1: Lambda errors above threshold
aws cloudwatch put-metric-alarm \
  --alarm-name "Lambda-HighErrors" \
  --namespace AWS/Lambda \
  --metric-name Errors \
  --dimensions Name=FunctionName,Value=process-orders \
  --statistic Sum \
  --period 300 \
  --threshold 5 \
  --comparison-operator GreaterThanThreshold \
  --evaluation-periods 1

# Component alarm 2: Lambda p99 duration above threshold
aws cloudwatch put-metric-alarm \
  --alarm-name "Lambda-HighDuration" \
  --namespace AWS/Lambda \
  --metric-name Duration \
  --dimensions Name=FunctionName,Value=process-orders \
  --statistic p99 \
  --period 300 \
  --threshold 8000 \
  --comparison-operator GreaterThanThreshold \
  --evaluation-periods 1

# Composite alarm: fires only when BOTH component alarms are in ALARM state
aws cloudwatch put-composite-alarm \
  --alarm-name "Lambda-DegradedService" \
  --alarm-rule "ALARM(\"Lambda-HighErrors\") AND ALARM(\"Lambda-HighDuration\")" \
  --alarm-description "Both high errors and high duration — service is degraded" \
  --alarm-actions arn:aws:sns:us-east-1:123456789012:production-alerts

Composite alarms can also cut down alert storms during a known outage or maintenance window. You can designate another alarm as an actions suppressor: while that suppressor alarm is in ALARM, the composite alarm's own actions are held back. This works best when the child metric alarms have no notification actions of their own, so that only the composite alarm pages anyone. This pattern is covered in the incident response with monitoring guide.
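
As a sketch of suppression, put-composite-alarm accepts an actions suppressor: another alarm whose ALARM state holds back the composite alarm's actions. The suppressor alarm name `Maintenance-Window-Active` is hypothetical (you would create it yourself and set its state during maintenance), and the wait and extension periods, in seconds, are arbitrary starting values.

```shell
#!/usr/bin/env bash
# create_suppressed_composite TOPIC_ARN
# "Maintenance-Window-Active" is a hypothetical alarm acting as the suppressor.
create_suppressed_composite() {
  local topic_arn=$1
  aws cloudwatch put-composite-alarm \
    --alarm-name "Lambda-DegradedService" \
    --alarm-rule 'ALARM("Lambda-HighErrors") AND ALARM("Lambda-HighDuration")' \
    --alarm-description "High errors and high duration together" \
    --actions-suppressor "Maintenance-Window-Active" \
    --actions-suppressor-wait-period 60 \
    --actions-suppressor-extension-period 120 \
    --alarm-actions "$topic_arn"
}

# Example call:
# create_suppressed_composite arn:aws:sns:us-east-1:123456789012:production-alerts
```

Both period flags are supplied because the API requires them whenever a suppressor is set.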

Choosing thresholds without creating noisy alerts

The most common complaint about CloudWatch alarms is that they’re either too noisy or miss real incidents. Both problems usually trace back to thresholds set without looking at actual baseline behavior.

Start with observed behavior

Before setting a threshold, look at your metric’s history in CloudWatch Dashboards. Find the normal range — not just the average, but also peak behavior during traffic spikes and deployments. Set your threshold where you would genuinely want to be paged, not at a round number that happens to be above average.

Tip

A good rule of thumb: watch a new service’s metrics for a full week before setting alarm thresholds. You want to see at least one weekday peak, one weekend, and ideally one deployment cycle. Thresholds set from a single hour of data are almost always wrong.
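
If you prefer to pull the baseline from the CLI rather than a dashboard, something like the following sketch returns a week of hourly average and maximum CPU for one instance. It assumes GNU date (the -d '7 days ago' syntax differs on macOS and BSD), and the function name is invented for illustration.

```shell
#!/usr/bin/env bash
# cpu_baseline INSTANCE_ID
# Prints timestamp, hourly Average, and hourly Maximum CPU for the past 7 days.
cpu_baseline() {
  local start end
  start=$(date -u -d '7 days ago' +%Y-%m-%dT%H:%M:%SZ)   # GNU date syntax (assumption)
  end=$(date -u +%Y-%m-%dT%H:%M:%SZ)
  aws cloudwatch get-metric-statistics \
    --namespace AWS/EC2 \
    --metric-name CPUUtilization \
    --dimensions "Name=InstanceId,Value=$1" \
    --start-time "$start" \
    --end-time "$end" \
    --period 3600 \
    --statistics Average Maximum \
    --query 'sort_by(Datapoints, &Timestamp)[].[Timestamp,Average,Maximum]' \
    --output text
}

# Example call:
# cpu_baseline i-0abc123def456789
```

Comparing the Maximum column against the Average column shows how far peaks sit above normal load, which is the gap a threshold has to fit into.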

Period length affects both speed and noise

A 1-minute period detects problems faster than a 5-minute period, but it also makes alarms more sensitive to brief spikes. A metric that naturally jitters between 75% and 85% CPU will generate constant false positives with a 1-minute period and an 80% threshold. The same alarm with a 5-minute average period fires only during a sustained problem, which is usually what you actually want.
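
The effect of period length can be seen with a little arithmetic. The sketch below takes five made-up 1-minute CPU samples that jitter around 80% and compares how many breach individually with what their 5-minute average does. It uses bash integer arithmetic, so averages are truncated.

```shell
#!/usr/bin/env bash
# Five hypothetical 1-minute CPU samples jittering around the threshold.
samples=(78 84 76 83 79)
threshold=80

breaches=0
sum=0
for v in "${samples[@]}"; do
  if (( v > threshold )); then breaches=$(( breaches + 1 )); fi
  sum=$(( sum + v ))
done
average=$(( sum / ${#samples[@]} ))

echo "1-minute datapoints over threshold: $breaches"   # prints 2
echo "5-minute average: $average"                      # prints 80
if (( average > threshold )); then
  echo "5-minute datapoint breaches"
else
  echo "5-minute datapoint OK"                         # this branch runs
fi
```

Two of the five 1-minute points would count as breaches, while the single 5-minute point does not exceed the threshold.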

Use evaluation periods to absorb transient spikes

A single evaluation period with a tight threshold is appropriate only when you need zero tolerance. StatusCheckFailed > 0 is a good example: any failure is immediately serious and should fire without delay. For metrics that naturally vary, use 2 or 3 consecutive evaluation periods to require sustained bad behavior before the alarm fires.

When patterns vary, use anomaly detection

If your application traffic varies significantly by time of day or day of week, a fixed threshold will either be too tight during peak hours or too loose during quiet ones. Anomaly detection alarms learn the expected range from historical patterns and fire when the metric deviates beyond a configurable band, with no manual threshold tuning required.
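
An anomaly detection alarm can also be created from the CLI. The sketch below is one plausible shape rather than a definitive recipe: it watches Lambda Invocations, uses a band width of 2 standard deviations, and alarms when the metric leaves the band in either direction. The band width, the metric choice, and the function name are assumptions to adapt.

```shell
#!/usr/bin/env bash
# create_anomaly_alarm FUNCTION_NAME TOPIC_ARN
create_anomaly_alarm() {
  local function_name=$1 topic_arn=$2
  local metrics
  # m1 is the watched metric; "band" is the learned expected range around it.
  metrics=$(cat <<EOF
[
  {
    "Id": "m1",
    "MetricStat": {
      "Metric": {
        "Namespace": "AWS/Lambda",
        "MetricName": "Invocations",
        "Dimensions": [{"Name": "FunctionName", "Value": "${function_name}"}]
      },
      "Period": 300,
      "Stat": "Sum"
    },
    "ReturnData": true
  },
  {
    "Id": "band",
    "Expression": "ANOMALY_DETECTION_BAND(m1, 2)",
    "Label": "Expected invocations",
    "ReturnData": true
  }
]
EOF
)
  aws cloudwatch put-metric-alarm \
    --alarm-name "Lambda-InvocationAnomaly-${function_name}" \
    --metrics "$metrics" \
    --threshold-metric-id band \
    --comparison-operator LessThanLowerOrGreaterThanUpperThreshold \
    --evaluation-periods 3 \
    --treat-missing-data notBreaching \
    --alarm-actions "$topic_arn"
}

# Example call:
# create_anomaly_alarm process-orders arn:aws:sns:us-east-1:123456789012:production-alerts
```

Note that there is no --threshold flag here: the band expression referenced by --threshold-metric-id takes its place.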

Combine signals to reduce single-metric noise

If CPU alone fires multiple times per week without a real incident behind it, combine it with another metric. A composite alarm requiring both high CPU and elevated 5xx errors is almost always pointing at a real problem. CPU alone rarely is.

| Service | Metric | Suggested starting threshold | Notes |
| --- | --- | --- | --- |
| EC2 | CPUUtilization | > 80% for 10 min | 2 × 5-minute periods; adjust down for latency-sensitive apps |
| EC2 | StatusCheckFailed | > 0 for 1 min | Any failure here is serious; act immediately |
| Lambda | Errors (Sum) | > 0 in 5 min | For functions that should be error-free; use error rate for high-volume functions |
| Lambda | Throttles | > 0 in 5 min | Any throttle means requests are being dropped; check concurrency limits |
| Lambda | Duration | > 80% of timeout | Functions close to the timeout are at risk of timing out; optimize or increase the limit |
| RDS | FreeStorageSpace | < 20% of total | Storage exhaustion stops the instance abruptly; give yourself lead time to respond |
| RDS | DatabaseConnections | > 80% of max_connections | Connection exhaustion causes client errors; check for connection leaks |
| ALB | HTTPCode_ELB_5XX_Count | > 10 in 5 min | Load balancer-level errors; alarm separately from target-level 5xx |
| SQS | ApproximateAgeOfOldestMessage | > 300 seconds | Consumer falling behind or failing silently; investigate consumers first |
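
Two of the thresholds above are percentages of a resource setting, so they have to be converted into the metric's unit before they go into --threshold. A small arithmetic sketch, with helper names invented for illustration: Lambda timeouts are configured in seconds while Duration is reported in milliseconds, and RDS allocated storage is given in GiB while FreeStorageSpace is reported in bytes.

```shell
#!/usr/bin/env bash
# lambda_duration_threshold_ms TIMEOUT_SECONDS
# Prints 80% of the function timeout, in milliseconds.
lambda_duration_threshold_ms() { echo $(( $1 * 1000 * 80 / 100 )); }

# rds_free_storage_threshold_bytes ALLOCATED_GIB
# Prints 20% of the allocated storage, in bytes.
rds_free_storage_threshold_bytes() { echo $(( $1 * 1024 ** 3 * 20 / 100 )); }

lambda_duration_threshold_ms 10        # 10-second timeout: prints 8000
rds_free_storage_threshold_bytes 100   # 100 GiB allocated: prints 21474836480
```

The inputs would normally come from the resource configuration, for example the Timeout field of aws lambda get-function-configuration or the AllocatedStorage field of aws rds describe-db-instances.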

Common mistakes

  1. Using 1 evaluation period for noisy metrics. One data point above the threshold is enough to fire the alarm. For metrics that naturally vary (like CPU on a busy application server), this generates false positives constantly. Use evaluation-periods=2 or 3. Reserve 1 for metrics where any breach is unacceptable, like StatusCheckFailed.
  2. Not configuring OK actions. If you only configure an ALARM action, you get paged when something breaks but never notified when it recovers. Your on-call engineer has to keep manually checking the console. Add an OK action to the same SNS topic so recovery is communicated automatically.
  3. Forgetting to confirm SNS email subscriptions. SNS sends a confirmation email immediately after you create a subscription. Until you click the link, the subscription is inactive and alarm notifications are silently dropped. Always confirm subscriptions before treating an alarm as production-ready, and check your spam folder.
  4. Setting the wrong missing-data treatment. Choosing notBreaching for a metric that should always be reporting means the alarm goes quiet when the data source disappears. Think carefully about what absence of data means for each specific metric before accepting the default.
  5. Setting thresholds with no baseline. Picking 80% CPU because it sounds reasonable, without checking that your service normally runs at 75%, guarantees constant false positives. Observe metrics in CloudWatch Dashboards for at least a few days before fixing thresholds.
  6. Alarming on one noisy signal only. A single CPU or memory metric is rarely enough to confirm a real incident. Combine signals using composite alarms, or use anomaly detection for metrics with variable baselines, to reduce noise without sacrificing coverage.
  7. Never testing the alarm path. Creating an alarm and assuming it works is not the same as verifying it. Use set-alarm-state to force the alarm into ALARM state and confirm the notification arrives, routes to the right person, and is actionable. An alarm that silently fails to deliver is worse than no alarm at all.

Frequently asked questions

What is a CloudWatch alarm?

A CloudWatch alarm watches a single metric (or a metric math expression) over a specified time period and changes state when the value crosses a threshold you define. When an alarm enters the ALARM state, it can trigger actions such as sending an SNS notification, triggering Auto Scaling, stopping or rebooting an EC2 instance, or invoking a Lambda function.

What are the three CloudWatch alarm states?

OK means the metric is within the defined threshold. ALARM means the threshold has been breached for the required evaluation periods. INSUFFICIENT_DATA means CloudWatch does not have enough data points to evaluate. This is the starting state for a new alarm and also occurs when a metric stops reporting, such as when an EC2 instance is stopped.

What is the difference between a metric alarm and a composite alarm?

A metric alarm watches a single CloudWatch metric or metric math expression and fires when it crosses a threshold. A composite alarm combines the states of other alarms using Boolean logic (AND, OR, NOT). Composite alarms help reduce noise: for example, one can fire only when both a high-CPU alarm and a high-error-rate alarm are in ALARM state simultaneously, rather than when either one spikes independently.

How often does CloudWatch evaluate alarms?

CloudWatch evaluates alarms at the end of every period you define. If your period is 5 minutes (300 seconds), the alarm is re-evaluated every 5 minutes. High-resolution alarms using 10-second or 30-second periods are evaluated at that frequency but incur higher costs.

What happens when CloudWatch data is missing?

You control this with the treat-missing-data setting. notBreaching treats missing data as within threshold, which is good for sparse metrics with genuine quiet periods. breaching treats missing data as a threshold violation, which is useful when absence of data is itself a problem. ignore keeps the current alarm state unchanged. missing transitions the alarm to INSUFFICIENT_DATA.

Last verified: 3 May 2026. Cloud services change frequently. Verify details against official documentation before making infrastructure decisions.