AWS Performance Insights for RDS: DB Load & Top SQL

CloudWatch can show you that your RDS instance is at 90% CPU. Performance Insights shows you which SQL query got you there and what it is waiting on. That distinction is what turns a guessing game into a five-minute diagnosis.

AWS Performance Insights in simple terms

Performance Insights is a query monitoring tool built into Amazon RDS and Aurora. Instead of watching server metrics from the outside, it connects to the database engine itself and samples the active session list every second. It records which queries are running, how long they are taking, and why they are not finishing faster.

The tool answers two questions a DBA would normally need direct database access to answer: how loaded is this database right now? and which queries are responsible? Performance Insights puts that on a dashboard any engineer can read without touching the database directly.

🍳

Analogy

CloudWatch is the camera on the restaurant car park. It tells you the lot is full and the kitchen is busy. Performance Insights is the camera inside the kitchen. It tells you the grill is the bottleneck, three orders of the fish dish are backed up, and one cook is taking twice as long as everyone else. Completely different information.

Why teams use it

Performance Insights is most valuable when standard CloudWatch metrics show a problem but cannot explain it:

  • CPU spikes with no obvious query cause. CloudWatch shows 90% CPU. Performance Insights shows the single reporting query responsible for 70% of that load.
  • Lock contention slowing writes. CloudWatch shows high latency. Performance Insights shows dozens of sessions queued behind a long-running transaction holding a table lock.
  • Slow response times during peak load. DB load climbs above the Max vCPU line during business hours, revealing the database is saturated, not just busy.
  • Query regression after a deploy. A new feature ships and response times spike. Top SQL immediately surfaces the query introduced by the deploy.
  • Performance validation after a migration. You moved to PostgreSQL on RDS and need to verify which queries are running slower than expected on the new engine.
When CloudWatch alone is not enough

CloudWatch CPU percentage does not tell you which query is consuming it. A database can also be slow without high CPU at all, when sessions are blocked on locks or waiting for disk I/O. If you are using CPU % as your only signal for database health, you will miss the majority of real query problems.

CloudWatch alone is sufficient when the problem is clearly infrastructure: disk space running low, connection limits being hit, replication lag building up. If you already know what is slow and just need an alert for when it gets worse, a CloudWatch alarm works fine. When you need to know why something is slow, Performance Insights is the right tool.

How Performance Insights works

Every second, Performance Insights queries the database engine’s internal session table. This is the same data a DBA would read manually with pg_stat_activity on PostgreSQL or performance_schema on MySQL. It records every active session: the SQL digest, the current wait event, the user, and the execution time so far.

These samples are aggregated into one-minute buckets and stored in the Performance Insights backend, separate from your RDS instance. The free 7-day retention window covers 10,080 minutes of history. You can extend this to 24 months as a paid add-on.

Because the sampling happens inside the engine rather than at the OS or network layer, Performance Insights can attribute load to specific SQL digests, even across thousands of concurrent connections. This is what separates it from the instance-level metrics you get from the CloudWatch agent.

Safe to run in production

Performance Insights adds a small overhead, typically under 1% CPU. AWS designed it to be on at all times, not just during incident response. Enable it when you create the instance and leave it running.

Understanding DB load and AAS

The central metric in Performance Insights is DB load, measured in Average Active Sessions (AAS). AAS counts the number of database sessions actively executing or waiting at any sampled moment during a one-minute window. A value of 4 means that, on average, 4 sessions were doing work every second in that minute.

🚦

Think of it like a highway

The Max vCPU line is the number of lanes on the highway. DB load is the current number of cars. When cars equal the number of lanes, traffic is flowing well. When cars double the lane count, everything backs up and slows down, even if each individual car is not doing anything wrong.

Reading the three states:

DB load below Max vCPU

The database is healthy. Sessions are being processed without significant queuing. No action needed.

DB load near Max vCPU

The database is well-utilized. Monitor for trends. Check which wait events are dominant so you have context if it tips over.

DB load consistently above Max vCPU

Sessions are queuing. More work is arriving than the database can process simultaneously. Open the Top SQL table immediately and find the dominant query.

A DB load of 8 on a db.r6g.4xlarge (16 vCPUs) is unremarkable. A DB load of 8 on a db.t3.micro (2 vCPUs) means every session is waiting for four others to clear first. The Max vCPU line makes that context immediate without having to remember the instance size.

Wait events: what the database is waiting on

DB load is broken down by wait events, the reasons why sessions are not making progress. The colored bands stacked in the DB load chart each represent a wait event category. Hovering over a band shows which event it is and how much load it accounts for.

The most common wait events and what they mean:

  • CPU: Sessions are actively running on CPU. High CPU wait with no other events usually means queries need index optimization or are scanning too many rows.

  • IO:DataFileRead (PostgreSQL) or io/file_io (MySQL): Sessions are waiting for data to be read from disk. Typically means the buffer pool is undersized for the working data set, or queries are doing full table scans where index scans would be faster.

  • Lock:Relation (PostgreSQL) or synch/mutex/innodb (MySQL): Sessions are blocked waiting for a lock held by another session. Common with long-running transactions or missing row-level locking. Symptoms often surface first as connection timeouts in the application before you open Performance Insights.

  • Client:ClientRead: The database is waiting for the client application to send more data. Often caused by slow application code between database calls, or a connection pool holding idle connections open.

  • idle: Sessions are open but not executing a query. High idle counts often mean too many connections are pooled at the application layer. Visible in DB load but not a database-side problem.

Diagnosing a spike

Identify the dominant wait event first, then move to Top SQL to find the specific query responsible. The wait event tells you the category of the problem (CPU, I/O, locks). The Top SQL table tells you which query to fix.

Top SQL: how to find the expensive query

Below the DB load chart, the Performance Insights dashboard shows the Top SQL table. It lists every distinct query digest contributing to DB load during the selected time window, sorted by average active sessions.

Each row in the table shows:

  • SQL text: The normalized query with literals replaced by placeholders, so WHERE id = 42 and WHERE id = 99 are grouped as one query.
  • % of DB load: The fraction of total database work this query accounts for in the selected period.
  • Average latency per call: Average execution time per query invocation.
  • Calls per second: How frequently the query executes.
  • Rows per call: Average rows returned or scanned per execution.

A high-frequency query (10,000 calls/sec at 0.5ms each) and a slow query (1 call/min at 30 seconds each) can both appear at the top of the list for completely different reasons. Always look at latency and call frequency together, not just total DB load contribution.

Click the query row

Click any row to see its wait event breakdown and execution plan. A query scanning millions of rows to return ten results is a strong signal that an index is missing. Use the execution plan view to confirm before adding the index.

The table also has tabs for Top waits, Top hosts, Top users, and Top databases. If multiple applications share one RDS instance, the Top users or Top databases tabs help isolate which workload is causing the load.

Performance Insights vs CloudWatch vs Database Insights

AWS now offers three layers of RDS observability. Understanding what each one does helps you pick the right tool for the question at hand.

FeatureCloudWatchPerformance InsightsDatabase Insights
What it monitorsInstance infrastructure (CPU %, IOPS, storage, connections)Query-level activity inside the database engineOS metrics, query metrics, and fleet-level views
Best forAlerting on thresholds, capacity planning, uptime trackingFinding slow queries, diagnosing lock contention, post-deploy checksTeams managing many production databases at scale
Query visibilityNoneFull SQL digest, wait events, Top SQL tableFull SQL digest plus OS and application-level context
CostIncluded with RDS; custom metrics are extraFree for 7-day retention; paid up to 24 monthsPremium add-on, priced per vCPU per hour
Anomaly detectionCloudWatch Anomaly Detection on alarmsNoneBuilt-in automated anomaly detection
Typical userAll teams using RDSDevelopers and DBAs troubleshooting query performancePlatform teams managing fleet-level observability
The right setup for most teams

Use CloudWatch alerts for infrastructure thresholds and Performance Insights always on for query-level investigation. Database Insights is worth evaluating if you manage dozens of RDS instances and need centralized fleet visibility with automated anomaly detection.

How to enable it on RDS

Performance Insights can be enabled at instance creation or on an existing instance. Most modern instance classes support enabling without a restart. The AWS console will warn you if a restart is required for your specific instance.

# Enable at instance creation
aws rds create-db-instance \
  --db-instance-identifier my-postgres \
  --db-instance-class db.t3.medium \
  --engine postgres \
  --master-username admin \
  --master-user-password changeme123 \
  --enable-performance-insights \
  --performance-insights-retention-period 7

# Enable on an existing instance
aws rds modify-db-instance \
  --db-instance-identifier my-existing-postgres \
  --enable-performance-insights \
  --performance-insights-retention-period 7 \
  --apply-immediately

Verify that Performance Insights is active:

aws rds describe-db-instances \
  --db-instance-identifier my-postgres \
  --query 'DBInstances[0].{PIEnabled:PerformanceInsightsEnabled,RetentionPeriod:PerformanceInsightsRetentionPeriod}'

To query Performance Insights data programmatically, useful for building custom dashboards or automated alerting on query load:

# Step 1: get the DBI resource ID (different from the instance identifier)
aws rds describe-db-instances \
  --db-instance-identifier my-postgres \
  --query 'DBInstances[0].DbiResourceId'

# Step 2: query DB load broken down by wait event for the last hour
aws pi get-resource-metrics \
  --service-type RDS \
  --identifier db-ABCDEFGHIJKLMNOPQRSTUVWX \
  --metric-queries '[{"Metric":"db.load.avg","GroupBy":{"Group":"db.wait_event","Limit":5}}]' \
  --start-time $(date -u -d '1 hour ago' --iso-8601=seconds) \
  --end-time $(date -u --iso-8601=seconds) \
  --period-in-seconds 60

The GroupBy parameter lets you slice DB load by wait event, SQL query, user, host, or database name. Performance Insights is available for all major RDS engines: MySQL, PostgreSQL, MariaDB, Oracle, SQL Server, and both Aurora variants.

Step-by-step: investigate a slow database

This workflow covers the most common scenario: users are reporting slow responses and you suspect the database. Walk through these steps in order before drawing conclusions or making changes.

  1. Confirm the time window. Get a specific time range from your application logs or user reports. Performance Insights lets you zoom into any 5-minute window. The more precise the window, the easier it is to isolate the signal from background noise.

  2. Open the DB load chart. In the RDS console, select your instance and open Performance Insights. Zoom into the time window. Look for the peak in the DB load chart.

  3. Compare against Max vCPU. Was DB load above or below the Max vCPU line during the incident? If DB load was low, the database is unlikely to be the primary bottleneck. Consider checking the application layer or production debugging approaches instead.

  4. Identify the dominant wait event. Look at the colored bands in the DB load chart. Which color is largest during the peak? This tells you the category: CPU-bound, I/O-bound, lock-bound, or client-bound.

  5. Find the query in Top SQL. Scroll to the Top SQL table. Sort by DB load contribution. The top row is almost always the highest-impact starting point. Click it to see the query’s individual wait event breakdown.

  6. Check the execution plan. For PostgreSQL on RDS, run EXPLAIN ANALYZE on the query to see whether it is doing a sequential scan where an index scan would be faster. For MySQL on RDS, use EXPLAIN FORMAT=JSON. A query scanning millions of rows for a 10-row result set is a near-certain index problem.

  7. Decide and act. Based on the wait event and execution plan, decide whether to tune the query, add an index, increase the buffer pool, or scale the instance. Make one change at a time and watch the DB load chart before and after to confirm improvement.

Confirm the fix worked

After making a change, drag the Performance Insights time range to cover both before and after. A clear reduction in the DB load spike confirms the fix worked. If the spike is unchanged, move to the next query in Top SQL.

Common mistakes

  1. Not enabling it until there is already a problem. Performance Insights needs historical data to be useful for diagnosis. If you only enable it during an incident, you are looking at real-time data with no baseline to compare against. Enable it when you create the instance.

  2. Using CloudWatch CPU as the only database health signal. A database can be slow without high CPU. Lock contention and I/O waits do not show up in CPU percentage. Performance Insights surfaces both; CloudWatch does not.

  3. Optimizing the second-most expensive query first. Sort the Top SQL table by DB load contribution and start with the top row. Fixing the third query while the first query still dominates the chart is wasted effort.

  4. Ignoring the Max vCPU line context. A DB load of 8 on a db.r6g.4xlarge (16 vCPUs) is unremarkable. The same value on a db.t3.micro (2 vCPUs) means sessions are queuing four-deep. DB load only makes sense relative to the instance size.

  5. Confusing DB load with CPU percentage. A DB load equal to the vCPU count does not mean CPU is at 100%. Sessions are also counted while waiting for I/O or locks. Check the wait event breakdown before assuming CPU is the constraint.

Frequently asked questions

What is AWS Performance Insights?

Performance Insights is a query monitoring feature built into Amazon RDS and Aurora. It samples active database sessions every second to show how many queries are running or waiting, what they are waiting for, and which SQL statements are responsible for the most load. It gives you query-level visibility that CloudWatch infrastructure metrics cannot provide on their own.

What is DB load in Performance Insights?

DB load is the average number of active sessions (AAS) in the database at any given moment, which is essentially the count of queries currently in flight. A DB load below the instance vCPU count means the database is handling requests without queuing. When DB load consistently exceeds the vCPU line, sessions are stacking up and the database is saturated.

Does Performance Insights replace CloudWatch?

No. They solve different problems. CloudWatch monitors infrastructure: CPU percentage, disk I/O, free storage, and connection count. Performance Insights monitors query behavior: which SQL statements are running, how long they take, and what they are waiting for. You need both. CloudWatch tells you a problem exists; Performance Insights tells you which query is causing it.

What is Amazon CloudWatch Database Insights?

Database Insights (launched 2024) is a premium observability tier within CloudWatch that layers on top of Performance Insights. It adds OS-level metrics, fleet views across multiple RDS instances, automated anomaly detection, and longer data retention. Standard Performance Insights remains the built-in tool for single-instance query monitoring. Database Insights is aimed at teams managing many production databases at scale.

When should I enable Performance Insights on RDS?

Enable it when you create the instance, not after a problem appears. The 7-day free retention tier costs nothing on most instance classes, and having historical data before a slowdown happens is what makes the tool useful for diagnosis. Enabling it after an incident means you are investigating without a baseline.

Last verified: 3 May 2026 Cloud services change frequently. Verify details against official documentation before making infrastructure decisions.