How to Monitor Cloud Run: Metrics, Logs, Alerts, Dashboards, and SLOs

Cloud Run handles infrastructure, scaling, and deployment for you. What it does not do is alert you when your service is slow, throwing errors, or about to run out of memory. Cloud Run emits a complete set of metrics automatically and sends all container logs to Cloud Logging with no agent required. Getting useful monitoring out of it is about knowing which signals matter and how to act on them.

This page covers what Cloud Run monitoring means in practice: which metrics to watch, how to build alerts and dashboards, how to define SLOs, and how to investigate a production incident step by step. Use it when setting up monitoring for the first time, or when you are debugging a Cloud Run problem and are not sure where to start.

Simple explanation

Monitoring a Cloud Run service means tracking four things: whether requests are succeeding, how fast they are, whether containers are healthy, and whether the service is scaling the way you expect.

Cloud Run monitoring differs from VM monitoring in a few important ways. Because Cloud Run scales automatically, including down to zero, you do not watch a fixed set of machines. You watch request patterns and container health across a variable number of instances, any of which may have just cold-started. Revisions add another layer: a single service can run multiple code versions simultaneously if you are doing a canary or blue/green deployment, which means a problem may only show up in a subset of your metrics unless you filter by revision.

Here is what each monitoring layer covers:

Metrics: numeric signals like request count, latency, memory utilization, and instance count. Useful for trending, alerting, and dashboards.
Logs: the stdout and stderr output from your containers, automatically captured by Cloud Logging. Useful for seeing exactly what happened in a specific request or at a specific time.
Traces: per-request timing data showing where a request spent its time, across multiple services if needed. Useful when latency is the problem and you need to find the slow span.
Profiles: CPU and memory usage broken down by function. Useful when resource utilization is high and you need to find which code is responsible.
Dashboards: visual displays of your key metrics in one place. Useful for ongoing visibility and during incidents.
Uptime checks: external probes that confirm your service endpoint is reachable. Useful as a safety net during low-traffic windows.

All metrics and logs are built in. Traces require your application to instrument spans (or use Cloud Trace’s automatic instrumentation for some frameworks). Profiles require the Cloud Profiler agent or library. You configure dashboards and alerts yourself.

How to picture it

Think of monitoring a Cloud Run service like watching a staffed help desk. Request count and response codes tell you how many calls came in and whether callers got answers. Latency tells you how long each call took. Instance count tells you how many staff members are currently active. Memory and CPU tell you if the staff are overwhelmed. The number of staff fluctuates based on demand (that is autoscaling), so you need to watch patterns, not individual machines.

How Cloud Run monitoring works

The monitoring flow for a Cloud Run service looks like this:

Your Cloud Run service handles requests and runs containers.
Cloud Run automatically emits metrics to Cloud Monitoring. No agent or configuration required.
Container stdout and stderr go to Cloud Logging automatically.
You build alerts, dashboards, and SLOs on top of those metrics and logs.

All metrics are labeled with both the service name and the revision name. This matters during incidents: if you have traffic split across two revisions (say, 90% to v5 and 10% to v6 as a canary), the combined metrics will show a blended error rate. If v6 has a 5% error rate and v5 is clean, the combined view might show only 0.5% errors, which would not trigger most alerts. Filtering to the specific revision that is misbehaving gives you the true signal.

Canary deployments can hide real error rates

If a new revision handles 10% of traffic and has a 20% error rate, the combined service-level error rate appears as just 2%. That is often below alert thresholds. During any deployment with a traffic split, filter metrics and logs by revision rather than by service. The service-level view is useful for baselines. The revision-level view is what you need during incidents.
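The dilution effect described above is just a weighted average. The following is a minimal sketch of that arithmetic; the function name and the sample split are illustrative, not part of any Cloud Run API.

```python
def blended_error_rate(revisions):
    """Return the service-level error rate for a traffic split.

    `revisions` is a list of (traffic_share, error_rate) pairs, where the
    traffic_share values are fractions that sum to 1.0.
    """
    total_share = sum(share for share, _ in revisions)
    if abs(total_share - 1.0) > 1e-9:
        raise ValueError("traffic shares must sum to 1.0")
    # Each revision contributes its error rate weighted by its traffic share.
    return sum(share * rate for share, rate in revisions)


# Stable revision: 90% of traffic, no errors. Canary: 10% of traffic, 20% errors.
split = [(0.90, 0.00), (0.10, 0.20)]
service_level = blended_error_rate(split)
canary_only = split[1][1]
print(f"service-level: {service_level:.1%}, canary revision: {canary_only:.1%}")
```

A 1% service-level alert threshold would only just be crossed here, even though one in five requests to the canary is failing.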

Revisions in plain English

Think of Cloud Run revisions like recipe versions at a restaurant. When the chef updates a dish, both the old and new recipes might go out to tables for a while. If guests start complaining, you need to know which recipe version their table received before you can fix anything. Reading unfiltered Cloud Run metrics during a canary is the equivalent of blaming “the kitchen” without knowing which recipe caused the problem.

A real-world example: traffic spike with cold starts

A traffic spike arrives. Cloud Run scales from 2 instances to 18. Each new instance takes 3–4 seconds to start. During that cold-start period, queued requests wait. P99 latency climbs from 180ms to 4 seconds. The error rate stays near zero (requests are slow, not failing), but your p99 latency alert fires. You open the dashboard, see instance count and startup latency spiking together, and understand what happened: the scaling response was slower than the traffic ramp. You can then decide whether to increase minimum instances to avoid cold starts, or accept the latency during ramps for cost reasons. See Cloud Run Scaling Behaviour for how to tune this.

A real-world example: memory leak

Over the course of six hours, memory utilization climbs from 45% to 88%. Your memory alert fires at the 85% threshold. You open Cloud Profiler and find a function allocating large objects without releasing them. You increase the memory limit as a short-term fix and deploy a corrected version. Without the alert at 85%, the first signal would have been OOM-killed containers producing 500 errors with no application log entry.
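To see how much response time an alert like this buys, you can project the growth forward. The sketch below assumes memory grows linearly, which real leaks often do not, so treat the result as a rough estimate; the function name is hypothetical.

```python
def hours_until_limit(start_util, current_util, elapsed_hours, limit=1.0):
    """Estimate hours until utilization reaches `limit`, assuming linear growth."""
    growth_per_hour = (current_util - start_util) / elapsed_hours
    if growth_per_hour <= 0:
        return float("inf")  # utilization is flat or falling: no projected OOM
    return max(0.0, (limit - current_util) / growth_per_hour)


# Figures from the example above: 45% to 88% over six hours.
remaining = hours_until_limit(0.45, 0.88, 6)
print(f"roughly {remaining:.1f} hours until the memory limit at this rate")
```

With these numbers the projection is a little under two hours, which is enough time to raise the limit or roll back, but not enough to ignore the alert.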

OOM kills leave no application log

When a container hits the memory limit, the operating system kills the process before it can write a log entry. You will see 5xx errors in your request metrics, but there will be no application log lines for those failures. The only evidence is an infrastructure log entry written by Cloud Run, reporting that the memory limit was exceeded or that the container was terminated (a process killed this way exits with code 137, which is 128 plus the SIGKILL signal number). If 5xx errors appear with no matching application logs, check memory utilization first.

What to check first for any Cloud Run service

These are the eight signals to check whenever you want to understand the health of a Cloud Run service. In a normal state, all of these should be stable. During an incident, one or more will be abnormal and point you toward the cause.

Signal	What it tells you	“Bad” looks like	Typical action
Request count	Total traffic volume and rate	Drops to zero when you expect traffic	Check routing config, deployment status, and Cloud Run service health
4xx error rate	Client-side errors: bad input, auth failures, missing routes	Sustained rate above 1–2% of requests	Check routing config, auth settings, or upstream clients sending invalid input
5xx error rate	Server-side failures: crashes, OOM kills, unhandled exceptions	Any sustained rate above your baseline	Open Logs Explorer for the affected revision; look for crash patterns
p99 latency	Tail latency: the slowest 1% of requests	Rising trend, or exceeds your SLO target	Check Cloud Trace for slow spans; check instance count and startup latency
Memory utilization	How close containers are to the OOM kill threshold	Above 85%	Increase memory limit or investigate the cause with Cloud Profiler
CPU utilization	Processing load per instance	Sustained above 80%	Profile the application or increase CPU allocation
Instance count	Number of active container instances	Spike disproportionate to traffic, or sustained high count during normal traffic	Check startup latency, concurrency settings, and whether containers are crashing and restarting
Startup latency	Time for cold starts to complete	Rising trend or consistently above 5 seconds	Investigate container initialization code; consider increasing minimum instances

In addition to these metrics, check which revision is currently serving traffic before you start filtering logs and metrics during an incident. In the Cloud Run console, the service detail page shows the traffic split. From the CLI:

# Show traffic splits across revisions
gcloud run services describe api-service \
  --region us-central1 \
  --format='table(status.traffic[].revisionName,status.traffic[].percent)'

Key Cloud Run metrics to watch

All Cloud Run metrics live under the run.googleapis.com/ namespace and are available in Cloud Monitoring immediately with no setup. Each metric is labeled with the service name, revision name, and region, so you can filter and group precisely.

Request metrics

run.googleapis.com/request_count: total requests, labeled by response_code and response_code_class (2xx, 4xx, 5xx). This is a delta metric. Use ALIGN_RATE to convert it to requests per second. To compute error rate, divide the 5xx-filtered count by the total. A common mistake is alerting on raw 5xx count without normalizing. A service handling 10,000 req/s with 5 errors per second is healthy (0.05%); the same 5 errors per second on a service handling 20 req/s is a 25% error rate.
run.googleapis.com/request_latencies: request latency distribution in milliseconds. This is a distribution metric, which means you can query any percentile (p50, p95, p99) in Metrics Explorer or MQL. Use p99 for alerting, not average. Average latency can look fine while 1% of your users are hitting 10-second requests. Always ask: “what are the worst-served users experiencing?”
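The point about average versus p99 is easy to check on a small sample. This is an illustrative sketch with made-up latencies and a simple nearest-rank percentile; Cloud Monitoring estimates percentiles from histogram buckets, so its numbers will not match a calculation like this exactly.

```python
import statistics


def percentile(values, pct):
    """Nearest-rank percentile for an integer `pct` between 1 and 100."""
    ordered = sorted(values)
    # Integer ceiling of pct% of the sample size, avoiding float rounding.
    rank = (pct * len(ordered) + 99) // 100
    return ordered[max(rank, 1) - 1]


# Hypothetical sample: 98 requests at 100 ms and 2 requests at 10 s.
latencies_ms = [100] * 98 + [10_000] * 2

mean_ms = statistics.mean(latencies_ms)
p50_ms = percentile(latencies_ms, 50)
p99_ms = percentile(latencies_ms, 99)
print(f"mean={mean_ms:.0f} ms, p50={p50_ms} ms, p99={p99_ms} ms")
# prints "mean=298 ms, p50=100 ms, p99=10000 ms"
```

The mean of 298 ms looks acceptable, while p99 shows that the worst-served users are waiting ten seconds.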

Container metrics

run.googleapis.com/container/cpu/utilizations: CPU utilization as a fraction of the CPU allocated to the container (0.0 to 1.0), reported as a distribution across instances. High CPU causes latency to increase and cold starts to take longer. If CPU is consistently above 0.8, profile the application with Cloud Profiler before increasing the CPU limit.
run.googleapis.com/container/memory/utilizations: memory utilization as a fraction of the memory limit, reported as a distribution across instances. When an instance reaches 1.0, its container is OOM-killed immediately. In-flight requests fail. There is no application log entry because the process is killed before it can write one. Your first signal will be 5xx errors in the request count metric. Set an alert at 85% so you have time to respond.
run.googleapis.com/container/instance_count: number of active container instances. A spike without a corresponding traffic spike usually means containers are starting and stopping rapidly due to slow startups, crashes, or concurrency misconfiguration. Watch this alongside request count during incidents. High instance count also directly affects your bill. See Cloud Run Cost Optimisation for how to set cost controls.

Startup and concurrency

run.googleapis.com/container/startup_latencies: time for new instances to start and become ready to serve requests. Increasing startup latency means your application initialization is taking longer. Common causes include downloading large files on startup, waiting for slow dependencies, or slow database connection establishment. Persistent slow startups hurt p99 latency during traffic ramps. See Cloud Run Container Failed to Start for how to diagnose startup issues.
run.googleapis.com/container/max_request_concurrencies: the peak number of concurrent requests handled by a single instance in the sample window. Compare this against your configured concurrency limit. If this is consistently at the limit, Cloud Run is scaling aggressively and each instance is fully loaded.
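When you query these metrics through Metrics Explorer or the Monitoring API, the service and revision labels become filter clauses. The helper below is a hypothetical sketch that assembles such a filter string; the service and revision names are placeholders, and the function is not part of any Google client library.

```python
def monitoring_filter(metric, service, revision=None, metric_labels=None):
    """Build a Cloud Monitoring filter string for a Cloud Run metric."""
    clauses = [
        f'metric.type="run.googleapis.com/{metric}"',
        'resource.type="cloud_run_revision"',
        f'resource.labels.service_name="{service}"',
    ]
    if revision:
        # Narrow to one revision, for example the canary during a rollout.
        clauses.append(f'resource.labels.revision_name="{revision}"')
    for key, value in (metric_labels or {}).items():
        clauses.append(f'metric.labels.{key}="{value}"')
    return " AND ".join(clauses)


# 5xx requests for a single (placeholder) revision of a placeholder service.
errors_filter = monitoring_filter(
    "request_count",
    service="api-service",
    revision="api-service-00005-abc",
    metric_labels={"response_code_class": "5xx"},
)
print(errors_filter)
```

Dropping the `revision` argument gives the service-level view, which is the one to use for baselines.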

Recommended dashboard layout

Every Cloud Run service has a built-in Metrics tab in the GCP console. It is useful for a quick check, but it has real limitations: it covers only one service, you cannot customize the time range easily, you cannot add deployment markers, and you cannot share it with your team. A custom Cloud Monitoring dashboard solves all of these and gives you a consistent place to look during incidents.

Minimum production dashboard

Build a dashboard with these panels as a starting point:

Traffic and error overview: request count split by response code class (2xx, 4xx, 5xx), shown as a stacked area chart. Makes it immediately obvious when errors start appearing.
Latency percentiles: p50, p95, and p99 on the same chart. Seeing the gap between p50 and p99 tells you whether you have a general slowdown or a tail latency problem affecting a small fraction of requests.
Memory utilization: a line chart with a horizontal annotation at 85%. Gives you early warning before OOM kills happen.
CPU utilization: useful alongside memory to understand whether a latency problem is resource-bound.
Instance count: shows scaling behavior over time. Spikes that do not correspond to traffic spikes are worth investigating.
Startup latency: useful during incidents where latency is rising and instance count is also rising. Cold start contribution becomes visible.
Revision-filtered views: add a filter variable for revision name so you can quickly narrow all panels to a specific revision during an incident or canary deployment.

If you use CI/CD pipelines to deploy to Cloud Run, consider adding deployment markers to the dashboard so spikes can be immediately correlated with specific deployments.

Alerts every production Cloud Run service should have

Configure these alerts in Cloud Monitoring Alerting for any service that handles real users. These are the minimum. Add more as you learn what matters for your specific workload.

5xx error rate

Alert when the rate of 5xx responses exceeds a threshold. Filter request_count to response_code_class = "5xx" and use ALIGN_RATE over a 1-minute window. Compute error rate as 5xx count divided by total count. A rate above 1% sustained for 5 minutes is a reasonable starting threshold for most services. Adjust based on your baseline. A service that occasionally returns valid 500 errors for certain inputs needs a higher threshold or a more specific filter.

p99 latency

Alert when the 99th percentile of request_latencies exceeds your target for more than 2 minutes. Use ALIGN_PERCENTILE_99 over a 1-minute window. If you do not have a defined SLO yet, start with 2× your typical p99 as the alert threshold. Never alert on average latency for production; average hides tail behavior and you will miss real user-facing problems.

Memory utilization near limit

Alert when container/memory/utilizations exceeds 0.85 (85%) for more than 5 minutes. Because this is a distribution metric, alert on a high percentile (such as p99) so that a single instance near the limit is not hidden by healthier ones. At 100% the container is OOM-killed, in-flight requests fail, and there is no application log entry. The 85% alert gives you time to respond: increase the memory limit, redeploy a fix, or investigate the leak with Cloud Profiler before containers start dying.

When to add an uptime check

Uptime checks make sense when your service has a dedicated health endpoint and you want to catch outages that request metrics would not catch, for example when traffic drops to near zero and you still want to know if the service is reachable. They also validate that your service is accessible from outside GCP. For services with consistent traffic, request-based metrics are usually sufficient.

Threshold tuning and alert noise

New alerts are often too sensitive. Start with conservative thresholds and tighten them after you have observed normal behavior for a few weeks. Use a duration condition (5 minutes of sustained breach, not a single data point) to avoid paging on transient spikes. Add notification channels you will actually look at. An alert that goes to an unread email inbox is not an alert.
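The effect of a duration condition can be illustrated with a few lines of code. This sketch mimics the idea (fire only after a sustained breach) on a plain list of samples; it is not how Cloud Monitoring implements alert conditions, and the sample values are invented.

```python
def sustained_breach(samples, threshold, min_consecutive):
    """Return True if `samples` stays above `threshold` for at least
    `min_consecutive` consecutive data points."""
    run = 0
    for value in samples:
        # Extend the current run on a breach, reset it otherwise.
        run = run + 1 if value > threshold else 0
        if run >= min_consecutive:
            return True
    return False


# One sample per minute of memory utilization (hypothetical values).
transient_spike = [0.60, 0.91, 0.62, 0.61, 0.63, 0.60, 0.62]
steady_climb = [0.80, 0.86, 0.87, 0.88, 0.89, 0.90, 0.91]

print(sustained_breach(transient_spike, 0.85, 5))  # False: one-minute blip
print(sustained_breach(steady_climb, 0.85, 5))     # True: five minutes above 85%
```

An alert that fires on any single data point would page for both series; the duration condition pages only for the second.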

Build your dashboard before the first incident

The worst time to build a monitoring dashboard is during a production outage. Set it up when things are calm, ideally the same day you deploy to production. Your first incident will be dramatically shorter if you already have a dashboard you understand and can navigate quickly under pressure.

SLO-based alerting for production services

For mature teams, burn-rate alerting is more reliable than threshold alerting. Define a Cloud Monitoring SLO based on request availability: the percentage of requests that return a non-5xx response.

Open Cloud Monitoring → Services → your Cloud Run service
Click “Create SLO”
Choose “Request-based” as the SLI type
Define good requests as those with response_code_class != "5xx"
Set a target, such as 99.5% over a 30-day rolling window

Once the SLO is defined, add burn-rate alerts: fire immediately when burn rate exceeds 10× over 5 minutes (fast burn, page the on-call), and notify the team when burn rate exceeds 2× over 1 hour (slow burn, investigate during business hours). Burn-rate alerts are better than threshold alerts because they tell you how quickly your error budget is draining, not just whether a threshold was crossed.
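Burn rate is the observed error rate divided by the error rate the SLO allows. The sketch below works through that arithmetic for the 99.5% target mentioned above; the function names are illustrative and the calculation assumes a constant error rate over the window.

```python
def burn_rate(error_rate, slo_target):
    """How many times faster than allowed the error budget is being spent."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be between 0 and 1")
    error_budget = 1.0 - slo_target  # 99.5% target leaves a 0.5% budget
    return error_rate / error_budget


def days_to_exhaust(rate, window_days=30):
    """Days until the budget is gone if the burn rate stays constant."""
    return float("inf") if rate <= 0 else window_days / rate


fast = burn_rate(error_rate=0.05, slo_target=0.995)  # 5% of requests failing
slow = burn_rate(error_rate=0.01, slo_target=0.995)  # 1% of requests failing
print(f"fast burn: {fast:.0f}x, budget gone in {days_to_exhaust(fast):.0f} days")
print(f"slow burn: {slow:.0f}x, budget gone in {days_to_exhaust(slow):.0f} days")
```

With a 99.5% target, a 10× burn corresponds to a 5% error rate and would use up a 30-day budget in about 3 days, while a 2× burn corresponds to 1% errors and about 15 days.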

Logs, traces, and profiles: which tool to use when

Metrics tell you that something is wrong. Logs, traces, and profiles tell you why. Here is how to decide which tool to open:

Use metrics when

You want a trend over time: is error rate rising? Is latency getting worse after a deployment?
You are building alerts that need to fire automatically
You want an ongoing dashboard for visibility
You want to understand scaling behavior across all instances

Open Logs Explorer when

Something went wrong and you need to see the actual error message or stack trace
You want to correlate events to a specific request ID or timestamp
You are investigating a specific revision or deployment window
You need to count a pattern that does not appear in built-in metrics (use Log-Based Metrics if you need to alert on it)

When you open Logs Explorer for a Cloud Run incident, filter immediately to the affected revision and time window:

resource.type="cloud_run_revision"
resource.labels.service_name="api-service"
resource.labels.revision_name="api-service-00005-abc"
severity>=ERROR
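The filter above pins the revision and severity; the time window can be added with timestamp comparisons. The helper below is a hypothetical sketch that builds the same filter plus a time range, using placeholder service and revision names and an invented incident window.

```python
from datetime import datetime, timedelta, timezone


def incident_log_filter(service, revision, start, end, min_severity="ERROR"):
    """Build a Logs Explorer query for one revision and one time window.

    `start` and `end` must be timezone-aware datetimes.
    """
    def rfc3339(ts):
        return ts.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")

    # Lines in a Logs Explorer query are combined with an implicit AND.
    return "\n".join([
        'resource.type="cloud_run_revision"',
        f'resource.labels.service_name="{service}"',
        f'resource.labels.revision_name="{revision}"',
        f"severity>={min_severity}",
        f'timestamp>="{rfc3339(start)}"',
        f'timestamp<="{rfc3339(end)}"',
    ])


# Hypothetical 30-minute incident window.
window_start = datetime(2026, 3, 25, 10, 0, tzinfo=timezone.utc)
query = incident_log_filter(
    "api-service", "api-service-00005-abc",
    window_start, window_start + timedelta(minutes=30),
)
print(query)
```

The printed query can be pasted into Logs Explorer as-is.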

Use Cloud Trace when

Latency is the problem and you need to find which part of a request is slow
Your service calls other GCP services or external APIs and you want the full call chain
Logs show requests completing, but slower than expected, with no errors

Cloud Trace shows you a waterfall breakdown of each request. If your API is slow because it is waiting 800ms for a database query, the trace will show that span immediately.

Use Cloud Profiler when

CPU utilization is unexpectedly high and you need to find which function is responsible
Memory is climbing and you suspect a leak but logs do not show the cause
Latency is high but traces do not show obvious blocking spans; the time is being spent inside your code

Create log-based metrics when

You want to alert on an application-specific pattern that only appears in logs: a custom error code, a business event, or a structured log field your application writes
You want to add that signal to a dashboard without having to manually query logs each time

See Log-Based Metrics for how to create and use them. Use Structured Logging so your log fields are consistently queryable. Unstructured log lines are much harder to build reliable metrics from.

How to investigate a Cloud Run incident

This is the step-by-step workflow for a common Cloud Run production problem: latency rising during a traffic spike with some errors appearing.

Most Cloud Run incidents fall into four categories

A code bug in the current revision. A misconfiguration (wrong environment variable, wrong memory limit, wrong concurrency setting). A startup problem (slow initialization or missing dependency). External resource pressure (a database or downstream service is slow). Identifying the category early narrows the search significantly and avoids wasted time.

Confirm the scope in Cloud Monitoring. Open the Cloud Run service in Cloud Monitoring. Check request count (is traffic elevated?), 5xx rate (how many errors?), and p99 latency (how bad is the tail?). Establish the start time of the problem.
Check instance count and startup latency. Is Cloud Run scaling out? If instance count is rising and startup latency is elevated, cold starts are contributing to the latency. If instance count is normal but latency is high, the problem is inside the containers, not a scaling issue.
Check which revision is serving traffic. If you deployed recently, the problem may be in the new revision. Run gcloud run services describe to see the traffic split. Filter all subsequent investigation to the revision receiving most of the traffic at the incident time.
Open Logs Explorer filtered to the revision. Use resource.type="cloud_run_revision" and the revision name. Filter to severity>=ERROR. Do errors cluster at a particular time, request path, or user pattern? Look for OOM kills (no application log, just Cloud Run infrastructure logs), startup failures, or unhandled exceptions.
If latency is the issue (not hard errors), open Cloud Trace. Find a slow sample trace from the incident window. Identify which span is taking the most time: your application code, a database call, an external API, or an internal GCP service. This usually points directly to the fix.
If CPU or memory is high, open Cloud Profiler. Identify the function consuming the most CPU or allocating the most memory. High memory with no leak pattern in logs often means a specific code path is holding references it should release.
Decide on the cause category. Map what you found to one of the four categories: a code bug (fix and redeploy), a config change (incorrect environment variable, wrong memory limit), a startup problem (slow initialization, missing dependencies; see Cloud Run Container Failed to Start), or external resource pressure (a downstream service or database is slow).
Act based on the category. Roll back the revision if a code bug is confirmed. Increase resource limits if memory or CPU is the bottleneck. Fix the initialization code or increase minimum instances if startup latency is the issue. Add a circuit breaker or fallback if a downstream dependency is failing.

When to use this setup

Monitoring maturity should match the risk and criticality of your service. Here is what the minimum useful setup looks like at different stages:

Hobby or learning service

Use the built-in Metrics tab and Logs tab in the Cloud Run console. Open them when something seems wrong. No additional setup needed. This is fine for services where downtime has no real consequence.

Internal team service

Add a 5xx error rate alert and a p99 latency alert that notifies the relevant Slack channel or email. Use structured logging so log fields are queryable. This takes under an hour to set up and means you will know about problems before users file bug reports.

Customer-facing production service

Full setup: three or more alerts (5xx rate, p99 latency, memory), a custom dashboard, SLO monitoring with burn-rate alerts, structured logging, distributed tracing for latency diagnosis, and a documented incident response workflow. Add log-based metrics for business-critical events. Review the dashboard after every deployment to confirm baseline metrics have not shifted. The cost of setting this up is low; the cost of not having it during a production incident is high.

Common beginner mistakes

Alerting on request volume instead of error rate. A spike in request count is often good news, and alerting on high total traffic will page you on your best days. Alert on the ratio of errors to total requests. An absolute 5xx count can be a useful supplement on low-traffic services, but on its own it says nothing about how many users are affected.
Ignoring revision splits during incidents. Cloud Run can serve traffic across multiple revisions simultaneously. If you look at blended metrics across all revisions, a problem in the newest revision may appear small. Always filter to the specific revision at the center of the incident.
Not using structured logging. Unstructured log lines are hard to filter and hard to build reliable log-based metrics from. Write JSON to stdout. Cloud Logging automatically parses the fields and makes them queryable. See Structured Logging for how to do this.
Not watching memory before OOM kills happen. When a container hits the memory limit, it is killed instantly. There is no application log entry. Your first signal is 500 errors in request metrics. Set a memory alert at 85% so you can respond before containers start dying.
Waiting until the first incident to build a dashboard. Building a dashboard during an incident while users are affected is the worst time to do it. Build it when things are calm. Your first incident will be much shorter if you have a dashboard you already understand.
Ignoring scaling and cost implications when instance count spikes. High instance count affects your bill directly. Unexpected instance spikes during periods of normal traffic often indicate a bug (containers crashing and restarting) rather than legitimate load. Investigate the cause rather than just accepting the extra cost.
Using average latency in alerts. Average latency can look healthy while a significant portion of requests are very slow. Use p99 for alerting. Use p50 for understanding the typical user experience. Look at the gap between them to understand how wide your latency distribution is.
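The structured logging advice in the list above (write JSON to stdout) can be sketched in a few lines. This is a minimal example, not a full logging setup: the `severity` and `message` keys are fields Cloud Logging recognizes in JSON log lines, while `order_id` and `error_code` are hypothetical application fields.

```python
import json
import sys


def log(severity, message, **fields):
    """Write one JSON log line to stdout and return it."""
    entry = {"severity": severity, "message": message, **fields}
    line = json.dumps(entry)
    # One JSON object per line; flush so the line is not lost if the
    # container is stopped shortly afterwards.
    print(line, file=sys.stdout, flush=True)
    return line


line = log(
    "ERROR",
    "payment failed",
    order_id="A-123",          # hypothetical business field
    error_code="CARD_DECLINED",  # hypothetical application error code
)
```

Fields written this way can be filtered in Logs Explorer and used as the basis for a log-based metric.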

Cloud Run monitoring: built-in metrics vs log-based metrics vs tracing

These three approaches are complementary, not competing. Understanding when each one applies will help you avoid over-engineering your monitoring setup.

Approach	What it covers	Setup required	When to use
Built-in Cloud Run metrics	Request count, latency, CPU, memory, instance count, startup latency	None. Available immediately.	Default for all monitoring, alerting, and dashboards. Start here.
Log-based metrics	Any pattern in your application logs: error codes, business events, structured log fields	Create a log-based metric in Cloud Logging; requires structured logging to be reliable	When you need to alert or chart something that only appears in logs, not in built-in metrics
Distributed tracing (Cloud Trace)	Per-request latency breakdown showing where time is spent, across multiple services	Instrument your application with Cloud Trace SDK, or use auto-instrumentation for supported frameworks	When latency is the issue and you need to find the slow span: database, external call, or your own code

For most teams, built-in metrics handle 90% of monitoring needs. Add log-based metrics when you have application-specific signals worth alerting on. Add tracing when latency problems are recurring and you need systematic diagnosis rather than log-hunting.

Frequently asked questions

What metrics does Cloud Run emit automatically?

Cloud Run automatically emits request count, request latencies, container CPU utilization, container memory utilization, container instance count, maximum request concurrency, and startup latency. All are available in Cloud Monitoring with no configuration or agent required.

How do I view Cloud Run logs?

Open the Cloud Run service in the GCP Console and click the Logs tab, which shows logs pre-filtered to that service. You can also query directly in Logs Explorer using resource.type="cloud_run_revision" and resource.labels.service_name="your-service".

Do I need uptime checks for Cloud Run?

Not always. If your service receives regular traffic, request-based metrics give you availability data automatically. Add an uptime check if you want proactive alerting during low-traffic windows, or if your service exposes a health endpoint you want to validate independently of user traffic.

What is a good latency target for a Cloud Run API?

It depends on your application and users. A reasonable starting target for an interactive API is under 500ms at p99. Anything consistently above 2 seconds is noticeable to users and worth investigating with Cloud Trace and Cloud Profiler.

When should I use log-based metrics?

Use log-based metrics when the signal you need only appears in your application logs, not in built-in Cloud Run metrics. Examples include specific error codes, business events like order completions, or structured log fields your application writes. Log-based metrics let you alert on those patterns and add them to dashboards.

Last verified: 25 March 2026. Cloud services change frequently. Verify details against official documentation before making infrastructure decisions.