Cloud Profiler in GCP: What It Is, When to Use It, and How to Read Flame Graphs
Cloud Profiler is a continuous profiling tool for production applications running in Google Cloud. It samples your running application at intervals and shows you which functions are consuming the most CPU time, memory, or wall time, without requiring manual instrumentation or application downtime. When your metrics show elevated CPU or latency and your traces narrow the problem to slow application code, Cloud Profiler answers the next question: which specific function inside that code is responsible?
What Cloud Profiler is
Cloud Profiler is a managed service that continuously collects performance profiles from your running application and stores them in Google Cloud. A profile is a statistical snapshot of where your program is spending its time or allocating memory. Cloud Profiler builds these profiles by periodically sampling your application’s call stack in production. It then uploads those samples to Google Cloud, where you examine them in the Cloud Console.
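It works with Go, Java, Node.js, and Python applications. For Go, Node.js, and Python, the agent is a library you initialize inside your application at startup. For Java, it is a native agent loaded into the JVM with a command-line flag. In every case it runs inside your application process, so there is no separate process to manage and no infrastructure change required. Once started, it runs in the background and collects profiles continuously.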
The main output is a flame graph: a visual representation of your call stack that makes it straightforward to see which functions are consuming the most resources. Once you understand how to read one, Cloud Profiler becomes a fast way to find code-level performance problems in a running production system.
A simple explanation
Imagine a manager who walks through an office every few minutes and notes what each employee is doing at that exact moment. After a week of observations, the notes show that 40% of the time someone is processing invoices, 30% is in meetings, and 10% is debugging a broken printer. The manager did not follow anyone around all day. The random snapshots built a statistically accurate picture of where work is being spent. Cloud Profiler works the same way: it periodically samples your running application to build a picture of which functions are consuming resources, without measuring every single operation.
This approach is called statistical sampling, and it is what makes Cloud Profiler safe for production. Because it only samples at intervals rather than tracking every function call, the CPU overhead is low enough to leave it running all the time, typically under 1%.
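The office analogy can be simulated in a few lines. This sketch is only an illustration of why random point-in-time snapshots converge on the true proportions; it is not how Cloud Profiler is implemented. The timeline and the `snapshot_estimate` helper are hypothetical, and the timeline was built to match the percentages in the analogy.

```python
import bisect
import random
from collections import Counter

# Hypothetical 480-minute workday matching the analogy:
# 40% invoices, 30% meetings, 10% printer, 20% everything else.
TIMELINE = [("invoices", 192), ("meetings", 144), ("printer", 48), ("other", 96)]


def snapshot_estimate(timeline, n_snapshots, seed=0):
    """Estimate each activity's share of the day from random point-in-time snapshots."""
    rng = random.Random(seed)
    # Cumulative end minute of each segment, used to locate a random instant.
    ends, total = [], 0
    for _, minutes in timeline:
        total += minutes
        ends.append(total)
    counts = Counter()
    for _ in range(n_snapshots):
        instant = rng.uniform(0, total)           # a random moment in the day
        idx = bisect.bisect_right(ends, instant)  # which segment contains it
        counts[timeline[min(idx, len(timeline) - 1)][0]] += 1
    return {name: counts[name] / n_snapshots for name, _ in timeline}


if __name__ == "__main__":
    for name, share in snapshot_estimate(TIMELINE, 20_000).items():
        print(f"{name:9s} {share:.3f}")
```

No single snapshot follows any work from start to finish. The estimates still land close to 0.40, 0.30, 0.10, and 0.20 because each snapshot is cheap and there are many of them.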
When to use Cloud Profiler
Cloud Profiler is most useful when you already have a signal that something is wrong with application code performance, but you do not yet know which function is responsible. Typical triggers include:
- High CPU utilization with no clear cause. Your Cloud Monitoring dashboards show elevated CPU, but there is no obvious spike in request volume or background job activity to explain it.
- Slow application spans in traces. Cloud Trace identifies a span as slow, but the time is not being spent in a database query or external API call. The time is inside your own application code.
- Unexpected memory growth. Your service is using more memory over time and you want to identify which code path is allocating objects that are not being released.
- Performance regression after a deployment. Latency increased after a recent deploy and you want to compare CPU profiles before and after to isolate what changed.
- Suspected allocation hotspots driving garbage collection. High GC pressure is causing latency spikes and you want to find which functions are creating the most short-lived objects.
Metrics show high CPU or latency. Use Cloud Trace to find which request or span is slow. If the slow time is inside your own code rather than a database query or external call, switch to Cloud Profiler to find which function is responsible.
Cloud Profiler is less useful for diagnosing problems caused by external dependencies such as a slow database query or a network call to a third-party API. Those are better investigated with distributed tracing and the dependency’s own metrics.
Cloud Profiler vs Cloud Trace
Cloud Trace and Cloud Profiler answer different questions and are designed to be used together, not instead of each other.
- Cloud Trace tells you which request or span is slow. It records timing information across services and shows you the sequence of operations that made up a request. Use it to identify that a particular endpoint is slow and to see which step in its execution is taking the most time.
- Cloud Profiler tells you which function inside your code is consuming CPU or time. It does not track individual requests. Instead, it samples the entire running process and shows you aggregate function-level cost across all traffic.
Think of diagnosing a slow car. Cloud Trace is the mechanic saying “the engine is taking too long to respond on cold starts.” Cloud Profiler is the diagnostic scanner that points to the specific component inside the engine causing the delay. You need the first observation to know something is wrong. You need the second tool to know what to fix.
A common workflow: use distributed tracing to identify a slow service or span, then switch to Cloud Profiler to find the specific function or loop inside that service that is responsible. Trace gives you the where in your system. Profiler gives you the what inside the code.
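The difference between the two questions shows up in how the same timing data is grouped. In this sketch the records are hypothetical and the two functions are illustrative, not real Cloud Trace or Cloud Profiler APIs. A trace-style view sums time per request. A profile-style view sums time per function across all traffic. A real profiler estimates these totals from samples and does not keep request IDs at all.

```python
from collections import defaultdict

# Hypothetical timing records: (request_id, function, milliseconds).
RECORDS = [
    ("req-1", "parse_body", 2.0), ("req-1", "render_json", 30.0),
    ("req-2", "parse_body", 2.5), ("req-2", "render_json", 4.0),
    ("req-3", "parse_body", 2.0), ("req-3", "render_json", 3.5),
]


def trace_view(records):
    """Per-request latency: which request was slow (the tracing question)."""
    per_request = defaultdict(float)
    for request_id, _, ms in records:
        per_request[request_id] += ms
    return dict(per_request)


def profile_view(records):
    """Per-function cost summed over all traffic (the profiling question)."""
    per_function = defaultdict(float)
    for _, function, ms in records:
        per_function[function] += ms
    return dict(per_function)


if __name__ == "__main__":
    print(trace_view(RECORDS))    # req-1 stands out as the slow request
    print(profile_view(RECORDS))  # render_json dominates across all requests
```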
If your trace shows that most of a span’s time is consumed by a database call or an external API, Cloud Profiler cannot diagnose that. The bottleneck is outside your application code. Check query plans, connection pool settings, or network latency instead.
How Cloud Profiler works
Cloud Profiler uses statistical sampling. The agent periodically records the call stack that is currently executing (for CPU profiles) or the call stacks of sampled heap allocations (for memory profiles). By collecting thousands of samples over time, it builds a statistical picture of where time is spent without tracking every individual function call.
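The core idea of a stack sampler fits in a short script. This is a toy and not Cloud Profiler's agent; production agents use lower-overhead mechanisms. It relies on `sys._current_frames()`, a CPython-specific introspection hook. A background thread periodically captures the main thread's call stack and counts each stack in "folded" form (`root;...;leaf`). The workload functions are hypothetical stand-ins for application code.

```python
import sys
import threading
import time
from collections import Counter


def sample_stacks(target, duration_s=0.5, interval_s=0.005):
    """Call target() repeatedly while a background thread samples this thread's stack."""
    target_id = threading.get_ident()
    counts = Counter()
    stop = threading.Event()

    def sampler():
        while not stop.is_set():
            frame = sys._current_frames().get(target_id)
            names = []
            while frame is not None:  # walk from the executing frame up to the root
                names.append(frame.f_code.co_name)
                frame = frame.f_back
            if names:
                counts[";".join(reversed(names))] += 1  # root first, leaf last
            time.sleep(interval_s)

    thread = threading.Thread(target=sampler, daemon=True)
    thread.start()
    deadline = time.monotonic() + duration_s
    try:
        while time.monotonic() < deadline:
            target()
    finally:
        stop.set()
        thread.join()
    return counts


def slow_part():
    total = 0
    for i in range(100_000):  # roughly 10x the work of fast_part
        total += i * i
    return total


def fast_part():
    total = 0
    for i in range(10_000):
        total += i * i
    return total


def handle_request():
    slow_part()
    fast_part()


def leaf_counts(stacks):
    """Collapse folded stacks to the function that was executing when sampled."""
    leaves = Counter()
    for stack, n in stacks.items():
        leaves[stack.rsplit(";", 1)[-1]] += n
    return leaves


if __name__ == "__main__":
    print(leaf_counts(sample_stacks(handle_request)).most_common(3))
```

Nobody timed `slow_part`, yet it collects most of the samples simply because the program was inside it most of the time.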
Each profile is collected over a window of roughly 10 seconds, about once per minute for each deployment of a service rather than once per minute for every instance. When you have multiple instances of the same service running, Cloud Profiler spreads collection across them so they are not all profiled at the same time. This distributes the small overhead across your fleet rather than concentrating it on a single instance.
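Spreading collection across instances reduces each instance's share of the work, which a short simulation makes concrete. It assumes one 10-second window per minute for the whole deployment, assigned to a uniformly random instance. That selection policy is an illustrative assumption, not Cloud Profiler's documented scheduler, and both helper names are hypothetical.

```python
import random
from collections import Counter


def assign_profiling_windows(instances, minutes, seed=0):
    """Pick one instance per minute to profile (assumed uniform random choice)."""
    rng = random.Random(seed)
    chosen = Counter(rng.choice(instances) for _ in range(minutes))
    return {name: chosen[name] for name in instances}


def profiled_fraction_of_time(n_instances, window_s=10, period_s=60):
    """Expected share of wall-clock time a single instance spends being profiled."""
    return (window_s / period_s) / n_instances


if __name__ == "__main__":
    fleet = [f"instance-{i}" for i in range(5)]
    print(assign_profiling_windows(fleet, minutes=6000))
    print(f"{profiled_fraction_of_time(5):.3%} of each instance's time under profiling")
```

Under these assumptions, adding instances shrinks the time each one spends being profiled.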
The agent uploads profile data to the Cloud Profiler backend, which aggregates samples into a flame graph visible in the Cloud Console. Because sampling only briefly interrupts the application at intervals, the CPU overhead is typically under 1%.
Instrumenting profilers record every function call, which can add substantial overhead, so they are usually reserved for development or short investigations. Cloud Profiler takes periodic snapshots instead. The statistical picture that emerges from thousands of samples is accurate enough to find real bottlenecks, and the application barely notices the sampling. This is why you can leave it running all the time rather than enabling it only when something breaks.
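The instrumenting approach can be tried with Python's standard `sys.setprofile` hook, which runs a callback on every function call and return. This sketch counts every call to a hypothetical `tiny` function and reports how much longer the hooked run took. The slowdown varies with interpreter version and function size, so no particular number is implied.

```python
import sys
import time


def tiny(x):
    return x + 1


def workload(n):
    total = 0
    for i in range(n):
        total = tiny(total)
    return total


def run_instrumented(n):
    """Record every call to tiny(), the way an instrumenting profiler sees each call."""
    calls = 0

    def hook(frame, event, arg):
        nonlocal calls
        if event == "call" and frame.f_code.co_name == "tiny":
            calls += 1

    sys.setprofile(hook)
    try:
        workload(n)
    finally:
        sys.setprofile(None)  # always remove the hook
    return calls


if __name__ == "__main__":
    n = 200_000
    start = time.perf_counter()
    workload(n)
    plain = time.perf_counter() - start
    start = time.perf_counter()
    counted = run_instrumented(n)
    hooked = time.perf_counter() - start
    print(f"{counted} calls recorded; instrumented run took {hooked / plain:.1f}x as long")
```

Every call pays for the hook. A sampler pays only at each sample, independent of how many calls happen in between.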
Supported profile types and language caveats
Cloud Profiler collects different profile types depending on what you are investigating. Not all profile types are available in all languages. The table below reflects the current supported combinations. Check the official documentation for the latest state, as support does expand over time.
| Profile type | What it shows | Go | Java | Node.js | Python |
|---|---|---|---|---|---|
| CPU time | Functions using the most CPU | Yes | Yes | No | Yes |
| Wall time | Functions taking the most real time, including I/O wait and lock contention | No | Yes | Yes | Yes |
| Heap (live) | Allocations currently live in memory | Yes | Yes | Yes | No |
| Allocated heap | Functions allocating the most memory over time, including released objects | Yes | Yes | No | No |
| Contention | Where execution waits on locks (mutexes) | Yes | No | No | No |
| Threads | Thread (goroutine) counts by call stack | Yes | No | No | No |
Python support covers CPU time and wall-time profiles only. Heap, contention, and thread profiling are not available for Python. A profile type missing from the console does not mean your application has no problem in that area. It means that profile type is not supported for your runtime. Go has the broadest coverage and is the only runtime with contention and thread profiles. For heap profiling, use Go or Java, or Node.js for live heap.
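The CPU-time and wall-time rows in the table measure different clocks, and the standard library exposes both. `time.process_time()` advances only while the process is executing on a CPU. `time.perf_counter()` advances in real time. In this sketch, the hypothetical `waits_on_io` function stands in for a slow network call or lock wait.

```python
import time


def measure(fn):
    """Return (wall_seconds, cpu_seconds) spent running fn()."""
    wall0, cpu0 = time.perf_counter(), time.process_time()
    fn()
    return time.perf_counter() - wall0, time.process_time() - cpu0


def waits_on_io():
    time.sleep(0.2)  # stands in for waiting on a network call or lock


def burns_cpu():
    total = 0
    for i in range(2_000_000):
        total += i
    return total


if __name__ == "__main__":
    for fn in (waits_on_io, burns_cpu):
        wall, cpu = measure(fn)
        print(f"{fn.__name__:12s} wall={wall:.3f}s cpu={cpu:.3f}s")
```

`waits_on_io` takes a fifth of a second of wall time but almost no CPU time. A CPU profile would barely register it, while a wall-time profile would show it clearly.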
Example setup: Python
Each supported language has its own agent and initialization path. The example below shows Python. Go, Java, and Node.js each have their own libraries and setup guides. See the Cloud Profiler documentation for language-specific instructions.
Install the Python agent:
```shell
pip install google-cloud-profiler
```
Initialize it at application startup, before your web framework or main logic runs:
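```python
import googlecloudprofiler

def main():
    # Start once, as early as possible; profiling then runs on a background thread.
    try:
        googlecloudprofiler.start(
            service='api-service',
            service_version='1.0.0',
            # verbose is the log level: 0 error, 1 warning, 2 info, 3 debug.
            verbose=3,
        )
    except (ValueError, NotImplementedError) as exc:
        print(exc)  # bad arguments or unsupported environment; keep serving
    # rest of your application startup
```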
The service name groups profiles together in the console. Call googlecloudprofiler.start() once at startup; it runs in the background from that point forward and requires no further interaction.
Set service_version to your Docker image tag or Git commit SHA. Without it, profiles from different code versions are grouped together under the same service name and cannot be compared. With it, you can diff profiles before and after any deployment to spot regressions or confirm that an optimization actually worked.
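One way to avoid hard-coding the version is to read it from the environment at startup. In this sketch, `GIT_COMMIT_SHA` and `IMAGE_TAG` are hypothetical variable names that your build or deploy pipeline would have to export. `K_REVISION` is set by Cloud Run and changes with each revision, though a revision is not always a code change.

```python
import os

# Hypothetical names first; substitute whatever your pipeline actually exports.
VERSION_ENV_VARS = ("GIT_COMMIT_SHA", "IMAGE_TAG", "K_REVISION")


def resolve_service_version(env=None, default="unknown"):
    """Pick a deployment-specific version string for googlecloudprofiler.start()."""
    env = os.environ if env is None else env
    for name in VERSION_ENV_VARS:
        value = env.get(name, "").strip()
        if value:
            return value
    return default


if __name__ == "__main__":
    print(resolve_service_version())
```

You would then pass `service_version=resolve_service_version()` to `googlecloudprofiler.start()`. A fallback such as `"unknown"` keeps startup working, but every profile recorded under it is grouped together, so treat it as a signal that the pipeline is misconfigured.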
IAM and permissions
The service identity used by your workload needs permission to upload profiles to Cloud Profiler. The required role is roles/cloudprofiler.agent. Grant this role explicitly to the specific service account your workload runs as, rather than relying on a broad role such as Editor or Owner.
For example, if you are deploying to Cloud Run, identify the service account attached to that Cloud Run service and grant it roles/cloudprofiler.agent on the project. For GKE workloads using Workload Identity, ensure the Kubernetes service account is mapped to a Google service account that has this role.
Do not assume the default Compute Engine service account already has profiler permissions through an Editor or Owner role. Grant roles/cloudprofiler.agent explicitly on the workload’s service account. If profiling silently stops working after a service account change or a permissions audit, this is the first thing to check.
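When checking permissions, you can inspect the project's IAM policy, for example the JSON printed by `gcloud projects get-iam-policy PROJECT_ID --format=json`. This sketch looks for an explicit binding of roles/cloudprofiler.agent to a given service account. It ignores IAM conditions and inherited folder or organization policies, so a `False` result means "no explicit project binding", not a definitive answer. The helper name and the sample emails are hypothetical.

```python
import json
import sys

PROFILER_AGENT_ROLE = "roles/cloudprofiler.agent"


def has_profiler_agent_role(policy, service_account_email):
    """True if the policy binds cloudprofiler.agent directly to this service account.

    Bindings with an IAM condition are treated as present; review those by hand.
    """
    member = f"serviceAccount:{service_account_email}"
    for binding in policy.get("bindings", []):
        if binding.get("role") == PROFILER_AGENT_ROLE and member in binding.get("members", []):
            return True
    return False


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python check_profiler_role.py policy.json SA_EMAIL
    with open(sys.argv[1]) as f:
        print(has_profiler_agent_role(json.load(f), sys.argv[2]))
```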
How to read flame graphs
Cloud Profiler displays profiles as flame graphs. Two rules cover most of what you need to know:
Width equals cost. A wide bar means that function (and everything it calls) consumed a large proportion of CPU time, wall time, or memory. Narrow bars are cheap regardless of where they appear.
Height equals call depth. Each function's callees are stacked directly against it, one level further from the entry point (the root frame). Height has nothing to do with cost.
To locate a bottleneck:
- Find wide bars at the leaf end of their call subtree. A wide bar whose callees are all narrow, or that has no callees, is consuming resources itself rather than through something it called.
- A wide bar whose width is mostly made up of its callees is a common ancestor. The expensive work is happening in those callees, not in the function itself.
- Do not focus on the deepest functions in the stack. Depth does not indicate cost. Width does.
Think of a flame graph like a building floorplan where each room’s size represents how much time is spent inside it. A large room is expensive. A large room that is entirely subdivided into smaller rooms is not itself the bottleneck: the cost is inside the sub-rooms. A large room with no sub-rooms is the one you want to optimize. In the flame graph, that is the wide bar with nothing wide stacked on it.
You can click any bar in the Cloud Console flame graph to zoom into that subtree and examine its callees in more detail.
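The width rules can be made concrete by computing two numbers per function from sampled stacks. "Total" counts samples where the function appears anywhere on the stack, which is its bar width. "Self" counts samples where it is the executing leaf, which is the cost its callees do not explain. The sample data and helper name are hypothetical. The sketch also merges a function's appearances across different call paths, which a flame graph keeps separate.

```python
from collections import Counter

# Hypothetical samples in folded form: "root;...;leaf" -> sample count.
SAMPLES = {
    "main;handle_request;parse_json": 10,
    "main;handle_request;render_page;format_rows": 55,
    "main;handle_request;render_page;escape_html": 5,
    "main;handle_request;render_page": 10,
    "main;background_flush": 20,
}


def self_and_total(samples):
    """Return (total, self) sample counts per function name."""
    total, self_ = Counter(), Counter()
    for stack, count in samples.items():
        frames = stack.split(";")
        for name in set(frames):  # count once per stack, even under recursion
            total[name] += count
        self_[frames[-1]] += count  # the leaf was executing when sampled
    return total, self_


if __name__ == "__main__":
    total, self_ = self_and_total(SAMPLES)
    for name, t in total.most_common():
        print(f"{name:15s} total={t:3d} self={self_[name]:3d}")
```

`render_page` is wide (70 of 100 samples) but has little self cost, so it is a common ancestor. `format_rows` has 55 self samples and nothing wide beneath it, so it is the function to optimize.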
Comparing profiles across deployments
Cloud Profiler lets you compare profiles across deployments. Set service_version to a meaningful value such as your image tag or commit SHA so profiles are correctly labeled per deployment.
After deploying a new version, compare its profile in the console against the profile from before the deployment. Functions that grew wider in the new version are consuming more resources than before. Functions that shrank were improved.
This comparison mode is how you verify that a performance optimization actually worked at the code level. Aggregate latency metrics may show overall improvement but they do not isolate which specific function changed. The profile comparison does.
After any significant deploy, pull up a before and after comparison in Cloud Profiler within the first hour. A function that grew substantially wider is a regression worth investigating immediately, before it accumulates into a reported slowdown. This takes two minutes and is far cheaper than an on-call page at 2am.
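The comparison idea can be sketched offline. Given per-function self-sample counts (samples where the function was the one executing) for the old and new version, this sketch ranks functions by how much their share of samples changed. Comparing shares rather than raw counts is a choice made here so that a traffic increase alone does not look like a regression. The data and helper names are hypothetical, and this does not describe how the console computes its comparison.

```python
def share_by_function(self_samples):
    """Normalize sample counts to fractions of the profile."""
    total = sum(self_samples.values())
    if total == 0:
        return {}
    return {name: count / total for name, count in self_samples.items()}


def profile_diff(before, after):
    """Change in each function's share of samples, largest growth first."""
    b, a = share_by_function(before), share_by_function(after)
    deltas = {name: a.get(name, 0.0) - b.get(name, 0.0) for name in set(b) | set(a)}
    return sorted(deltas.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    before = {"parse_json": 20, "format_rows": 50, "escape_html": 30}
    after = {"parse_json": 40, "format_rows": 100, "escape_html": 60, "compress_body": 200}
    for name, delta in profile_diff(before, after):
        print(f"{name:14s} {delta:+.2f}")
```

In this data, every existing function's raw count doubled, yet each one's share fell. The real change is the new `compress_body` function, which accounts for half of the new profile.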
Common beginner mistakes
- Only profiling in development, not production. Development profiles reflect your machine, your data, and your local traffic. Production hotspots are often different due to real concurrency, larger data sets, and unexpected traffic patterns. Enable Cloud Profiler from the start and treat it as always-on infrastructure, not a diagnostic tool you reach for during an incident.
- Not setting service_version. Without a version string, profiles from different deployments are grouped together and cannot be meaningfully compared. You lose the ability to see what changed between deploys. Set service_version to your image tag or commit SHA every time.
- Reading flame graphs by height instead of width. How deep a bar sits in the call stack has no relation to its cost. Width is cost. Focus on the widest bars that have nothing wide stacked on them in their call subtrees. Those are your bottlenecks.
- Using CPU profiles to investigate I/O-bound slowness. If your service is slow but CPU utilization is normal or low, a CPU profile will look fine because the bottleneck is not in CPU execution. Use wall-time profiles to find functions that are slow because they are waiting on I/O, locks, or external calls. Go has no wall-time profile, so for lock waits in Go, use the contention profile.
- Assuming all profile types are available for all languages. Python support is narrower than Go or Java. If a profile type is not available in the console for your runtime, that absence does not mean your application has no problem. It means that profile type is not supported for your language.
When this is not the right tool
Cloud Profiler answers one specific question: which function inside your application code is consuming resources? For other types of production problems, different tools are more appropriate:
- The bottleneck is an external dependency. Cloud Profiler samples your application code, not external services. If the slow path is a database query or a network call to a downstream API, use distributed tracing and the external system’s own metrics instead.
- You need to understand request-level latency across services. Cloud Profiler aggregates samples across all requests. It does not record individual request timings. Use Cloud Trace to analyze individual requests and identify which step in a multi-service call chain is consuming time.
- You need to investigate an error or exception. Cloud Profiler does not record errors or exceptions. Use Logs Explorer to search for errors and structured logging to capture request context around them.
- You need to understand resource utilization trends over time. Use Cloud Monitoring metrics and dashboards for historical trends in CPU, memory, and request rates.
- You are debugging a production crash or panic. Cloud Profiler does not capture crash data. Use logs and any crash reporting tooling available for your runtime.
For a broader view of how these tools fit together during a production investigation, see Debugging Production Systems.
Summary
- Cloud Profiler continuously samples CPU, memory, and wall-time profiles in production with under 1% overhead
- Supported languages: Go, Java, Node.js, Python. Available profile types vary by language
- Python support is narrower than Go or Java: heap, contention, and thread profiling are not available for Python
- The agent runs inside your application process (a library, or a native JVM agent for Java), not as a sidecar container
- Grant roles/cloudprofiler.agent to the workload’s service account explicitly. Do not rely on broad project roles
- Flame graphs: width equals cost, height equals call depth. Find wide bars with nothing wide stacked on them in their call subtrees
- Set service_version to your image tag or commit SHA to enable meaningful cross-deployment comparisons
- Use Cloud Trace to identify which span is slow, then Cloud Profiler to find which function inside the code is responsible
Frequently asked questions
What languages does Cloud Profiler support?
Cloud Profiler supports Go, Java, Node.js, and Python. Available profile types vary by language. Python support is narrower than Go or Java, and heap, contention, and thread profiling are not available for Python.
Does Cloud Profiler significantly impact application performance?
No. Cloud Profiler uses statistical sampling rather than instrumenting every function call. The overhead is typically less than 1% of CPU, making it safe to leave running continuously in production.
Is the Cloud Profiler agent a sidecar container?
No. The profiling agent runs inside your application process: a library you initialize at startup, or for Java, a native agent loaded with a JVM flag. No sidecar and no infrastructure changes are required.
What is the difference between Cloud Profiler and Cloud Trace?
Cloud Trace identifies which request or span is slow across a distributed system. Cloud Profiler identifies which function inside your code is consuming CPU or time. Use Trace to find the slow path, then Profiler to find the code-level hotspot within that path.
When should I use wall-time profiling instead of CPU profiling?
When your application is slow but CPU utilization is normal or low. Wall time captures time spent waiting on I/O, locks, or external calls, not just active CPU execution. If CPU profiles look clean but the service is still slow, switch to wall-time profiles (available for Java, Node.js, and Python).