GCP Reference Architecture for Modern Web Apps: Cloud Run, Cloud SQL, Pub/Sub

This is a practical reference architecture for a production web application on Google Cloud Platform. It uses Cloud Run for stateless compute, Cloud SQL for relational data, Pub/Sub for async processing, and a set of managed services for security, CI/CD, and observability. It is not the only valid architecture and not the right starting point for every project. It is one well-tested pattern that works for teams building internet-facing web apps who want managed infrastructure and low operational overhead.

What this architecture is

This reference pattern covers a production-oriented web application with an internet-facing API, a relational database, background job processing, and the infrastructure for security, deployment, and monitoring.

The core stack in plain terms: edge protection and load balancing sit in front. A stateless application layer handles HTTP requests. A relational database and cache store data. An async message queue offloads slow work to background workers. Logging, metrics, and tracing tell you what is happening. A CI/CD pipeline deploys new code safely.

This is a starting point, not a prescription. Many production apps use a subset of these components. The starter vs production comparison below explains what to use on day one and what to add later.

Mental model

Think of this architecture like a well-run restaurant. Cloud Armor is the bouncer at the door. The load balancer is the host who seats guests at available tables. Cloud Run is the kitchen staff, scaling up when the restaurant is busy and going home when it is quiet. Cloud SQL is the pantry where ingredients (your data) are stored. Pub/Sub is the ticket rail between the front of house and the kitchen: the waiter drops an order on the rail and moves on to the next table instead of waiting. And observability is the manager watching the floor, spotting problems before customers start complaining.

Simple explanation

Here is the full request flow in plain language:

  1. A user makes a request to your app.
  2. Cloud Armor checks the request for common attacks (SQL injection, XSS, DDoS) and blocks bad traffic.
  3. The Global HTTPS Load Balancer terminates TLS and routes the request to the nearest healthy backend.
  4. Cloud Run receives the request. It reads data from Cloud SQL (relational database) and Memorystore Redis (cache). It pulls secrets like database passwords from Secret Manager.
  5. If the request triggers slow work like reports, emails, or image processing, the API publishes a message to Pub/Sub and returns immediately.
  6. A separate Cloud Run worker service picks up the message and processes the job in the background.
  7. Cloud Logging, Cloud Monitoring, and Cloud Trace capture logs, metrics, and request traces across every service so you can find and fix problems quickly.
  8. New code goes through Cloud Build and Cloud Deploy, which build, test, and deploy container images through staging to production.

That is the entire system. Every section below explains one part of it in detail.

Reading tip

If you are evaluating whether this architecture fits your project, skip ahead to When to use this and the starter vs production comparison first. Come back to the detailed component sections once you have decided this pattern is relevant.

Architecture at a glance

User request
     │
     ▼
Cloud Armor ──── blocks attacks, rate limits
     │
     ▼
Global HTTPS Load Balancer ──── TLS termination, routing
     │
     ├──▶ Cloud Run: API ──────┬──▶ Cloud SQL (PostgreSQL)
     │                         ├──▶ Memorystore Redis (cache)
     │                         ├──▶ Secret Manager
     │                         └──▶ Pub/Sub topic
     │
     └──▶ Cloud CDN ──▶ Cloud Storage (static assets)

Pub/Sub topic
     │
     ▼
Cloud Run: Worker ──────┬──▶ Cloud SQL
                        └──▶ Cloud Storage (files, exports)

CI/CD: GitHub → Cloud Build → Artifact Registry → Cloud Deploy → Cloud Run

Observability: Cloud Logging + Cloud Monitoring + Cloud Trace

Security: IAM (per-service accounts) + Private VPC + Audit Logs

Component table

Layer | GCP service | Purpose | Why it exists
Edge protection | Cloud Armor | WAF, DDoS mitigation | Blocks common attacks before they reach your app
Load balancing | Global HTTPS LB | TLS termination, routing, CDN | Single entry point with global distribution
Static assets | Cloud CDN + Cloud Storage | Serve JS, CSS, images from edge | Reduces load on the API, faster page loads
Compute | Cloud Run | Stateless HTTP services | Autoscaling, zero cluster management
Database | Cloud SQL (PostgreSQL) | Relational data | Managed PostgreSQL with HA and automated backups
Cache | Memorystore (Redis) | Session store, query cache | Reduces database load for read-heavy workloads
Object storage | Cloud Storage | Files, uploads, exports | Durable storage with lifecycle management
Async processing | Pub/Sub | Message queue for background jobs | Decouples slow work from user-facing requests
Secrets | Secret Manager | DB passwords, API keys | Secrets stay out of code, config, and env vars
Networking | Private VPC + VPC connector | Private connectivity | Database and cache never exposed to the internet
CI/CD | Cloud Build + Cloud Deploy + Artifact Registry | Build, test, deploy | Automated pipeline with staging and production
Observability | Cloud Monitoring + Logging + Trace | Metrics, logs, tracing | Know what broke and where before users report it
Identity | IAM with per-service accounts | Least-privilege access | Each service gets only the permissions it needs

How it works

Request flow

All internet traffic enters through the Global External HTTPS Load Balancer. The load balancer terminates TLS, checks requests against Cloud Armor security policies, and forwards clean traffic to Cloud Run. Cloud Run is configured with --ingress=internal-and-cloud-load-balancing, which means its direct *.run.app URL is not accessible from the public internet. All traffic must come through the load balancer or from within the VPC.

Data flow

The API service reads from and writes to Cloud SQL over a private IP through a Serverless VPC Access connector. For read-heavy workloads, it checks Memorystore Redis first and falls back to Cloud SQL on a cache miss. Binary files (uploads, exports, images) go to Cloud Storage. The database is never exposed to the public internet. For the details of connecting securely, see Connecting to Cloud SQL Securely.

Async job flow

When the API receives a request that involves slow work (generating a PDF, sending email, calling a third-party API), it publishes a message to a Pub/Sub topic and returns HTTP 202 to the user immediately. A separate Cloud Run worker service subscribes to the topic and processes jobs in the background. Failed messages retry up to a configured limit, then land in a dead-letter topic for investigation. See Event-Driven Systems for the full pattern.

Dead-letter topics are not optional

Without a dead-letter topic, a Pub/Sub message that consistently fails will retry until it expires and disappear silently. That means lost business data with no trace of what happened. Always configure a dead-letter topic on every production subscription.

Deployment flow

A push to the main branch triggers Cloud Build, which builds a container image, runs tests, and pushes the image to Artifact Registry. Cloud Deploy promotes the release through staging and production, with a manual approval gate before production. The pipeline authenticates to GCP via Workload Identity Federation, so no long-lived service account keys are stored in GitHub. For a step-by-step setup, see CI/CD Pipelines for Cloud Run.
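The release steps can be sketched with the gcloud CLI. This is a minimal sketch, not a complete pipeline. It makes the following assumptions:

- A clouddeploy.yaml defines a delivery pipeline named api-pipeline with staging and prod targets, and prod requires approval.
- A skaffold.yaml and a Cloud Run service manifest sit in the current directory.
- An Artifact Registry repository named app already exists.

All of these names are placeholders.

```shell
# Placeholder names: project my-project, repository "app", pipeline "api-pipeline".
RELEASE="rel-$(git rev-parse --short HEAD)"
IMAGE="us-central1-docker.pkg.dev/my-project/app/api-service:$(git rev-parse --short HEAD)"

# Register or update the delivery pipeline and its targets.
gcloud deploy apply --file=clouddeploy.yaml --region=us-central1

# Create a release; Cloud Deploy rolls it out to the first target (staging).
# The key "api-service" must match the image name used in the service manifest.
gcloud deploy releases create "$RELEASE" \
  --delivery-pipeline=api-pipeline \
  --region=us-central1 \
  --images=api-service="$IMAGE"

# Once staging looks healthy, promote the release to the next target (prod).
gcloud deploy releases promote \
  --release="$RELEASE" \
  --delivery-pipeline=api-pipeline \
  --region=us-central1

# Find the rollout that is waiting for approval, then approve it with
# "gcloud deploy rollouts approve ROLLOUT_NAME" plus the same three flags.
gcloud deploy rollouts list \
  --release="$RELEASE" \
  --delivery-pipeline=api-pipeline \
  --region=us-central1
```

Inside a triggered Cloud Build step, the built-in $SHORT_SHA substitution would typically replace the git call.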

Monitoring flow

Cloud Run sends container logs to Cloud Logging automatically. Application code writes structured JSON logs with a trace ID for correlation across services. Cloud Monitoring tracks metrics like error rate, latency, and database utilisation, with alerts for thresholds that indicate real problems. Cloud Trace shows the full path of a request across services so you can identify bottlenecks.
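The log format can be illustrated with a small bash function. This sketch shows the JSON shape only. Cloud Run forwards each JSON line written to stdout to Cloud Logging. Cloud Logging reads the severity and message fields, and it links entries to a trace through the logging.googleapis.com/trace field.

The function name log_json is hypothetical. It does not JSON-escape the message, so it assumes messages contain no quotes or backslashes. Real application code would use its language's JSON library.

```shell
#!/usr/bin/env bash
# Emit one structured log line in the JSON shape Cloud Logging parses from stdout.
# Arguments: severity, message, raw X-Cloud-Trace-Context header value, project ID.
# Assumption: the message contains no double quotes or backslashes (no escaping here).
log_json() {
  local severity="$1" message="$2" trace_header="$3" project="$4"
  # The header format is TRACE_ID/SPAN_ID;o=OPTIONS, so keep everything before the first "/".
  local trace_id="${trace_header%%/*}"
  printf '{"severity":"%s","message":"%s","logging.googleapis.com/trace":"projects/%s/traces/%s"}\n' \
    "$severity" "$message" "$project" "$trace_id"
}

# Example call with a made-up trace header.
log_json "ERROR" "payment provider timeout" \
  "105445aa7843bc8bf206b12000100000/1;o=1" "my-project"
```

Every service that logs the same trace ID this way can be queried together in Cloud Logging during an incident.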

Core components and why each one is chosen

Cloud Armor

What it does: Web Application Firewall and DDoS protection. It is attached to the load balancer and filters traffic at Google's network edge, before requests reach your backends. Filtering uses preconfigured rules (OWASP Top 10) and custom policies.

Why it is here: Any internet-facing app needs protection against common web attacks. Cloud Armor integrates natively with the Global HTTPS Load Balancer with no additional infrastructure.

When to skip it: Internal-only APIs behind an Internal Load Balancer. For an MVP with minimal traffic, you can add it later, but do so before launch if the app handles user data or payments.
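A minimal Cloud Armor policy might look like the following sketch. It assumes a global external backend service named api-backend (a placeholder) already fronts the Cloud Run service. The policy name, rule priorities, and the 100-requests-per-minute limit are also placeholders; tune them for your traffic.

Adding --preview to a rule logs what it would block without enforcing it. That is a reasonable way to check the preconfigured rules for false positives before you enforce them.

```shell
# Placeholder names: policy "api-waf", backend service "api-backend".
gcloud compute security-policies create api-waf \
  --description="WAF and rate limiting for the public API"

# Preconfigured OWASP rules for SQL injection and cross-site scripting.
gcloud compute security-policies rules create 1000 \
  --security-policy=api-waf \
  --expression="evaluatePreconfiguredExpr('sqli-v33-stable')" \
  --action=deny-403

gcloud compute security-policies rules create 1001 \
  --security-policy=api-waf \
  --expression="evaluatePreconfiguredExpr('xss-v33-stable')" \
  --action=deny-403

# Per-client-IP rate limit: more than 100 requests per 60 seconds gets HTTP 429.
gcloud compute security-policies rules create 2000 \
  --security-policy=api-waf \
  --src-ip-ranges="*" \
  --action=throttle \
  --rate-limit-threshold-count=100 \
  --rate-limit-threshold-interval-sec=60 \
  --conform-action=allow \
  --exceed-action=deny-429 \
  --enforce-on-key=IP

# Attach the policy to the backend service that fronts Cloud Run.
gcloud compute backend-services update api-backend \
  --global \
  --security-policy=api-waf
```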

Global HTTPS Load Balancer

What it does: Terminates TLS, distributes traffic to backends, integrates with Cloud Armor and Cloud CDN.

Why it is here: Gives you a single global IP, managed TLS certificates, and the ability to route traffic to Cloud Run services across regions.

When to skip it: If your app only needs a single region and you are fine with Cloud Run’s built-in HTTPS endpoint, you can start without it. Add it when you need Cloud Armor, CDN, or multi-region routing.
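Putting the load balancer in front of Cloud Run comes down to a serverless network endpoint group (NEG) plus the usual load balancer resources. This sketch assumes the api-service Cloud Run service in us-central1 and the placeholder domain api.example.com. Every resource name here is illustrative.

The Google-managed certificate only becomes active after the domain's DNS A record points at the reserved IP.

```shell
# 1. Serverless NEG that points at the Cloud Run service.
gcloud compute network-endpoint-groups create api-neg \
  --region=us-central1 \
  --network-endpoint-type=serverless \
  --cloud-run-service=api-service

# 2. Global backend service for the external Application Load Balancer.
gcloud compute backend-services create api-backend \
  --global \
  --load-balancing-scheme=EXTERNAL_MANAGED

gcloud compute backend-services add-backend api-backend \
  --global \
  --network-endpoint-group=api-neg \
  --network-endpoint-group-region=us-central1

# 3. URL map, Google-managed certificate, HTTPS proxy, static IP, forwarding rule.
gcloud compute url-maps create api-url-map --default-service=api-backend

gcloud compute ssl-certificates create api-cert \
  --domains=api.example.com \
  --global

gcloud compute target-https-proxies create api-https-proxy \
  --url-map=api-url-map \
  --ssl-certificates=api-cert

gcloud compute addresses create api-ip --global

gcloud compute forwarding-rules create api-https-rule \
  --global \
  --load-balancing-scheme=EXTERNAL_MANAGED \
  --address=api-ip \
  --target-https-proxy=api-https-proxy \
  --ports=443
```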

Cloud Run

What it does: Runs stateless containers that scale automatically based on traffic, including scaling to zero. See Cloud Run Overview for the full details.

Why it is here: No cluster to manage, built-in autoscaling, per-request billing, and native container support. For stateless HTTP services, it is the lowest-ops compute option on GCP.

When to skip it: If your workload is stateful, needs GPU types or hardware configurations that Cloud Run does not offer, requires Kubernetes-specific features, or runs long-lived processes that do not fit the request-based model. See the compute comparison below.
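A handful of deploy flags control this scaling behaviour. The values below are placeholders, not recommendations, and the image path is hypothetical.

- --min-instances=0 allows scale to zero, at the cost of cold starts.
- --max-instances also caps how many database connections the service can open. For example, 20 instances with a hypothetical pool of 5 connections each can hold up to 100 connections to Cloud SQL.

```shell
# Placeholder image path; adjust the limits to your workload.
gcloud run deploy api-service \
  --image=us-central1-docker.pkg.dev/my-project/app/api-service:latest \
  --region=us-central1 \
  --min-instances=0 \
  --max-instances=20 \
  --concurrency=80 \
  --cpu=1 \
  --memory=512Mi \
  --timeout=300
```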

Cloud SQL (PostgreSQL)

What it does: Managed PostgreSQL with automated backups, point-in-time recovery, and optional high availability (standby in a second zone). See Cloud SQL Overview.

Why it is here: Most web apps need a relational database. Cloud SQL handles patching, backups, and failover so you do not run your own database servers.

When to skip it: If your data model is purely document-based (consider Firestore) or you need massive horizontal write scaling (consider Bigtable or Spanner).
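A production-leaning Cloud SQL instance with the features described above could be created as follows: regional HA, automated backups, point-in-time recovery, and private IP only. This sketch assumes private services access is already configured on my-vpc. The instance name, PostgreSQL version, machine tier, and backup window are placeholders.

--edition=ENTERPRISE is set explicitly because db-custom machine tiers belong to the Enterprise edition.

```shell
# Private IP only (--no-assign-ip), standby in a second zone (REGIONAL).
gcloud sql instances create app-db \
  --database-version=POSTGRES_15 \
  --edition=ENTERPRISE \
  --tier=db-custom-2-7680 \
  --region=us-central1 \
  --availability-type=REGIONAL \
  --network=projects/my-project/global/networks/my-vpc \
  --no-assign-ip \
  --backup-start-time=03:00 \
  --enable-point-in-time-recovery

# Application database inside the instance (name is a placeholder).
gcloud sql databases create app --instance=app-db
```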

Memorystore (Redis)

What it does: Managed Redis for caching and session storage. Sits between the app and the database to reduce read load.

Why it is here: For read-heavy APIs, a cache layer can significantly reduce database queries and improve response times. See Stateless vs Stateful Services for the caching pattern.

When to skip it: If your app has low traffic or your database handles the load comfortably. Add it when query latency or database CPU becomes a bottleneck.
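Creating a cache and handing its address to the API might look like this sketch. The instance name, size, and Redis version are placeholders. The sketch assumes the API reaches my-vpc through the VPC connector described in this document.

The host address is not a secret, so an environment variable is fine for it. If you enable Redis AUTH (--enable-auth), treat the auth string as a secret.

```shell
# Standard tier keeps a replica for failover; size is in GB.
gcloud redis instances create app-cache \
  --region=us-central1 \
  --tier=standard \
  --size=1 \
  --redis-version=redis_7_0 \
  --network=projects/my-project/global/networks/my-vpc

# Look up the private IP assigned to the instance.
REDIS_HOST="$(gcloud redis instances describe app-cache \
  --region=us-central1 --format='value(host)')"

# Pass the address to the API service without redeploying the image.
gcloud run services update api-service \
  --region=us-central1 \
  --update-env-vars=REDIS_HOST="${REDIS_HOST}",REDIS_PORT=6379
```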

Cloud Storage

What it does: Object storage for files, uploads, exports, and static assets. Supports lifecycle rules to automatically move older objects to cheaper storage classes.

Why it is here: Binary data does not belong in a relational database. Cloud Storage is durable, cheap, and integrates with Cloud CDN for static asset serving.

When to skip it: Rarely skipped. Most apps need file storage eventually. But if your app genuinely has no binary data, you can defer it.
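Here is a bucket with a lifecycle rule, as a sketch. The bucket name is a placeholder (bucket names are globally unique). The 30-day move to Nearline is an example policy, not a recommendation.

Uniform bucket-level access keeps access control in IAM. Public access prevention stops objects from being made public by accident.

```shell
gcloud storage buckets create gs://my-project-app-files \
  --location=us-central1 \
  --uniform-bucket-level-access \
  --public-access-prevention

# Move Standard-class objects to Nearline once they are 30 days old.
cat > lifecycle.json <<'EOF'
{
  "rule": [
    {
      "action": {"type": "SetStorageClass", "storageClass": "NEARLINE"},
      "condition": {"age": 30, "matchesStorageClass": ["STANDARD"]}
    }
  ]
}
EOF

gcloud storage buckets update gs://my-project-app-files --lifecycle-file=lifecycle.json
```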

Pub/Sub

What it does: Fully managed message queue. Producers publish messages to topics, subscribers consume them asynchronously. See Pub/Sub Overview.

Why it is here: Decouples slow operations (email, reports, file processing) from user-facing requests. The API stays fast because it does not wait for background work to finish.

When to skip it: If every operation in your app completes quickly and you have no background processing needs. For simpler task queuing without fan-out, Cloud Tasks may be a lighter option.
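The topics named in this document's dead-letter configuration (job-requests and job-requests-dead-letter) can be created from the CLI, and a test message published, as sketched here. The message body and the jobType attribute are hypothetical. In the API, the same publish would go through a Pub/Sub client library before the handler returns HTTP 202.

```shell
# Topic for background jobs and a topic for messages that repeatedly fail.
gcloud pubsub topics create job-requests
gcloud pubsub topics create job-requests-dead-letter

# Publish a test job by hand (hypothetical payload and attribute).
gcloud pubsub topics publish job-requests \
  --message='{"jobType":"pdf-report","orderId":"ord-123"}' \
  --attribute=jobType=pdf-report
```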

Private VPC and connectivity

What it does: Keeps Cloud SQL, Memorystore, and internal services on a private network with no public IP. Cloud Run connects via a Serverless VPC Access connector. When Cloud Run routes all of its egress through the VPC (--vpc-egress=all-traffic), outbound internet traffic goes through Cloud NAT.

Why it is here: Defence in depth. Even if an application vulnerability is exploited, the database is not reachable from the internet. See VPC Networks Explained for setup details.

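When to skip it: Do not skip this for production workloads that handle real user data. For a quick prototype, you can use Cloud SQL's public IP through Cloud Run's built-in Cloud SQL connection (--add-cloudsql-instances). That connection tunnels through the Cloud SQL Auth Proxy and does not need authorized networks. Switch to private connectivity before going to production.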
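Private IP for Cloud SQL depends on private services access: a reserved internal range in your VPC, peered with Google's service producer network. This sketch uses the my-vpc network from this document. The range name and the /16 size are placeholders.

```shell
gcloud services enable servicenetworking.googleapis.com

# Reserve an internal range for Google-managed services such as Cloud SQL private IP.
gcloud compute addresses create google-managed-services-my-vpc \
  --global \
  --purpose=VPC_PEERING \
  --prefix-length=16 \
  --network=my-vpc

# Peer my-vpc with the service producer network using that range.
gcloud services vpc-peerings connect \
  --service=servicenetworking.googleapis.com \
  --ranges=google-managed-services-my-vpc \
  --network=my-vpc
```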

Analogy

The private VPC is like putting your database in a locked back room. Cloud Run has a key (the VPC connector) to reach it, but nobody walking in off the street can get to it, even if they somehow get past the front door (Cloud Armor). The database has no street-facing entrance at all.

Cloud Build, Cloud Deploy, and Artifact Registry

What it does: Cloud Build runs your build and test pipeline. Artifact Registry stores container images. Cloud Deploy manages promotion through environments. See CI/CD Pipelines for Cloud Run.

Why it is here: Automated, repeatable deployments with a staging gate. Combined with Binary Authorization, only images built through the pipeline can be deployed to production.

When to skip it: For an early-stage project, deploying with gcloud run deploy from your machine is fine. Add the pipeline when you have more than one developer or when you need deployment approvals.
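Turning on Binary Authorization enforcement for a Cloud Run service takes one flag; the policy is where you decide which images are allowed. This sketch assumes the api-service service from this document.

The exported default policy allows all images. Requiring attestations from your pipeline means editing the policy and setting up an attestor, which is not shown here.

```shell
gcloud services enable binaryauthorization.googleapis.com

# Enforce the project's Binary Authorization policy on this service.
gcloud run services update api-service \
  --region=us-central1 \
  --binary-authorization=default

# Export the project policy for review; edit it, then import it back.
gcloud container binauthz policy export > binauthz-policy.yaml
gcloud container binauthz policy import binauthz-policy.yaml
```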

Cloud Logging, Monitoring, and Trace

What it does: Logging captures structured logs from all services. Monitoring tracks metrics and fires alerts. Trace shows distributed request paths. See Cloud Monitoring Overview and Distributed Tracing.

Why it is here: You need to know what your app is doing in production. Logs, metrics, and traces are how you debug incidents, spot regressions, and understand performance.

When to skip it: Cloud Logging is automatic for Cloud Run, so you get it for free. The question is how much you invest in structured logging, custom metrics, and trace propagation. Start with structured JSON logs and basic uptime alerts. Add custom dashboards and detailed tracing as the app matures.
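Two log queries are useful from the first day. This sketch shows recent errors from one service, and then every entry that shares a trace ID, the field that correlates logs across services. The service name matches this document's examples. TRACE_ID is a placeholder that you copy from a log entry or from Cloud Trace.

```shell
# Errors from the API service in the last hour.
gcloud logging read \
  'resource.type="cloud_run_revision" AND resource.labels.service_name="api-service" AND severity>=ERROR' \
  --freshness=1h \
  --limit=20

# Every log entry, from any service, that belongs to one request trace.
gcloud logging read \
  'trace="projects/my-project/traces/TRACE_ID"' \
  --freshness=1d \
  --format=json
```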

Secret Manager and IAM

What it does: Secret Manager stores sensitive values (database passwords, API keys, tokens). IAM controls which service accounts can access which resources.

Why it is here: Secrets should never live in source code, environment variable configs, or container images. Each Cloud Run service gets its own service account with only the permissions it needs. If one service is compromised, it cannot access another service’s secrets.

When to skip it: Do not skip Secret Manager for any app that has credentials. It is free for low usage and prevents the most common class of credential leaks.
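The per-service pattern described above can be sketched for the API service. The secret name and mount path are placeholders. api-sa and my-project follow the names used elsewhere in this document.

Mounting the secret as a file keeps the value out of the service's environment variables. --set-secrets also accepts ENV_VAR=secret:version if you prefer an environment variable.

```shell
# Per-service identity for the API.
gcloud iam service-accounts create api-sa --display-name="API service"

# Store the database password; read it without echoing so it stays out of shell history.
read -rsp 'Database password: ' DB_PASSWORD; echo
printf '%s' "$DB_PASSWORD" | gcloud secrets create db-password \
  --replication-policy=automatic \
  --data-file=-
unset DB_PASSWORD

# Only the API service account may read this secret.
gcloud secrets add-iam-policy-binding db-password \
  --member="serviceAccount:api-sa@my-project.iam.gserviceaccount.com" \
  --role="roles/secretmanager.secretAccessor"

# Run as api-sa and mount the latest secret version as a file.
gcloud run services update api-service \
  --region=us-central1 \
  --service-account=api-sa@my-project.iam.gserviceaccount.com \
  --set-secrets=/secrets/db/password=db-password:latest
```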

When to use this architecture

This architecture is a good fit when you are building:

  • A public web API or SaaS application that needs to handle variable traffic with low ops effort.
  • A dashboard, admin backend, or internal tool that serves HTTP requests and stores data in a relational database.
  • An app with bursty or unpredictable traffic. Cloud Run scales up and down automatically, including to zero during quiet periods.
  • A small team that wants managed services. No Kubernetes clusters to operate, no database servers to patch, no message broker to maintain.
  • Workloads with background processing like reports, emails, file processing, and webhooks that benefit from async decoupling via Pub/Sub.
  • Apps that need to go to production quickly with security, observability, and CI/CD built in from the start rather than bolted on later.

When not to use this architecture

  • Tiny MVP with almost no traffic. If you are testing an idea with a handful of users, Cloud Run plus a managed database is enough. You do not need Cloud Armor, Binary Authorization, or a full Cloud Deploy pipeline on day one. See the starter version below.
  • Long-running stateful workloads. Cloud Run has a maximum request timeout (60 minutes for services) and is designed for stateless services. Persistent connections, in-memory state across requests, or long-lived processes fit better on GKE or Compute Engine.
  • Apps that need Kubernetes-specific features. If you need StatefulSets, DaemonSets, service mesh, custom scheduling, or GPU workloads, GKE gives you full Kubernetes control.
  • Workloads where the request-based model does not fit. Streaming servers, game backends with persistent connections, or ML inference with custom hardware are not a natural fit for Cloud Run. See the compute comparison.
  • Teams that would over-engineer by adopting every component. The full architecture described here is for a production app with real traffic and real users. Adopting all of it for a weekend project creates unnecessary complexity.

Starter version vs production version

You do not need every component from day one. Here is what to use at each stage and what to add as the app grows.

Stage | Components | What to postpone | Why
MVP / early project | Cloud Run + Cloud SQL + Secret Manager + basic Cloud Logging | Cloud Armor, CDN, Pub/Sub workers, Binary Authorization, Cloud Deploy pipeline | Get running fast with the minimum viable stack. Deploy with gcloud run deploy and manage secrets properly from the start.
Growing app | Add: Global HTTPS LB + Cloud Armor + Pub/Sub + worker service + structured logging + alerting | Multi-region, Binary Authorization, Cloud Deploy with manual approvals | You now have real users. Protect the frontend, offload slow work, and know when things break.
Production / regulated | Add: Binary Authorization + Cloud Deploy pipeline + Memorystore + distributed tracing + VPC Service Controls + audit logging | Multi-region (unless uptime requirements demand it) | Full security controls, automated deployments with approvals, caching for performance, and complete observability.
High availability / multi-region | Add: multi-region Cloud Run + Cloud SQL cross-region replicas + disaster recovery plan | None | Needed when your SLA requires surviving a full regional outage. Adds significant complexity and cost. See Multi-Region Architectures.

Start here

If you are building your first production app on GCP, the MVP row is your starting point: Cloud Run + Cloud SQL + Secret Manager. That gives you a deployed, secured, and observable application in an afternoon. Everything else is an upgrade you add when you have a reason to.

The key principle: start simple, add complexity only when you have a specific requirement that justifies it. Every component you add is a component you have to understand, monitor, and pay for.

Cloud Run vs GKE vs Compute Engine

Choosing the compute layer is the biggest architectural decision. Here is how the three main options compare. For a deeper breakdown, see Choosing Between Cloud Run, GKE, and Compute Engine.

How to think about it

Cloud Run is like hailing a taxi: you say where you want to go, someone else drives, and you only pay for the ride. GKE is like leasing a fleet of cars: you pick the vehicles, plan the routes, and handle more of the logistics, but you have full control. Compute Engine is like owning the cars outright: maximum flexibility, but you are responsible for oil changes, insurance, and parking.

Aspect | Cloud Run | GKE (Autopilot) | Compute Engine
Best for | Stateless HTTP services, APIs, background workers | Complex microservices, stateful workloads, Kubernetes-native apps | VMs with full OS control, legacy apps, custom runtimes
Ops overhead | Minimal (no clusters, no nodes) | Medium (managed control plane, you manage workload config) | Higher (you manage VMs, patching, scaling)
Scaling model | Per-request autoscaling, including to zero | Pod-based autoscaling, node auto-provisioning | Instance groups with autoscaler, or manual
Control level | Container-level | Pod and cluster-level (full Kubernetes API) | VM-level (full OS access)
Pricing model | Per-request (CPU/memory/request time) | Per-pod resource usage | Per-VM (sustained use and committed use discounts)
When to choose | Default for most stateless web workloads | When you need Kubernetes features (StatefulSets, service mesh, custom scheduling) | When you need VM-level control, specific OS, GPU, or legacy compatibility

Recommendation: Default to Cloud Run for stateless web apps and APIs. It covers the majority of web application use cases with the least operational overhead. Move to GKE when you have a concrete Kubernetes-specific requirement. Use Compute Engine when you need full VM control or your workload cannot run in a container.

Key configuration patterns

These are the most important configuration decisions in this architecture. Each one addresses a common source of production issues.

Restricting Cloud Run ingress

When Cloud Run is behind a load balancer, restrict its ingress so the direct *.run.app URL is not publicly accessible. Without this, attackers can bypass Cloud Armor entirely.

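# The image path is a placeholder for your Artifact Registry image.
gcloud run deploy api-service \
  --image=us-central1-docker.pkg.dev/my-project/app/api-service:latest \
  --ingress=internal-and-cloud-load-balancing \
  --allow-unauthenticated \
  --service-account=api-sa@my-project.iam.gserviceaccount.com \
  --region=us-central1

Security: do not skip this

If you put Cloud Armor in front of a load balancer but leave the Cloud Run service publicly accessible at its *.run.app URL, attackers bypass the WAF entirely. The --ingress=internal-and-cloud-load-balancing flag is what closes that gap. See Cloud Run Security Model for the full details on ingress and authentication.

The external load balancer does not attach a Google identity to the requests it forwards. A public API behind it is therefore deployed with --allow-unauthenticated, and the ingress setting, not IAM, is what keeps the *.run.app URL closed. If the load balancer uses Identity-Aware Proxy, you can instead deploy with --no-allow-unauthenticated and grant the IAP service agent the Cloud Run Invoker role.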

Pub/Sub dead-letter topics

Always configure a dead-letter topic on production subscriptions. Without one, messages that consistently fail processing retry until they expire and are silently dropped.

gcloud pubsub subscriptions create worker-sub \
  --topic=job-requests \
  --dead-letter-topic=job-requests-dead-letter \
  --max-delivery-attempts=5 \
  --ack-deadline=300
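Dead-lettering has two further requirements. The Pub/Sub service agent must be able to publish to the dead-letter topic and to acknowledge messages on the source subscription. The dead-letter topic also needs a subscription, because a topic with no subscription keeps nothing that is published to it.

This sketch covers both steps for the names used above. The project ID and the job-requests-dead-letter-sub subscription name are placeholders.

```shell
PROJECT_NUMBER="$(gcloud projects describe my-project --format='value(projectNumber)')"
PUBSUB_SA="service-${PROJECT_NUMBER}@gcp-sa-pubsub.iam.gserviceaccount.com"

# The Pub/Sub service agent forwards undeliverable messages, so it needs both roles.
gcloud pubsub topics add-iam-policy-binding job-requests-dead-letter \
  --member="serviceAccount:${PUBSUB_SA}" \
  --role="roles/pubsub.publisher"

gcloud pubsub subscriptions add-iam-policy-binding worker-sub \
  --member="serviceAccount:${PUBSUB_SA}" \
  --role="roles/pubsub.subscriber"

# Retain dead-lettered messages so someone can inspect and replay them.
gcloud pubsub subscriptions create job-requests-dead-letter-sub \
  --topic=job-requests-dead-letter \
  --message-retention-duration=7d
```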

Private database connectivity

Cloud SQL should use private IP only in production. Cloud Run connects through a VPC connector. This means the database has no public IP address and is only reachable from within the VPC.

gcloud compute networks vpc-access connectors create app-connector \
  --region=us-central1 \
  --network=my-vpc \
  --range=10.8.0.0/28

gcloud run services update api-service \
  --region=us-central1 \
  --vpc-connector=app-connector \
  --vpc-egress=private-ranges-only

For the complete networking setup including Cloud NAT for outbound internet access, see VPC Networks Explained.
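With --vpc-egress=private-ranges-only, only traffic to private addresses goes through the connector, and internet-bound requests leave Cloud Run directly. To send outbound internet traffic through Cloud NAT instead, for example to get predictable egress addresses, a sketch looks like this. The router and NAT names are placeholders.

The sketch assumes that a NAT configured for all subnet ranges covers the connector's range. Google's static egress IP guide takes a different route:

- It creates the connector on a named subnet (--subnet).
- It lists that subnet with --nat-custom-subnet-ip-ranges.
- It uses --nat-external-ip-pool with reserved addresses when the IPs must stay fixed.

```shell
gcloud compute routers create nat-router \
  --network=my-vpc \
  --region=us-central1

gcloud compute routers nats create nat-config \
  --router=nat-router \
  --region=us-central1 \
  --auto-allocate-nat-external-ips \
  --nat-all-subnet-ip-ranges

# Route all Cloud Run egress through the VPC so internet calls use Cloud NAT.
gcloud run services update api-service \
  --region=us-central1 \
  --vpc-connector=app-connector \
  --vpc-egress=all-traffic
```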

Common mistakes

  1. Adopting the full architecture for a new project. A team of two building an MVP does not need Cloud Armor, Binary Authorization, and a full deployment pipeline. Start with Cloud Run, Cloud SQL, and Secret Manager. Add layers only when you have a requirement that justifies them.

  2. Leaving Cloud Run publicly accessible behind a load balancer. If you add Cloud Armor to a load balancer but leave the direct *.run.app URL accessible, attackers bypass the WAF entirely. Always set --ingress=internal-and-cloud-load-balancing for services behind the load balancer.

  3. Skipping the dead-letter topic. Without a dead-letter topic, a Pub/Sub message that consistently fails processing retries until it expires and is dropped. Business data is silently lost. Always configure dead-letter topics on production subscriptions.

  4. Weak IAM and secrets handling. Using a single service account for all services, or storing credentials in environment variables or source code, creates unnecessary risk. Use per-service service accounts with least-privilege roles and Secret Manager for all sensitive values.

  5. Adding observability after the first incident. Retroactively adding structured logging and trace propagation means touching every service after something has already gone wrong. The cost of setting it up from the start is small. The value during the first incident, being able to trace a user complaint to a specific log entry in seconds, is significant.

  6. Copying service choices without understanding the trade-offs. Taking this architecture as-is without understanding why each component is here leads to over-engineering or misconfiguration. Read the "when to skip it" notes for each component and make deliberate choices for your specific app.

Frequently asked questions

Do I need all of these services for a new project?

No. Start with Cloud Run, Cloud SQL, and Secret Manager. That covers compute, data, and secrets with minimal ops overhead. Add Cloud Armor, Pub/Sub, multi-region, and Binary Authorization only when a specific requirement justifies the complexity.

Is Cloud Run enough for most web apps?

For stateless HTTP services, yes. Cloud Run handles autoscaling, TLS, and container management with zero cluster ops. Move to GKE only when you need Kubernetes-specific features like StatefulSets, service mesh, or custom scheduling.

How do I keep Cloud Run private behind a load balancer?

Set --ingress=internal-and-cloud-load-balancing on the Cloud Run service. This blocks direct access to the *.run.app URL from the public internet while still accepting traffic from the Global HTTPS Load Balancer and from within your VPC.

When should I use GKE instead?

When you need StatefulSets, a service mesh, custom pod scheduling, GPU workloads, or fine-grained network policies that Cloud Run does not support. If your team already operates Kubernetes and your workloads need that level of control, GKE is the better fit.

How much does this architecture cost in practice?

It depends on region, traffic volume, database size, and uptime requirements. A minimal version, with Cloud Run scaling to zero when idle and a small Cloud SQL instance, can start under $200/month. The Cloud SQL instance is usually the largest fixed cost. Use the GCP Pricing Calculator for estimates specific to your workload.

Last verified: 26 March 2026. Cloud services change frequently. Verify details against official documentation before making infrastructure decisions.