GCP Reference Architecture for Modern Web Apps: Cloud Run, Cloud SQL, Pub/Sub
This is a practical reference architecture for a production web application on Google Cloud Platform. It uses Cloud Run for stateless compute, Cloud SQL for relational data, Pub/Sub for async processing, and a set of managed services for security, CI/CD, and observability. It is not the only valid architecture and not the right starting point for every project. It is one well-tested pattern that works for teams building internet-facing web apps who want managed infrastructure and low operational overhead.
What this architecture is
This reference pattern covers a production-oriented web application with an internet-facing API, a relational database, background job processing, and the infrastructure for security, deployment, and monitoring.
The core stack in plain terms: edge protection and load balancing sit in front. A stateless application layer handles HTTP requests. A relational database and cache store data. An async message queue offloads slow work to background workers. Logging, metrics, and tracing tell you what is happening. A CI/CD pipeline deploys new code safely.
This is a starting point, not a prescription. Many production apps use a subset of these components. The starter vs production comparison below explains what to use on day one and what to add later.
Think of this architecture like a well-run restaurant. Cloud Armor is the bouncer at the door. The load balancer is the host who seats guests at available tables. Cloud Run is the kitchen staff, scaling up when the restaurant is busy and going home when it is quiet. Cloud SQL is the pantry where ingredients (your data) are stored. Pub/Sub is the ticket rail between the front of house and the kitchen: the waiter drops an order on the rail and moves on to the next table instead of waiting. And observability is the manager watching the floor, spotting problems before customers start complaining.
Simple explanation
Here is the full request flow in plain language:
- A user makes a request to your app.
- Cloud Armor checks the request for common attacks (SQL injection, XSS, DDoS) and blocks bad traffic.
- The Global HTTPS Load Balancer terminates TLS and routes the request to the nearest healthy backend.
- Cloud Run receives the request. It reads data from Cloud SQL (relational database) and Memorystore Redis (cache). It pulls secrets like database passwords from Secret Manager.
- If the request triggers slow work like reports, emails, or image processing, the API publishes a message to Pub/Sub and returns immediately.
- A separate Cloud Run worker service picks up the message and processes the job in the background.
- Cloud Logging, Cloud Monitoring, and Cloud Trace capture logs, metrics, and request traces across every service so you can find and fix problems quickly.
- New code goes through Cloud Build and Cloud Deploy, which build, test, and deploy container images through staging to production.
That is the entire system. Every section below explains one part of it in detail.
If you are evaluating whether this architecture fits your project, skip ahead to "When to use this architecture" and the starter vs production comparison first. Come back to the detailed component sections once you have decided this pattern is relevant.
Architecture at a glance
User request
│
▼
Cloud Armor ──── blocks attacks, rate limits
│
▼
Global HTTPS Load Balancer ──── TLS termination, routing
│
├──▶ Cloud Run: API ──────┬──▶ Cloud SQL (PostgreSQL)
│                         ├──▶ Memorystore Redis (cache)
│                         ├──▶ Secret Manager
│                         └──▶ Pub/Sub topic
│
└──▶ Cloud CDN ──▶ Cloud Storage (static assets)
Pub/Sub topic
│
▼
Cloud Run: Worker ──────┬──▶ Cloud SQL
                        └──▶ Cloud Storage (files, exports)
CI/CD: GitHub → Cloud Build → Artifact Registry → Cloud Deploy → Cloud Run
Observability: Cloud Logging + Cloud Monitoring + Cloud Trace
Security: IAM (per-service accounts) + Private VPC + Audit Logs
Component table
| Layer | GCP service | Purpose | Why it exists |
|---|---|---|---|
| Edge protection | Cloud Armor | WAF, DDoS mitigation | Blocks common attacks before they reach your app |
| Load balancing | Global HTTPS LB | TLS termination, routing, CDN | Single entry point with global distribution |
| Static assets | Cloud CDN + Cloud Storage | Serve JS, CSS, images from edge | Reduces load on the API, faster page loads |
| Compute | Cloud Run | Stateless HTTP services | Autoscaling, zero cluster management |
| Database | Cloud SQL (PostgreSQL) | Relational data | Managed PostgreSQL with HA and automated backups |
| Cache | Memorystore (Redis) | Session store, query cache | Reduces database load for read-heavy workloads |
| Object storage | Cloud Storage | Files, uploads, exports | Durable storage with lifecycle management |
| Async processing | Pub/Sub | Message queue for background jobs | Decouples slow work from user-facing requests |
| Secrets | Secret Manager | DB passwords, API keys | Secrets stay out of code, config, and env vars |
| Networking | Private VPC + VPC connector | Private connectivity | Database and cache never exposed to the internet |
| CI/CD | Cloud Build + Cloud Deploy + Artifact Registry | Build, test, deploy | Automated pipeline with staging and production |
| Observability | Cloud Monitoring + Logging + Trace | Metrics, logs, tracing | Know what broke and where before users report it |
| Identity | IAM with per-service accounts | Least-privilege access | Each service gets only the permissions it needs |
How it works
Request flow
All internet traffic enters through the Global External HTTPS Load Balancer. The load balancer terminates TLS, checks requests against Cloud Armor security policies, and forwards clean traffic to Cloud Run. Cloud Run is configured with --ingress=internal-and-cloud-load-balancing, which means its direct *.run.app URL is not accessible from the public internet. All traffic must come through the load balancer or from within the VPC.
Data flow
The API service reads from and writes to Cloud SQL over a private IP through a Serverless VPC Access connector. For read-heavy workloads, it checks Memorystore Redis first and falls back to Cloud SQL on a cache miss. Binary files (uploads, exports, images) go to Cloud Storage. The database is never exposed to the public internet. For the details of connecting securely, see Connecting to Cloud SQL Securely.
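The cache-first read described above is commonly called the cache-aside pattern. The sketch below illustrates it with in-memory stand-ins: `FakeCache` and `FakeDatabase` are hypothetical placeholders for a Memorystore Redis client and a Cloud SQL query layer, and the key format and 60-second TTL are assumptions chosen for illustration, not values prescribed by this architecture.

```python
import time
from typing import Dict, Optional, Tuple


class FakeCache:
    """In-memory stand-in for a Redis client (hypothetical, illustration only)."""

    def __init__(self) -> None:
        self._items: Dict[str, Tuple[str, float]] = {}

    def get(self, key: str) -> Optional[str]:
        entry = self._items.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            # The entry has outlived its TTL, so treat it as a miss.
            del self._items[key]
            return None
        return value

    def set(self, key: str, value: str, ttl_seconds: float) -> None:
        self._items[key] = (value, time.monotonic() + ttl_seconds)


class FakeDatabase:
    """Stand-in for Cloud SQL that counts how often it is queried."""

    def __init__(self, rows: Dict[str, str]) -> None:
        self._rows = rows
        self.query_count = 0

    def fetch_user_name(self, user_id: str) -> Optional[str]:
        self.query_count += 1
        return self._rows.get(user_id)


def get_user_name(user_id: str, cache: FakeCache, db: FakeDatabase,
                  ttl_seconds: float = 60.0) -> Optional[str]:
    """Check the cache first; on a miss, read the database and fill the cache."""
    key = f"user:{user_id}:name"
    cached = cache.get(key)
    if cached is not None:
        return cached
    name = db.fetch_user_name(user_id)
    if name is not None:
        # Only cache real rows so a missing user is re-checked next time.
        cache.set(key, name, ttl_seconds)
    return name
```

Calling `get_user_name` twice for the same user queries the database once; the second call is served from the cache until the TTL expires. Writes would also need to invalidate or update the cached key, which this sketch leaves out.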
Async job flow
When the API receives a request that involves slow work (generating a PDF, sending email, calling a third-party API), it publishes a message to a Pub/Sub topic and returns HTTP 202 to the user immediately. A separate Cloud Run worker service subscribes to the topic and processes jobs in the background. Failed messages retry up to a configured limit, then land in a dead-letter topic for investigation. See Event-Driven Systems for the full pattern.
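As a minimal sketch of the API side of this flow, the handler below queues a job and returns 202 without waiting for the work to finish. The `publish` callable is a hypothetical stand-in for a Pub/Sub publisher client, the message shape (`job_id`, `type`, `params`) is an assumption, and the topic name `job-requests` is borrowed from the subscription example later in this article.

```python
import json
import uuid
from typing import Callable, Dict, Tuple

# Hypothetical publisher signature: (topic name, message bytes) -> None.
Publish = Callable[[str, bytes], None]


def handle_report_request(params: Dict[str, str],
                          publish: Publish) -> Tuple[int, Dict[str, str]]:
    """Queue a report job and return an HTTP status plus response body."""
    job_id = str(uuid.uuid4())
    message = json.dumps(
        {"job_id": job_id, "type": "report", "params": params}
    ).encode("utf-8")
    # Hand the slow work to the queue; the worker picks it up later.
    publish("job-requests", message)
    # 202 Accepted: the request was queued, not completed.
    return 202, {"job_id": job_id, "status": "queued"}
```

Returning the `job_id` gives the client something to poll or correlate against once the worker finishes.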
Without a dead-letter topic, a Pub/Sub message that consistently fails is retried until it expires and then disappears silently. That means lost business data with no trace of what happened. Always configure a dead-letter topic on every production subscription.
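The retry-then-dead-letter behaviour can be modelled in a few lines. This is a simplified simulation, not the Pub/Sub client API: `deliver_with_dead_letter` is a hypothetical helper, it ignores retry backoff, and real Pub/Sub treats the maximum delivery attempts as approximate because forwarding to the dead-letter topic is best effort.

```python
from typing import Any, Callable, Tuple


def deliver_with_dead_letter(message: Any,
                             handler: Callable[[Any], None],
                             max_delivery_attempts: int = 5) -> Tuple[str, int]:
    """Try the handler up to the limit; report the outcome and attempts used."""
    for attempt in range(1, max_delivery_attempts + 1):
        try:
            handler(message)
        except Exception:
            # A failure counts as a nack, so the message is redelivered.
            continue
        return "acked", attempt
    # Every attempt failed: the message is parked for investigation
    # instead of being retried until it expires.
    return "dead-lettered", max_delivery_attempts
```

A handler that fails twice and then succeeds yields `("acked", 3)`; one that always fails yields `("dead-lettered", 5)` with the default limit.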
Deployment flow
A push to the main branch triggers Cloud Build, which builds a container image, runs tests, and pushes the image to Artifact Registry. Cloud Deploy promotes the release through staging and production, with a manual approval gate before production. The pipeline authenticates to GCP via Workload Identity Federation, so no long-lived service account keys are stored in GitHub. For a step-by-step setup, see CI/CD Pipelines for Cloud Run.
Monitoring flow
Cloud Run sends container logs to Cloud Logging automatically. Application code writes structured JSON logs with a trace ID for correlation across services. Cloud Monitoring tracks metrics like error rate, latency, and database utilisation, with alerts for thresholds that indicate real problems. Cloud Trace shows the full path of a request across services so you can identify bottlenecks.
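A structured log line with trace correlation might look like the sketch below. It assumes the request carries an `X-Cloud-Trace-Context` header in the `TRACE_ID/SPAN_ID;o=OPTIONS` form and that JSON written to stdout is picked up by Cloud Logging; the helper names and the `my-project` project ID are hypothetical.

```python
import json
import sys
from typing import Any, Dict, Optional


def parse_trace_id(header: Optional[str]) -> Optional[str]:
    """Extract TRACE_ID from a value like 'TRACE_ID/SPAN_ID;o=1'."""
    if not header:
        return None
    trace_id = header.split("/", 1)[0].strip()
    return trace_id or None


def build_log_entry(message: str, severity: str, project_id: str,
                    trace_header: Optional[str] = None,
                    **fields: Any) -> Dict[str, Any]:
    """Build a dict using the field names Cloud Logging treats specially."""
    entry: Dict[str, Any] = {"severity": severity, "message": message, **fields}
    trace_id = parse_trace_id(trace_header)
    if trace_id:
        # Links this log line to the matching trace.
        entry["logging.googleapis.com/trace"] = (
            f"projects/{project_id}/traces/{trace_id}"
        )
    return entry


def log(message: str, severity: str = "INFO", *, project_id: str,
        trace_header: Optional[str] = None, **fields: Any) -> None:
    """Write one JSON object per line to stdout."""
    entry = build_log_entry(message, severity, project_id, trace_header, **fields)
    print(json.dumps(entry), file=sys.stdout, flush=True)
```

For example, `log("order created", project_id="my-project", trace_header=request_header, order_id="o-1")` emits a single JSON line that can be filtered by `order_id` and followed across services by trace.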
Core components and why each one is chosen
Cloud Armor
What it does: Web Application Firewall and DDoS protection. Sits in front of the load balancer and filters traffic using pre-configured rules (OWASP Top 10) and custom policies.
Why it is here: Any internet-facing app needs protection against common web attacks. Cloud Armor integrates natively with the Global HTTPS Load Balancer with no additional infrastructure.
When to skip it: Internal-only APIs behind an Internal Load Balancer. For an MVP with minimal traffic, you can add it later, but do so before launch if the app handles user data or payments.
Global HTTPS Load Balancer
What it does: Terminates TLS, distributes traffic to backends, integrates with Cloud Armor and Cloud CDN.
Why it is here: Gives you a single global IP, managed TLS certificates, and the ability to route traffic to Cloud Run services across regions.
When to skip it: If your app only needs a single region and you are fine with Cloud Run’s built-in HTTPS endpoint, you can start without it. Add it when you need Cloud Armor, CDN, or multi-region routing.
Cloud Run
What it does: Runs stateless containers that scale automatically based on traffic, including scaling to zero. See Cloud Run Overview for the full details.
Why it is here: No cluster to manage, built-in autoscaling, per-request billing, and native container support. For stateless HTTP services, it is the lowest-ops compute option on GCP.
When to skip it: If your workload is stateful, needs GPU access, requires Kubernetes-specific features, or runs long-lived processes that do not fit the request-based model. See the compute comparison below.
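To build intuition for request-based scaling, the rough model below estimates how many instances a given number of in-flight requests would need. It is a deliberate simplification: the real autoscaler also weighs CPU utilisation and instance startup time, and the function name and default limits here (80 concurrent requests per instance, a cap of 100 instances) are assumptions you would replace with your own service settings.

```python
import math


def estimate_instances(concurrent_requests: int,
                       max_concurrency: int = 80,
                       min_instances: int = 0,
                       max_instances: int = 100) -> int:
    """Estimate instance count from in-flight requests and per-instance concurrency."""
    if concurrent_requests < 0 or max_concurrency < 1:
        raise ValueError("concurrent_requests must be >= 0 and max_concurrency >= 1")
    needed = math.ceil(concurrent_requests / max_concurrency)
    # Clamp to the configured floor and ceiling.
    return max(min_instances, min(needed, max_instances))
```

With no traffic and `min_instances=0` the estimate is zero, which is the scale-to-zero case; setting `min_instances=1` keeps one instance warm at the cost of paying for it while idle.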
Cloud SQL (PostgreSQL)
What it does: Managed PostgreSQL with automated backups, point-in-time recovery, and optional high availability (standby in a second zone). See Cloud SQL Overview.
Why it is here: Most web apps need a relational database. Cloud SQL handles patching, backups, and failover so you do not run your own database servers.
When to skip it: If your data model is purely document-based (consider Firestore) or you need massive horizontal write scaling (consider Bigtable or Spanner).
Memorystore (Redis)
What it does: Managed Redis for caching and session storage. Sits between the app and the database to reduce read load.
Why it is here: For read-heavy APIs, a cache layer can significantly reduce database queries and improve response times. See Stateless vs Stateful Services for the caching pattern.
When to skip it: If your app has low traffic or your database handles the load comfortably. Add it when query latency or database CPU becomes a bottleneck.
Cloud Storage
What it does: Object storage for files, uploads, exports, and static assets. Supports lifecycle rules to automatically move older objects to cheaper storage classes.
Why it is here: Binary data does not belong in a relational database. Cloud Storage is durable, cheap, and integrates with Cloud CDN for static asset serving.
When to skip it: Rarely skipped. Most apps need file storage eventually. But if your app genuinely has no binary data, you can defer it.
Pub/Sub
What it does: Fully managed message queue. Producers publish messages to topics, subscribers consume them asynchronously. See Pub/Sub Overview.
Why it is here: Decouples slow operations (email, reports, file processing) from user-facing requests. The API stays fast because it does not wait for background work to finish.
When to skip it: If every operation in your app completes quickly and you have no background processing needs. For simpler task queuing without fan-out, Cloud Tasks may be a lighter option.
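On the consuming side, a Cloud Run worker commonly receives messages over a push subscription, where Pub/Sub POSTs a JSON envelope with base64-encoded data. The sketch below assumes that delivery mode and a JSON message body; `handle_push` and the `process` callable are hypothetical names, and the HTTP framework wiring is omitted.

```python
import base64
import json
from typing import Any, Callable, Dict


def handle_push(envelope: Dict[str, Any],
                process: Callable[[Dict[str, Any]], None]) -> int:
    """Return the HTTP status the worker should send back to Pub/Sub."""
    message = envelope.get("message") if isinstance(envelope, dict) else None
    if not isinstance(message, dict) or "data" not in message:
        # Malformed envelope; a non-success status counts as a failed delivery.
        return 400
    try:
        job = json.loads(base64.b64decode(message["data"]).decode("utf-8"))
        process(job)
    except Exception:
        # Failure: Pub/Sub redelivers, and eventually dead-letters if configured.
        return 500
    # Success status acknowledges the message so it is not redelivered.
    return 204
```

Because a message can be delivered more than once, `process` should be idempotent, for example by recording completed job IDs in the database.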
Private VPC and connectivity
What it does: Keeps Cloud SQL, Memorystore, and internal services on a private network with no public IP. Cloud Run connects via a Serverless VPC Access connector. If all egress is routed through the VPC, outbound internet traffic from Cloud Run goes through Cloud NAT.
Why it is here: Defence in depth. Even if an application vulnerability is exploited, the database is not reachable from the internet. See VPC Networks Explained for setup details.
When to skip it: Do not skip this for production workloads that handle real user data. For a quick prototype, you can use Cloud SQL’s public IP with authorized networks, but switch to private connectivity before going to production.
The private VPC is like putting your database in a locked back room. Cloud Run has a key (the VPC connector) to reach it, but nobody walking in off the street can get to it, even if they somehow get past the front door (Cloud Armor). The database has no street-facing entrance at all.
Cloud Build, Cloud Deploy, and Artifact Registry
What it does: Cloud Build runs your build and test pipeline. Artifact Registry stores container images. Cloud Deploy manages promotion through environments. See CI/CD Pipelines for Cloud Run.
Why it is here: Automated, repeatable deployments with a staging gate. Combined with Binary Authorization, only images built through the pipeline can be deployed to production.
When to skip it: For an early-stage project, deploying with gcloud run deploy from your machine is fine. Add the pipeline when you have more than one developer or when you need deployment approvals.
Cloud Logging, Monitoring, and Trace
What it does: Logging captures structured logs from all services. Monitoring tracks metrics and fires alerts. Trace shows distributed request paths. See Cloud Monitoring Overview and Distributed Tracing.
Why it is here: You need to know what your app is doing in production. Logs, metrics, and traces are how you debug incidents, spot regressions, and understand performance.
When to skip it: Cloud Logging is automatic for Cloud Run, so you get it for free. The question is how much you invest in structured logging, custom metrics, and trace propagation. Start with structured JSON logs and basic uptime alerts. Add custom dashboards and detailed tracing as the app matures.
Secret Manager and IAM
What it does: Secret Manager stores sensitive values (database passwords, API keys, tokens). IAM controls which service accounts can access which resources.
Why it is here: Secrets should never live in source code, environment variable configs, or container images. Each Cloud Run service gets its own service account with only the permissions it needs. If one service is compromised, it cannot access another service’s secrets.
When to skip it: Do not skip Secret Manager for any app that has credentials. It is free for low usage and prevents the most common class of credential leaks.
When to use this architecture
This architecture is a good fit when you are building:
- A public web API or SaaS application that needs to handle variable traffic with low ops effort.
- A dashboard, admin backend, or internal tool that serves HTTP requests and stores data in a relational database.
- An app with bursty or unpredictable traffic. Cloud Run scales up and down automatically, including to zero during quiet periods.
- A small team that wants managed services. No Kubernetes clusters to operate, no database servers to patch, no message broker to maintain.
- Workloads with background processing like reports, emails, file processing, and webhooks that benefit from async decoupling via Pub/Sub.
- Apps that need to go to production quickly with security, observability, and CI/CD built in from the start rather than bolted on later.
When not to use this architecture
- Tiny MVP with almost no traffic. If you are testing an idea with a handful of users, Cloud Run plus a managed database is enough. You do not need Cloud Armor, Binary Authorization, or a full Cloud Deploy pipeline on day one. See the starter version below.
- Long-running stateful workloads. Cloud Run has a maximum request timeout (up to 60 minutes per request) and is designed for stateless services. Persistent connections, in-memory state across requests, or long-lived processes fit better on GKE or Compute Engine.
- Apps that need Kubernetes-specific features. If you need StatefulSets, DaemonSets, service mesh, custom scheduling, or GPU workloads, GKE gives you full Kubernetes control.
- Workloads where the request-based model does not fit. Streaming servers, game backends with persistent connections, or ML inference with custom hardware are not a natural fit for Cloud Run. See the compute comparison.
- Teams that would over-engineer by adopting every component. The full architecture described here is for a production app with real traffic and real users. Adopting all of it for a weekend project creates unnecessary complexity.
Starter version vs production version
You do not need every component from day one. Here is what to use at each stage and what to add as the app grows.
| Stage | Components | What to postpone | Why |
|---|---|---|---|
| MVP / early project | Cloud Run + Cloud SQL + Secret Manager + basic Cloud Logging | Cloud Armor, CDN, Pub/Sub workers, Binary Authorization, Cloud Deploy pipeline | Get running fast with the minimum viable stack. Deploy with gcloud run deploy and manage secrets properly from the start. |
| Growing app | Add: Global HTTPS LB + Cloud Armor + Pub/Sub + worker service + structured logging + alerting | Multi-region, Binary Authorization, Cloud Deploy with manual approvals | You now have real users. Protect the frontend, offload slow work, and know when things break. |
| Production / regulated | Add: Binary Authorization + Cloud Deploy pipeline + Memorystore + distributed tracing + VPC Service Controls + audit logging | Multi-region (unless uptime requirements demand it) | Full security controls, automated deployments with approvals, caching for performance, and complete observability. |
| High availability / multi-region | Add: multi-region Cloud Run + Cloud SQL cross-region replicas + disaster recovery plan | None | Needed when your SLA requires surviving a full regional outage. Adds significant complexity and cost. See Multi-Region Architectures. |
If you are building your first production app on GCP, the MVP row is your starting point: Cloud Run + Cloud SQL + Secret Manager. That gives you a deployed, secured, and observable application in an afternoon. Everything else is an upgrade you add when you have a reason to.
The key principle: start simple, add complexity only when you have a specific requirement that justifies it. Every component you add is a component you have to understand, monitor, and pay for.
Cloud Run vs GKE vs Compute Engine
Choosing the compute layer is the biggest architectural decision. Here is how the three main options compare. For a deeper breakdown, see Choosing Between Cloud Run, GKE, and Compute Engine.
Cloud Run is like hailing a taxi: you say where you want to go, someone else drives, and you only pay for the ride. GKE is like leasing a fleet of cars: you pick the vehicles, plan the routes, and handle more of the logistics, but you have full control. Compute Engine is like owning the cars outright: maximum flexibility, but you are responsible for oil changes, insurance, and parking.
| | Cloud Run | GKE (Autopilot) | Compute Engine |
|---|---|---|---|
| Best for | Stateless HTTP services, APIs, background workers | Complex microservices, stateful workloads, Kubernetes-native apps | VMs with full OS control, legacy apps, custom runtimes |
| Ops overhead | Minimal (no clusters, no nodes) | Medium (managed control plane, you manage workload config) | Higher (you manage VMs, patching, scaling) |
| Scaling model | Per-request autoscaling, including to zero | Pod-based autoscaling, node auto-provisioning | Instance groups with autoscaler, or manual |
| Control level | Container-level | Pod and cluster-level (full Kubernetes API) | VM-level (full OS access) |
| Pricing model | Per-request (CPU/memory/request time) | Per-pod resource usage | Per-VM (sustained use and committed use discounts) |
| When to choose | Default for most stateless web workloads | When you need Kubernetes features (StatefulSets, service mesh, custom scheduling) | When you need VM-level control, specific OS, GPU, or legacy compatibility |
Recommendation: Default to Cloud Run for stateless web apps and APIs. It covers the majority of web application use cases with the least operational overhead. Move to GKE when you have a concrete Kubernetes-specific requirement. Use Compute Engine when you need full VM control or your workload cannot run in a container.
Key configuration patterns
These are the most important configuration decisions in this architecture. Each one addresses a common source of production issues.
Restricting Cloud Run ingress
When Cloud Run is behind a load balancer, restrict its ingress so the direct *.run.app URL is not publicly accessible. Without this, attackers can bypass Cloud Armor entirely.
gcloud run deploy api-service \
  --ingress=internal-and-cloud-load-balancing \
  --no-allow-unauthenticated \
  --service-account=api-sa@my-project.iam.gserviceaccount.com \
  --region=us-central1
If you put Cloud Armor in front of a load balancer but leave the Cloud Run service publicly accessible at its *.run.app URL, attackers bypass the WAF entirely. The --ingress=internal-and-cloud-load-balancing flag is what closes that gap. See Cloud Run Security Model for the full details on ingress and authentication.
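The --no-allow-unauthenticated flag makes Cloud Run require IAM credentials (the Cloud Run Invoker role) on every request. The load balancer forwards requests without adding an identity of its own, so this setting suits services whose callers can authenticate, for example behind Identity-Aware Proxy or for service-to-service calls. A public endpoint serving anonymous end users typically allows unauthenticated invocations and relies on the ingress restriction to block direct access.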
Pub/Sub dead-letter topics
Always configure a dead-letter topic on production subscriptions. Without one, messages that consistently fail processing retry until they expire and are silently dropped.
gcloud pubsub subscriptions create worker-sub \
  --topic=job-requests \
  --dead-letter-topic=job-requests-dead-letter \
  --max-delivery-attempts=5 \
  --ack-deadline=300
Private database connectivity
Cloud SQL should use private IP only in production. Cloud Run connects through a VPC connector. This means the database has no public IP address and is only reachable from within the VPC.
gcloud compute networks vpc-access connectors create app-connector \
  --region=us-central1 \
  --network=my-vpc \
  --range=10.8.0.0/28
gcloud run deploy api-service \
  --vpc-connector=app-connector \
  --vpc-egress=private-ranges-only
For the complete networking setup including Cloud NAT for outbound internet access, see VPC Networks Explained.
Common mistakes
Adopting the full architecture for a new project. A team of two building an MVP does not need Cloud Armor, Binary Authorization, and a full deployment pipeline. Start with Cloud Run, Cloud SQL, and Secret Manager. Add layers only when you have a requirement that justifies them.
Leaving Cloud Run publicly accessible behind a load balancer. If you add Cloud Armor to a load balancer but leave the direct *.run.app URL accessible, attackers bypass the WAF entirely. Always set --ingress=internal-and-cloud-load-balancing for services behind the load balancer.
Skipping the dead-letter topic. Without a dead-letter topic, a Pub/Sub message that consistently fails processing retries until it expires and is dropped. Business data is silently lost. Always configure dead-letter topics on production subscriptions.
Weak IAM and secrets handling. Using a single service account for all services, or storing credentials in environment variables or source code, creates unnecessary risk. Use per-service service accounts with least-privilege roles and Secret Manager for all sensitive values.
Adding observability after the first incident. Retroactively adding structured logging and trace propagation means touching every service after something has already gone wrong. The cost of setting it up from the start is small. The value during the first incident, being able to trace a user complaint to a specific log entry in seconds, is significant.
Using exact service choices without understanding trade-offs. Copying this architecture without understanding why each component is here leads to over-engineering or misconfiguration. Read the “when to skip it” notes for each component and make deliberate choices for your specific app.
Summary
- This architecture uses Cloud Run, Cloud SQL, Pub/Sub, and managed security/CI/CD/observability services to build production web apps with low operational overhead.
- Start with the MVP stack (Cloud Run + Cloud SQL + Secret Manager) and add components only when you have a specific requirement that justifies them.
- Cloud Run is the default compute choice for stateless web workloads. Move to GKE for Kubernetes-specific needs or Compute Engine for full VM control.
- Always restrict Cloud Run ingress behind a load balancer, always use dead-letter topics on Pub/Sub, and always keep your database on a private VPC.
- Set up observability from day one. Structured logging and basic alerts cost almost nothing to configure and save hours during the first incident.
Frequently asked questions
Do I need all of these services for a new project?
No. Start with Cloud Run, Cloud SQL, and Secret Manager. That covers compute, data, and secrets with minimal ops overhead. Add Cloud Armor, Pub/Sub, multi-region, and Binary Authorization only when a specific requirement justifies the complexity.
Is Cloud Run enough for most web apps?
For stateless HTTP services, yes. Cloud Run handles autoscaling, TLS, and container management with zero cluster ops. Move to GKE only when you need Kubernetes-specific features like StatefulSets, service mesh, or custom scheduling.
How do I keep Cloud Run private behind a load balancer?
Set --ingress=internal-and-cloud-load-balancing on the Cloud Run service. This blocks direct access to the *.run.app URL from the public internet while still accepting traffic from the Global HTTPS Load Balancer and from within your VPC.
When should I use GKE instead?
When you need StatefulSets, a service mesh, custom pod scheduling, GPU workloads, or fine-grained network policies that Cloud Run does not support. If your team already operates Kubernetes and your workloads need that level of control, GKE is the better fit.
How much does this architecture cost in practice?
It depends on region, traffic volume, database size, and uptime requirements. A minimal version with Cloud Run near-zero scaling and a small Cloud SQL instance can start under $200/month. The Cloud SQL instance is usually the largest fixed cost. Use the GCP Pricing Calculator for estimates specific to your workload.