GCP Global Load Balancing Explained: Anycast, Failover, and Cloud CDN
GCP global load balancing puts your application behind a single Anycast IP that is announced from Google edge nodes in dozens of cities worldwide. A user in Singapore does not connect to a data centre in the US — they connect to the nearest edge node, then travel Google’s private network to reach your backends. The result is faster connections, edge-based TLS termination, and automatic cross-region failover when backends in multiple regions are configured.
What is GCP global load balancing?
A global load balancer is a managed service that accepts internet traffic and distributes it to your backend instances, containers, or serverless services. What makes it global is where it accepts that traffic: not at a single fixed location, but at Google edge nodes distributed across the world.
GCP’s global external Application Load Balancer is a layer 7 reverse proxy. It terminates HTTPS connections, reads the request, applies routing rules from a URL map, and forwards the request to the appropriate backend. All of this happens close to the user — at the edge — before the request travels Google’s internal network to your backends.
The core building block is Anycast: a single IP address that your DNS points to, but which is announced from many edge locations simultaneously. Internet routing automatically sends each user to whichever edge location is topologically nearest to them, meaning nearest in network terms, which is usually also the nearest geographically.
Simple explanation
Think of a chain of post offices that all share one postal address. No matter where in the country you post a letter to that address, it is handled by the nearest branch rather than being trucked to a central location. You only ever need one address. The logistics happen invisibly behind the scenes.
GCP global load balancing works the same way. Your users see one IP address and one domain name. A user in Tokyo connects at the nearest Tokyo edge node. A user in London connects at a London edge node. From there, their request travels Google’s private fibre — not the open internet — to your backend servers.
Without a global load balancer, your options are: one server in one region (everyone connects to it regardless of distance), or separate IP addresses per region (users and DNS must know which to use). A global load balancer removes both problems. One IP, one DNS name, every user served close to home.
How it works
Here is what happens step by step when a user makes a request through a global load balancer:
1. DNS lookup. The user’s browser resolves your domain. Your DNS record points to a global static IP address reserved in GCP.
2. Anycast routing. That IP is announced from Google edge nodes worldwide. Internet routers direct the TCP connection to the nearest edge node, which may be in a completely different city from your backends.
3. TLS termination at the edge. The TLS handshake completes at the edge node. The user gets a fast, nearby TLS negotiation rather than waiting for a full round trip to your backend region.
4. HTTP inspection. The load balancer reads the HTTP request (host header, URL path, any custom headers) and passes it through the URL map to find the right backend service.
5. Backend selection. The load balancer checks health and capacity across all configured regions. It prefers the closest healthy region but routes to a further region if the nearest is unhealthy or at capacity.
6. Request forwarded. The request travels Google’s internal private network to the selected backend. The backend receives the request from a GCP proxy IP range, with the original client IP in the X-Forwarded-For header.
7. Response returns. The response travels back through the edge node to the user. If Cloud CDN is enabled and the response is cacheable, it is stored at the edge for future requests.
Steps 1 to 3 happen extremely close to the user. Steps 4 to 6 happen on Google’s private backbone. The user never touches the open internet after the initial edge connection — which is why global load balancing feels fast even when your backends are on the other side of the planet.
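Because the backend sees a proxy address as the connection source, it has to recover the real client IP from X-Forwarded-For, as the request-forwarding step notes. The following is a minimal sketch, assuming the load balancer appends two entries (the client IP, then its own IP) to whatever value the client supplied. The function name client_ip_from_xff and the sample addresses are hypothetical.

```shell
# Extract the client IP from an X-Forwarded-For value.
# Assumption: the load balancer appends "<client-ip>,<lb-ip>" to any
# client-supplied value, so the client IP is the second-to-last entry.
client_ip_from_xff() {
  # Strip spaces, split on commas, print the second-to-last field.
  printf '%s\n' "$1" | tr -d ' ' | awk -F',' 'NF >= 2 { print $(NF-1) }'
}

# Example: a client-supplied entry, then the client IP, then the load balancer IP.
client_ip_from_xff "198.51.100.9, 203.0.113.7, 34.120.0.1"
```

Entries to the left of the last two come from the client and can be forged, so they should not be trusted for access control.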
Key components of a global load balancer
A global external Application Load Balancer is built from several resources that chain together. Understanding each one makes the CLI commands and architecture diagrams easier to read. For a step-by-step walkthrough of the full setup, see the HTTP Load Balancer Setup guide.
Global static IP address
The Anycast entry point. Created with the --global flag. Regional IP addresses will not work with a global load balancer and produce an error when you try to attach them to a global forwarding rule. Once reserved, this IP address is stable and can persist even if you rebuild your backends.
# Reserve a global static IP address
gcloud compute addresses create my-global-ip \
  --global
Forwarding rule
Binds the global IP to a target proxy and a port (typically 443 for HTTPS). The forwarding rule defines what traffic GCP accepts and hands to the load balancer. Must also be created with --global.
Target proxy
Processes the protocol. A target HTTPS proxy terminates TLS and attaches an SSL certificate. It then passes the decrypted request to the URL map. For certificate setup including Google-managed certificates, see SSL Certificates in GCP.
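As a rough sketch, the certificate, the target proxy, and the forwarding rule from the previous subsection might be created as follows. The resource names (web-cert, web-https-proxy, web-https-rule) and the wrapper function create_https_frontend are hypothetical, and the URL map and global address are assumed to exist already. EXTERNAL_MANAGED is the load-balancing scheme used by the global external Application Load Balancer. The commands are wrapped in a function so that nothing runs until it is called.

```shell
# Hypothetical helper: create the HTTPS frontend of a global load balancer.
# Arguments: <domain> <url-map-name> <global-address-name>
create_https_frontend() {
  local domain="$1" url_map="$2" address="$3"

  # Google-managed certificate for the domain.
  gcloud compute ssl-certificates create web-cert \
    --domains="$domain" \
    --global

  # Target HTTPS proxy: terminates TLS and hands requests to the URL map.
  gcloud compute target-https-proxies create web-https-proxy \
    --url-map="$url_map" \
    --ssl-certificates=web-cert \
    --global

  # Global forwarding rule: binds the Anycast IP and port 443 to the proxy.
  gcloud compute forwarding-rules create web-https-rule \
    --address="$address" \
    --target-https-proxy=web-https-proxy \
    --ports=443 \
    --load-balancing-scheme=EXTERNAL_MANAGED \
    --global
}
```

A call such as create_https_frontend example.com web-map my-global-ip would issue the three commands in dependency order: certificate, proxy, then forwarding rule.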
URL map
Routes requests to different backend services based on host and path. A request to /api/ can go to your API server backend service. A request to /static/ can go to a backend bucket pointing at Cloud Storage. One URL map handles all routing logic for the load balancer.
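The /api/ and /static/ split described above could be expressed roughly like this. It is a sketch under assumptions: the function create_url_map, the path matcher name main-paths, and all argument values are hypothetical, and the backend services and backend bucket must already exist.

```shell
# Hypothetical helper: build a URL map that splits API and static traffic.
# Arguments: <url-map> <default-backend-service> <api-backend-service> <backend-bucket>
create_url_map() {
  local url_map="$1" web_service="$2" api_service="$3" static_bucket="$4"

  # Requests that match no rule go to the default backend service.
  gcloud compute url-maps create "$url_map" \
    --default-service="$web_service" \
    --global

  # Path matcher: /api/* goes to the API service, /static/* to the bucket.
  gcloud compute url-maps add-path-matcher "$url_map" \
    --path-matcher-name=main-paths \
    --default-service="$web_service" \
    --backend-service-path-rules="/api/*=$api_service" \
    --backend-bucket-path-rules="/static/*=$static_bucket" \
    --global
}
```

Keeping both rules in one path matcher means a single URL map holds all the routing logic, which matches the one-entry-point design the section describes.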
Backend service
Groups the actual compute targets: instance groups, zonal NEGs (for Compute Engine or GKE), serverless NEGs (for Cloud Run), or internet NEGs. Each backend service has its own health check and balancing mode. A single backend service can span multiple regions by adding backend groups from zones in different regions.
Backend buckets
Points to a Cloud Storage bucket rather than compute backends. Use this to serve static files — images, CSS, JavaScript — directly from storage without routing through VMs. Combining backend buckets with Cloud CDN is one of the most cost-effective patterns for static asset delivery.
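A minimal sketch of creating such a backend bucket with Cloud CDN switched on is shown below. The function name create_static_backend is hypothetical, and the Cloud Storage bucket is assumed to exist and to allow public reads of the objects you want served.

```shell
# Hypothetical helper: expose a Cloud Storage bucket as a CDN-enabled backend bucket.
# Arguments: <backend-bucket-name> <existing-gcs-bucket-name>
create_static_backend() {
  gcloud compute backend-buckets create "$1" \
    --gcs-bucket-name="$2" \
    --enable-cdn
}
```

The backend bucket name, not the Cloud Storage bucket name, is what a URL map rule refers to.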
Health checks
The load balancer continuously probes backends. Only healthy backends receive traffic. Health checks run at the GCP infrastructure level, independently per region. If all backends in one region fail their health checks, the load balancer stops sending traffic there and routes to other healthy regions.
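The sketch below shows one way a health check, the firewall opening it depends on, and a backend service that uses it might fit together. The function create_checked_backend, the /healthz path, port 80, and the probe timings are assumptions for illustration. The two source ranges are the ones Google documents for health check probes and proxied load balancer traffic.

```shell
# Hypothetical helper: health check, firewall opening, and backend service.
# Arguments: <backend-service-name> <vpc-network> <network-tag-on-backends>
create_checked_backend() {
  local service="$1" network="$2" tag="$3"

  # HTTP health check probing /healthz on port 80 every 5 seconds.
  gcloud compute health-checks create http "${service}-hc" \
    --port=80 \
    --request-path=/healthz \
    --check-interval=5s \
    --unhealthy-threshold=2 \
    --global

  # Allow Google's probe and proxy ranges to reach the tagged backends.
  gcloud compute firewall-rules create "${service}-allow-hc" \
    --network="$network" \
    --direction=INGRESS \
    --allow=tcp:80 \
    --source-ranges=130.211.0.0/22,35.191.0.0/16 \
    --target-tags="$tag"

  # Backend service that uses the health check above.
  gcloud compute backend-services create "$service" \
    --protocol=HTTP \
    --port-name=http \
    --health-checks="${service}-hc" \
    --load-balancing-scheme=EXTERNAL_MANAGED \
    --global
}
```

Without the firewall rule, probes never reach the instances and every backend is reported unhealthy even when the application is running.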
Cloud CDN (optional)
Enabled per backend service. When active, cacheable responses are stored at the edge node closest to the user. Subsequent requests for that same content are served from the edge without touching your backend. CDN caching is controlled by HTTP Cache-Control headers in your backend’s responses.
When to use a global load balancer
A global load balancer is the right choice when:
- You serve users in multiple countries. Anycast routing reduces latency by routing each user to the nearest edge node. Users in Asia, Europe, and the Americas all get a fast connection without needing separate regional infrastructure in each place.
- You need automatic cross-region failover. If your backends in one GCP region go down, traffic automatically reroutes to another region in seconds — no DNS changes, no manual intervention. This requires backends in at least two regions. For a broader look at multi-region patterns, see Multi-Region Architectures in GCP.
- You want one entry point for static and dynamic content. A URL map lets you serve static files from Cloud Storage via CDN and dynamic requests from your app servers, all from the same IP and domain.
- You need TLS termination close to the user. TLS handshakes complete at the edge, not at your backend region. This makes a visible difference for users geographically far from your servers.
- You need HTTP/2 or HTTP/3 support. Both are supported on the global Application Load Balancer and improve performance especially on mobile and unreliable connections.
A global load balancer may be overkill if all your users are in one country or region. A regional Application Load Balancer is simpler and cheaper in that case. Similarly, for services that never face the public internet, use an internal load balancer instead. For a side-by-side comparison of all GCP external load balancer options, see External Load Balancers in GCP.
GCP global load balancing vs regional load balancing
Both accept HTTPS traffic from the internet, but they behave quite differently once traffic arrives:
| | Global external ALB | Regional external ALB |
|---|---|---|
| IP type | Global Anycast IP | Regional IP |
| Where traffic enters | Nearest Google edge node to the user | The single configured GCP region |
| Latency for distant users | Lower — edge connection is geographically closer | Higher — user connects directly to the region |
| Cross-region failover | Automatic, health-check driven | Not supported |
| Cloud CDN | Yes | No |
| HTTP/2 and QUIC | Yes | HTTP/2 only |
| Best for | Multi-country apps, resilience requirements | Users concentrated in one region |
If you are unsure which to choose, the global Application Load Balancer is generally the right default for internet-facing applications. The complexity overhead is small, and you can add CDN or extra regions later without rearchitecting.
Global load balancing vs DNS-based failover
DNS-based failover is a common alternative pattern: when a primary endpoint fails, you update a DNS record to point to a secondary. Tools like Cloud DNS health checks or third-party DNS providers can automate this update. It sounds simple, but it has a real problem in practice.
Imagine changing the address on your business card. The old cards are still in people’s wallets, so they keep showing up at your old location until they discard the card. DNS TTL works the same way. Even after you update the record, resolvers worldwide that cached the old answer will not check for a new one until the TTL expires, and some resolvers and clients hold on to answers for longer than the TTL allows.
If your DNS TTL is 60 seconds, some users can keep receiving the old IP for up to 60 seconds after the record changes, on top of the time it takes to detect the failure and update the record. During that window they keep connecting to a failing endpoint. With a 300-second TTL, that is up to five minutes of broken requests during a live incident.
GCP global load balancing bypasses this entirely. Failover happens at the load balancer layer, not at DNS. The IP address never changes. When a backend goes unhealthy, the load balancer stops routing to it within seconds. There is no TTL to wait out, no stale DNS record to clear. See Cloud DNS Setup for how DNS integrates into a broader GCP architecture.
DNS-based failover is still reasonable for routing between entirely separate services or cloud providers. But for failover between GCP regions running the same application, global load balancing is faster and operationally simpler.
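To make the TTL arithmetic concrete, here is a small sketch of a worst-case estimate for DNS-based failover. The model is an assumption: detection takes the probe interval multiplied by the number of failed probes required, and a resolver that cached the old record just before the update keeps it for one full TTL. The function name and the probe numbers are hypothetical.

```shell
# Worst-case seconds until a resolver stops handing out the failed IP.
# Assumed model: detection time (probe interval x failures needed)
# plus one full TTL for a resolver that cached the record just before the update.
dns_failover_worst_case() {
  local probe_interval="$1" failures_needed="$2" ttl="$3"
  echo $(( probe_interval * failures_needed + ttl ))
}

dns_failover_worst_case 10 3 60    # 10s probes, 3 failures, 60s TTL  -> 90
dns_failover_worst_case 10 3 300   # same detection, 300s TTL         -> 330
```

With failover at the load balancer layer, the TTL term disappears and only the detection time remains.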
Cross-region failover
Cross-region failover is automatic when you have backends in more than one region and health checks detect that one region is degraded.
For failover to work, you need:
- Backends in at least two GCP regions, added to the same backend service
- Health checks configured and actively probing each backend
- Sufficient capacity in non-primary regions to absorb redirected traffic
The load balancer uses backend capacity as the primary routing signal. Each backend group has a capacity threshold — typically CPU utilisation or requests per second. When the nearest region’s backends are healthy and below that threshold, they handle all nearby traffic. When they become unhealthy or hit capacity, the load balancer automatically routes overflow to the next nearest healthy region.
This is different from a simple active-passive setup. Failover can happen gradually: if one region degrades partially, some traffic moves while the rest stays close. You do not need a total region failure to trigger rerouting. For a broader look at availability patterns, see Designing Highly Available Systems in GCP.
# Reserve a global static IP address
gcloud compute addresses create global-lb-ip \
--global
# Confirm the IP was assigned
gcloud compute addresses describe global-lb-ip \
--global \
--format="get(address)"
# Add a backend from us-central1
gcloud compute backend-services add-backend web-backend-service \
--instance-group=web-group-us \
--instance-group-zone=us-central1-a \
--balancing-mode=UTILIZATION \
--max-utilization=0.8 \
--global
# Add a backend from europe-west1 for failover and proximity routing
gcloud compute backend-services add-backend web-backend-service \
--instance-group=web-group-eu \
--instance-group-zone=europe-west1-b \
--balancing-mode=UTILIZATION \
--max-utilization=0.8 \
  --global
With --balancing-mode=UTILIZATION and --max-utilization=0.8, the load balancer prefers the closest region but spills to the next nearest when any backend group reaches 80% CPU utilisation.
If you configure backends in only one region, there is nowhere to fail over to. When every backend is unhealthy, no healthy alternative exists, so requests fail (clients typically see 502 or 503 errors) until the backends recover. Always test failover deliberately: mark a backend unhealthy and confirm the routing behaviour before relying on it in production.
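When testing failover, it helps to see what the load balancer itself believes about backend health. The sketch below wraps the get-health command and counts healthy instances. The function name count_healthy_backends is hypothetical, and the count relies on the assumption that the default output contains one "healthState: HEALTHY" line per healthy instance.

```shell
# Hypothetical helper: count instances the load balancer reports as HEALTHY.
# Assumption: get-health prints one "healthState: HEALTHY" line per healthy instance.
count_healthy_backends() {
  gcloud compute backend-services get-health "$1" --global \
    | grep -c 'healthState: HEALTHY'
}
```

Running count_healthy_backends web-backend-service before and after stopping the web server in one region should show the count drop, while requests through the load balancer continue to succeed from the other region.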
Cloud CDN integration
Cloud CDN caches HTTP responses at Google’s edge nodes. When a user requests content already cached at their nearest edge, the response is served without the request reaching your backend at all. This reduces latency, reduces backend egress costs, and absorbs traffic spikes more gracefully.
What gets cached
Cloud CDN caches responses with a valid HTTP Cache-Control or Expires header that signals publicly cacheable content. Static assets — images, fonts, CSS, JavaScript bundles — are good candidates. The CACHE_ALL_STATIC cache mode automatically caches common static content types even when the backend does not set explicit cache headers.
What does not get cached
Responses with Cache-Control: private or Cache-Control: no-store are not cached. API responses that vary per user, authenticated content, and responses with Set-Cookie headers should not be cached. Dynamic responses should explicitly set appropriate cache headers to prevent CDN from accidentally storing user-specific data.
Cache invalidation
When you deploy new static assets, cached versions at edge nodes keep serving until they expire. To force an update, invalidate specific paths via the GCP Console or with gcloud compute url-maps invalidate-cdn-cache. For high-traffic deployments, plan invalidations carefully: a full purge causes a brief spike in backend requests as all edge nodes simultaneously miss cache on the next request.
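A narrow invalidation might look like the sketch below. The function name invalidate_cdn_path is hypothetical. The --async flag returns immediately instead of waiting for the invalidation to finish.

```shell
# Hypothetical helper: invalidate one path pattern on a URL map.
# Arguments: <url-map-name> <path-pattern>, for example web-map "/static/css/*"
invalidate_cdn_path() {
  gcloud compute url-maps invalidate-cdn-cache "$1" \
    --path="$2" \
    --async
}
```

Invalidating a specific prefix rather than "/*" keeps the rest of the cache warm and limits the backend spike described above.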
# Enable Cloud CDN on a backend service
gcloud compute backend-services update my-backend-service \
--enable-cdn \
--global
# Set the cache mode
gcloud compute backend-services update my-backend-service \
--cache-mode=CACHE_ALL_STATIC \
  --global
CDN cache hits are billed at a lower egress rate than cache misses. For applications with significant static content, CDN typically reduces both latency and cost, but only if your cache headers are correct. Inspect your actual response headers before concluding that CDN is or is not working as expected.
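One quick way to inspect response headers is with curl. The sketch below prints only the caching-related headers. The function name show_cache_headers is hypothetical, and it assumes that an Age header on a repeated request indicates the response was served from cache.

```shell
# Hypothetical helper: show the caching-related response headers for a URL.
# An Age header on a repeat request suggests the response came from cache.
show_cache_headers() {
  curl -sI "$1" | tr -d '\r' | grep -iE '^(cache-control|expires|age):'
}
```

Requesting the same URL twice and comparing the output is usually enough to tell whether a response is cacheable and whether it is being served from the edge.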
TLS, HTTP/2, and HTTP/3 at the edge
The global Application Load Balancer terminates TLS at the edge node, not at your backend VMs. A TLS handshake needs at least one round trip on top of the TCP handshake (two with TLS 1.2). When those round trips go to an edge node a few milliseconds from the user, rather than to a backend region hundreds of milliseconds away, the overhead is dramatically smaller.
HTTP/2 is supported between clients and the load balancer. It multiplexes multiple requests over a single TCP connection, removing the request-level head-of-line blocking that slows pages with many assets under HTTP/1.1.
QUIC (the protocol underlying HTTP/3) is also supported, running over UDP. QUIC is specifically designed to handle packet loss and network transitions better than TCP — which is why it makes a meaningful difference on mobile networks. Users moving between Wi-Fi and cellular see fewer connection resets with QUIC.
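QUIC negotiation is controlled on the target HTTPS proxy. The sketch below shows how it might be switched on explicitly. The function name enable_http3 is hypothetical, and whether an explicit override is needed depends on the proxy's current setting.

```shell
# Hypothetical helper: explicitly enable QUIC negotiation on a target HTTPS proxy.
# Argument: <target-https-proxy-name>
enable_http3() {
  gcloud compute target-https-proxies update "$1" \
    --quic-override=ENABLE \
    --global
}
```

Clients that do not support HTTP/3 keep using HTTP/2 or HTTP/1.1, so enabling it does not require any client-side change.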
For certificate setup, including Google-managed certificates that auto-renew without any manual intervention, see SSL Certificates in GCP.
Common mistakes
- Using a regional IP address. Global load balancers require a global IP address. If you create an address without the --global flag, it will be regional and the forwarding rule creation will fail. Confirm the address type with gcloud compute addresses list before proceeding.
- Expecting failover with backends in only one region. Without backends in multiple regions, there is no destination for failover. If your only region has a problem, the load balancer has nowhere to reroute traffic. Add backend groups in at least two regions if cross-region resilience is a requirement.
- Assuming CDN caches everything automatically. Cloud CDN only caches what HTTP cache headers say is cacheable. If your backend returns no cache headers, or headers that say no-cache or private, nothing gets cached. Always check your actual response headers first.
- Under-provisioning failover regions. When a primary region fails, the full traffic load shifts to other regions. If those regions are sized only for normal traffic, you replace one outage with another. Every region that can receive overflow must be sized to handle it.
- Health check firewall rules missing. Health checks probe backends from specific GCP IP ranges. If firewall rules do not allow traffic from those ranges, backends appear unhealthy even when they are running fine. See troubleshooting network issues if backends show as unhealthy unexpectedly.
- Wrong CLI flag syntax. Flags like --global, --balancing-mode, and --max-utilization require standard double hyphens. Typographic dashes or single hyphens cause parse errors. Also note that the correct value is UTILIZATION, not the British spelling UTILISATION, which the GCP CLI will reject.
Example architecture
Here is a realistic three-region setup:
- One global Anycast IP, pointed to by a single DNS A record
- A target HTTPS proxy with a Google-managed SSL certificate that auto-renews
- A URL map with two routing rules:
  - /static/* routed to a backend bucket on Cloud Storage, with CDN enabled and a 24-hour TTL
  - All other paths routed to a backend service with instance groups in us-central1, europe-west1, and asia-southeast1
- Health checks probing each instance group every five seconds
- Capacity-based routing: each region handles local users up to 80% utilisation, spilling to adjacent regions beyond that
Users in Europe connect to a London or Frankfurt edge node. TLS terminates there. Their requests hit backends in europe-west1 unless that region degrades, in which case traffic shifts automatically to us-central1 or asia-southeast1. Static assets served from CDN never reach the backend at all. For a detailed look at how to structure multi-region backends and data layers together, see Multi-Region Architectures in GCP.
Do not assume this architecture is right for every application. A three-region setup roughly triples your infrastructure cost and significantly increases operational complexity, especially around databases, which must replicate across regions. Validate your latency and availability requirements before committing to multi-region. Most teams are better served by multi-zone within a single region until they have a concrete reason to go further.
Summary
- GCP global load balancing uses Anycast — one IP address announced from edge nodes worldwide
- Each user connects to the nearest Google edge node, reducing TLS latency and improving connection speed
- The URL map routes requests to different backend services or buckets based on host and path
- Multi-region backends enable automatic cross-region failover in seconds, with no DNS changes
- Cloud CDN integrates directly with the global Application Load Balancer to cache responses at the edge
- TLS terminates at the edge; HTTP/2 and QUIC (HTTP/3) are both supported
- Always use a global IP address and --global flags; regional resources will not work
- Size all failover regions to handle full traffic load, not just the primary region
Frequently asked questions
What is Anycast in GCP global load balancing?
Anycast means a single IP address is announced from multiple locations on Google's network simultaneously. When a user connects to your app, internet routing sends them to the nearest Google edge location. GCP then forwards the request over Google's private network to a healthy backend in the best available region. You have one IP, but every user gets the closest entry point.
Do I need backends in multiple regions for global load balancing to help?
Not necessarily. Even with backends in only one region, a global load balancer reduces latency for distant users by completing TLS at the nearest edge node. The real multi-region benefit is resilience: if your backends in one region go unhealthy, traffic automatically moves to another region with no DNS change required.
Is GCP global load balancing the same as DNS failover?
No. DNS failover works by changing what IP a domain name resolves to, which takes time because DNS responses are cached by resolvers worldwide. GCP global load balancing uses Anycast and health checks at the load balancer layer, so failover happens in seconds without any DNS changes.
Does global load balancing work with Cloud CDN?
Yes. Cloud CDN integrates directly with the global external Application Load Balancer. When CDN is enabled on a backend service, cacheable static content is served from the nearest edge node without reaching your backends. CDN is configured per backend service, not for the entire load balancer.
Can Cloud Run or GKE sit behind a global load balancer?
Yes. Cloud Run connects through serverless NEGs and GKE connects through container-native zonal NEGs. Both work with the same URL map, Cloud CDN, and Anycast routing as any other backend type.
Does global load balancing reduce latency even with one backend region?
Yes. TLS terminates at the nearest edge node, which is much closer to the user than your backend region. The TLS handshake completes quickly, and the request then travels Google's fast private fibre network to your backend. Users far from your backend region still see a noticeably faster connection setup.