Troubleshooting GCP Network Issues: DNS, Firewalls, Routes, Cloud Run, and Load Balancers

When a connection fails in GCP, the root cause is almost always one of a small set: a DNS issue, a missing route, a firewall rule, a missing NAT gateway, or something at the application layer. GCP gives you precise tools for each layer. Work through them in order rather than guessing, and most problems resolve quickly.

This page covers VM-to-VM connectivity, internet access, Google API access, Cloud Run connectivity to private IPs, load balancer health check failures, DNS problems, and VPC peering issues — with practical commands and a consistent methodology throughout.

Simple explanation

Most GCP network problems come from four places: the name did not resolve (DNS), there is no path from A to B (routing), there is a path but something is blocking it (firewall rules), or the destination is reachable but nothing is answering (application).

The key is to test one layer at a time. If you change the firewall rule, the route, and the application config all at once, you will not know which one fixed it. You may also leave something broken that got accidentally masked by another change.

GCP gives you different tools for different layers. Connectivity Tests answer routing and firewall questions without SSH access. Firewall logs show which rules are matching traffic. VPC Flow Logs show real traffic patterns. nc, curl, and nslookup from inside a VM confirm actual connectivity at the application layer.

Analogy

Think of a network connection like sending a courier package. DNS is looking up the delivery address. Routing is whether there is a road to that address. Firewall rules are the security desk at the building entrance. The application is whether anyone is actually at the desk to sign for the package. If the package never arrives, you check the address first, then the road, then the security desk. You do not start by assuming nobody is home.

When to use this

This guide is useful when:

  • A VM cannot reach another VM in the same VPC or a peered VPC
  • A VM has no external IP and cannot reach the internet or pull package updates
  • A VM with no external IP cannot reach Google APIs such as Cloud Storage or Pub/Sub
  • A load balancer backend shows unhealthy even though the application is running
  • A Cloud Run service times out when trying to connect to a private IP
  • DNS resolves correctly in one network but not another
  • Traffic between peered VPCs is not flowing as expected
  • A Shared VPC service project resource cannot reach the host network

How network troubleshooting works in GCP

Network debugging works best when you follow a consistent sequence. Skip a step and you risk fixing the wrong layer.

  1. Start with the symptom. Timeout, connection refused, DNS failure, and HTTP 502 each point to different layers. Identify which type of failure you have before doing anything else.
  2. Resolve the name. If you are using a hostname, confirm it resolves to the expected IP from inside the relevant network. A wrong IP at this stage invalidates everything below.
  3. Verify the path exists. Use Connectivity Tests or check the route table directly. A connection cannot succeed if there is no route from the source to the destination.
  4. Verify firewall evaluation. Even with a valid route, a firewall rule can drop the packet silently. Check ingress rules at the destination and egress rules at the source.
  5. Verify NAT and access settings where relevant. For VMs without external IPs, check Cloud NAT for internet access and Private Google Access for Google API access. For Cloud Run, check the Serverless VPC Access connector.
  6. Verify the application is healthy. Once you know the network path works, confirm the application is running, listening on the right port, and returning the expected response.
  7. Use logs to confirm. Once you know which layer is failing, logs tell you the exact packet, rule, and timestamp. Do not start with logs; they are most useful after you know where to look.

Good habit

Before you change anything, write down what the current behaviour is: what command you ran, what the output was, and what you expected to see. This keeps you honest about whether your fix actually worked, and it is invaluable when asking someone else for help.
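One way to keep that record is a small shell helper that appends the command, the expectation, the exit code, and the output to a notes file. This is a minimal sketch, not part of any GCP tooling: the function name note_run and the notes file layout are hypothetical.

```shell
# note_run NOTES_FILE EXPECTED -- COMMAND [ARGS...]
# Runs COMMAND, appends what was run and what came back to NOTES_FILE,
# and returns the command's own exit code.
note_run() {
  local notes expected out rc
  notes="$1"
  expected="$2"
  shift 3   # drop the notes file, the expectation, and the "--" separator
  out=$("$@" 2>&1) && rc=0 || rc=$?
  {
    printf '=== %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)"
    printf 'command:  %s\n' "$*"
    printf 'expected: %s\n' "$expected"
    printf 'exit:     %s\n' "$rc"
    printf 'output:\n%s\n\n' "$out"
  } >> "$notes"
  return "$rc"
}

# Usage from the source VM (IP and port are placeholders):
#   note_run debug-notes.txt "port 8080 open" -- nc -zv -w 5 10.0.0.5 8080
```

Running every test through the same helper gives you a timestamped before-and-after trail for each change you make.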

Troubleshooting checklist by symptom

Use this as a quick reference when a connection fails. Match the symptom to the likely layers to inspect first.

| Symptom | Most likely cause | Where to start |
| --- | --- | --- |
| Connection hangs (timeout) | Firewall drop, missing route, missing NAT | Connectivity Tests, route table, firewall rules |
| Connection refused immediately | Nothing listening on that port | Application layer: is the service running? Is it bound to the right address? |
| DNS lookup fails | Wrong private zone association, zone not linked to VPC | nslookup from inside VM, check private zone network bindings |
| Load balancer backend unhealthy | Missing health check firewall rule or wrong health check config | Firewall rule for 35.191.0.0/16 and 130.211.0.0/22; health check port and path |
| VM cannot reach internet | No external IP, no Cloud NAT, or missing default route | Check NAT config, check routes, check egress firewall |
| VM cannot reach Google APIs | Private Google Access not enabled | Check subnet setting; enable if missing |
| Cloud Run times out to private IP | No Serverless VPC Access connector | Check connector config, region match, VPC attachment |
| Peered VPC traffic not flowing | CIDR overlap, route not exported, firewall blocking | Check peering state, route export settings, firewall rules on destination VPC |

Network problem or application problem?

One of the most common debugging mistakes is spending time on firewall rules when the problem is actually the application. These failure types are easy to tell apart if you know the signals.

Network blocked (firewall or routing)

The connection hangs with no response. nc -zv does not return. curl eventually times out. Connectivity Tests show the path as blocked. The packet never reaches the destination host. Fix: check the route table and firewall rules.

Application not listening

The connection is refused immediately. nc -zv returns “connection refused” right away. Connectivity Tests show the path as reachable. The packet reached the host but no process answered on that port. Fix: check whether the service is running and bound to 0.0.0.0 rather than 127.0.0.1.

DNS mismatch

The hostname resolves to the wrong or an unexpected IP. nslookup returns no records, an error, or a different IP than expected. The connection may succeed if you replace the hostname with the correct IP directly. Fix: check private zone association and DNS records.

If Connectivity Tests report the path as reachable but connections are still failing, the issue is almost certainly the application. Run ss -tlnp on the destination host to see what is actually listening and on which address.

Connectivity Tests

Network Intelligence Center’s Connectivity Tests are the fastest way to answer routing and firewall questions. They simulate a packet from source to destination by analysing your network configuration — they do not send real traffic — and return a step-by-step trace showing exactly which firewall rule or route allowed or blocked the connection.

# Test TCP connectivity between two VM instances
gcloud network-management connectivity-tests create test-vm-to-vm \
  --source-instance=projects/my-project/zones/europe-west2-a/instances/source-vm \
  --destination-instance=projects/my-project/zones/europe-west2-b/instances/dest-vm \
  --protocol=TCP \
  --destination-port=8080

# Test connectivity from a VM to a Cloud SQL private IP
gcloud network-management connectivity-tests create test-vm-to-sql \
  --source-instance=projects/my-project/zones/europe-west2-a/instances/my-vm \
  --destination-ip=10.100.0.3 \
  --protocol=TCP \
  --destination-port=5432

# View the test results
gcloud network-management connectivity-tests describe test-vm-to-vm \
  --format='yaml(reachabilityDetails)'

# List all connectivity tests
gcloud network-management connectivity-tests list

The result includes an overall verdict (REACHABLE, UNREACHABLE, AMBIGUOUS, or UNDETERMINED) and a step-by-step trace of the simulated path. For an unreachable path, look for the trace step whose state is DROP. That step gives the drop cause (for example FIREWALL_RULE or NO_ROUTE) and names the specific firewall rule or route involved, so you know exactly what to change.
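If you only need the overall verdict, the describe command can print the single result field, which takes the values REACHABLE, UNREACHABLE, AMBIGUOUS, or UNDETERMINED. The sketch below maps each value to the next layer to inspect, following the methodology in this guide. The function name next_step and the wording of its advice are illustrative, not part of gcloud.

```shell
# Map a Connectivity Test result to the next thing to check.
next_step() {
  case "$1" in
    REACHABLE)    echo "config allows the path: test the application from the source VM" ;;
    UNREACHABLE)  echo "config blocks the path: find the dropping step in the trace" ;;
    AMBIGUOUS)    echo "several possible paths: review every trace in the result" ;;
    UNDETERMINED) echo "analysis incomplete: check the test definition and permissions" ;;
    *)            echo "unexpected result: $1" ;;
  esac
}

# Usage (needs gcloud and an existing test named test-vm-to-vm):
#   result=$(gcloud network-management connectivity-tests describe test-vm-to-vm \
#     --format='value(reachabilityDetails.result)')
#   next_step "$result"
```

After changing a rule or route, run gcloud network-management connectivity-tests rerun test-vm-to-vm to refresh the analysis before reading the result again.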

Note

Connectivity Tests are configuration-based. They analyse your firewall rules and routes as currently defined. If the application is the issue rather than the network, or if a rule has not yet propagated, Connectivity Tests will show the path as reachable even though traffic is failing. Use them to rule out the network layer, not to rule out everything.

Testing from inside a VM

Connectivity Tests analyse configuration. When you need to confirm actual packet flow or test the application layer, SSH into the source VM and test directly.

# SSH to a VM
gcloud compute ssh my-vm --zone=europe-west2-a

# Test ICMP reachability
ping -c 4 10.0.0.5

# Test TCP connectivity to a specific port
# Hangs = silently dropped (firewall rule or missing route)
# "Connection refused" immediately = port is reachable, nothing is listening
nc -zv 10.0.0.5 8080

# Test HTTP health endpoint
curl -v http://10.0.0.5:8080/health

# Test DNS resolution
nslookup my-internal-service.europe-west2-a.c.my-project.internal
dig my-service.internal @169.254.169.254

# Trace the path (shows where packets stop)
traceroute 10.0.0.5

# Check what is listening on a port locally
ss -tlnp | grep 8080

Reading the result

nc -zv is one of the most useful one-line diagnostic tools in networking. If it hangs silently, a firewall or missing route is dropping the packet. If it fails immediately with “connection refused”, the packet got there but nothing answered. Those two results point to completely different parts of the stack. Get comfortable reading the difference before you change anything.
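The hang-versus-refused distinction can be scripted so it is read the same way every time. The sketch below assumes bash and the coreutils timeout command are installed on the source VM; it uses bash's built-in /dev/tcp instead of nc. The function name classify_tcp is hypothetical. Note that any immediate failure (refused, but also an unresolvable name or an ICMP unreachable) is reported as "refused", so treat that label as "failed fast".

```shell
# classify_tcp HOST PORT [TIMEOUT_SECONDS]
# Prints one of: open, timeout, refused
classify_tcp() {
  local host port wait rc
  host="$1"
  port="$2"
  wait="${3:-5}"
  # bash opens a TCP socket when redirecting to /dev/tcp/HOST/PORT
  if timeout "$wait" bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null; then
    rc=0
  else
    rc=$?
  fi
  case "$rc" in
    0)   echo "open" ;;      # handshake completed: something is listening
    124) echo "timeout" ;;   # timeout killed the attempt: packet silently dropped
    *)   echo "refused" ;;   # failed immediately: path works, nothing answered
  esac
}

# Usage (IP and port are placeholders):
#   classify_tcp 10.0.0.5 8080
```

A "timeout" result sends you to routes and firewall rules; a "refused" result sends you to the application on the destination host.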

Connectivity Tests vs firewall logs vs VPC Flow Logs

GCP gives you three distinct network diagnostic tools. They answer different questions and work best in combination.

| Tool | What it tells you | What it cannot tell you |
| --- | --- | --- |
| Connectivity Tests | Whether a specific path is allowed or blocked by your current config; which rule or route is responsible | Whether traffic is actually flowing; application-layer issues |
| Firewall rule logging | Which connections matched a specific rule (allow or deny); source IP, destination IP, port, protocol | Traffic not matched by any rule (implicit deny); routing problems |
| VPC Flow Logs | Actual traffic volumes, source/destination pairs, and byte counts from real network flows | Which firewall rule matched; application-level errors |

In practice: use Connectivity Tests first to check configuration. If the path looks clean, enable firewall logging on the relevant rules to see which connections are being matched. Use VPC Flow Logs for traffic pattern analysis, to understand what is connecting to what or to investigate unexpected traffic volumes.

# Enable flow logs on a subnet
gcloud compute networks subnets update my-subnet \
  --region=europe-west2 \
  --enable-flow-logs

# Enable logging on an existing firewall rule
gcloud compute firewall-rules update my-deny-rule \
  --enable-logging

# Query firewall deny logs in Cloud Logging (paste into Logs Explorer filter)
# resource.type="gce_subnetwork" AND jsonPayload.rule_details.action="DENY"

Logging the implicit deny

GCP’s implicit deny-all ingress rule cannot be logged directly. To capture traffic blocked by it, create an explicit low-priority DENY rule with logging enabled. Traffic not matched by any higher-priority allow rule will hit this rule and generate a log entry.
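A rule like that could look as follows. This is a sketch under assumptions: the rule name deny-all-ingress-logged, the network name, and the GCLOUD override variable (used here only so the command can be previewed with GCLOUD=echo) are all hypothetical. Priority 65534 is chosen so that allow rules with a lower number still win; the rule has no target tags, so it applies to every VM in the network.

```shell
# logged_deny_rule NETWORK
# Creates an explicit catch-all ingress DENY rule with logging enabled.
# Set GCLOUD=echo to print the arguments instead of calling gcloud.
logged_deny_rule() {
  "${GCLOUD:-gcloud}" compute firewall-rules create deny-all-ingress-logged \
    --network="$1" \
    --direction=INGRESS \
    --action=DENY \
    --rules=all \
    --source-ranges=0.0.0.0/0 \
    --priority=65534 \
    --enable-logging
}

# Preview first, then apply:
#   GCLOUD=echo logged_deny_rule my-vpc
#   logged_deny_rule my-vpc
```

Check that none of your existing allow rules sit at priority 65534 or 65535 before applying it, otherwise this rule could take precedence over them.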

Common problems and fixes

VM cannot reach the internet

Check in this order:

  • Does the VM have an external IP? Check with gcloud compute instances describe vm-name --format='value(networkInterfaces[0].accessConfigs)'
  • If no external IP, is there a Cloud NAT gateway configured for this subnet’s region?
  • Is there a default route 0.0.0.0/0 via default-internet-gateway in the VPC?
  • Is there an egress firewall rule explicitly blocking outbound traffic? GCP allows all egress by default, so a block must have been added deliberately.

# Check routes in the VPC
gcloud compute routes list --filter="network=my-vpc"

# Check Cloud NAT configuration for a region
gcloud compute routers list
gcloud compute routers nats list --router=my-router --region=europe-west2
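The default-route check can be automated by filtering the route list. The sketch below assumes the routes are printed with the destRange and nextHopGateway fields, one route per line; the function name has_default_route is hypothetical.

```shell
# Reads "DEST_RANGE NEXT_HOP_GATEWAY" lines on stdin.
# Succeeds (exit 0) if a 0.0.0.0/0 route via default-internet-gateway is present.
has_default_route() {
  awk '$1 == "0.0.0.0/0" && $2 ~ /default-internet-gateway$/ { found = 1 }
       END { exit !found }'
}

# Usage (network name is a placeholder):
#   gcloud compute routes list --filter="network=my-vpc" \
#     --format='value(destRange,nextHopGateway)' | has_default_route \
#     && echo "default route present" || echo "default route missing"
```

A missing default route explains an internet timeout even when Cloud NAT and the firewall rules are correct.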

VM cannot reach Google APIs

A VM with no external IP cannot reach Google APIs (Cloud Storage, BigQuery, Pub/Sub, and others) unless Private Google Access is enabled on its subnet. It is a subnet-level setting and is not on by default.

# Check if Private Google Access is enabled on the subnet
gcloud compute networks subnets describe my-subnet \
  --region=europe-west2 \
  --format='value(privateIpGoogleAccess)'

# Enable it if not already on
gcloud compute networks subnets update my-subnet \
  --region=europe-west2 \
  --enable-private-ip-google-access
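To audit every subnet at once rather than one at a time, the subnet list can be filtered for entries where the flag is not set. A minimal sketch, assuming tab-separated name, region, and flag columns as input; the function name subnets_without_pga is hypothetical.

```shell
# Reads "NAME<TAB>REGION<TAB>FLAG" lines on stdin and prints the subnets
# whose Private Google Access flag is anything other than True.
subnets_without_pga() {
  awk -F '\t' '$3 != "True" { print $1, $2 }'
}

# Usage:
#   gcloud compute networks subnets list \
#     --format='value(name,region.basename(),privateIpGoogleAccess)' \
#     | subnets_without_pga
```

Each line of output is a subnet where private-only VMs would fail to reach Google APIs.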

Cloud Run cannot reach a private IP

Cloud Run services run outside your VPC by default. Without a VPC egress path, such as a Serverless VPC Access connector (the option this guide covers) or Direct VPC egress, Cloud Run cannot route traffic to any private IP range. Cloud Storage buckets and external APIs still work because those go over the internet, but anything on a 10.x.x.x address times out.

Easy to miss

A missing Serverless VPC Access connector produces a timeout that looks exactly like a firewall block. A Connectivity Test with a VM as its source will not catch it, because that test models the path from the VM, not the Cloud Run egress path. If your Cloud Run service can reach the internet but times out on private IPs, the connector is the first thing to check.

# Check if a VPC Access connector is attached to a Cloud Run service
gcloud run services describe my-service \
  --region=europe-west2 \
  --format='value(spec.template.metadata.annotations)'

# List available connectors
gcloud compute networks vpc-access connectors list \
  --region=europe-west2

# Attach a connector to a Cloud Run service
gcloud run services update my-service \
  --region=europe-west2 \
  --vpc-connector=my-connector \
  --vpc-egress=private-ranges-only
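The annotations output can be long, so it can help to extract only the connector annotation. The sketch below assumes the service is described with --format=json and that the connector is recorded under the run.googleapis.com/vpc-access-connector annotation; the function name connector_of is hypothetical.

```shell
# Reads the JSON description of a Cloud Run service on stdin and prints the
# attached VPC Access connector, or nothing if the annotation is absent.
connector_of() {
  grep -o '"run.googleapis.com/vpc-access-connector": *"[^"]*"' \
    | sed 's/.*: *"\(.*\)"/\1/'
}

# Usage (service and region are placeholders):
#   gcloud run services describe my-service --region=europe-west2 \
#     --format=json | connector_of
```

Empty output means no connector is attached to the current revision template, which matches the timeout symptom described above.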

Load balancer backend shows unhealthy

GCP health checkers probe backends from two fixed source ranges: 35.191.0.0/16 and 130.211.0.0/22. If your VPC does not have a firewall rule allowing ingress from those ranges on the health check port, every backend will show unhealthy regardless of whether the application is running and healthy.

Most common load balancer mistake

GCP does not create the health check firewall rule automatically. You must create it yourself. Without it, your backends will always show unhealthy and the load balancer will never route traffic to them, even if the application is running perfectly.

# Check whether the health check firewall rule exists
gcloud compute firewall-rules list \
  --filter='sourceRanges:35.191.0.0/16 OR sourceRanges:130.211.0.0/22'

# Create the rule if missing (adjust port and target tags to match your setup)
gcloud compute firewall-rules create allow-health-checks \
  --network=my-vpc \
  --direction=INGRESS \
  --action=ALLOW \
  --rules=tcp:8080 \
  --source-ranges=35.191.0.0/16,130.211.0.0/22 \
  --target-tags=web-servers

# Check health check configuration
gcloud compute health-checks describe my-health-check

If the firewall rule already exists and backends are still unhealthy, verify the health check port and path match what the application actually exposes, and that the application returns HTTP 200 on that path. GCP HTTP health checks count only a 200 response as healthy, so a 204 or a redirect fails the check.
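You can approximate what the health checker sees by requesting the health path from the backend VM itself. This is a sketch under assumptions: the function names are hypothetical, and it applies the stricter rule that, to my knowledge, GCP HTTP health checks use, where only a 200 response counts as success.

```shell
# health_verdict HTTP_CODE: only 200 is treated as healthy
health_verdict() {
  if [ "$1" = "200" ]; then echo "healthy"; else echo "unhealthy"; fi
}

# probe_health URL: prints the status code and the verdict.
# curl reports 000 when it cannot connect at all.
probe_health() {
  local code
  code=$(curl -s -o /dev/null --max-time 5 -w '%{http_code}' "$1") || true
  echo "$code $(health_verdict "$code")"
}

# Usage on the backend VM (port and path are placeholders):
#   probe_health http://localhost:8080/health
```

If the local probe is healthy but the load balancer still reports the backend as unhealthy, the firewall rule or the health check port and path are the remaining suspects.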

DNS resolution failing inside a VPC

VMs in GCP use the metadata server at 169.254.169.254 as their DNS resolver. If a hostname fails to resolve, the most common causes are: the hostname is in a private DNS zone that is not associated with the correct VPC, or the zone association is missing entirely.

# Test DNS resolution from a VM (using GCP's internal resolver)
nslookup my-service.internal 169.254.169.254

# List private DNS zones and check which networks they are associated with
gcloud dns managed-zones list --filter='visibility=private'
gcloud dns managed-zones describe my-private-zone \
  --format='yaml(privateVisibilityConfig)'

If the zone is correct but resolution still fails, confirm the record exists in the zone with gcloud dns record-sets list --zone=my-private-zone.
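To check for the "resolves, but to the wrong IP" case, compare the resolver's answer with the address you expect. A minimal sketch; the function name resolves_to and the example IP are hypothetical.

```shell
# resolves_to EXPECTED_IP
# Reads resolver answers (one per line) on stdin; succeeds if one of them
# is exactly EXPECTED_IP.
resolves_to() {
  grep -Fxq -- "$1"
}

# Usage from inside the VM:
#   dig +short my-service.internal @169.254.169.254 | resolves_to 10.0.0.5 \
#     && echo "DNS OK" || echo "DNS mismatch or no answer"
```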

VPC peering connectivity issues

VPC peering is non-transitive. If VPC A is peered with VPC B, and VPC B is peered with VPC C, resources in A cannot reach resources in C. Every pair of VPCs that needs to communicate must establish its own direct peering connection.

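Peering must be configured from both VPCs before its state becomes ACTIVE. Once it is active, subnet routes are exchanged automatically. Custom routes (static and dynamic) are exchanged only when one side exports them and the other side imports them. If custom route export or import is disabled on the relevant side, traffic that depends on those routes will not flow even though the peering is active.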

Non-transitive routing in plain terms

VPC peering is like a direct phone line between two offices. If Office A has a line to Office B, and Office B has a line to Office C, Office A still cannot call Office C. Each pair needs its own direct line. This surprises a lot of people coming from on-premises networking, where routing often is transitive through intermediate hops.

# List VPC peering connections and their state
gcloud compute networks peerings list --network=my-vpc

# Check custom route export and import settings on the network's peerings
# (each entry under peerings shows exportCustomRoutes and importCustomRoutes)
gcloud compute networks describe my-vpc \
  --format='yaml(peerings)'
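CIDR overlap, listed in the symptom table as a cause of peering failures, can be checked before you create the peering. The sketch below compares two IPv4 ranges using shell arithmetic only. The function names are hypothetical, and it assumes well-formed input with no leading zeros in the octets.

```shell
# ip_to_int A.B.C.D: prints the address as a 32-bit integer
ip_to_int() {
  local a b c d
  IFS=. read -r a b c d <<EOF
$1
EOF
  echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

# cidr_range A.B.C.D/LEN: prints "FIRST LAST" as integers
cidr_range() {
  local ip len base mask first
  ip="${1%/*}"
  len="${1#*/}"
  base=$(ip_to_int "$ip")
  if [ "$len" -eq 0 ]; then
    mask=0
  else
    mask=$(( (0xFFFFFFFF << (32 - len)) & 0xFFFFFFFF ))
  fi
  first=$(( base & mask ))
  echo "$first $(( first | (~mask & 0xFFFFFFFF) ))"
}

# cidrs_overlap CIDR1 CIDR2: succeeds if the two ranges share any address
cidrs_overlap() {
  local r1 r2
  r1=$(cidr_range "$1")
  r2=$(cidr_range "$2")
  [ "${r1% *}" -le "${r2#* }" ] && [ "${r2% *}" -le "${r1#* }" ]
}

# Usage (ranges are placeholders):
#   cidrs_overlap 10.0.0.0/16 10.0.5.0/24 && echo "overlap: peering will conflict"
```

Run it against each pair of subnet ranges from the two VPCs; any overlap has to be resolved before the peered routes can work.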

Shared VPC connectivity issues

In a Shared VPC setup, firewall rules are managed in the host project and apply to resources in service projects that use the shared network. A common issue is a service project team creating a VM and not realising that firewall rule changes need to be made in the host project. They do not have permission to change those rules themselves.

Connectivity Tests work across Shared VPC boundaries and will correctly show which project’s firewall rule is causing a block.

Common mistakes

  1. Assuming every timeout is a firewall issue. Timeouts can come from missing routes, missing Cloud NAT, missing Serverless VPC Access connectors, or VPC peering misconfigurations. Run Connectivity Tests before touching firewall rules. The trace will tell you which layer is responsible.
  2. Forgetting the health check firewall rule. When a load balancer backend shows unhealthy, the most common cause is a missing rule allowing traffic from 35.191.0.0/16 and 130.211.0.0/22. GCP does not create this rule automatically.
  3. Testing from the wrong location. If a VM cannot reach another VM, testing from your laptop or a different network tells you nothing about internal routing. Always test from inside the source VM, or use Connectivity Tests with the correct source VM specified.
  4. Confusing “connection refused” with “blocked”. A refused connection means the network path works but nothing is listening. Do not chase firewall rules if the application is not running or is listening on the wrong address.
  5. Forgetting Private Google Access on private-only VMs. A VM with no external IP needs Private Google Access enabled on its subnet to reach any Google API. Without it, calls to Cloud Storage, Pub/Sub, and other APIs silently fail or time out.
  6. Forgetting a Serverless VPC Access connector for Cloud Run. Cloud Run cannot reach private IPs by default. If your service calls a Cloud SQL instance on a private IP, a Memorystore instance, or any VM, a connector is required. The symptom looks like a firewall block but the actual cause is a missing routing mechanism.
  7. Changing multiple things at once during debugging. If you update the firewall rule, restart the application, and enable Private Google Access all in one step, you will not know which change fixed the problem. Make one change, test, then move on.
  8. Not enabling flow logs or firewall logging before an incident. These logs are most useful during an outage. Enable them proactively on subnets and critical firewall rules. Enabling them after a problem starts means the evidence you need may already be gone.

Frequently asked questions

How do I troubleshoot connectivity between two VMs in GCP?

Start with Network Intelligence Center Connectivity Tests. Run `gcloud network-management connectivity-tests create` with a source and destination instance. The tool traces the network path through your configuration and shows exactly which firewall rule or route is responsible, with no SSH required. If the path looks clean, SSH into the source VM and test with `nc -zv <destination-ip> <port>`. A hang means the connection is silently dropped (firewall or missing route). An immediate refusal means the port is reachable but nothing is listening (application layer).

Why can a VM reach the internet but not Google APIs, or vice versa?

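These are separate paths. Internet access requires either an external IP on the VM or a Cloud NAT gateway that covers the subnet. Google API access from a private VM requires Private Google Access on the subnet, which routes traffic to Google APIs over Google's internal network without going through the internet. A subnet can have Private Google Access but no Cloud NAT, in which case Google APIs work and the rest of the internet does not. The reverse is less likely to be a Private Google Access problem, because configuring a Public NAT gateway for a subnet range enables Private Google Access for that range; in that case check DNS, routes, and egress firewall rules for the Google API addresses. Check each setting separately rather than assuming one implies the other.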

Why does my Cloud Run service time out when calling a private IP?

Cloud Run runs outside your VPC by default. Without a Serverless VPC Access connector, it cannot reach any private IP. Requests to 10.x.x.x simply time out. Create a VPC Access connector in the same region as your Cloud Run service, attach it to your VPC, and update the service with `--vpc-connector`. Set `--vpc-egress=private-ranges-only` to route only private IP traffic through the connector, or `all-traffic` to route everything through it.

What does "connection refused" mean compared with a timeout?

They point to different problems. A timeout means the packet is being dropped, most often by a firewall rule, a missing route, or a NAT configuration issue. A connection refused means the packet reached the destination host but nothing is listening on that port. The network path is fine but the application layer is the problem. If `nc -zv <ip> <port>` hangs, check firewall rules and routing. If it returns "connection refused" immediately, check whether the application is running and bound to the right port and address.

Why is my load balancer backend unhealthy even though the application is running?

Almost always a missing firewall rule. GCP health checkers probe backends from source ranges 35.191.0.0/16 and 130.211.0.0/22. If no firewall rule allows ingress from those ranges to your backend VMs on the health check port, all backends appear unhealthy regardless of whether the application is working. Create an ingress allow rule for those ranges on the correct port. If the rule exists already, verify the health check port and path match what the application actually exposes, and that it returns HTTP 200 on that path, since other status codes fail the check.

Last verified: 24 March 2026. Cloud services change frequently. Verify details against official documentation before making infrastructure decisions.