GCP Quotas and Limits Explained: Rate Quotas, Allocation Quotas, and Increase Requests

GCP quotas are limits on how much of a resource your project can use, either at once or within a given time window. Every new project starts with conservative defaults. At low usage they are invisible. At scale, they become one of the most common causes of failed deployments.

By the end of this page you will understand what rate and allocation quotas are, how to read a quota error message, how to check your current usage, and how to request an increase before it blocks you.

GCP quotas in simple terms

Think of GCP like a shared car park attached to a motorway. Your project gets a fixed number of parking spaces: once all the spaces are taken, no new car can enter until one leaves. That is an allocation quota. There is also a speed limit on the road into the car park: you can drive in continuously, but not faster than the limit allows. Every minute the counter resets and you can send the same volume of traffic again. That is a rate quota.

These limits exist for three reasons. They protect the shared infrastructure GCP runs on, so one project cannot starve other customers. They protect your own account from accidental large-scale bills. And they give Google time to plan capacity for unusually high usage patterns. See how GCP billing works to learn how quotas and billing controls act as two independent layers of protection.

Here is what each key term means in plain English:

Quota: a limit on how many resources you can create or how many API calls you can make in a given period.
Rate quota: limits requests per minute or per second. Resets automatically after the window closes.
Allocation quota: limits how many resources exist at once (VMs, static IPs, vCPUs). Does not reset. You must delete resources or request an increase.
Quota metric: the specific resource being measured, such as CPUS, IN_USE_ADDRESSES, or DISKS_TOTAL_GB.
Quota increase request: a formal request to raise a specific limit, submitted through the Cloud Console.

Never assume this happens automatically

GCP does not raise quotas as your usage grows. Quotas stay at the default, or the last approved level, until you explicitly request an increase. This catches a lot of teams off guard right before a launch.

How GCP quotas and limits work

Quotas in GCP operate at two levels: project-level and region-level. Some limits apply across all regions within your project. Others are enforced separately per region. The regional dimension is the most common source of confusion.

Quotas do not travel between regions

A high vCPU quota in us-central1 does not carry over to europe-west1. Each region starts with its own default limits. If you plan to deploy across multiple regions, or if you are expanding from your primary region to a new one, you need to request quota increases in each region separately before you start.

New projects always start with conservative defaults. GCP has no way of knowing whether you are running a quick experiment or planning a production deployment at scale. Think of it like a new bank account with a starter overdraft limit: the bank begins cautiously and expects you to request more capacity once you have a track record. A new project in us-central1 might have a default CPU quota in the range of 8 to 24 vCPUs. That is enough for a few VMs, but it will block you quickly as you scale. It is not a permanent ceiling. It is just where you start.

Quotas apply at the point of resource creation. If you try to create a Compute Engine VM that would push your total vCPU count above the regional limit, the creation fails immediately. Existing resources continue running without interruption. Only the new request is blocked.

For autoscaling instance groups, this has a specific consequence worth understanding. An autoscaler reacts to load by attempting to create new VMs, sometimes many at once during a traffic spike. If the resulting CPU or instance count would exceed your quota, the autoscaler will fail to create some of those VMs. From the user’s perspective, the service simply does not scale fast enough. Nothing user-facing reports the failure; it is recorded only in the managed instance group’s errors and logs.
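The arithmetic behind a blocked scale-out can be sketched in a few lines. This is a simplified model, not how GCP enforces quotas internally: it assumes a single regional vCPU quota, identical VM sizes, and hypothetical function names and numbers.

```python
def instances_that_fit(limit_vcpus: float, used_vcpus: float, vcpus_per_vm: int) -> int:
    """Return how many more VMs of one size fit under a regional vCPU quota."""
    if vcpus_per_vm <= 0:
        raise ValueError("vcpus_per_vm must be positive")
    headroom = max(0.0, limit_vcpus - used_vcpus)
    return int(headroom // vcpus_per_vm)


def plan_scale_out(limit_vcpus: float, used_vcpus: float,
                   vcpus_per_vm: int, requested_vms: int) -> dict:
    """Split a scale-out request into VMs that fit and VMs the quota would block."""
    fit = min(requested_vms, instances_that_fit(limit_vcpus, used_vcpus, vcpus_per_vm))
    return {"created": fit, "blocked": requested_vms - fit}


# Hypothetical numbers: 24 vCPU limit, 16 in use, autoscaler wants five 4-vCPU VMs.
print(plan_scale_out(24, 16, 4, 5))  # {'created': 2, 'blocked': 3}
```

Running a check like this against your autoscaler's maximum size shows whether the quota, not the autoscaler configuration, is the real ceiling.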

Rate quotas vs allocation quotas

This distinction matters more than almost anything else on this page. The fix for each type is completely different, and mixing them up wastes time.

Rate quotas

Rate quotas limit how many API requests you can make per minute or per second. They reset automatically after the time window closes. Think of it as the speed limit on the road into the car park: you can keep sending traffic indefinitely, just not faster than the limit allows. A script that calls gcloud compute instances list in a tight loop will eventually hit the Compute Engine read requests rate quota. The correct response is to slow down your request rate and add exponential backoff with jitter, not to file an increase request.

Allocation quotas

Allocation quotas limit how many of a resource can exist simultaneously in your project or region. These do not reset. The car park analogy applies directly: once all the spaces are taken, nothing new can enter until a space is freed. If you have a 24 vCPU allocation quota and your VMs are using all 24, you cannot create another VM until you either delete one or get the quota increased.

Common allocation quotas:

vCPUs per region (the one you will hit first as you scale VMs)
VM instances per project
In-use external static IP addresses
Persistent disk capacity (total GB) per region
VPC networks per project
Cloud Storage buckets per project

Side-by-side comparison

	Rate quota	Allocation quota
What it limits	API requests per minute or per second	Resources that exist simultaneously
Resets automatically?	Yes, after the time window	No. You must delete resources or get an increase
Common examples	Compute Engine read requests per minute, BigQuery API calls per minute	vCPUs per region, static IPs, VM instances, persistent disk total GB
Typical fix	Add exponential backoff; slow down request rate	Delete unused resources or submit an increase request
Increase request needed?	Rarely, only for genuinely high-volume use cases	Yes, once existing resources consume the limit

Reading quota error messages

Quota errors have a consistent structure. Once you know how to read them, you can tell immediately whether you need to fix your code or request an increase.

# Allocation quota error: hitting a CPU limit when creating VMs
# ERROR: (gcloud.compute.instances.create) Could not fetch resource:
#  - Quota 'CPUS' exceeded. Limit: 24.0 in region us-central1.
#
# What it tells you:
#   - Metric: CPUS (allocation quota)
#   - Current limit: 24 vCPUs in us-central1
#   - Fix: go to IAM & Admin > Quotas, filter by 'CPUS' (us-central1), request an increase
#   - Or: delete VMs you no longer need to free up the allocation

# Rate quota error: making too many API calls too quickly
# ERROR: RESOURCE_EXHAUSTED:
#  Quota exceeded for quota metric 'compute.googleapis.com/read_requests_per_minute'
#
# What it tells you:
#   - Metric: read_requests_per_minute (rate quota)
#   - This resets automatically. You do not need to request an increase.
#   - Fix: add exponential backoff and jitter to your request loop

The three things to extract from any quota error:

Quota metric name: for example, CPUS, IN_USE_ADDRESSES, or read_requests_per_minute. This is the exact resource being limited.
Current limit and region: tells you the cap and whether it is regional or global.
Rate or allocation: rate quota metrics include time-window language (per minute, per second). Allocation quota errors name a resource count and its numeric limit.
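As a rough illustration, the three items above can be pulled out of an error string with regular expressions. The patterns are assumptions based only on the two sample messages shown on this page; real error text varies by service and tool, so treat this as a sketch and not a complete parser.

```python
import re

# Pattern for messages like: Quota 'CPUS' exceeded. Limit: 24.0 in region us-central1.
ALLOCATION_RE = re.compile(
    r"Quota '(?P<metric>[^']+)' exceeded\.\s*Limit:\s*(?P<limit>\d+(?:\.\d+)?)"
    r"(?:\s+in region (?P<region>[a-z0-9-]+))?"
)
# Pattern for messages like: Quota exceeded for quota metric '<name>'
METRIC_RE = re.compile(r"quota metric '(?P<metric>[^']+)'")
TIME_WINDOW_HINTS = ("per_minute", "per_second", "per minute", "per second")


def parse_quota_error(message: str) -> dict:
    """Extract metric, limit, region, and quota kind from a quota error message."""
    result = {"metric": None, "limit": None, "region": None, "kind": "unknown"}
    match = ALLOCATION_RE.search(message)
    if match:
        result["metric"] = match.group("metric")
        result["limit"] = float(match.group("limit"))
        result["region"] = match.group("region")  # None if no region is named
    else:
        match = METRIC_RE.search(message)
        if match is None:
            return result  # not a message shape this sketch recognizes
        result["metric"] = match.group("metric")
    # Time-window wording in the metric name marks a rate quota.
    name = result["metric"].lower()
    is_rate = any(hint in name for hint in TIME_WINDOW_HINTS)
    result["kind"] = "rate" if is_rate else "allocation"
    return result


print(parse_quota_error(
    "Quota 'CPUS' exceeded. Limit: 24.0 in region us-central1.")["kind"])  # allocation
print(parse_quota_error(
    "Quota exceeded for quota metric "
    "'compute.googleapis.com/read_requests_per_minute'")["kind"])  # rate
```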

Check your code before filing a request

For rate quota errors, start with your client code. Is it sending requests faster than necessary? Adding exponential backoff with jitter handles temporary rate exhaustion gracefully, and it is almost always the right fix. A higher rate quota ceiling will not fix code that hammers an API without any pacing.
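A minimal sketch of exponential backoff with jitter is shown here. `RateLimitError` is a hypothetical stand-in for whatever exception your client library raises on a rate quota error, and the delay values are illustrative defaults, not recommendations from Google.

```python
import random
import time


class RateLimitError(Exception):
    """Hypothetical stand-in for the error your client raises on RESOURCE_EXHAUSTED."""


def call_with_backoff(func, max_attempts=5, base_delay=1.0, max_delay=32.0,
                      sleep=time.sleep, rng=random.random):
    """Call func(), retrying on RateLimitError with exponential backoff and full jitter."""
    for attempt in range(max_attempts):
        try:
            return func()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            # The cap doubles each attempt (1s, 2s, 4s, ...) up to max_delay.
            cap = min(max_delay, base_delay * (2 ** attempt))
            # Full jitter: wait a random time in [0, cap) so clients do not retry in lockstep.
            sleep(rng() * cap)


# Usage: a fake call that is rate limited twice, then succeeds.
attempts = {"n": 0}

def flaky_list_call():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RateLimitError("RESOURCE_EXHAUSTED")
    return "ok"

waits = []  # record the delays instead of actually sleeping
print(call_with_backoff(flaky_list_call, sleep=waits.append), len(waits))  # ok 2
```

The `sleep` and `rng` parameters exist only so the behavior can be tested without real waiting; production code can leave them at their defaults.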

How to check your current quota usage

Before a large deployment, check what your quota limits are in the target region. This is the step most teams skip, only to discover the problem partway through rollout.

Using the Cloud Console

Open the Cloud Console
Go to IAM & Admin > Quotas
Filter by service (e.g., Compute Engine) or search by metric name (e.g., CPUS)
Check the Current usage and Limit columns to see how close you are

The APIs & Services > Dashboard shows real-time API quota consumption per service. If you are consistently close to a rate limit, this is where you will see it. You can also set up Cloud Monitoring alerts on quota metrics to get notified before hitting the ceiling, not after.

Using gcloud

# Check compute quotas for a specific region (useful before large deployments).
# --flatten="quotas[]" prints one row per quota instead of one row of lists.
gcloud compute regions describe us-central1 \
  --flatten="quotas[]" \
  --format="table(quotas.metric,quotas.limit,quotas.usage)"

# Check project-level compute quotas (these apply across all regions)
gcloud compute project-info describe \
  --flatten="quotas[]" \
  --format="table(quotas.metric,quotas.limit,quotas.usage)"

The gcloud approach is useful for scripting pre-deployment checks or piping quota data into a monitoring pipeline. Combine it with Cloud Monitoring to track quota usage over time. See monitoring your first GCP project for how to use the APIs & Services dashboard to spot rate limit trends before they cause outages.
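A scripted pre-deployment check might look like the following sketch. It assumes the JSON output of `gcloud compute regions describe` carries a `quotas` list whose entries have `metric`, `limit`, and `usage` fields; the function names and the sample numbers are hypothetical.

```python
import json
import subprocess


def fetch_region_quotas(region: str) -> list:
    """Run gcloud and return the region's quota entries as a list of dicts."""
    out = subprocess.run(
        ["gcloud", "compute", "regions", "describe", region, "--format=json"],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out).get("quotas", [])


def find_shortfalls(quotas: list, needed: dict) -> dict:
    """Return {metric: missing_amount} for every metric without enough headroom."""
    by_metric = {q["metric"]: q for q in quotas}
    shortfalls = {}
    for metric, amount in needed.items():
        quota = by_metric.get(metric)
        if quota is None:
            continue  # metric not reported for this region; check it by hand
        headroom = quota["limit"] - quota["usage"]
        if amount > headroom:
            shortfalls[metric] = amount - headroom
    return shortfalls


# Sample data in the assumed shape; in a real check use fetch_region_quotas("us-central1").
sample = [
    {"metric": "CPUS", "limit": 24.0, "usage": 20.0},
    {"metric": "DISKS_TOTAL_GB", "limit": 4096.0, "usage": 100.0},
]
print(find_shortfalls(sample, {"CPUS": 8, "DISKS_TOTAL_GB": 500}))  # {'CPUS': 4.0}
```

An empty result means the planned resources fit within the current limits for the metrics that were checked.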

How to request a quota increase

Most allocation quota increases can be submitted directly through the Console. Approval is not guaranteed.

Go to IAM & Admin > Quotas
Filter to find the specific quota metric and region you need
Select the checkbox next to it and click Edit Quotas
Enter the new limit you need
Add a brief justification: explain the workload you are running and why you need the higher limit
Submit the request

Modest increases, such as doubling a CPU quota from 24 to 48 vCPUs, are often auto-approved within minutes. Large increases for hundreds of GPUs or very high-bandwidth quotas require manual review from Google and may take days. Approval depends on account history, billing activity, and the size of the request.

Do not wait until launch day

Large quota increase requests require manual review. If you submit a request for hundreds of GPUs or a large CPU allocation on the day of a launch, you will be waiting. Build quota review into your pre-launch checklist, days or weeks in advance. This is especially true when expanding to a new region, where you are starting from scratch at default limits.

If you are planning a large deployment and unsure what quotas to request, use the cloud cost estimator to model your resource requirements first. Knowing roughly how many vCPUs, IPs, and disk GB you need makes it easier to write a solid justification and request the right amount in a single submission.
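Sizing the request is simple arithmetic, sketched here for vCPUs in a single region. The 20 percent headroom and the function name are assumptions for illustration; pick a margin that matches your own growth expectations.

```python
def quota_to_request(planned_vms, current_limit, headroom_percent=20):
    """Estimate the vCPU limit to ask for in one region.

    planned_vms is a list of (vm_count, vcpus_per_vm) pairs.
    """
    needed = sum(count * vcpus for count, vcpus in planned_vms)
    # Integer ceiling of needed * (1 + headroom), avoiding float rounding.
    target = (needed * (100 + headroom_percent) + 99) // 100
    return {
        "needed_vcpus": needed,
        "request_limit": max(target, current_limit),
        "increase_required": target > current_limit,
    }


# Hypothetical plan: ten 4-vCPU VMs and four 8-vCPU VMs, current limit 24.
print(quota_to_request([(10, 4), (4, 8)], current_limit=24))
# {'needed_vcpus': 72, 'request_limit': 87, 'increase_required': True}
```

Repeat the calculation per region, since each region has its own limit.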

When to actively check your quotas

At low scale you will rarely think about quotas. As workloads grow, checking them becomes part of routine deployment planning. Do it proactively in these situations:

Before deploying many VMs. If you are creating 10 or more VMs in a region, verify your CPU and instance quota first. Review your autoscaling instance group settings too, since the autoscaler can attempt to create many VMs at once during a traffic spike.
Before a launch or planned traffic spike. Quotas that look fine at current traffic levels may not hold under launch-day load. Request increases in advance, not after the first failure.
Before expanding to a new region. Each region has its own quota defaults. Quotas in your primary region do not apply to a secondary region you have never used. Request increases per region before you start deploying there.
Before enabling autoscaling. Autoscalers react to load, not to your quota headroom. An autoscaler hitting a quota limit during a real traffic event will silently fail to create some VMs. The service just will not scale fast enough.
Before running batch or migration jobs. Large batch jobs and data migrations often create many resources in parallel. Check disk quota, CPU quota, and any service-specific quotas before starting the job.
Before high-volume API automation. Any tool that calls GCP APIs in a loop, whether provisioning resources, polling status, or running reports, should be tested against the relevant rate quotas in a non-production project first.
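For the API automation case in the last item above, one option is to pace requests on the client side so a loop never exceeds a per-minute budget in the first place. The sketch below spaces calls evenly; the `Pacer` class and the 600-per-minute figure are hypothetical, and the real limit for your API should come from the quota page.

```python
import time


class Pacer:
    """Space calls evenly so they stay at or below a per-minute budget."""

    def __init__(self, max_per_minute, clock=time.monotonic, sleep=time.sleep):
        if max_per_minute <= 0:
            raise ValueError("max_per_minute must be positive")
        self.interval = 60.0 / max_per_minute  # minimum seconds between calls
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = None  # earliest time the next call may start

    def wait(self):
        """Block until the next call is allowed, then reserve the following slot."""
        now = self._clock()
        if self._next_allowed is not None and now < self._next_allowed:
            self._sleep(self._next_allowed - now)
            now = self._next_allowed
        self._next_allowed = now + self.interval


# Usage: at most 600 calls per minute, so about 0.1 s between calls.
pacer = Pacer(max_per_minute=600)
for i in range(3):
    pacer.wait()
    print("request", i)  # replace with the real API call
```

Pacing reduces how often you hit the limit; it complements retry with backoff and does not replace it, because other clients in the same project share the quota.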

Common mistakes

Checking quotas too late. Most quota problems are discovered partway through a large operation. Creating many VMs in a region where you have a low CPU quota fails partway through, leaving your deployment in an inconsistent state. Check your limits before any large-scale operation.
Confusing quota errors with API-not-enabled errors. Both return error messages that can look similar at first glance. A quota error means the API is enabled but the resource limit is exceeded. An API-not-enabled error means the service itself has not been activated for the project. See API not enabled errors for how to tell them apart.
Treating rate limits like allocation limits. Rate quota errors reset on their own. Submitting an increase request before trying exponential backoff is usually the wrong first move. Check your client code before filing anything.
Forgetting quotas are regional. A quota increase in one region does not affect another. If you expand to europe-west1 after building everything in us-central1, the new region starts at its own default limits. Request increases per region.
Not building retries and backoff into automation. Any script or service that calls GCP APIs at scale should have retry logic with exponential backoff. Rate quotas are designed to be handled this way. Code that crashes on a rate limit error instead of retrying gracefully is the problem more often than the quota ceiling itself.
Assuming quota increases are automatic. GCP does not increase quotas as your usage grows. You must proactively request them. GCP also does not warn you when you are approaching a limit. You have to watch that yourself via Cloud Monitoring. Set up alerts on quota metrics before you need them.
Waiting until launch day. Large quota increase requests require manual review. Submitting a request for hundreds of GPUs or a very large CPU allocation on the day of a launch means you will be waiting. Build quota review into your pre-launch checklist alongside billing budgets and alerts.

Quota errors vs similar-looking errors

Quota errors are commonly confused with three other error types in GCP. Here is how to tell them apart quickly:

Error type	What it means	Where to look	Fix
Quota exceeded	You have consumed the allowed resource count or request rate	IAM & Admin > Quotas	Delete resources, add backoff, or request an increase
API not enabled	The service API has not been activated for the project	APIs & Services > Library	Enable the API for the project
Permission denied	Your account or service account lacks the required IAM role	IAM & Admin > IAM	Grant the correct IAM role
Billing issue	The project has no active billing account, or billing is disabled	Billing > Account management	Link a billing account to the project

How to distinguish them by error code

Quota errors return RESOURCE_EXHAUSTED or contain “Quota exceeded” (gcloud allocation errors put the metric name in the middle, as in “Quota 'CPUS' exceeded”). API-not-enabled errors return SERVICE_DISABLED or include “has not been used in project” or “it is disabled.” See API not enabled errors for a full breakdown of both types and how to resolve each one.
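The markers above translate into a small triage helper. This sketch matches only the strings named on this page, plus a pattern that allows a metric name between "Quota" and "exceeded"; the function name and sample messages are illustrative, and anything unrecognized is returned as "other" and left for a human to read.

```python
import re

# Matches "Quota exceeded" and also "Quota 'CPUS' exceeded" (metric in the middle).
QUOTA_RE = re.compile(r"quota\b.*\bexceeded", re.IGNORECASE)
API_DISABLED_MARKERS = (
    "service_disabled",
    "has not been used in project",
    "it is disabled",
)


def classify_gcp_error(message: str) -> str:
    """Return 'quota', 'api_not_enabled', or 'other' for an error message."""
    text = message.lower()
    if "resource_exhausted" in text or QUOTA_RE.search(message):
        return "quota"
    if any(marker in text for marker in API_DISABLED_MARKERS):
        return "api_not_enabled"
    return "other"


print(classify_gcp_error("Quota 'CPUS' exceeded. Limit: 24.0 in region us-central1."))
# quota
print(classify_gcp_error(
    "Compute Engine API has not been used in project 123 before or it is disabled."))
# api_not_enabled
```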

Frequently asked questions

What are GCP quotas and limits?

GCP quotas are limits that Google Cloud places on resource usage per project and per region. They exist to protect shared infrastructure from runaway usage, prevent accidental large-scale resource creation, and ensure fair access across all customers. Every new GCP project starts with conservative default quotas that you can request to increase as your workloads grow.

What is the difference between a rate quota and an allocation quota in GCP?

Rate quotas limit how many API requests you can make per minute or per second. They reset automatically after the time window closes. Allocation quotas limit how many resources can exist at the same time, such as the number of VM instances or static IP addresses in a region. Allocation quotas do not reset. If you hit one, you must either delete existing resources or request an increase.

Are GCP quotas per project or per region?

Both. Some quotas are enforced per project across all regions. Others are enforced per project per region. vCPU quotas, for example, are regional: a high CPU quota in us-central1 does not carry over to europe-west1. If you plan to deploy workloads across multiple regions, you need to request quota increases in each region separately.

How do I request a GCP quota increase?

Go to IAM & Admin > Quotas in the Cloud Console. Filter to find the specific quota you need to increase, select it, and click Edit Quotas. Enter the new limit and provide a brief justification. Modest increases, such as doubling your CPU quota, are often auto-approved within minutes. Large increases for GPUs or high-bandwidth quotas may require manual review from Google and can take longer. Approval is not guaranteed.

What should I do when I get a quota exceeded error?

First, identify whether the error is a rate quota (resets automatically) or an allocation quota (requires an increase or resource deletion). For rate quota errors, add exponential backoff and retry logic rather than immediately requesting an increase. For allocation quota errors, check IAM & Admin > Quotas in the Console, find the specific metric, and either delete unused resources or submit an increase request. Do not wait until launch day to discover you have a quota problem.

Last verified: 21 March 2026

Cloud services change frequently. Verify details against official documentation before making infrastructure decisions.