GCP Quotas and Limits Explained: Rate Quotas, Allocation Quotas, and Increase Requests
GCP quotas are limits on how much of a resource your project can use, either at once or within a given time window. Every new project starts with conservative defaults. At low usage they are invisible. At scale, they become one of the most common causes of failed deployments.
By the end of this page you will understand what rate and allocation quotas are, how to read a quota error message, how to check your current usage, and how to request an increase before it blocks you.
GCP quotas in simple terms
Think of GCP like a shared car park attached to a motorway. Your project gets a fixed number of parking spaces: once all the spaces are taken, no new car can enter until one leaves. That is an allocation quota. There is also a speed limit on the road into the car park: you can drive in continuously, but not faster than the limit allows. Every minute the counter resets and you can send the same volume of traffic again. That is a rate quota.
These limits exist for three reasons. They protect the shared infrastructure GCP runs on, so one project cannot starve other customers. They protect your own account from accidental large-scale bills. And they give Google time to plan capacity for unusually high usage patterns. See how GCP billing works for how quotas and billing controls work together as two independent layers of protection.
Here is what each key term means in plain English:
- Quota: a limit on how many resources you can create or how many API calls you can make in a given period.
- Rate quota: limits requests per minute or per second. Resets automatically after the window closes.
- Allocation quota: limits how many resources exist at once (VMs, static IPs, vCPUs). Does not reset. You must delete resources or request an increase.
- Quota metric: the specific resource being measured, such as CPUS, IN_USE_ADDRESSES, or DISKS_TOTAL_GB.
- Quota increase request: a formal request to raise a specific limit, submitted through the Cloud Console.
GCP does not raise quotas as your usage grows. Quotas stay at the default, or the last approved level, until you explicitly request an increase. This catches a lot of teams off guard right before a launch.
How GCP quotas and limits work
Quotas in GCP operate at two levels: project-level and region-level. Some limits apply across all regions within your project. Others are enforced separately per region. The regional dimension is the most common source of confusion.
A high vCPU quota in us-central1 does not carry over to europe-west1. Each region starts with its own default limits. If you plan to deploy across multiple regions, or if you are expanding from your primary region to a new one, you need to request quota increases in each region separately before you start.
New projects always start with conservative defaults. GCP has no way of knowing whether you are running a quick experiment or planning a production deployment at scale. Think of it like a new bank account with a starter overdraft limit: the bank begins cautiously and expects you to request more capacity once you have a track record.
A new project in us-central1 might have a default CPU quota in the range of 8 to 24 vCPUs. That is enough for a few VMs, but it will block you quickly as you scale. It is not a permanent ceiling. It is just where you start.
Quotas apply at the point of resource creation. If you try to create a Compute Engine VM that would push your total vCPU count above the regional limit, the creation fails immediately. Existing resources continue running without interruption. Only the new request is blocked.
For autoscaling instance groups, this has a specific consequence worth understanding. An autoscaler reacts to load by attempting to create new VMs, sometimes many at once during a traffic spike. If the resulting CPU or instance count would exceed your quota, the autoscaler will fail to create some of those VMs. From the user’s perspective, the service simply does not scale fast enough. There is no obvious error on the surface.
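The arithmetic behind a blocked scale-out is simple enough to check by hand or in a script. The sketch below is illustrative only: the function names are hypothetical, and it assumes the regional vCPU quota is the only limit in play (in practice instance-count or IP quotas can bind first).

```python
import math


def vms_that_fit(limit_vcpus: float, used_vcpus: float, vcpus_per_vm: int) -> int:
    """Return how many more VMs of a given size fit under a regional vCPU quota."""
    headroom = max(limit_vcpus - used_vcpus, 0)
    return math.floor(headroom / vcpus_per_vm)


def autoscale_shortfall(limit_vcpus, used_vcpus, vcpus_per_vm, vms_requested):
    """Return (created, blocked) for a scale-out attempt under the quota."""
    fit = vms_that_fit(limit_vcpus, used_vcpus, vcpus_per_vm)
    created = min(vms_requested, fit)
    return created, vms_requested - created


# Example: 24 vCPU limit, 16 already in use, autoscaler wants 5 more 4-vCPU VMs.
print(autoscale_shortfall(24, 16, 4, 5))  # (2, 3)
```

In this example two VMs are created and three are blocked, which is exactly the "does not scale fast enough" symptom described above.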
Rate quotas vs allocation quotas
This distinction matters more than almost anything else on this page. The fix for each type is completely different, and mixing them up wastes time.
Rate quotas
Rate quotas limit how many API requests you can make per minute or per second. They reset automatically after the time window closes. Think of it as the speed limit on the road into the car park: you can keep sending traffic indefinitely, just not faster than the limit allows. A script that calls gcloud compute instances list in a tight loop will eventually hit the Compute Engine read requests rate quota. The correct response is to slow down your request rate and add exponential backoff with jitter, not to file an increase request.
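As a rough illustration of exponential backoff with jitter, here is a minimal sketch. `RateLimitError` is a hypothetical exception standing in for whatever your client library raises on a RESOURCE_EXHAUSTED response, and the delay values are assumptions chosen for the example, not recommended settings.

```python
import random
import time


class RateLimitError(Exception):
    """Hypothetical stand-in for a RESOURCE_EXHAUSTED / rate quota error."""


def call_with_backoff(func, max_attempts=5, base_delay=1.0, max_delay=32.0,
                      sleep=time.sleep):
    """Call func, retrying on RateLimitError with exponential backoff and full jitter."""
    for attempt in range(max_attempts):
        try:
            return func()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            # Exponential ceiling: base_delay * 2^attempt, capped at max_delay.
            ceiling = min(max_delay, base_delay * (2 ** attempt))
            # Full jitter: wait a random time between 0 and the ceiling so that
            # many clients retrying at once do not all wake up together.
            sleep(random.uniform(0, ceiling))
```

The `sleep` parameter exists only so the wait can be replaced in tests. In normal use you would call `call_with_backoff(lambda: client.list_things())`, where `client.list_things` is whatever API call you are wrapping.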
Allocation quotas
Allocation quotas limit how many of a resource can exist simultaneously in your project or region. These do not reset. The car park analogy applies directly: once all the spaces are taken, nothing new can enter until a space is freed. If you have a 24 vCPU allocation quota and your VMs are using all 24, you cannot create another VM until you either delete one or get the quota increased.
Common allocation quotas:
- vCPUs per region (the one you will hit first as you scale VMs)
- VM instances per project
- In-use external static IP addresses
- Persistent disk capacity (total GB) per region
- VPC networks per project
- Cloud Storage buckets per project
Side-by-side comparison
| | Rate quota | Allocation quota |
|---|---|---|
| What it limits | API requests per minute or per second | Resources that exist simultaneously |
| Resets automatically? | Yes, after the time window | No. You must delete resources or get an increase |
| Common examples | Compute Engine read requests per minute, BigQuery API calls per minute | vCPUs per region, static IPs, VM instances, persistent disk total GB |
| Typical fix | Add exponential backoff; slow down request rate | Delete unused resources or submit an increase request |
| Increase request needed? | Rarely, only for genuinely high-volume use cases | Yes, once existing resources consume the limit |
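The two behaviours in the table can be sketched as a toy model. This is purely illustrative: both classes are hypothetical, and the fixed-window counter is an assumption made to keep the example short, not a description of how GCP enforces rate quotas internally.

```python
class AllocationQuota:
    """Toy model: caps how many units exist at once. Never resets on its own."""

    def __init__(self, limit):
        self.limit = limit
        self.in_use = 0

    def acquire(self, units=1):
        if self.in_use + units > self.limit:
            return False  # blocked until something is released or the limit is raised
        self.in_use += units
        return True

    def release(self, units=1):
        self.in_use = max(self.in_use - units, 0)


class RateQuota:
    """Toy model: caps requests per fixed window. The counter resets each window."""

    def __init__(self, limit, window_seconds=60):
        self.limit = limit
        self.window_seconds = window_seconds
        self.window_start = 0.0
        self.count = 0

    def allow(self, now):
        if now - self.window_start >= self.window_seconds:
            self.window_start = now  # new window: the counter resets automatically
            self.count = 0
        if self.count >= self.limit:
            return False
        self.count += 1
        return True


cpus = AllocationQuota(limit=24)
print(cpus.acquire(24), cpus.acquire(1))  # True False
cpus.release(4)
print(cpus.acquire(4))  # True

reads = RateQuota(limit=2, window_seconds=60)
print([reads.allow(now=t) for t in (0, 1, 2, 61)])  # [True, True, False, True]
```

The allocation quota only recovers after an explicit `release`, while the rate quota recovers just by waiting for the next window.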
Reading quota error messages
Quota errors have a consistent structure. Once you know how to read them, you can tell immediately whether you need to fix your code or request an increase.
# Allocation quota error: hitting a CPU limit when creating VMs
# ERROR: (gcloud.compute.instances.create) Could not fetch resource:
# - Quota 'CPUS' exceeded. Limit: 24.0 in region us-central1.
#
# What it tells you:
# - Metric: CPUS (allocation quota)
# - Current limit: 24 vCPUs in us-central1
# - Fix: go to IAM & Admin > Quotas, filter by 'CPUS' (us-central1), request an increase
# - Or: delete VMs you no longer need to free up the allocation
# Rate quota error: making too many API calls too quickly
# ERROR: RESOURCE_EXHAUSTED:
# Quota exceeded for quota metric 'compute.googleapis.com/read_requests_per_minute'
#
# What it tells you:
# - Metric: read_requests_per_minute (rate quota)
# - This resets automatically. You do not need to request an increase.
# - Fix: add exponential backoff and jitter to your request loop
The three things to extract from any quota error:
- Quota metric name: for example, CPUS, IN_USE_ADDRESSES, or read_requests_per_minute. This is the exact resource being limited.
- Current limit and region: tells you the cap and whether it is regional or global.
- Rate or allocation: rate quotas include time-window language (per minute, per second). Allocation quotas name a count.
For rate quota errors, start with your client code. Is it sending requests faster than necessary? Adding exponential backoff with jitter handles temporary rate exhaustion gracefully, and it is almost always the right fix. A higher rate quota ceiling will not fix code that hammers an API without any pacing.
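The three fields listed above can be pulled out of an error string with a few regular expressions. This is a sketch that assumes the two message shapes shown in the examples on this page; real error text varies between services, so treat the patterns and the function name as illustrative.

```python
import re

# Time-window language in the metric name marks a rate quota.
WINDOW = re.compile(r"per[_ ](?:minute|second)", re.IGNORECASE)


def parse_quota_error(message: str) -> dict:
    """Extract metric, limit, region, and quota kind from a quota error string."""
    metric = limit = region = None

    # Matches "Quota 'CPUS' exceeded" or "quota metric '<service>/<metric>'".
    m = (re.search(r"Quota '([^']+)' exceeded", message)
         or re.search(r"quota metric '([^']+)'", message))
    if m:
        metric = m.group(1)

    m = re.search(r"Limit:\s*(\d+(?:\.\d+)?)", message)
    if m:
        limit = float(m.group(1))

    m = re.search(r"in region ([a-z0-9-]+)", message)
    if m:
        region = m.group(1)

    if metric is None:
        kind = "unknown"
    elif WINDOW.search(metric):
        kind = "rate"
    else:
        kind = "allocation"
    return {"metric": metric, "limit": limit, "region": region, "kind": kind}


print(parse_quota_error("Quota 'CPUS' exceeded. Limit: 24.0 in region us-central1."))
# {'metric': 'CPUS', 'limit': 24.0, 'region': 'us-central1', 'kind': 'allocation'}
```

A result with `kind` set to `rate` points at client pacing; `allocation` points at cleanup or an increase request.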
How to check your current quota usage
Before a large deployment, check what your quota limits are in the target region. This is the step most teams skip, only to discover the problem partway through rollout.
Using the Cloud Console
- Open the Cloud Console
- Go to IAM & Admin > Quotas
- Filter by service (e.g., Compute Engine) or search by metric name (e.g., CPUS)
- Check the Current usage and Limit columns to see how close you are
The APIs & Services > Dashboard shows real-time API quota consumption per service. If you are consistently close to a rate limit, this is where you will see it. You can also set up Cloud Monitoring alerts on quota metrics to get notified before hitting the ceiling, not after.
Using gcloud
# Check compute quotas for a specific region (useful before large deployments)
# --flatten="quotas[]" prints one table row per quota metric
gcloud compute regions describe us-central1 \
  --flatten="quotas[]" \
  --format="table(quotas.metric,quotas.limit,quotas.usage)"
# Check project-level compute quotas (applies across all regions)
gcloud compute project-info describe \
  --flatten="quotas[]" \
  --format="table(quotas.metric,quotas.limit,quotas.usage)"
The gcloud approach is useful for scripting pre-deployment checks or piping quota data into a monitoring pipeline. Combine it with Cloud Monitoring to track quota usage over time. See monitoring your first GCP project for how to use the APIs & Services dashboard to spot rate limit trends before they cause outages.
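For a scripted pre-deployment check, one option is to request JSON output (`--format=json`) and flag any quota that is close to its limit. The sketch below assumes the JSON contains a `quotas` list with `metric`, `limit`, and `usage` fields, matching the fields used in the table format above. The function name, the 80% threshold, and the sample numbers are all illustrative.

```python
import json


def quotas_near_limit(describe_json: str, threshold: float = 0.8) -> list:
    """Return (metric, usage, limit) tuples where usage / limit >= threshold."""
    data = json.loads(describe_json)
    flagged = []
    for quota in data.get("quotas", []):
        limit = quota.get("limit", 0)
        usage = quota.get("usage", 0)
        if limit > 0 and usage / limit >= threshold:
            flagged.append((quota["metric"], usage, limit))
    return flagged


# Made-up sample shaped like the "quotas" list in the describe output.
sample = json.dumps({"quotas": [
    {"metric": "CPUS", "limit": 24.0, "usage": 20.0},
    {"metric": "IN_USE_ADDRESSES", "limit": 8.0, "usage": 1.0},
    {"metric": "DISKS_TOTAL_GB", "limit": 4096.0, "usage": 4000.0},
]})
print(quotas_near_limit(sample))
# [('CPUS', 20.0, 24.0), ('DISKS_TOTAL_GB', 4000.0, 4096.0)]
```

In a real script the JSON string would come from running `gcloud compute regions describe us-central1 --format=json`, for example via `subprocess.run(..., capture_output=True, text=True, check=True).stdout`.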
How to request a quota increase
Most allocation quota increases can be submitted directly through the Console. Approvals for modest increases are often automatic. Large increases require manual review, and approval is not guaranteed.
- Go to IAM & Admin > Quotas
- Filter to find the specific quota metric and region you need
- Select the checkbox next to it and click Edit Quotas
- Enter the new limit you need
- Add a brief justification: explain the workload you are running and why you need the higher limit
- Submit the request
Modest increases, such as doubling a CPU quota from 24 to 48 vCPUs, are often auto-approved within minutes. Large increases for hundreds of GPUs or very high-bandwidth quotas require manual review from Google and may take days. Approval depends on account history, billing activity, and the size of the request.
Large quota increase requests require manual review. If you submit a request for hundreds of GPUs or a large CPU allocation on the day of a launch, you will be waiting. Build quota review into your pre-launch checklist, days or weeks in advance. This is especially true when expanding to a new region, where you are starting from scratch at default limits.
If you are planning a large deployment and unsure what quotas to request, use the cloud cost estimator to model your resource requirements first. Knowing roughly how many vCPUs, IPs, and disk GB you need makes it easier to write a solid justification and request the right amount in a single submission.
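Turning a deployment plan into per-region quota numbers is mostly addition. The sketch below is one way to do it, using the metric names mentioned earlier on this page. The plan format, the function name, and the 25% safety margin are assumptions made for the example.

```python
import math


def required_quotas(plan, headroom=0.25):
    """Sum per-region resource needs from a plan and add a safety margin."""
    totals = {}
    for item in plan:
        needs = totals.setdefault(
            item["region"],
            {"CPUS": 0, "IN_USE_ADDRESSES": 0, "DISKS_TOTAL_GB": 0},
        )
        count = item["count"]
        needs["CPUS"] += count * item["vcpus"]
        needs["IN_USE_ADDRESSES"] += count * item.get("external_ips", 0)
        needs["DISKS_TOTAL_GB"] += count * item["disk_gb"]
    # Round up after applying the margin so the request is never too small.
    return {
        region: {metric: math.ceil(value * (1 + headroom))
                 for metric, value in needs.items()}
        for region, needs in totals.items()
    }


# Hypothetical plan: two VM groups in two regions.
plan = [
    {"region": "us-central1", "count": 10, "vcpus": 4, "external_ips": 1, "disk_gb": 100},
    {"region": "europe-west1", "count": 4, "vcpus": 8, "disk_gb": 200},
]
print(required_quotas(plan)["us-central1"])
# {'CPUS': 50, 'IN_USE_ADDRESSES': 13, 'DISKS_TOTAL_GB': 1250}
```

Because the result is keyed by region, it maps directly onto the per-region increase requests described above. Compare each number with the current limit and request an increase only where the limit is lower.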
When to actively check your quotas
At low scale you will rarely think about quotas. As workloads grow, checking them becomes part of routine deployment planning. Do it proactively in these situations:
Before deploying many VMs. If you are creating 10 or more VMs in a region, verify your CPU and instance quota first. Review your autoscaling instance group settings too, since the autoscaler can attempt to create many VMs at once during a traffic spike.
Before a launch or planned traffic spike. Quotas that look fine at current traffic levels may not hold under launch-day load. Request increases in advance, not after the first failure.
Before expanding to a new region. Each region has its own quota defaults. Quotas in your primary region do not apply to a secondary region you have never used. Request increases per region before you start deploying there.
Before enabling autoscaling. Autoscalers react to load, not to your quota headroom. An autoscaler hitting a quota limit during a real traffic event will silently fail to create some VMs. The service just will not scale fast enough.
Before running batch or migration jobs. Large batch jobs and data migrations often create many resources in parallel. Check disk quota, CPU quota, and any service-specific quotas before starting the job.
Before high-volume API automation. Any tool that calls GCP APIs in a loop, whether provisioning resources, polling status, or running reports, should be tested against the relevant rate quotas in a non-production project first.
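For automation that calls an API in a loop, client-side pacing keeps the request rate under a per-minute quota in the first place, and complements backoff. Below is a minimal sketch that enforces a minimum gap between calls. The `Pacer` class is hypothetical, and the rate you pass in should come from the actual quota shown in the Console, not from this example.

```python
import time


class Pacer:
    """Enforce a minimum gap of 60 / per_minute seconds between calls."""

    def __init__(self, per_minute, clock=time.monotonic, sleep=time.sleep):
        self.interval = 60.0 / per_minute
        self.clock = clock
        self.sleep = sleep
        self.next_allowed = None  # earliest time the next call may start

    def wait(self):
        """Block until the next call is allowed, then reserve the following slot."""
        now = self.clock()
        if self.next_allowed is not None and now < self.next_allowed:
            self.sleep(self.next_allowed - now)
            now = self.next_allowed
        self.next_allowed = now + self.interval
```

Typical use is `pacer = Pacer(per_minute=120)` followed by `pacer.wait()` before each API call inside the loop. The `clock` and `sleep` parameters exist only so the timing can be replaced in tests.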
Common mistakes
Checking quotas too late. Most quota problems are discovered partway through a large operation. Creating many VMs in a region where you have a low CPU quota fails partway through, leaving your deployment in an inconsistent state. Check your limits before any large-scale operation.
Confusing quota errors with API-not-enabled errors. Both return error messages that can look similar at first glance. A quota error means the API is enabled but the resource limit is exceeded. An API-not-enabled error means the service itself has not been activated for the project. See API not enabled errors for how to tell them apart.
Treating rate limits like allocation limits. Rate quota errors reset on their own. Submitting an increase request before trying exponential backoff is usually the wrong first move. Check your client code before filing anything.
Forgetting quotas are regional. A quota increase in one region does not affect another. If you expand to europe-west1 after building everything in us-central1, the new region starts at its own default limits. Request increases per region.
Not building retries and backoff into automation. Any script or service that calls GCP APIs at scale should have retry logic with exponential backoff. Rate quotas are designed to be handled this way. Code that crashes on a rate limit error instead of retrying gracefully is the problem more often than the quota ceiling itself.
Assuming quota increases are automatic. GCP does not increase quotas as your usage grows. You must proactively request them. GCP also does not warn you when you are approaching a limit. You have to watch that yourself via Cloud Monitoring. Set up alerts on quota metrics before you need them.
Waiting until launch day. Large quota increase requests require manual review. Submitting a request for hundreds of GPUs or a very large CPU allocation on the day of a launch means you will be waiting. Build quota review into your pre-launch checklist alongside billing budgets and alerts.
Quota errors vs similar-looking errors
Quota errors are commonly confused with three other error types in GCP. Here is how to tell them apart quickly:
| Error type | What it means | Where to look | Fix |
|---|---|---|---|
| Quota exceeded | You have consumed the allowed resource count or request rate | IAM & Admin > Quotas | Delete resources, add backoff, or request an increase |
| API not enabled | The service API has not been activated for the project | APIs & Services > Library | Enable the API for the project |
| Permission denied | Your account or service account lacks the required IAM role | IAM & Admin > IAM | Grant the correct IAM role |
| Billing issue | The project has no active billing account, or billing is disabled | Billing > Account management | Link a billing account to the project |
Quota errors return RESOURCE_EXHAUSTED or contain “Quota exceeded.” API-not-enabled errors return SERVICE_DISABLED or include “has not been used in project” or “it is disabled.” See API not enabled errors for a full breakdown of both types and how to resolve each one.
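A first-pass triage of an error string can be sketched with simple substring checks on the markers named above. The PERMISSION_DENIED marker is an assumption added for the example (it is the standard status name for IAM failures, but it is not taken from this page), and billing errors are left in the catch-all bucket because no marker for them is given here.

```python
import re


def classify_error(message: str) -> str:
    """Sort an error message into one of the error families from the table."""
    text = message.lower()

    # Check API-not-enabled first: its markers are the most specific, and such
    # errors can also carry a permission-style status.
    if ("service_disabled" in text
            or "has not been used in project" in text
            or "it is disabled" in text):
        return "api_not_enabled"

    # Covers "RESOURCE_EXHAUSTED", "Quota exceeded", and "Quota 'CPUS' exceeded".
    if "resource_exhausted" in text or re.search(r"quota\b.*\bexceeded", text):
        return "quota_exceeded"

    # Assumed marker, not from this page.
    if "permission_denied" in text or "permission denied" in text:
        return "permission_denied"

    return "other"  # includes billing problems and anything unrecognised


print(classify_error("Quota 'CPUS' exceeded. Limit: 24.0 in region us-central1."))
# quota_exceeded
```

String matching like this is only a triage aid. When a client library exposes a structured status code or error reason, prefer that over parsing the message text.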
Summary
- Rate quotas limit API call frequency and reset automatically; allocation quotas limit resource count and require an increase request or resource deletion
- New projects start with conservative default quotas per project and per region; limits do not rise on their own as usage grows, so every increase has to be requested
- Read quota errors carefully: they name the specific metric, the current limit, and the region where the limit applies
- Check quotas in IAM & Admin > Quotas and via gcloud before any large deployment or region expansion
- Request allocation quota increases well before you need them; large requests require manual review and approval is not guaranteed
- Quotas are per project and per region; an increase in one region does not carry to others
- Set up Cloud Monitoring alerts on quota metrics so you are warned before a limit blocks a deployment
Frequently asked questions
What are GCP quotas and limits?
GCP quotas are limits that Google Cloud places on resource usage per project and per region. They exist to protect shared infrastructure from runaway usage, prevent accidental large-scale resource creation, and ensure fair access across all customers. Every new GCP project starts with conservative default quotas that you can request to increase as your workloads grow.
What is the difference between a rate quota and an allocation quota in GCP?
Rate quotas limit how many API requests you can make per minute or per second. They reset automatically after the time window closes. Allocation quotas limit how many resources can exist at the same time, such as the number of VM instances or static IP addresses in a region. Allocation quotas do not reset. If you hit one, you must either delete existing resources or request an increase.
Are GCP quotas per project or per region?
Both. Some quotas are enforced per project across all regions. Others are enforced per project per region. vCPU quotas, for example, are regional: a high CPU quota in us-central1 does not carry over to europe-west1. If you plan to deploy workloads across multiple regions, you need to request quota increases in each region separately.
How do I request a GCP quota increase?
Go to IAM & Admin > Quotas in the Cloud Console. Filter to find the specific quota you need to increase, select it, and click Edit Quotas. Enter the new limit and provide a brief justification. Modest increases, such as doubling your CPU quota, are often auto-approved within minutes. Large increases for GPUs or high-bandwidth quotas may require manual review from Google and can take longer. Approval is not guaranteed.
What should I do when I get a quota exceeded error?
First, identify whether the error is a rate quota (resets automatically) or an allocation quota (requires an increase or resource deletion). For rate quota errors, add exponential backoff and retry logic rather than immediately requesting an increase. For allocation quota errors, check IAM & Admin > Quotas in the Console, find the specific metric, and either delete unused resources or submit an increase request. Do not wait until launch day to discover you have a quota problem.