BigQuery Query Too Large Error: Fix Bytes Billed and resourcesExceeded

If BigQuery rejected your query with a “bytes billed” limit error, or your query failed mid-execution with resourcesExceeded, this page walks you through diagnosing which error you actually hit, understanding why it happened, and fixing it. These are two different failures with different causes and different fixes. Start by checking the exact error message in the query results or job details.

Simple explanation

BigQuery charges and allocates resources based on two things: data read from storage (bytes scanned) and data shuffled between workers during execution (intermediate data from JOINs, GROUP BY, ORDER BY). The number of rows in your final result has almost nothing to do with either cost.

Think of it like a warehouse

Imagine you need one item from a massive warehouse. Bytes scanned is how many aisles the forklift drives through to find it. Shuffle is how much stuff gets moved to a sorting area before your item is picked out. Asking for “just one item” (LIMIT 1) does not stop the forklift from driving every aisle. You reduce cost by telling the forklift exactly which aisle (partition) and shelf (cluster) to check.

When someone says their query is “too large”, they usually mean one of two unrelated problems:

The query would scan too many bytes and was blocked by a safety limit before running.
The query ran but failed because it produced too much intermediate data for BigQuery’s workers to handle.

The first is a billing safety issue. The second is a query design issue. Both are fixable, but the fixes are different. If you are new to how BigQuery handles data, the BigQuery overview covers the fundamentals.

How to identify which error you actually hit

Check the exact error message. The wording tells you whether the query was blocked before running or failed during execution:

Error / symptom	What it means	Was the query run?	Were you charged?	Typical fix
`Query exceeded limit for bytes billed`	The query would scan more data than your `maximumBytesBilled` setting allows	No	No	Reduce bytes scanned (partition filters, column pruning) or raise the limit
`resourcesExceeded`	The query ran but generated more intermediate data than BigQuery could shuffle/process	Yes (partially)	Yes (for bytes scanned before failure)	Restructure the query to reduce shuffle. Filter earlier, materialise intermediates
`Response too large to return`	The final result set exceeds the maximum response size for interactive queries	Yes	Yes	Write results to a destination table instead of returning them to the client

Note

“Response too large” is a different error from the other two. It means the query succeeded but the result set is too big to send back inline. The fix is to set a destination table. The rest of this page focuses on the bytes billed and resourcesExceeded cases.

How BigQuery decides cost and work

Understanding the scan pipeline helps you predict which changes will actually reduce cost. BigQuery processes a query in stages, and each stage has a different cost driver. For full pricing details, see BigQuery pricing explained.

Bytes scanned (storage read)

BigQuery reads data column by column. If your query references three columns out of fifty, it reads only those three. This is column pruning, and it happens automatically based on your SELECT and WHERE clauses.

Analogy: newspaper vs entire archive

Column pruning is like reading only the sports section instead of every page of every newspaper in the archive. SELECT * reads every section of every paper. SELECT score, team reads only the sports column. The archive is the same size either way, but you pay for what you actually pull off the shelf.

Partition pruning

If the table is partitioned, BigQuery skips entire partitions that your WHERE clause excludes. A year of daily partitions filtered to one week scans roughly 7/365ths of the table. Partition pruning requires the filter to reference the partition column directly with a literal or a statically evaluable expression.

Cluster block pruning

Clustered tables store data sorted by the cluster columns within each partition. When your WHERE clause filters on a cluster column, BigQuery can skip storage blocks that do not contain matching values. This is block pruning.

There is an important nuance here. On clustered tables, the bytes estimate from a dry run is an upper bound. Actual bytes scanned at execution time can be lower because block pruning is applied during execution, not during query planning. A dry run might report 50 GB, but the query actually scans 12 GB if the cluster filter is selective.

Why LIMIT usually does not reduce cost

Common misconception

LIMIT 10 does not mean BigQuery reads only 10 rows. BigQuery scans all matching columns and partitions first, then discards everything beyond the limit. A LIMIT 10 on a 1 TB unpartitioned table still scans 1 TB. The only case where LIMIT can reduce scanning is on clustered tables where BigQuery can sometimes stop reading blocks early, but this is not guaranteed and should not be relied on for cost control.

How to fix maximum bytes billed errors

When you see this error, the query was blocked before running and you were not charged:

Error: Query exceeded limit for bytes billed: 10737418240.
34359738368 or higher required.

The message tells you the limit (10 GB here) and the estimated bytes needed (32 GB). Work through these steps to bring the scan down.

Step 1: Run a dry run to confirm the estimate

Dry runs are free and return instantly. Use them to check how much data a query would scan before you commit to running it:

# Free estimate of bytes a query would scan
bq query \
  --dry_run \
  --use_legacy_sql=false \
  'SELECT user_id, event_type
   FROM `myproject.analytics.events`
   WHERE DATE(created_at) = "2026-01-15"'

# Output example:
# Query successfully validated. Assuming the tables are not modified,
# running this query will process 47234567 bytes of data.

You can also dry-run from the Python client library:

from google.cloud import bigquery

client = bigquery.Client()
job_config = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)

query = """
SELECT user_id, event_type
FROM `myproject.analytics.events`
WHERE DATE(created_at) = '2026-01-15'
"""

query_job = client.query(query, job_config=job_config)
bytes_estimate = query_job.total_bytes_processed
print(f"Estimated bytes to scan: {bytes_estimate:,}")
# Estimated bytes to scan: 47,234,567

Tip

Dry runs cost nothing and take less than a second. Make them a habit before running any query on a table you have not queried before. If you regularly estimate costs across multiple services, see estimating cloud costs for broader strategies.

Step 2: Remove SELECT * and select only the columns you need

BigQuery’s columnar storage means each column you add to your query adds to the bytes scanned. SELECT * on a wide table with JSON or BYTES columns can scan 10 to 100x more data than selecting the three columns you actually need.

-- Expensive: scans every column in the table
SELECT *
FROM `myproject.analytics.user_profiles`
WHERE region = 'EU';

-- Much cheaper: scans only three columns
SELECT user_id, email, region
FROM `myproject.analytics.user_profiles`
WHERE region = 'EU';

Step 3: Add partition filters

If the table is partitioned, a filter on the partition column is the single biggest cost reduction available. Without it, BigQuery reads every partition.

-- No partition filter: scans ALL days in the table
SELECT user_id, event_type
FROM `myproject.analytics.events`
WHERE user_id = 'abc123';

-- With partition filter: scans only one day
SELECT user_id, event_type
FROM `myproject.analytics.events`
WHERE DATE(created_at) = '2026-01-15'
  AND user_id = 'abc123';

Partition pruning can silently fail

Pruning requires the filter to directly reference the partition column with a literal or a function BigQuery can evaluate at plan time. Wrapping the partition column in a UDF, CAST, FORMAT_TIMESTAMP, or subquery quietly disables pruning with no error message. Keep partition filters simple: DATE(created_at) = ‘2026-01-15’ or created_at BETWEEN ‘2026-01-01’ AND ‘2026-01-31’. Check Bytes processed in query details to verify pruning is active.

Step 4: Add cluster key filters where the table is clustered

If the table is clustered on a column you are filtering, add that filter to your WHERE clause. Block pruning during execution can substantially reduce the actual bytes scanned below the dry-run estimate.

-- Table clustered on (region, event_type)
-- Cluster filter on region enables block pruning
SELECT user_id, event_type, created_at
FROM `myproject.analytics.events`
WHERE DATE(created_at) = '2026-01-15'
  AND region = 'EU';

Step 5: Use narrower date ranges

If your analysis only needs the last 7 days, do not query the last 365. Tighten the date range to scan only what you need.

Step 6: Consider staged queries with destination tables

If your final query needs to join or aggregate a large volume of data, consider breaking it into two stages: first write a filtered subset to a destination table, then run your analysis on the smaller table. The total bytes scanned is often lower because the destination table is smaller and can be partitioned/clustered for your specific use case.

# Stage 1: write a filtered subset to a temp table
bq query \
  --destination_table=myproject:analytics.events_jan_eu \
  --use_legacy_sql=false \
  'SELECT user_id, event_type, created_at
   FROM `myproject.analytics.events`
   WHERE DATE(created_at) BETWEEN "2026-01-01" AND "2026-01-31"
     AND region = "EU"'

# Stage 2: run analysis on the smaller table
bq query \
  --use_legacy_sql=false \
  'SELECT event_type, COUNT(*) AS event_count
   FROM `myproject.analytics.events_jan_eu`
   GROUP BY event_type
   ORDER BY event_count DESC'

Step 7: Set or adjust maximumBytesBilled intentionally

The maximumBytesBilled setting is a safety net, not a performance tool. Set it to a value that matches your cost expectations for a given query. If the query would exceed it, BigQuery rejects the query before running, and you are not charged.

from google.cloud import bigquery

client = bigquery.Client()

# Set a 10 GB safety limit
job_config = bigquery.QueryJobConfig(
    maximum_bytes_billed=10 * 1024 ** 3  # 10 GB in bytes
)

query = """
SELECT user_id, event_type
FROM `myproject.analytics.events`
WHERE DATE(created_at) = '2026-01-15'
"""

query_job = client.query(query, job_config=job_config)
results = query_job.result()

You can also set a default limit in the BigQuery Console under query settings so that every interactive query has a cap. This is a good practice for teams where analysts run ad-hoc queries regularly. For more cost control strategies, see BigQuery cost optimisation.

How to fix resourcesExceeded

A resourcesExceeded error means the query ran but failed because it produced more intermediate data than BigQuery could handle during execution. This is fundamentally different from a bytes billed error. It is about shuffle and compute resources, not about data scanned from storage.

Analogy: assembly line overflow

Think of BigQuery workers as stations on an assembly line. The bytes billed error is like the delivery truck refusing to unload because the order is too big for your budget. resourcesExceeded is different: the raw materials arrived fine, but the assembly line jammed because a JOIN or GROUP BY created too many half-assembled parts for the workers to process at once.

The most common causes are large shuffles from JOINs, high-cardinality GROUP BY operations, ORDER BY on huge result sets, and accidental CROSS JOINs. The query plan in Execution details (visible in the Console or via the Jobs API) shows you exactly which stage failed and how much data was shuffled.

Check the query plan first

Before changing anything, open the query’s Execution details in the BigQuery Console. Look for:

Which stage failed (marked in red)
The Shuffle output bytes for each stage. A large number here is the bottleneck.
The Records read vs Records written ratio. A stage that reads 1M rows and writes 10B rows is producing a data explosion.

Where to find Execution details

In the BigQuery Console, run the query, then click the Execution details tab below the results. Each stage shows input rows, output rows, shuffle bytes, and time. You can also retrieve the same information via the Jobs API by reading the statistics.query.queryPlan field on the job resource.

Filter earlier to reduce intermediate data

Move WHERE clauses as close to the data source as possible. If you are joining two tables, add partition and column filters to both sides of the JOIN before the join happens, not after.

-- Problematic: joins all-time data from both tables, then filters
SELECT a.user_id, b.order_id, b.total
FROM `myproject.analytics.users` a
JOIN `myproject.analytics.orders` b
  ON a.user_id = b.user_id
WHERE DATE(b.created_at) = '2026-01-15'
  AND a.region = 'EU';

-- Better: use CTEs to filter each table before joining
WITH filtered_users AS (
  SELECT user_id
  FROM `myproject.analytics.users`
  WHERE region = 'EU'
),
filtered_orders AS (
  SELECT user_id, order_id, total
  FROM `myproject.analytics.orders`
  WHERE DATE(created_at) = '2026-01-15'
)
SELECT u.user_id, o.order_id, o.total
FROM filtered_users u
JOIN filtered_orders o
  ON u.user_id = o.user_id;

The optimiser cannot always save you

BigQuery’s query optimiser can sometimes push filters down automatically, but it cannot always do so. Complex expressions, nested subqueries, and UDFs can prevent automatic pushdown. Writing the filter explicitly in the right place guarantees it happens early.

Reduce high-cardinality intermediates

If a GROUP BY produces millions of groups, consider whether you can pre-aggregate, use APPROX_COUNT_DISTINCT instead of COUNT(DISTINCT …), or group by a coarser key first.

-- High-cardinality GROUP BY: one group per session_id across all time
SELECT session_id, COUNT(*) AS events
FROM `myproject.analytics.events`
GROUP BY session_id;

-- Reduced cardinality: group by date first, then session
SELECT DATE(created_at) AS event_date, session_id, COUNT(*) AS events
FROM `myproject.analytics.events`
WHERE DATE(created_at) BETWEEN '2026-01-01' AND '2026-01-07'
GROUP BY event_date, session_id;

Materialise intermediate results

If a single query tries to do too much (multiple joins, aggregations, and transformations), split it into stages. Write intermediate results to destination tables, then query those smaller tables.

# Stage 1: aggregate daily metrics into a smaller table
bq query \
  --destination_table=myproject:analytics.daily_metrics_jan \
  --use_legacy_sql=false \
  'SELECT DATE(created_at) AS event_date, user_id, COUNT(*) AS events
   FROM `myproject.analytics.events`
   WHERE DATE(created_at) BETWEEN "2026-01-01" AND "2026-01-31"
   GROUP BY event_date, user_id'

# Stage 2: join the smaller aggregated table
bq query \
  --use_legacy_sql=false \
  'SELECT m.user_id, m.events, u.account_type
   FROM `myproject.analytics.daily_metrics_jan` m
   JOIN `myproject.analytics.users` u
     ON m.user_id = u.user_id
   WHERE u.region = "EU"'

Avoid accidental CROSS JOINs

CROSS JOINs can explode data volume

A CROSS JOIN between two tables of size N and M produces N × M rows. If both tables have 1 million rows, that is 1 trillion intermediate rows. Even an intentional CROSS JOIN at that scale will fail. If you need a cross join, filter both sides aggressively first so the intermediate result stays manageable.

Split large transformations into batches

If a query processes a year of data and fails, try processing one month at a time and writing each batch to a destination table:

# Process one month at a time to stay within resource limits
for month in 2025-10 2025-11 2025-12 2026-01; do
  bq query \
    --destination_table="myproject:analytics.monthly_result_${month//-/_}" \
    --use_legacy_sql=false \
    "SELECT user_id, event_type, COUNT(*) AS cnt
     FROM \`myproject.analytics.events\`
     WHERE DATE(created_at) BETWEEN '${month}-01' AND DATE_SUB(DATE_ADD(DATE '${month}-01', INTERVAL 1 MONTH), INTERVAL 1 DAY)
     GROUP BY user_id, event_type"
done

Reservations, slots, and concurrency

On-demand BigQuery queries share a pool of slots (compute units). If your query needs more slots than the on-demand pool can provide, or if concurrent queries are consuming the available slots, you may hit resourcesExceeded even with a well-structured query.

If you have confirmed that the query plan is reasonable and filters are tight, the issue may be capacity. Options include:

Purchase BigQuery Reservations (editions) for dedicated slot capacity
Reduce concurrency by scheduling large queries at off-peak times
Use BATCH priority (—priority=BATCH) which queues the job instead of failing immediately when slots are scarce

Slots do not fix bad queries

Increasing reservations is appropriate when you have already optimised the SQL and still need more capacity. Buying more slots for a poorly structured query just means you spend more money on the same inefficient work. Optimise the query first, then scale capacity if needed. For a full comparison of analytical database options, see BigQuery vs Cloud SQL.

When to use each fix

Technique	Use this when…
Partition filter	The table is partitioned and your query does not filter on the partition column, or uses a wider range than needed
Column pruning (remove SELECT *)	You are scanning a wide table and only need a few columns
Cluster key filter	The table is clustered and you can add a WHERE clause on the cluster column to enable block pruning
Dry run	Before running any query on a table you have not queried before, or when estimating cost for a new query pattern
Materialised intermediate tables	A single query is too complex, involves multiple large joins, or consistently hits resourcesExceeded
Reservation / slot changes	Query plan is already optimised but on-demand capacity is insufficient, or you need predictable performance for production workloads

For a more detailed comparison of cost and performance strategies, see BigQuery performance optimisation.

Common beginner mistakes

Assuming LIMIT makes queries cheap. LIMIT 10 on a 1 TB table still scans 1 TB. LIMIT controls rows returned, not bytes scanned. Use partition filters and column selection to reduce cost.
Using SELECT * on wide tables. Every column adds to the scan. On a table with 50 columns including JSON blobs, selecting three columns can be 20x cheaper.
No partition filter on a partitioned table. This is the single most expensive mistake. Always include a filter on the partition column.
Filtering too late in the query. Applying WHERE clauses after a JOIN means BigQuery still has to read and shuffle all the unfiltered data first. Move filters into CTEs or subqueries before the JOIN.
Joining two large tables without reducing either side first. If both tables have billions of rows, the shuffle can exceed resource limits. Filter or aggregate each table before joining.
Not checking bytes processed before running. A dry run is free and takes less than a second. It tells you exactly how much a query will cost. Make it a habit before running queries on unfamiliar tables.

Maximum bytes billed vs resourcesExceeded

Attribute	Maximum bytes billed error	resourcesExceeded error
When it happens	Before the query runs	During query execution
Were you charged?	No	Yes (for the scan that occurred before failure)
Root cause	Query would scan more bytes than the `maximumBytesBilled` limit	Query produced too much intermediate data (shuffle) for available compute resources
Controlled by	The `maximumBytesBilled` setting on the job or project	Query shape, data volume, and available slots
Primary fix	Reduce bytes scanned: partition filters, column pruning, narrower date ranges	Restructure query: filter earlier, materialise intermediates, reduce cardinality, add reservations
Dry run helps?	Yes. Shows the exact bytes that would be scanned.	No. Dry run does not estimate shuffle or execution resources.

Frequently asked questions

What does 'query too large' mean in BigQuery?

It usually means one of two things. Either your query would scan more bytes than the maximumBytesBilled limit allows (the query is blocked before running and you are not charged), or the query ran but hit a resourcesExceeded error because it generated too much intermediate data during shuffles. The fix depends on which error you actually received.

Does LIMIT reduce BigQuery cost?

No. BigQuery scans entire columns before applying LIMIT. A query with LIMIT 10 on a 1 TB table still scans 1 TB unless partition or cluster pruning reduces the scan. LIMIT only limits the rows returned to the client. To reduce cost, use partition filters, select specific columns, and add cluster key filters.

Why is my partitioned table still scanning too much data?

Partition pruning only works when the filter directly references the partition column with a literal value or a date function BigQuery can evaluate at plan time. Wrapping the partition column in a UDF, CAST, or subquery prevents pruning. Check the Bytes processed figure in query details to confirm pruning is active. Also verify you are filtering by the actual partition column, not a different date column.

When does clustering help reduce scanned bytes?

Clustering helps when your WHERE clause filters on the clustered columns. BigQuery stores clustered data in sorted blocks and can skip blocks that do not contain matching values. The benefit increases with table size. On clustered tables, the bytes estimate from a dry run is an upper bound. Actual bytes scanned at execution time can be lower because block pruning happens during execution, not during planning.

What causes resourcesExceeded even when bytes billed seems reasonable?

resourcesExceeded is about compute resources during execution, not about the volume of data scanned from storage. A query that scans a modest amount of data can still fail if a JOIN, GROUP BY, or ORDER BY produces massive intermediate results that exceed shuffle limits. The fix is to reduce the size of intermediate data: filter earlier, reduce cardinality before joining, or materialise intermediate results into staged tables.

Last verified: 27 March 2026 Cloud services change frequently. Verify details against official documentation before making infrastructure decisions.