BigQuery Cost Optimisation: Reduce Bytes Scanned and Lower Query Costs

BigQuery charges for the amount of data each query scans, not for the rows it returns. On the default on-demand plan the rate is $6.25 per TB processed, so a single query on a large unoptimised table can cost more than you expect. This page covers the practical techniques that reduce bytes scanned and lower your BigQuery bill without changing what your queries return.

If you are new to BigQuery, start with What Is BigQuery? and BigQuery Pricing Explained for context on how pricing works. This page focuses on the hands-on steps: partitioning, clustering, column pruning, dry runs, cost caps, and finding the queries behind your spend.

BigQuery cost optimisation in simple terms

BigQuery uses columnar storage. When you run a query, it reads only the columns you select, but it reads them across every row that matches your scan scope. The total volume of data read is reported as bytes scanned (also called bytes processed), and that number determines your on-demand query cost.

Think of it like a library

Imagine a library where you pay by the number of pages the librarian has to flip through, not the number of pages you take home. Asking for “just the first 10 results” (LIMIT 10) does not stop the librarian from searching every shelf. To pay less, you need to tell the librarian which section to look in (partitioning) and which columns on the shelf to check (column selection).

Two things do not reduce bytes scanned: adding a LIMIT clause and adding a WHERE filter on a non-partition column. BigQuery still reads the underlying data before applying those filters.

Query cost and storage cost are separate charges. Storage is billed per GB per month ($0.02 active, $0.01 long-term). Query cost is billed per TB scanned. Most teams find that query cost grows faster than storage cost, which is why reducing bytes scanned is the primary cost lever. For the full pricing breakdown, see BigQuery Pricing Explained.

How BigQuery cost optimisation works

Every technique on this page maps to one of four ideas:

  1. Reduce data scanned. Partition tables so queries read only the relevant time range, cluster so queries skip irrelevant blocks, and select only the columns you need.
  2. Reduce unnecessary repeated work. Use materialized views to pre-compute aggregates that dashboards or reports query many times a day.
  3. Add guardrails before expensive queries run. Use dry runs and maximumBytesBilled to catch costly queries before they execute.
  4. Monitor which queries are causing spend. Use INFORMATION_SCHEMA.JOBS and Cost Breakdown Reports to find the users, queries, and tables behind your bill.

Quick wins

If you want to reduce BigQuery costs today, start with this checklist:

  • Replace every SELECT * with an explicit column list.
  • Add partition filters to every query on a partitioned table.
  • Cluster tables on the columns your queries filter on most often.
  • Run bq query —dry_run before executing unfamiliar queries.
  • Set —maximum_bytes_billed on ad-hoc queries as a safety net.
  • Review your most expensive jobs weekly using INFORMATION_SCHEMA.JOBS.

Each technique is explained in detail below.

Why LIMIT does not reduce cost

This catches almost everyone

LIMIT controls how many rows are returned to your application, but BigQuery scans the data before applying the limit. A LIMIT 10 on a 10 TB table still processes all 10 TB.

-- Scans the ENTIRE table. LIMIT is applied after the scan.
SELECT *
FROM `project.dataset.large_events_table`
LIMIT 100;

-- Also scans the entire table.
-- The WHERE on a non-partition column does not prune partitions.
SELECT user_id, event_type
FROM `project.dataset.large_events_table`
WHERE event_type = 'purchase'
LIMIT 1000;

On a 10 TB table at on-demand rates, both queries above would process the full 10 TB. Use partition filters and column selection to control cost. Use LIMIT only to cap the result set sent to your application.

Partitioning

Partitioning divides a table into segments, usually by a date or timestamp column. When your query includes a filter on the partition column, BigQuery reads only the matching partitions and skips the rest entirely. This is called partition pruning and it is the single biggest cost control available in BigQuery.

Think of it like a filing cabinet

An unpartitioned table is one giant drawer stuffed with every record you have ever collected. A partitioned table sorts those records into labelled folders by date. When you ask for “March 2026,” BigQuery opens only that folder instead of rummaging through every record in the cabinet.

When to partition

Partition any table that grows over time and is queried with a date or time-range filter. Event tables, log tables, transaction tables, and analytics tables are all strong candidates. For a deeper look at partition strategies, see Partitioned Tables.

Creating a partitioned table

-- Create a partitioned table from an existing raw table
CREATE TABLE `project.dataset.events`
PARTITION BY DATE(created_at)
AS SELECT * FROM `project.dataset.events_raw`;

What good partition filters look like

-- Without a partition filter: scans ALL partitions
-- On a 2 TB table this processes the full 2 TB
SELECT user_id, event_type
FROM `project.dataset.events`
WHERE user_id = '12345';

-- With a partition filter: scans only the matching date range
-- If 30 days holds roughly 50 GB of a 2 TB table, this processes ~50 GB
SELECT user_id, event_type
FROM `project.dataset.events`
WHERE user_id = '12345'
  AND DATE(created_at) BETWEEN '2026-02-25' AND '2026-03-27';

A table holding 5 years of data where most queries need only the last 30 days can see a large reduction in bytes scanned. In this example, that drops from 2 TB down to roughly 50 GB. Always include the partition column in the WHERE clause.

Requiring partition filters

You can configure a table so that any query without a partition filter is rejected. This prevents accidental full-table scans.

-- Require partition filters on this table
ALTER TABLE `project.dataset.events`
SET OPTIONS (require_partition_filter = true);

Automatic cleanup with partition expiration

-- Automatically delete partitions older than 90 days
CREATE TABLE `project.dataset.logs`
(
  log_id STRING,
  message STRING,
  created_at TIMESTAMP
)
PARTITION BY DATE(created_at)
OPTIONS (
  partition_expiration_days = 90
);

Partition expiration removes old data automatically, reducing both storage cost and the risk of accidentally scanning expired data.

Clustering

Clustering physically sorts data within each partition by one or more columns. When a query filters on a clustered column, BigQuery can skip entire blocks of data within the partition. Clustering works best as a complement to partitioning, not a replacement.

When clustering helps

Add clustering when your queries frequently filter or aggregate on specific non-date columns such as user_id, region, or event_type.

Column order matters

BigQuery clusters data in the order you specify. Put the column you filter on most often first. If your queries almost always filter by region and sometimes by user_id, declare CLUSTER BY region, user_id, not the other way around.

-- Partitioned by date, clustered by user_id and event_type
CREATE TABLE `project.dataset.events`
PARTITION BY DATE(created_at)
CLUSTER BY user_id, event_type
AS SELECT * FROM `project.dataset.events_raw`;

-- This query benefits from both partition pruning and clustering
-- It reads only the matching date partition and skips blocks
-- that do not contain the target user_id
SELECT event_type, created_at
FROM `project.dataset.events`
WHERE DATE(created_at) = '2026-03-27'
  AND user_id = '12345';
Tip

You can specify up to four clustering columns. Effectiveness decreases with each additional column, so keep the list short and aligned with your actual query patterns.

Column pruning

Because BigQuery uses columnar storage, you are charged only for the columns you select. Replacing SELECT * with an explicit column list is one of the easiest cost reductions available. It requires no schema changes and no table rebuilds.

-- Scans every column. Expensive on wide tables.
SELECT *
FROM `project.dataset.user_profiles`
WHERE region = 'EU';

-- Scans only three columns. Significantly cheaper.
SELECT user_id, email, signup_date
FROM `project.dataset.user_profiles`
WHERE region = 'EU';

On a table with 100 columns, selecting 3 columns means BigQuery reads roughly 3% of the columnar data instead of 100%. The exact savings depend on column sizes, but the direction is always the same: fewer columns selected means fewer bytes scanned.

Quick test for your own queries

Open your most-used query and check whether it starts with SELECT *. If it does, list only the columns you actually use downstream. This is a free improvement you can make in under a minute.

Materialized views and repeated queries

If the same aggregate query runs many times a day (for a dashboard, a report, or a scheduled summary), a materialized view pre-computes the result and stores it. Subsequent queries that match the view’s definition read the small materialised result instead of scanning the full source table.

This is most valuable for BI dashboards, daily email reports, and any pattern where multiple users or scheduled jobs run the same expensive aggregation repeatedly.

-- Pre-compute daily event counts
CREATE MATERIALIZED VIEW `project.dataset.daily_event_counts`
PARTITION BY event_date
AS
SELECT
  DATE(created_at) AS event_date,
  event_type,
  COUNT(*) AS event_count
FROM `project.dataset.events`
GROUP BY event_date, event_type;

-- Dashboard queries hit the materialized view, not the raw table
SELECT event_type, SUM(event_count)
FROM `project.dataset.daily_event_counts`
WHERE event_date >= '2026-01-01'
GROUP BY event_type;

BigQuery automatically refreshes materialized views when the base table changes and can transparently route compatible queries to the view even if the query references the base table. For performance tuning beyond cost, see BigQuery Performance Optimisation.

Guardrails before you run expensive queries

Reducing bytes scanned is the long-term solution, but guardrails prevent individual expensive queries from surprising you while you get there.

Dry runs: estimate cost before executing

A dry run validates the query and returns the estimated bytes that would be processed, without executing the query or incurring any charge.

# Estimate bytes before running the query
bq query \
  --use_legacy_sql=false \
  --dry_run \
  'SELECT user_id, event_type
   FROM `project.dataset.events`
   WHERE DATE(created_at) = "2026-03-27"'

The output shows Query successfully validated. Assuming the tables are not modified, running this query will process N bytes of data. Divide by 1 TB (1,099,511,627,776 bytes) and multiply by $6.25 for a rough on-demand cost estimate. In the BigQuery Console, the query editor shows the same estimate in the top-right corner before you click Run.

Think of it like checking the price tag before buying

A dry run is free and takes a few seconds. It tells you exactly how much data a query would scan before you commit to running it. Get into the habit of running —dry_run on any query you have not seen before, especially on large or unfamiliar tables.

maximumBytesBilled: hard cost cap per query

Set a byte limit on individual queries. If the estimated scan exceeds the limit, BigQuery rejects the query with an error instead of running it.

# Cap this query at 1 GB scanned (roughly $0.006 at on-demand rates)
bq query \
  --use_legacy_sql=false \
  --maximum_bytes_billed=1000000000 \
  'SELECT user_id, event_type
   FROM `project.dataset.events`
   WHERE DATE(created_at) = "2026-03-27"'
-- You can also set the cap inside SQL using query options
-- This rejects the query if estimated scan exceeds 10 GB
SET @@query.maximumBytesBilled = 10000000000;

SELECT user_id, event_type
FROM `project.dataset.events`
WHERE DATE(created_at) BETWEEN '2026-03-01' AND '2026-03-27';

This is especially useful for analyst teams running ad-hoc queries on large tables. Set a sensible default (for example, 10 GB) so that forgotten partition filters produce an error instead of a bill.

Custom quotas

GCP lets you set custom quotas on BigQuery usage at the project or user level. You can cap the total TB processed per day for an entire project, preventing runaway costs from scheduled queries, dashboards, and ad-hoc exploration alike. Configure these in the GCP Console under IAM & Admin → Quotas. For a broader view of GCP cost controls, see GCP Cost Optimisation Strategy.

How to find what is costing you money

Optimisation starts with visibility. Before tuning queries, identify which jobs, users, and tables are driving spend.

INFORMATION_SCHEMA.JOBS: find expensive queries

-- Top 20 most expensive queries in the last 7 days
SELECT
  job_id,
  user_email,
  query,
  total_bytes_processed,
  ROUND(total_bytes_processed / POW(1024, 4), 4) AS tb_processed,
  ROUND(total_bytes_processed / POW(1024, 4) * 6.25, 2) AS estimated_cost_usd,
  creation_time
FROM `region-us`.INFORMATION_SCHEMA.JOBS
WHERE creation_time > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)
  AND job_type = 'QUERY'
  AND state = 'DONE'
  AND total_bytes_processed > 0
ORDER BY total_bytes_processed DESC
LIMIT 20;

Look for patterns: repeated queries from dashboards, scheduled jobs running more often than needed, and ad-hoc SELECT * queries on large tables. These are your highest-impact optimisation targets.

Cloud billing and cost breakdown reports

For a project-wide view, BigQuery billing exports let you break down costs by service, dataset, and resource. See GCP Cost Breakdown Reports for how to set up the export and run useful queries against it. To find expensive resources across all GCP services (not just BigQuery), see How to Find Your Most Expensive GCP Resources.

Optimise the right thing first

Do not start tuning random queries. Run the INFORMATION_SCHEMA.JOBS query above and sort by bytes processed. The top 5 results often account for the majority of your scan costs. Fix those first before touching anything else.

What to investigate

  • Dashboard queries. BI tools can run the same heavy query every time a user opens a dashboard. Materialized views help here.
  • Scheduled queries. Jobs that run hourly may only need to run daily. Check the schedule frequency.
  • Analyst ad-hoc queries. A single SELECT * on a large table without a partition filter can be surprisingly expensive. Guardrails catch these.
  • ETL and data pipeline jobs. Pipelines that reprocess entire tables instead of incremental partitions can drive high scan volumes.

When to use these optimisations

Not every BigQuery table needs every technique. Focus your effort where the payoff is highest:

  • Large event and log tables. These grow daily and are the most common source of high scan costs. Partitioning and partition expiration are essential.
  • Repeated dashboard queries. If the same aggregation runs dozens of times a day, a materialized view pays for itself quickly.
  • Analyst-heavy projects. Teams running ad-hoc exploration on wide tables benefit most from column pruning and maximumBytesBilled guardrails.
  • Ad-hoc exploration on wide tables. Tables with many columns (50+) where analysts typically need only a handful of fields. Column pruning has an outsized effect here.

For small tables (under a few GB), the cost per query is negligible at on-demand rates and these optimisations are optional. Focus on the tables that appear in your INFORMATION_SCHEMA.JOBS results with high total_bytes_processed.

Partitioning vs clustering vs materialized views

TechniqueBest forWhat it reducesWhen not to reach for it first
PartitioningTime-series tables queried by date rangeBytes scanned by skipping entire partitionsTables queried across all dates equally
ClusteringHigh-cardinality filter columns (user_id, region)Bytes scanned within each partitionColumns with very low cardinality (boolean flags)
Materialized viewsRepeated aggregate queries (dashboards, reports)Repeated scan cost by reading pre-computed resultsOne-off analytical queries that change frequently

In practice, most large tables benefit from partitioning first, clustering second, and materialized views where repeated aggregation patterns exist. For a full discussion of GCP pricing models including committed-use options, see GCP Pricing Models.

Common mistakes

  1. Relying on LIMIT to control cost. LIMIT controls rows returned, not bytes scanned. A query with LIMIT 10 on a large table still processes the full scan. Use partition filters and column selection instead.

  2. Querying partitioned tables without partition filters. A partitioned table without a partition column in the WHERE clause scans all partitions, the same cost as if the table were not partitioned at all. Consider enabling require_partition_filter on critical tables.

  3. Using SELECT * on wide tables. BigQuery charges for every column selected. On a wide table, selecting only the columns you need can reduce bytes scanned substantially.

  4. Choosing poor clustering columns. Clustering on a column you never filter on provides no benefit. Clustering on a column with very few distinct values (such as a boolean) provides minimal block pruning. Choose columns that appear in your most common WHERE clauses.

  5. Not estimating bytes before running. Use —dry_run or check the Console estimate before executing queries on unfamiliar tables. A five-second check can prevent an unexpected charge.

  6. Not reviewing which jobs are driving spend. Without periodic review of INFORMATION_SCHEMA.JOBS, expensive queries from dashboards, scheduled jobs, or ad-hoc exploration can run unchecked for weeks.

  7. Not setting partition expiration on log tables. Tables accumulating rows every day grow indefinitely without partition expiration. Set a retention period matching your use case, typically 30 to 365 days for event and log data.

Frequently asked questions

Does LIMIT reduce BigQuery cost?

No. BigQuery scans the data it needs before applying LIMIT, so a query with LIMIT 10 on a 10 TB table still processes all 10 TB. To reduce cost, use partition filters and select only the columns you need. LIMIT only controls how many rows are returned to your application.

Should I use partitioning or clustering first?

Start with partitioning. Partitioning by date is the single biggest cost lever because most queries naturally filter by time range. Add clustering after partitioning for columns you frequently filter on, such as user_id or region. Clustering reduces bytes scanned within each partition, but it cannot replace partitioning for broad time-range pruning.

How can I estimate cost before I run a query?

Use the --dry_run flag with bq query. BigQuery returns the estimated bytes that would be processed without actually running the query or incurring any cost. In the Console, the query editor shows estimated bytes in the top-right corner before you click Run. Multiply the estimated bytes by $6.25 per TB for a rough on-demand cost estimate.

When does on-demand stop making sense?

On-demand charges per TB scanned with no commitment. If your team consistently scans many terabytes every day and the monthly bill is predictable, capacity pricing (slot reservations) can be cheaper. Calculate your average monthly TB scanned and compare the on-demand cost against the equivalent slot reservation. For variable or exploratory workloads, on-demand is usually better value.

How do I find the queries causing my BigQuery bill?

Query the INFORMATION_SCHEMA.JOBS view to list recent jobs sorted by total_bytes_processed. This shows which queries, users, and projects are scanning the most data. Combine this with BigQuery billing exports to Cost Breakdown Reports for a full picture of spend by service and resource.

Last verified: 27 March 2026 Cloud services change frequently. Verify details against official documentation before making infrastructure decisions.