BigQuery Partitioned Tables: How Pruning Cuts Cost

A partitioned table in BigQuery is a table split into smaller segments based on the values in one column, usually a date. When your query filters on that column, BigQuery reads only the matching partitions and skips everything else. The result is fewer bytes scanned, lower cost, and faster queries on large tables. If your table has more than a few gigabytes and your queries consistently filter on a specific column, partitioning should be one of the first things you set up.

BigQuery partitioned tables in simple terms

On a table with two years of daily data, a query for a single day scans all 730 days of rows if the table is not partitioned. With partitioning, the same query touches only one day’s worth of data.

The problem partitioning solves is straightforward: large tables are expensive and slow to scan in full. Partitioning lets BigQuery skip the irrelevant portions automatically, as long as your query filters on the partition column. If you are working with BigQuery for the first time, think of partitioning as the most direct way to control scan volume on growing tables.

Why partitioning matters

BigQuery charges for on-demand queries based on the number of bytes scanned. On a non-partitioned table with two years of event data, a query asking about yesterday scans every byte in the table. On a date-partitioned table, the same query reads only yesterday’s partition.

Cost example

A 10 TB event table at on-demand pricing ($6.25 per TB scanned) costs $62.50 per full-table scan. If yesterday’s partition holds roughly 14 GB, the same query costs about $0.09. That is a 700x cost reduction from a single WHERE clause.

Over hundreds of queries per day, those savings compound fast. For a deeper look at how scan volume affects your bill, see BigQuery pricing explained.

Performance improves for the same reason. Fewer bytes to read means fewer workers needed and faster results. On large production tables, partitioning is not a nice-to-have. It is a baseline requirement.

How partition pruning works

Partition pruning is the mechanism that makes partitioning useful. When BigQuery compiles your query, it checks whether your WHERE clause filters on the partition column. If it does, BigQuery identifies which partitions match and reads only those. Every other partition is skipped entirely.

Here is a query that prunes correctly on a table partitioned by DATE(event_timestamp):

-- Partition pruning: reads only the 2026-03-01 partition
SELECT user_id, event_type, COUNT(*) AS events
FROM analytics.events
WHERE DATE(event_timestamp) = '2026-03-01'
GROUP BY user_id, event_type;

And here is a query on the same table that does not prune:

-- No pruning: scans every partition in the table
SELECT user_id, event_type, COUNT(*) AS events
FROM analytics.events
WHERE user_id = 'usr-5678'
GROUP BY user_id, event_type;

Why the second query scans everything

The second query filters on user_id, not on the partition column. BigQuery has no way to know which partitions contain rows for that user, so it must open and scan all of them. The filter still reduces the result set, but it does not reduce the bytes scanned or the cost.

Keep partition filters simple

Pruning only works with direct filters on the partition column. Wrapping the column in a function that BigQuery cannot evaluate at compile time, or filtering through a subquery, may prevent pruning.

Partition types in BigQuery

Ingestion-time partitioning

Rows are assigned to partitions based on when BigQuery ingested them. The partition column is a pseudo-column called _PARTITIONTIME.

When to use it: when you do not have a reliable timestamp column in your data, or when you are streaming data and want automatic daily partitions without schema changes.

-- Create a table with ingestion-time partitioning
CREATE TABLE analytics.raw_logs (
  log_message STRING,
  severity STRING
)
PARTITION BY _PARTITIONTIME;

Late-arriving data lands in the wrong partition

Ingestion-time partitioning is tied to load time, not event time. If a row describing a March 1 event arrives on March 3, it goes into the March 3 partition. This can make date-filtered queries return incomplete results for a given day.

Time-unit column partitioning

Rows are partitioned by the value of a top-level DATE, TIMESTAMP, or DATETIME column in your schema. You choose the granularity: hour, day, month, or year.

Recommended approach

This is the most common partitioning strategy. Use it when your data has a reliable event timestamp and your queries consistently filter on that timestamp. It gives you precise control over which column drives partitioning.

-- Create a table partitioned by day on a TIMESTAMP column
CREATE TABLE analytics.events (
  event_id STRING,
  user_id STRING,
  event_type STRING,
  event_timestamp TIMESTAMP
)
PARTITION BY DATE(event_timestamp);

Tradeoff: you must choose the right granularity. Partitioning by hour on a table that receives only a few hundred rows per hour creates many tiny partitions with high metadata overhead. Match granularity to your data volume and query patterns.

Integer-range partitioning

Rows are partitioned by an INTEGER column using a defined range and interval. For example, partition a customer_id column into ranges of 10,000.

When to use it: when your natural query filter is an integer key rather than a date. This is less common but useful for tables where every query filters on an ID range.

-- Create a table with integer-range partitioning
CREATE TABLE analytics.customer_metrics (
  customer_id INT64,
  metric_name STRING,
  metric_value FLOAT64
)
PARTITION BY RANGE_BUCKET(customer_id, GENERATE_ARRAY(0, 1000000, 10000));

Tradeoff: you must define the range boundaries upfront. Values outside the defined range go into an overflow partition. If your ID space grows beyond the defined range, the overflow partition accumulates data and pruning becomes less effective for those rows.

Partitioning vs clustering

Partitioning and clustering both reduce bytes scanned, but they work at different levels. Partitioning divides the table into separate physical segments that BigQuery can skip entirely. Clustering sorts the rows within each partition by up to four columns, allowing BigQuery to skip blocks of rows within a partition.

	Partitioning	Clustering
Best for	Filtering on one column (usually date)	Filtering on multiple columns or high-cardinality values
Pruning granularity	Entire partitions	Blocks within a partition
Ideal filter pattern	Equality or range on the partition column	Equality or range on the first clustering column
Cost estimate before execution	Accurate (BigQuery knows partition sizes)	Upper-bound only (block-level pruning happens at runtime)
When it breaks down	Too many small partitions increase metadata overhead	Filtering only on the 2nd or 3rd clustering column reduces benefit
Use both together	Partition by date, then cluster by the columns you filter on most within each partition. This gives partition-level pruning plus block-level pruning.

Clustering column order matters

BigQuery sorts rows by clustering columns left to right. The first clustering column provides the most pruning benefit. If your queries almost always filter on event_type and only sometimes on user_id, list event_type first.

For more on combining these techniques, see BigQuery performance optimisation.

When to use partitioned tables

Event and clickstream tables where queries almost always filter by date range
Daily fact tables in a data warehouse that are queried by business date
Billing and export data such as BigQuery billing export tables that grow continuously and are queried by invoice period
Log and time-series analytics where retention windows drive queries to recent data
Large append-heavy analytics tables that grow by millions of rows per day and are queried for recent windows

Rule of thumb

Consider partitioning any table over a few gigabytes where queries consistently filter on one column. If that column is a date or timestamp, you are in the sweet spot for time-unit partitioning.

When partitioning is the wrong tool

Many tiny partitions. If your partition column has extremely high cardinality or very low data volume per partition, you end up with thousands of small partitions. Each partition carries metadata overhead, and BigQuery has a limit of 4,000 partitions per partitioned table. In these cases, clustering alone may be a better fit.

Multi-column or high-cardinality filters. If your queries filter on several columns and rarely use a single dominant column, clustering across those columns often provides better pruning than partitioning on one of them.

Queries rarely filter on the partition column. Partitioning only helps when queries include a filter on the partition column. If most of your queries use full-table aggregations or filter on unrelated columns, partitioning adds complexity without benefit. For these workloads, focus on other performance techniques such as column selection and approximate aggregation.

How to create and use a partitioned table

Create a partitioned table

CREATE TABLE analytics.events (
  event_id STRING,
  user_id STRING,
  event_type STRING,
  event_timestamp TIMESTAMP,
  payload STRING
)
PARTITION BY DATE(event_timestamp);

Create a partitioned and clustered table

Combining partitioning and clustering gives you both partition-level and block-level pruning. If you are loading data into BigQuery for analytics, this is the recommended table setup for large event tables.

CREATE TABLE analytics.events
PARTITION BY DATE(event_timestamp)
CLUSTER BY user_id, event_type
AS SELECT * FROM staging.raw_events;

Query with a partition filter

-- Prunes to one day's partition, then uses clustering to skip blocks
SELECT user_id, COUNT(*) AS event_count
FROM analytics.events
WHERE DATE(event_timestamp) = '2026-03-01'
  AND event_type = 'purchase'
GROUP BY user_id;

Set require_partition_filter

This option causes BigQuery to reject queries that do not filter on the partition column. It is a cost-control guardrail that prevents accidental full-table scans.

-- Enable require_partition_filter on an existing table
ALTER TABLE analytics.events
SET OPTIONS (require_partition_filter = true);

Queries without a partition filter will be rejected

Once require_partition_filter is enabled, any query that omits a filter on the partition column fails immediately. This includes ad-hoc exploration queries. Make sure your team knows this setting is active before enabling it on shared tables.

-- This query would be rejected with require_partition_filter enabled
SELECT COUNT(*) FROM analytics.events WHERE event_type = 'purchase';

Inspect partitions with INFORMATION_SCHEMA

You can check which partitions exist and how large they are using INFORMATION_SCHEMA.PARTITIONS.

SELECT
  partition_id,
  total_rows,
  total_logical_bytes
FROM `project.analytics.INFORMATION_SCHEMA.PARTITIONS`
WHERE table_name = 'events'
ORDER BY partition_id DESC
LIMIT 10;

Using the bq CLI

You can also create partitioned tables and query partition metadata from the command line. This is useful when scripting table creation in data pipelines.

# Create a partitioned table from the command line
bq mk \
  --table \
  --time_partitioning_field event_timestamp \
  --time_partitioning_type DAY \
  --clustering_fields user_id,event_type \
  analytics.events \
  event_id:STRING,user_id:STRING,event_type:STRING,event_timestamp:TIMESTAMP,payload:STRING

How to know it is working

Check bytes processed. After running a query, look at the “Bytes processed” line in the BigQuery console or in the query job metadata. If your partition filter is working, bytes processed should be a fraction of the total table size.

Use dry runs to compare

Run your query as a dry run before executing it. The dry run shows estimated bytes scanned without costing anything. Compare the estimate with and without the partition filter to confirm pruning is working.

-- Dry run to estimate bytes scanned without executing the query
-- In the BigQuery console, click "More" > "Dry run" before running
-- Or use the bq CLI:
-- bq query --dry_run --use_legacy_sql=false 'SELECT ...'

Inspect existing partitions. Use the INFORMATION_SCHEMA.PARTITIONS query from the previous section to verify that data is distributed across partitions as expected. If all data is in a single partition, your partitioning column or granularity may need adjustment.

Avoid accidental full-table scans. Enable require_partition_filter on tables where full scans are never intentional. Monitor the total_bytes_processed field in INFORMATION_SCHEMA.JOBS to catch queries that scan more than expected. If a query is scanning too much data, see troubleshooting BigQuery query too large for guidance.

Partition expiry

Partition expiry automatically deletes partitions older than a specified number of days. This is useful for event tables, logs, and any data with a defined retention window.

-- Set partition expiry to 90 days at table creation
CREATE TABLE analytics.events (
  event_id STRING,
  user_id STRING,
  event_timestamp TIMESTAMP
)
PARTITION BY DATE(event_timestamp)
OPTIONS (partition_expiration_days = 90);

-- Or add it to an existing table
ALTER TABLE analytics.events
SET OPTIONS (partition_expiration_days = 90);

Expired partitions are permanently deleted

Partition expiry is a lifecycle and deletion setting, not a soft delete or archive feature. Once a partition expires, the data is gone. There is no undo, no recycle bin, and no recovery path. Before enabling expiry on a production table, verify that the window matches your data retention requirements and that no downstream process depends on data older than the expiry window.

Common beginner mistakes

Filtering on the wrong column. A query with WHERE user_id = ‘abc’ on a date-partitioned table does not trigger pruning. The filter must be on the partition column. Enable require_partition_filter = true on large tables to catch these queries before they run.
Choosing the wrong partition column. Pick the column your queries filter on most consistently. If queries filter by date 90% of the time, partition by date. If you partition by a column that queries rarely use, you get the metadata overhead of partitioning with none of the pruning benefit.
Making partitions too granular. Partitioning by hour on a low-volume table creates thousands of tiny partitions. Each partition carries metadata overhead and BigQuery limits tables to 4,000 partitions. Match granularity (hour, day, month, year) to your data volume.
Clustering columns in the wrong order. BigQuery sorts by clustering columns left to right. If queries almost always filter on event_type, list it first. Filtering only on the second or third clustering column gives diminished pruning benefit.
Enabling partition expiry without matching retention requirements. Expired partitions are permanently deleted. If a downstream ETL or ELT pipeline depends on historical data, deleting it silently will break that pipeline.
Assuming partitioning can be added in place. You cannot convert an existing non-partitioned table into a partitioned table. You must create a new partitioned table and copy the data into it, then swap the table references.

Frequently asked questions

What is a partitioned table in BigQuery?

A partitioned table is a BigQuery table divided into segments called partitions based on the values in one column. When you query the table and filter on that column, BigQuery reads only the matching partitions and skips the rest. This reduces bytes scanned and lowers query cost.

What is partition pruning?

Partition pruning is the process where BigQuery skips partitions that cannot match your WHERE clause. For a table partitioned by date, filtering on WHERE DATE(event_timestamp) = '2026-03-01' reads only that single partition. Without a filter on the partition column, BigQuery scans every partition in the table.

Partitioning vs clustering: which should I choose?

Use partitioning when queries consistently filter on one column such as a date. Use clustering when queries filter on multiple columns or when the filter column has high cardinality. For large tables, use both: partition on the date column and cluster on the columns you filter most frequently within each partition.

Can I partition an existing table?

Not in place. Partitioning must be set at table creation. To partition an existing non-partitioned table, create a new partitioned table and copy the data using CREATE TABLE ... AS SELECT or bq cp. Then swap the old table for the new one.

Does require_partition_filter improve cost control?

Yes. Setting require_partition_filter to true on a table causes BigQuery to reject any query that does not include a filter on the partition column. This prevents accidental full-table scans, which is especially valuable on large tables where a single unfiltered query can scan terabytes.

Last verified: 26 March 2026 Cloud services change frequently. Verify details against official documentation before making infrastructure decisions.

BigQuery Partitioned Tables: How Pruning Cuts Cost

BigQuery partitioned tables in simple terms

Filing cabinet with labelled drawers

Why partitioning matters

How partition pruning works

Skipping chapters you do not need

Partition types in BigQuery

Ingestion-time partitioning

Time-unit column partitioning

Integer-range partitioning

Partitioning vs clustering

Library floors and shelves

When to use partitioned tables

When partitioning is the wrong tool

How to create and use a partitioned table

Create a partitioned table

Create a partitioned and clustered table

Query with a partition filter

Set require_partition_filter

Inspect partitions with INFORMATION_SCHEMA

Using the bq CLI

How to know it is working

Partition expiry

Common beginner mistakes

Summary

Frequently asked questions

BigQuery Partitioned Tables: How Pruning Cuts Cost

BigQuery partitioned tables in simple terms

Filing cabinet with labelled drawers

Why partitioning matters

How partition pruning works

Skipping chapters you do not need

Partition types in BigQuery

Ingestion-time partitioning

Time-unit column partitioning

Integer-range partitioning

Partitioning vs clustering

Library floors and shelves

When to use partitioned tables

When partitioning is the wrong tool

How to create and use a partitioned table

Create a partitioned table

Create a partitioned and clustered table

Query with a partition filter

Set require_partition_filter

Inspect partitions with INFORMATION_SCHEMA

Using the bq CLI

How to know it is working

Partition expiry

Common beginner mistakes

Summary

Related pages

Frequently asked questions