BigQuery Partitioned Tables: How Pruning Cuts Cost
A partitioned table in BigQuery is a table split into smaller segments based on the values in one column, usually a date. When your query filters on that column, BigQuery reads only the matching partitions and skips everything else. The result is fewer bytes scanned, lower cost, and faster queries on large tables. If your table has more than a few gigabytes and your queries consistently filter on a specific column, partitioning should be one of the first things you set up.
BigQuery partitioned tables in simple terms
Filing cabinet with labelled drawers
A partitioned table works like a filing cabinet where each drawer is labelled with a date. When you need March data, you pull open the March drawer and ignore every other drawer entirely. Without partitioning, every query dumps the entire cabinet onto the floor and sorts through everything.
On a table with two years of daily data, a query for a single day scans all 730 days of rows if the table is not partitioned. With partitioning, the same query touches only one day’s worth of data.
The problem partitioning solves is straightforward: large tables are expensive and slow to scan in full. Partitioning lets BigQuery skip the irrelevant portions automatically, as long as your query filters on the partition column. If you are working with BigQuery for the first time, think of partitioning as the most direct way to control scan volume on growing tables.
Why partitioning matters
BigQuery charges for on-demand queries based on the number of bytes scanned. On a non-partitioned table with two years of event data, a query asking about yesterday scans every byte in the table. On a date-partitioned table, the same query reads only yesterday’s partition.
A 10 TB event table at on-demand pricing ($6.25 per TB scanned) costs $62.50 per full-table scan. If yesterday’s partition holds roughly 14 GB, the same query costs about $0.09. That is a 700x cost reduction from a single WHERE clause.
Over hundreds of queries per day, those savings compound fast. For a deeper look at how scan volume affects your bill, see BigQuery pricing explained.
Performance improves for the same reason. Fewer bytes to read means fewer workers needed and faster results. On large production tables, partitioning is not a nice-to-have. It is a baseline requirement.
How partition pruning works
Partition pruning is the mechanism that makes partitioning useful. When BigQuery compiles your query, it checks whether your WHERE clause filters on the partition column. If it does, BigQuery identifies which partitions match and reads only those. Every other partition is skipped entirely.
Skipping chapters you do not need
Partition pruning is like using a book’s table of contents to jump straight to Chapter 7 instead of reading every page from the start. BigQuery checks which “chapters” (partitions) your query needs, opens only those, and leaves the rest closed.
Here is a query that prunes correctly on a table partitioned by DATE(event_timestamp):
-- Partition pruning: reads only the 2026-03-01 partition
SELECT user_id, event_type, COUNT(*) AS events
FROM analytics.events
WHERE DATE(event_timestamp) = '2026-03-01'
GROUP BY user_id, event_type;And here is a query on the same table that does not prune:
-- No pruning: scans every partition in the table
SELECT user_id, event_type, COUNT(*) AS events
FROM analytics.events
WHERE user_id = 'usr-5678'
GROUP BY user_id, event_type;The second query filters on user_id, not on the partition column. BigQuery has no way to know which partitions contain rows for that user, so it must open and scan all of them. The filter still reduces the result set, but it does not reduce the bytes scanned or the cost.
Pruning only works with direct filters on the partition column. Wrapping the column in a function that BigQuery cannot evaluate at compile time, or filtering through a subquery, may prevent pruning.
Partition types in BigQuery
Ingestion-time partitioning
Rows are assigned to partitions based on when BigQuery ingested them. The partition column is a pseudo-column called _PARTITIONTIME.
When to use it: when you do not have a reliable timestamp column in your data, or when you are streaming data and want automatic daily partitions without schema changes.
-- Create a table with ingestion-time partitioning
CREATE TABLE analytics.raw_logs (
log_message STRING,
severity STRING
)
PARTITION BY _PARTITIONTIME;Ingestion-time partitioning is tied to load time, not event time. If a row describing a March 1 event arrives on March 3, it goes into the March 3 partition. This can make date-filtered queries return incomplete results for a given day.
Time-unit column partitioning
Rows are partitioned by the value of a top-level DATE, TIMESTAMP, or DATETIME column in your schema. You choose the granularity: hour, day, month, or year.
This is the most common partitioning strategy. Use it when your data has a reliable event timestamp and your queries consistently filter on that timestamp. It gives you precise control over which column drives partitioning.
-- Create a table partitioned by day on a TIMESTAMP column
CREATE TABLE analytics.events (
event_id STRING,
user_id STRING,
event_type STRING,
event_timestamp TIMESTAMP
)
PARTITION BY DATE(event_timestamp);Tradeoff: you must choose the right granularity. Partitioning by hour on a table that receives only a few hundred rows per hour creates many tiny partitions with high metadata overhead. Match granularity to your data volume and query patterns.
Integer-range partitioning
Rows are partitioned by an INTEGER column using a defined range and interval. For example, partition a customer_id column into ranges of 10,000.
When to use it: when your natural query filter is an integer key rather than a date. This is less common but useful for tables where every query filters on an ID range.
-- Create a table with integer-range partitioning
CREATE TABLE analytics.customer_metrics (
customer_id INT64,
metric_name STRING,
metric_value FLOAT64
)
PARTITION BY RANGE_BUCKET(customer_id, GENERATE_ARRAY(0, 1000000, 10000));Tradeoff: you must define the range boundaries upfront. Values outside the defined range go into an overflow partition. If your ID space grows beyond the defined range, the overflow partition accumulates data and pruning becomes less effective for those rows.
Partitioning vs clustering
Partitioning and clustering both reduce bytes scanned, but they work at different levels. Partitioning divides the table into separate physical segments that BigQuery can skip entirely. Clustering sorts the rows within each partition by up to four columns, allowing BigQuery to skip blocks of rows within a partition.
Library floors and shelves
Partitioning is like a library with separate floors for each year. Clustering is like alphabetising the books on each floor. When you need a 2024 book by an author starting with “M”, you go to the 2024 floor (partitioning skips all other floors) and then jump straight to the M section (clustering skips irrelevant shelves).
| Partitioning | Clustering | |
|---|---|---|
| Best for | Filtering on one column (usually date) | Filtering on multiple columns or high-cardinality values |
| Pruning granularity | Entire partitions | Blocks within a partition |
| Ideal filter pattern | Equality or range on the partition column | Equality or range on the first clustering column |
| Cost estimate before execution | Accurate (BigQuery knows partition sizes) | Upper-bound only (block-level pruning happens at runtime) |
| When it breaks down | Too many small partitions increase metadata overhead | Filtering only on the 2nd or 3rd clustering column reduces benefit |
| Use both together | Partition by date, then cluster by the columns you filter on most within each partition. This gives partition-level pruning plus block-level pruning. | |
BigQuery sorts rows by clustering columns left to right. The first clustering column provides the most pruning benefit. If your queries almost always filter on event_type and only sometimes on user_id, list event_type first.
For more on combining these techniques, see BigQuery performance optimisation.
When to use partitioned tables
- Event and clickstream tables where queries almost always filter by date range
- Daily fact tables in a data warehouse that are queried by business date
- Billing and export data such as BigQuery billing export tables that grow continuously and are queried by invoice period
- Log and time-series analytics where retention windows drive queries to recent data
- Large append-heavy analytics tables that grow by millions of rows per day and are queried for recent windows
Consider partitioning any table over a few gigabytes where queries consistently filter on one column. If that column is a date or timestamp, you are in the sweet spot for time-unit partitioning.
When partitioning is the wrong tool
Many tiny partitions. If your partition column has extremely high cardinality or very low data volume per partition, you end up with thousands of small partitions. Each partition carries metadata overhead, and BigQuery has a limit of 4,000 partitions per partitioned table. In these cases, clustering alone may be a better fit.
Multi-column or high-cardinality filters. If your queries filter on several columns and rarely use a single dominant column, clustering across those columns often provides better pruning than partitioning on one of them.
Queries rarely filter on the partition column. Partitioning only helps when queries include a filter on the partition column. If most of your queries use full-table aggregations or filter on unrelated columns, partitioning adds complexity without benefit. For these workloads, focus on other performance techniques such as column selection and approximate aggregation.
How to create and use a partitioned table
Create a partitioned table
CREATE TABLE analytics.events (
event_id STRING,
user_id STRING,
event_type STRING,
event_timestamp TIMESTAMP,
payload STRING
)
PARTITION BY DATE(event_timestamp);Create a partitioned and clustered table
Combining partitioning and clustering gives you both partition-level and block-level pruning. If you are loading data into BigQuery for analytics, this is the recommended table setup for large event tables.
CREATE TABLE analytics.events
PARTITION BY DATE(event_timestamp)
CLUSTER BY user_id, event_type
AS SELECT * FROM staging.raw_events;Query with a partition filter
-- Prunes to one day's partition, then uses clustering to skip blocks
SELECT user_id, COUNT(*) AS event_count
FROM analytics.events
WHERE DATE(event_timestamp) = '2026-03-01'
AND event_type = 'purchase'
GROUP BY user_id;Set require_partition_filter
This option causes BigQuery to reject queries that do not filter on the partition column. It is a cost-control guardrail that prevents accidental full-table scans.
-- Enable require_partition_filter on an existing table
ALTER TABLE analytics.events
SET OPTIONS (require_partition_filter = true);Once require_partition_filter is enabled, any query that omits a filter on the partition column fails immediately. This includes ad-hoc exploration queries. Make sure your team knows this setting is active before enabling it on shared tables.
-- This query would be rejected with require_partition_filter enabled
SELECT COUNT(*) FROM analytics.events WHERE event_type = 'purchase';Inspect partitions with INFORMATION_SCHEMA
You can check which partitions exist and how large they are using INFORMATION_SCHEMA.PARTITIONS.
SELECT
partition_id,
total_rows,
total_logical_bytes
FROM `project.analytics.INFORMATION_SCHEMA.PARTITIONS`
WHERE table_name = 'events'
ORDER BY partition_id DESC
LIMIT 10;Using the bq CLI
You can also create partitioned tables and query partition metadata from the command line. This is useful when scripting table creation in data pipelines.
# Create a partitioned table from the command line
bq mk \
--table \
--time_partitioning_field event_timestamp \
--time_partitioning_type DAY \
--clustering_fields user_id,event_type \
analytics.events \
event_id:STRING,user_id:STRING,event_type:STRING,event_timestamp:TIMESTAMP,payload:STRINGHow to know it is working
Check bytes processed. After running a query, look at the “Bytes processed” line in the BigQuery console or in the query job metadata. If your partition filter is working, bytes processed should be a fraction of the total table size.
Run your query as a dry run before executing it. The dry run shows estimated bytes scanned without costing anything. Compare the estimate with and without the partition filter to confirm pruning is working.
-- Dry run to estimate bytes scanned without executing the query
-- In the BigQuery console, click "More" > "Dry run" before running
-- Or use the bq CLI:
-- bq query --dry_run --use_legacy_sql=false 'SELECT ...'Inspect existing partitions. Use the INFORMATION_SCHEMA.PARTITIONS query from the previous section to verify that data is distributed across partitions as expected. If all data is in a single partition, your partitioning column or granularity may need adjustment.
Avoid accidental full-table scans. Enable require_partition_filter on tables where full scans are never intentional. Monitor the total_bytes_processed field in INFORMATION_SCHEMA.JOBS to catch queries that scan more than expected. If a query is scanning too much data, see troubleshooting BigQuery query too large for guidance.
Partition expiry
Partition expiry automatically deletes partitions older than a specified number of days. This is useful for event tables, logs, and any data with a defined retention window.
-- Set partition expiry to 90 days at table creation
CREATE TABLE analytics.events (
event_id STRING,
user_id STRING,
event_timestamp TIMESTAMP
)
PARTITION BY DATE(event_timestamp)
OPTIONS (partition_expiration_days = 90);
-- Or add it to an existing table
ALTER TABLE analytics.events
SET OPTIONS (partition_expiration_days = 90);Partition expiry is a lifecycle and deletion setting, not a soft delete or archive feature. Once a partition expires, the data is gone. There is no undo, no recycle bin, and no recovery path. Before enabling expiry on a production table, verify that the window matches your data retention requirements and that no downstream process depends on data older than the expiry window.
Common beginner mistakes
- Filtering on the wrong column. A query with
WHERE user_id = ‘abc’on a date-partitioned table does not trigger pruning. The filter must be on the partition column. Enablerequire_partition_filter = trueon large tables to catch these queries before they run. - Choosing the wrong partition column. Pick the column your queries filter on most consistently. If queries filter by date 90% of the time, partition by date. If you partition by a column that queries rarely use, you get the metadata overhead of partitioning with none of the pruning benefit.
- Making partitions too granular. Partitioning by hour on a low-volume table creates thousands of tiny partitions. Each partition carries metadata overhead and BigQuery limits tables to 4,000 partitions. Match granularity (hour, day, month, year) to your data volume.
- Clustering columns in the wrong order. BigQuery sorts by clustering columns left to right. If queries almost always filter on
event_type, list it first. Filtering only on the second or third clustering column gives diminished pruning benefit. - Enabling partition expiry without matching retention requirements. Expired partitions are permanently deleted. If a downstream ETL or ELT pipeline depends on historical data, deleting it silently will break that pipeline.
- Assuming partitioning can be added in place. You cannot convert an existing non-partitioned table into a partitioned table. You must create a new partitioned table and copy the data into it, then swap the table references.
Summary
- A partitioned table divides data into segments based on one column. Partition pruning skips segments that do not match your WHERE clause, reducing bytes scanned and cost.
- Three partition types: ingestion-time, time-unit column (DATE/TIMESTAMP/DATETIME), and integer-range. Time-unit column partitioning on your event timestamp is the most common approach.
- Clustering sorts rows within partitions by up to four columns, enabling block-level pruning. Column order matters: the first clustering column provides the most benefit.
- Use partitioning and clustering together on large tables for maximum cost and performance improvement.
- Enable
require_partition_filter = trueto prevent accidental full-table scans. - Partition expiry permanently deletes old partitions. Set it to match your retention policy and verify no downstream process depends on the expired data.
- Partitioning must be set at table creation. To partition an existing table, create a new partitioned table and copy the data.
Frequently asked questions
What is a partitioned table in BigQuery?
A partitioned table is a BigQuery table divided into segments called partitions based on the values in one column. When you query the table and filter on that column, BigQuery reads only the matching partitions and skips the rest. This reduces bytes scanned and lowers query cost.
What is partition pruning?
Partition pruning is the process where BigQuery skips partitions that cannot match your WHERE clause. For a table partitioned by date, filtering on WHERE DATE(event_timestamp) = '2026-03-01' reads only that single partition. Without a filter on the partition column, BigQuery scans every partition in the table.
Partitioning vs clustering: which should I choose?
Use partitioning when queries consistently filter on one column such as a date. Use clustering when queries filter on multiple columns or when the filter column has high cardinality. For large tables, use both: partition on the date column and cluster on the columns you filter most frequently within each partition.
Can I partition an existing table?
Not in place. Partitioning must be set at table creation. To partition an existing non-partitioned table, create a new partitioned table and copy the data using CREATE TABLE ... AS SELECT or bq cp. Then swap the old table for the new one.
Does require_partition_filter improve cost control?
Yes. Setting require_partition_filter to true on a table causes BigQuery to reject any query that does not include a filter on the partition column. This prevents accidental full-table scans, which is especially valuable on large tables where a single unfiltered query can scan terabytes.