BigQuery Architecture Explained: How Storage, Compute, and Slots Work

BigQuery splits storage and compute into two independent systems. Your data sits in Colossus, Google’s distributed file system, stored in a columnar format. Your queries run on Dremel, a parallel engine that fans work across thousands of workers called slots. The two sides communicate over Jupiter, Google’s internal petabit network. This page explains how those pieces fit together, why the design makes BigQuery fast for analytics, and where it falls short.

After reading this, you will understand how a query moves from SQL text to a finished result, why SELECT * costs more than selecting specific columns, and when BigQuery is the wrong tool for your workload. If you are new to BigQuery itself, start with the BigQuery Overview first.

Simple explanation

BigQuery stores data separately from the engine that runs queries. That is the core idea.

Data is stored in columns, not rows. When your query asks for three columns out of a hundred, BigQuery reads only those three and skips the rest. This is why analytical queries that aggregate or filter a few columns across billions of rows finish quickly.

When a query runs, BigQuery breaks it into pieces and hands each piece to a different worker. Thousands of workers run at the same time, each reading a slice of the data. The results get combined at the end. These workers are called slots.

This design is excellent for scanning large datasets. It is poor for finding a single row by ID, because there are no row-level indexes. BigQuery has to scan the entire column to find one value.

Why this architecture matters

You do not need to understand BigQuery’s internals to write SQL. But you do need to understand them to write efficient SQL. Knowing why SELECT * is expensive, why partition filters speed up queries, and why point lookups are slow shapes better design decisions before you write a single line.

Architecture knowledge also affects cost. BigQuery on-demand pricing charges per byte scanned. Reading fewer columns and pruning partitions directly reduces your bill. The BigQuery Pricing Explained page covers this in detail, but the pricing model only makes sense once you understand the storage format.

Cost connection

Every architectural concept on this page has a direct cost consequence. Columnar storage means fewer columns = fewer bytes scanned = lower bill. Partition pruning means fewer partitions read = lower bill. Understanding the architecture is understanding the pricing. See BigQuery Cost Optimisation for practical techniques.

It also guides debugging. If a query is slower than expected, the architecture tells you where to look: slot contention, data skew, missing partition filters, or a cross join producing unexpected fanout.

How BigQuery architecture works

Storage layer: Colossus and Capacitor

BigQuery stores all table data in Colossus, Google’s distributed file system. Colossus handles replication, durability, and geo-distribution transparently. You never interact with it directly.

Inside Colossus, data is stored in Capacitor, a proprietary columnar format. In a columnar format, all values for a single column are stored together rather than each row together. If your query reads three columns from a 100-column table, BigQuery reads only the physical data for those three columns and skips the other 97 entirely.

Capacitor also compresses data per column. Values within a column share a type and often have repeated patterns, so compression ratios are typically 4 to 10 times better than raw CSV. This reduces the volume of data read from disk further still. When you load data into BigQuery, it converts your source format into Capacitor automatically.

Compute layer: Dremel and slots

Dremel is the query engine that powers BigQuery. It is a massively parallel processing (MPP) engine that breaks a query into a tree of sub-tasks and distributes them across thousands of workers simultaneously. This is why BigQuery can scan terabytes in seconds: the work is divided across an enormous fleet of machines.

The unit of compute in Dremel is a slot. A slot is roughly equivalent to a virtual CPU with some associated memory. BigQuery automatically determines how many slots to assign to each query. On on-demand pricing, queries draw from a shared pool. On capacity pricing, you purchase reserved slots used only by your project.

A single large query may consume thousands of slots in parallel for a few seconds, then release them. You do not pre-allocate per query; BigQuery handles scheduling transparently.

Why storage and compute are separated

The most important architectural aspect of BigQuery is that Colossus and Dremel are completely separate systems connected by Jupiter, Google’s internal petabit-speed network fabric.

This separation means you can scale each dimension independently. You can store petabytes of data without provisioning any compute. You can run a burst of heavy queries without paying for additional storage. There is no cluster to size, no nodes to add, and no data redistribution when you scale.

Idle time costs almost nothing for compute. Traditional MPP warehouses require you to keep a cluster running even when no queries execute. BigQuery charges for storage continuously, but compute only when a query runs. For a deeper look at how this affects your bill, see BigQuery Cost Optimisation.

Why the network is not a bottleneck

The Jupiter network between Colossus and Dremel workers is fast enough that reading data over the network does not slow things down. This is why the architecture works at scale: the network does not become the constraint even for multi-terabyte scans.

How a query executes step by step

Here is what happens from the moment you submit SQL to the moment you get results:

  1. SQL submitted. You send a query through the console, CLI, or API. BigQuery receives it.
  2. Query parsed and optimised. The query planner parses the SQL, validates it against table schemas, and produces an optimised execution plan. It decides which partitions to read and which columns to scan.
  3. Stages created. The plan is broken into a tree of stages. Each stage represents a distinct operation: scan, filter, aggregate, join, or sort. Stages form a directed acyclic graph.
  4. Slots assigned. Dremel assigns slot workers to each stage based on the data volume and available capacity. A single stage can receive hundreds or thousands of slots.
  5. Workers read column data. Each slot worker reads its assigned slice of column data directly from Colossus. Because data is columnar, workers read only the columns the query needs. Partition filters eliminate entire partitions before reading begins.
  6. Intermediate results shuffled. Between stages, partial results are redistributed across workers. For example, after scanning, data may be repartitioned by a join key so each worker handles one key range.
  7. Final result aggregated and returned. The root stage collects partial results from all workers, performs any remaining aggregation, and returns the final output to you.

You can inspect this process after any query runs. The query plan explanation in the BigQuery console shows each stage, how many workers ran, how much data each stage read and wrote, and where time was spent. It is your primary tool for diagnosing slow queries. See BigQuery Performance Optimisation for guidance on reading execution plans.

Why BigQuery is fast for analytics but slow for point lookups

BigQuery’s architecture is optimised for one pattern: reading large amounts of columnar data in parallel. Thousands of slots each read a slice of the relevant columns simultaneously. For aggregate queries like COUNT, SUM, or AVG across billions of rows, this parallelism makes BigQuery extremely fast.

Point lookups are the opposite pattern. To find a single row where user_id = ‘abc123’, BigQuery has no row-level index to jump to that row. It scans the entire user_id column across the table. On a billion-row table, that means reading a billion values to find one.

Partitioning and clustering help but are coarse-grained. A date partition filter might eliminate 364 out of 365 days. Clustering on user_id lets BigQuery skip blocks that cannot contain the target value. But within a matching block, it still scans every value. This is fundamentally different from a B-tree index in a relational database, which can locate a row in microseconds.

A practical example: why column and partition pruning matter

Suppose you have a table called events with 100 columns, partitioned by event_date, containing a year of data. Consider two queries:

Expensive query
SELECT * FROM events

This reads all 100 columns across all 365 partitions. BigQuery charges for every byte scanned. With a 500 GB table, you pay for the full 500 GB.

Efficient query
SELECT user_id, event_type, COUNT(*)
FROM events
WHERE event_date = ‘2026-03-01’
GROUP BY user_id, event_type

This reads only two columns (user_id and event_type) from one partition (a single day). Instead of 500 GB, it might scan 200 MB. The query runs faster and costs a fraction of the price.

Now consider a point lookup on the same table:

SELECT * FROM events WHERE user_id = ‘abc123’

Even though you want one row, BigQuery has to scan the user_id column across every partition (no date filter) and every column (SELECT *). Without a partition filter or column selection, this query is slow and expensive despite returning almost no data. For workloads that need fast single-row access, Bigtable or Cloud SQL are better choices.

When to use BigQuery

BigQuery is the right choice when your workload matches its architecture:

  • Analytics and reporting. Aggregating, filtering, and joining large datasets for dashboards, reports, or ad hoc exploration.
  • Data warehousing. Storing historical data from multiple sources and running ELT pipelines to transform it in place.
  • Large scans. Queries that read millions or billions of rows to produce summaries, trends, or segments.
  • Batch processing. Scheduled queries, materialized views, and periodic transformations on warehouse or lake data.
  • Machine learning. BigQuery ML lets you train models directly in the warehouse using SQL.
  • Streaming ingestion with analytics. Combining real-time inserts with analytical queries, especially when paired with Dataflow pipelines.

When not to use BigQuery

BigQuery is the wrong tool when your workload conflicts with its architecture:

  • Low-latency point lookups. Fetching a single row by key in under 10 milliseconds is not what BigQuery does. Use Bigtable or Cloud SQL.
  • OLTP workloads. High-frequency row-level inserts, updates, and deletes with transactional guarantees require a relational database, not a warehouse.
  • Application backends. If your web or mobile app needs to read/write individual records per request, BigQuery’s query latency (typically 1-3 seconds minimum) is too high.
  • Row-by-row transactions. BigQuery does not support multi-row ACID transactions the way Cloud SQL or Spanner does.
  • Small, frequently updated datasets. A table with 10,000 rows that gets updated every second is better served by a traditional database.
A common trap

Teams sometimes use BigQuery as an application backend because the SQL interface feels familiar. This works in development with small datasets, but breaks in production when query latency and concurrency limits become a problem. Choose the right database for the access pattern before you build around it.

BigQuery vs Cloud SQL vs Bigtable

These three services cover different workload shapes. Choosing the wrong one leads to either poor performance or unnecessary cost. For a detailed breakdown, see BigQuery vs Cloud SQL.

BigQueryCloud SQLBigtable
Best workloadAnalytics, reporting, data warehousingOLTP, application backends, transactional dataHigh-throughput reads/writes, time-series, IoT
Query patternFull-table scans, aggregations, joins across large datasetsPoint lookups, indexed queries, joins on small-to-medium tablesRow-key lookups, range scans on sorted keys
Typical latency1-30 seconds per query1-100 milliseconds per queryUnder 10 milliseconds per read/write
Storage modelColumnar (Capacitor in Colossus), storage separate from computeRow-oriented (MySQL/PostgreSQL), storage on attached disksWide-column (sorted by row key), distributed across nodes
Compute modelServerless slots, no provisioningProvisioned VM instancesProvisioned node clusters
Wrong choice whenYou need sub-second lookups or row-level transactionsYou need to scan terabytes for analyticsYou need SQL, joins, or complex aggregations

Common beginner mistakes

  1. Treating BigQuery like a row-oriented database. Filtering on a non-partition, non-clustered column still requires a full column scan. A WHERE user_id = ‘abc’ clause does not make a query fast unless user_id is a partition or clustering column. If you need indexed lookups, use Cloud SQL or Bigtable.
  2. Expecting WHERE on arbitrary columns to behave like an indexed query. In a relational database, you add an index and the query planner uses it. BigQuery has no equivalent. Partition pruning and clustering are the closest tools, but they work on specific columns you define at table creation, not on any column in a WHERE clause.
  3. Not understanding slot contention vs on-demand capacity. On the on-demand tier, many heavy concurrent queries compete for a shared slot pool. This causes queue latency, not a pricing problem. Capacity pricing with dedicated slots gives you predictable throughput, but you pay for those slots whether you use them or not. Calculate the trade-off before switching.
  4. Confusing BigQuery storage with Cloud Storage. BigQuery manages its own internal storage in Colossus. Cloud Storage is a separate service. You can load data from Cloud Storage into BigQuery or query it via external tables, but the two storage systems are distinct. Data in Cloud Storage is not automatically queryable by BigQuery.
  5. Skipping the execution plan after a slow query. The query plan explanation in the console shows exactly which stage consumed the most time and data. It tells you whether the problem is data skew, insufficient pruning, an inefficient join, or slot contention. Skipping it means guessing instead of diagnosing.

Frequently asked questions

What are BigQuery slots?

A slot is a unit of compute in BigQuery, roughly equivalent to a virtual CPU with some memory. Dremel assigns slots to queries automatically. On-demand pricing draws from a shared pool. Capacity pricing gives you a reserved number of slots dedicated to your project. More slots means more parallel workers on your query.

Why is BigQuery fast for scans but slow for point lookups?

BigQuery is built around columnar storage and massively parallel reads. Full-column scans spread across thousands of workers finish in seconds. But fetching a single row by an arbitrary key requires scanning the entire column because there are no row-level indexes. For point lookups, use Bigtable or Cloud SQL instead.

What is Colossus?

Colossus is Google's distributed file system. BigQuery stores all table data in Colossus, completely separate from the compute layer. Colossus handles replication, durability, and geo-distribution. You never interact with it directly.

What is Dremel?

Dremel is the massively parallel query engine behind BigQuery. It breaks each query into a tree of stages, assigns slot workers to each stage, and fans work out across thousands of machines simultaneously. Dremel is why BigQuery can process terabytes in seconds.

Does BigQuery have indexes?

BigQuery does not have traditional row-level indexes like a relational database. Instead, it relies on partitioning and clustering for data pruning. Partition filters eliminate entire date ranges. Clustering sorts data within partitions so BigQuery can skip irrelevant blocks. These are coarse-grained compared to B-tree indexes but effective for analytical workloads.

Last verified: 26 March 2026 Cloud services change frequently. Verify details against official documentation before making infrastructure decisions.