What Is BigQuery? Architecture, Pricing, and Use Cases

BigQuery is Google Cloud’s serverless data warehouse. You write SQL, BigQuery distributes the work across thousands of machines, and you get results on billions of rows in seconds. There are no servers to provision, no indexes to create, and no clusters to manage. You pay for the data you store and the data your queries scan.

BigQuery is the right tool when you need to analyse large volumes of data: dashboards, historical reporting, event analytics, business intelligence. It is the wrong tool when you need fast single-row lookups, high-frequency transactional writes, or sub-10ms response times. For those workloads, Cloud SQL or Cloud Spanner is a better fit.

BigQuery in simple terms

Analogy

A traditional database is like a filing cabinet. Each drawer holds a complete folder for one record, and you can pull a single folder out quickly by its label. BigQuery is more like a research team in a warehouse full of spreadsheets. Ask it to find one specific receipt and it is slow. Ask it to total up every transaction from the last three years and it finishes before the filing cabinet has opened its second drawer.

BigQuery stores each column of data separately. When you run a query, it reads only the columns you asked for across the entire dataset at once. That design makes it slow at finding one specific record but extremely fast at summarising millions of them.

You interact with BigQuery using standard SQL. If you have written SELECT, JOIN, and GROUP BY queries before, you already know the language. The difference is in how BigQuery executes your query and how you are billed for it.

What BigQuery is

BigQuery is an OLAP (Online Analytical Processing) system. It is designed for queries that read large portions of a table and return aggregated results: totals, averages, counts, distributions, and trends over time.

Unlike traditional data warehouses that require you to provision hardware and manage clusters, BigQuery is fully serverless. Google handles capacity, replication, and scaling. You define a schema, load your data, and run queries. That separation of storage and compute is what makes BigQuery different from running PostgreSQL or MySQL on a managed instance.

BigQuery uses GoogleSQL, which is ANSI SQL compliant. There is no proprietary query language to learn. Standard SQL skills transfer directly.

Note

OLAP means your queries read lots of rows and crunch them into summaries. OLTP (the opposite) means your queries read or write one row at a time. BigQuery is OLAP only. If you need OLTP, look at Cloud SQL or Cloud Spanner.

How BigQuery works

Columnar storage

BigQuery stores data in a columnar format called Capacitor. Instead of writing each row as a unit, it stores all values for a single column together on disk. When your query asks for five columns from a 100-column table, BigQuery reads only those five columns and skips the other 95 entirely.

This is why SELECT * is expensive. It forces BigQuery to read every column, even the ones you do not need. On a wide table, that can cost 10x to 20x more than selecting only the columns your query actually uses.

Analogy

Imagine a library where every book’s chapters are stored on separate shelves: all Chapter 1s on one shelf, all Chapter 2s on the next, and so on. If you only need Chapter 3, you grab it from one shelf and ignore the rest. SELECT * is like photocopying every chapter of every book when you only needed one.

Serverless execution

When you submit a query, BigQuery’s execution engine (based on the Dremel architecture) breaks it into stages and distributes those stages across thousands of workers. You do not choose how many workers to use or manage any infrastructure. BigQuery allocates compute automatically and releases it when the query finishes. For a deeper look at how Dremel, Colossus, and slots fit together, see BigQuery Architecture.

Projects, datasets, and tables

BigQuery organises data in a three-level hierarchy. A project contains one or more datasets, and each dataset contains tables and views. Datasets are the primary unit of access control and have a geographic location (for example, europe-west2) that is set at creation and cannot be changed afterward.

A fully qualified table reference looks like project_id.dataset_id.table_id. In SQL, write it with backticks when the project ID contains a hyphen: my-project.analytics.events.

Billing by bytes scanned

Under on-demand pricing, you are charged based on the number of bytes your query reads from storage. This is the most important mental model for controlling costs in BigQuery. Fewer columns and tighter partition filters mean fewer bytes scanned and a lower bill.

Warning

Adding LIMIT 10 to a query does not reduce bytes scanned. BigQuery reads the full data before applying the limit. To actually reduce cost, use partition filters and select only the columns you need.

Tip

Run a dry run before executing a query on an unfamiliar table. It tells you exactly how many bytes BigQuery will scan before you incur any cost. Use bq query —dry_run from the CLI or check the query validator in the console.

When to use BigQuery

BigQuery is built for analytical workloads that read large amounts of data. Common use cases include:

  • Business intelligence and dashboards. Connecting tools like Looker, Data Studio, or Metabase to BigQuery for daily, weekly, or ad-hoc reporting across millions of rows.
  • Event and clickstream analytics. Storing billions of user events (page views, clicks, purchases) and querying them for funnels, cohorts, and retention analysis.
  • ELT pipelines. Loading raw data into BigQuery first, then transforming it with SQL. This ELT approach takes advantage of BigQuery’s compute to handle transformations at scale.
  • Historical analysis. Querying months or years of accumulated data for trends, forecasts, or anomaly detection.
  • Data exploration. Running ad-hoc SQL against datasets you have not seen before to understand structure, quality, and patterns.
  • Cross-source joins. Joining data from multiple sources (application logs, CRM exports, ad platform data) in one place for unified analysis.
Tip

If your use case involves the phrase “aggregate across all rows” or “report on the last 12 months,” BigQuery is almost certainly the right tool. If it involves “look up this one user’s profile” or “update this row right now,” it is not.

When not to use BigQuery

BigQuery is not designed for every database workload. Avoid it for:

  • Transactional application queries. If your application inserts, updates, or deletes individual rows at high frequency, use Cloud SQL or Cloud Spanner. BigQuery is not optimised for OLTP patterns.
  • Frequent point lookups. Retrieving a single row by primary key is not what BigQuery is built for. A columnar scan engine adds unnecessary overhead for simple key-value access.
  • Low-latency operational workloads. If you need sub-10ms response times for your application’s read path, BigQuery’s query startup time alone will exceed that. Use a purpose-built operational database or cache layer instead.
  • High-write streaming without buffering. While BigQuery supports streaming inserts, writing individual rows from application code is expensive and slow compared to batching. The canonical GCP streaming pattern is Pub/Sub to Dataflow to BigQuery, not direct row-by-row inserts.
Warning

A common early mistake is using BigQuery as your application’s primary database. If your web app writes a row on every page view and reads it back on the next request, you will hit latency and cost problems quickly. Use an OLTP database for the live path and replicate data into BigQuery for analysis.

BigQuery vs Cloud SQL

This is one of the most common questions beginners ask. The short answer: BigQuery is for analytics, Cloud SQL is for applications. They solve different problems and are often used together in the same project.

DimensionBigQueryCloud SQL
Primary use caseAnalytics, reporting, BI (OLAP)Application backends, transactions (OLTP)
Query patternLarge scans, aggregations across millions of rowsSmall reads/writes by primary key
InfrastructureFully serverless, no instance to manageManaged instance (you choose CPU, RAM, disk)
SQL dialectGoogleSQL (ANSI compliant)MySQL, PostgreSQL, or SQL Server
Billing modelPer TB scanned (on-demand) or reserved slotsPer hour the instance runs + storage
LatencySeconds (query startup overhead)Milliseconds for indexed lookups
ScalingAutomatic, no upper boundVertical (resize instance) or read replicas
IndexesNone (uses columnar scans and partitions)Standard B-tree and other index types

A common GCP pattern is to run your application on Cloud SQL for transactional reads and writes, then export or replicate data into BigQuery for analytics and reporting. For a detailed comparison, see BigQuery vs Cloud SQL. If you need globally distributed transactions, Cloud Spanner fills the gap between Cloud SQL and BigQuery. For help choosing the right database for your workload, see Choosing the Right Storage Service.

Pricing basics

BigQuery separates storage cost from compute cost. You pay for each independently.

Storage costs $0.02 per GB per month for active tables. Tables (or partitions) that have not been modified for 90 consecutive days automatically drop to $0.01 per GB per month. The first 10 GB of storage each month is free.

Compute (on-demand) charges $6.25 per TB of data scanned by your queries. The first 1 TB of scanning per month is free. This is the model most beginners and small teams should start with.

Compute (capacity pricing) lets you purchase reserved compute slots at a flat rate. This suits teams with heavy, predictable query workloads where per-TB pricing would be more expensive. Most teams do not need this on day one.

Analogy

Think of on-demand pricing like a pay-per-ride taxi: you pay for each trip based on distance. Capacity pricing is like leasing a car: you pay a fixed monthly rate regardless of how much you drive. If you only take a few trips a week, the taxi is cheaper. If you are driving all day every day, the lease saves money.

The practical takeaway: your biggest cost lever is controlling how many bytes your queries scan. Select only the columns you need, filter on partitioned columns, and use dry runs to check scan size before executing. For a complete breakdown of pricing tiers and cost-reduction strategies, see BigQuery Pricing Explained.

Common beginner mistakes

  1. Using SELECT * on large tables. BigQuery’s columnar storage means unused columns cost nothing to skip, but SELECT * forces a full scan of every column. On a 50-column table where your query needs 5, you pay for 10x more data than necessary. Always list the columns you need.
  2. Treating BigQuery like a transactional database. Inserting rows one at a time from application code is expensive and slow. BigQuery is for analytics on accumulated data, not live application writes. Use Cloud SQL for your application’s write path and load data into BigQuery for analysis.
  3. Ignoring dataset location. Dataset location is set at creation and cannot be changed. If your dataset is in us-central1 but your other resources are in europe-west2, you will pay cross-region egress costs on every query. Choose the right location before you load any data.
  4. Assuming LIMIT reduces cost. LIMIT 10 limits rows returned, not bytes scanned. BigQuery still performs a full scan before applying the limit. Use partition filters and column selection to reduce actual scan volume.
  5. Skipping dry runs on unfamiliar tables. Before running a query against a table you have not seen before, run bq query —dry_run to see the scan size. A single careless query on a multi-terabyte table can cost more than a month of careful usage.
  6. Not partitioning large tables. Without partitions, every query scans the entire table. Partitioning by date or another high-cardinality column lets BigQuery skip irrelevant data entirely. See BigQuery Performance Optimisation for more strategies.

How to get started

The fastest path from zero to a working query:

  1. Open the BigQuery console in any GCP project (the sandbox is free and does not require a billing account for small queries).
  2. Run a query against a public dataset like bigquery-public-data.samples.shakespeare to see how the interface works.
  3. Follow Running Your First Query for a guided walkthrough.
  4. Load your own data using one of the methods in Loading Data into BigQuery.
  5. Learn how to control costs with partitioned tables and the strategies in BigQuery Pricing Explained.

Frequently asked questions

How do I run my first BigQuery query?

Open the BigQuery console in your GCP project, paste a SQL statement into the editor, and click Run. The public dataset bigquery-public-data.samples.shakespeare is available to everyone and costs nothing to query. See the Running Your First Query guide for a step-by-step walkthrough.

How much does BigQuery cost for a small team?

On-demand pricing gives you 1 TB of free query scanning per month and 10 GB of free storage. A small team running a few dashboards will often stay within the free tier. After that, queries cost $6.25 per TB scanned and active storage costs $0.02 per GB per month.

Can I use BigQuery for transactional application queries?

No. BigQuery is an analytics engine designed for large reads and aggregations. It is not suited for high-frequency writes, point lookups by primary key, or sub-10ms latency requirements. Use Cloud SQL or Cloud Spanner for transactional workloads.

How do I get data into BigQuery?

The most common methods are batch loading files from Cloud Storage (CSV, JSON, Avro, Parquet), streaming inserts via the API, and the BigQuery Data Transfer Service for sources like Google Analytics or Google Ads. See Loading Data into BigQuery for the full list of options.

Do my existing SQL skills work in BigQuery?

Yes. BigQuery uses GoogleSQL, which is ANSI SQL compliant. Standard SELECT, JOIN, GROUP BY, window functions, and CTEs all work. The main difference is how you think about cost: every query is billed by bytes scanned, so you optimise by selecting fewer columns and filtering on partitions.

Last verified: 26 March 2026 Cloud services change frequently. Verify details against official documentation before making infrastructure decisions.