Firestore Data Model Explained: Collections, Documents, and Schema Design

The Firestore data model is built on three things: collections, documents, and subcollections. Understanding how they connect is the foundation. The more important lesson is that Firestore rewards developers who design their data structure around their query patterns from the start. Get this right and reads are fast and cheap. Get it wrong and you are restructuring data in production.

This page covers how Firestore organises data, how to model real application structures like users and orders, why denormalisation is expected, and the decisions that affect what you can query and what it costs. If you are new to Firestore, the Firestore Overview covers when the service fits and how per-operation billing works.

The simple explanation

Analogy

Think of Firestore as a filing cabinet. Each drawer is a collection. Each folder inside a drawer is a document. Each folder contains data fields: a name, an email address, a status, a timestamp. Some folders have a mini-cabinet attached called a subcollection, which holds more folders of its own.

There is no template every folder must follow. Two folders in the same drawer can hold completely different things. That flexibility is real but it is also a responsibility: the database will not stop you from storing data inconsistently.

The complexity in Firestore does not come from the structure itself. It comes from understanding how structure decisions control what you can query later, and what those queries cost per read.

How Firestore organises data

Collections

A collection is a container for documents. It has no properties of its own. Collections come into existence when you create the first document inside them, and disappear when the last document is removed. You do not create or delete collections directly.

A typical application has a small number of top-level collections: users, products, orders. Each one holds all documents of that type.

Documents

A document is the unit of data in Firestore. It is a set of named values similar to a JSON object, living inside a collection. Every document has a unique ID within its collection. IDs can be auto-generated by Firestore or set explicitly by your application.

Two documents in the same collection can have completely different fields. There is no enforced schema. This is useful for evolving data models, but it means you are responsible for field consistency at the application layer.

Fields

Each field in a document has a name and a typed value. Firestore supports more types than plain JSON: String, Integer, Float, Boolean, Timestamp, Array, Map, Reference, GeoPoint, Bytes, and Null.

Tip

Always use the Timestamp type for dates and times, never a string. Timestamp fields support ordering and range filters correctly. A string like “2026-03-08” cannot be reliably range-queried unless every document that ever writes to that field uses exactly the same format, which is hard to guarantee once multiple clients are writing.
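The ordering problem can be reproduced without Firestore at all. The sketch below is plain Python with made-up sample values: it sorts the same three dates once as strings written in mixed formats and once as datetime values, which stand in here for Timestamp fields.

```python
from datetime import datetime, timezone

# The same three dates, written by clients that disagree on the string format.
as_strings = ["2026-03-08", "08/01/2026", "2025-12-31"]

# The same three dates as real datetime values (stand-ins for Timestamp fields).
as_datetimes = [
    datetime(2026, 3, 8, tzinfo=timezone.utc),
    datetime(2026, 1, 8, tzinfo=timezone.utc),
    datetime(2025, 12, 31, tzinfo=timezone.utc),
]

# Strings compare character by character, so "08/01/2026" sorts first
# even though 31 December 2025 is the earliest date.
sorted_strings = sorted(as_strings)

# Datetime values compare chronologically regardless of how they were entered.
sorted_datetimes = sorted(as_datetimes)
```

A range filter on the string field would go wrong in the same way, because it relies on the same character-by-character comparison.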

Document paths

Every document has a path that uniquely identifies it in the database. Paths alternate between collection names and document IDs:

  • /users/alice points to document alice in the users collection
  • /products/widget-42 points to document widget-42 in the products collection
  • /users/alice/orders/order-001 points to a document inside the orders subcollection of user alice

Note

Document paths always have an even number of segments, and collection paths always have an odd number. So /users is a collection, /users/alice is a document, and /users/alice/orders is a subcollection.
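The segment-counting rule is easy to check in code. This is a minimal sketch in plain Python; classify_path is a hypothetical helper written for illustration, not part of any Firestore SDK.

```python
def classify_path(path: str) -> str:
    """Return "collection" or "document" based on the number of path segments."""
    segments = [s for s in path.strip("/").split("/") if s]
    if not segments:
        raise ValueError("path must contain at least one segment")
    # Even segment count: collection/doc pairs, so the path ends on a document.
    # Odd segment count: the path ends on a collection name.
    return "document" if len(segments) % 2 == 0 else "collection"


kinds = {
    path: classify_path(path)
    for path in ("/users", "/users/alice", "/users/alice/orders", "/users/alice/orders/order-001")
}
```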

Subcollections

A document can contain a subcollection, a collection nested inside the document. Subcollections model one-to-many relationships where child documents belong to a specific parent.

A users collection where each user has an orders subcollection looks like this:

users/
  alice/
    name: "Alice Smith"
    email: "alice@example.com"
    orders/                  ← subcollection
      order-001/
        total: 42.50
        status: "shipped"
        createdAt: 2026-03-01
      order-002/
        total: 15.00
        status: "pending"
        createdAt: 2026-03-18
  bob/
    name: "Bob Jones"
    email: "bob@example.com"
    orders/
      order-003/
        total: 99.00
        status: "delivered"
        createdAt: 2026-03-10

Each order is its own document. You can read, query, and paginate orders independently without touching the parent user document. The user document stays small regardless of how many orders accumulate.
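To illustrate what paginating orders independently means, here is an in-memory sketch of cursor-based paging in plain Python. The list stands in for the documents under users/alice/orders, page_of_orders is a hypothetical helper, and order-004 is extra sample data added for the example. A real application would issue an ordered, limited query through the Firestore client library instead.

```python
from datetime import datetime, timezone


def page_of_orders(orders, page_size, start_after=None):
    """Return one page of orders, newest first, plus a cursor for the next page.

    The cursor is the (createdAt, id) pair of the last order on the page,
    so the next call can resume strictly after it.
    """
    ordered = sorted(orders, key=lambda o: (o["createdAt"], o["id"]), reverse=True)
    if start_after is not None:
        ordered = [o for o in ordered if (o["createdAt"], o["id"]) < start_after]
    page = ordered[:page_size]
    next_cursor = (page[-1]["createdAt"], page[-1]["id"]) if page else None
    return page, next_cursor


# Sample data standing in for documents in one user's orders subcollection.
alice_orders = [
    {"id": "order-001", "status": "shipped", "createdAt": datetime(2026, 3, 1, tzinfo=timezone.utc)},
    {"id": "order-002", "status": "pending", "createdAt": datetime(2026, 3, 18, tzinfo=timezone.utc)},
    {"id": "order-004", "status": "pending", "createdAt": datetime(2026, 3, 20, tzinfo=timezone.utc)},
]

first_page, cursor = page_of_orders(alice_orders, page_size=2)
second_page, _ = page_of_orders(alice_orders, page_size=2, start_after=cursor)
```

Nothing in this flow reads the parent user document, which is the point of keeping orders in their own subcollection.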

Subcollections support up to 100 levels of nesting. In practice one or two levels covers almost every real application. Deeper nesting makes queries harder to write without adding any performance benefit.

How it works in practice

Here is a realistic e-commerce structure using three top-level collections and two subcollection levels:

users/
  user-abc123/
    displayName: "Alice Smith"
    email: "alice@example.com"
    createdAt: Timestamp
    orders/
      order-001/
        items: [{ productId: "widget-42", qty: 2, priceAtPurchase: 21.25 }]
        total: 42.50
        status: "shipped"
        createdAt: Timestamp

products/
  widget-42/
    name: "Widget Pro"
    price: 21.25
    category: "tools"
    stock: 148
    updatedAt: Timestamp

support_tickets/
  ticket-789/
    userId: "user-abc123"
    subject: "Order not arrived"
    status: "open"
    createdAt: Timestamp
    messages/
      msg-001/
        body: "My order has not arrived."
        authorId: "user-abc123"
        sentAt: Timestamp

Notice that each order item stores priceAtPurchase directly rather than relying on the product’s current price. Product prices change; order records should reflect what the customer actually paid at the time. This is a deliberate denormalisation decision.
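The snapshot idea can be sketched with plain dictionaries. build_order_item is a hypothetical helper, and the new price of 24.99 is invented for the example. The dictionaries stand in for the product and order documents shown above.

```python
def build_order_item(product_id: str, product: dict, qty: int) -> dict:
    """Copy the product's current price into the order item at write time."""
    return {"productId": product_id, "qty": qty, "priceAtPurchase": product["price"]}


product = {"name": "Widget Pro", "price": 21.25}

item = build_order_item("widget-42", product, qty=2)
order = {
    "items": [item],
    "total": item["qty"] * item["priceAtPurchase"],
    "status": "pending",
}

# A later price change touches the product only; the order keeps its snapshot.
product["price"] = 24.99
```

Because the order holds a copied value rather than a link to the product, no later update to the product can alter what the order reports.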

The support_tickets collection uses a messages subcollection for the conversation thread. Each message is individually addressable, and the ticket document stays compact as the conversation grows.

To control who can read or write these documents from a client application, see Firestore Security Rules.

Designing your data model around queries

In a relational database you normalise data first and write queries afterwards. In Firestore, the process is reversed: you start with the queries your application needs, then design documents and indexes to serve them.

This is not optional. Firestore can only run queries against indexed fields. There are no joins. If a query needs data from two unrelated collections at once, Firestore cannot combine them in a single operation. You must restructure your data, denormalise, or accept multiple reads. Understanding how Firestore queries and indexes work before finalising your schema will save you from expensive rewrites later.

Denormalisation is expected

Denormalisation means storing the same data in more than one place to make reads faster. In a relational database this is usually treated as a compromise that needs justifying. In Firestore it is standard practice.

Analogy

Think of a paper receipt. When you buy something, the receipt records the price at that moment. It does not store a live reference to the product’s current price. If the price changes next week, your receipt still shows what you paid. Firestore data modelling works the same way: you write values into documents when they are needed, rather than looking them up from a shared source on every read.

If every order screen needs to display the buyer’s name, you have two options: read the user document on every order display (an extra billable read), or store the display name directly on the order document at write time. Most Firestore applications choose the second option. The tradeoff is that if the user changes their display name later, existing order documents will show the old value. Whether that matters is a product decision, not a database limitation.

Structure follows your most frequent read

If your home screen always shows a user’s profile with their three most recent orders, you can either fetch the user document and query the orders subcollection separately, or store those three recent orders as an embedded field in the user document and read everything in one operation.

Embedding works well for high-traffic screens where reducing read count matters for latency and cost. The subcollection works better for full order history, filtering, and pagination. Many applications use both: an embedded recentOrders field for the home screen, and a full subcollection for the order history page.
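One way to keep an embedded recentOrders field bounded is to rebuild it on every new order and trim it to a fixed length. The sketch below is plain Python over dictionaries. with_recent_order and MAX_RECENT are hypothetical names, and the sample orders are invented.

```python
from datetime import datetime, timezone

MAX_RECENT = 3  # assumption: the home screen shows three orders


def with_recent_order(user_doc: dict, order_summary: dict) -> dict:
    """Return a copy of the user document with the order merged into recentOrders."""
    recent = [order_summary] + user_doc.get("recentOrders", [])
    recent.sort(key=lambda o: o["createdAt"], reverse=True)  # newest first
    return {**user_doc, "recentOrders": recent[:MAX_RECENT]}


user = {"displayName": "Alice Smith"}
for day in (1, 10, 18, 20):
    summary = {
        "orderId": f"order-{day:03d}",
        "total": 10.0 * day,
        "createdAt": datetime(2026, 3, day, tzinfo=timezone.utc),
    }
    user = with_recent_order(user, summary)
```

In a real application the full order would still be written to the subcollection, and the two writes would normally be committed together, for example in a batched write or a transaction, so the embedded copy cannot drift from the order history.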

Warning

A single Firestore document can sustain only about one write per second. This is a soft limit rather than a hard cap, but a global counter, a live inventory field, or any value updated by many concurrent users will hit write contention, with rising latency and failed writes under load. Use sharded counter patterns or move high-frequency aggregations to a separate service.
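The sharded counter pattern can be modelled in memory to show its shape. This is a sketch in plain Python, not SDK code. Each list entry stands in for one shard document, and the path in the comment (counters/likes/shards/3) is a hypothetical layout.

```python
import random


class ShardedCounter:
    """In-memory model of a counter split across several shard documents."""

    def __init__(self, num_shards: int) -> None:
        # Each entry stands in for one document, e.g. counters/likes/shards/3.
        self.shards = [0] * num_shards

    def increment(self, amount: int = 1) -> None:
        # Pick a random shard so concurrent writers rarely hit the same document.
        index = random.randrange(len(self.shards))
        self.shards[index] += amount

    def total(self) -> int:
        # Reading the total means reading every shard: one read per shard.
        return sum(self.shards)


counter = ShardedCounter(num_shards=10)
for _ in range(1000):
    counter.increment()
```

The tradeoff is visible in total(): write capacity scales with the number of shards, but so does the number of reads needed to produce the count.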

When this data model fits

Firestore’s document model works well when:

  • Records vary in structure: user profiles, product catalogues, or content pages where some items have fields others do not
  • You need real-time push updates to clients as data changes, without polling
  • Your app works offline: the mobile and web SDKs can cache data locally and sync when connectivity returns
  • You are building a chat app, social feed, or notification system where each user’s data is accessed independently
  • Your schema is still evolving: adding a new field to documents requires no migration

It is a poor fit when:

  • You need complex joins across many collections: use Cloud SQL for relational workloads
  • You need analytical queries over millions of documents: BigQuery is built for that
  • You have high-throughput time-series writes at scale: Bigtable handles those patterns more efficiently

For a full comparison of GCP storage options, see Choosing the Right Storage Service.

Common mistakes

  1. Storing dates as strings instead of Timestamps. A string field like “2026-03-08” cannot be reliably sorted or range-filtered unless every document that ever writes to that field uses exactly the same format. Timestamps are a first-class type with correct ordering and range filter support built in. Use Timestamp for every field representing a point in time.

  2. Growing arrays inside documents instead of using subcollections. Every append is another write to the same document, every read of that document returns the whole array, and the array keeps growing toward the 1 MB document size limit. For lists that accumulate over time, including orders, comments, messages, and log entries, use a subcollection. Each item becomes its own document, individually addressable and pageable.

  3. Using a single document as a shared counter. A single document sustains only about one write per second. Any field updated by many concurrent clients will cause write contention and failures under load. Use sharded counters or aggregate through a separate service.

  4. Designing the schema before thinking about queries. The most common Firestore mistake is modelling data relationally and then discovering that the queries you need are not supported. Define your key read patterns first: what does the main screen load, what does a list page display, what filters does search require. Those answers define the document structure and indexes needed to serve them.

  5. Ignoring document size as data accumulates. A document that stores a growing event history or message thread as a single array will eventually hit the 1 MB limit, often in production under real load. If a field is written to repeatedly, model it as a subcollection from the start.

Firestore versus relational databases

Developers coming from SQL databases often apply the same modelling instincts to Firestore. Those instincts are useful background, but the practical rules are different enough to cause real problems if you treat them as equivalent.

No joins, no foreign key constraints

In a relational database you store users in one table, orders in another, and join them at query time. In Firestore there is no equivalent of a JOIN. If you want to display a user’s name alongside an order, you either store the name in the order document at write time, or make a second read to the user document at display time. Each dereference is a separate billable read and a separate network call.
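The cost difference between the two approaches can be made concrete with a read counter. The sketch below uses a small in-memory stand-in for the database. CountingStore, the userPath and buyerName field names, and both helper functions are hypothetical and exist only for this illustration.

```python
class CountingStore:
    """Tiny in-memory stand-in for a document database that counts reads."""

    def __init__(self, docs):
        self.docs = docs
        self.reads = 0

    def get(self, path):
        self.reads += 1  # every document fetch counts as one read
        return self.docs[path]


def order_line_with_lookup(store, order_path):
    """Read the order, then dereference the user to get the name: two reads."""
    order = store.get(order_path)
    user = store.get(order["userPath"])
    return f'{user["displayName"]}: {order["total"]:.2f}'


def order_line_denormalised(store, order_path):
    """Read the order only; the buyer's name was copied in at write time."""
    order = store.get(order_path)
    return f'{order["buyerName"]}: {order["total"]:.2f}'


docs = {
    "users/alice": {"displayName": "Alice Smith"},
    "users/alice/orders/order-001": {
        "total": 42.50,
        "userPath": "users/alice",
        "buyerName": "Alice Smith",
    },
}

lookup_store = CountingStore(docs)
lookup_line = order_line_with_lookup(lookup_store, "users/alice/orders/order-001")

denormalised_store = CountingStore(docs)
denormalised_line = order_line_denormalised(denormalised_store, "users/alice/orders/order-001")
```

Both functions produce the same line of text. The lookup version costs two reads per order and the denormalised version costs one, a difference that multiplies across every order on a list screen.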

The reference field type lets you store a path to another document, but Firestore does not automatically fetch the referenced document. You fetch it explicitly when you need it.

Subcollections versus arrays

When you need to store a list of related items, you have two options:

  • Array in the parent document reads all items in one operation together with the parent. Simple to access, but items cannot be sorted or paginated individually, and filters can only match whole array elements. Grows inside the document toward the 1 MB limit. Best for small, stable lists always read with the parent.

  • Subcollection stores each item as its own document, queryable, sortable, and pageable on its own. The parent stays small. Requires knowing the parent’s path. Best for lists that grow or need individual addressing.

Tip

Default to subcollections for any list that grows. Use arrays only for small, bounded lists you always read together with the parent document, such as a product’s two or three category tags.

Flexible schema versus enforced schema

Relational databases enforce a schema at the table level. Every row must match the column definitions. Firestore has no equivalent constraint. Documents in the same collection can have completely different fields. This is genuinely useful for evolving data models, but inconsistencies accumulate without application-level validation or well-written security rules that require specific fields on writes.
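Application-level validation can be as simple as checking a document against a table of required fields before writing it. This is a minimal sketch in plain Python. REQUIRED_USER_FIELDS and validate_user are hypothetical, and the field list is borrowed from the user documents shown earlier on this page.

```python
from datetime import datetime, timezone

# Assumed contract for documents in the users collection.
REQUIRED_USER_FIELDS = {"displayName": str, "email": str, "createdAt": datetime}


def validate_user(doc: dict) -> list:
    """Return a list of problems; an empty list means the document may be written."""
    problems = []
    for name, expected_type in REQUIRED_USER_FIELDS.items():
        if name not in doc:
            problems.append(f"missing field: {name}")
        elif not isinstance(doc[name], expected_type):
            actual = type(doc[name]).__name__
            problems.append(f"{name} should be {expected_type.__name__}, got {actual}")
    return problems


good = {
    "displayName": "Alice Smith",
    "email": "alice@example.com",
    "createdAt": datetime(2026, 3, 1, tzinfo=timezone.utc),
}
bad = {"displayName": "Bob Jones", "createdAt": "2026-03-10"}

good_problems = validate_user(good)
bad_problems = validate_user(bad)
```

A check like this only protects writes that pass through your own code. Security rules remain the place to enforce required fields on writes coming directly from clients.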

Frequently asked questions

Is Firestore schema-less?

Firestore has no enforced schema at the database level. You can store any fields in any document without declaring them first. But flexible schema does not mean consequence-free. Your query capability depends on what fields exist and whether they are indexed. If some documents store dates as strings and others as Timestamps, range queries will behave inconsistently across those documents. Treat field consistency as an application responsibility, not something the database enforces for you.

When should I use subcollections instead of arrays?

Use subcollections for any list that will grow over time or that needs to be queried, paginated, or addressed individually. Arrays work well for small, fixed-size lists you always read together with the parent document, such as a list of three category tags on a product. The key limit: every document has a 1 MB maximum size, and arrays grow inside the document. A user with thousands of activity records stored as an array will eventually hit that limit. In a subcollection, each record is its own document and the parent stays small.

Can documents in the same collection have different fields?

Yes. Firestore does not enforce a uniform structure across documents in a collection. One user document can have a phoneNumber field while another does not. This is useful for evolving schemas without migrations, but it means your application code must handle missing fields gracefully. Never assume a field is present just because it exists in most documents.
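On the read side, handling missing fields usually means treating every optional field as possibly absent. A small sketch in plain Python, where the dictionaries stand in for document data, and contact_summary and the sample phone number are invented for the example:

```python
def contact_summary(user: dict) -> str:
    """Build a display line without assuming optional fields exist."""
    name = user.get("displayName", "Unknown user")
    phone = user.get("phoneNumber")  # None when the field is absent
    if phone:
        return f"{name} ({phone})"
    return f"{name} (no phone on file)"


with_phone = contact_summary({"displayName": "Alice Smith", "phoneNumber": "+44 20 7946 0000"})
without_phone = contact_summary({"displayName": "Bob Jones"})
```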

How does the Firestore data model affect what you can query?

Directly and significantly. Firestore only runs queries against indexed fields. Single-field indexes are created automatically, but most queries that filter or sort on several fields need composite indexes that you define in advance. You cannot query across two unrelated top-level collections simultaneously because there are no joins. If you need to show a user their recent orders alongside their profile in a single read, you must either store some order data inside the user document or accept two separate reads. Your data structure determines your query options.

What is the maximum size of a Firestore document?

A single Firestore document can be at most 1 MB. This includes all field names and their values. For large binary data like images or files, store the content in Cloud Storage and keep only a reference URL in the Firestore document. The limit is rarely hit for typical user or product data, but grows quickly if you store arrays that accumulate items over time. Chat messages, event logs, and activity histories are common culprits. Use subcollections for any list expected to grow.
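A rough size estimate helps show how quickly an accumulating array approaches the limit. The sketch below is an approximation in plain Python, loosely based on Firestore's documented storage-size rules (UTF-8 bytes plus one for strings and field names, eight bytes for numbers and timestamps, one byte for booleans and nulls). It ignores the document name and fixed per-document overhead, so treat the result as a lower bound. approx_value_size is a hypothetical helper.

```python
from datetime import datetime

MAX_DOCUMENT_BYTES = 1_048_576  # the 1 MB limit, expressed as 1 MiB


def approx_value_size(value) -> int:
    """Approximate the stored size in bytes of a field value (assumed rules, see above)."""
    if value is None or isinstance(value, bool):
        return 1
    if isinstance(value, (int, float, datetime)):
        return 8
    if isinstance(value, str):
        return len(value.encode("utf-8")) + 1
    if isinstance(value, bytes):
        return len(value)
    if isinstance(value, list):
        return sum(approx_value_size(v) for v in value)
    if isinstance(value, dict):
        # Maps (and whole documents) count every field name plus its value.
        return sum(len(k.encode("utf-8")) + 1 + approx_value_size(v) for k, v in value.items())
    raise TypeError(f"unsupported type: {type(value).__name__}")


message = {"body": "My order has not arrived.", "authorId": "user-abc123"}
message_bytes = approx_value_size(message)

# Upper bound on how many such messages fit if stored as one array field.
messages_before_limit = MAX_DOCUMENT_BYTES // message_bytes
```

Real messages are usually longer than this sample and carry timestamps and other fields, so the practical ceiling for an embedded thread is lower than this estimate suggests.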

Last verified: 23 March 2026. Cloud services change frequently. Verify details against official documentation before making infrastructure decisions.