GCP Pub/Sub Push vs Pull: Differences, Throughput, and When to Use Each

Every Pub/Sub subscription uses either push or pull delivery. Pull is usually the better choice for long-running consumers that need throughput and flow control, such as Dataflow pipelines, GKE workers, and Compute Engine services. Push is usually the better choice for serverless consumers that should react to events without running a persistent process, like Cloud Run services and Cloud Functions.

Choosing the wrong delivery mode does not break anything immediately. But it affects scaling behavior, cost, operational complexity, and how much control you have over message processing. This page explains how each mode works, when to use it, and how to decide.

If you are new to Pub/Sub, start with the Pub/Sub Overview for topics, subscriptions, and at-least-once delivery. The Pub/Sub Messaging Model page covers fan-out, ordering keys, and server-side filtering.

Simple explanation

Push means Pub/Sub sends messages to you. You give Pub/Sub an HTTPS URL. When a message arrives on the topic, Pub/Sub makes an HTTP POST to that URL with the message in the request body. Your service processes the message and returns an HTTP success response to acknowledge it.

Pull means you ask Pub/Sub for messages. Your code opens a connection to the Pub/Sub API and requests messages when it is ready. You control how many messages to fetch, how fast to process them, and when to acknowledge each one.

Both modes deliver the same messages from the same subscription. The difference is who initiates the delivery and how much control the consumer has over the process.

Analogy

Pull is like picking up your mail from the post office. You go when you are ready, take as many letters as you can carry, and handle them at your own pace. Push is like a courier who rings your doorbell the moment each letter arrives. Both deliver the mail. The right choice depends on whether you prefer to work on your own schedule or respond immediately to each arrival.

How push delivery works

When you create a push subscription, you provide an HTTPS endpoint URL. Pub/Sub takes responsibility for delivering messages to that endpoint.

Pub/Sub initiates delivery. When a message is published to the topic, Pub/Sub sends an HTTP POST request to your endpoint with the message data in the request body.
The endpoint must use HTTPS. Pub/Sub does not deliver to plain HTTP endpoints. The TLS certificate must be valid and signed by a trusted certificate authority.
Acknowledgement is based on the HTTP response. Your endpoint returns one of the success status codes Pub/Sub recognizes (102, 200, 201, 202, or 204) to acknowledge the message. Note that 102 is accepted even though it is not a 2xx code. Any other status code tells Pub/Sub the delivery failed.
Failed deliveries are retried with exponential backoff. If your endpoint returns a non-2xx response or does not respond within the ack deadline, Pub/Sub retries. The backoff starts short and increases with each retry, up to a configurable maximum.
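The request and response contract described above can be sketched as a small handler. This is a minimal illustration using only the Python standard library, not a production endpoint. It assumes the default wrapped payload format, in which the request body is a JSON envelope whose "message" object carries a base64-encoded "data" field. The names handle_push, process_order, PushHandler, and serve are hypothetical, and TLS is assumed to be terminated in front of the process (for example by Cloud Run).

```python
import base64
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Status codes that Pub/Sub treats as an acknowledgement of a push delivery.
ACK_STATUS_CODES = frozenset({102, 200, 201, 202, 204})


def decode_push_envelope(body: bytes) -> dict:
    """Pull the useful fields out of a wrapped push request body."""
    message = json.loads(body)["message"]
    return {
        "message_id": message.get("messageId"),
        "attributes": message.get("attributes", {}),
        # The payload arrives base64-encoded in the "data" field.
        "data": base64.b64decode(message.get("data", "")),
    }


def process_order(data: bytes) -> None:
    """Hypothetical business logic: raises if the order cannot be handled."""
    order = json.loads(data)
    if "order_id" not in order:
        raise ValueError("missing order_id")


def handle_push(body: bytes, process=process_order) -> int:
    """Return the HTTP status to send back for one push delivery."""
    try:
        message = decode_push_envelope(body)
    except (ValueError, KeyError, TypeError, AttributeError):
        return 400  # malformed envelope: not an ack, so Pub/Sub will retry
    try:
        process(message["data"])
    except Exception:
        return 500  # processing failed: not an ack, so Pub/Sub will retry
    return 204  # processing finished: acknowledge the message


class PushHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        status = handle_push(self.rfile.read(length))
        self.send_response(status)
        self.end_headers()


def serve(port: int = 8080) -> None:
    """Start the endpoint. TLS is assumed to be terminated in front of it."""
    HTTPServer(("", port), PushHandler).serve_forever()
```

The status code is decided only after process returns, so a crash during processing never produces an acknowledgement. Keeping that decision in a plain function (handle_push) also makes it testable without starting a server.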

Push delivery works well for webhook-style consumers. A Cloud Run service can sit idle with zero instances, and Pub/Sub wakes it by sending an HTTP request when a message arrives. The service processes the message, returns 200, and scales back down. You pay only for the time the service runs.

Note

Push delivers messages one at a time by default. You can enable batching on the subscription to allow multiple messages per HTTP request, which improves throughput. Even with batching, push throughput is lower than an optimized pull consumer at high volume.

How pull delivery works

In pull mode, the subscriber connects to the Pub/Sub API and requests messages. The subscriber decides when to fetch, how many to fetch, and when to acknowledge.

Modern Pub/Sub client libraries use streaming pull by default. The client opens a long-lived bidirectional gRPC stream to Pub/Sub. Messages flow over this stream as they become available, without the subscriber polling repeatedly. The client library handles connection management, flow control, and lease extension automatically.

The subscriber controls concurrency by configuring how many messages the client library fetches ahead (the flow control setting). If processing slows down, the client stops pulling more messages until the backlog clears. This built-in backpressure prevents the subscriber from being overwhelmed.
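The backpressure idea can be modeled in a few lines. The sketch below is a conceptual model only, not the client library's implementation: FlowController, try_admit, and done are hypothetical names, and the limit counts messages that have been delivered but not yet acked.

```python
import threading


class FlowController:
    """Caps the number of outstanding (delivered, not yet acked) messages."""

    def __init__(self, max_outstanding: int):
        self._slots = threading.BoundedSemaphore(max_outstanding)

    def try_admit(self) -> bool:
        """Reserve a slot for a new message. False means: stop pulling for now."""
        return self._slots.acquire(blocking=False)

    def done(self) -> None:
        """Free the slot once the message has been acked or nacked."""
        self._slots.release()
```

With a limit of 2, a third call to try_admit returns False until one of the first two messages calls done. The client libraries expose a comparable limit through their flow control settings, so in practice you configure it and do not build it yourself.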

Pull is the standard choice for long-running consumers. A Dataflow pipeline reading from a Pub/Sub subscription uses pull. A GKE deployment with multiple worker pods uses pull. A Compute Engine application processing a high-volume event stream uses pull. In all these cases, the consumer is already running and gains nothing from having Pub/Sub initiate delivery.

Analogy

Streaming pull is like a conveyor belt between Pub/Sub and your application. Messages roll toward your workers automatically, but you control the speed of the belt. If your workers are busy, the belt slows down. If they are free, it speeds up. You never have to walk to the warehouse and ask for more boxes.

Side-by-side comparison

Dimension	Pull	Push
Who initiates delivery	Subscriber requests messages from Pub/Sub	Pub/Sub sends HTTP POST to your endpoint
Endpoint requirement	None. Subscriber connects outbound to the Pub/Sub API	HTTPS endpoint with a valid TLS certificate
Authentication model	Subscriber authenticates to Pub/Sub using IAM credentials	Pub/Sub sends an OIDC JWT; endpoint validates the token
Acknowledgement	Explicit ack/nack call after processing	HTTP 2xx response = ack; anything else = retry
Retry behavior	Unacked messages are redelivered after ack deadline expires	Non-2xx or timeout triggers exponential backoff retry
Batching	Subscriber controls batch size and prefetch count	One message per request by default; configurable batching available
Throughput potential	High. Subscriber controls parallelism and fetch rate	Lower. Per-request HTTP overhead limits throughput at high volume
Flow control / backpressure	Full. Client library manages outstanding message count	Limited. Pub/Sub controls delivery rate
Scale-to-zero	No. A running process must be pulling	Yes. Pub/Sub wakes the endpoint on message arrival
Operational overhead	Must run and monitor the consumer process	Lower. Serverless platform handles scaling and availability
Best-fit runtimes	Dataflow, GKE, Compute Engine, any long-running process	Cloud Run, Cloud Functions, App Engine
Best use cases	Streaming pipelines, batch workers, high-throughput processing	Event-driven webhooks, lightweight triggers, intermittent workloads

When to use push

Push works best when your consumer is a serverless service that should respond to events without running continuously.

Cloud Run service reacting to order events. An e-commerce platform publishes order events to a Pub/Sub topic. A Cloud Run service receives push deliveries, validates the order, and writes it to a database. Between bursts, the service scales to zero. The team pays only for request processing time.

Cloud Functions handling lightweight triggers. A Cloud Function resizes uploaded images when a notification arrives from a Cloud Storage bucket via Pub/Sub. The function runs for a few seconds per image and scales to zero between uploads.

Intermittent workloads where cost matters. A notification service sends emails when specific events occur. Traffic is bursty: a few messages per minute during quiet periods, hundreds during peak. Push to a Cloud Run service keeps costs proportional to actual usage.

Push is also useful when your consumer is behind a load balancer that distributes incoming Pub/Sub requests across multiple backend instances. The HTTP delivery model integrates naturally with standard HTTP infrastructure.

When to use pull

Pull works best when your consumer is a long-running process that needs control over how messages are consumed.

Dataflow streaming pipeline. A streaming pipeline reads events from Pub/Sub, applies windowing and aggregation, and writes results to BigQuery. Dataflow manages parallelism, checkpointing, and scaling internally. Pull gives the pipeline full control over consumption rate.

GKE worker deployment. A Kubernetes deployment runs 10 worker pods, each pulling messages from the same subscription. Pub/Sub distributes messages across the pods. Each pod controls its own concurrency and batch size. Horizontal pod autoscaling adds or removes pods based on subscription backlog metrics.

Compute Engine or VM-based processing. A data enrichment service runs on Compute Engine VMs. Each VM opens a streaming pull connection and processes messages at its own pace. The service needs to call external APIs with rate limits, so it uses flow control to avoid exceeding those limits. For help choosing between Cloud Run, GKE, and Compute Engine, see the comparison page.

Any workload needing batching or concurrency control. If your processing logic benefits from accumulating messages and processing them in batches (bulk database inserts, batch API calls), pull gives you direct control over batch size and timing.
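A size-based batch accumulator for a pull consumer might look like the sketch below. It is an illustration under stated assumptions: BatchAccumulator is a hypothetical helper, flush_fn stands in for a bulk insert or batch API call, and each message arrives with an ack callback, which is how a pull consumer typically acknowledges.

```python
from typing import Callable, List, Tuple


class BatchAccumulator:
    """Collects messages and acks them only after the batch write succeeds."""

    def __init__(self, max_size: int, flush_fn: Callable[[List[bytes]], None]):
        self._max_size = max_size
        self._flush_fn = flush_fn
        self._pending: List[Tuple[bytes, Callable[[], None]]] = []

    def add(self, payload: bytes, ack: Callable[[], None]) -> None:
        self._pending.append((payload, ack))
        if len(self._pending) >= self._max_size:
            self.flush()

    def flush(self) -> None:
        if not self._pending:
            return
        batch, self._pending = self._pending, []
        # If this raises, nothing is acked and Pub/Sub redelivers the
        # messages after their ack deadline expires.
        self._flush_fn([payload for payload, _ in batch])
        for _, ack in batch:
            ack()
```

A real consumer would also flush on a timer so that a partly filled batch does not sit past the ack deadline. That part is omitted here to keep the sketch short.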

When not to use this pattern

Push and pull cover most Pub/Sub consumption patterns, but two alternatives are worth knowing about.

Tip

Export subscriptions deliver messages directly into a Google Cloud resource (BigQuery or Cloud Storage) without any consumer code. If you need to land Pub/Sub messages into BigQuery for analytics, an export subscription handles this without a Dataflow pipeline or Cloud Function in between. Less infrastructure, less to operate.

Cloud Tasks is a better fit when you need per-task controls: scheduled delivery at a specific time, rate limiting to protect a downstream API, deduplication by task name, or explicit task-level retry configuration. Pub/Sub distributes events to subscribers. Cloud Tasks dispatches individual units of work to an HTTP endpoint with task-level guarantees. If your workload is “execute this specific HTTP request reliably, at this rate, at this time,” Cloud Tasks is the right tool.

Pub/Sub push vs pull vs Cloud Tasks

These three options overlap enough to cause confusion. Here is the distinction.

Pub/Sub pull is for consumers that control their own consumption. The subscriber decides when to fetch, how much to fetch, and when to ack. Best for streaming pipelines, worker pools, and high-throughput processing.

Pub/Sub push is for event-driven delivery to an HTTPS endpoint. Pub/Sub sends messages as they arrive. Best for serverless consumers that scale to zero.

Cloud Tasks is for queued HTTP task execution with per-task controls. You create a task with a target URL, and Cloud Tasks dispatches it with rate limiting, scheduled delivery, and task-level retry configuration. Best for job queues, webhook delivery with rate protection, and deferred work.

The key question: are you distributing events to subscribers, or dispatching individual tasks to a worker? Events go through Pub/Sub. Tasks go through Cloud Tasks. For a deeper comparison, see Pub/Sub vs Cloud Tasks.

Security and authentication

Push and pull use different authentication models, and each has its own security considerations.

Push authentication

Warning

A push endpoint without authentication accepts HTTP POST requests from anyone who knows the URL. Always configure push subscriptions with OIDC authentication. An unauthenticated push endpoint is one of the most common Pub/Sub security mistakes in production.

When OIDC is configured, Pub/Sub generates a signed JSON Web Token (JWT) and includes it as a Bearer token in the Authorization header of each HTTP POST. Your endpoint validates this token against Google’s OIDC certificates to confirm the request is genuinely from Pub/Sub and intended for your service.

Cloud Run handles token validation automatically when you configure the push subscription’s service account with the roles/run.invoker role on the Cloud Run service. For custom endpoints, validate the token in your handler code.
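For a custom endpoint, validation has two parts: verifying the token's signature and expiry, then checking its claims. The sketch below covers only the second part and assumes the first has already been done by a JWT library (for example the verify_oauth2_token helper in the google-auth package). The function name claims_are_trusted is hypothetical. By default the audience claim is the push endpoint URL, unless the subscription sets a custom audience.

```python
# Issuer values used by Google-signed identity tokens.
GOOGLE_ISSUERS = {"accounts.google.com", "https://accounts.google.com"}


def claims_are_trusted(claims: dict, expected_audience: str, expected_email: str) -> bool:
    """Check the claims of a token whose signature was ALREADY verified.

    This function does not verify signatures or expiry. Calling it on an
    unverified token provides no security at all.
    """
    return (
        claims.get("iss") in GOOGLE_ISSUERS
        and claims.get("aud") == expected_audience
        and claims.get("email") == expected_email
        and claims.get("email_verified") is True
    )
```

Checking the email claim against the service account configured on the subscription matters: a valid Google-signed token for some other identity should not be accepted as proof that the request came from your subscription.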

# Create a push subscription with OIDC authentication
gcloud pubsub subscriptions create order-push \
  --topic=order-events \
  --push-endpoint=https://order-service-xxxx-ew.a.run.app/events/orders \
  --push-auth-service-account=pubsub-invoker@my-app-prod.iam.gserviceaccount.com \
  --project=my-app-prod

Pull authentication

In pull mode, the subscriber authenticates to the Pub/Sub API using IAM credentials, typically a service account key or workload identity. The subscriber needs the roles/pubsub.subscriber role on the subscription. There is no inbound endpoint to protect because the subscriber initiates the connection outbound.

# Pull messages from a subscription (useful for testing and debugging)
gcloud pubsub subscriptions pull order-processor \
  --limit=5 \
  --auto-ack \
  --project=my-app-prod

Performance, batching, and throughput

Pull generally achieves higher throughput than push. A pull consumer controls its own fetch rate, batch size, and parallelism. With streaming pull and multiple threads or processes, a single consumer can handle tens of thousands of messages per second.

Push delivers messages as individual HTTP requests by default. Each request carries the overhead of an HTTP connection, TLS handshake (or reuse), request parsing, and response handling. At very high message rates (tens of thousands per second), this per-request overhead becomes the bottleneck. Configuring batching on the push subscription helps but does not close the gap with an optimized pull consumer.
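A rough way to see where the ceiling comes from is Little's law: sustained throughput equals requests in flight divided by time per request. The helper below is a back-of-the-envelope estimate, not a model of how Pub/Sub schedules deliveries. The function name and the example numbers are hypothetical.

```python
def max_push_rate(concurrent_requests: int, ms_per_request: int) -> float:
    """Upper bound on messages per second when each request carries one message."""
    return concurrent_requests * 1000 / ms_per_request
```

For example, an endpoint that sustains 200 concurrent requests at 50 ms each tops out at max_push_rate(200, 50), which is 4000 messages per second. Reaching tens of thousands per second would require either far more concurrency or far less time per request.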

Cloud Run tip

Push is the simpler integration path for Cloud Run and works well at low-to-moderate volume. But for sustained high-volume workloads, pull via the Pub/Sub client library inside the Cloud Run service often performs better. Set minimum instances to 1 or more, and let the service control its own batch size and processing rate instead of absorbing per-request HTTP overhead.

Neither mode is universally better. Push trades throughput ceiling for operational simplicity and scale-to-zero. Pull trades operational simplicity for throughput and control. Match the mode to your workload characteristics.

When to choose push or pull

Use this checklist to make a quick decision.

Need scale to zero? Lean push. Pub/Sub wakes your endpoint when messages arrive.
Need maximum throughput? Lean pull. The subscriber controls parallelism and batch size.
Need strict control over batching and concurrency? Lean pull. The client library gives you flow control settings.
Consumer is a Dataflow pipeline or GKE worker? Use pull. These runtimes are already running and manage their own consumption.
Consumer is a Cloud Run service or Cloud Function with intermittent traffic? Use push. The webhook model maps directly to these runtimes.
Consumer is a Cloud Run service with sustained high traffic? Consider pull. Set minimum instances and use the client library for better throughput.
Want the simplest possible setup for moderate traffic? Use push to a serverless endpoint. Less code, less infrastructure to manage.
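The checklist can be written down as a small decision helper. This is one possible encoding, not an official rule set: the Workload fields, the runtime labels, and the order in which the checks run are all assumptions made for illustration. Compute Engine is grouped with the long-running runtimes because the rest of this page treats it that way.

```python
from dataclasses import dataclass

# Runtimes that are already running and manage their own consumption.
LONG_RUNNING_RUNTIMES = {"dataflow", "gke", "compute-engine"}


@dataclass
class Workload:
    runtime: str = ""  # e.g. "dataflow", "gke", "cloud-run", "cloud-functions"
    sustained_high_traffic: bool = False
    needs_max_throughput: bool = False
    needs_batch_or_concurrency_control: bool = False


def recommend_mode(w: Workload) -> str:
    """Return "pull" or "push" following the checklist above."""
    if w.runtime in LONG_RUNNING_RUNTIMES:
        return "pull"
    if (
        w.sustained_high_traffic
        or w.needs_max_throughput
        or w.needs_batch_or_concurrency_control
    ):
        return "pull"
    # Everything else: scale-to-zero, intermittent traffic, simplest setup.
    return "push"
```

The helper treats throughput and control requirements as stronger signals than scale-to-zero. A workload that wants both has to give one of them up, and this encoding assumes throughput wins.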

Good to know

You can switch a subscription between push and pull after creation. Modifying the push configuration does not lose accumulated messages. If you start with push and later need pull (or vice versa), you do not have to recreate the subscription.

Common mistakes

Leaving a push endpoint unauthenticated. Without OIDC, anyone who discovers your endpoint URL can send fake messages. Always set --push-auth-service-account when creating a push subscription and validate the identity token in your endpoint.
Acknowledging before processing completes. If a push endpoint returns HTTP 200 before processing finishes and then fails, the message is lost. Return 200 only after processing is confirmed. Return a non-2xx status on failure so Pub/Sub retries. The same principle applies to pull: do not ack until processing succeeds.
Using push for very high-throughput pipelines. At tens of thousands of messages per second, per-request HTTP overhead limits push throughput. Use pull with the client library and multiple threads for high-volume workloads.
Missing dead-letter configuration for persistent failures. If a message consistently fails processing (malformed data, a permanent upstream error), Pub/Sub keeps redelivering it until the subscription's message retention period runs out (7 days by default). Configure a dead-letter topic to capture messages that exceed the maximum delivery attempts instead of letting them retry for days.
Assuming push is always simpler for Cloud Run. Push is simpler at low volume. At sustained high volume, the per-request overhead and lack of consumer-side flow control can make push harder to operate than pull with the client library. Evaluate based on your actual traffic pattern.
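The effect of a dead-letter topic can be shown with a small simulation. This is a simplified model, not the service's behavior in detail: deliver_with_dead_letter is a hypothetical function, the dead-letter topic is represented by a plain list, and the handler returns True for ack and False for failure. The real service counts attempts on a best-effort basis and waits between retries. Both details are left out here.

```python
from typing import Callable, List


def deliver_with_dead_letter(
    message: bytes,
    handler: Callable[[bytes], bool],
    max_delivery_attempts: int,
    dead_letter: List[bytes],
) -> int:
    """Redeliver until the handler acks or attempts run out.

    Returns the number of delivery attempts that were made.
    """
    for attempt in range(1, max_delivery_attempts + 1):
        if handler(message):
            return attempt  # acked, nothing goes to the dead-letter list
    # Every attempt failed: park the message instead of retrying further.
    dead_letter.append(message)
    return max_delivery_attempts
```

A message that can never be processed ends up in the dead-letter list after the configured number of attempts, where it can be inspected and replayed. A message that fails transiently is acked on a later attempt and never reaches it.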

Frequently asked questions

What is the difference between push and pull in Pub/Sub?

In pull mode, your subscriber calls the Pub/Sub API to request messages when it is ready. In push mode, Pub/Sub sends messages as HTTP POST requests to an HTTPS endpoint you specify. Pull gives the consumer full control over fetch rate and concurrency. Push offloads delivery timing to Pub/Sub, which makes it a natural fit for serverless endpoints that scale to zero.

Does a push subscription require a public endpoint?

The endpoint must be reachable by Pub/Sub over HTTPS. For public Cloud Run or Cloud Functions services, the URL is publicly routable. For endpoints behind a VPC, you can use VPC Service Controls or a load balancer with Identity-Aware Proxy. The endpoint must have a valid TLS certificate. Pub/Sub does not deliver to plain HTTP.

Can Cloud Run use pull instead of push?

Yes. A Cloud Run service can use the Pub/Sub client library to open a streaming pull connection and process messages directly. This is often better for sustained high-volume workloads because the service controls its own batch size and concurrency. Push is simpler for low-to-moderate traffic where scale-to-zero matters most.

Can one topic have both push and pull subscriptions?

Yes. Each subscription on a topic is independent. You can attach a push subscription for a Cloud Run event handler and a pull subscription for a Dataflow pipeline to the same topic. Each subscription receives its own copy of every message.
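The independence of subscriptions can be pictured with a toy in-memory model. This is purely illustrative and says nothing about how Pub/Sub is implemented. The Topic class and the subscription names are hypothetical.

```python
from collections import deque
from typing import Deque, Dict


class Topic:
    """Toy model: every subscription gets its own copy of each message."""

    def __init__(self) -> None:
        self._subscriptions: Dict[str, Deque[bytes]] = {}

    def subscribe(self, name: str) -> Deque[bytes]:
        queue: Deque[bytes] = deque()
        self._subscriptions[name] = queue
        return queue

    def publish(self, message: bytes) -> None:
        for queue in self._subscriptions.values():
            queue.append(message)
```

Consuming a message from one subscription's queue leaves the other queue untouched. For the same reason, a slow Dataflow pipeline on a pull subscription does not delay a Cloud Run handler on a push subscription.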

When should I use Cloud Tasks instead of Pub/Sub push?

Use Cloud Tasks when you need per-task controls: scheduled delivery times, rate limiting, deduplication by task name, or explicit task-level retries. Use Pub/Sub push when you need event distribution to one or more subscribers without per-message scheduling or rate controls. See the full comparison at Pub/Sub vs Cloud Tasks.

Last verified: 26 March 2026. Cloud services change frequently. Verify details against official documentation before making infrastructure decisions.