Stateless vs Stateful Services in Azure
Whether a service is stateless or stateful is one of the most consequential design decisions in cloud architecture. The choice affects how you scale, how you deploy, how you recover from failures, and how much your system costs to run. Azure provides different tools for each pattern, and understanding when to use them prevents a category of scaling and reliability problems that are painful to retrofit later.
Defining stateless and stateful
A stateless service processes each request independently using only information contained in that request (and data fetched from external sources during processing). It holds no memory between requests. Two consecutive requests from the same client are indistinguishable to the service — it has no notion of “this client was here before.”
A stateful service maintains data between requests, either in memory or in local storage on the same instance. The service’s behaviour for a given request depends on the history of prior interactions. A shopping cart API that stores the cart in the web server’s local memory is stateful. The same API that stores the cart in a database or cache is stateless, even though it uses persistent state — the key distinction is whether the state is held locally on the instance or externalised to a shared store.
This distinction matters because local state is not visible to other instances. If a request for a user’s shopping cart arrives at a different instance than the one that built the cart, that instance has no idea what is in the cart.
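Here is a minimal sketch of the difference. It assumes two service instances behind a load balancer. The class names are hypothetical, and a plain dict shared by both instances stands in for an external store such as Redis or a database.

```python
from typing import Dict, List


class LocalCartService:
    """Stateful: each instance keeps carts in its own process memory."""

    def __init__(self) -> None:
        self._carts: Dict[str, List[str]] = {}

    def add(self, user: str, item: str) -> None:
        self._carts.setdefault(user, []).append(item)

    def get(self, user: str) -> List[str]:
        return list(self._carts.get(user, []))


class ExternalCartService:
    """Stateless: every instance reads and writes a shared store.

    The dict passed in stands in for a shared store such as Redis or a database.
    """

    def __init__(self, store: Dict[str, List[str]]) -> None:
        self._store = store

    def add(self, user: str, item: str) -> None:
        cart = self._store.get(user, [])
        self._store[user] = cart + [item]

    def get(self, user: str) -> List[str]:
        return list(self._store.get(user, []))


# Two "instances"; consecutive requests from alice land on different ones.
local_a, local_b = LocalCartService(), LocalCartService()
local_a.add("alice", "book")
print(local_b.get("alice"))  # [] (instance B never saw the cart)

shared_store: Dict[str, List[str]] = {}
ext_a, ext_b = ExternalCartService(shared_store), ExternalCartService(shared_store)
ext_a.add("alice", "book")
print(ext_b.get("alice"))  # ['book'] (any instance can serve the next request)
```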
How each pattern scales
Stateless services scale horizontally without any coordination between instances. When Azure’s autoscaler adds a new instance of your stateless API, that instance can handle any request as soon as it passes its health checks, because it needs no data from its peers. When load drops and the autoscaler removes an instance, no data is lost because nothing was stored locally; only in-flight requests need to drain. This is why cloud-native applications strongly prefer stateless services at the compute tier.
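Because instances of a stateless service are interchangeable, a scaling decision can depend only on aggregate metrics and never on which instance holds which data. The sketch below is a deliberately simplified model of two threshold rules bounded by a minimum and maximum instance count. The function and its default values are illustrative. Real Azure Monitor autoscale also applies cooldown periods and flapping protection, which this sketch omits, and it uses a single evaluation window for both rules.

```python
from statistics import mean
from typing import Sequence


def next_instance_count(
    cpu_samples: Sequence[float],
    current: int,
    min_count: int = 2,
    max_count: int = 10,
    scale_out_threshold: float = 70.0,
    scale_in_threshold: float = 30.0,
    scale_out_step: int = 2,
    scale_in_step: int = 1,
) -> int:
    """Return the new instance count after one evaluation of two threshold rules.

    cpu_samples are CPU percentages (averaged across instances) over the
    evaluation window. Simplified: no cooldown, no flapping checks.
    """
    avg = mean(cpu_samples)
    if avg > scale_out_threshold:
        target = current + scale_out_step
    elif avg < scale_in_threshold:
        target = current - scale_in_step
    else:
        target = current
    # Never go outside the configured bounds.
    return max(min_count, min(max_count, target))


print(next_instance_count([75, 80, 85, 90, 72], current=2))  # 4
print(next_instance_count([20, 25, 15], current=2))          # 2 (already at minimum)
print(next_instance_count([95] * 5, current=9))              # 10 (capped at maximum)
```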
Stateful services scale differently. Adding instances does not immediately help unless the state is partitioned or replicated to the new instances. Azure Service Fabric, Azure Functions Durable Entities, and Azure Kubernetes Service with persistent volumes support stateful workloads at scale, but they require more operational care.
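To see why adding instances to a stateful service is not free, consider state partitioned by hashing each key modulo the number of instances. The sketch below measures how many keys change owner when a fourth instance joins. The key format is hypothetical, and SHA-256 is used only as a stable hash. Every moved key is state that has to be migrated before the new topology can serve it.

```python
import hashlib
from typing import Sequence


def partition_for(key: str, partitions: int) -> int:
    """Map a key to a partition using a stable hash (Python's hash() is salted per process)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partitions


def fraction_moved(keys: Sequence[str], old_partitions: int, new_partitions: int) -> float:
    """Fraction of keys whose owning partition changes when the partition count changes."""
    moved = sum(
        1 for k in keys
        if partition_for(k, old_partitions) != partition_for(k, new_partitions)
    )
    return moved / len(keys)


keys = [f"cart:{i}" for i in range(10_000)]
print(f"{fraction_moved(keys, 3, 4):.0%} of carts change owner going from 3 to 4 instances")
```

A key keeps its owner only when h mod 3 = h mod 4. For a uniformly distributed hash h, that happens only when h mod 12 is 0, 1 or 2, so roughly three quarters of the keys move. This is one reason platforms such as Service Fabric fix the partition count when a service is created and scale by moving whole partitions between nodes instead of rehashing individual keys.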
```bash
# Configure autoscale for a stateless Azure App Service.
# Autoscale settings target the App Service plan (Microsoft.Web/serverFarms), not the web app itself.
az monitor autoscale create \
  --resource-group myRG \
  --resource myAppServicePlan \
  --resource-type Microsoft.Web/serverFarms \
  --name myAutoscale \
  --min-count 2 \
  --max-count 10 \
  --count 2

# Scale-out rule: add 2 instances when average CPU exceeds 70% over 5 minutes.
# App Service plans report CPU as the CpuPercentage metric.
az monitor autoscale rule create \
  --resource-group myRG \
  --autoscale-name myAutoscale \
  --condition "CpuPercentage > 70 avg 5m" \
  --scale out 2

# Scale-in rule: remove 1 instance when average CPU is below 30% over 10 minutes
az monitor autoscale rule create \
  --resource-group myRG \
  --autoscale-name myAutoscale \
  --condition "CpuPercentage < 30 avg 10m" \
  --scale in 1
```

Externalising state in Azure
The practical path to stateless compute is externalising all categories of state to managed services designed to hold that data reliably and quickly.
Session state belongs in Azure Cache for Redis. Redis keeps key-value pairs in memory, so reads and writes from the same region typically complete in about a millisecond or less. On the Premium tier and above, Azure Cache for Redis supports clustering (horizontal partitioning across shards) and geo-replication for multi-region deployments. ASP.NET Core, Node.js, and Java all have mature Redis-backed session libraries.
Application data belongs in a managed database: Azure SQL Database, Cosmos DB, Azure Database for PostgreSQL, or Azure Database for MySQL, depending on your data model. These are shared stores visible to all instances of your service.
Files and blobs belong in Azure Blob Storage. Do not write files that must persist or be shared to the local filesystem of a compute instance: those files are not visible to other instances, and they are lost if the instance is replaced. If you need a filesystem abstraction, mount Azure Files (SMB/NFS). Otherwise, write directly to Blob Storage.
Distributed locks and leader election belong in Azure Blob Storage leases or Azure Cosmos DB optimistic concurrency (conditional writes that check the item’s ETag). If your service must ensure that only one instance performs a particular operation at a time, use a distributed lock pattern. Process-local mutexes only coordinate threads within a single instance.
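A blob lease works as a lock with a built-in timeout. Only one client can hold an unexpired lease on a blob. If the holder crashes, it stops renewing, the lease lapses, and another instance can take over. The sketch below models those semantics in memory so it runs without an Azure account. `InMemoryLeaseStore` is hypothetical and simplified. With the `azure-storage-blob` package, the corresponding call is `BlobClient.acquire_lease(lease_duration=...)`, which returns a `BlobLeaseClient` with `renew()` and `release()` methods. Azure accepts lease durations of 15 to 60 seconds, or an infinite lease.

```python
import time
import uuid
from typing import Callable, Dict, Optional, Tuple


class InMemoryLeaseStore:
    """Hypothetical stand-in for blob leases: at most one unexpired lease per resource."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        # resource name -> (lease_id, expiry time)
        self._leases: Dict[str, Tuple[str, float]] = {}

    def acquire(self, resource: str, duration: float) -> Optional[str]:
        """Return a new lease ID, or None if another holder's lease has not expired."""
        now = self._clock()
        held = self._leases.get(resource)
        if held is not None and held[1] > now:
            return None
        lease_id = str(uuid.uuid4())
        self._leases[resource] = (lease_id, now + duration)
        return lease_id

    def renew(self, resource: str, lease_id: str, duration: float) -> bool:
        """Extend our lease; False means we lost it and must stop acting as leader."""
        now = self._clock()
        held = self._leases.get(resource)
        if held is None or held[0] != lease_id or held[1] <= now:
            return False
        self._leases[resource] = (lease_id, now + duration)
        return True

    def release(self, resource: str, lease_id: str) -> None:
        """Give up the lease early so another instance can acquire it immediately."""
        held = self._leases.get(resource)
        if held is not None and held[0] == lease_id:
            del self._leases[resource]


# Usage with a controllable clock so lease expiry can be simulated.
fake_now = [0.0]
store = InMemoryLeaseStore(clock=lambda: fake_now[0])

leader = store.acquire("nightly-report", duration=15)    # instance A becomes leader
second = store.acquire("nightly-report", duration=15)    # instance B is refused (None)
fake_now[0] = 20.0                                        # A crashed and stopped renewing
takeover = store.acquire("nightly-report", duration=15)  # B acquires the lapsed lease
stale_renew = store.renew("nightly-report", leader, 15)   # A's old lease ID is rejected
```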
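```python
# Example: using Redis for session state in a Python Flask app
import os

import redis
from flask import Flask, request, session
from flask_session import Session

app = Flask(__name__)
# SESSION_USE_SIGNER signs the session ID cookie, which requires a secret key.
# FLASK_SECRET_KEY and REDIS_PASSWORD are hypothetical environment variable names.
app.config['SECRET_KEY'] = os.environ['FLASK_SECRET_KEY']
app.config['SESSION_TYPE'] = 'redis'
app.config['SESSION_REDIS'] = redis.Redis(
    host='myredis.redis.cache.windows.net',
    port=6380,                              # TLS port for Azure Cache for Redis
    password=os.environ['REDIS_PASSWORD'],  # never hard-code access keys
    ssl=True,                               # certificate verification stays enabled
)
app.config['SESSION_PERMANENT'] = False
app.config['SESSION_USE_SIGNER'] = True
Session(app)


@app.route('/cart/add', methods=['POST'])
def add_to_cart():
    item_id = request.get_json()['item_id']
    cart = session.get('cart', [])
    cart.append(item_id)
    session['cart'] = cart  # saved to Redis at the end of the request, not kept in local memory
    return {'cart_size': len(cart)}
```

Azure services for stateful workloads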
Some workloads genuinely require stateful compute — where the cost of externalising every piece of state on every operation would be prohibitive. Azure provides several options.
Azure Durable Functions manages workflow state automatically. The Durable Functions runtime persists each orchestration’s progress (which activities have run, what they returned, and which are still pending), to Azure Storage by default. If the host running an orchestration is interrupted, the orchestrator function is replayed from the beginning against this stored history. Completed activities are not re-executed; their recorded results are returned instead, so execution effectively resumes from the last checkpoint. The application developer writes stateful workflow logic, and the framework handles state persistence transparently.
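A toy engine can illustrate the replay mechanism. This is not the Durable Functions API: `run_orchestration`, the workflow and the activity names are all hypothetical. The orchestrator is a Python generator that yields activity requests. Completed results are appended to a history list, which stands in for the runtime’s durable store, and a rerun feeds the recorded results back instead of calling the activities again. This is also why orchestrator code must be deterministic: replay only works if the function makes the same requests in the same order every time.

```python
from typing import Any, Callable, Dict, Generator, List, Optional, Tuple

Request = Tuple[str, Any]  # (activity name, input)
Orchestrator = Callable[[], Generator[Request, Any, Any]]


def run_orchestration(
    orchestrator: Orchestrator,
    activities: Dict[str, Callable[[Any], Any]],
    history: List[Any],
    crash_at_step: Optional[int] = None,
) -> Any:
    """Replay `orchestrator` against `history`, executing only activities not yet recorded."""
    gen = orchestrator()
    step = 0
    try:
        name, arg = next(gen)
        while True:
            if step < len(history):
                result = history[step]  # replay: reuse the recorded output
            else:
                if crash_at_step is not None and step >= crash_at_step:
                    raise RuntimeError("simulated host crash")
                result = activities[name](arg)  # first execution of this step
                history.append(result)          # checkpoint before moving on
            step += 1
            name, arg = gen.send(result)
    except StopIteration as done:
        return done.value  # the orchestrator's return value


# Hypothetical two-step workflow; `calls` records real activity executions.
calls: List[str] = []


def reserve_stock(order_id: str) -> str:
    calls.append("reserve_stock")
    return f"reservation-for-{order_id}"


def charge_card(reservation: str) -> str:
    calls.append("charge_card")
    return f"charged ({reservation})"


def order_workflow():
    reservation = yield ("reserve_stock", "order-42")
    receipt = yield ("charge_card", reservation)
    return receipt


activities = {"reserve_stock": reserve_stock, "charge_card": charge_card}
history: List[Any] = []
try:
    run_orchestration(order_workflow, activities, history, crash_at_step=1)
except RuntimeError:
    pass  # host died after step 0 was checkpointed

result = run_orchestration(order_workflow, activities, history)
print(result)  # charged (reservation-for-order-42)
print(calls)   # reserve_stock appears only once: replay did not re-execute it
```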
Azure Service Fabric Stateful Services allow services to store state directly in a replicated in-memory collection (Reliable Collections) that is managed by Service Fabric. Service Fabric handles partitioning, replication across nodes, and failover. This is appropriate for low-latency stateful scenarios where external store round trips would dominate performance.
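Azure Kubernetes Service with persistent volumes allows pods to mount storage that survives pod restarts and rescheduling. Azure Disk (ReadWriteOnce, for single-writer workloads) and Azure Files (ReadWriteMany, for multi-writer workloads) are both supported as persistent volume types. The volume’s lifecycle is tied to the PersistentVolumeClaim rather than to any pod, so deleting a pod leaves the data intact. When the claim is deleted, the storage class’s reclaim policy (Delete or Retain) decides whether the underlying disk or share is removed.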
```yaml
# Kubernetes PersistentVolumeClaim for an Azure Disk
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-pvc
spec:
  accessModes:
    - ReadWriteOnce
  # AKS built-in class: Premium SSD provisioned through the Azure Disk CSI driver
  storageClassName: managed-csi-premium
  resources:
    requests:
      storage: 20Gi
---
# Pod using the PVC
apiVersion: v1
kind: Pod
metadata:
  name: stateful-pod
spec:
  containers:
    - name: app
      image: myapp:latest
      volumeMounts:
        - mountPath: /data
          name: my-storage
  volumes:
    - name: my-storage
      persistentVolumeClaim:
        claimName: my-pvc
```

The problem with sticky sessions
Sticky sessions (also called session affinity) route every request from a given client to the same backend instance. Azure offers them as cookie-based affinity on Application Gateway, source-IP session persistence on Azure Load Balancer, and the ARR affinity setting on App Service. They are a workaround for stateful web tier behaviour, not a solution to it.
Sticky sessions cause several problems. If the instance a client is stuck to fails, that client’s session is lost. When you deploy a new version, instances are replaced one at a time, and clients pinned to an old instance lose their session when it goes away, because there is no graceful way to migrate them. Autoscaling does not spread load evenly either: a newly added instance receives only new sessions, while existing clients stay pinned to the busy instances. Scale-in is disruptive too, because removing any instance that still has clients pinned to it, even a lightly loaded one, breaks those clients’ sessions.
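A small simulation makes the scale-out problem concrete. It assumes 100 clients already pinned across two instances, a third instance added by the autoscaler, and 10 new clients arriving. Unpinned traffic is spread round-robin, which simplifies what Application Gateway or Load Balancer actually do. The instance names are hypothetical.

```python
from collections import Counter
from itertools import cycle
from typing import Dict, List


def route_requests(session_ids: List[str], instances: List[str], sticky: bool,
                   pinned: Dict[str, str]) -> Counter:
    """Route one request per session and count requests per instance."""
    load: Counter = Counter({name: 0 for name in instances})
    rr = cycle(instances)
    for sid in session_ids:
        if sticky and sid in pinned:
            target = pinned[sid]  # affinity sends the client back to its original instance
        else:
            target = next(rr)     # plain round-robin for unpinned traffic
            if sticky:
                pinned[sid] = target
        load[target] += 1
    return load


existing = [f"client-{i}" for i in range(100)]
new = [f"new-client-{i}" for i in range(10)]

pinned: Dict[str, str] = {}
route_requests(existing, ["vm1", "vm2"], sticky=True, pinned=pinned)  # sessions established

after_scale_out = ["vm1", "vm2", "vm3"]
sticky_load = route_requests(existing + new, after_scale_out, sticky=True, pinned=pinned)
even_load = route_requests(existing + new, after_scale_out, sticky=False, pinned={})
print(dict(sticky_load))  # {'vm1': 54, 'vm2': 53, 'vm3': 3}
print(dict(even_load))    # {'vm1': 37, 'vm2': 37, 'vm3': 36}
```

With affinity, the new instance handles 3 of the 110 requests. Without it, load is spread almost evenly.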
The correct approach is to eliminate the need for sticky sessions by externalising session state. Once sessions are in Redis, all instances are equivalent and sticky sessions are unnecessary. Disable session affinity in your load balancer as a forcing function to verify that your application is truly stateless.
Decision framework
Use this framework when designing a new service or evaluating an existing one:
- Identify every category of data the service holds in local memory or local disk between requests.
- For each category, determine whether it can be externalised to a managed Azure service (Redis, database, blob storage).
- If externalisation has acceptable latency overhead (under ~5ms per request is usually fine), externalise it and make the service stateless.
- If externalisation overhead is unacceptable (e.g., a high-frequency trading system needing microsecond latency), use a stateful service pattern (Service Fabric Reliable Collections, in-memory caching with periodic persistence) and accept the operational complexity.
- For workflow state (orchestrating multi-step processes), prefer Durable Functions, which externalise state transparently without developer overhead.
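The latency test in the third and fourth steps is simple arithmetic: per-request overhead is roughly the number of store round trips times the round-trip time. The sketch below encodes the framework as a function. The ~5 ms budget comes from the text. The `StateCategory` type and the round-trip figures in the examples are hypothetical, so replace them with your own measurements.

```python
from dataclasses import dataclass


@dataclass
class StateCategory:
    name: str
    round_trips_per_request: int  # reads/writes to the external store per request
    store_rtt_ms: float           # measured round-trip time to that store
    is_workflow: bool = False     # progress of a multi-step process?


def recommend(category: StateCategory, budget_ms: float = 5.0) -> str:
    """Apply the decision framework to one category of state."""
    if category.is_workflow:
        return "durable-functions"
    overhead_ms = category.round_trips_per_request * category.store_rtt_ms
    if overhead_ms <= budget_ms:
        return "externalise"
    return "stateful-pattern"


examples = [
    StateCategory("session", round_trips_per_request=2, store_rtt_ms=0.8),
    StateCategory("order-book", round_trips_per_request=50, store_rtt_ms=0.8),
    StateCategory("checkout-workflow", round_trips_per_request=0, store_rtt_ms=0.0, is_workflow=True),
]
for cat in examples:
    print(cat.name, recommend(cat))
```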
Common mistakes
- Writing files to the local filesystem of App Service or Container Apps. Local storage on these platforms is per instance and ephemeral. Files written to /tmp or equivalent paths are not replicated to other instances, and they disappear when the instance restarts, is redeployed, or is replaced. Use Azure Blob Storage or Azure Files instead.
- Using sticky sessions as a long-term architecture. Sticky sessions hide the symptom without fixing the problem. They create uneven load distribution, complicate zero-downtime deployments, and make scale-in operations disruptive. Treat sticky sessions as a technical debt marker and schedule migration to Redis-backed sessions.
- Storing JWTs in server-side session. A JWT is self-contained: the API can verify its signature and expiry on every request without looking anything up. Copying the token into a server-side session adds a session-store round trip to every request for no benefit. If that session lives in instance memory, it also makes the service stateful. Use short-lived JWTs with refresh token rotation instead of server-side session for authentication state.
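Stateless token validation needs only the token and a key, with no per-user lookup. The sketch below signs and verifies an HS256 JWT using only the standard library to show what validating on every request involves: recompute the signature, compare it in constant time, then check expiry. It is illustrative only. Production code should use a maintained library such as PyJWT, and tokens issued by Microsoft Entra ID are typically signed with RS256 and verified against the issuer’s published public keys.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Any, Dict, Optional


def _b64url_encode(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _json_segment(obj: Dict[str, Any]) -> str:
    return _b64url_encode(json.dumps(obj, separators=(",", ":")).encode("utf-8"))


def sign_hs256(claims: Dict[str, Any], key: bytes) -> str:
    """Issue a compact token: header.payload.signature."""
    signing_input = _json_segment({"alg": "HS256", "typ": "JWT"}) + "." + _json_segment(claims)
    signature = hmac.new(key, signing_input.encode("ascii"), hashlib.sha256).digest()
    return signing_input + "." + _b64url_encode(signature)


def verify_hs256(token: str, key: bytes, now: Optional[float] = None) -> Dict[str, Any]:
    """Return the claims, or raise ValueError if the format, signature or expiry check fails."""
    try:
        header_b64, payload_b64, signature_b64 = token.split(".")
    except ValueError:
        raise ValueError("malformed token") from None
    header = json.loads(_b64url_decode(header_b64))
    if header.get("alg") != "HS256":  # never let the token choose the algorithm
        raise ValueError("unexpected algorithm")
    expected = hmac.new(key, f"{header_b64}.{payload_b64}".encode("ascii"), hashlib.sha256).digest()
    if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
        raise ValueError("bad signature")
    claims = json.loads(_b64url_decode(payload_b64))
    current = time.time() if now is None else now
    if "exp" in claims and current >= claims["exp"]:
        raise ValueError("token expired")
    return claims


key = b"demo-secret"  # hypothetical key; load real keys from a secret store
token = sign_hs256({"sub": "alice", "exp": 2_000}, key)
print(verify_hs256(token, key, now=1_000)["sub"])  # alice
```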
Summary
- Stateless services store no data locally between requests; all state is in external managed services.
- Stateless services scale horizontally without coordination and recover from instance failures without data loss.
- Externalise session state to Azure Cache for Redis, application data to managed databases, and files to Azure Blob Storage.
- Avoid sticky sessions — they are a workaround that creates operational problems. Fix the root cause by externalising state.
- When genuine stateful compute is needed, use Durable Functions, Service Fabric Stateful Services, or AKS with persistent volumes.
Frequently asked questions
Why does statelessness make services easier to scale?
A stateless service treats every incoming request identically — it does not matter which instance handles the request because no local state is involved. You can add or remove instances at any time without migrating data or breaking in-flight sessions. A stateful service, by contrast, holds data locally that is needed to complete future requests from the same client. Adding instances does not help unless you also migrate or replicate that local data.
How do I handle user sessions if my web tier is stateless?
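Externalise session state to a shared store such as Azure Cache for Redis. When a request arrives at any instance, that instance reads the session from Redis, processes the request, and writes the updated session back. A Redis round trip within the same region typically takes about a millisecond or less, so the overhead is usually small compared with the rest of the request. Session libraries exist for most stacks. ASP.NET can use the Redis session state provider, ASP.NET Core can back its session middleware with a Redis distributed cache, and Node.js apps commonly pair express-session with connect-redis.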
When is it acceptable for a service to be stateful?
Stateful services are appropriate when the overhead of externalising state exceeds the benefit, or when the state is ephemeral and loss is acceptable. Examples include a caching layer that is stateful but whose data can be rebuilt, a machine learning inference service that keeps a model loaded in memory, or a streaming processor that maintains a windowed aggregate in memory for performance and can restart and re-read events from the source.
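The streaming example works because the in-memory state is derived data. The sketch below assumes the source retains events long enough to be re-read; a list stands in for a durable log such as an Event Hubs partition. A tumbling-window counter holds its aggregates only in memory, and after a crash a fresh instance rebuilds identical results by replaying the log. `WindowedCounter` is hypothetical. A production processor would typically checkpoint its offset and replay only from the start of the oldest open window.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Event = Tuple[int, str]  # (timestamp in seconds, key)


class WindowedCounter:
    """Counts events per key in fixed, non-overlapping (tumbling) windows, in memory only."""

    def __init__(self, window_seconds: int) -> None:
        self.window_seconds = window_seconds
        self.counts: Dict[Tuple[int, str], int] = defaultdict(int)
        self.offset = 0  # index of the next event to read from the source log

    def consume(self, log: List[Event], until: int) -> None:
        """Read events from the log up to (not including) index `until`."""
        stop = min(until, len(log))
        while self.offset < stop:
            timestamp, key = log[self.offset]
            window_start = timestamp - timestamp % self.window_seconds
            self.counts[(window_start, key)] += 1
            self.offset += 1


source_log: List[Event] = [(0, "a"), (5, "b"), (12, "a"), (61, "a"), (65, "b"), (70, "a")]

first = WindowedCounter(window_seconds=60)
first.consume(source_log, until=4)  # processes 4 events, then the process "crashes"

restarted = WindowedCounter(window_seconds=60)        # memory is empty after the restart
restarted.consume(source_log, until=len(source_log))  # rebuild by replaying the source
print(dict(restarted.counts))  # {(0, 'a'): 2, (0, 'b'): 1, (60, 'a'): 2, (60, 'b'): 1}
```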