Cloud System Design Interviews: How to Approach Them

System design interviews are where many technically strong candidates stumble. Not because they lack knowledge, but because they do not know how to structure their thinking under pressure, in front of someone watching them work in real time.

This page explains how system design interviews work in cloud and DevOps hiring, the framework that works across question types, and two complete worked examples you can study and adapt.

What System Design Interviews Actually Look Like#

You are given a vague, open-ended prompt: “Design a URL shortener.” “Design a notification system for millions of users.” “Design the infrastructure for a new e-commerce platform.” The interviewer then watches how you think.

The session typically runs 45-60 minutes. You are expected to talk through your thinking, draw diagrams (on a whiteboard or shared diagram tool), ask clarifying questions, and make decisions with stated reasoning.

The interviewer is not looking for the perfect architecture. They are looking for:

Whether you ask the right questions before drawing anything
Whether your decisions are grounded in requirements
Whether you understand the trade-offs of what you are proposing
Whether you can communicate your design clearly
Whether you know what you do not know

A candidate who builds a coherent, well-reasoned design with a few acknowledged gaps beats a candidate who produces a technically impressive diagram while mumbling through the explanation.

System Design at Junior vs Senior Level#

Junior / early-career roles rarely include system design interviews. If they appear, the bar is low: can you describe how a web application works end-to-end? Can you explain what a load balancer does? No one expects a junior engineer to design a globally distributed system.

Mid-level roles include system design for cloud-heavy positions and anything with “platform” or “infrastructure” in the title. The expectation is that you can design a single-region deployment for a medium-complexity application, identify the key components, and explain why you chose them.

Senior / principal roles include more complex system design questions — multi-region, high availability, handling scale, dealing with consistency vs availability trade-offs. You are also expected to identify the places where your design is weakest and discuss them proactively rather than waiting to be caught out.

The Framework: Five Steps in Order#

This framework works for almost every cloud system design question. The goal is to give your thinking structure so you do not jump to drawing an architecture before you understand what you are building.

Step 1 — Clarify Requirements#

Do not start drawing until you have asked questions. Interviewers respect candidates who ask clarifying questions — it signals that you know requirements are not optional.

What to ask:

Who is the user? End consumers? Internal engineers? Other services?
What does the system need to do? Identify the two or three core functions. Resist the urge to add features that were not mentioned.
What are the non-functional requirements? Availability target (99.9%? 99.99%?), acceptable latency, consistency requirements (is stale data okay?), geographic requirements (single region? multi-region?).
Are there constraints? Existing systems it must integrate with? A specific cloud provider? Budget considerations?

Write down what you agreed on. This becomes the rubric you evaluate your design against.
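To make the availability question concrete, it helps to know roughly how much downtime each target allows. The sketch below is plain arithmetic under the assumption of a 365-day year; the function name is illustrative only.

```python
# Yearly downtime budget implied by an availability target.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600; leap years ignored


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Return the allowed downtime per year, in minutes."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


for target in (99.9, 99.99):
    minutes = downtime_minutes_per_year(target)
    print(f"{target}% -> {minutes:.1f} minutes/year")
# 99.9%  allows about 525.6 minutes (under 9 hours) per year
# 99.99% allows about 52.6 minutes per year
```

The tenfold gap between the two targets is usually what decides whether a single-AZ or multi-AZ design is enough.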

Step 2 — Estimate Scale#

Rough order-of-magnitude estimates tell you whether you are designing for a weekend project or a system that needs genuine engineering investment.

What to estimate:

Users. How many concurrent users? Daily active users? Does it spike (e.g., a ticketing system during a sale)?
Requests per second. Even a rough number: 100 RPS vs 10,000 RPS changes the architecture significantly.
Data volume. How much data is written per day? How long is it retained? What does that mean for storage over five years?
Read/write ratio. Mostly reads (a product catalogue) behaves differently from mostly writes (an activity log).

You do not need to get these numbers exactly right. The point is to make your design choices sensible given the scale. If you design for 1 million RPS but the system needs 1,000 RPS, you have over-engineered it. If you design for 1,000 RPS and then mention you would need to revisit this for higher scale, that is perfectly fine.
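As a rough illustration of the data-volume question, the sketch below projects storage for a retention period. The workload numbers (2 million writes per day at about 1 KB each) are hypothetical and only show the shape of the calculation.

```python
def retained_storage_bytes(bytes_written_per_day: float, retention_days: int) -> float:
    """Steady-state storage if every day's writes are kept for retention_days."""
    return bytes_written_per_day * retention_days


# Hypothetical workload: 2 million writes/day at ~1,000 bytes each, kept five years.
daily_bytes = 2_000_000 * 1_000
five_years = retained_storage_bytes(daily_bytes, retention_days=5 * 365)
print(f"{five_years / 1e12:.2f} TB")  # 3.65 TB
```

An answer in the low terabytes points towards a single managed database; hundreds of terabytes would point towards partitioning or tiered storage.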

Step 3 — Choose the Right Services#

Now you can start designing. At this stage, think in components first, then map to specific services.

Components before services:

I need somewhere to store user data (→ relational database)
I need fast reads for cached data (→ in-memory cache)
I need to deliver files globally (→ CDN)
I need to process events asynchronously (→ message queue)
I need a public entry point with TLS (→ load balancer with HTTPS listener)

Then map to cloud services with reasons:

“I’d use RDS (Postgres) rather than DynamoDB here because the data has strong relational properties and consistent access patterns — I don’t need the horizontal scaling that DynamoDB provides at this scale.”
“I’d use SQS for the queue rather than Kafka because the throughput doesn’t justify managing Kafka, and SQS is fully managed with very little operational overhead.”

Name the service and give a reason. Naming a service without explaining why you chose it over alternatives is a missed opportunity.

Step 4 — Address Availability, Scalability, and Fault Tolerance#

Once the core design is on the board, explicitly discuss:

Availability. What happens if one component fails? Are there single points of failure? If the database is a single instance, that is a problem. Multi-AZ deployments, read replicas, and failover groups address this.

Scalability. Where are the bottlenecks as load increases? Can you scale horizontally? If the bottleneck is a stateful component (a database), how do you handle it — read replicas, sharding, or caching to reduce load?

Fault tolerance. What happens when you deploy a new version? Rolling updates with health checks. What happens if the message queue backs up? Dead-letter queues and alerting. What happens if an external dependency is down? Circuit breakers or graceful degradation.
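If the interviewer asks what a circuit breaker actually does, a minimal sketch like the one below can help. The class, its thresholds, and the injectable clock are assumptions made for illustration, not any particular library's API.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(Exception):
    """Raised when the breaker is open and the call is skipped."""


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 3, reset_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    def call(self, func: Callable[[], T]) -> T:
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                # Fail fast instead of waiting on a dependency known to be down.
                raise CircuitOpenError("dependency marked unhealthy")
            # Timeout elapsed: fall through and allow one trial call (half-open).
        try:
            result = func()
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # open, or re-open after a failed trial
            raise
        # Success closes the breaker and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

A caller would typically catch `CircuitOpenError` and return a degraded response (cached data, a default, or a clear error) rather than block.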

This section is where senior candidates differentiate themselves. Naming a problem before the interviewer asks about it shows you think about the failure modes of systems you design.

Step 5 — Discuss Trade-offs Explicitly#

Every design decision has trade-offs. Name them.

Using a cache improves read performance but introduces cache invalidation complexity and potential stale data.
Multi-region active-active increases availability but introduces consistency challenges and significantly increases cost and operational complexity.
A microservices architecture improves independent scalability and deployment but adds network overhead, distributed tracing requirements, and operational complexity compared to a monolith.
Asynchronous processing via a queue improves throughput but introduces eventual consistency and requires handling of message failures.

The interviewer knows these trade-offs exist. Naming them proactively shows maturity. Ignoring them suggests you have not thought about the real implications of your choices.

Worked Example 1: Design a URL Shortener#

This is a classic warm-up question. It sounds trivial but covers storage, caching, redirection, and scale.

After clarifying requirements: The system creates short URLs from long URLs, redirects users who visit a short URL to the original, and needs to handle millions of created URLs and potentially hundreds of millions of redirections.

Scale estimate: Assume 100M shortened URLs exist. Assume 1B redirects per month = ~400 RPS on average, with spikes to 10x = ~4,000 RPS. Each URL mapping: ~500 bytes (short key + long URL + metadata) = 50GB of data total. This is manageable on a single relational database with caching.
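The figures in this estimate can be checked with a few lines of arithmetic. The sketch assumes a 30-day month; the variable names are illustrative.

```python
SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # 2,592,000, assuming a 30-day month

redirects_per_month = 1_000_000_000
avg_rps = redirects_per_month / SECONDS_PER_MONTH
peak_rps = avg_rps * 10  # the 10x spike assumed in the estimate

total_bytes = 100_000_000 * 500  # 100M mappings at ~500 bytes each

print(f"average: {avg_rps:.0f} RPS, peak: {peak_rps:.0f} RPS")  # average: 386 RPS, peak: 3858 RPS
print(f"storage: {total_bytes / 1e9:.0f} GB")                   # storage: 50 GB
```

The exact average is closer to 386 RPS; rounding up to roughly 400 is the kind of simplification that is fine to say out loud in the interview.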

Core design:

The short URL key is a 7-character alphanumeric string (62^7 = 3.5 trillion possible URLs — plenty of headroom). Key generation: hash the long URL and take a subset, or use an auto-incrementing ID converted to base-62.
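The base-62 option can be sketched as follows. The alphabet order (digits, then lowercase, then uppercase) is an arbitrary choice for this example, and the function names are illustrative.

```python
import string

# 10 digits + 26 lowercase + 26 uppercase = 62 symbols
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase


def encode_base62(n: int) -> str:
    """Convert a non-negative integer ID to a base-62 string."""
    if n < 0:
        raise ValueError("ID must be non-negative")
    if n == 0:
        return ALPHABET[0]
    chars = []
    while n > 0:
        n, remainder = divmod(n, 62)
        chars.append(ALPHABET[remainder])
    return "".join(reversed(chars))


def decode_base62(key: str) -> int:
    """Convert a base-62 string back to the integer ID."""
    n = 0
    for ch in key:
        n = n * 62 + ALPHABET.index(ch)
    return n


print(62 ** 7)                    # 3521614606208, about 3.5 trillion keys
print(encode_base62(62 ** 7 - 1)) # ZZZZZZZ, the largest 7-character key
```

One trade-off worth naming: sequential IDs produce guessable keys, and small IDs produce keys shorter than seven characters unless they are padded or the counter starts at an offset.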

Storage: PostgreSQL (or equivalent) storing short key → long URL + created date + owner + optional expiry. The database does not need to be massive; the dataset fits in a single well-specced instance.

Redirection path: User visits short.ly/abc1234 → request hits the application server → application server checks in-memory cache (Redis) for the short key → on cache hit, returns 301/302 redirect → on cache miss, queries the database, stores in cache, returns redirect.
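The lookup on the redirection path is a cache-aside read. In the sketch below, plain dictionaries stand in for Redis and PostgreSQL; that substitution, and the function name, are assumptions made to keep the example self-contained.

```python
from typing import Dict, Optional


def resolve(short_key: str, cache: Dict[str, str],
            database: Dict[str, str]) -> Optional[str]:
    """Return the long URL for short_key, or None if the key is unknown."""
    long_url = cache.get(short_key)
    if long_url is not None:
        return long_url  # cache hit: no database query

    # Cache miss: fall back to the source of truth.
    long_url = database.get(short_key)
    if long_url is not None:
        cache[short_key] = long_url  # populate the cache for later requests
    return long_url


database = {"abc1234": "https://example.com/some/long/path"}
cache: Dict[str, str] = {}
print(resolve("abc1234", cache, database))  # loaded from the database, then cached
print("abc1234" in cache)                   # True
```

In a real deployment the cache write would normally carry a TTL, and unknown keys would map to a 404 rather than a redirect.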

Write path: User submits a long URL → generate short key → check for collision (rare) → store in database → return short URL.

Availability: Multi-AZ PostgreSQL deployment. Redis replication. Application servers in an auto-scaling group behind a load balancer. Cache miss rate will be low for popular URLs because traffic follows a Zipf-like distribution: a small fraction of URLs get the vast majority of traffic.

Trade-offs to mention:

301 (permanent redirect) vs 302 (temporary redirect). 301 is cached by the browser — reduces server load but means you lose analytics visibility for returning visits. If analytics matter, use 302.
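On a cache miss under very high load, a thundering herd problem is possible for a suddenly viral URL: many concurrent requests miss the cache at once and all query the database for the same key. A cache-aside pattern with a short TTL and a per-key mutex, so that only one request reloads a missing key while the others wait for it, can mitigate this.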
If 4,000 RPS is too much for one database, read replicas handle the read load since most operations are reads.
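The mutex idea for the thundering herd case can be sketched in-process with one lock per key. This is a simplified, single-server illustration: the class name is hypothetical, there is no TTL or eviction, and across several application servers you would need a shared lock or accept one database query per server.

```python
import threading
from typing import Callable, Dict


class SingleFlightCache:
    """Cache where concurrent misses for one key trigger a single load."""

    def __init__(self, loader: Callable[[str], str]) -> None:
        self._loader = loader
        self._values: Dict[str, str] = {}
        self._key_locks: Dict[str, threading.Lock] = {}
        self._registry_lock = threading.Lock()

    def _lock_for(self, key: str) -> threading.Lock:
        # Guard the lock registry so two threads never create separate locks.
        with self._registry_lock:
            return self._key_locks.setdefault(key, threading.Lock())

    def get(self, key: str) -> str:
        value = self._values.get(key)
        if value is not None:
            return value  # fast path: already cached
        with self._lock_for(key):
            # Re-check: another thread may have filled the entry while we waited.
            value = self._values.get(key)
            if value is None:
                value = self._loader(key)  # only one thread per key gets here
                self._values[key] = value
            return value
```

With this in place, a burst of requests for a newly viral key results in one database query instead of one per request.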

Worked Example 2: Design a Real-Time Notification System for 10 Million Users#

This is a harder question testing knowledge of fan-out patterns, WebSockets vs polling, and data pipeline architecture.

After clarifying requirements: Users receive notifications in real time when certain events happen — a social media like, a message received, an order shipped. Notifications appear in the app within seconds of the event. Users have multiple devices. We need to handle 10 million active users.

Scale estimate: 10M users. Assume on average 1 notification per user per hour = ~2,800 notifications per second. Some events trigger fan-out (a celebrity posting triggers notifications to millions of followers — a very different problem). Let’s design for the common case first.

Core design:

Event ingestion: Source services (order service, messaging service, social service) publish events to a message queue (Kafka, SQS, or Pub/Sub depending on the cloud provider). This decouples the event producers from the notification delivery system.

Notification service: A consumer reads from the message queue, determines which users need to be notified, looks up their device tokens and preferences, and routes notifications to the appropriate delivery channels.

Delivery channels:

Mobile push notifications via APNs (Apple) and FCM (Firebase/Google). These work even when the app is in the background and do not require a persistent connection.
In-app notifications for users who are currently in the app. Use WebSockets for real-time delivery to connected clients. A WebSocket gateway (API Gateway WebSocket APIs, or a custom connection server) maintains persistent connections.
Email / SMS for important notifications via SES, SNS, or Twilio. These channels have looser latency requirements than push or in-app delivery.

User connection management: The WebSocket gateway needs to know which server a user is connected to (because a user might be on any of 100 WebSocket servers). Use Redis to store userId → connectionId/server. Because users have multiple devices, this is a set of entries per user rather than a single value. When a notification arrives for a user, look up their active connections and route the message to the right server for each one.
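A connection registry of this kind can be sketched as a mapping from user ID to a set of connections. Here an in-memory dictionary stands in for Redis, and the class and field names are hypothetical.

```python
from collections import defaultdict
from typing import DefaultDict, NamedTuple, Set


class Connection(NamedTuple):
    server_id: str       # which WebSocket server holds the socket
    connection_id: str   # identifier of the socket on that server


class ConnectionRegistry:
    def __init__(self) -> None:
        self._by_user: DefaultDict[str, Set[Connection]] = defaultdict(set)

    def register(self, user_id: str, conn: Connection) -> None:
        """Record a new connection, e.g. when a device opens a WebSocket."""
        self._by_user[user_id].add(conn)

    def unregister(self, user_id: str, conn: Connection) -> None:
        """Remove a connection, e.g. on disconnect or heartbeat timeout."""
        conns = self._by_user.get(user_id)
        if conns is None:
            return
        conns.discard(conn)
        if not conns:
            del self._by_user[user_id]

    def lookup(self, user_id: str) -> Set[Connection]:
        """Return a copy of the user's active connections (empty if offline)."""
        return set(self._by_user.get(user_id, ()))
```

An empty lookup result is the signal to fall back to mobile push or another channel, since the user has no live socket.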

Availability and scalability: WebSocket servers hold no durable state, only transient connections, so they scale horizontally behind a load balancer. For the Kafka consumers, partition by user ID to ensure ordered delivery per user. Redis runs as a cluster for capacity, with replication for availability.
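Partitioning by user ID works because a stable hash always sends the same user to the same partition, so that user's events are consumed in order. The sketch below illustrates the idea with SHA-256; Kafka clients apply their own hash to the message key, so treat this function as an illustration, not as Kafka's partitioner.

```python
import hashlib


def partition_for(user_id: str, num_partitions: int) -> int:
    """Map a user ID to a partition index in [0, num_partitions)."""
    if num_partitions < 1:
        raise ValueError("num_partitions must be at least 1")
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    # Use the first 8 bytes as an unsigned integer, then reduce modulo the count.
    return int.from_bytes(digest[:8], "big") % num_partitions


# The same user always lands on the same partition.
print(partition_for("user-42", 12) == partition_for("user-42", 12))  # True
```

Note that changing the number of partitions changes the mapping, which is one reason partition counts are usually chosen with headroom up front.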

Fan-out handling for high-follower accounts: If a single event triggers 5 million notifications (a celebrity post), processing them synchronously would take too long. Use a dedicated fan-out job that queues individual notifications in batches rather than processing all 5M inline. Accept that high-follower-count notifications may take seconds to minutes to fully deliver — this is a stated trade-off.
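The batching step of such a fan-out job might look like the sketch below. A Python list stands in for the message queue, and the batch size of 1,000 and the message shape are assumptions for illustration.

```python
from itertools import islice
from typing import Dict, Iterable, Iterator, List


def batched(ids: Iterable[int], batch_size: int) -> Iterator[List[int]]:
    """Yield successive lists of at most batch_size IDs."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(ids)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def enqueue_fan_out(follower_ids: Iterable[int], event_id: str,
                    queue: List[Dict], batch_size: int = 1000) -> int:
    """Enqueue one message per batch of followers; return the batch count."""
    count = 0
    for batch in batched(follower_ids, batch_size):
        queue.append({"event_id": event_id, "user_ids": batch})
        count += 1
    return count


queue: List[Dict] = []
print(enqueue_fan_out(range(2500), "post-1", queue))  # 3 batches: 1000, 1000, 500
```

Because the follower IDs are consumed lazily, the job never needs to hold all 5 million IDs in memory, and workers can drain the batches in parallel.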

Trade-offs to mention:

WebSockets vs long-polling. WebSockets are more efficient at scale but require sticky connections or a connection registry. Long-polling is simpler to implement but wasteful at high connection counts.
At-least-once vs exactly-once delivery. With standard consumer groups that commit offsets after processing, Kafka gives at-least-once delivery, so a message can be redelivered after a failure. Idempotency logic in the notification service prevents duplicate notifications from being shown to the user.
Storing notification history: a NoSQL store (DynamoDB, Firestore, Bigtable) works well here because reads are always by user ID (partition key) and writes are high-volume append-only.
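The idempotency logic mentioned for at-least-once delivery can be sketched as a check on a (user ID, event ID) key before sending. The in-memory set below is a stand-in; in practice the seen-keys store would be shared across consumers and expire old entries. All names are hypothetical.

```python
from typing import Callable, Set, Tuple


class IdempotentNotifier:
    """Send each (user, event) pair at most once, even if redelivered."""

    def __init__(self, send: Callable[[str, str], None]) -> None:
        self._send = send
        self._seen: Set[Tuple[str, str]] = set()

    def handle(self, user_id: str, event_id: str, message: str) -> bool:
        """Process one queue message; return True if a notification was sent."""
        key = (user_id, event_id)
        if key in self._seen:
            return False  # duplicate delivery from the queue: skip
        self._send(user_id, message)
        # Record only after a successful send, so a failed send is retried.
        self._seen.add(key)
        return True


sent = []
notifier = IdempotentNotifier(lambda user, msg: sent.append((user, msg)))
notifier.handle("u1", "evt-9", "Your order shipped")
notifier.handle("u1", "evt-9", "Your order shipped")  # redelivery, ignored
print(len(sent))  # 1
```

Recording after the send means a crash between the two steps can still produce a duplicate; recording before the send would risk a lost notification instead, which is the trade-off to state.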

Common Mistakes in System Design Interviews#

Starting to draw before asking questions. The interviewer gives you a vague prompt. Drawing an architecture before clarifying requirements produces an answer to a question you invented, not the one being asked.

Ignoring trade-offs. Proposing an architecture and moving on without discussing its weaknesses signals that you either haven’t thought about them or don’t know they exist. Both are problems.

Naming services without explaining why. “I’d use DynamoDB for storage.” Why DynamoDB and not S3, RDS, or Redis? Naming without reasoning makes the interviewer uncertain whether you understand the service or are just dropping names.

Designing a perfect system instead of a good system. System design interviews do not have a single correct answer. An over-engineered design that uses up all the time explaining its own complexity is worse than a simpler design with clear reasoning and acknowledged limitations.

Not talking. This is a communication exercise. Interviewers who cannot hear your reasoning cannot evaluate your thinking. Narrate your decisions, even tentative ones.

How to Handle a Design You Have Not Built Before#

Almost every system design question involves a system you have not personally built. That is intentional — the question tests thinking, not experience with that specific system.

When you are unfamiliar with a specific service or pattern:

Name what you know and reason from first principles. “I haven’t used Kinesis directly, but it’s a managed streaming service similar to Kafka — I’d use it here for the same reasons I’d use Kafka: ordered, partitioned event delivery with replay capability.”
State what you are uncertain about and how you would resolve it. “I’m not certain about the connection limits per WebSocket server at this scale — I’d benchmark that before committing to the instance type. But the approach would be the same.”
Focus on the architecture, not the specific service names. A design that describes the components and their responsibilities correctly, with service names that are approximately right, shows more capability than one that names the exact right services but cannot explain how they connect.

The interviewer knows you have not designed every system that exists. What they are evaluating is whether you have the judgment to design systems you have not built yet.