Cloud System Design Interviews: How to Approach Them

System design interviews are where many technically strong candidates stumble. Not because they lack knowledge, but because they do not know how to structure their thinking under pressure, in front of someone watching them work in real time.

This page explains how system design interviews work in cloud and DevOps hiring, a five-step framework that works across question types, and two complete worked examples you can study and adapt.

What System Design Interviews Actually Look Like

You are given a vague, open-ended prompt: “Design a URL shortener.” “Design a notification system for millions of users.” “Design the infrastructure for a new e-commerce platform.” The interviewer then watches how you think.

The session typically runs 45-60 minutes. You are expected to talk through your thinking, draw diagrams (on a whiteboard or shared diagram tool), ask clarifying questions, and make decisions with stated reasoning.

The interviewer is not looking for the perfect architecture. They are looking for:

A candidate who builds a coherent, well-reasoned design with a few acknowledged gaps beats a candidate who produces a technically impressive diagram while mumbling through the explanation.

System Design at Junior vs Senior Level

Junior / early-career roles rarely include system design interviews. If they appear, the bar is low: can you describe how a web application works end-to-end? Can you explain what a load balancer does? No one expects a junior engineer to design a globally distributed system.

Mid-level roles include system design for cloud-heavy positions and anything with “platform” or “infrastructure” in the title. The expectation is that you can design a single-region deployment for a medium-complexity application, identify the key components, and explain why you chose them.

Senior / principal roles include more complex system design questions — multi-region, high availability, handling scale, dealing with consistency vs availability trade-offs. You are also expected to identify the places where your design is weakest and discuss them proactively rather than waiting to be caught out.

The Framework: Five Steps in Order

This framework works for almost every cloud system design question. The goal is to give your thinking structure so you do not jump to drawing an architecture before you understand what you are building.

Step 1: Clarify Requirements

Do not start drawing until you have asked questions. Interviewers respect candidates who ask clarifying questions — it signals that you know requirements are not optional.

What to ask:

Write down what you agreed on. This becomes the rubric you evaluate your design against.

Step 2: Estimate Scale

Rough order-of-magnitude estimates tell you whether you are designing for a weekend project or a system that needs genuine engineering investment.

What to estimate:

You do not need to get these numbers exactly right. The point is to make your design choices sensible given the scale. If you design for 1 million RPS but the system needs 1,000 RPS, you have over-engineered it. If you design for 1,000 RPS and then mention you would need to revisit this for higher scale, that is perfectly fine.

Step 3: Choose the Right Services

Now you can start designing. At this stage, think in components first, then map to specific services.

Components before services:

Then map to cloud services with reasons:

Name the service and give a reason. Naming a service without explaining why you chose it over alternatives is a missed opportunity.

Step 4: Address Availability, Scalability, and Fault Tolerance

Once the core design is on the board, explicitly discuss:

Availability. What happens if one component fails? Are there single points of failure? If the database is a single instance, that is a problem. Multi-AZ deployments, read replicas, and failover groups address this.

Scalability. Where are the bottlenecks as load increases? Can you scale horizontally? If the bottleneck is a stateful component (a database), how do you handle it — read replicas, sharding, or caching to reduce load?

Fault tolerance. What happens when you deploy a new version? Rolling updates with health checks. What happens if the message queue backs up? Dead-letter queues and alerting. What happens if an external dependency is down? Circuit breakers or graceful degradation.
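To make the circuit-breaker idea concrete, here is a minimal sketch in Python. The class name, the thresholds, and the `fallback` callable are illustrative assumptions rather than any particular library's API; a real system would more likely use an existing resilience library or a service mesh feature.

```python
import time


class CircuitBreaker:
    """Stops calling a failing dependency for a cool-down period."""

    def __init__(self, failure_threshold=3, reset_after=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold  # consecutive failures before opening
        self.reset_after = reset_after              # seconds to wait before a trial call
        self.clock = clock                          # injectable so the sketch is testable
        self.failures = 0
        self.opened_at = None                       # None means the circuit is closed

    def call(self, func, fallback):
        # While open and still cooling down, skip the dependency entirely.
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                return fallback()
            # Cool-down elapsed: fall through and allow one trial call (half-open).
        try:
            result = func()
        except Exception:
            self.failures += 1
            # Open on reaching the threshold, or re-open if the trial call failed.
            if self.failures >= self.failure_threshold or self.opened_at is not None:
                self.opened_at = self.clock()
            return fallback()
        # Success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

The `fallback` is where graceful degradation lives: return a cached value, a default, or a "temporarily unavailable" response instead of letting the failure cascade.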

This section is where senior candidates differentiate themselves. Naming a problem before the interviewer asks about it shows you think about the failure modes of systems you design.

Step 5: Discuss Trade-offs Explicitly

Every design decision has trade-offs. Name them.

The interviewer knows these trade-offs exist. Naming them proactively shows maturity. Ignoring them suggests you have not thought about the real implications of your choices.

Worked Example 1: Design a URL Shortener

This is a classic warm-up question. It sounds trivial but covers storage, caching, redirection, and scale.

After clarifying requirements: The system creates short URLs from long URLs, redirects users who visit a short URL to the original, and needs to handle millions of created URLs and potentially hundreds of millions of redirections.

Scale estimate: Assume 100M shortened URLs exist. Assume 1B redirects per month, which is roughly 400 RPS on average (1B divided by about 2.6M seconds in a month), with spikes to 10x, or about 4,000 RPS. Each URL mapping takes ~500 bytes (short key + long URL + metadata), so 100M mappings come to about 50 GB in total. This is manageable on a single relational database with caching.
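The arithmetic behind this estimate is short enough to check in a few lines. This is only a sketch of the back-of-envelope calculation; the inputs are the assumptions stated in the example, and the 30-day month is an added simplification.

```python
# Back-of-envelope numbers for the URL shortener example.
SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # 2,592,000, assuming a 30-day month

redirects_per_month = 1_000_000_000
peak_factor = 10                 # assumed ratio of peak to average traffic
stored_urls = 100_000_000
bytes_per_mapping = 500          # short key + long URL + metadata

average_rps = redirects_per_month / SECONDS_PER_MONTH
peak_rps = average_rps * peak_factor
storage_gb = stored_urls * bytes_per_mapping / 1e9

print(f"average: {average_rps:.0f} RPS")   # average: 386 RPS
print(f"peak:    {peak_rps:.0f} RPS")      # peak:    3858 RPS
print(f"storage: {storage_gb:.0f} GB")     # storage: 50 GB
```

In an interview you would round these to 400 RPS, 4,000 RPS, and 50 GB; the precision does not matter, only the order of magnitude.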

Core design:

The short URL key is a 7-character alphanumeric string (62^7 ≈ 3.5 trillion possible keys, which leaves plenty of headroom over 100M URLs). Key generation: hash the long URL and take a subset of the digest, or use an auto-incrementing ID converted to base-62.
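As an illustration of the second option, the sketch below converts an integer ID to base-62 and back. The alphabet ordering (digits, then lowercase, then uppercase) and the function names are arbitrary choices made for this example.

```python
import string

# 62 symbols: 0-9, a-z, A-Z. The ordering is a convention chosen for this sketch.
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase
BASE = len(ALPHABET)


def encode_base62(number: int) -> str:
    """Convert a non-negative integer ID to a base-62 string."""
    if number < 0:
        raise ValueError("ID must be non-negative")
    if number == 0:
        return ALPHABET[0]
    chars = []
    while number > 0:
        number, remainder = divmod(number, BASE)
        chars.append(ALPHABET[remainder])
    return "".join(reversed(chars))


def decode_base62(key: str) -> int:
    """Convert a base-62 string back to the integer ID."""
    number = 0
    for char in key:
        number = number * BASE + ALPHABET.index(char)
    return number


def short_key(number: int, width: int = 7) -> str:
    """Left-pad the encoded ID so every key has the same length."""
    return encode_base62(number).rjust(width, ALPHABET[0])
```

For example, `short_key(62)` gives `"0000010"`, and any ID below 62^7 fits in seven characters.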

Storage: PostgreSQL (or equivalent) storing short key → long URL + created date + owner + optional expiry. The database does not need to be massive; the dataset fits in a single well-specced instance.

Redirection path: User visits short.ly/abc1234 → request hits the application server → application server checks in-memory cache (Redis) for the short key → on cache hit, returns 301/302 redirect → on cache miss, queries the database, stores in cache, returns redirect.
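The lookup in this path is the cache-aside pattern. A minimal sketch follows, with plain dictionaries standing in for Redis and PostgreSQL; the names `cache`, `database`, and `resolve` and the example URL are hypothetical.

```python
from typing import Optional

cache = {}  # stand-in for Redis
database = {"abc1234": "https://example.com/a/very/long/path"}  # stand-in for PostgreSQL


def resolve(short_key: str) -> Optional[str]:
    """Return the long URL for a short key, or None if the key is unknown."""
    # 1. Try the cache first.
    long_url = cache.get(short_key)
    if long_url is not None:
        return long_url
    # 2. On a miss, fall back to the database.
    long_url = database.get(short_key)
    # 3. Populate the cache so the next request for this key is a hit.
    if long_url is not None:
        cache[short_key] = long_url
    return long_url
```

The application server would turn a non-None result into a redirect response and a None result into a 404. A real cache would also set an expiry on each entry so that deleted or expired URLs do not linger.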

Write path: User submits a long URL → generate short key → check for collision (rare) → store in database → return short URL.
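The write path with hash-based keys and a collision check might look like the sketch below. The dictionary `store` stands in for the database, and the retry scheme (appending an attempt counter before hashing) is one possible approach assumed for this example, not the only one.

```python
import hashlib
import string

ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase


def candidate_key(long_url: str, attempt: int, length: int = 7) -> str:
    """Derive a base-62 key from a hash of the URL plus an attempt counter."""
    digest = hashlib.sha256(f"{long_url}#{attempt}".encode("utf-8")).digest()
    number = int.from_bytes(digest[:8], "big")  # 64 bits is ample for 7 base-62 digits
    chars = []
    for _ in range(length):
        number, remainder = divmod(number, 62)
        chars.append(ALPHABET[remainder])
    return "".join(chars)


def shorten(long_url: str, store: dict, max_attempts: int = 5) -> str:
    """Store the mapping under a free key and return that key."""
    for attempt in range(max_attempts):
        key = candidate_key(long_url, attempt)
        existing = store.get(key)
        if existing is None:
            store[key] = long_url
            return key
        if existing == long_url:
            return key  # same URL submitted again: reuse its key
        # Otherwise a different URL owns this key: retry with the next attempt.
    raise RuntimeError("could not find a free key")
```

With a real database, the check-then-insert above would race under concurrent writes; a unique constraint on the key column, with a retry on violation, is the usual way to make the collision check safe.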

Availability: Multi-AZ PostgreSQL deployment. Redis replication. Application servers run in an auto-scaling group behind a load balancer. The cache miss rate will be low because URL popularity tends to follow a Zipf distribution: a small fraction of URLs get the vast majority of traffic.
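To get a feel for how skewed a Zipf distribution is, the sketch below computes the share of traffic that the most popular URLs would receive. The exponent of 1.0 and the catalogue size of 100,000 URLs are assumptions chosen for illustration; real traffic would need to be measured.

```python
def top_fraction_traffic_share(total_urls: int, top_fraction: float,
                               exponent: float = 1.0) -> float:
    """Share of requests hitting the most popular URLs under a Zipf model.

    The URL at popularity rank r is assumed to receive traffic
    proportional to 1 / r**exponent.
    """
    weights = [1.0 / (rank ** exponent) for rank in range(1, total_urls + 1)]
    top_n = max(1, int(total_urls * top_fraction))
    return sum(weights[:top_n]) / sum(weights)


share = top_fraction_traffic_share(total_urls=100_000, top_fraction=0.01)
print(f"top 1% of URLs receive {share:.0%} of redirects")
```

Under these assumptions, caching just the top 1% of keys already serves a bit over 60% of redirects, which is why a modest Redis instance goes a long way here.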

Trade-offs to mention:

Worked Example 2: Design a Real-Time Notification System for 10 Million Users

This is a harder question testing knowledge of fan-out patterns, WebSockets vs polling, and data pipeline architecture.

After clarifying requirements: Users receive notifications in real time when certain events happen — a social media like, a message received, an order shipped. Notifications appear in the app within seconds of the event. Users have multiple devices. We need to handle 10 million active users.

Scale estimate: 10M users. Assume on average 1 notification per user per hour = ~2,800 notifications per second. Some events trigger fan-out (a celebrity posting triggers notifications to millions of followers — a very different problem). Let’s design for the common case first.

Core design:

Event ingestion: Source services (order service, messaging service, social service) publish events to a message queue (Kafka, SQS, or Pub/Sub depending on the cloud provider). This decouples the event producers from the notification delivery system.

Notification service: A consumer reads from the message queue, determines which users need to be notified, looks up their device tokens and preferences, and routes notifications to the appropriate delivery channels.

Delivery channels:

User connection management: The WebSocket gateway needs to know which server a user is connected to (because a user might be on any of 100 WebSocket servers). Use Redis to store userId → connectionId/server. When a notification arrives for a user, look up their active connection and route the message to the right server.
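A sketch of that lookup is shown below, using an in-memory class in place of Redis. The class, method names, and server IDs are hypothetical; in Redis the same shape could be a hash per user that maps connection IDs to server IDs.

```python
from collections import defaultdict


class ConnectionRegistry:
    """In-memory stand-in for the Redis map of user -> active connections."""

    def __init__(self):
        # user_id -> {connection_id: server_id}; one entry per connected device
        self._connections = defaultdict(dict)

    def connect(self, user_id, connection_id, server_id):
        self._connections[user_id][connection_id] = server_id

    def disconnect(self, user_id, connection_id):
        self._connections[user_id].pop(connection_id, None)
        if not self._connections[user_id]:
            del self._connections[user_id]  # drop users with no live connections

    def route(self, user_id):
        """Return {server_id: [connection_id, ...]} for a user's live connections."""
        targets = defaultdict(list)
        for connection_id, server_id in self._connections.get(user_id, {}).items():
            targets[server_id].append(connection_id)
        return dict(targets)


registry = ConnectionRegistry()
registry.connect("user-1", "phone", "ws-7")
registry.connect("user-1", "laptop", "ws-42")
print(registry.route("user-1"))  # {'ws-7': ['phone'], 'ws-42': ['laptop']}
```

Because users have multiple devices, the notification service forwards the message to every server returned by `route`, and an empty result means the user is offline and another delivery channel is needed.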

Availability and scalability: WebSocket servers hold no durable state, only transient connections, so they scale horizontally behind a load balancer. If a server dies, its clients reconnect to another server and the connection map in Redis is updated. For the Kafka consumers, partition by user ID to ensure ordered delivery per user. Redis runs as a cluster for capacity, with replication for availability.
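Partitioning by user ID works because every event for a given user lands on the same partition, and a partition is consumed in order. The sketch below shows the idea with a stand-in hash; Kafka clients apply their own hash to the message key, so the function here is only illustrative.

```python
import hashlib


def partition_for(user_id: str, num_partitions: int) -> int:
    """Map a user ID to a stable partition number.

    Python's built-in hash() is randomized per process for strings, so a
    digest is used to keep the mapping identical across producers.
    """
    digest = hashlib.md5(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions


# Every event for the same user goes to the same partition.
events = [("user-1", "like"), ("user-2", "message"), ("user-1", "order_shipped")]
for user_id, event_type in events:
    print(user_id, event_type, "-> partition", partition_for(user_id, 12))
```

One consequence worth mentioning in an interview: changing the number of partitions changes the mapping, so per-user ordering is only guaranteed while the partition count stays fixed.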

Fan-out handling for high-follower accounts: If a single event triggers 5 million notifications (a celebrity post), processing them synchronously would take too long. Use a dedicated fan-out job that queues individual notifications in batches rather than processing all 5M inline. Accept that high-follower-count notifications may take seconds to minutes to fully deliver — this is a stated trade-off.
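The batching step can be sketched as follows. A Python list stands in for the message queue, and the function names, message shape, and batch size of 1,000 are assumptions made for this example.

```python
from itertools import islice


def batched(iterable, batch_size):
    """Yield lists of up to batch_size items without loading everything at once."""
    iterator = iter(iterable)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def enqueue_fanout(event_id, follower_ids, queue, batch_size=1000):
    """Split one high-fan-out event into batch messages; return the batch count."""
    count = 0
    for batch in batched(follower_ids, batch_size):
        queue.append({"event_id": event_id, "recipients": batch})
        count += 1
    return count


queue = []  # stand-in for the real message queue
batches = enqueue_fanout("post-1", range(2500), queue, batch_size=1000)
print(batches)  # 3
```

Workers then consume the batch messages in parallel, so the delivery time for a celebrity post depends on the number of workers rather than on a single long-running loop.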

Trade-offs to mention:

Common Mistakes in System Design Interviews

Starting to draw before asking questions. The interviewer gives you a vague prompt. Drawing an architecture before clarifying requirements produces an answer to a question you invented, not the one being asked.

Ignoring trade-offs. Proposing an architecture and moving on without discussing its weaknesses signals that you either haven’t thought about them or don’t know they exist. Both are problems.

Naming services without explaining why. “I’d use DynamoDB for storage.” Why DynamoDB and not S3, RDS, or Redis? Naming without reasoning makes the interviewer uncertain whether you understand the service or are just dropping names.

Designing a perfect system instead of a good system. System design interviews do not have a single correct answer. An over-engineered design that uses up the whole session explaining its own complexity is worse than a simpler design with clear reasoning and acknowledged limitations.

Not talking. This is a communication exercise. Interviewers who cannot hear your reasoning cannot evaluate your thinking. Narrate your decisions, even tentative ones.

How to Handle a Design You Have Not Built Before

Almost every system design question involves a system you have not personally built. That is intentional — the question tests thinking, not experience with that specific system.

When you are unfamiliar with a specific service or pattern:

The interviewer knows you have not designed every system that exists. What they are evaluating is whether you have the judgment to design systems you have not built yet.