Cloud Engineer Cheatsheet: Key Concepts, Services, and Patterns
This page is a quick reference for working cloud engineers and people studying for cloud roles. It covers the core service categories, networking and IAM fundamentals, storage patterns, and architectural concepts you will use on the job every day.
Core Service Categories
| Category | What it does | AWS | GCP | Azure |
|---|---|---|---|---|
| Compute | Run application workloads | EC2 | Compute Engine | Virtual Machines |
| Storage | Persist data | S3 / EBS / EFS | Cloud Storage / Persistent Disk | Blob / Disk / Files |
| Networking | Connect resources | VPC / Route 53 / ELB | VPC / Cloud DNS / Cloud LB | VNet / Azure DNS / Load Balancer |
| Database | Managed data stores | RDS / DynamoDB | Cloud SQL / Spanner / Firestore | Azure SQL / Cosmos DB |
| Identity | Auth and access control | IAM | Cloud IAM | Microsoft Entra ID |
| Monitoring | Observe running systems | CloudWatch | Cloud Monitoring / Logging | Azure Monitor |
| Serverless | Run code without managing VMs | Lambda | Cloud Functions / Cloud Run | Azure Functions |
Networking Fundamentals
VPC / VNet — A logically isolated virtual network inside a cloud provider. You define the IP address range (using CIDR notation), subnets, and routing rules.
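CIDR notation — A way of expressing an IP address range as a base address plus a prefix length. In 10.0.0.0/16 the first 16 of the 32 bits are fixed, leaving 16 free bits (2^16 = 65,536 addresses). A /24 leaves 8 free bits (256 addresses). A smaller prefix length means a larger range. Cloud providers also reserve a few addresses in every subnet (AWS and Azure reserve five), so the usable count is slightly lower.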
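Python's standard `ipaddress` module is a convenient way to check CIDR arithmetic before committing to a VPC layout. This is a minimal sketch with example ranges. The counts are raw totals before any provider-reserved addresses are subtracted.

```python
import ipaddress

def address_count(cidr: str) -> int:
    """Total addresses in a CIDR block: 2 ** (32 - prefix length) for IPv4."""
    return ipaddress.ip_network(cidr).num_addresses

vpc = ipaddress.ip_network("10.0.0.0/16")
print(address_count("10.0.0.0/16"))  # 65536
print(address_count("10.0.0.0/24"))  # 256

# Carve the /16 into /24 subnets: 2 ** (24 - 16) = 256 of them.
subnets = list(vpc.subnets(new_prefix=24))
print(len(subnets), subnets[0], subnets[1])  # 256 10.0.0.0/24 10.0.1.0/24

# Membership check: is this address inside the second subnet?
print(ipaddress.ip_address("10.0.1.17") in subnets[1])  # True
```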
Subnets
- Public subnet: has a route to the internet via an Internet Gateway (IGW). Resources here are reachable from the internet only if they have a public IP address and their firewall rules (for example, AWS security groups) allow the traffic.
- Private subnet: no direct route to the internet. Outbound internet traffic must go through a NAT Gateway.
NAT Gateway — Allows resources in private subnets to make outbound connections (e.g., downloading updates) without being directly reachable from the internet.
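Whether a subnet is public or private comes down to its route table. VPC route tables send each packet via the most specific matching route (longest prefix match). The sketch below is a simplified model, not a provider API, and the target IDs `igw-example` and `nat-example` are hypothetical placeholders. The two tables differ only in where the default route (0.0.0.0/0) points.

```python
import ipaddress
from typing import Dict

# Hypothetical route tables: CIDR -> target.
PUBLIC_ROUTES: Dict[str, str] = {
    "10.0.0.0/16": "local",       # traffic inside the VPC stays local
    "0.0.0.0/0": "igw-example",   # everything else goes to the Internet Gateway
}
PRIVATE_ROUTES: Dict[str, str] = {
    "10.0.0.0/16": "local",
    "0.0.0.0/0": "nat-example",   # everything else goes out via the NAT Gateway
}

def next_hop(routes: Dict[str, str], destination: str) -> str:
    """Return the target of the most specific route that contains destination."""
    addr = ipaddress.ip_address(destination)
    candidates = [
        (ipaddress.ip_network(cidr), target)
        for cidr, target in routes.items()
        if addr in ipaddress.ip_network(cidr)
    ]
    if not candidates:
        raise LookupError(f"no route to {destination}")
    # Longest prefix (most specific route) wins.
    _, target = max(candidates, key=lambda c: c[0].prefixlen)
    return target

print(next_hop(PUBLIC_ROUTES, "10.0.3.4"))       # local
print(next_hop(PUBLIC_ROUTES, "198.51.100.7"))   # igw-example
print(next_hop(PRIVATE_ROUTES, "198.51.100.7"))  # nat-example
```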
Load Balancers
- L4 (transport layer): routes based on IP address and TCP/UDP port without inspecting the payload. Fast and independent of the application protocol. AWS NLB / GCP Network Load Balancer / Azure Load Balancer.
- L7 (application layer): routes based on HTTP hostnames, URL paths, and headers. Supports TLS (SSL) termination. AWS ALB / GCP Cloud LB / Azure Application Gateway.
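L7 routing is essentially a rule list matched against the request's host and path. This minimal sketch uses first-match-wins ordering, similar in spirit to prioritised ALB listener rules. The `Rule` type, hostnames, and target names are hypothetical, not any provider's configuration format.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Rule:
    host: Optional[str]  # exact Host match, or None to match any host
    path_prefix: str     # match paths starting with this prefix
    target: str          # hypothetical backend / target-group name

# Evaluated in order; the first matching rule wins, so specific rules go first.
RULES: List[Rule] = [
    Rule(host="api.example.com", path_prefix="/v2/", target="api-v2"),
    Rule(host="api.example.com", path_prefix="/", target="api-v1"),
    Rule(host=None, path_prefix="/static/", target="static-assets"),
    Rule(host=None, path_prefix="/", target="web-frontend"),
]

def route(host: str, path: str) -> str:
    host = host.lower()  # hostnames are case-insensitive
    for rule in RULES:
        if (rule.host is None or rule.host == host) and path.startswith(rule.path_prefix):
            return rule.target
    raise LookupError("no matching rule")

print(route("api.example.com", "/v2/users"))      # api-v2
print(route("www.example.com", "/static/app.js")) # static-assets
```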
DNS concepts — DNS translates domain names to IP addresses. A records point to IPv4 addresses, and AAAA records point to IPv6 addresses. CNAME records make one name an alias for another name. TTL (Time To Live) sets how many seconds resolvers may cache a record before querying again. Cloud providers offer managed DNS: Route 53 (AWS), Cloud DNS (GCP), Azure DNS.
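TTL is why DNS changes take time to "propagate": resolvers keep serving a cached answer until it expires. Teams often lower the TTL ahead of a planned migration for this reason. This is a minimal sketch of a resolver-style TTL cache. `TTLCache` is a hypothetical class, and the clock is injectable so expiry can be demonstrated without waiting.

```python
import time
from typing import Callable, Dict, Optional, Tuple

class TTLCache:
    """Caches DNS answers until their TTL expires (simplified resolver cache)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._entries: Dict[str, Tuple[str, float]] = {}  # name -> (answer, expiry time)

    def put(self, name: str, answer: str, ttl_seconds: int) -> None:
        self._entries[name] = (answer, self._clock() + ttl_seconds)

    def get(self, name: str) -> Optional[str]:
        entry = self._entries.get(name)
        if entry is None:
            return None
        answer, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[name]  # stale: a real resolver would re-query upstream
            return None
        return answer

now = [0.0]  # fake clock, in seconds
cache = TTLCache(clock=lambda: now[0])
cache.put("www.example.com", "203.0.113.10", ttl_seconds=300)
now[0] = 299
print(cache.get("www.example.com"))  # 203.0.113.10
now[0] = 300
print(cache.get("www.example.com"))  # None
```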
IAM Concepts
| Term | Definition |
|---|---|
| Principal | The identity making a request — a user, service account, or role |
| Policy | A document defining what actions are allowed or denied on which resources |
| Role | A named set of permissions that is granted to, or assumed by, a principal |
| Least privilege | Grant only the permissions needed for the task — nothing more |
| Authentication | Proving who you are (identity) |
| Authorisation | Determining what you are allowed to do (permissions) |
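Authorisation decisions follow a small set of rules. Everything is denied by default, an Allow grants access, and an explicit Deny always overrides an Allow. This is the core of AWS policy evaluation. The sketch below is a heavily simplified model under those assumptions. It ignores conditions, principals, and layered policy types, and it uses a hypothetical lowercase statement format rather than real IAM JSON. `fnmatchcase` approximates IAM wildcard matching.

```python
from fnmatch import fnmatchcase
from typing import Dict, List

Statement = Dict[str, str]  # hypothetical shape: effect, action, resource

def is_allowed(statements: List[Statement], action: str, resource: str) -> bool:
    """Default deny; any matching explicit Deny overrides every Allow."""
    allowed = False
    for stmt in statements:
        if fnmatchcase(action, stmt["action"]) and fnmatchcase(resource, stmt["resource"]):
            if stmt["effect"] == "Deny":
                return False  # explicit deny always wins
            allowed = True    # remember the allow, but keep checking for denies
    return allowed

POLICY: List[Statement] = [
    {"effect": "Allow", "action": "s3:GetObject", "resource": "arn:aws:s3:::reports/*"},
    {"effect": "Deny", "action": "s3:*", "resource": "arn:aws:s3:::reports/secret/*"},
]

print(is_allowed(POLICY, "s3:GetObject", "arn:aws:s3:::reports/2024/q1.csv"))   # True
print(is_allowed(POLICY, "s3:GetObject", "arn:aws:s3:::reports/secret/k.txt"))  # False
print(is_allowed(POLICY, "s3:PutObject", "arn:aws:s3:::reports/2024/q1.csv"))   # False
```

Least privilege in this model means keeping the Allow patterns as narrow as the task requires.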
Service accounts vs IAM users — IAM users are for humans. Service accounts (AWS IAM roles for services, GCP service accounts, Azure Managed Identities) are for applications and automation. Do not give applications long-lived user credentials. Instead, attach a service account or role to the instance or workload, which supplies short-lived credentials automatically.
Storage Patterns
| Type | What it is | Best for | AWS | GCP | Azure |
|---|---|---|---|---|---|
| Object | Flat namespace, accessed via API | Backups, static files, data lakes, media | S3 | Cloud Storage | Blob Storage |
| Block | Raw storage volumes attached to a VM | OS disks, databases, high-IOPS workloads | EBS | Persistent Disk | Azure Disk |
| File | Shared file system mounted by multiple VMs | Shared app configs, legacy NFS workloads | EFS | Filestore | Azure Files |
Object storage is the usual default for unstructured data (files, backups, media) in new cloud-native workloads. Block storage is used when a VM needs low-latency disk access (databases, OS volumes). File storage is used when multiple VMs need to share the same directory tree.
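Object storage's "flat namespace" means folders do not really exist. A key like `logs/2024/01.gz` is a single string, and consoles fake a directory tree by listing with a prefix and a delimiter. S3, Cloud Storage, and Blob Storage all support prefix and delimiter listing. This is a local simulation of that behaviour, not an SDK call, and `list_objects` is a hypothetical helper.

```python
from typing import List, Tuple

def list_objects(keys: List[str], prefix: str = "", delimiter: str = "/") -> Tuple[List[str], List[str]]:
    """Mimic prefix/delimiter listing over flat keys.

    Returns (keys directly under prefix, common prefixes that look like folders).
    """
    objects: List[str] = []
    common_prefixes: List[str] = []
    for key in sorted(keys):
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        if delimiter in rest:
            # Roll everything up to and including the next delimiter into one "folder".
            folder = prefix + rest[: rest.index(delimiter) + len(delimiter)]
            if folder not in common_prefixes:
                common_prefixes.append(folder)
        else:
            objects.append(key)
    return objects, common_prefixes

KEYS = ["logs/2024/01.gz", "logs/2024/02.gz", "logs/README", "images/cat.png"]
print(list_objects(KEYS, prefix="logs/"))  # (['logs/README'], ['logs/2024/'])
```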
Compute Options: When to Use What
| Option | Best for |
|---|---|
| Virtual Machines | Full OS control, legacy apps, stateful workloads, GPU or specialised hardware needs |
| Containers | Portable microservices, repeatable environments, CI/CD pipelines, Kubernetes workloads |
| Serverless | Event-driven functions, infrequent workloads, API backends — no idle cost, no server management |
Key Architectural Patterns
Stateless vs stateful — A stateless service holds no session data in memory. Any instance can serve any request. This makes horizontal scaling and rolling deployments much simpler. Store session data in a cache (Redis) or database instead of in the application process.
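The sketch below shows the stateless pattern. Each instance loads session state from a shared store and writes it back, so a request can land on any instance. `SessionStore` is a hypothetical in-memory stand-in for an external cache such as Redis. A real deployment would also need atomic updates (for example, Redis `HINCRBY`) to avoid lost writes under concurrency.

```python
from typing import Dict

class SessionStore:
    """Hypothetical stand-in for an external store such as Redis."""

    def __init__(self) -> None:
        self._data: Dict[str, Dict[str, int]] = {}

    def get(self, session_id: str) -> Dict[str, int]:
        return dict(self._data.get(session_id, {}))  # return a copy, like a network read

    def set(self, session_id: str, session: Dict[str, int]) -> None:
        self._data[session_id] = dict(session)

class AppInstance:
    """Stateless: no per-user data is kept in the instance between requests."""

    def __init__(self, name: str, store: SessionStore) -> None:
        self.name = name
        self.store = store

    def add_to_cart(self, session_id: str, item: str) -> Dict[str, int]:
        cart = self.store.get(session_id)   # load state from the shared store
        cart[item] = cart.get(item, 0) + 1
        self.store.set(session_id, cart)    # persist before responding
        return cart

store = SessionStore()
a, b = AppInstance("a", store), AppInstance("b", store)
a.add_to_cart("sess-1", "book")
print(b.add_to_cart("sess-1", "book"))  # {'book': 2}
```

Because instance `b` sees instance `a`'s write, either one can be removed or replaced mid-session without losing the cart.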
Idempotency — An operation is idempotent if running it multiple times leaves the system in the same state as running it once. This is critical for retries and message queues, which may deliver the same request more than once. An HTTP PUT that sets a value to 42 is idempotent; a POST that increments a counter is not.
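A common way to make a non-idempotent operation safe to retry is an idempotency key. The client sends a unique ID with each logical request, and the server returns the stored result for a duplicate instead of re-executing it. This is a minimal sketch with a hypothetical `Counter` class. In production the record of seen keys would live in a durable, shared store with an expiry.

```python
from typing import Dict

class Counter:
    def __init__(self) -> None:
        self.value = 0
        self._seen: Dict[str, int] = {}  # idempotency key -> result already returned

    def put(self, new_value: int) -> int:
        """Idempotent: setting 42 twice still leaves 42."""
        self.value = new_value
        return self.value

    def increment(self) -> int:
        """Not idempotent: every retry adds one more."""
        self.value += 1
        return self.value

    def increment_once(self, idempotency_key: str) -> int:
        """Idempotent wrapper: a repeated key returns the original result."""
        if idempotency_key in self._seen:
            return self._seen[idempotency_key]
        result = self.increment()
        self._seen[idempotency_key] = result
        return result

c = Counter()
c.put(42); c.put(42)
print(c.value)  # 42
c.increment_once("req-123"); c.increment_once("req-123")  # a retried request
print(c.value)  # 43
```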
Immutable infrastructure — Never modify running servers. Instead, build a new image, deploy it, and retire the old one. This eliminates configuration drift and makes rollbacks predictable.
Blue/green deployment — Run two identical environments (blue = current, green = new). Route traffic to green when ready. Rollback is fast: switch traffic back to blue. Requires double the infrastructure during the switch.
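Rolling deployment — Replace instances gradually, a few at a time. Lower resource cost than blue/green, but old and new versions serve traffic side by side during the rollout, and rollback is slower because it must also proceed instance by instance.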
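The sketch below models a rolling deployment as a loop over batches. Each batch is upgraded and health-checked. If a batch fails, that batch is reverted and the rollout halts, which leaves a mixed fleet. That mixed state is why rollback is slower than a blue/green switch. `rolling_deploy` and its health-check callback are hypothetical and do not model any specific orchestrator.

```python
from typing import Callable, List, Tuple

def rolling_deploy(
    fleet: List[str],
    new_version: str,
    batch_size: int,
    health_check: Callable[[int, str], bool],  # (slot index, version) -> healthy?
) -> Tuple[List[str], bool]:
    """Return (final fleet versions, whether the rollout completed)."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    fleet = list(fleet)
    for start in range(0, len(fleet), batch_size):
        batch = range(start, min(start + batch_size, len(fleet)))
        old = [fleet[i] for i in batch]
        for i in batch:
            fleet[i] = new_version
        if not all(health_check(i, fleet[i]) for i in batch):
            for i, version in zip(batch, old):
                fleet[i] = version  # revert only the failing batch, then stop
            return fleet, False
    return fleet, True

# Slot 3 fails its health check: batch 0-1 is upgraded, batch 2-3 is reverted.
print(rolling_deploy(["v1"] * 6, "v2", 2, lambda i, v: i != 3))
# (['v2', 'v2', 'v1', 'v1', 'v1', 'v1'], False)
```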
Cost Basics
Common cost drivers
- Compute: charged per hour or second while instances are running
- Data transfer: egress (data leaving a cloud region or to the internet) is usually charged; ingress is often free
- Storage: charged per GB-month; also per request for object storage
- Managed services: additional markup over raw infrastructure cost in exchange for reduced operational overhead
Pricing models
| Model | Description |
|---|---|
| On-demand / pay-as-you-go | Full price, no commitment, maximum flexibility |
| Reserved / committed use | 1- or 3-year commitment, typically 30–70% discount |
| Spot / preemptible | Spare capacity at a steep discount (typically 60–90%); can be interrupted at short notice (a two-minute warning on AWS) |
Use reserved or committed-use pricing for predictable baseline workloads. Use spot / preemptible instances for batch jobs, CI builds, and other fault-tolerant workloads.
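A useful rule of thumb follows from the two models. A reservation is billed for every hour of the term, whereas on-demand is billed only while the instance runs. A reservation with discount d therefore pays off once the instance runs more than (1 − d) of the time. The worked example below assumes a hypothetical $0.10/hour on-demand rate and a 40% discount, and it ignores upfront-payment options and other provider-specific billing details.

```python
HOURS_PER_YEAR = 24 * 365  # 8760

def annual_cost_on_demand(hourly_rate: float, utilisation: float) -> float:
    """Pay only for the fraction of hours the instance actually runs."""
    return hourly_rate * HOURS_PER_YEAR * utilisation

def annual_cost_reserved(hourly_rate: float, discount: float) -> float:
    """Pay the discounted rate for every hour of the term, used or not."""
    return hourly_rate * (1 - discount) * HOURS_PER_YEAR

def break_even_utilisation(discount: float) -> float:
    """Reserved is cheaper once utilisation exceeds 1 - discount."""
    return 1 - discount

rate, discount = 0.10, 0.40  # hypothetical figures
print(round(annual_cost_on_demand(rate, 1.0), 2))   # 876.0  (always on)
print(round(annual_cost_on_demand(rate, 0.5), 2))   # 438.0  (runs half the time)
print(round(annual_cost_reserved(rate, discount), 2))  # 525.6
print(round(break_even_utilisation(discount), 2))   # 0.6
```

An always-on instance saves about $350 a year with the reservation, but one that runs only half the time is cheaper on demand.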
Reliability Concepts
Regions — Geographically separated data centre clusters. Deploying across regions protects against regional outages.
Availability Zones (AZs) — Physically separate data centres within a single region, connected by low-latency links. Deploying across AZs protects against single-facility failures.
Fault tolerance — The system continues operating correctly even when a component fails. Achieved through redundancy, health checks, and automatic failover.
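Redundancy helps because the system only fails if every copy fails at once. Components in series hurt because every link must be up. The sketch below works through that arithmetic. It assumes independent failures, which is optimistic, because shared dependencies make real failures correlated. The 99% and 99.9% figures are hypothetical. For scale, 99.99% availability still allows about 52.6 minutes of downtime a year (0.0001 × 525,600 minutes).

```python
def parallel_availability(single: float, replicas: int) -> float:
    """N redundant copies, any one of which is enough (independent failures assumed)."""
    return 1 - (1 - single) ** replicas

def serial_availability(*components: float) -> float:
    """A chain where every component must be up."""
    result = 1.0
    for a in components:
        result *= a
    return result

# Two AZs, each hypothetically 99% available.
print(round(parallel_availability(0.99, 2), 6))       # 0.9999
# A load balancer and a database, each 99.9%, both required.
print(round(serial_availability(0.999, 0.999), 6))   # 0.998001
```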
RTO (Recovery Time Objective) — The maximum acceptable time for a system to be restored after a failure.
RPO (Recovery Point Objective) — The maximum acceptable amount of data loss measured in time. An RPO of 1 hour means you can tolerate losing up to 1 hour of data.
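RPO and RTO translate directly into checks on a backup plan. With periodic backups, the worst case is a failure just before the next backup runs, which loses one full backup interval of data. The sketch below uses that simplification and ignores backup duration and copy time. `check_recovery_plan` is a hypothetical helper, and the restore time is an estimate you would measure by running a restore.

```python
from datetime import timedelta
from typing import Tuple

def check_recovery_plan(
    backup_interval: timedelta,
    estimated_restore_time: timedelta,
    rpo: timedelta,
    rto: timedelta,
) -> Tuple[bool, bool]:
    """Return (meets_rpo, meets_rto) for a simple periodic-backup plan."""
    worst_case_data_loss = backup_interval  # failure just before the next backup
    return worst_case_data_loss <= rpo, estimated_restore_time <= rto

# Nightly backups cannot meet a 1-hour RPO; hourly ones can.
print(check_recovery_plan(timedelta(hours=24), timedelta(minutes=45), rpo=timedelta(hours=1), rto=timedelta(hours=4)))
# (False, True)
print(check_recovery_plan(timedelta(hours=1), timedelta(minutes=45), rpo=timedelta(hours=1), rto=timedelta(hours=4)))
# (True, True)
```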
Quick Decision Guide
| If you need… | Use… |
|---|---|
| Store large files or backups | Object storage (S3 / Cloud Storage / Blob) |
| Run a containerised app at scale | Kubernetes (EKS / GKE / AKS) |
| Run event-driven code without a server | Serverless functions (Lambda / Cloud Functions / Azure Functions) |
| Share a file system across multiple VMs | File storage (EFS / Filestore / Azure Files) |
| Route HTTPS traffic based on URL path | L7 load balancer (ALB / Cloud LB / App Gateway) |
| Reduce costs on long-running VMs | Reserved / committed use pricing |
| Run fault-tolerant batch jobs cheaply | Spot / preemptible instances |
| Restrict what an application can access | Service account or instance-attached IAM role |
| Protect against a single data centre outage | Multi-AZ deployment |
| Protect against a regional outage | Multi-region deployment |