Cloud Engineer Cheatsheet: Key Concepts, Services, and Patterns

This page is a quick reference for working cloud engineers and people studying for cloud roles. It covers the core service categories, networking and IAM fundamentals, storage patterns, and architectural concepts you will use on the job every day.

Core Service Categories

Category	What it does	AWS	GCP	Azure
Compute	Run application workloads	EC2	Compute Engine	Virtual Machines
Storage	Persist data	S3 / EBS / EFS	Cloud Storage / Persistent Disk	Blob / Disk / Files
Networking	Connect resources	VPC / Route 53 / ELB	VPC / Cloud DNS / Cloud LB	VNet / Azure DNS / Load Balancer
Database	Managed data stores	RDS / DynamoDB	Cloud SQL / Spanner / Firestore	Azure SQL / Cosmos DB
Identity	Auth and access control	IAM	Cloud IAM	Microsoft Entra ID / Azure RBAC
Monitoring	Observe running systems	CloudWatch	Cloud Monitoring / Logging	Azure Monitor
Serverless	Run code without managing VMs	Lambda	Cloud Functions / Cloud Run	Azure Functions

Networking Fundamentals

VPC / VNet — A logically isolated virtual network inside a cloud provider. You define the IP address range (using CIDR notation), subnets, and routing rules.

CIDR notation — A way of expressing IP ranges. In 10.0.0.0/16, the first 16 bits are fixed as the network prefix, leaving 16 bits for addresses: 2^16 = 65,536 addresses. A /24 leaves 8 bits, giving 256 addresses. A smaller prefix length means a larger range.
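A quick way to sanity-check CIDR arithmetic is Python's standard-library `ipaddress` module. This sketch counts the addresses in a few arbitrary example blocks and carves a /16 into /24 subnets. Keep in mind that providers reserve a handful of addresses in every subnet (AWS reserves five), so the usable count is slightly lower than the raw total.

```python
import ipaddress

def address_count(cidr: str) -> int:
    """Total IPv4 addresses in a CIDR block: 2 ** (32 - prefix length)."""
    return ipaddress.ip_network(cidr).num_addresses

for block in ["10.0.0.0/16", "10.0.1.0/24", "10.0.1.0/28"]:
    print(f"{block}: {address_count(block)} addresses")

# Split the /16 VPC range into /24 subnets and show the first three.
vpc = ipaddress.ip_network("10.0.0.0/16")
subnets = list(vpc.subnets(new_prefix=24))
print(len(subnets), [str(s) for s in subnets[:3]])
```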

Subnets

Public subnet: has a route to the internet via an Internet Gateway (IGW). Resources here can be reached from the internet (if a public IP is assigned).
Private subnet: no direct internet route. Traffic must go through a NAT Gateway to reach the internet outbound.

NAT Gateway — Allows resources in private subnets to make outbound connections (e.g., downloading updates) without being directly reachable from the internet.
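What makes a subnet public or private is its route table. The sketch below models two hypothetical route tables and picks a target by longest-prefix match: the most specific matching route wins, which is how AWS VPC route tables, for example, choose between overlapping routes. The CIDRs and target names ("internet-gateway", "nat-gateway") are illustrative assumptions, not provider identifiers.

```python
import ipaddress

# Hypothetical route tables: destination CIDR -> target.
PUBLIC_ROUTES = {"10.0.0.0/16": "local", "0.0.0.0/0": "internet-gateway"}
PRIVATE_ROUTES = {"10.0.0.0/16": "local", "0.0.0.0/0": "nat-gateway"}

def next_hop(routes: dict[str, str], destination: str) -> str:
    """Return the target of the most specific (longest-prefix) route matching destination."""
    addr = ipaddress.ip_address(destination)
    matches = [ipaddress.ip_network(cidr) for cidr in routes
               if addr in ipaddress.ip_network(cidr)]
    if not matches:
        raise LookupError(f"no route to {destination}")
    best = max(matches, key=lambda net: net.prefixlen)
    return routes[str(best)]

print(next_hop(PUBLIC_ROUTES, "10.0.3.7"))   # traffic inside the VPC stays local
print(next_hop(PUBLIC_ROUTES, "8.8.8.8"))    # public subnet: straight to the IGW
print(next_hop(PRIVATE_ROUTES, "8.8.8.8"))   # private subnet: outbound via NAT
```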

Load Balancers

L4 (transport layer): routes based on IP address and TCP/UDP port without inspecting application content. Fast and protocol-agnostic. AWS NLB / GCP Network LB / Azure Load Balancer.
L7 (application layer): routes based on HTTP headers, URL paths, and hostnames. Supports TLS (SSL) termination and path- or host-based routing rules. AWS ALB / GCP Cloud LB / Azure Application Gateway.
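To make the L4/L7 difference concrete, here is a minimal sketch of L7 rule matching, evaluated in priority order with the first match winning (loosely similar to how ALB listener rules are ordered). The hostnames, path prefixes, and target group names are hypothetical. An L4 balancer sees only IP addresses and ports, so it could not make any of these decisions.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Rule:
    host: Optional[str]   # None matches any Host header
    path_prefix: str
    target_group: str

# Hypothetical rules, highest priority first.
RULES = [
    Rule("api.example.com", "/", "api-servers"),
    Rule(None, "/images/", "image-servers"),
    Rule(None, "/", "web-servers"),
]

def route(host: str, path: str, rules: list[Rule] = RULES) -> Optional[str]:
    """Return the target group of the first rule matching the request's host and path."""
    for rule in rules:
        if (rule.host is None or rule.host == host) and path.startswith(rule.path_prefix):
            return rule.target_group
    return None  # no rule matched

print(route("www.example.com", "/images/logo.png"))
print(route("api.example.com", "/v1/users"))
print(route("www.example.com", "/about"))
```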

DNS concepts — DNS translates domain names to IP addresses. A records point to IPv4 addresses (AAAA records point to IPv6 addresses). CNAME records alias one domain name to another. TTL (Time To Live) controls how long resolvers may cache a record. Cloud providers offer managed DNS: Route 53 (AWS), Cloud DNS (GCP), Azure DNS.
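The interaction between CNAMEs and TTLs is easier to see in code. This is a toy caching resolver, not a real DNS client: the zone data is hypothetical (203.0.113.10 is a documentation address), and the clock is injectable so expiry can be simulated. Each record is cached independently until its own TTL runs out.

```python
import time
from typing import Callable

# Hypothetical zone data: name -> (record type, value, TTL in seconds).
ZONE: dict[str, tuple[str, str, int]] = {
    "www.example.com": ("CNAME", "web.example.com", 300),
    "web.example.com": ("A", "203.0.113.10", 60),
}

class CachingResolver:
    """Resolves names against a zone, caching each record until its TTL expires."""

    def __init__(self, zone: dict[str, tuple[str, str, int]],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.zone = zone
        self.clock = clock          # injectable so expiry can be simulated
        self.cache: dict[str, tuple[tuple[str, str, int], float]] = {}
        self.upstream_queries = 0   # how often the zone itself had to be consulted

    def _lookup(self, name: str) -> tuple[str, str, int]:
        now = self.clock()
        cached = self.cache.get(name)
        if cached is not None and cached[1] > now:
            return cached[0]                          # still fresh: serve from cache
        self.upstream_queries += 1
        record = self.zone[name]
        self.cache[name] = (record, now + record[2])  # expires after TTL seconds
        return record

    def resolve(self, name: str, max_hops: int = 8) -> str:
        """Follow CNAME aliases until an A record is reached."""
        for _ in range(max_hops):
            rtype, value, _ttl = self._lookup(name)
            if rtype == "A":
                return value
            name = value  # CNAME: continue with the aliased name
        raise RuntimeError("CNAME chain too long")

now = [0.0]  # fake clock, in seconds
resolver = CachingResolver(ZONE, clock=lambda: now[0])
print(resolver.resolve("www.example.com"), resolver.upstream_queries)  # 2 lookups: CNAME + A
print(resolver.resolve("www.example.com"), resolver.upstream_queries)  # both served from cache
now[0] = 61.0  # past the A record's 60 s TTL, still within the CNAME's 300 s
print(resolver.resolve("www.example.com"), resolver.upstream_queries)  # only the A is refetched
```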

IAM Concepts

Term	Definition
Principal	The identity making a request — a user, service account, or role
Policy	A document defining what actions are allowed or denied on which resources
Role	A set of permissions that can be attached to a principal
Least privilege	Grant only the permissions needed for the task — nothing more
Authentication	Proving who you are (identity)
Authorisation	Determining what you are allowed to do (permissions)
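The terms above fit together in policy evaluation. A minimal sketch, modelled loosely on AWS's documented logic: anything not explicitly allowed is denied, and an explicit deny overrides any allow. Real engines also handle conditions, multiple policy types, and resource-based policies; the bucket name and statements here are hypothetical.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase

@dataclass(frozen=True)
class Statement:
    effect: str                  # "Allow" or "Deny"
    actions: tuple[str, ...]     # wildcard patterns such as "s3:*"
    resources: tuple[str, ...]   # wildcard patterns such as "arn:aws:s3:::reports-bucket/*"

def _matches(patterns: tuple[str, ...], value: str) -> bool:
    return any(fnmatchcase(value, pattern) for pattern in patterns)

def is_allowed(statements: list[Statement], action: str, resource: str) -> bool:
    allowed = False
    for s in statements:
        if _matches(s.actions, action) and _matches(s.resources, resource):
            if s.effect == "Deny":
                return False     # an explicit deny overrides any allow
            allowed = True
    return allowed               # no matching allow: implicit (default) deny

# Least privilege: read-only access to one hypothetical bucket, minus a restricted prefix.
policy = [
    Statement("Allow", ("s3:GetObject", "s3:ListBucket"),
              ("arn:aws:s3:::reports-bucket", "arn:aws:s3:::reports-bucket/*")),
    Statement("Deny", ("s3:*",), ("arn:aws:s3:::reports-bucket/restricted/*",)),
]

print(is_allowed(policy, "s3:GetObject", "arn:aws:s3:::reports-bucket/2024/q1.csv"))
print(is_allowed(policy, "s3:PutObject", "arn:aws:s3:::reports-bucket/2024/q1.csv"))
```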

Service accounts vs IAM users — IAM users are for humans. Service accounts (AWS IAM roles for services, GCP service accounts, Azure Managed Identities) are for applications and automation. Avoid giving applications long-lived user credentials. Use service accounts or instance-attached roles instead.

Storage Patterns

Type	What it is	Best for	AWS	GCP	Azure
Object	Flat namespace, accessed via API	Backups, static files, data lakes, media	S3	Cloud Storage	Blob Storage
Block	Raw storage volumes attached to a VM	OS disks, databases, high-IOPS workloads	EBS	Persistent Disk	Azure Disk
File	Shared file system mounted by multiple VMs	Shared app configs, legacy NFS workloads	EFS	Filestore	Azure Files

Object storage is the most common choice for new cloud-native workloads. Block storage is used when you need low-latency disk access (databases, OS volumes). File storage is used when multiple VMs need to share the same directory tree.
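"Flat namespace" means object storage has no real directories: a key like logs/2024/01.txt is just a string, and "folders" are emulated by listing with a prefix and a delimiter (S3's list API exposes Prefix and Delimiter parameters for this). The in-memory class below is a toy illustration of that idea, not any provider's SDK.

```python
from typing import Optional

class ObjectStore:
    """Toy in-memory object store: a flat mapping from key to bytes, with no directories."""

    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._objects[key] = data

    def get(self, key: str) -> bytes:
        return self._objects[key]

    def list_keys(self, prefix: str = "",
                  delimiter: Optional[str] = None) -> tuple[list[str], list[str]]:
        """Return (keys, common_prefixes); common prefixes act like subfolders."""
        keys: list[str] = []
        common: set[str] = set()
        for key in sorted(self._objects):
            if not key.startswith(prefix):
                continue
            rest = key[len(prefix):]
            if delimiter and delimiter in rest:
                # Group everything below the next delimiter into one "folder".
                common.add(prefix + rest.split(delimiter, 1)[0] + delimiter)
            else:
                keys.append(key)
        return keys, sorted(common)

store = ObjectStore()
for key in ["index.html", "logs/readme.txt", "logs/2024/01.txt", "logs/2024/02.txt"]:
    store.put(key, b"...")
print(store.list_keys("logs/", "/"))  # one object plus one "folder" prefix
```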

Compute Options: When to Use What

Option	Best for
Virtual Machines	Full OS control, legacy apps, stateful workloads, GPU or specialised hardware needs
Containers	Portable microservices, repeatable environments, CI/CD pipelines, Kubernetes workloads
Serverless	Event-driven functions, infrequent workloads, API backends — no idle cost, no server management

Key Architectural Patterns

Stateless vs stateful — A stateless service holds no session data in memory. Any instance can serve any request. This makes horizontal scaling and rolling deployments much simpler. Store session data in a cache (Redis) or database instead of in the application process.
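A minimal sketch of the stateless pattern, assuming a dict-backed class as a stand-in for Redis (values are stored as JSON strings, as they would be in an external cache). `SessionStore` and `AppInstance` are hypothetical names. Because neither instance keeps the cart in its own memory, any instance can handle the user's next request.

```python
import json

class SessionStore:
    """Stand-in for an external store such as Redis: values are kept as JSON strings."""

    def __init__(self) -> None:
        self._data: dict[str, str] = {}

    def load(self, session_id: str) -> dict:
        raw = self._data.get(session_id)
        return json.loads(raw) if raw is not None else {}

    def save(self, session_id: str, session: dict) -> None:
        self._data[session_id] = json.dumps(session)

class AppInstance:
    """A stateless instance: it keeps no per-user data between requests."""

    def __init__(self, name: str, store: SessionStore) -> None:
        self.name = name
        self.store = store

    def add_to_cart(self, session_id: str, item: str) -> list[str]:
        session = self.store.load(session_id)
        session.setdefault("cart", []).append(item)
        self.store.save(session_id, session)
        return session["cart"]

store = SessionStore()
a, b = AppInstance("a", store), AppInstance("b", store)
a.add_to_cart("session-1", "book")
print(b.add_to_cart("session-1", "pen"))  # instance b sees the item added via instance a
```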

Idempotency — An operation is idempotent if running it multiple times produces the same result as running it once. This is critical for retries and message queues, where the same request may be delivered more than once. An HTTP PUT that sets a value to 42 is idempotent; a POST that increments a counter is not.
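The sketch below contrasts the two cases and adds one common fix for the non-idempotent one: an idempotency key, where the client sends a unique key per logical request and the server returns the stored result for any repeat. `Counter` and its methods are hypothetical; a real service would persist the keys durably and expire them eventually.

```python
class Counter:
    """Toy service used to contrast idempotent and non-idempotent operations."""

    def __init__(self) -> None:
        self.value = 0
        self._results: dict[str, int] = {}  # idempotency key -> result already returned

    def put(self, value: int) -> int:
        """Like HTTP PUT: the end state is the same however many times it runs."""
        self.value = value
        return self.value

    def increment(self) -> int:
        """Like a POST that increments: every retry changes the state again."""
        self.value += 1
        return self.value

    def increment_once(self, idempotency_key: str) -> int:
        """Make increment safe to retry by remembering keys already processed."""
        if idempotency_key in self._results:
            return self._results[idempotency_key]
        result = self.increment()
        self._results[idempotency_key] = result
        return result

c = Counter()
c.put(42)
c.put(42)                  # repeating the PUT leaves the value at 42
c.increment()
c.increment()              # a retried increment double-counts: 44
c.increment_once("req-1")  # 45
c.increment_once("req-1")  # the retry returns the stored result; value stays 45
print(c.value)
```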

Immutable infrastructure — Never modify running servers. Instead, build a new image, deploy it, and retire the old one. This eliminates configuration drift and makes rollbacks predictable.

Blue/green deployment — Run two identical environments (blue = current, green = new). Route traffic to green when ready. Rollback is instant: switch traffic back to blue. Requires double the infrastructure during the switch.
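The reason blue/green rollback is instant is that the cutover is a single routing change, not a redeploy. This toy router illustrates that under simplifying assumptions (two environments, one pointer); the class and version names are hypothetical, and in practice the switch is usually a load balancer or DNS change.

```python
from dataclasses import dataclass

@dataclass
class Environment:
    name: str
    version: str

class BlueGreenRouter:
    """Sends all traffic to one of two full environments; switching flips one pointer."""

    def __init__(self, blue: Environment, green: Environment) -> None:
        self.environments = {"blue": blue, "green": green}
        self.live = "blue"

    def handle_request(self) -> str:
        return f"served by {self.live} ({self.environments[self.live].version})"

    def switch(self) -> None:
        self.live = "green" if self.live == "blue" else "blue"

router = BlueGreenRouter(Environment("blue", "v1"), Environment("green", "v2"))
print(router.handle_request())  # current version
router.switch()                 # cut over once green passes its checks
print(router.handle_request())  # new version
router.switch()                 # rollback is the same one-step switch
```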

Rolling deployment — Replace instances gradually, a few at a time. Lower resource cost than blue/green, but rollback is slower because reverting also happens a batch at a time.
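A minimal sketch of a rolling update, assuming a fleet represented as a list of version strings and a health check that stands in for real per-instance checks. It returns the fleet state after each batch and halts at the first unhealthy batch, which leaves a mixed fleet to be reverted batch by batch. The function name and parameters are hypothetical.

```python
from typing import Callable

def rolling_deploy(fleet: list[str], new_version: str, batch_size: int,
                   is_healthy: Callable[[str], bool]) -> list[list[str]]:
    """Update batch_size instances at a time, returning the fleet state after each batch.

    Stops at the first batch that fails its health check, leaving a mixed fleet.
    """
    fleet = list(fleet)
    history: list[list[str]] = []
    for start in range(0, len(fleet), batch_size):
        for i in range(start, min(start + batch_size, len(fleet))):
            fleet[i] = new_version
        history.append(list(fleet))
        if not is_healthy(new_version):
            break  # halt: the remaining instances keep the old version
    return history

for state in rolling_deploy(["v1"] * 4, "v2", batch_size=2, is_healthy=lambda v: True):
    print(state)
```

During the update both versions serve traffic at the same time, so they need to be compatible with each other.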

Cost Basics

Common cost drivers

Compute: charged per hour or second while instances are running
Data transfer: egress (data leaving a cloud region or to the internet) is usually charged; ingress is often free
Storage: charged per GB-month; also per request for object storage
Managed services: additional markup over raw infrastructure cost in exchange for reduced operational overhead

Pricing models

Model	Description
On-demand / pay-as-you-go	Full price, no commitment, maximum flexibility
Reserved / committed use	1- or 3-year commitment, roughly 30–70% discount
Spot / preemptible	Spare capacity at a steep discount (60–90%); can be interrupted at short notice (AWS, for example, gives a two-minute warning)

Use reserved instances for predictable baseline workloads. Use spot/preemptible for batch jobs, CI builds, and fault-tolerant workloads.
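"Predictable baseline" can be made precise. A commitment is paid for every hour whether or not the instance runs, so it only wins once utilisation exceeds 1 minus the discount. This worked sketch uses a hypothetical $0.10/hour on-demand rate, a hypothetical 40% discount, and the common approximation of 730 hours per month; with those numbers the break-even point is 60% utilisation.

```python
HOURS_PER_MONTH = 730  # common approximation: 8,760 hours / 12 months

def on_demand_monthly(hourly_rate: float, utilisation: float) -> float:
    """Cost when you pay only for the hours the instance actually runs."""
    return hourly_rate * HOURS_PER_MONTH * utilisation

def reserved_monthly(hourly_rate: float, discount: float) -> float:
    """Cost of a commitment: the discounted rate is paid for every hour, used or not."""
    return hourly_rate * (1 - discount) * HOURS_PER_MONTH

def break_even_utilisation(discount: float) -> float:
    """Utilisation above which the commitment is cheaper than on-demand."""
    # Solve rate * H * u == rate * (1 - discount) * H for u.
    return 1 - discount

rate, discount = 0.10, 0.40  # hypothetical prices
for u in (0.3, 0.6, 1.0):
    print(f"utilisation {u:.0%}: on-demand ${on_demand_monthly(rate, u):.2f}, "
          f"reserved ${reserved_monthly(rate, discount):.2f}")
```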

Reliability Concepts

Regions — Geographically separated data centre clusters. Deploying across regions protects against regional outages.

Availability Zones (AZs) — Isolated locations within a single region, each made up of one or more physically separate data centres and connected to the others by low-latency links. Deploying across AZs protects against single-facility failures.

Fault tolerance — The system continues operating correctly even when a component fails. Achieved through redundancy, health checks, and automatic failover.
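Redundancy, health checks, and automatic failover combine roughly as in this toy sketch: requests go to the first healthy replica in priority order, and a failed health check moves traffic to the next one without manual intervention. The replica names and the simulated health check are hypothetical; real systems use load balancer health checks or managed database failover.

```python
from typing import Callable

class FailoverPool:
    """Send each request to the first healthy replica, in priority order."""

    def __init__(self, replicas: list[str], health_check: Callable[[str], bool]) -> None:
        self.replicas = replicas          # e.g. one replica per Availability Zone
        self.health_check = health_check

    def active(self) -> str:
        for replica in self.replicas:
            if self.health_check(replica):
                return replica
        raise RuntimeError("no healthy replica available")

down: set[str] = set()  # simulated failed replicas
pool = FailoverPool(["app-az-a", "app-az-b"], health_check=lambda r: r not in down)
print(pool.active())     # app-az-a
down.add("app-az-a")     # simulate a failure in the first AZ
print(pool.active())     # app-az-b
```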

RTO (Recovery Time Objective) — The maximum acceptable time for a system to be restored after a failure.

RPO (Recovery Point Objective) — The maximum acceptable amount of data loss measured in time. An RPO of 1 hour means you can tolerate losing up to 1 hour of data.
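A quick way to check a backup plan against these objectives: with periodic backups, a failure just before the next backup loses a full interval of data, so the backup interval must not exceed the RPO, and the estimated restore time must not exceed the RTO. This sketch assumes backups complete instantly and restore cleanly; the function names are hypothetical.

```python
from datetime import timedelta

def worst_case_data_loss(backup_interval: timedelta) -> timedelta:
    """With periodic backups, a failure just before the next backup loses one full interval."""
    return backup_interval

def meets_objectives(backup_interval: timedelta, estimated_recovery: timedelta,
                     rpo: timedelta, rto: timedelta) -> bool:
    """True if the backup schedule satisfies the RPO and the recovery estimate the RTO."""
    return worst_case_data_loss(backup_interval) <= rpo and estimated_recovery <= rto

# Hourly snapshots and a 30-minute restore against a 1-hour RPO and 2-hour RTO.
print(meets_objectives(timedelta(hours=1), timedelta(minutes=30),
                       rpo=timedelta(hours=1), rto=timedelta(hours=2)))  # True
# Snapshots every 4 hours cannot meet a 1-hour RPO, however fast the restore is.
print(meets_objectives(timedelta(hours=4), timedelta(minutes=30),
                       rpo=timedelta(hours=1), rto=timedelta(hours=2)))  # False
```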

Quick Decision Guide

If you need…	Use…
Store large files or backups	Object storage (S3 / Cloud Storage / Blob)
Run a containerised app at scale	Kubernetes (EKS / GKE / AKS)
Run event-driven code without a server	Serverless functions (Lambda / Cloud Functions / Azure Functions)
Share a file system across multiple VMs	File storage (EFS / Filestore / Azure Files)
Route HTTPS traffic based on URL path	L7 load balancer (ALB / Cloud LB / App Gateway)
Reduce costs on long-running VMs	Reserved / committed use pricing
Run fault-tolerant batch jobs cheaply	Spot / preemptible instances
Restrict what an application can access	Service account or instance-attached IAM role
Protect against a single data centre outage	Multi-AZ deployment
Protect against a regional outage	Multi-region deployment