DevOps Cheatsheet: Tools, Pipelines, and Key Concepts
This page is a quick reference for DevOps engineers. It covers the tools, pipeline stages, deployment patterns, and observability concepts you will encounter most often in a cloud-based DevOps role.
Core DevOps Tool Categories
| Category | Common Tools |
|---|---|
| Source control | Git, GitHub, GitLab, Bitbucket |
| CI/CD | GitHub Actions, GitLab CI, Jenkins, CircleCI, Azure Pipelines |
| Containers | Docker, Kubernetes, Helm |
| Infrastructure as Code | Terraform, Pulumi, Ansible, CloudFormation |
| Monitoring & observability | Prometheus, Grafana, Datadog, CloudWatch, New Relic |
| Secrets management | HashiCorp Vault, AWS Secrets Manager, GCP Secret Manager, Azure Key Vault |
| Artifact registry | Docker Hub, AWS ECR, GCP Artifact Registry, Azure ACR |
CI/CD Pipeline Stages
A typical CI/CD pipeline moves code from a developer’s commit to a running production deployment through a defined set of stages.
| Stage | What happens |
|---|---|
| Source | A commit or pull request triggers the pipeline |
| Build | Source code is compiled or packaged into an artifact |
| Test | Unit tests, integration tests, linting, and static analysis run |
| Package | The artifact is containerised or packaged and pushed to a registry |
| Deploy | The artifact is deployed to a target environment (staging or production) |
| Monitor | Metrics, logs, and alerts confirm the deployment is healthy |
Each stage acts as a gate. If a stage fails, the pipeline stops and the change does not proceed.
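The gating behaviour can be sketched in a few lines of Python. This is an illustration only, not how any particular CI system is implemented; `run_pipeline` and the lambda checks are hypothetical stand-ins for real build and test commands.

```python
from typing import Callable, List, Tuple

# A stage is a name plus a check that returns True on success.
Stage = Tuple[str, Callable[[], bool]]


def run_pipeline(stages: List[Stage]) -> List[str]:
    """Run stages in order and stop at the first failure.

    Returns the names of the stages that actually ran.
    """
    ran = []
    for name, check in stages:
        ran.append(name)
        if not check():
            # A failed gate halts the pipeline: later stages never start.
            break
    return ran


stages = [
    ("source", lambda: True),
    ("build", lambda: True),
    ("test", lambda: False),  # simulate a failing test stage
    ("package", lambda: True),
    ("deploy", lambda: True),
]
print(run_pipeline(stages))  # ['source', 'build', 'test']
```

Because the test stage fails, package and deploy never run, which is exactly what stops a broken change from reaching an environment.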
GitHub Actions Quick Reference
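A GitHub Actions workflow is a YAML file stored in the repository's .github/workflows/ directory. The example below runs tests on every push and pull request to main, then deploys from main.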
```yaml
name: CI Pipeline

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

jobs:
  build-and-test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Set up Node
        uses: actions/setup-node@v4
        with:
          node-version: '20'
      - name: Install dependencies
        run: npm ci
      - name: Run tests
        run: npm test
        env:
          NODE_ENV: test

  deploy:
    needs: build-and-test
    runs-on: ubuntu-latest
    # Skip deployment for pull requests; only pushes to main deploy.
    if: github.ref == 'refs/heads/main'
    steps:
      # Each job starts on a fresh runner, so check out the repo again
      # before calling a script that lives in it.
      - uses: actions/checkout@v4
      - name: Deploy
        run: ./deploy.sh
```
Key keywords
| Keyword | Purpose |
|---|---|
| on | Trigger events (push, pull_request, schedule, workflow_dispatch) |
| jobs | Top-level units of work, run in parallel by default |
| steps | Sequential tasks within a job |
| uses | Reference a pre-built action |
| run | Execute a shell command |
| env | Set environment variables for a workflow, job, or step |
| needs | Declare a dependency on another job (forces sequential order) |
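To see how needs turns parallel jobs into an ordered graph, here is a rough Python sketch using the standard library's `graphlib`. It groups jobs into waves for readability; a real runner starts each job as soon as its own dependencies finish. The `lint` job and the `execution_waves` helper are hypothetical additions for illustration.

```python
from graphlib import TopologicalSorter


def execution_waves(needs):
    """Group jobs into waves; jobs in the same wave could run in parallel.

    `needs` maps each job name to the list of jobs it depends on.
    """
    sorter = TopologicalSorter(needs)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


needs = {
    "build-and-test": [],
    "lint": [],  # hypothetical extra job with no dependencies
    "deploy": ["build-and-test"],
}
print(execution_waves(needs))
# [['build-and-test', 'lint'], ['deploy']]
```

Jobs without needs land in the first wave together, while deploy waits for build-and-test.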
Pipeline Patterns
Feature branch CI: Each feature branch gets its own pipeline run, so changes are tested in isolation before they merge. This keeps broken code off the main branch, and therefore out of production.
Trunk-based delivery: Developers integrate small, frequent commits into a single main branch, either directly or through short-lived branches. Feature flags hide incomplete work. Small changes reduce merge conflicts and are quick to review and test.
GitOps: The desired state of infrastructure and deployments is declared in a Git repository. A controller (such as Argo CD or Flux) continuously reconciles the running state to match what is in Git, and the Git commit history doubles as an audit log.
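The reconciliation step at the heart of GitOps can be sketched as a comparison between two dictionaries. This is a simplified model, not the algorithm Argo CD or Flux actually use; the `reconcile` function and the resource names are assumptions made for the example.

```python
def reconcile(desired, live):
    """Return the actions needed to make `live` match `desired`.

    Both arguments map a resource name to its specification.
    """
    actions = []
    for name, spec in sorted(desired.items()):
        if name not in live:
            actions.append(("create", name))
        elif live[name] != spec:
            actions.append(("update", name))
    for name in sorted(live):
        if name not in desired:
            # Real controllers often make deletion (pruning) opt-in.
            actions.append(("delete", name))
    return actions


# Desired state as it would be declared in Git (hypothetical resources).
desired = {
    "web": {"image": "web:1.4", "replicas": 3},
    "worker": {"image": "worker:2.0", "replicas": 1},
}
# What is actually running.
live = {
    "web": {"image": "web:1.3", "replicas": 3},
    "cron": {"image": "cron:0.9", "replicas": 1},
}
print(reconcile(desired, live))
# [('update', 'web'), ('create', 'worker'), ('delete', 'cron')]
```

A controller would run this comparison in a loop and apply the resulting actions, so a manual change to the live state is reverted on the next pass.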
Shift Left
“Shift left” means moving testing, security checks, and compliance validation earlier in the development process — toward the developer’s local machine and the pull request stage, rather than waiting for a dedicated QA or security phase at the end.
In practice this means:
- Unit and integration tests run on every pull request
- Linters and code formatters run in pre-commit hooks
- Static application security testing (SAST) runs in CI before merge
- Dependency vulnerability scanning runs automatically on every build
- Infrastructure code (Terraform) is linted and validated before it is applied
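As a rough illustration of a check that runs before code leaves the developer's machine, the sketch below flags trailing whitespace and a missing final newline, two things formatters commonly fix. The `lint` and `check` functions are hypothetical; in practice you would wire existing linters into a hook rather than write your own.

```python
def lint(path, text):
    """Return a list of human-readable problems found in one file."""
    problems = []
    for number, line in enumerate(text.splitlines(), start=1):
        if line != line.rstrip():
            problems.append(f"{path}:{number}: trailing whitespace")
    if text and not text.endswith("\n"):
        problems.append(f"{path}: missing final newline")
    return problems


def check(files):
    """Return 0 if every file is clean, 1 otherwise (hook-style exit code).

    `files` maps a path to the file's contents.
    """
    problems = [p for path, text in sorted(files.items()) for p in lint(path, text)]
    for problem in problems:
        print(problem)
    return 1 if problems else 0


exit_code = check({"app.py": "x = 1 \ny = 2"})
print("exit code:", exit_code)
```

A Git pre-commit hook that exits with a non-zero code aborts the commit, so the developer sees the problem seconds after making it instead of minutes later in CI.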
Infrastructure as Code Concepts
Declarative vs imperative
- Declarative: you describe the end state and the tool figures out how to get there. Terraform and Kubernetes manifests are declarative.
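- Imperative: you write the steps to take, in order. Shell scripts are imperative. Ansible playbooks also run tasks in order, so they are usually classed as procedural, although most Ansible modules are idempotent and describe a desired state.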
Why state matters — Terraform keeps a state file that records the current known state of your infrastructure. Without it, Terraform cannot calculate what changes are needed. Keep state files in remote backends (S3, GCS, Azure Blob) and never edit them by hand.
Drift detection — When someone makes a manual change to infrastructure outside of Terraform (e.g., in the console), the live state diverges from the declared state. Running terraform plan detects drift. GitOps tools like Argo CD continuously detect and can auto-remediate drift.
The Three Pillars of Observability
| Pillar | What it captures | Examples |
|---|---|---|
| Metrics | Numeric measurements over time | CPU %, request rate, error count, latency |
| Logs | Timestamped records of events | Application log lines, audit logs, error traces |
| Traces | End-to-end journey of a single request through services | Distributed tracing spans across microservices |
RED method (for services)
- Rate — requests per second
- Errors — error rate as a proportion of requests
- Duration — latency distribution (p50, p95, p99)
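The three RED numbers can be computed from raw request records as in the sketch below. It assumes a non-empty sample and uses the nearest-rank percentile definition; monitoring systems typically estimate percentiles from histograms instead. The function names and sample data are made up for the example.

```python
import math


def percentile(sorted_values, p):
    """Nearest-rank percentile of an already sorted, non-empty list."""
    rank = math.ceil(p * len(sorted_values) / 100)
    return sorted_values[max(rank, 1) - 1]


def red_summary(requests, window_seconds):
    """Summarise a window of requests.

    `requests` is a non-empty list of (latency_ms, is_error) tuples.
    """
    latencies = sorted(latency for latency, _ in requests)
    errors = sum(1 for _, is_error in requests if is_error)
    return {
        "rate_per_s": len(requests) / window_seconds,
        "error_ratio": errors / len(requests),
        "p50_ms": percentile(latencies, 50),
        "p95_ms": percentile(latencies, 95),
        "p99_ms": percentile(latencies, 99),
    }


sample = [
    (10, False), (12, False), (15, False), (20, False), (25, False),
    (30, False), (40, False), (60, False), (120, True), (900, True),
]
print(red_summary(sample, window_seconds=5))
# {'rate_per_s': 2.0, 'error_ratio': 0.2, 'p50_ms': 25, 'p95_ms': 900, 'p99_ms': 900}
```

Note how the median looks healthy at 25 ms while p95 and p99 expose the 900 ms outlier, which is why latency is reported as a distribution rather than an average.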
USE method (for infrastructure resources)
- Utilisation — what percentage of capacity is being used
- Saturation — how much work is queued waiting for the resource
- Errors — error events from the resource
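For a single resource such as a disk, the USE numbers reduce to simple arithmetic. The sketch below assumes you already have busy time, periodic queue-depth samples, and an error count for the window; the function and parameter names are hypothetical.

```python
def use_summary(busy_seconds, window_seconds, queue_depths, error_count):
    """Summarise one resource over one observation window."""
    return {
        # Share of the window the resource spent doing work.
        "utilisation_pct": 100 * busy_seconds / window_seconds,
        # Average number of requests waiting for the resource.
        "saturation_avg_queue": sum(queue_depths) / len(queue_depths),
        "errors": error_count,
    }


print(use_summary(busy_seconds=45, window_seconds=60,
                  queue_depths=[0, 2, 4, 2], error_count=1))
# {'utilisation_pct': 75.0, 'saturation_avg_queue': 2.0, 'errors': 1}
```

A resource can show moderate utilisation and still be saturated in bursts, so the two numbers are worth reading together.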
Deployment Strategies
| Strategy | Description |
|---|---|
| Rolling update | Replace instances gradually, a few at a time; lowest resource overhead |
| Blue/green | Run two full environments; switch traffic instantly; easy rollback |
| Canary | Route a small percentage of traffic to the new version; expand if metrics look good |
| Feature flags | Deploy code to all users but activate the feature only for a controlled group |
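Canary releases and feature flags both need a stable way to pick "a small percentage" of users. One common approach, sketched here with a hypothetical flag name and user IDs, is to hash the user into one of 100 buckets and enable the feature for buckets below the rollout percentage.

```python
import hashlib


def bucket(flag, user_id):
    """Map (flag, user) to a stable bucket in the range 0-99."""
    digest = hashlib.sha256(f"{flag}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def is_enabled(flag, user_id, rollout_percent):
    """True if this user falls inside the current rollout percentage."""
    return bucket(flag, user_id) < rollout_percent


users = [f"user-{n}" for n in range(1000)]
enabled = sum(is_enabled("new-checkout", u, 10) for u in users)
print(f"{enabled} of {len(users)} users see the new version")
```

Because the bucket depends only on the flag and the user ID, a user keeps the same experience between requests, and raising the percentage only ever adds users to the enabled group.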
On-Call Basics
Alert fatigue — When too many alerts fire, engineers start ignoring them. Every alert should be actionable and map to a runbook. Alerts that are always ignored should be removed or demoted to warnings.
What makes a good runbook
- What alert triggered this runbook
- What the service does (brief context)
- Steps to diagnose the problem
- Steps to mitigate or resolve it
- Escalation path if the above steps do not work
- Links to dashboards, logs, and related runbooks
Post-mortems — A structured review after an incident. The goal is to understand what happened and prevent recurrence, not to assign blame. A blameless post-mortem assumes people acted with good intentions and asks “what did the system allow?” rather than “who made the mistake?”.
Common DevOps Interview Questions
| Question | Short answer |
|---|---|
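| What is the difference between CI and CD? | CI = automatically build and test on every commit. CD = continuous delivery (every tested build is ready to release, often behind a manual approval) or continuous deployment (every tested build is released to production automatically). |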
| What is infrastructure as code? | Managing infrastructure through version-controlled configuration files rather than manual console actions. |
| How do you store secrets in a pipeline? | Use a secrets manager (Vault, AWS Secrets Manager) or the CI platform’s encrypted secrets store. Never put secrets in code or environment files committed to Git. |
| What is the difference between blue/green and canary? | Blue/green switches all traffic at once; canary routes a small percentage first and promotes gradually. |
| What is drift in Terraform? | When live infrastructure differs from what is declared in code, usually due to manual changes. |
| What are the three pillars of observability? | Metrics, logs, and traces. |
| What does “shift left” mean? | Moving testing and security checks earlier in the development process to catch issues sooner. |