DevOps Cheatsheet: Tools, Pipelines, and Key Concepts

This page is a quick reference for DevOps engineers. It covers the tools, pipeline stages, deployment patterns, and observability concepts you will encounter most often in a cloud-based DevOps role.

Core DevOps Tool Categories

Category	Common Tools
Source control	Git, GitHub, GitLab, Bitbucket
CI/CD	GitHub Actions, GitLab CI, Jenkins, CircleCI, Azure Pipelines
Containers	Docker, Kubernetes, Helm
Infrastructure as Code	Terraform, Pulumi, Ansible, CloudFormation
Monitoring & observability	Prometheus, Grafana, Datadog, CloudWatch, New Relic
Secrets management	HashiCorp Vault, AWS Secrets Manager, GCP Secret Manager, Azure Key Vault
Artifact registry	Docker Hub, AWS ECR, GCP Artifact Registry, Azure ACR

CI/CD Pipeline Stages

A typical CI/CD pipeline moves code from a developer’s commit to a running production deployment through a defined set of stages.

Stage	What happens
Source	A commit or pull request triggers the pipeline
Build	Source code is compiled or packaged into an artifact
Test	Unit tests, integration tests, linting, and static analysis run
Package	The artifact is containerised or packaged and pushed to a registry
Deploy	The artifact is deployed to a target environment (staging or production)
Monitor	Metrics, logs, and alerts confirm the deployment is healthy

Each stage acts as a gate. If a stage fails, the pipeline stops and the change does not proceed.
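
The gate behaviour can be sketched in a few lines of Python. This is only an illustration of the idea, not how any particular CI system is implemented; `run_pipeline` and the lambda stages are hypothetical names invented for the example.

```python
from typing import Callable, List, Tuple

# Each stage is a (name, action) pair; the action returns True on success.
Stage = Tuple[str, Callable[[], bool]]


def run_pipeline(stages: List[Stage]) -> List[str]:
    """Run stages in order and stop at the first failure.

    Returns the names of the stages that were actually executed.
    """
    executed = []
    for name, action in stages:
        executed.append(name)
        if not action():
            # Gate closed: later stages never run.
            break
    return executed


if __name__ == "__main__":
    stages = [
        ("source", lambda: True),
        ("build", lambda: True),
        ("test", lambda: False),  # simulated failing test stage
        ("package", lambda: True),
        ("deploy", lambda: True),
    ]
    print(run_pipeline(stages))  # ['source', 'build', 'test']
```

Because the failing `test` stage closes the gate, `package` and `deploy` are never attempted.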

GitHub Actions Quick Reference

A GitHub Actions workflow is a YAML file in .github/workflows/.

name: CI Pipeline

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

jobs:
  build-and-test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Set up Node
        uses: actions/setup-node@v4
        with:
          node-version: '20'
      - name: Install dependencies
        run: npm ci
      - name: Run tests
        run: npm test
        env:
          NODE_ENV: test

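  deploy:
    needs: build-and-test
    runs-on: ubuntu-latest
    # Deploy only from main; pull request runs use a refs/pull/... ref and are skipped
    if: github.ref == 'refs/heads/main'
    steps:
      # Each job starts on a fresh runner, so check out the repo before running its script
      - uses: actions/checkout@v4
      - name: Deploy
        run: ./deploy.sh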

Key keywords

Keyword	Purpose
`on`	Trigger events (push, pull_request, schedule, workflow_dispatch)
`jobs`	Top-level units of work, run in parallel by default
`steps`	Sequential tasks within a job
`uses`	Reference a pre-built action
`run`	Execute a shell command
`env`	Set environment variables for a workflow, job, or step
`needs`	Declare a dependency on another job (forces sequential order)
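
One way to picture how `jobs` and `needs` interact is as a dependency graph: jobs with no unmet dependencies can start together, and a job with `needs` waits for its predecessors. The sketch below models only that ordering with Python's standard `graphlib` module; it says nothing about how GitHub actually schedules runners. The `lint` job is a hypothetical addition for illustration.

```python
from graphlib import TopologicalSorter
from typing import Dict, List


def execution_waves(needs: Dict[str, List[str]]) -> List[List[str]]:
    """Group jobs into waves.

    `needs` maps each job to the jobs it depends on. Jobs in the same wave
    have no pending dependencies, so they could run in parallel.
    """
    sorter = TopologicalSorter(needs)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


if __name__ == "__main__":
    workflow = {
        "build-and-test": [],
        "lint": [],  # hypothetical extra job with no dependencies
        "deploy": ["build-and-test"],
    }
    print(execution_waves(workflow))  # [['build-and-test', 'lint'], ['deploy']]
```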

Pipeline Patterns

Feature branch CI — Each feature branch gets its own pipeline run, so branches are tested in isolation before merging to the main branch. This keeps broken code out of main and, in turn, out of production.

Trunk-based delivery — All developers push small, frequent commits directly to a single main branch. Feature flags hide incomplete work. This reduces merge conflicts and keeps the pipeline fast.
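
A feature flag can be as simple as a function that decides, per user, whether the new code path is active. The following is a minimal sketch of a percentage rollout, assuming users are identified by a string ID; `is_enabled` and the flag name are hypothetical, and real flag services add targeting rules, storage, and kill switches on top of this.

```python
import hashlib


def is_enabled(flag: str, user_id: str, rollout_percent: int) -> bool:
    """Return True if the flag is on for this user.

    The flag name and user ID are hashed into a stable bucket from 0 to 99,
    so the same user always gets the same answer for a given flag.
    """
    digest = hashlib.sha256(f"{flag}:{user_id}".encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100
    return bucket < rollout_percent


if __name__ == "__main__":
    # Incomplete work is merged to main but stays dark at 0 percent.
    print(is_enabled("new-checkout", "user-42", 0))    # False
    # Raising the percentage exposes the feature without a new deployment.
    print(is_enabled("new-checkout", "user-42", 100))  # True
```

Hashing rather than random sampling keeps each user's experience consistent between requests, and raising the percentage only ever adds users to the enabled group.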

GitOps — The desired state of infrastructure and deployments is declared in a Git repository. A controller (such as Argo CD or Flux) continuously reconciles the running state to match what is in Git. The Git commit history becomes your audit log.

Shift Left

“Shift left” means moving testing, security checks, and compliance validation earlier in the development process — toward the developer’s local machine and the pull request stage, rather than waiting for a dedicated QA or security phase at the end.

In practice this means:

Unit and integration tests run on every pull request
Linters and code formatters run in pre-commit hooks
Static application security testing (SAST) runs in CI before merge
Dependency vulnerability scanning runs automatically on every build
Infrastructure code (Terraform) is linted and validated before it is applied

Infrastructure as Code Concepts

Declarative vs imperative

Declarative: you describe the end state and the tool figures out how to get there. Terraform and Kubernetes manifests are declarative.
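Imperative: you write the steps to take. Shell scripts are imperative. Ansible playbooks run tasks in the order written, so they are usually grouped here too, although most Ansible modules are idempotent and describe a desired state.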
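
To make the declarative idea concrete, here is a small sketch of a tool that is given only the desired end state and works out the steps itself. It is a toy model, not how Terraform or Kubernetes compute their plans; `plan` and the resource names are hypothetical.

```python
from typing import Dict, List, Tuple


def plan(desired: Dict[str, dict], current: Dict[str, dict]) -> List[Tuple[str, str]]:
    """Compute the actions needed to move `current` to `desired`.

    Both arguments map a resource name to its settings.
    """
    actions = []
    for name in sorted(desired.keys() - current.keys()):
        actions.append(("create", name))
    for name in sorted(desired.keys() & current.keys()):
        if desired[name] != current[name]:
            actions.append(("update", name))
    for name in sorted(current.keys() - desired.keys()):
        actions.append(("delete", name))
    return actions


if __name__ == "__main__":
    desired = {"web": {"replicas": 3}, "cache": {"size": "small"}}
    current = {"web": {"replicas": 2}, "old-worker": {"replicas": 1}}
    print(plan(desired, current))
    # [('create', 'cache'), ('update', 'web'), ('delete', 'old-worker')]
```

The caller never writes "create cache, then scale web"; that sequence is derived from the difference between the two states. Running `plan` again after the actions are applied returns an empty list.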

Why state matters — Terraform keeps a state file that records the current known state of your infrastructure. Without it, Terraform cannot calculate what changes are needed. Keep state files in remote backends (S3, GCS, Azure Blob) and never edit them by hand.

Drift detection — When someone makes a manual change to infrastructure outside of Terraform (e.g., in the console), the live state diverges from the declared state. Running terraform plan detects drift. GitOps tools like Argo CD continuously detect and can auto-remediate drift.
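
Conceptually, drift detection is an attribute-by-attribute comparison between what the code declares and what is actually running. The sketch below shows that comparison for a single resource. It is an illustration only: `detect_drift` and the attribute names are hypothetical, and real tools query provider APIs rather than compare plain dictionaries.

```python
from typing import Any, Dict, Tuple


def detect_drift(declared: Dict[str, Any], live: Dict[str, Any]) -> Dict[str, Tuple[Any, Any]]:
    """Return {attribute: (declared_value, live_value)} for each attribute that differs.

    An attribute missing on one side is reported with None for that side.
    """
    drift = {}
    for key in sorted(declared.keys() | live.keys()):
        declared_value = declared.get(key)
        live_value = live.get(key)
        if declared_value != live_value:
            drift[key] = (declared_value, live_value)
    return drift


if __name__ == "__main__":
    declared = {"instance_type": "t3.small", "port": 443}
    # Someone resized the instance by hand in the console.
    live = {"instance_type": "t3.large", "port": 443}
    print(detect_drift(declared, live))
    # {'instance_type': ('t3.small', 't3.large')}
```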

The Three Pillars of Observability

Pillar	What it captures	Examples
Metrics	Numeric measurements over time	CPU %, request rate, error count, latency
Logs	Timestamped records of events	Application log lines, audit logs, error traces
Traces	End-to-end journey of a single request through services	Distributed tracing spans across microservices

RED method (for services)

Rate — requests per second
Errors — error rate as a proportion of requests
Duration — latency distribution (p50, p95, p99)
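
As a worked example, the three RED numbers can be computed from a list of request records collected over a time window. This is a sketch under simple assumptions: the `Request` record and `red_summary` are hypothetical, and percentiles use the nearest-rank definition. Monitoring systems such as Prometheus typically estimate percentiles from histograms instead.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Request:
    duration_ms: float
    is_error: bool


def percentile(values: List[float], p: int) -> float:
    """Nearest-rank percentile: the smallest value with at least p% of samples at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def red_summary(requests: List[Request], window_seconds: float) -> dict:
    """Compute Rate, Errors, and Duration for a non-empty window of requests."""
    if not requests:
        raise ValueError("need at least one request to summarise")
    durations = [r.duration_ms for r in requests]
    return {
        "rate_per_s": len(requests) / window_seconds,
        "error_ratio": sum(r.is_error for r in requests) / len(requests),
        "p50_ms": percentile(durations, 50),
        "p95_ms": percentile(durations, 95),
        "p99_ms": percentile(durations, 99),
    }


if __name__ == "__main__":
    # 100 requests in a 10-second window, with durations of 1 to 100 ms; the first 5 failed.
    sample = [Request(duration_ms=float(i), is_error=(i <= 5)) for i in range(1, 101)]
    print(red_summary(sample, window_seconds=10))
```

Reporting p95 and p99 alongside p50 matters because an average or median can look healthy while a small share of users sees very slow responses.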

USE method (for infrastructure resources)

Utilisation — what percentage of capacity is being used
Saturation — how much work is queued waiting for the resource
Errors — error events from the resource

Deployment Strategies

Strategy	Description
Rolling update	Replace instances gradually, a few at a time; lowest resource overhead
Blue/green	Run two full environments; switch traffic instantly; easy rollback
Canary	Route a small percentage of traffic to the new version; expand if metrics look good
Feature flags	Deploy code to all users but activate the feature only for a controlled group
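
The canary row's "expand if metrics look good" step can be expressed as a small decision function. The sketch below is illustrative only: `next_canary_weight`, the traffic steps, and the error tolerance are assumptions made for the example, not values from any particular tool.

```python
from typing import Sequence


def next_canary_weight(
    current_weight: int,
    canary_error_ratio: float,
    baseline_error_ratio: float,
    steps: Sequence[int] = (5, 25, 50, 100),  # assumed traffic percentages
    tolerance: float = 0.01,                  # assumed acceptable extra error ratio
) -> int:
    """Return the next traffic percentage for the canary, or 0 to roll back."""
    if canary_error_ratio > baseline_error_ratio + tolerance:
        # The canary is noticeably worse than the stable version: send it no traffic.
        return 0
    for step in steps:
        if step > current_weight:
            return step
    return 100  # already fully promoted


if __name__ == "__main__":
    print(next_canary_weight(5, canary_error_ratio=0.01, baseline_error_ratio=0.01))  # 25
    print(next_canary_weight(5, canary_error_ratio=0.05, baseline_error_ratio=0.01))  # 0
```

Blue/green corresponds to a single step from 0 to 100, which is why its rollback is simple but any problem is exposed to all users at once.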

On-Call Basics

Alert fatigue — When too many alerts fire, engineers start ignoring them. Every alert should be actionable and map to a runbook. Alerts that are always ignored should be removed or demoted to warnings.
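
One rough way to find candidates for removal or demotion is to review alert history and flag alerts that fire often but are almost never acted on. The sketch below assumes such a history is available as (alert name, was acted on) pairs; `noisy_alerts`, the alert names, and both thresholds are hypothetical.

```python
from collections import defaultdict
from typing import Iterable, List, Tuple


def noisy_alerts(
    history: Iterable[Tuple[str, bool]],
    min_fires: int = 5,             # assumed: ignore alerts with too little data
    max_action_ratio: float = 0.1,  # assumed: at most 10% of firings led to action
) -> List[str]:
    """Return alert names that fired often but were rarely acted on."""
    fired = defaultdict(int)
    acted = defaultdict(int)
    for name, was_acted_on in history:
        fired[name] += 1
        if was_acted_on:
            acted[name] += 1
    return sorted(
        name
        for name, count in fired.items()
        if count >= min_fires and acted[name] / count <= max_action_ratio
    )


if __name__ == "__main__":
    history = (
        [("disk-warning", False)] * 10  # fires constantly, nobody responds
        + [("api-5xx", True)] * 6       # every firing led to action
        + [("rare-alert", False)] * 2   # too few firings to judge
    )
    print(noisy_alerts(history))  # ['disk-warning']
```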

What makes a good runbook

What alert triggered this runbook
What the service does (brief context)
Steps to diagnose the problem
Steps to mitigate or resolve it
Escalation path if the above steps do not work
Links to dashboards, logs, and related runbooks

Post-mortems — A structured review after an incident. The goal is to understand what happened and prevent recurrence, not to assign blame. A blameless post-mortem assumes people acted with good intentions and asks “what did the system allow?” rather than “who made the mistake?”.

Common DevOps Interview Questions

Question	Short answer
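What is the difference between CI and CD?	CI = automatically build and test on every commit. CD = continuous delivery (every tested build is kept ready to release, often behind a manual approval) or continuous deployment (every tested build is released to production automatically).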
What is infrastructure as code?	Managing infrastructure through version-controlled configuration files rather than manual console actions.
How do you store secrets in a pipeline?	Use a secrets manager (Vault, AWS Secrets Manager) or the CI platform’s encrypted secrets store. Never put secrets in code or environment files committed to Git.
What is the difference between blue/green and canary?	Blue/green switches all traffic at once; canary routes a small percentage first and promotes gradually.
What is drift in Terraform?	When live infrastructure differs from what is declared in code, usually due to manual changes.
What are the three pillars of observability?	Metrics, logs, and traces.
What does “shift left” mean?	Moving testing and security checks earlier in the development process to catch issues sooner.