Docker for Cloud Engineers: A Practical Working Guide

Docker is the packaging format that makes cloud deployments consistent. This page is about how cloud engineers use Docker day-to-day — not container theory, but the specific skills you need to write, build, push, and debug container images in a real working environment.

What Docker actually does in cloud work

A container packages an application together with everything it needs to run: its code, runtime, libraries, and configuration. That bundle runs the same way everywhere — on your laptop, in a CI pipeline, and in production on a cloud container service.

The cloud angle: most cloud container services (AWS ECS, GCP Cloud Run, Azure Container Apps, Kubernetes on any platform) expect you to give them a container image. Docker is the standard tool for building and managing those images. Even if you never open a shell inside a production container, you will be building and pushing images regularly.

The distinction between images and containers matters for communication:

An image is the built artefact — a read-only template stored in a registry.
A container is a running instance of an image — an active process with its own isolated filesystem and network namespace.
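As a rough illustration of the one-to-many relationship, the sketch below lists every container created from a single image. The function name containers_of is hypothetical; the docker ps options it uses (--all, --filter ancestor=, --format) are standard CLI flags.

```shell
# Hypothetical helper: list every container (running or stopped)
# that was created from the given image, with its current status.
# Usage: containers_of IMAGE
containers_of() {
  docker ps --all --filter "ancestor=$1" --format '{{.Names}}: {{.Status}}'
}
```

Calling containers_of my-app:1.0.0 after starting the same image twice should show two containers backed by one image.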

Writing Dockerfiles that work well

A Dockerfile is a set of instructions for building an image. Instructions that change the filesystem (RUN, COPY, ADD) each add a layer to the image; the rest only record metadata. Understanding layers matters because they determine build speed, image size, and cache behaviour.

A simple but solid Node.js Dockerfile:

FROM node:20-alpine

WORKDIR /app

# Copy package files first — this layer is cached unless dependencies change
COPY package*.json ./
RUN npm ci --omit=dev

# Copy application code
COPY . .

EXPOSE 3000
CMD ["node", "src/index.js"]

The reason to copy the package files before the rest of the code: when a layer changes, Docker rebuilds that layer and every layer after it. If the first COPY includes only package.json and package-lock.json, Docker can reuse the cached npm ci layer as long as your dependencies have not changed. This makes builds much faster during development.

Mistake to avoid: Running COPY . . before installing dependencies. This invalidates the cache on every code change and forces a full reinstall every time, which can add minutes to builds.

Multi-stage builds: smaller images in production

Multi-stage builds are one of the most useful Docker features and one of the most underused. The idea: use one image to build your application (which needs compilers, build tools, and dev dependencies) and a second, smaller image to run it (which only needs the runtime).

A multi-stage build for a Go application:

# Stage 1: build
FROM golang:1.22-alpine AS builder

WORKDIR /app
COPY go.* ./
RUN go mod download

COPY . .
RUN CGO_ENABLED=0 go build -o /app/server ./cmd/server

# Stage 2: run
FROM scratch

COPY --from=builder /app/server /server
EXPOSE 8080
ENTRYPOINT ["/server"]

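The final image in this example is built on scratch, so it contains only the compiled binary: no shell, no package manager, and no CA certificates (copy those in from the builder stage if the service makes outbound TLS calls). A typical Go image built this way is 10–20 MB, compared with 300+ MB for the build image. Smaller images mean faster pulls, a smaller attack surface, and lower storage costs.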

The same pattern works for other compiled languages (Rust, C++, and Java, where the final stage needs a JRE rather than a full JDK), and for frontend applications where you build with Node and serve the static output with Nginx.
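To see the size difference for yourself, one option is to build the first stage on its own and compare it with the final image. This is a minimal sketch, assuming the Go Dockerfile above is in the current directory; the function names and the my-app:build tag are hypothetical.

```shell
# Hypothetical helper: build only the named stage of a multi-stage Dockerfile.
# Usage: build_stage STAGE IMAGE   (e.g. build_stage builder my-app:build)
build_stage() {
  docker build --target "$1" --tag "$2" .
}

# Hypothetical helper: print the size Docker reports for a local image.
# Usage: image_size IMAGE
image_size() {
  docker image ls --format '{{.Size}}' "$1"
}
```

Running build_stage builder my-app:build and then image_size on both my-app:build and my-app:1.0.0 shows how much the final stage leaves behind.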

Container registries: push, pull, and tag

A container registry stores your images. Every major cloud provider has one: Amazon ECR, Google Artifact Registry, Azure Container Registry. Docker Hub is the public default. In most organisations you will use a private registry so images are not publicly accessible.

The typical workflow for pushing an image to a registry:

# Build the image and tag it
docker build -t my-app:1.0.0 .

# Tag for a specific registry
docker tag my-app:1.0.0 123456789012.dkr.ecr.eu-west-1.amazonaws.com/my-app:1.0.0

# Authenticate to ECR (AWS account IDs are 12 digits; this one is a placeholder)
aws ecr get-login-password --region eu-west-1 \
  | docker login --username AWS --password-stdin \
    123456789012.dkr.ecr.eu-west-1.amazonaws.com

# Push
docker push 123456789012.dkr.ecr.eu-west-1.amazonaws.com/my-app:1.0.0

Tagging conventions matter. Using latest as your only tag is a common mistake — you lose the ability to roll back to a previous version. Most teams tag images with the Git commit SHA, a version number, or both: 1.0.0 for a release tag and abc1234 for the commit hash. The CI pipeline usually handles this automatically.
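A minimal sketch of how a pipeline step might produce both tags for one image. The helper name image_refs and the host registry.example.com are hypothetical placeholders, not part of Docker or any CI system.

```shell
# Hypothetical helper: print one fully qualified image reference per tag.
# Usage: image_refs REGISTRY REPOSITORY TAG [TAG...]
image_refs() {
  registry=$1
  repository=$2
  shift 2
  for tag in "$@"; do
    printf '%s/%s:%s\n' "$registry" "$repository" "$tag"
  done
}

# In a pipeline the commit tag would typically come from Git, for example:
#   sha=$(git rev-parse --short HEAD)
#   image_refs registry.example.com my-app 1.0.0 "$sha"
```

Each printed reference can then be passed to docker tag and docker push, so the release tag and the commit tag always point at the same build.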

Running containers locally for testing

Local Docker use is mostly about testing your image before pushing it. The commands you reach for most often:

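# Start an interactive shell in a throwaway container (the image must contain a shell;
# if it sets an ENTRYPOINT, use --entrypoint /bin/sh instead of the trailing argument)
docker run --rm -it my-app:1.0.0 /bin/sh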

# Run with environment variables and port mapping
docker run -p 3000:3000 -e DATABASE_URL=postgres://... my-app:1.0.0

# Run in background
docker run -d --name my-app-test -p 3000:3000 my-app:1.0.0

# View logs
docker logs my-app-test

# Execute a command in a running container
docker exec -it my-app-test /bin/sh

# Stop and remove
docker stop my-app-test && docker rm my-app-test

A scenario that comes up regularly: an image runs fine on your laptop but fails in production. The usual causes are environment variables not being set (the container is missing configuration it expects), port mismatches, or the image being built for the wrong CPU architecture (AMD64 vs ARM64 — relevant if you are building on an Apple Silicon Mac).
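For the architecture case, a sketch of how to build for an explicit platform and check what a local image was built for. The function names are hypothetical; docker buildx build --platform and docker image inspect --format are existing commands, and cross-building assumes Buildx with emulation is available, as it is in Docker Desktop.

```shell
# Hypothetical helper: build for an explicit target platform,
# regardless of the CPU of the machine running the build.
# Usage: build_for_platform IMAGE PLATFORM   (e.g. my-app:1.0.0 linux/amd64)
build_for_platform() {
  docker buildx build --platform "$2" --tag "$1" --load .
}

# Hypothetical helper: print the OS and architecture recorded in a local image.
# Usage: image_platform IMAGE
image_platform() {
  docker image inspect --format '{{.Os}}/{{.Architecture}}' "$1"
}
```

If image_platform reports linux/arm64 for an image headed to an AMD64 service, rebuilding with build_for_platform and linux/amd64 is the likely fix.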

What managed cloud services abstract away

When you deploy to AWS ECS, Cloud Run, or a managed Kubernetes service, the platform handles a lot of container management for you. Understanding what it abstracts helps you know when something is your responsibility and when it is the platform’s.

You manage	The platform manages
The Dockerfile and image content	Container scheduling and placement
Image versioning and tagging	Health checks and restart on failure
Environment variables and secrets	Network routing to healthy instances
Resource limits (CPU/memory)	Node provisioning (on managed services)
Security scanning (often via CI)	TLS termination (on most services)

Cloud Run sits at the high-abstraction end: you give it a container image and a few settings, and it handles provisioning, scaling, and routing. ECS with the EC2 launch type sits at the other end: you manage the EC2 instances, scaling policies, and more. Knowing where your service sits on this spectrum tells you what problems are yours to debug.

Common image mistakes and how to avoid them

Large images are the most common avoidable problem. They slow down deployments, cost more to store, and sometimes hit time limits in CI. A few practices that keep images lean:

Use minimal base images. Alpine variants are much smaller than full distributions: node:20-alpine is roughly 130–140 MB uncompressed versus more than 1 GB for node:20 (exact sizes vary by release). Use distroless or scratch for compiled languages.
Clean up in the same layer you install. If you apt-get install, follow it with apt-get clean && rm -rf /var/lib/apt/lists/* in the same RUN instruction — not a separate one.
Use a .dockerignore file. Without it, COPY . . includes everything — node_modules, .git, local env files, test fixtures. All of that goes into the image and the build context.
Do not include secrets in images. API keys, passwords, and private keys should be injected at runtime (for example through environment variables or a secrets manager), not baked into the image. If you include a secret in an image layer, it stays in that layer even if you delete it in a later step.
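A starting point for the .dockerignore practice above, written as a small shell helper. The function name and the listed entries are assumptions for a typical Node.js project; adapt them to what your repository actually contains.

```shell
# Hypothetical helper: write a starter .dockerignore into a project directory.
# It overwrites any existing file, and the entries are examples to adapt.
# Usage: write_dockerignore DIRECTORY
write_dockerignore() {
  cat > "$1/.dockerignore" <<'EOF'
node_modules
.git
.env
*.log
test/fixtures
EOF
}
```

Excluding .env here also helps with the secrets point: a local env file that never enters the build context cannot end up in an image layer.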

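Useful tools: docker history my-app:1.0.0 shows the size of each layer and the instruction that created it (docker image inspect shows metadata and the total size, not per-layer sizes). dive is a third-party tool that lets you explore image layers interactively and find what is taking up space.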
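A quick way to get a per-layer size listing from the Docker CLI alone, sketched as a helper. The function name layer_sizes is hypothetical; docker history with --no-trunc and --format is a standard command.

```shell
# Hypothetical helper: print one line per layer, newest first,
# showing the layer size followed by the instruction that created it.
# Usage: layer_sizes IMAGE
layer_sizes() {
  docker history --no-trunc --format '{{.Size}}  {{.CreatedBy}}' "$1"
}
```

The largest entries are usually the base image, a dependency install, or an unfiltered COPY, and those are the first places to look when an image is bigger than expected.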