Kubernetes for Cloud Engineers: What You Do With It Day-to-Day

Most Kubernetes content is written for cluster administrators or people studying for the CKA exam. This page is different: it covers what a cloud engineer does with Kubernetes day-to-day when the cluster already exists and your job is to deploy, monitor, and debug applications running on it.

Your role versus the cluster admin’s role

There is an important distinction that many beginners miss: the person who runs the Kubernetes control plane and the person who deploys applications to it are often different roles.

A cluster administrator provisions the cluster, manages node pools, handles upgrades, configures networking (CNI plugins), and deals with control plane issues. On most cloud platforms (GKE, EKS, AKS), the cloud provider manages much of this.

A cloud engineer working with Kubernetes writes and applies manifests, deploys new application versions, investigates why pods are crashing, manages namespaces, and configures autoscaling. This is what most cloud engineering job descriptions mean when they say “Kubernetes experience required.”

You do not need to know how to build a cluster from scratch to be effective with Kubernetes at work. You do need to be comfortable with kubectl and YAML manifests.

kubectl commands you will use every day

The verbs are always the same: get, describe, apply, delete, logs, exec. The resources change. Here are the most useful combinations:

# List resources in a namespace
kubectl get pods -n production
kubectl get deployments -n production
kubectl get services -n production

# Get a quick view across all namespaces
kubectl get pods -A

# Detailed info about a specific resource (events are at the bottom — very useful for debugging)
kubectl describe pod my-app-7d9f5b8c4-xk2p9 -n production

# Watch pods as they start up or restart
kubectl get pods -n production -w

# Read logs from a running pod
kubectl logs my-app-7d9f5b8c4-xk2p9 -n production

# Follow logs in real time
kubectl logs -f my-app-7d9f5b8c4-xk2p9 -n production

# Get logs from the previous container instance (useful after a crash)
kubectl logs my-app-7d9f5b8c4-xk2p9 -n production --previous

# Open a shell inside a running pod
kubectl exec -it my-app-7d9f5b8c4-xk2p9 -n production -- /bin/sh

The describe command is the most underused. The Events section at the bottom tells you what Kubernetes has been doing with a resource — failed image pulls, insufficient CPU, failed health checks, scheduling issues. When a pod is not starting, this is the first place to look.

Reading and writing YAML manifests

A Kubernetes manifest is a YAML file describing the desired state of a resource. Understanding the structure means you can read what someone else has written, spot mistakes, and write new manifests without copying blindly from examples.

Most Kubernetes resources have four top-level fields: apiVersion, kind, metadata, and spec. The spec is where most of the substance is. A few kinds, such as ConfigMap and Secret, carry a data field instead of a spec.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
  namespace: production
  labels:
    app: my-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: my-app
          image: 123456.dkr.ecr.eu-west-1.amazonaws.com/my-app:1.2.0
          ports:
            - containerPort: 3000
          env:
            - name: NODE_ENV
              value: production
          resources:
            requests:
              cpu: "100m"
              memory: "128Mi"
            limits:
              cpu: "500m"
              memory: "256Mi"
          readinessProbe:
            httpGet:
              path: /health
              port: 3000
            initialDelaySeconds: 5
            periodSeconds: 10

The things most often wrong in manifests written by engineers new to Kubernetes: missing resource requests and limits (without requests the scheduler cannot make good placement decisions), missing readiness probes (without them Kubernetes sends traffic to pods as soon as their containers start, whether or not the application is ready), and label mismatches between the Deployment's selector and its pod template.

Services and how traffic reaches your pods

A Service gives a stable DNS name and IP address to a set of pods. Pods come and go, and every replacement pod gets a new IP address. A Service stays stable.

The three types you encounter most:

  • ClusterIP — accessible only inside the cluster. Default type. Used for internal communication between services.
  • NodePort — exposes the service on a port on every node. Mostly used for testing, not production.
  • LoadBalancer — provisions a cloud load balancer and exposes the service publicly. This is how most production workloads receive external traffic.
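
As a rough sketch, a ClusterIP Service in front of the Deployment shown earlier might look like the following. The Service name my-app and port 80 are assumptions for illustration; the selector and targetPort follow the labels and containerPort in the Deployment manifest above.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-app            # assumed name; other pods reach it as my-app.production.svc
  namespace: production
spec:
  type: ClusterIP         # the default type, so this line could be omitted
  selector:
    app: my-app           # must match the labels on the pod template
  ports:
    - name: http
      port: 80            # assumed port that clients connect to
      targetPort: 3000    # the containerPort the application listens on
```

Changing type to LoadBalancer is the only edit needed to ask the cloud provider for an external load balancer; the selector and ports stay the same.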

An Ingress sits in front of multiple services and routes traffic based on host names or URL paths. If your cluster has an Ingress controller (NGINX, Traefik, or the cloud-native equivalent), you will write Ingress resources instead of exposing every service as a LoadBalancer.
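
A minimal Ingress sketch, under some assumptions: the host name app.example.com is hypothetical, ingressClassName: nginx assumes an NGINX Ingress controller registered under that class name, and the backend assumes a Service called my-app listening on port 80 in the same namespace.

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-app
  namespace: production
spec:
  ingressClassName: nginx        # assumption: depends on the controller installed in your cluster
  rules:
    - host: app.example.com      # hypothetical host name
      http:
        paths:
          - path: /
            pathType: Prefix     # match every path under /
            backend:
              service:
                name: my-app     # hypothetical Service in the same namespace
                port:
                  number: 80
```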

# Check if a service has endpoints (i.e. pods are matching the selector)
kubectl get endpoints my-service -n production

# This is the first thing to check when "service not reachable" errors occur
# If ENDPOINTS shows <none>, the selector is not matching any pods

Namespaces and what they mean in practice

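Namespaces are logical partitions within a cluster. Most teams use them to separate environments (dev, staging, production), teams, or applications. A namespace scopes names, RBAC rules, and resource quotas, but it is not a network boundary by default: pods in one namespace can reach Services in another through names such as my-service.production.svc unless a NetworkPolicy restricts the traffic. Access to the Kubernetes API across namespaces, by contrast, requires explicit RBAC bindings.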
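
One way to control traffic between namespaces is a NetworkPolicy. The sketch below allows ingress to pods in production only from other pods in the same namespace. The policy name is hypothetical, and the policy only has an effect if the cluster's CNI plugin enforces NetworkPolicy.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-same-namespace-only   # hypothetical name
  namespace: production
spec:
  podSelector: {}                   # applies to every pod in the namespace
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector: {}           # any pod in this same namespace; other namespaces are not matched
```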

In practice, the most important thing about namespaces is remembering to include -n namespace-name in your commands. If you run kubectl get pods and see nothing, you are probably looking at the wrong namespace. kubectl get pods -A shows everything across all namespaces when you are not sure where something lives.

Context and namespace shortcuts: kubectl config set-context --current --namespace=production sets the default namespace for your current context so you do not have to type -n production on every command.

HPA: horizontal pod autoscaling

Kubernetes can automatically scale the number of pod replicas based on CPU or memory usage. The Horizontal Pod Autoscaler (HPA) watches metrics and adjusts the replica count to keep utilisation within a target range.

# Check the current state of an HPA
kubectl get hpa -n production

# Output example:
# NAME     REFERENCE           TARGETS   MINPODS   MAXPODS   REPLICAS
# my-app   Deployment/my-app   45%/70%   2         10        3

The TARGETS column shows current usage versus the target threshold. If current usage is above the target, the HPA adds replicas. If it is below, the HPA scales down, but only after a stabilisation window (five minutes by default) to avoid flapping.
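
The replica count the HPA aims for follows the formula given in the Kubernetes documentation. The worked numbers below reuse the example output above (3 replicas, 70% target) and ignore the small default tolerance band and the stabilisation behaviour, so treat them as an approximation of what the controller would do.

```latex
\[
\text{desiredReplicas}
  = \left\lceil \text{currentReplicas} \times
    \frac{\text{currentMetricValue}}{\text{desiredMetricValue}} \right\rceil
\]

% At 45% usage against a 70% target with 3 replicas:
\[
\left\lceil 3 \times \frac{45}{70} \right\rceil = \lceil 1.93 \rceil = 2
\]

% At 90% usage against the same target:
\[
\left\lceil 3 \times \frac{90}{70} \right\rceil = \lceil 3.86 \rceil = 4
\]
```

The result is then clamped to the MINPODS and MAXPODS values, so in the first case the HPA would settle at 2 replicas, which is also the configured minimum.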

For utilisation-based targets like this one, the pods must have resource requests set in their manifest. This is one reason why missing resource requests cause problems: utilisation is expressed as a percentage of the request, so the HPA has nothing to compare against without one.
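
A sketch of an HPA manifest that would produce output like the example above. The values (minimum 2, maximum 10, 70% CPU) are chosen to match that output, and the manifest assumes a metrics source such as metrics-server is installed in the cluster.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app
  namespace: production
spec:
  scaleTargetRef:                 # the workload whose replica count is adjusted
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70  # percentage of the CPU request, averaged across pods
```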

What managed Kubernetes hides from you

GKE, EKS, and AKS all manage the control plane for you. The things you do not have to worry about on a managed service include:

  • etcd backup and restore
  • API server availability and scaling
  • The mechanics of control plane upgrades (though you usually still decide when to move to a new Kubernetes version and when node upgrades happen)
  • Certificate rotation for cluster components

What you still own on a managed service: node group sizing and configuration, application manifests and deployments, namespace organisation, RBAC rules for your team, persistent volume provisioning, and network policies.

Trade-off: Managed services are simpler to operate but have less flexibility. If you need unusual CNI configurations, specific kernel versions, or very tight control over node configuration, self-managed Kubernetes (or distributions like Rancher/OpenShift) gives more control. For most application teams, managed is the right choice.

Debugging pods that will not start

A systematic approach when a pod is stuck in Pending, CrashLoopBackOff, or ImagePullBackOff:

Status           | What it means                          | Where to look
-----------------|----------------------------------------|---------------------------------------------------------------------------
Pending          | Pod not scheduled onto a node          | kubectl describe pod → Events; check resource requests vs available capacity
ImagePullBackOff | Cannot pull the container image        | Check image name and tag; check registry credentials
CrashLoopBackOff | Container starts then exits repeatedly | kubectl logs --previous; container is likely crashing on startup
OOMKilled        | Container exceeded memory limit        | Increase memory limit or fix memory leak
Error            | Container exited with non-zero status  | kubectl logs to see stderr output

Realistic scenario: You deploy a new version and pods enter CrashLoopBackOff. Step one: kubectl logs pod-name --previous to see what the previous container instance printed before dying. Step two: kubectl describe pod pod-name to check Events. Step three: check that the environment variables and secrets the application expects are actually present. A missing secret or misconfigured env var often causes the application to exit immediately on startup.
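
For step three, it helps to know how a manifest wires configuration into the container. The fragment below is a sketch of one common pattern; the variable name DATABASE_URL, the Secret name my-app-secrets, and the key database-url are all hypothetical.

```yaml
# Fragment of spec.template.spec in a Deployment
containers:
  - name: my-app
    image: 123456.dkr.ecr.eu-west-1.amazonaws.com/my-app:1.2.0
    env:
      - name: DATABASE_URL          # hypothetical variable the application reads at startup
        valueFrom:
          secretKeyRef:
            name: my-app-secrets    # hypothetical Secret; must exist in the same namespace
            key: database-url       # hypothetical key inside that Secret
            optional: false         # the default: the container is not started if the key is missing
```

With a required reference like this, a missing Secret typically appears in the describe Events as CreateContainerConfigError rather than as a crash. With optional: true the variable is simply left unset, and it is then up to the application to fail or fall back. Running kubectl get secret my-app-secrets -n production confirms whether the Secret exists.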