Kubernetes for Cloud Engineers: What You Do With It Day-to-Day
Most Kubernetes content is written for cluster administrators or people studying for the CKA exam. This page is different: it covers what a cloud engineer does with Kubernetes day-to-day when the cluster already exists and your job is to deploy, monitor, and debug applications running on it.
Your role versus the cluster admin’s role
There is an important distinction that many beginners miss: the person who runs the Kubernetes control plane and the person who deploys applications to it are often different roles.
A cluster administrator provisions the cluster, manages node pools, handles upgrades, configures networking (CNI plugins), and deals with control plane issues. On most cloud platforms (GKE, EKS, AKS), the cloud provider manages much of this.
A cloud engineer working with Kubernetes writes and applies manifests, deploys new application versions, investigates why pods are crashing, manages namespaces, and configures autoscaling. This is what most cloud engineering job descriptions mean when they say “Kubernetes experience required.”
You do not need to know how to build a cluster from scratch to be effective with Kubernetes at work. You do need to be comfortable with kubectl and YAML manifests.
kubectl commands you will use every day
The verbs are always the same: get, describe, apply, delete, logs, exec. The resources change. Here are the most useful combinations:
# List resources in a namespace
kubectl get pods -n production
kubectl get deployments -n production
kubectl get services -n production
# Get a quick view across all namespaces
kubectl get pods -A
# Detailed info about a specific resource (events are at the bottom — very useful for debugging)
kubectl describe pod my-app-7d9f5b8c4-xk2p9 -n production
# Watch pods as they start up or restart
kubectl get pods -n production -w
# Read logs from a running pod
kubectl logs my-app-7d9f5b8c4-xk2p9 -n production
# Follow logs in real time
kubectl logs -f my-app-7d9f5b8c4-xk2p9 -n production
# Get logs from the previous container instance (useful after a crash)
kubectl logs my-app-7d9f5b8c4-xk2p9 -n production --previous
# Open a shell inside a running pod
kubectl exec -it my-app-7d9f5b8c4-xk2p9 -n production -- /bin/sh
The describe command is the most underused. The Events section at the bottom tells you what Kubernetes has been doing with a resource: failed image pulls, insufficient CPU, failed health checks, scheduling issues. When a pod is not starting, this is the first place to look.
Reading and writing YAML manifests
A Kubernetes manifest is a YAML file describing the desired state of a resource. Understanding the structure means you can read what someone else has written, spot mistakes, and write new manifests without copying blindly from examples.
Most Kubernetes resources have four top-level fields: apiVersion, kind, metadata, and spec. (A few, such as ConfigMap and Secret, carry data in place of spec.) The spec is where most of the substance is.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
  namespace: production
  labels:
    app: my-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: my-app
          image: 123456.dkr.ecr.eu-west-1.amazonaws.com/my-app:1.2.0
          ports:
            - containerPort: 3000
          env:
            - name: NODE_ENV
              value: production
          resources:
            requests:
              cpu: "100m"
              memory: "128Mi"
            limits:
              cpu: "500m"
              memory: "256Mi"
          readinessProbe:
            httpGet:
              path: /health
              port: 3000
            initialDelaySeconds: 5
            periodSeconds: 10
The things most often wrong in manifests written by engineers new to Kubernetes: missing resource requests (which means the scheduler cannot make good placement decisions), missing readiness probes (which means Kubernetes sends traffic to pods that are not ready), and label selector mismatches between the deployment and its pod template.
Services and how traffic reaches your pods
A Service gives a stable DNS name and IP address to a set of pods. Pods come and go, and a replacement pod gets a new IP address each time one is recreated. A Service stays stable.
The three types you encounter most:
- ClusterIP — accessible only inside the cluster. Default type. Used for internal communication between services.
- NodePort — exposes the service on a port on every node. Mostly used for testing, not production.
- LoadBalancer — provisions a cloud load balancer and exposes the service publicly. This is how most production workloads receive external traffic.
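To make the default case concrete, here is a minimal sketch of a ClusterIP Service for the my-app Deployment shown earlier. The Service name my-service and port 80 are illustrative assumptions; the selector and targetPort mirror the app: my-app label and container port 3000 from that manifest.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-service          # assumed name, for illustration
  namespace: production
spec:
  type: ClusterIP           # the default type; written out for clarity
  selector:
    app: my-app             # must match the labels on the pod template
  ports:
    - port: 80              # port that clients inside the cluster connect to (assumed)
      targetPort: 3000      # containerPort exposed by the my-app container
```

With something like this applied, other pods in the production namespace could reach the application at http://my-service.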
An Ingress sits in front of multiple services and routes traffic based on host names or URL paths. If your cluster has an Ingress controller (NGINX, Traefik, or the cloud-native equivalent), you will write Ingress resources instead of exposing every service as a LoadBalancer.
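As a rough illustration, an Ingress that routes one host name to a Service might look like the sketch below. The resource name, the host app.example.com, the nginx ingress class, and the backing Service name and port are all placeholders; the right values depend on which Ingress controller the cluster actually runs.

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-app-ingress          # hypothetical name
  namespace: production
spec:
  ingressClassName: nginx       # assumption: an NGINX Ingress controller is installed
  rules:
    - host: app.example.com     # placeholder host name
      http:
        paths:
          - path: /
            pathType: Prefix    # match every path under /
            backend:
              service:
                name: my-service   # assumed Service name
                port:
                  number: 80       # assumed Service port
```

Additional hosts or paths are added as further entries under rules and paths, which is how one Ingress can front several Services.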
# Check if a service has endpoints (i.e. pods are matching the selector)
kubectl get endpoints my-service -n production
# This is the first thing to check when "service not reachable" errors occur
# If ENDPOINTS shows <none>, the selector is not matching any pods
Namespaces and what they mean in practice
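Namespaces are logical partitions within a cluster. Most teams use them to separate environments (dev, staging, production), teams, or applications. A namespace scopes resource names, RBAC rules, and quotas, but it does not isolate network traffic on its own: by default, a pod can reach a Service in another namespace through its DNS name (for example my-service.staging.svc.cluster.local). Blocking cross-namespace traffic requires explicit network policies, and reading another namespace's resources through the API requires the right service account bindings.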
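As a rough sketch of what such a network policy can look like, the manifest below allows ingress to pods in production only from pods in the same namespace. The policy name is hypothetical, and the policy only has an effect if the cluster's CNI plugin enforces NetworkPolicy.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-same-namespace-only   # hypothetical name
  namespace: production
spec:
  podSelector: {}            # empty selector: applies to every pod in this namespace
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector: {}    # any pod in this same namespace; other namespaces are not matched
```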
In practice, the most important thing about namespaces is remembering to include -n namespace-name in your commands. If you run kubectl get pods and see nothing, you are probably looking at the wrong namespace. kubectl get pods -A shows everything across all namespaces when you are not sure where something lives.
Context and namespace shortcuts: kubectl config set-context --current --namespace=production sets the default namespace for your current context so you do not have to type -n production on every command.
HPA: horizontal pod autoscaling
Kubernetes can automatically scale the number of pod replicas based on CPU or memory usage. The Horizontal Pod Autoscaler (HPA) watches metrics and adjusts the replica count to keep utilisation within a target range.
# Check the current state of an HPA
kubectl get hpa -n production
# Output example:
# NAME     REFERENCE           TARGETS   MINPODS   MAXPODS   REPLICAS
# my-app   Deployment/my-app   45%/70%   2         10        3
The TARGETS column shows current usage versus the target threshold. If current usage is above the threshold, the HPA will add replicas. If below, it will scale down (with a cooldown period to avoid flapping).
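The scaling decision follows the formula given in the Kubernetes HPA documentation. As a worked example using the sample output above (3 replicas, 45% current usage, 70% target), and ignoring the min/max bounds and the cooldown window:

```latex
\[
\text{desiredReplicas}
  = \left\lceil \text{currentReplicas} \times \frac{\text{currentMetric}}{\text{targetMetric}} \right\rceil
  = \left\lceil 3 \times \frac{45}{70} \right\rceil
  = \lceil 1.93 \rceil
  = 2
\]
```

So in this example the HPA would eventually scale from 3 replicas down to 2, which is also its MINPODS floor.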
For HPA to work, the pods must have resource requests set in their manifest. This is one reason why missing resource requests cause problems — the HPA cannot calculate utilisation percentage without a request value to compare against.
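A manifest that would produce output like the example above might look like the following sketch. It assumes the autoscaling/v2 API, CPU as the scaling metric, and a metrics source such as metrics-server running in the cluster; none of these are stated in the example itself.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app
  namespace: production
spec:
  scaleTargetRef:              # the workload whose replica count is adjusted
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2               # MINPODS in the kubectl output
  maxReplicas: 10              # MAXPODS in the kubectl output
  metrics:
    - type: Resource
      resource:
        name: cpu              # assumed metric
        target:
          type: Utilization
          averageUtilization: 70   # percent of each pod's CPU request
```

The averageUtilization value is a percentage of the CPU request, which is why the Deployment's requests.cpu must be set for this to work.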
What managed Kubernetes hides from you
GKE, EKS, and AKS all manage the control plane for you. The things you do not have to worry about on a managed service include:
- etcd backup and restore
- API server availability and scaling
- Control plane upgrades (though you still choose when node upgrades happen)
- Certificate rotation for cluster components
What you still own on a managed service: node group sizing and configuration, application manifests and deployments, namespace organisation, RBAC rules for your team, persistent volume provisioning, and network policies.
Trade-off: Managed services are simpler to operate but have less flexibility. If you need unusual CNI configurations, specific kernel versions, or very tight control over node configuration, self-managed Kubernetes (or distributions like Rancher/OpenShift) gives more control. For most application teams, managed is the right choice.
Debugging pods that will not start
A systematic approach when a pod is stuck in Pending, CrashLoopBackOff, or ImagePullBackOff:
| Status | What it means | Where to look |
|---|---|---|
| Pending | Pod not scheduled onto a node | kubectl describe pod → Events; check resource requests vs available capacity |
| ImagePullBackOff | Cannot pull the container image | Check image name and tag; check registry credentials |
| CrashLoopBackOff | Container starts then exits repeatedly | kubectl logs --previous; container is likely crashing on startup |
| OOMKilled | Container exceeded memory limit | Increase memory limit or fix memory leak |
| Error | Container exited with non-zero status | kubectl logs to see stderr output |
Realistic scenario: You deploy a new version and pods enter CrashLoopBackOff. Step one: kubectl logs pod-name --previous to see what the previous container instance printed before dying. Step two: kubectl describe pod pod-name to check Events. Step three: check if the application is reading environment variables or secrets it expects. Often a missing secret or misconfigured env var causes the application to exit immediately on startup.
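One common shape of this problem is an env entry that pulls its value from a Secret. The fragment below is a hypothetical sketch of a container's env block; the variable DATABASE_URL, the Secret my-app-secrets, and its key are invented for illustration.

```yaml
# Fragment of spec.template.spec.containers[0] in a Deployment
env:
  - name: NODE_ENV
    value: production
  - name: DATABASE_URL            # hypothetical variable the application requires
    valueFrom:
      secretKeyRef:
        name: my-app-secrets      # hypothetical Secret; must exist in the same namespace
        key: database-url         # hypothetical key inside that Secret
```

If the referenced Secret or key does not exist, the container is not started at all and kubectl describe pod reports CreateContainerConfigError. If the variable is simply absent from the manifest, or holds a wrong value, the container starts and the application may exit straight away, which shows up as CrashLoopBackOff.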
Summary
- Most cloud engineers work with Kubernetes as an application platform, not as cluster administrators — focus on kubectl and manifests
- kubectl describe is the most useful diagnostic command: the Events section tells the story
- Always set resource requests and readiness probes in your pod specs; missing these causes real operational problems
- Managed Kubernetes (GKE, EKS, AKS) handles the control plane — you still own your application manifests and RBAC
- Pod status codes (CrashLoopBackOff, Pending, ImagePullBackOff) each point to a specific category of problem