Linux for Cloud Engineers: The Skills That Actually Matter

You do not need to be a Linux system administrator to be a good cloud engineer. But you do need to be fluent enough at the command line that it does not slow you down. This page focuses on the Linux knowledge that comes up repeatedly in cloud work — grouped by situation, not by command.

Why Linux matters even for serverless engineers

Cloud runs on Linux. Virtual machines, containers, Kubernetes nodes, CI/CD agents, managed databases — the underlying operating system is almost always Linux. Even if your daily work involves deploying serverless functions and managed services, you will hit Linux eventually.

A few scenarios where Linux knowledge saves time:

  • SSH into an EC2 instance to diagnose why an application is not starting
  • Read logs in a container by running a shell inside it
  • Write a CI pipeline step that processes files or manipulates environment variables
  • Debug a permissions issue on a file that a service is trying to read
  • Check if a process is running and consuming too much memory on a VM

Each of these requires basic Linux fluency. You do not need to manage kernel modules or compile software from source. You do need to move around a filesystem, inspect processes, manage files, and understand permissions.

Filesystem and file operations

Most of your time on a Linux command line is spent navigating the filesystem and working with files. These are the commands that do real work:

# Where am I, what is here
pwd
ls -lah           # -l for details, -a for hidden files, -h for human-readable sizes

# Navigation
cd /var/log
cd ~              # home directory
cd -              # previous directory

# Finding things
find /var/log -name "*.log" -mtime -1   # logs modified in the last 24 hours
find /etc -name "*.conf" -type f

# File operations
cp -r src/ dest/
mv old-name.txt new-name.txt
rm -rf directory/                       # be careful — no undo
mkdir -p path/to/new/dir

# Reading files
cat /etc/hosts
less /var/log/syslog     # paginated — q to quit, / to search
tail -n 100 /var/log/app.log
tail -f /var/log/app.log  # follow in real time
grep "ERROR" /var/log/app.log
grep -r "database" /etc/app/    # recursive search
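As a rough sketch of how these commands combine in practice, the script below builds a throwaway directory of sample logs, finds the ones modified in the last day, and counts error lines. The directory layout, file names, and log contents are invented for illustration, and it assumes GNU tools (touch -d, mktemp) as found on most Linux distributions.

```shell
#!/bin/bash
set -euo pipefail

# Hypothetical sandbox so nothing outside it is touched.
workdir=$(mktemp -d)
mkdir -p "$workdir/logs/archive"

# Sample log files (contents are made up for the demo).
printf 'INFO start\nERROR db timeout\nINFO retry\nERROR db timeout\n' > "$workdir/logs/app.log"
printf 'INFO ok\n' > "$workdir/logs/worker.log"
printf 'ERROR old failure\n' > "$workdir/logs/archive/old.log"
touch -d '3 days ago' "$workdir/logs/archive/old.log"   # backdate one file

# Log files modified in the last 24 hours (old.log is excluded).
recent=$(find "$workdir/logs" -name "*.log" -type f -mtime -1 | sort)
recent_count=$(printf '%s\n' "$recent" | wc -l)

# Number of ERROR lines in one file.
error_count=$(grep -c "ERROR" "$workdir/logs/app.log")

# Recursive search: which files contain ERROR at all.
error_file_count=$(grep -rl "ERROR" "$workdir/logs" | wc -l)

echo "recent log files:       $recent_count"
echo "ERROR lines in app.log: $error_count"
echo "files containing ERROR: $error_file_count"

rm -rf "$workdir"
```

The same find and grep invocations work unchanged against /var/log on a real host, usually with sudo for files your user cannot read.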

File permissions: understanding and fixing them

Linux permissions determine who can read, write, or execute a file. Every file has an owner (a user), a group, and a set of permission bits. Permission problems are one of the most common causes of services failing to start in cloud environments.

Reading a permission string like -rwxr-xr--:

  • First character: file type (- for file, d for directory, l for symlink)
  • Next three (rwx): owner permissions (read, write, execute)
  • Next three (r-x): group permissions (read, no write, execute)
  • Last three (r--): others (read only)

# Change permissions
chmod 644 config.json        # owner rw, group r, others r
chmod 600 ~/.ssh/id_rsa      # owner rw only — required for SSH keys
chmod +x script.sh           # add execute permission (for all users under a typical umask)

# Change ownership
chown ubuntu:ubuntu /app     # user:group
chown -R www-data /var/www   # recursive

# Check who you are and what groups you belong to
whoami
id
groups

Real scenario: You deploy an application and it cannot read its config file. The error says “Permission denied”. Check: who is the process running as? (ps aux | grep my-app). Who owns the config file? (ls -l /etc/app/config.json). If the process user does not have read permission on the file, fix it with chown or chmod.
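The check-then-fix loop from this scenario can be sketched on a temporary file. The file here is only a stand-in for something like /etc/app/config.json, and the script assumes GNU stat, whose -c option prints the octal mode (%a), the permission string (%A), and the owner and group (%U:%G).

```shell
#!/bin/bash
set -euo pipefail

# Stand-in for a real config file; mktemp creates it in a temp directory.
config=$(mktemp)
chmod 600 "$config"

# Inspect the current state: octal mode, permission string, owner:group.
mode=$(stat -c '%a' "$config")
perms=$(stat -c '%A' "$config")
owner=$(stat -c '%U:%G' "$config")
echo "before: $mode $perms $owner"

# Only the owner can read the file. If the service runs as a different
# user, grant read access to group and others (or chown the file instead).
chmod 644 "$config"
mode_after=$(stat -c '%a' "$config")
perms_after=$(stat -c '%A' "$config")
echo "after:  $mode_after $perms_after $owner"

rm -f "$config"
```

On a real host, one way to confirm the fix is to test readability as the service's user, for example with sudo -u followed by that user name and test -r on the file.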

Process management

Understanding running processes matters when you are diagnosing why a server is slow, why a service is not responding, or whether an application is actually running.

# What is running
ps aux                        # all processes, detailed
ps aux | grep nginx           # find a specific process
top                           # live process monitor (q to quit)
htop                          # nicer version of top, if installed

# Kill a process
kill 1234                     # send SIGTERM (graceful)
kill -9 1234                  # send SIGKILL (force)
pkill nginx                   # kill by process name

# Check what is listening on a port
ss -tlnp                      # show TCP listeners with process names
ss -tlnp | grep :80           # is anything listening on port 80?

ss -tlnp is one of the most useful diagnostic commands in cloud work. When a service claims to be running but connections are failing, checking whether it is actually listening on the expected port (and interface) usually narrows the problem down quickly. A service bound only to 127.0.0.1, for example, will not accept connections from other hosts.
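To make the interface part concrete, here is a minimal sketch that labels each listening address as loopback-only or reachable on all interfaces. classify_listener is a hypothetical helper written for this page, not part of ss. The script assumes the local address and port sit in the fourth column of ss -Htln output, which is the usual layout.

```shell
#!/bin/bash
# Sketch: classify the "Local Address:Port" column of `ss -tln` output.
# classify_listener is a hypothetical helper, not an ss feature.

classify_listener() {
  local addr="${1%:*}"              # drop the trailing :port
  case "$addr" in
    127.*|"[::1]")       echo "loopback only" ;;
    0.0.0.0|"*"|"[::]")  echo "all interfaces" ;;
    *)                   echo "specific address" ;;
  esac
}

# If ss is installed, annotate every TCP listener on this host.
# -H suppresses the header line so only socket rows are read.
if command -v ss >/dev/null 2>&1; then
  ss -Htln | awk '{print $4}' | while read -r local_addr; do
    printf '%-28s %s\n' "$local_addr" "$(classify_listener "$local_addr")"
  done
fi
```

A listener reported as "loopback only" explains a service that works from the VM itself but not from a load balancer or another host.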

systemd and journalctl

Most modern Linux distributions use systemd to manage services. If an application is deployed as a systemd service — which is common on virtual machines — you need to know how to start, stop, and inspect it.

# Service management
systemctl status nginx
systemctl start nginx
systemctl stop nginx
systemctl restart nginx
systemctl reload nginx        # reload config without full restart
systemctl enable nginx        # start automatically on boot
systemctl disable nginx

# Read logs from a service
journalctl -u nginx           # all logs for nginx
journalctl -u nginx -f        # follow in real time
journalctl -u nginx --since "1 hour ago"
journalctl -u nginx -n 100    # last 100 lines
journalctl -u nginx -p err    # priority err and more severe (err, crit, alert, emerg)

journalctl is particularly useful because it captures everything a service has printed to stdout and stderr, along with kernel messages, in one place. It is the first place to look when a service fails to start: journalctl -u service-name -e jumps to the end of the log where the recent failure will be.
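The "check status, then read the log" routine can be wrapped in a small function. This is a sketch rather than a standard tool. check_service is a hypothetical name, and the 50-line limit is an arbitrary choice.

```shell
#!/bin/bash
# Sketch: report a unit's state and, if it is not active, show recent logs.
# check_service is a hypothetical helper name.

check_service() {
  local unit="$1"
  if systemctl is-active --quiet "$unit"; then
    echo "$unit is active"
  else
    echo "$unit is not active; last 50 log lines:" >&2
    # --no-pager prints directly instead of opening less.
    journalctl -u "$unit" -n 50 --no-pager >&2
    return 1
  fi
}
```

On a systemd host you would call it as check_service nginx. Because it returns a non-zero status on failure, it can also gate later steps in a deployment script.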

SSH key management

SSH key pairs are the standard way to authenticate to Linux servers in cloud environments. Password authentication is usually disabled. Understanding how to generate, use, and manage SSH keys is a basic requirement.

# Generate a new key pair
ssh-keygen -t ed25519 -C "your-email@example.com"
# Creates: ~/.ssh/id_ed25519 (private key) and ~/.ssh/id_ed25519.pub (public key)

# Connect to a server
ssh ubuntu@10.0.1.50
ssh -i ~/.ssh/my-key.pem ec2-user@ec2-12-34-56-78.compute.amazonaws.com

# Copy your public key to a server
ssh-copy-id -i ~/.ssh/id_ed25519.pub ubuntu@10.0.1.50

# SSH config file (~/.ssh/config) — simplifies repeated connections
# Host myserver
#   HostName 10.0.1.50
#   User ubuntu
#   IdentityFile ~/.ssh/id_ed25519
# Then you just: ssh myserver

Key permissions matter. SSH will refuse to use a private key if it has overly permissive permissions. The private key must be readable only by you: chmod 600 ~/.ssh/id_ed25519. The .ssh directory itself should be chmod 700.
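A quick way to check this before SSH complains is to look for any group or other permission bits on the key file. The function below is a minimal sketch. key_perms_ok is a hypothetical helper, the temporary file only stands in for a real private key, and GNU stat is assumed.

```shell
#!/bin/bash
# Sketch: succeed only if a file has no group or other permission bits set.
# key_perms_ok is a hypothetical helper name.

key_perms_ok() {
  local mode
  mode=$(stat -c '%a' "$1") || return 2
  # Interpret the mode as octal and mask the group/other bits (077).
  (( (8#$mode & 8#077) == 0 ))
}

# Demo on a temporary file standing in for a private key.
demo_key=$(mktemp)

chmod 644 "$demo_key"
if key_perms_ok "$demo_key"; then before="ok"; else before="too open"; fi

chmod 600 "$demo_key"
if key_perms_ok "$demo_key"; then after="ok"; else after="too open"; fi

echo "mode 644: $before"
echo "mode 600: $after"

rm -f "$demo_key"
```

Pointed at a real key such as ~/.ssh/id_ed25519, a "too open" result means chmod 600 is needed before the key can be used.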

Shell scripting basics

You do not need to be a bash expert, but being able to write a basic script to automate a task saves time and is expected at most cloud engineering levels.

#!/bin/bash
set -euo pipefail    # exit on error, unset variables, pipe failures

# Variables
ENVIRONMENT=${1:-staging}    # first argument, default to "staging"
BUCKET_NAME="my-app-${ENVIRONMENT}"

# Conditionals
if [ "$ENVIRONMENT" = "production" ]; then
  echo "Deploying to production..."
else
  echo "Deploying to ${ENVIRONMENT}..."
fi

# Loops
for region in eu-west-1 us-east-1; do
  echo "Syncing to ${region}"
  aws s3 sync ./dist "s3://${BUCKET_NAME}-${region}"
done

# Check command succeeded
if aws s3 ls "s3://${BUCKET_NAME}" > /dev/null 2>&1; then
  echo "Bucket exists"
else
  echo "Bucket not found" >&2
  exit 1
fi

The set -euo pipefail line at the top is important: -e exits on error so your script does not silently continue after a failure, -u treats unset variables as errors, and -o pipefail makes pipe failures visible. Scripts without these options can do unexpected things when commands fail.
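The effect of each option is easy to see in isolation. The sketch below runs tiny child shells with and without each option and records what happens. The variable name demo_var is made up for the demonstration.

```shell
#!/bin/bash
# Sketch: observe what -e, -u, and pipefail change, using child shells.

# pipefail: exit status of a pipeline whose first command fails.
without_pipefail=$(bash -c 'false | true; echo $?')
with_pipefail=$(bash -c 'set -o pipefail; false | true; echo $?')
echo "pipeline status without pipefail: $without_pipefail"
echo "pipeline status with pipefail:    $with_pipefail"

# -e: does the script keep going after a failed command?
after_failure=$(bash -c 'false; echo reached')
after_failure_e=$(bash -c 'set -e; false; echo reached' || true)
echo "without -e, line after failure: '$after_failure'"
echo "with -e, line after failure:    '$after_failure_e'"

# -u: referencing an unset variable aborts instead of expanding to "".
if bash -c 'set -u; unset demo_var; echo "value: $demo_var"' 2>/dev/null; then
  unset_result="continued"
else
  unset_result="aborted"
fi
echo "with -u, unset variable: $unset_result"
```

Without pipefail the failing first command is hidden by the successful last one, which is exactly the silent failure the option is meant to expose.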