Python for Cloud Engineers: Automation, SDKs, and Serverless

Python is one of the most widely used languages for cloud automation. This page is not a Python tutorial; it covers the specific patterns, libraries, and habits that make Python useful for cloud engineering work: interacting with cloud APIs, writing serverless functions, processing data, and automating repetitive tasks.

Python versus bash: when to use which

Both Python and bash are valid tools for cloud automation. The decision is mostly about complexity and maintainability.

Reach for bash when: chaining together CLI commands, writing a quick deploy script, working with environment variables and file paths, gluing tools together
Reach for Python when: calling cloud APIs, parsing JSON or YAML, handling error cases properly, writing something others will need to read and maintain, building anything non-trivial

In practice, Python is the better choice for most cloud automation because cloud SDKs have first-class Python support, the error handling is better than bash, and Python code is easier to test. Bash is still the right choice for simple pipeline steps.
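The difference often shows up at the point where a script needs to check for failures. The sketch below shows the Python side of that trade-off. It assumes a hypothetical helper called run_json, which runs a CLI command, stops with a clear error (including the command's stderr) if the command fails, and parses the JSON output. A bash script without set -e would carry on past the failure. The demo runs a stand-in Python command so the sketch works without any cloud CLI installed.

```python
import json
import subprocess
import sys


def run_json(cmd: list[str]) -> object:
    """Run a CLI command and parse its stdout as JSON.

    Hypothetical helper: raises RuntimeError with the command's stderr if it
    exits non-zero, instead of silently continuing.
    """
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        raise RuntimeError(
            f"{cmd[0]} exited with {result.returncode}: {result.stderr.strip()}"
        )
    return json.loads(result.stdout)


if __name__ == "__main__":
    # Stand-in command so the sketch runs anywhere; in practice this could be
    # something like ["aws", "s3api", "list-buckets", "--output", "json"].
    demo_cmd = [sys.executable, "-c", 'import json; print(json.dumps({"Buckets": []}))']
    print(run_json(demo_cmd))
```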

boto3: the AWS SDK for Python

boto3 is the official AWS SDK for Python. It gives you a Python interface to every AWS service. The two main ways to use it are clients (low-level, returning raw API responses as dictionaries) and resources (a higher-level, object-oriented wrapper around common services). AWS has said it does not plan to add new features to the resources interface, so clients are the safer default for new code.

import boto3

# List S3 buckets
s3 = boto3.client('s3')
response = s3.list_buckets()
for bucket in response['Buckets']:
    print(bucket['Name'])

# Upload a file
s3.upload_file('local-file.txt', 'my-bucket', 'path/in/bucket/file.txt')

# Read a file from S3
obj = s3.get_object(Bucket='my-bucket', Key='path/in/bucket/file.txt')
content = obj['Body'].read().decode('utf-8')

A pattern you will use constantly: paginating results. AWS list calls return a limited number of results per request (up to 1,000 keys for list_objects_v2) plus a continuation token when there are more. The correct way to handle this is with paginators, not by assuming a single call returns everything:

# Wrong — only gets first page of results
response = s3.list_objects_v2(Bucket='my-bucket')
objects = response['Contents']   # at most 1,000 objects; KeyError if the bucket is empty

# Correct — handles pagination automatically
paginator = s3.get_paginator('list_objects_v2')
pages = paginator.paginate(Bucket='my-bucket')
objects = [obj for page in pages for obj in page.get('Contents', [])]
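Under the hood, a paginator repeats the call and passes back the token from the previous page. The sketch below shows that loop for list_objects_v2, using that API's IsTruncated and NextContinuationToken response fields and its ContinuationToken parameter. FakeS3 is a hypothetical in-memory stand-in that lets the sketch run without AWS. In real code, the built-in paginator is still the simpler choice.

```python
from typing import Any, Iterator


def iter_object_keys(client: Any, bucket: str) -> Iterator[str]:
    """Yield every key in a bucket by following continuation tokens by hand.

    `client` is anything with a boto3-style list_objects_v2 method.
    """
    kwargs = {"Bucket": bucket}
    while True:
        page = client.list_objects_v2(**kwargs)
        # 'Contents' is absent when a page (or the whole bucket) is empty
        for obj in page.get("Contents", []):
            yield obj["Key"]
        if not page.get("IsTruncated"):
            return
        # Ask for the next page where this one left off
        kwargs["ContinuationToken"] = page["NextContinuationToken"]


class FakeS3:
    """Hypothetical in-memory stand-in that serves keys two at a time."""

    def __init__(self, keys: list[str]) -> None:
        self.keys = keys

    def list_objects_v2(self, Bucket: str, ContinuationToken: str = "0") -> dict:
        start = int(ContinuationToken)
        chunk = self.keys[start:start + 2]
        page: dict = {"Contents": [{"Key": k} for k in chunk]} if chunk else {}
        if start + 2 < len(self.keys):
            page["IsTruncated"] = True
            page["NextContinuationToken"] = str(start + 2)
        else:
            page["IsTruncated"] = False
        return page


if __name__ == "__main__":
    fake = FakeS3(["a.txt", "b.txt", "c.txt", "d.txt", "e.txt"])
    print(list(iter_object_keys(fake, "my-bucket")))
```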

Google Cloud SDK for Python

GCP’s Python libraries follow a similar pattern to boto3 but with a more modern, consistent API. The libraries are split by service: google-cloud-storage, google-cloud-bigquery, google-cloud-pubsub, and so on.

from google.cloud import storage

# List buckets
client = storage.Client()
buckets = list(client.list_buckets())
for bucket in buckets:
    print(bucket.name)

# Upload a file to GCS
bucket = client.bucket('my-bucket')
blob = bucket.blob('path/in/bucket/file.txt')
blob.upload_from_filename('local-file.txt')

# Download a file
blob.download_to_filename('downloaded-file.txt')

Authentication in GCP Python code follows Application Default Credentials (ADC). When running locally with gcloud auth application-default login, the SDK picks up your credentials automatically. In production (Cloud Run, GKE, etc.), the service account attached to the resource is used. You rarely need to handle credentials explicitly.

Writing Lambda and Cloud Functions

Serverless functions (AWS Lambda, GCP Cloud Functions, Azure Functions) are Python functions with a platform-specific entry point signature. An AWS Lambda handler, like the one below, receives an event payload and a context object, does its work, and returns a response; Cloud Functions and Azure Functions pass a request or event object instead.

import json
import os
from urllib.parse import unquote_plus

import boto3

# Create the client once at module load so warm invocations reuse it
s3 = boto3.client('s3')


# Lambda function that processes S3 events
def handler(event, context):
    # Loop over every record rather than assuming exactly one
    for record in event['Records']:
        bucket = record['s3']['bucket']['name']
        # Object keys arrive URL-encoded in S3 events (a space becomes '+')
        key = unquote_plus(record['s3']['object']['key'])

        try:
            obj = s3.get_object(Bucket=bucket, Key=key)
            data = json.loads(obj['Body'].read())

            # Process the data
            result = process_data(data)

            # Write output to a bucket other than the triggering one, so the
            # write cannot invoke this function again in a loop
            s3.put_object(
                Bucket=os.environ['OUTPUT_BUCKET'],
                Key=f"processed/{key}",
                Body=json.dumps(result),
                ContentType='application/json'
            )

        except Exception as e:
            print(f"Error processing {key}: {e}")
            raise  # Re-raise so Lambda marks the invocation as failed

    return {'statusCode': 200, 'body': 'Success'}


def process_data(data):
    # Your business logic here
    return {'processed': True, 'count': len(data)}
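If you move the event parsing into a small pure function, you can test it locally with a hand-built event, without deploying or calling AWS. The sketch below assumes the standard S3 notification shape (Records, then s3.bucket.name and s3.object.key). The helper name s3_objects_from_event is hypothetical. S3 URL-encodes object keys in notifications, so the sketch decodes each key with unquote_plus.

```python
from urllib.parse import unquote_plus


def s3_objects_from_event(event: dict) -> list[tuple[str, str]]:
    """Return (bucket, key) pairs from an S3 event notification.

    Keys are URL-decoded, since S3 encodes them (a space arrives as '+').
    """
    return [
        (
            record["s3"]["bucket"]["name"],
            unquote_plus(record["s3"]["object"]["key"]),
        )
        for record in event.get("Records", [])
    ]


if __name__ == "__main__":
    # Hand-built event containing only the fields the function reads;
    # a real notification carries many more.
    sample_event = {
        "Records": [
            {"s3": {"bucket": {"name": "my-bucket"},
                    "object": {"key": "incoming/sales+report%282024%29.json"}}}
        ]
    }
    print(s3_objects_from_event(sample_event))
    # prints [('my-bucket', 'incoming/sales report(2024).json')]
```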

Key things to get right with serverless Python functions:

Use environment variables (os.environ) for configuration — never hardcode bucket names, table names, or API endpoints
Re-raise or propagate exceptions — swallowing errors silently makes debugging very hard
Keep dependencies minimal — Lambda caps deployment package size (250 MB unzipped, including layers), and large packages slow cold starts
Do not put credentials in the function code — use the execution role’s permissions

Working with JSON and YAML

Cloud engineering involves a lot of JSON and YAML — API responses, configuration files, Terraform outputs, Kubernetes manifests. Python handles both cleanly.

import json
import yaml

# JSON
data = {'name': 'my-app', 'version': '1.0.0', 'replicas': 3}

# Serialize to string
json_string = json.dumps(data, indent=2)

# Deserialize from string
parsed = json.loads(json_string)

# Read/write files
with open('config.json', 'w') as f:
    json.dump(data, f, indent=2)

with open('config.json', 'r') as f:
    loaded = json.load(f)

# YAML (requires: pip install pyyaml)
with open('manifest.yaml', 'r') as f:
    manifest = yaml.safe_load(f)    # use safe_load, not load

# Modify and write back
manifest['spec']['replicas'] = 5
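with open('manifest-updated.yaml', 'w') as f:
    # safe_dump mirrors safe_load; sort_keys=False keeps the manifest's key order
    yaml.safe_dump(manifest, f, default_flow_style=False, sort_keys=False)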

Always use yaml.safe_load rather than yaml.load. Depending on the Loader passed to it, yaml.load can construct arbitrary Python objects from tagged YAML, which is a security risk if you are processing YAML from untrusted sources.
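Terraform outputs are a common case. The command terraform output -json prints each output as an object with value, type, and sensitive fields. The sketch below flattens that structure into a plain dictionary. It assumes the JSON has already been captured as a string, and the sample data is invented. Sensitive outputs are skipped unless explicitly requested, so they do not end up in logs by accident.

```python
import json

# Sample of the shape `terraform output -json` prints: one object per output,
# each with "sensitive", "type", and "value" fields. Values here are made up.
RAW_OUTPUTS = """
{
  "bucket_name": {"sensitive": false, "type": "string", "value": "my-app-assets"},
  "db_password": {"sensitive": true, "type": "string", "value": "s3cr3t"},
  "subnet_ids": {"sensitive": false, "type": ["list", "string"], "value": ["subnet-a", "subnet-b"]}
}
"""


def output_values(raw: str, include_sensitive: bool = False) -> dict:
    """Flatten Terraform's JSON outputs to a plain {name: value} dict.

    Sensitive outputs are skipped by default so they are not printed or logged.
    """
    outputs = json.loads(raw)
    return {
        name: out["value"]
        for name, out in outputs.items()
        if include_sensitive or not out.get("sensitive", False)
    }


if __name__ == "__main__":
    print(json.dumps(output_values(RAW_OUTPUTS), indent=2))
```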

Environment variables and configuration patterns

Cloud applications read their configuration from environment variables. Python has a simple pattern for this, but getting it right matters for both security and reliability.

import os
from dataclasses import dataclass

@dataclass
class Config:
    database_url: str
    bucket_name: str
    environment: str
    debug: bool = False

def load_config() -> Config:
    # os.environ[key] raises KeyError if missing — good for required vars
    # os.environ.get(key, default) returns a default — good for optional vars
    return Config(
        database_url=os.environ['DATABASE_URL'],
        bucket_name=os.environ['BUCKET_NAME'],
        environment=os.environ.get('ENVIRONMENT', 'production'),
        debug=os.environ.get('DEBUG', 'false').lower() == 'true',
    )

config = load_config()

Using os.environ[key] (not .get) for required variables means the application fails immediately on startup with a clear error if the variable is missing — rather than failing later in a confusing way when it first tries to use the value.
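One optional refinement: with os.environ[key], a deployment that is missing three variables fails three times, one KeyError per fix. The sketch below checks all required names up front and raises a single error listing every missing one. It also takes the environment as a parameter, so it can be tested with a plain dictionary. The REQUIRED tuple and the choice to treat empty strings as missing are assumptions of this sketch.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass


@dataclass
class Config:
    database_url: str
    bucket_name: str
    environment: str
    debug: bool = False


# Assumption for this sketch: the names that must be present and non-empty
REQUIRED = ("DATABASE_URL", "BUCKET_NAME")


def load_config(env: Mapping[str, str] = os.environ) -> Config:
    """Build Config from environment variables, reporting every missing one at once."""
    # Empty strings count as missing here, as well as absent keys
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing required environment variables: {', '.join(missing)}")
    return Config(
        database_url=env["DATABASE_URL"],
        bucket_name=env["BUCKET_NAME"],
        environment=env.get("ENVIRONMENT", "production"),
        debug=env.get("DEBUG", "false").lower() == "true",
    )
```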

Portfolio project idea: Write a Python script that uses boto3 or the GCP SDK to query cloud resources and produce a report, for example listing all EC2 instances across all regions with their estimated costs (the EC2 API does not return pricing, so costs come from the Cost Explorer or Pricing API), or finding all S3 buckets that have public access enabled. This kind of script is genuinely useful and demonstrates real cloud SDK knowledge.

CLI scripting with argparse

Cloud automation scripts often need to accept arguments from the command line. Python’s built-in argparse module handles this cleanly and automatically generates help text.

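import argparse

def deploy(environment, version, region):
    # Placeholder: replace with your real deployment logic
    print(f"Deploying {version} to {environment} in {region}")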

def main():
    parser = argparse.ArgumentParser(
        description='Deploy application to a cloud environment'
    )
    parser.add_argument('environment', choices=['dev', 'staging', 'production'])
    parser.add_argument('--version', required=True, help='Image version to deploy')
    parser.add_argument('--dry-run', action='store_true', help='Preview changes without applying')
    parser.add_argument('--region', default='eu-west-1')

    args = parser.parse_args()

    if args.dry_run:
        print(f"[DRY RUN] Would deploy {args.version} to {args.environment} in {args.region}")
        return

    deploy(args.environment, args.version, args.region)

if __name__ == '__main__':
    main()

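The if __name__ == '__main__' guard means main() only runs when the file is executed directly, not when it is imported as a module. This is the standard pattern, and it makes the script testable: a test can import the module and call its functions without triggering a deployment.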
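To go one step further, main can accept an optional argument list, and the parser can be built in its own function. Tests can then drive the CLI without touching sys.argv. The following is a minimal sketch of that refactor. The build_parser name is an assumption, and deploy is a hypothetical placeholder. Passing argv=None to parse_args makes it fall back to sys.argv[1:], so the script behaves as before when it is run from the shell.

```python
import argparse
from typing import Optional, Sequence


def build_parser() -> argparse.ArgumentParser:
    """Define the CLI in one place so tests can build the same parser."""
    parser = argparse.ArgumentParser(
        description="Deploy application to a cloud environment"
    )
    parser.add_argument("environment", choices=["dev", "staging", "production"])
    parser.add_argument("--version", required=True, help="Image version to deploy")
    parser.add_argument("--dry-run", action="store_true", help="Preview changes without applying")
    parser.add_argument("--region", default="eu-west-1")
    return parser


def deploy(environment: str, version: str, region: str) -> None:
    # Hypothetical placeholder for the real deployment logic
    print(f"Deploying {version} to {environment} in {region}")


def main(argv: Optional[Sequence[str]] = None) -> None:
    # With argv=None, parse_args reads sys.argv[1:] as before
    args = build_parser().parse_args(argv)
    if args.dry_run:
        print(f"[DRY RUN] Would deploy {args.version} to {args.environment} in {args.region}")
        return
    deploy(args.environment, args.version, args.region)
```

The existing guard at the bottom of the script stays unchanged. A test can call build_parser().parse_args(['staging', '--version', '1.2.3']) to check parsing, or main([...]) with an explicit argument list. An invalid choice makes argparse raise SystemExit with code 2, and a test can assert on that.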