Python for Cloud Engineers: Automation, SDKs, and Serverless
Python is one of the most widely used languages for cloud automation. This page is not a Python tutorial. It covers the specific patterns, libraries, and habits that make Python useful for cloud engineering work: interacting with cloud APIs, writing serverless functions, processing data, and automating repetitive tasks.
Python versus bash: when to use which
Both Python and bash are valid tools for cloud automation. The decision is mostly about complexity and maintainability.
- Reach for bash when: chaining together CLI commands, writing a quick deploy script, working with environment variables and file paths, gluing tools together
- Reach for Python when: calling cloud APIs, parsing JSON or YAML, handling error cases properly, writing something others will need to read and maintain, building anything non-trivial
In practice, Python is the better choice for most cloud automation because cloud SDKs have first-class Python support, the error handling is better than bash, and Python code is easier to test. Bash is still the right choice for simple pipeline steps.
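To make the comparison concrete, here is a minimal sketch of a task where Python tends to pull ahead: running a CLI command, checking its exit status, and parsing its JSON output. The helper name `run_json` is hypothetical, and the child command is a stand-in Python one-liner so the sketch runs anywhere; in real use it might be a cloud CLI call that prints JSON.

```python
import json
import subprocess
import sys


def run_json(command):
    """Run a command and return its stdout parsed as JSON.

    Raises RuntimeError carrying the command's stderr if it exits non-zero.
    """
    completed = subprocess.run(command, capture_output=True, text=True)
    if completed.returncode != 0:
        raise RuntimeError(
            f"{command[0]} exited with {completed.returncode}: "
            f"{completed.stderr.strip()}"
        )
    return json.loads(completed.stdout)


# Stand-in for a cloud CLI call: a child Python process that prints JSON.
child_code = "import json; print(json.dumps({'Buckets': [{'Name': 'demo'}]}))"
fake_cli = [sys.executable, "-c", child_code]

result = run_json(fake_cli)
bucket_names = [bucket["Name"] for bucket in result["Buckets"]]
print(bucket_names)
```

The equivalent bash would need separate handling for the exit code, stderr, and JSON parsing; here each failure mode surfaces as an ordinary exception.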
boto3: the AWS SDK for Python
boto3 is the official AWS SDK for Python. It gives you a Python interface to every AWS service. The two main ways to use it are clients (low-level, returns raw API responses as dictionaries) and resources (higher-level, object-oriented wrapper around common services).
    import boto3

    # List S3 buckets
    s3 = boto3.client('s3')
    response = s3.list_buckets()
    for bucket in response['Buckets']:
        print(bucket['Name'])

    # Upload a file
    s3.upload_file('local-file.txt', 'my-bucket', 'path/in/bucket/file.txt')

    # Read a file from S3
    obj = s3.get_object(Bucket='my-bucket', Key='path/in/bucket/file.txt')
    content = obj['Body'].read().decode('utf-8')

A pattern you will use constantly: paginating results. AWS API calls return a maximum number of results per request and a continuation token if there are more. The correct way to handle this is with paginators, not by hardcoding a limit:
    # Wrong: only gets the first page of results
    response = s3.list_objects_v2(Bucket='my-bucket')
    objects = response['Contents']  # might be incomplete

    # Correct: handles pagination automatically
    paginator = s3.get_paginator('list_objects_v2')
    pages = paginator.paginate(Bucket='my-bucket')
    objects = [obj for page in pages for obj in page.get('Contents', [])]

Google Cloud SDK for Python
GCP’s Python libraries follow a similar pattern to boto3 but with a more modern, consistent API. The libraries are split by service: google-cloud-storage, google-cloud-bigquery, google-cloud-pubsub, and so on.
    from google.cloud import storage

    # List buckets
    client = storage.Client()
    buckets = list(client.list_buckets())
    for bucket in buckets:
        print(bucket.name)

    # Upload a file to GCS
    bucket = client.bucket('my-bucket')
    blob = bucket.blob('path/in/bucket/file.txt')
    blob.upload_from_filename('local-file.txt')

    # Download a file
    blob.download_to_filename('downloaded-file.txt')

Authentication in GCP Python code follows Application Default Credentials (ADC). When running locally with gcloud auth application-default login, the SDK picks up your credentials automatically. In production (Cloud Run, GKE, etc.), the service account attached to the resource is used. You rarely need to handle credentials explicitly.
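The ADC lookup order described above can be sketched with the standard library alone. This is a simplified illustration, not the SDK's implementation (the real logic lives in the google-auth library). The function name `describe_adc_source` is hypothetical, and the credentials file path assumes Linux or macOS.

```python
import os
from pathlib import Path


def describe_adc_source(environ, home):
    """Return a label for where ADC would most likely find credentials.

    Mirrors the documented search order in simplified form:
    explicit key file, then gcloud user credentials, then the
    service account attached to the running resource.
    """
    explicit = environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    if explicit:
        return f"key file from GOOGLE_APPLICATION_CREDENTIALS: {explicit}"

    # Written by `gcloud auth application-default login` (Linux/macOS path).
    user_file = (
        Path(home) / ".config" / "gcloud" / "application_default_credentials.json"
    )
    if user_file.is_file():
        return f"gcloud user credentials: {user_file}"

    return "attached service account (metadata server), if running on GCP"


print(describe_adc_source(os.environ, os.path.expanduser("~")))
```

Running this on a laptop and inside a deployed container is a quick way to see why the same client code authenticates differently in each place.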
Writing Lambda and Cloud Functions
Serverless functions (AWS Lambda, GCP Cloud Functions, Azure Functions) are Python functions with a specific entry point signature. On AWS Lambda, the function receives an event payload and a context object, does its work, and returns a response. The other platforms follow the same idea with their own signatures.
    import json
    import os
    from urllib.parse import unquote_plus

    import boto3

    # Create the client once, outside the handler, so warm invocations reuse it
    s3 = boto3.client('s3')


    # Lambda function that processes S3 events
    def handler(event, context):
        record = event['Records'][0]
        bucket = record['s3']['bucket']['name']
        # Object keys in S3 event notifications are URL-encoded
        key = unquote_plus(record['s3']['object']['key'])
        try:
            obj = s3.get_object(Bucket=bucket, Key=key)
            data = json.loads(obj['Body'].read())
            # Process the data
            result = process_data(data)
            # Write output
            s3.put_object(
                Bucket=os.environ['OUTPUT_BUCKET'],
                Key=f"processed/{key}",
                Body=json.dumps(result),
                ContentType='application/json'
            )
            return {'statusCode': 200, 'body': 'Success'}
        except Exception as e:
            print(f"Error processing {key}: {e}")
            raise  # Re-raise so Lambda marks the invocation as failed


    def process_data(data):
        # Your business logic here
        return {'processed': True, 'count': len(data)}

Key things to get right with serverless Python functions:
- Use environment variables (os.environ) for configuration: never hardcode bucket names, table names, or API endpoints
- Re-raise or propagate exceptions: swallowing errors silently makes debugging very hard
- Keep dependencies minimal: Lambda has a deployment package size limit, and large packages slow cold starts
- Do not put credentials in the function code; use the execution role’s permissions
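These points are easier to check when the handler's logic can run without AWS. The following is a minimal sketch, assuming the work is factored into a function that takes the S3 client as an argument. `FakeS3` and `process_object` are hypothetical names, and the fake only mimics the two client calls used here.

```python
import io
import json


class FakeS3:
    """In-memory stand-in for the two S3 client calls used below."""

    def __init__(self):
        self.objects = {}  # (bucket, key) -> bytes

    def get_object(self, Bucket, Key):
        # Raises KeyError for a missing object; real boto3 raises ClientError.
        return {"Body": io.BytesIO(self.objects[(Bucket, Key)])}

    def put_object(self, Bucket, Key, Body, ContentType=None):
        self.objects[(Bucket, Key)] = Body.encode("utf-8")
        return {}


def process_object(s3, bucket, key, output_bucket):
    """Read a JSON object, summarise it, and write the result."""
    obj = s3.get_object(Bucket=bucket, Key=key)
    data = json.loads(obj["Body"].read())
    result = {"processed": True, "count": len(data)}
    s3.put_object(
        Bucket=output_bucket,
        Key=f"processed/{key}",
        Body=json.dumps(result),
        ContentType="application/json",
    )
    return result  # Errors are not caught here, so they propagate to the caller


fake = FakeS3()
fake.objects[("in-bucket", "orders.json")] = b'[{"id": 1}, {"id": 2}]'
summary = process_object(fake, "in-bucket", "orders.json", "out-bucket")
print(summary)
```

Because the output bucket is a parameter, the real handler can pass in the value read from the environment while a test passes a fixed name.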
Working with JSON and YAML
Cloud engineering involves a lot of JSON and YAML — API responses, configuration files, Terraform outputs, Kubernetes manifests. Python handles both cleanly.
    import json
    import yaml

    # JSON
    data = {'name': 'my-app', 'version': '1.0.0', 'replicas': 3}

    # Serialize to string
    json_string = json.dumps(data, indent=2)

    # Deserialize from string
    parsed = json.loads(json_string)

    # Read/write files
    with open('config.json', 'w') as f:
        json.dump(data, f, indent=2)
    with open('config.json', 'r') as f:
        loaded = json.load(f)

    # YAML (requires: pip install pyyaml)
    with open('manifest.yaml', 'r') as f:
        manifest = yaml.safe_load(f)  # use safe_load, not load

    # Modify and write back
    manifest['spec']['replicas'] = 5
    with open('manifest-updated.yaml', 'w') as f:
        yaml.dump(manifest, f, default_flow_style=False)

Always use yaml.safe_load rather than yaml.load. The unsafe version allows arbitrary Python object deserialisation from YAML, which is a security risk if you are processing YAML from untrusted sources.
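A related JSON detail comes up when serialising API responses: boto3 returns timestamps as datetime objects, which json.dumps rejects by default. The sketch below uses a made-up dictionary shaped like part of a list_buckets response, and the helper name `to_jsonable` is hypothetical.

```python
import json
from datetime import datetime, timezone

# Shaped like part of an S3 list_buckets response; the values are made up.
response = {
    "Buckets": [
        {
            "Name": "demo-bucket",
            "CreationDate": datetime(2024, 1, 15, 9, 30, tzinfo=timezone.utc),
        }
    ]
}

try:
    json.dumps(response)
    failed = False
except TypeError:
    failed = True  # datetime is not JSON serialisable by default


def to_jsonable(value):
    """Fallback encoder: ISO 8601 for datetimes, an error for anything else."""
    if isinstance(value, datetime):
        return value.isoformat()
    raise TypeError(f"Cannot serialise {type(value).__name__}")


encoded = json.dumps(response, default=to_jsonable, indent=2)
print(encoded)
```

Passing default=str is a common shortcut, but an explicit fallback keeps unexpected types from being silently stringified.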
Environment variables and configuration patterns
Cloud applications read their configuration from environment variables. Python has a simple pattern for this, but getting it right matters for both security and reliability.
    import os
    from dataclasses import dataclass


    @dataclass
    class Config:
        database_url: str
        bucket_name: str
        environment: str
        debug: bool = False


    def load_config() -> Config:
        # os.environ[key] raises KeyError if missing: good for required vars
        # os.environ.get(key, default) returns a default: good for optional vars
        return Config(
            database_url=os.environ['DATABASE_URL'],
            bucket_name=os.environ['BUCKET_NAME'],
            environment=os.environ.get('ENVIRONMENT', 'production'),
            debug=os.environ.get('DEBUG', 'false').lower() == 'true',
        )


    config = load_config()

Using os.environ[key] (not .get) for required variables means the application fails immediately on startup with a clear error if the variable is missing, rather than failing later in a confusing way when it first tries to use the value.
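One possible refinement of the fail-fast idea is to report every missing variable at once instead of stopping at the first KeyError. This is a sketch under assumptions: `ConfigError` and the `REQUIRED` tuple are hypothetical names, and the loader takes the environment as a parameter so it can be exercised with a plain dict.

```python
import os
from dataclasses import dataclass


class ConfigError(Exception):
    """Raised when required environment variables are missing."""


@dataclass(frozen=True)
class Config:
    database_url: str
    bucket_name: str
    environment: str = "production"
    debug: bool = False


REQUIRED = ("DATABASE_URL", "BUCKET_NAME")


def load_config(environ=os.environ) -> Config:
    # Collect every missing name so one failed start reports them all.
    # Empty strings are treated as missing here, which is a design choice.
    missing = [name for name in REQUIRED if not environ.get(name)]
    if missing:
        raise ConfigError(
            "Missing required environment variables: " + ", ".join(missing)
        )
    return Config(
        database_url=environ["DATABASE_URL"],
        bucket_name=environ["BUCKET_NAME"],
        environment=environ.get("ENVIRONMENT", "production"),
        debug=environ.get("DEBUG", "false").lower() == "true",
    )


# Passing a plain dict keeps the example independent of the real environment.
config = load_config(
    {"DATABASE_URL": "postgres://db.example/app", "BUCKET_NAME": "demo", "DEBUG": "TRUE"}
)
print(config)
```

The behaviour at startup is the same as with os.environ[key]; the difference is only that a misconfigured deployment needs one restart to discover all the gaps rather than one per variable.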
Portfolio project idea: Write a Python script that uses boto3 or the GCP SDK to query cloud resources and produce a report — for example, list all EC2 instances across all regions and their costs, or find all S3 buckets that have public access enabled. This kind of script is genuinely useful and demonstrates real cloud SDK knowledge.
CLI scripting with argparse
Cloud automation scripts often need to accept arguments from the command line. Python’s built-in argparse module handles this cleanly and automatically generates help text.
    import argparse


    def deploy(environment, version, region):
        # Placeholder: replace with your real deployment logic
        print(f"Deploying {version} to {environment} in {region}")


    def main():
        parser = argparse.ArgumentParser(
            description='Deploy application to a cloud environment'
        )
        parser.add_argument('environment', choices=['dev', 'staging', 'production'])
        parser.add_argument('--version', required=True, help='Image version to deploy')
        parser.add_argument('--dry-run', action='store_true', help='Preview changes without applying')
        parser.add_argument('--region', default='eu-west-1')
        args = parser.parse_args()
        if args.dry_run:
            print(f"[DRY RUN] Would deploy {args.version} to {args.environment} in {args.region}")
            return
        deploy(args.environment, args.version, args.region)


    if __name__ == '__main__':
        main()

The if __name__ == '__main__' guard means main() runs only when the script is executed directly, not when it is imported as a module. This is the standard pattern and makes the script testable.
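To show what "testable" can look like in practice, here is a minimal sketch in which main accepts an explicit argument list. The split into `build_parser` and `main(argv)` is an assumption for illustration, and the real deployment step is left out so the example only builds and prints a message.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(
        description="Deploy application to a cloud environment"
    )
    parser.add_argument("environment", choices=["dev", "staging", "production"])
    parser.add_argument("--version", required=True, help="Image version to deploy")
    parser.add_argument("--dry-run", action="store_true", help="Preview changes without applying")
    parser.add_argument("--region", default="eu-west-1")
    return parser


def main(argv=None):
    # argv=None makes argparse fall back to sys.argv[1:], as in normal use.
    args = build_parser().parse_args(argv)
    prefix = "[DRY RUN] Would deploy" if args.dry_run else "Deploying"
    message = f"{prefix} {args.version} to {args.environment} in {args.region}"
    print(message)
    return message


# Called the way a test would call it, with an explicit argument list.
output = main(["staging", "--version", "1.4.2", "--dry-run"])
```

A test can then import the module and call main with different lists, including invalid ones, where argparse raises SystemExit.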
Summary
- Python is the primary language for cloud automation — prefer it over bash for anything non-trivial
- boto3 and the GCP Python SDK have consistent patterns; learn pagination to avoid truncated results
- Serverless functions need correct error handling, environment variables for config, and no hardcoded credentials
- Use os.environ[key] (not .get) for required config values so failures happen at startup
- Always use yaml.safe_load; the unsafe version is a security risk