How to Build a Cloud Monitoring Portfolio Project

Monitoring projects are rare in junior portfolios and disproportionately impressive when they exist. Most engineers build systems without adding observability, so demonstrating that you think about what happens after deployment signals a maturity that stands out. This guide covers how to build a monitoring project that goes beyond a “Hello World” dashboard.

Why a monitoring project signals seniority

Anyone can provision an EC2 instance or write a Lambda function. Fewer people understand how to observe a system once it is running. A monitoring project demonstrates:

You think about systems in operation, not just at deployment
You understand the difference between a metric, a log, and a trace
You define what “working” means before you try to define “broken”
You understand on-call realities — alerts that fire too often are as bad as no alerts

This thinking is expected in mid-level and senior cloud and SRE roles. Showing evidence of it at a junior level is a genuine advantage.

What to build: choose your tool based on your target role

The table below maps common tool stacks to the roles they suit best:

Tool stack	Best for
Prometheus + Grafana (self-hosted)	SRE, DevOps, and platform engineering roles
CloudWatch (AWS)	AWS-specialist cloud engineering roles
Cloud Monitoring + Cloud Logging (GCP)	GCP-specialist cloud engineering roles
Datadog or New Relic (managed)	Companies using SaaS observability platforms

Prometheus and Grafana are the most portable choice and the stack that interviewers most often expect you to know. Even at companies that use Datadog, you are expected to understand what Prometheus does. If you are not sure which to choose, start with Prometheus and Grafana.

Building with Prometheus and Grafana

What to run Prometheus on

Deploy Prometheus and Grafana either:

On a single VM (a small instance such as an AWS t2.micro or a GCP e2-micro is sufficient for a portfolio project)
Inside a Kubernetes cluster using the kube-prometheus-stack Helm chart (the better option if you already have a Kubernetes project)

The Kubernetes deployment is more complex but demonstrates both Kubernetes and observability skills simultaneously. It is also the production-realistic approach: kube-prometheus-stack is a common way for teams running Kubernetes to deploy Prometheus and Grafana.

What to monitor

Monitor an application that you have already built (a REST API, a serverless function, or a web service). At minimum, instrument the application to expose:

Request rate (requests per second by endpoint and HTTP method)
Error rate (rate of 4xx and 5xx responses)
Request latency (p50, p95, p99 percentiles)

These three (request rate, error rate, and latency) make up the RED method: Rate, Errors, Duration. The RED method is a widely used framework for monitoring request-driven services and is a common interview topic.
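To make the three RED signals concrete, here is a minimal sketch that computes them from a batch of request records using only the Python standard library. The `Request` record shape, the `red_summary` function, and the nearest-rank percentile method are assumptions chosen for illustration; in a real project a Prometheus client library would typically expose counters and histograms, and Prometheus would do this arithmetic at query time.

```python
import math
from dataclasses import dataclass


@dataclass
class Request:
    """One handled HTTP request (hypothetical record shape)."""
    endpoint: str
    status: int
    duration_s: float


def percentile(sorted_values: list[float], p: int) -> float:
    """Nearest-rank percentile of an ascending list, for integer p in 1..100."""
    rank = max(1, math.ceil(p * len(sorted_values) / 100))
    return sorted_values[rank - 1]


def red_summary(requests: list[Request], window_s: float) -> dict[str, float]:
    """Summarise Rate, Errors, and Duration over one time window."""
    if not requests:
        raise ValueError("need at least one request in the window")
    durations = sorted(r.duration_s for r in requests)
    errors = sum(1 for r in requests if r.status >= 400)  # 4xx and 5xx
    return {
        "rate_per_s": len(requests) / window_s,
        "error_ratio": errors / len(requests),
        "p50_s": percentile(durations, 50),
        "p95_s": percentile(durations, 95),
        "p99_s": percentile(durations, 99),
    }
```

Calling `red_summary(requests, window_s=60)` on the requests seen in the last minute gives the numbers an application dashboard would plot. Percentiles are used instead of the mean because a handful of slow requests can hide behind a healthy-looking average.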

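Additionally, collect system-level metrics such as CPU usage, memory usage, and disk I/O. On VMs, use node_exporter. On Kubernetes, node_exporter covers the nodes and the kubelet’s built-in cAdvisor endpoint covers per-container usage. kube-state-metrics complements these by reporting the state of Kubernetes objects (for example, pod restarts and deployment replica counts) rather than resource usage.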

Building dashboards

Create at least two Grafana dashboards:

An application dashboard: request rate, error rate, and latency panels using the RED method
A system dashboard: CPU, memory, and disk panels for the infrastructure running the application

Define dashboards as code using Grafana’s JSON export or a tool like Grafonnet or Terraform’s Grafana provider. Do not rely on dashboards configured manually through the UI — they cannot be reproduced from the repository.
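One way to keep a dashboard reproducible is to generate its JSON from a small script that lives in the repository. The sketch below builds a minimal dashboard with three RED panels. The metric names (`http_requests_total`, `http_request_duration_seconds_bucket`), the label names, and the `app-red` uid are assumptions about how the application is instrumented, and a real Grafana dashboard will usually carry more fields (such as a data source reference and a schema version) than shown here.

```python
import json


def panel(panel_id: int, title: str, expr: str, y: int) -> dict:
    """Build one time-series panel backed by a single PromQL query."""
    return {
        "id": panel_id,
        "type": "timeseries",
        "title": title,
        "gridPos": {"h": 8, "w": 24, "x": 0, "y": y},
        "targets": [{"refId": "A", "expr": expr}],
    }


def red_dashboard() -> dict:
    """Assemble a minimal RED dashboard; metric and label names are assumed."""
    queries = [
        ("Request rate", "sum by (endpoint) (rate(http_requests_total[5m]))"),
        (
            "Error ratio",
            'sum(rate(http_requests_total{status=~"[45].."}[5m]))'
            " / sum(rate(http_requests_total[5m]))",
        ),
        (
            "p95 latency",
            "histogram_quantile(0.95, sum by (le) "
            "(rate(http_request_duration_seconds_bucket[5m])))",
        ),
    ]
    panels = [panel(i + 1, title, expr, y=i * 8)
              for i, (title, expr) in enumerate(queries)]
    return {"title": "Application (RED)", "uid": "app-red", "panels": panels}


if __name__ == "__main__":
    # Write the output to a file in the repository and commit it.
    print(json.dumps(red_dashboard(), indent=2, sort_keys=True))
```

Generating the JSON keeps panel layout and queries reviewable in pull requests. Exporting JSON from the Grafana UI and committing it achieves the same reproducibility with less code.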

Writing real alerting rules

The alerting section is what most monitoring portfolio projects get wrong. Generic rules like “alert when CPU exceeds 90% for 5 minutes” are better than nothing, but they show no understanding of what the alert means in context.

A good alerting rule has three parts:

Condition — the metric expression and threshold
Duration — how long the condition must be true before alerting (avoids flapping)
Severity and runbook — how urgent is this? What should a human do?
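The interaction between condition and duration is easier to see in a small simulation. The sketch below is a simplified model, loosely based on how the `for` clause behaves in Prometheus alerting rules; the `AlertRule` class and `firing_times` function are hypothetical names for illustration, not part of any library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlertRule:
    name: str
    threshold: float   # condition: active when the value exceeds this
    for_seconds: int   # duration: how long the condition must hold
    severity: str      # how urgent the alert is, e.g. "page" or "ticket"
    runbook: str       # what a human should do when it fires


def firing_times(rule: AlertRule, samples: list[tuple[int, float]]) -> list[int]:
    """Return the timestamps at which the alert is firing.

    `samples` is a time-ordered list of (unix_seconds, value). The alert
    becomes pending at the first sample above the threshold, fires once the
    condition has held for at least `for_seconds`, and resets as soon as a
    sample is at or below the threshold.
    """
    firing = []
    pending_since = None
    for ts, value in samples:
        if value > rule.threshold:
            if pending_since is None:
                pending_since = ts
            if ts - pending_since >= rule.for_seconds:
                firing.append(ts)
        else:
            pending_since = None
    return firing
```

With a five-minute duration, a single bad sample followed by recovery never fires, while a sustained breach does. Setting `for_seconds=0` on the same data would fire on the one-off spike too, which is the flapping behaviour the duration is meant to avoid.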

Define at least three alerting rules. For each one, document:

Why you chose that threshold (5% error rate rather than 1% or 20%?)
Why you chose that duration window (5 minutes rather than 1 minute?)
What action the alert requires from an on-call engineer

This runbook documentation is the part that will actually impress an SRE or senior engineer reviewing your portfolio. It shows you understand that alerts are not just technical — they require a human response, and that response should be defined in advance.

Adding centralised logging

Metrics tell you something is wrong. Logs tell you why. Add centralised logging to make this a fuller observability project:

With Prometheus/Grafana: add Loki for log aggregation and configure it as a Grafana data source
With AWS: ship application logs to CloudWatch Logs and create a metric filter that counts error log lines
With GCP: use Cloud Logging with a log-based metric and alert

At minimum, configure structured logging in your application (JSON output rather than plain text lines) and create one log-based metric or search query that surfaces error patterns.
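As a minimal sketch of what structured logging can look like, the example below uses only Python's standard `logging` and `json` modules to emit one JSON object per line, then counts error lines as a stand-in for a log-based metric. The logger name, the `endpoint` and `status` fields, and the `count_errors` helper are assumptions for illustration; in a real deployment the counting would be done by Loki, a CloudWatch metric filter, or a Cloud Logging log-based metric rather than by application code.

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "time": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Fields passed via `extra=` become attributes on the record.
        for key in ("endpoint", "status"):
            if hasattr(record, key):
                payload[key] = getattr(record, key)
        return json.dumps(payload)


def build_logger(stream) -> logging.Logger:
    """Create a logger that writes JSON lines to the given stream."""
    logger = logging.getLogger("portfolio-api")  # hypothetical service name
    logger.setLevel(logging.INFO)
    logger.handlers.clear()
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    return logger


def count_errors(log_text: str) -> int:
    """Count JSON log lines whose level is ERROR."""
    return sum(1 for line in log_text.splitlines()
               if json.loads(line)["level"] == "ERROR")


# Usage: log to an in-memory buffer, then derive an error count from it.
buffer = io.StringIO()
log = build_logger(buffer)
log.info("request handled", extra={"endpoint": "/orders", "status": 200})
log.error("upstream timeout", extra={"endpoint": "/orders", "status": 504})
error_count = count_errors(buffer.getvalue())
```

Because every line is valid JSON with a consistent `level` field, a log backend can filter and count errors without fragile text parsing.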

Everything as code

Your monitoring setup should be deployable from the repository. This means:

Prometheus configuration files in version control
Alert rules defined as code (Prometheus alerting rules in YAML)
Grafana dashboards exported as JSON and committed to the repository
The entire stack provisioned with Terraform or a Helm chart

A monitoring setup that requires clicking through a Grafana UI to recreate is not portfolio-ready.

What to document in the README

Why you chose Prometheus/Grafana (or CloudWatch, or another tool) for this project
The three alerting rules and the reasoning behind each threshold and duration
The runbook action for each alert — what a human should do when it fires
How to deploy the full stack from scratch using the repository
What you would add in a production environment: longer metric retention, Alertmanager routing to PagerDuty or Slack, distributed tracing with OpenTelemetry, and SLO-based alerting

For guidance on combining this with a Kubernetes project, see how to build a Kubernetes portfolio project.