From the Field

Hard-won lessons in cloud architecture, DevOps, and infrastructure at scale.

// INFO_POST 08/06/26

Production Hardening for EKS Observability (Part 5 of 5)

The final layer: seven production alert rules in Amazon Managed Prometheus, SNS and PagerDuty routing, fully private clusters via VPC endpoints, and a go-live checklist.


// INFO_POST 08/06/26

EKS Distributed Tracing: OpenTelemetry + X-Ray (Part 4 of 5)

How the tracing module works: an OpenTelemetry Collector on every node, tail-based sampling that always keeps errors and slow requests, and IRSA-scoped export to X-Ray.


// INFO_POST 07/06/26

EKS Logging: Fluent Bit, CloudWatch vs OpenSearch (Part 3 of 5)

How the logging module works: Fluent Bit on every node, one variable to switch between CloudWatch Logs and OpenSearch, IRSA-scoped shipping, and a real parsing pipeline.


// INFO_POST 19/05/26

EC2 Rightsizing: Stop Paying for Compute You Don't Use

Most AWS accounts run EC2 instances 2-4x larger than needed. AWS Compute Optimizer tells you which ones. Here's how to read the recommendations and act safely.


// INFO_POST 14/04/26

Amazon Managed Prometheus + Grafana on EKS (Part 2 of 5)

How the metrics module works: an AMP workspace, AMG with SSO, IRSA-scoped ingestion, pre-built dashboards, and the two-phase Terraform apply that ties them together.


// INFO_POST 10/04/26

Metrics, Logs & Traces on EKS with Terraform (Part 1 of 5)

A modular, production-grade Terraform stack for EKS observability. Metrics, logs, traces, alerting. All AWS-native. All open source. No static credentials.
