Production Hardening for EKS Observability (Part 5 of 5)
The final layer: seven production alert rules in Amazon Managed Service for Prometheus (AMP), SNS and PagerDuty routing, fully private clusters via VPC endpoints, and a go-live checklist.
Hard-won lessons in cloud architecture, DevOps, and infrastructure at scale.
How the tracing module works: an OpenTelemetry Collector on every node, tail-based sampling that always keeps errors and slow requests, and IRSA-scoped export to X-Ray.
How the logging module works: Fluent Bit on every node, one variable to switch between CloudWatch Logs and OpenSearch, IRSA-scoped shipping, and a real parsing pipeline.
Most AWS accounts run EC2 instances 2-4x larger than needed. AWS Compute Optimizer tells you which ones. Here's how to read the recommendations and act safely.
How the metrics module works: AMP workspace, AMG with SSO, IRSA-scoped ingestion, pre-built dashboards, and the two-phase Terraform apply that ties it together.
A modular, production-grade Terraform stack for EKS observability. Metrics, logs, traces, alerting. All AWS-native. All open source. No static credentials.
Real market rates for AWS consulting in Australia in 2026, broken down by engagement type, seniority, and scope. No vague ranges. No sales pitch.
A hands-on guide to deploying your first production-ready Kubernetes cluster on AWS using EKS, including networking, IAM, and best practices.
Discover how AWS ECR Pull Through Cache allows you to securely access container images in isolated networks without direct internet connectivity.
Learn how to build secure, daemonless, and rootless Docker containers using Buildah within AWS CodeBuild.