Showing posts with the label DevOps

Fixing Nginx TIME_WAIT Socket Exhaustion: A Kernel Tuning Guide

We recently diagnosed a production outage during a traffic spike where the monitoring dashboards showed CPU utilization hovering at 15% and ample free memory, yet the Nginx API gateway actively reje…

Production Docker: Dropping Alpine for Distroless to kill CVEs and bloated layers

If you run a vulnerability scan on your "slim" production images right now, the results might terrify you. I recently audited a fleet of microservices running on standard debian:bullseye-…

Prometheus Storage Full? Scaling to S3 with Thanos Sidecar

It started with a classic paging alert at 3:14 AM: DiskUsageHigh: 95% on prometheus-data. We were running a standard Prometheus setup on Kubernetes, collecting metrics from about 400 microservices…

Prometheus HA: From Full Disks to Infinite Retention with Thanos Sidecar

Two weeks ago, our production Kubernetes cluster (v1.28, running on AWS m5.xlarge instances) fired a critical alert at 3:00 AM: DiskPressure on the node hosting our…