Showing posts with the label DevOps

Fixing Nginx TIME_WAIT Socket Exhaustion: A Kernel Tuning Guide

We recently diagnosed a production outage during a traffic spike where the monitoring dashboards showed CPU utilization hovering at 15% and ample fr…

Production Docker: Dropping Alpine for Distroless to kill CVEs and bloated layers

If you run a vulnerability scan on your "slim" production images right now, the results might terrify you. I recently audited a fleet of …

Prometheus Storage Full? Scaling to S3 with Thanos Sidecar

It started with a classic paging alert at 3:14 AM: DiskUsageHigh: 95% on prometheus-data. We were running a standard Prometheus setup on Kubernete…

Prometheus HA: From Full Disks to Infinite Retention with Thanos Sidecar

Two weeks ago, our production Kubernetes cluster (v1.28, running on AWS m5.xlarge instances) fired a critical alert at…