Capacity Planning That Doesn’t Lie: Predict Scale With Leading Indicators, Not Dashboards

If your model needs a slide deck to explain, it won’t save your on-call. Here’s the lean blueprint we use to predict capacity needs and wire telemetry into autoscaling, triage, and rollouts that don’t blow up at 2 a.m.

Capacity planning that can’t predict burn rate or queue growth is just a dashboard with a résumé.

The Friday deploy that finally forced honest capacity planning

I watched a payments API on GKE faceplant at 5:12 p.m. on a Friday. Dashboards were green. CPU% hovered at 55. Autoscaler was “healthy.” Then p99 jumped 3x, Kafka lag spiked, thread pools starved, and the on-call got paged into a goose chase. The root cause wasn’t a mystery: a rising consumer_lag slope, climbing container_cpu_cfs_throttled_seconds_total, and a slow bake of new code that increased service time. None of that shows up in a vanity CPU% widget.

That weekend we ripped out our “capacity model” spreadsheet and built a real one tied to leading indicators and automated rollouts. The next peak? Zero sev-1s, predictable spend bump, and no heroics. Here’s the playbook we use at GitPlumbers when a team is done being surprised by scale.

Track signals that actually predict incidents

Stop staring at CPU% and request counts in isolation. The predictors that have saved my bacon repeatedly are saturation, backpressure, tail drift, and burn rate.

  • Web/API services:

    • histogram_quantile(0.99, sum(rate(http_request_duration_seconds_bucket[5m])) by (le, service)) – tail-latency drift, not instantaneous spikes.
    • sum(rate(container_cpu_cfs_throttled_seconds_total[5m])) by (pod) – CPU throttling = silent performance killer on bursty workloads.
    • sum(rate(process_runtime_gc_pause_seconds_sum[5m])) or jvm_gc_pause_seconds_sum – GC pauses predict tail blowups.
    • threadpool_queue_length by pool (a gauge – watch its level and deriv(), not rate()) or blocked go_goroutines – thread-pool starvation precedes 5xx.
  • Data pipelines / Kafka:

    • kafka_consumergroup_lag{group="payments"} and deriv(kafka_consumergroup_lag[10m]) – lag slope is the early warning.
    • sum(rate(kafka_network_requestmetrics_requests_total{request="Produce"}[5m])) vs. broker disk iowait – broker saturation predicts backlogs.
    • job_queue_depth and deriv(job_queue_depth[5m]) – queue growth under a steady arrival rate means service time just got slower.
  • Databases:

    • db_connections_in_use vs. max_connections and lock_waits_total – connection pool saturation and lock contention.
    • pg_stat_activity long transactions and checkpoint_write_time – write amplification predicts nasty latency cliffs.
  • Inference/GPU services:

    • gpu_mem_used_bytes / gpu_mem_total_bytes – model swaps hurt latency.
    • request_concurrency vs. max_batch_size – service-time step functions when batchers saturate.
  • Platform signals:

    • node_load1 / cpu_count and run_queue per core – CPU queueing before utilization looks scary.
    • tcp_retransmits_total and packet_loss – network drops make everything look “slow.”
    • Evictions: kube_pod_evict_events_total and oom_kills_total – bad binpacking and noisy neighbors.

If a metric can’t forecast pain 10–30 minutes ahead, it doesn’t belong in your capacity model.
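
One way to hold an indicator to that standard is to fit a short-window slope and project minutes-to-threshold. Here’s a minimal Python sketch against the Prometheus range-query API; the Prometheus address, the kafka_consumergroup_lag selector, and the 500k lag threshold are assumptions you’d swap for your own environment.

import time
import numpy as np
import requests

PROM_URL = "http://prometheus:9090"   # assumption: adjust to your Prometheus endpoint
QUERY = 'sum(kafka_consumergroup_lag{group="payments"})'
PAIN_THRESHOLD = 500_000              # assumption: lag level where SLOs start to suffer

def fetch_series(query, minutes=10, step="15s"):
    """Pull the last N minutes of a series from Prometheus' range-query API."""
    end = time.time()
    resp = requests.get(
        f"{PROM_URL}/api/v1/query_range",
        params={"query": query, "start": end - minutes * 60, "end": end, "step": step},
        timeout=10,
    )
    resp.raise_for_status()
    result = resp.json()["data"]["result"]
    if not result:
        return np.array([]), np.array([])
    ts, vals = zip(*result[0]["values"])   # values come back as [[timestamp, "value"], ...]
    return np.array(ts, dtype=float), np.array(vals, dtype=float)

def minutes_until(ts, vals, threshold):
    """Linear fit over the window; returns projected minutes to threshold, or None."""
    if len(vals) < 2:
        return None
    if vals[-1] >= threshold:
        return 0.0                         # already past the pain line
    slope, _ = np.polyfit(ts, vals, 1)     # units per second
    if slope <= 0:
        return None                        # flat or draining
    return (threshold - vals[-1]) / slope / 60

ts, vals = fetch_series(QUERY)
eta = minutes_until(ts, vals, PAIN_THRESHOLD)
print("no upward slope" if eta is None else f"~{eta:.0f} min until lag crosses {PAIN_THRESHOLD}")

If the projection routinely gives you 10+ minutes of warning before the pager goes off, the metric earns a spot in the model; if it only moves once users are already hurting, cut it.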

From signals to a model you can explain under pressure

Skip the black-box ML unless you’ve nailed the basics. The model we ship fits on a whiteboard and survives postmortems.

  1. Quantify demand and service time

    • Use Little’s Law: L = λ * W.
      • λ (arrival rate): sum(rate(http_requests_total[5m])) or events/s.
      • W (service time): median or p90 request duration, or time_in_handler.
      • L (concurrency): expected in-flight requests; helps size thread pools and pods.
    • Cross-check with real concurrency: sum(http_server_active_requests) or work_in_progress gauges.
  2. Fit resource usage vs. work

    • For each service, fit a simple linear model: CPU_cores = α + β * rps_effective and mem = α + β * concurrency.
    • PromQL quick-and-dirty slope:
# CPU cores consumed per 100 RPS (5m rate windows; widen to 1h for a smoother fit)
sum(rate(container_cpu_usage_seconds_total{container="api"}[5m]))
/ scalar(sum(rate(http_requests_total{service="api"}[5m]))) * 100
    • Validate with a controlled load test (10–20 minutes) in staging using k6 or vegeta. Lock versions and configs.
  3. Incorporate saturation breakpoints

    • Identify cliffs: GC pauses > 100ms, thread pool queue > 50, kafka_consumergroup_lag derivative > 0 for 10 minutes.
    • These are where linear fits stop working; your model must clamp or switch regimes.
  4. Forecast short-term demand

    • Keep it boring: Holt-Winters or Prophet for weekly seasonality works.
from prophet import Prophet
import pandas as pd

# df must have Prophet's expected columns: 'ds' (timestamp) and 'y' (here, RPS)
m = Prophet(weekly_seasonality=True, daily_seasonality=True)
m.fit(df)
future = m.make_future_dataframe(periods=72, freq='H')  # 72 hourly steps ≈ 3 days out
forecast = m.predict(future)
    • Feed predicted λ into the resource fit to get pod/node counts. Stick to 48–72 hours; beyond that, you’re speculating.
  5. Translate to headroom and pre-scaling

    • Pick a headroom policy: 30–40% over predicted p95 demand or “one node spare per AZ.”
    • Pre-scale before known spikes (marketing emails, region cutovers) and when burn_rate rises during “normal” periods. A worked sketch of the whole model follows this list.
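
Put together, the whole model is a handful of multiplications and a max, which is exactly what you want to be able to defend in a postmortem. Here’s a sketch with made-up coefficients; cpu_beta, w_seconds, and max_rps_per_pod are placeholders you’d refit from your own telemetry and load tests.

import math
from dataclasses import dataclass

@dataclass
class ServiceFit:
    cpu_alpha: float        # intercept of the CPU-vs-RPS fit (cores at ~zero load)
    cpu_beta: float         # cores per RPS from the linear fit
    w_seconds: float        # p90 service time; the W in Little's Law
    pod_cpu_limit: float    # cores one pod can actually use before throttling
    max_rps_per_pod: float  # measured cliff where the linear fit stops holding

def required_pods(fit, predicted_rps, headroom=0.35):
    demand = predicted_rps * (1 + headroom)        # headroom policy applied to the forecast
    concurrency = demand * fit.w_seconds           # Little's Law: L = lambda * W
    cores = fit.cpu_alpha + fit.cpu_beta * demand  # linear resource fit
    pods_by_cpu = math.ceil(cores / fit.pod_cpu_limit)
    pods_by_cliff = math.ceil(demand / fit.max_rps_per_pod)   # clamp at the saturation regime
    return {"expected_concurrency": round(concurrency),
            "pods": max(pods_by_cpu, pods_by_cliff)}

# Placeholder coefficients: 0.004 cores/RPS, 120ms p90, 2-core pods, cliff at 400 RPS/pod.
api = ServiceFit(cpu_alpha=0.5, cpu_beta=0.004, w_seconds=0.12,
                 pod_cpu_limit=2.0, max_rps_per_pod=400)
print(required_pods(api, predicted_rps=9_000))   # feed the 48–72h forecast peak in here

The concurrency estimate sizes thread pools; the larger of the two pod counts is what minReplicas and your pre-scale job should respect.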

Autoscaling that tracks reality, not dashboards

Kubernetes HPA v2 and KEDA can scale on the metrics that matter if you wire the adapters correctly.

  • Use k8s-prometheus-adapter to surface custom metrics like queue_length_per_pod or active_requests.
  • Prefer per-pod or object metrics over cluster-averaged CPU%.
  • Keep scaling behavior conservative: stabilization windows, max surge, and min replicas tuned to avoid flapping.

Example: scale API pods on in-flight requests, not CPU:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api
  minReplicas: 6
  maxReplicas: 60
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 120
      policies:
        - type: Percent
          value: 100
          periodSeconds: 60
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
        - type: Percent
          value: 50
          periodSeconds: 300
  metrics:
    - type: Pods
      pods:
        metric:
          name: http_server_active_requests
        target:
          type: AverageValue
          averageValue: "50"  # target 50 active requests per pod

Example: scale consumers on Kafka lag with KEDA:

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: payments-consumer
spec:
  minReplicaCount: 3
  maxReplicaCount: 100
  scaleTargetRef:
    name: payments-consumer
  triggers:
    - type: kafka
      metadata:
        bootstrapServers: kafka:9092
        consumerGroup: payments
        topic: payments
        lagThreshold: "5000"
        activationLagThreshold: "500"

Don’t forget the boring bits: Cluster Autoscaler limits, node pool SKUs, and PodDisruptionBudget/PriorityClass so the cluster can actually add capacity without evicting the hot path.

Tie telemetry to triage and rollout automation

A capacity model isn’t done until it drives decisions automatically. We gate rollouts and slash MTTR using the same leading indicators.

  • SLO-aware alerts
    • Multi-window error_budget_burn_rate is the north star.
# Burn rate = error ratio / error budget (here a 99.5% availability SLO).
# Page only when both the fast (1h) and slow (6h) windows are burning hot.
expr: |
  (
    sum(rate(http_requests_total{status=~"5.."}[1h]))
    / sum(rate(http_requests_total[1h]))
  ) / (1 - 0.995) > 14.4
  and
  (
    sum(rate(http_requests_total{status=~"5.."}[6h]))
    / sum(rate(http_requests_total[6h]))
  ) / (1 - 0.995) > 6
    • Pair with tail-latency slope and queue growth to catch regressions early.

  • Gate canaries with Argo Rollouts

apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: api-slo-check
spec:
  metrics:
    - name: p99-latency
      interval: 1m
      count: 5
      successCondition: result[0] < 0.250  # 250ms
      provider:
        prometheus:
          address: http://prometheus:9090
          query: |
            histogram_quantile(0.99, sum(rate(http_request_duration_seconds_bucket{service="api",version="canary"}[1m])) by (le))
    - name: burn-rate
      interval: 1m
      count: 5
      successCondition: result[0] < 2  # 2x budget
      provider:
        prometheus:
          address: http://prometheus:9090
          query: |
            (
              sum(rate(http_requests_total{service="api",version="canary",status=~"5.."}[1m]))
            ) /
            (
              sum(rate(http_requests_total{service="api",version="canary"}[1m]))
            ) / (1 - 0.995)
---
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: api
spec:
  strategy:
    canary:
      steps:
        - setWeight: 10
        - analysis:
            templates:
              - templateName: api-slo-check
        - setWeight: 25
        - analysis:
            templates:
              - templateName: api-slo-check
        - setWeight: 50
        - analysis:
            templates:
              - templateName: api-slo-check
        - setWeight: 100
  • Runbooks wired to alerts

    • Every alert links to a runbook with kubectl, kafka-consumer-groups, and Grafana panels.
    • First triage step is always “check saturation/lag slope,” not “restart things.”
  • ChatOps for human-in-the-loop

    • A /promote command only appears when SLO checks pass; /rollback posts the failing metrics inline. A gate sketch follows below.
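
Under the hood, the /promote gate is just the same two Prometheus checks the canary analysis runs, evaluated before the bot renders the button. A minimal sketch, with the Prometheus address and the 250ms/2x limits carried over from the templates above as assumptions:

import requests

PROM = "http://prometheus:9090"   # assumption: adjust to your Prometheus endpoint
CHECKS = {
    "p99_latency_s": (
        'histogram_quantile(0.99, sum(rate('
        'http_request_duration_seconds_bucket{service="api",version="canary"}[5m])) by (le))',
        0.250,   # fail at or above 250ms
    ),
    "burn_rate": (
        '(sum(rate(http_requests_total{service="api",version="canary",status=~"5.."}[5m]))'
        ' / sum(rate(http_requests_total{service="api",version="canary"}[5m]))) / (1 - 0.995)',
        2.0,     # fail at or above 2x budget burn
    ),
}

def instant(query: str) -> float:
    """Run an instant query and return the first sample value (0.0 if no data)."""
    r = requests.get(f"{PROM}/api/v1/query", params={"query": query}, timeout=10)
    r.raise_for_status()
    result = r.json()["data"]["result"]
    return float(result[0]["value"][1]) if result else 0.0

def promotion_gate():
    failures = []
    for name, (query, limit) in CHECKS.items():
        value = instant(query)
        if value >= limit:
            failures.append(f"{name}={value:.3f} (limit {limit})")
    return (not failures), failures

ok, failing = promotion_gate()
print("/promote available" if ok else f"/rollback suggested: {', '.join(failing)}")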

A 48-hour Black Friday prediction that held under fire

A retailer (K8s on EKS, MSK for Kafka, Aurora Postgres, Redis cache) asked us to sanity-check their scale plan 48 hours before Black Friday. Their dashboards said 60% CPU, “we’re fine.” Our model said otherwise:

  • Leading indicators

    • container_cpu_cfs_throttled_seconds_total rising 10–15%/h on the API tier.
    • kafka_consumergroup_lag derivative > 0 despite steady rps.
    • db_connections_in_use brushing the limit during traffic spikes; lock_waits_total inching up.
  • Actions in 24 hours

    • Switched HPA target to http_server_active_requests and raised minReplicas from 8 to 20.
    • Added a larger c5.4xlarge node group for API with cpuManagerPolicy: static to kill throttling.
    • Pre-warmed the Redis cluster, doubled maxmemory, and tuned the eviction policy.
    • KEDA policy on consumer lag with activationLagThreshold to wake up earlier.
    • Gated canary with burn-rate and p99 checks via Argo Rollouts.
  • Results

    • Black Friday: p95 latency held at 120ms (was 210ms previous year), 0 sev-1, 1 auto-rollback caught by burn-rate.
    • Spend +12% for the weekend, but cost per order down 18%.
    • On-call slept. Business shipped promos without rollback fear.

What I’d do differently next time

  • Turn “one-off” changes into code. We landed Terraform modules for nodegroups and KEDA, Sloth for SLOs, and versioned them under platform/.
  • Tighten labels. Cardinality explosions in OTel traces mask the real signals; sample intelligently at tail.
  • Make saturation tests part of CI. We now block merges if a 10-minute k6 run shows p99 drift > 10% at baseline load (sketch after this list).
  • Review headroom weekly. Marketing doesn’t tell engineering; the model should catch early demand shifts.
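
The CI saturation gate is deliberately dumb: compare p99 from the current run against a stored baseline and fail on drift. The sketch below assumes both runs were exported with k6’s --summary-export plus --summary-trend-stats so the summary JSON carries a p(99) field; the file names are placeholders.

import json
import sys

MAX_DRIFT = 0.10  # 10% p99 regression budget at baseline load

def p99_ms(path: str) -> float:
    """Read p99 of http_req_duration from a k6 summary export (assumed structure)."""
    with open(path) as f:
        summary = json.load(f)
    return summary["metrics"]["http_req_duration"]["p(99)"]

baseline, current = p99_ms("k6-baseline.json"), p99_ms("k6-current.json")
drift = (current - baseline) / baseline
print(f"p99 baseline={baseline:.1f}ms current={current:.1f}ms drift={drift:+.1%}")
if drift > MAX_DRIFT:
    sys.exit(1)  # block the merge; the capacity model's coefficients just changed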

Field checklist you can copy-paste

  1. Define SLIs/SLOs that the business actually cares about: availability, p95/p99 latency, and freshness for async pipelines.
  2. Instrument leading indicators: queue depths and slopes, CPU throttling, GC pauses, thread-pool backlog, consumer lag, connection pool usage.
  3. Build the model: Little’s Law for concurrency + linear fit of resource vs. load; annotate known saturation cliffs.
  4. Validate with synthetic load; update coefficients each deploy or significant config change.
  5. Forecast 48–72 hours out with simple seasonality. Pre-scale for spikes and when burn-rate rises under “normal” load.
  6. Wire autoscaling to custom metrics using HPA v2 and KEDA; set sensible stabilization windows.
  7. Gate rollouts with Argo Rollouts AnalysisTemplates tied to SLOs; auto-rollback on regressions.
  8. Keep runbooks and alerts glued together; first triage step checks leading indicators.
  9. Postmortem misses and bake changes back into code within 24 hours. Don’t let tribal knowledge rot.


Key takeaways

  • Track leading indicators that predict incidents: saturation, queue growth rate, tail latency drift, error-budget burn, and resource throttling.
  • Use simple, explainable models first: Little’s Law + linear fits from telemetry beat black-box ML for capacity planning.
  • Wire metrics to autoscaling through `HPA v2` and `KEDA` using custom metrics like `queue_length_per_pod` and `kafka_consumer_lag`.
  • Gate canary promotions with SLO-aligned signals using `Argo Rollouts` or `Flagger` to automate rollbacks.
  • Create headroom and warm paths proactively; predict and pre-scale before peak windows.
  • Keep dashboards honest: prefer multi-window burn-rate and slope-based alerts over instantaneous thresholds.

Implementation checklist

  • Define SLIs/SLOs that matter: availability, tail latency, and freshness (for pipelines).
  • Instrument saturation and backpressure: CPU throttling, run queue, GC pauses, thread-pool queue length, and queue depth slope.
  • Model capacity with Little’s Law and linear fits of load vs. resource usage; validate with synthetic load.
  • Forecast near-term demand (48–72 hours) with weekly seasonality; pre-scale when error-budget burn rises under normal load.
  • Deploy `HPA v2`/`KEDA` for autoscaling on custom metrics tied to SLOs, not CPU%.
  • Gate rollouts with `Argo Rollouts` AnalysisTemplates that watch burn-rate and p99 drift; auto-rollback on regressions.
  • Document triage runbooks linked in alerts; include commands and dashboards for each metric.
  • Run postmortems on capacity misses and update models/alerts within 24 hours.

Questions we hear from teams

Why not just use CPU or memory for autoscaling?
Because CPU% and RSS are lagging and often misleading. They don’t capture contention (throttling, run-queue) or backpressure (queue growth, lag slope). Scale on signals tied to service time and concurrency like active requests, queue depth per pod, or consumer lag.
Do we need ML to forecast demand?
No. A simple model (Little’s Law + linear fit + Holt-Winters seasonality) usually beats black-box ML because it’s explainable and debuggable in incident reviews. Add complexity only when residuals demand it.
How do we avoid flapping when scaling on custom metrics?
Use stabilization windows, minimum replicas, and conservative policy steps. Smooth with 1–5 minute rates, and clamp on known saturation cliffs. Validate in staging with a small load test.
What about databases? Autoscaling won’t save us there.
Correct. DBs are capacity-planned, not autoscaled. Watch connection pool saturation, lock waits, and storage IOPS headroom. Pre-scale replicas, partition hot tables, and warm caches before peak windows. Tie app-level backpressure to protect the DB.
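
In practice, “tie app-level backpressure to protect the DB” means bounding in-flight database work in the service itself and failing fast when the bound is hit. A minimal asyncio sketch, with the 40-slot cap and 50ms acquire timeout as placeholder values:

import asyncio

DB_CONCURRENCY_LIMIT = 40      # keep below this service's share of max_connections
ACQUIRE_TIMEOUT_S = 0.05       # fail fast; a 503 now beats a lock pileup later

db_slots = asyncio.Semaphore(DB_CONCURRENCY_LIMIT)

class DatabaseOverloaded(Exception):
    """Raised so the HTTP layer can return 503 + Retry-After instead of piling on."""

async def with_db_backpressure(run_query):
    """Run a DB call only if an in-flight slot is available within the timeout."""
    try:
        await asyncio.wait_for(db_slots.acquire(), timeout=ACQUIRE_TIMEOUT_S)
    except asyncio.TimeoutError:
        raise DatabaseOverloaded("db concurrency limit reached")
    try:
        return await run_query()
    finally:
        db_slots.release()

# Usage inside a handler (pool is your driver's connection pool, shown as a placeholder):
#   rows = await with_db_backpressure(lambda: pool.fetch("SELECT ..."))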

Ready to modernize your codebase?

Let GitPlumbers help you transform AI-generated chaos into clean, scalable applications.

Talk to GitPlumbers about a capacity and SLO tune-up
Download our SLO-driven autoscaling templates