The Dashboard Diet: Fewer Charts, Clear Thresholds, Faster Decisions

Turn your dashboards from museum walls into decision systems. Cut the noise, surface leading indicators, and wire telemetry to triage and rollouts.

Dashboards should make decisions obvious. If on-call still has to guess, it’s not a dashboard—it’s décor.

Your NOC Wall Is Not a Decision System

I’ve stood in front of the 12-screen Grafana wall during a real incident. The lights were pretty. None of it helped me decide what to do next. We still dug through kubectl logs while the status page burned.

The problem isn’t tooling—Prometheus, Grafana, OpenTelemetry are fine. The problem is intent. Most teams build museum dashboards: lots of charts, no decisions. On-call needs the opposite: fewer charts, clearer thresholds, faster decisions.

If a panel doesn’t change what you’ll do in the next 5 minutes, it doesn’t belong on the on-call dashboard.

At GitPlumbers, the dashboards that actually reduce MTTR have three traits: they highlight leading indicators, they bake in explicit thresholds, and they connect telemetry to triage and rollouts.

The Minimum Viable Dashboard: 8 Charts That Predict Pain

For each critical service, keep it to 6–8 panels that predict incidents. Use RED (rate, errors, duration) for request-driven services and USE (utilization, saturation, errors) for infrastructure. No averages; use percentiles and saturation.

  • SLO burn rate: Are we burning the error budget now? Fast predictor of paging and customer pain.

  • Tail latency (p95/p99): Latency spikes precede errors; track both overall and by dependency if possible.

  • Error rate (5xx or non-2xx): Break out by route or RPC method.

  • Saturation: Queue depth, thread pool queue length, container_cpu_cfs_throttled_seconds_total rate, node_filesystem_avail_bytes headroom, or DB connection pool utilization.

  • Retries/timeouts: http_client_request_retries_total, grpc_client_handled_total{grpc_code!="OK"}; retries hide failures until they don’t.

  • Dependency health: Downstream SLI (e.g., DB p99, Kafka consumer lag, third-party API error rate).

  • Release annotations: Every deploy, config change, or feature-flag flip visible on the timeline.

  • Traffic shape: Request rate and payload size distribution to catch sudden load changes.

In practice, this looks like a single Grafana dashboard per service labeled “On-Call.” Everything else (infra deep dives, JVM tuning, cache hit ratios) lives on a separate “Explore” dashboard.
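
Behind those panels, the queries stay simple. A minimal PromQL sketch, assuming a standard Prometheus HTTP histogram (http_request_duration_seconds) and a route label; adjust metric and label names to your instrumentation:

# Tail latency: p99 per route
histogram_quantile(0.99, sum(rate(http_request_duration_seconds_bucket{job="api"}[5m])) by (le, route))

# Error rate per route
sum(rate(http_requests_total{job="api",status=~"5.."}[5m])) by (route)
  / sum(rate(http_requests_total{job="api"}[5m])) by (route)

# Traffic shape: request rate per route
sum(rate(http_requests_total{job="api"}[5m])) by (route)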

Thresholds That Drive Action: SLOs, Burn Rates, and Saturation

“Green is good” is not a threshold. Define SLOs first, then derive alerts and panel thresholds from them.

  • Publish SLOs at the top: “Availability SLO: 99.9%; Latency SLO: p99 < 400ms.”

  • Use multi-window, multi-burn-rate alerts (from the Google SRE Workbook) to capture both fast and slow burns.

  • Mark actionable lines on graphs: “Above this line, page.”

Example Prometheus recording rules for burn rate (99.9% availability SLO, i.e. a 0.1% error budget) over 5m, 1h, and 6h windows:

groups:
- name: api-slo-burn
  rules:
  # Error ratio per window
  - record: api:error_ratio:5m
    expr: sum(rate(http_requests_total{job="api",status=~"5.."}[5m])) / sum(rate(http_requests_total{job="api"}[5m]))
  - record: api:error_ratio:1h
    expr: sum(rate(http_requests_total{job="api",status=~"5.."}[1h])) / sum(rate(http_requests_total{job="api"}[1h]))
  - record: api:error_ratio:6h
    expr: sum(rate(http_requests_total{job="api",status=~"5.."}[6h])) / sum(rate(http_requests_total{job="api"}[6h]))
  # Burn rate = error ratio / error budget (0.001 for a 99.9% SLO)
  - record: api:burn_rate:5m
    expr: api:error_ratio:5m / 0.001
  - record: api:burn_rate:1h
    expr: api:error_ratio:1h / 0.001
  - record: api:burn_rate:6h
    expr: api:error_ratio:6h / 0.001

Alerting rules that page only when it matters:

- alert: APIErrorBudgetBurnFast
  expr: api:burn_rate:5m > 14.4 and api:burn_rate:1h > 14.4
  for: 5m
  labels:
    severity: page
  annotations:
    summary: "Fast burn: API error budget burning >14.4x"

- alert: APIErrorBudgetBurnSlow
  expr: api:burn_rate:1h > 6 and api:burn_rate:6h > 6
  for: 30m
  labels:
    severity: ticket
  annotations:
    summary: "Slow burn: investigate during business hours"

Saturation predicts failure earlier than CPU averages. A few PromQL examples we harden for clients:

# CPU throttling ratio (share of CFS periods throttled, per pod)
sum(rate(container_cpu_cfs_throttled_periods_total{container!=""}[5m])) by (pod)
  / sum(rate(container_cpu_cfs_periods_total{container!=""}[5m])) by (pod)

# JVM GC pause p99 (if exported)
histogram_quantile(0.99, sum(rate(jvm_gc_pause_seconds_bucket[5m])) by (le))

# Kafka consumer lag (per group/topic)
max(kafka_consumergroup_lag{consumergroup=~"payments.*"}) by (topic)

These lines go on the graphs. When the line is crossed, the runbook says exactly what to do next.

Tie Dashboards to Triage: From Panel to Pager in Two Clicks

A good on-call dashboard is a triage flow, not a postcard. Every panel should link to the next question.

  • Drill-downs: From service p99 -> trace exemplars in Tempo/Jaeger. From error rate -> Loki logs filtered by route and trace_id.

  • Runbook links: Panel description links to a specific section anchor, not a wiki home page.

  • Context overlays: Grafana annotations for deploys, config changes, feature flags, and incidents.

  • Ownership: Each panel lists an owner Slack channel and escalation policy (PagerDuty service).

Here’s how we wire a latency panel to traces using exemplars and OpenTelemetry context propagation. In Grafana, that means telling the Prometheus data source where exemplar trace IDs should link:

# Grafana data source provisioning: exemplar trace IDs link to Tempo
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    url: http://prometheus.monitoring:9090
    jsonData:
      exemplarTraceIdDestinations:
        - name: trace_id        # exemplar label carrying the trace ID
          datasourceUid: tempo  # UID of the Tempo data source
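
For the error-rate drill-down, similar provisioning turns a trace_id in Loki log lines into a one-click jump to Tempo. A sketch; the label name, regex, and Loki address are assumptions to match your log format and cluster:

# Grafana data source provisioning: Loki log lines link to Tempo traces
apiVersion: 1
datasources:
  - name: Loki
    type: loki
    url: http://loki.monitoring:3100      # assumed Loki address
    jsonData:
      derivedFields:
        - name: TraceID
          matcherRegex: 'trace_id=(\w+)'  # assumes logs contain trace_id=<id>
          url: '$${__value.raw}'          # query passed to the traces data source
          datasourceUid: tempo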

The point: in two clicks, on-call should move from “it’s red” to “this downstream call is timing out; roll back the last change.”

Wire Telemetry to Rollouts: Canaries, Flags, and Auto-Rollback

Dashboards that don’t change deploy behavior are theater. Tie metrics to rollout automation so the system protects itself.

  • Canary gating with Argo Rollouts or Flagger and Prometheus queries.

  • Progressive delivery with Istio/Linkerd traffic splits and circuit breakers.

  • Feature flags (LaunchDarkly, Unleash) with kill switches tied to SLOs.

Example Argo Rollouts analysis template that blocks a canary when p99 or error rate violates SLO:

apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: api-slo-check
spec:
  metrics:
  - name: p99-latency
    interval: 1m
    count: 5
    successCondition: result[0] < 0.4  # < 400ms
    provider:
      prometheus:
        address: http://prometheus.monitoring:9090
        query: |
          histogram_quantile(0.99, sum(rate(http_request_duration_seconds_bucket{job="api",route="/checkout"}[1m])) by (le))
  - name: error-rate
    interval: 1m
    count: 5
    successCondition: result[0] < 0.001  # < 0.1%
    provider:
      prometheus:
        address: http://prometheus.monitoring:9090
        query: |
          sum(rate(http_requests_total{job="api",status=~"5.."}[1m]))
            / sum(rate(http_requests_total{job="api"}[1m]))

Hook this template into a Rollout with a 10% -> 25% -> 50% progression, and you’ll stop bad code within minutes—often before users notice.
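
A minimal Rollout sketch for that progression (name, image, and pause durations are placeholders); the analysis runs in the background once the canary takes traffic and aborts the rollout when either metric fails:

apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: api
spec:
  replicas: 6
  selector:
    matchLabels:
      app: api
  template:
    metadata:
      labels:
        app: api
    spec:
      containers:
      - name: api
        image: registry.internal/api:1.42.0   # placeholder image
  strategy:
    canary:
      analysis:
        templates:
        - templateName: api-slo-check   # the AnalysisTemplate above
        startingStep: 1                 # begin checks after the first traffic shift
      steps:
      - setWeight: 10
      - pause: {duration: 5m}
      - setWeight: 25
      - pause: {duration: 5m}
      - setWeight: 50
      - pause: {duration: 5m}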

For flags, we’ve shipped webhooks that drop a high-risk flag to OFF when the burn rate crosses the “page” threshold. No humans in the loop when the budget is on fire.
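
One low-effort way to wire that is an Alertmanager route: fast-burn pages also fan out to a small webhook that flips the high-risk flag. A sketch, assuming the fast-burn alert above and a hypothetical internal flag-killer endpoint:

# alertmanager.yml (fragment)
route:
  receiver: pagerduty
  routes:
    - matchers:
        - alertname = "APIErrorBudgetBurnFast"
      receiver: flag-kill-switch
      continue: true                    # still page a human
receivers:
  - name: pagerduty
    pagerduty_configs:
      - routing_key: <pagerduty-key>    # placeholder
  - name: flag-kill-switch
    webhook_configs:
      - url: http://flag-killer.internal/hooks/burn-rate   # hypothetical service that sets the flag to OFF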

Kill Vanity Metrics: What to Delete Today

If it doesn’t predict or explain incidents, it’s vanity. Common culprits:

  • CPU/Memory averages without throttling or RSS headroom. They hide saturation until it’s too late.

  • Total request counts with no error/latency context. Traffic is not health.

  • Composite “health scores” that mix unrelated signals into a single meaningless number.

  • Disk usage percent without growth rate or time-to-full (see the query below).

  • Ping success to the load balancer. Your app can be “up” and still unusable.

Move these to the “Explore” dashboard. On-call gets leading indicators only.
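
For the disk example, one PromQL pattern turns a static percentage into a leading indicator (node_exporter metrics; the mountpoint is a placeholder):

# Fire if the filesystem will be full within 4 hours at the current 6h growth rate
predict_linear(node_filesystem_avail_bytes{mountpoint="/data"}[6h], 4 * 3600) < 0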

A 30-Day Plan to Make Dashboards Actionable

You don’t need a re-platform. You need intent and a few weeks of work.

  1. Week 1 – Inventory and SLOs
    • List top 5 user journeys or critical APIs; draft SLIs (availability, latency, correctness).
    • Set SLOs with product: e.g., availability 99.9%, p99 < 400ms for checkout.
    • Add release annotations in Grafana (CI posts to /api/annotations).
  2. Week 2 – Build the MVP dashboard
    • Create a single “On-Call” dashboard per service with 6–8 panels from the list above.
    • Add explicit thresholds and runbook links; owners on each panel.
    • Add trace exemplars; verify trace_id in logs and metrics via OpenTelemetry.
  3. Week 3 – Alerts and triage flow
    • Implement multi-window burn alerts; page only for fast burns.
    • Wire Grafana links to Loki/Tempo/Jaeger with pre-filtered queries.
    • Run a game day; time MTTA/MTTR; fix the slowest step.
  4. Week 4 – Rollout gating
    • Add Argo Rollouts/Flagger analysis templates for SLO checks.
    • Gate canaries on p99 and error rate; auto-rollback on failure.
    • Add a feature-flag kill switch tied to burn-rate alerts.

Expect to cut noisy alerts by ~50% and MTTR by 20–40% in the first month if you enforce the diet.

What We Learned and What We’d Do Again

  • Averages lie. Tail latency and saturation predict pain; averages tell bedtime stories.

  • Annotate everything. Releases, migrations, config flips. Saves you 10 minutes per incident just correlating events.

  • Make it a contract. Every panel must state: owner, SLO, threshold, runbook. If you can’t name these, delete it.

  • Automate rollback. Humans are too slow at 2 a.m. Let the pipeline trip the breaker when SLOs fail.

  • Review monthly. Dashboards rot. Delete a panel every time you add one.

At a payments client, this approach turned a 40-panel Grafana monster into eight decisive charts. We cut false pages by 60%, caught two regressions at 10% canary, and dropped median MTTR from 48 to 28 minutes in five weeks. No new shiny platform—just ruthless focus and wiring what we already had.

Key takeaways

  • Dashboards should drive decisions, not decorate walls. Keep 6–8 leading-indicator charts per service.
  • Measure what predicts incidents: saturation, burn rate, tail latency, retries, and queue/lag.
  • Make thresholds explicit with SLOs and multi-window burn alerts instead of vague colors.
  • Wire dashboards into triage: link panels to logs, traces, and runbooks in two clicks.
  • Gate rollouts with metrics: Argo Rollouts/Flagger + Prometheus to auto-pause or rollback.
  • Delete vanity metrics: CPU averages, request counts, and composite “health scores” that hide risk.
  • Implement in 30 days: inventory, define SLIs/SLOs, build MVP dashboard, alerts, rollout gates, refine.

Implementation checklist

  • Define SLIs/SLOs with owners and budgets; publish them in the dashboard header.
  • Choose 6–8 leading indicators per service using RED/USE and error budget burn.
  • Set explicit thresholds with PromQL recording rules; alert on multi-window burn.
  • Add drill-down links from each panel to logs (e.g., `Loki`), traces (`Tempo`/`Jaeger`), and runbooks.
  • Annotate releases in Grafana; require post-release panels for triage.
  • Integrate Argo Rollouts/Flagger analysis templates with Prometheus queries for canary gating.
  • Remove vanity charts; move them to a separate “explore” dashboard, not on-call.
  • Review alerts monthly; simulate incidents (game days) and tune thresholds.

Questions we hear from teams

How many panels should my on-call dashboard have?
Aim for 6–8 per service. If you can’t decide which to drop, you haven’t defined SLOs tightly enough. Everything else belongs on an “Explore” dashboard, not the on-call view.
What if we don’t have clean metrics yet?
Start by instrumenting SLIs via `OpenTelemetry` or your SDKs. You can still set up burn-rate alerts on existing request/error counters and tail latency histograms while you refine semantics.
Won’t we miss context if we cut charts?
You won’t if you add drill-down links to logs, traces, and dependency dashboards. Keep deep-dive dashboards separate; the on-call view is about decisions, not exploration.
How do we pick SLO targets?
Use historical performance and business tolerance. Start with conservative targets (e.g., 99.9% availability, p99 < 400ms for critical flows), then iterate quarterly with product on user impact and cost.
Can we do this without Kubernetes?
Yes. The approach is platform-agnostic. You can drive the same outcomes with VMs and a deployment tool that supports health checks plus Prometheus scraping and feature flags.

Ready to modernize your codebase?

Let GitPlumbers help you transform AI-generated chaos into clean, scalable applications.
