Dashboards Developers Don’t Hate: A Paved Road for DX Metrics That Actually Moves the Needle
You don’t need 80 panels and a data-wrangling PhD. Here’s the simple, trustworthy DX dashboard that improves MTTR, lead time, and satisfaction—with real wiring, not bespoke science projects.
If your DX dashboard requires a runbook to read, it’s not measuring developer experience—it’s creating more toil.
The dashboard that lied to us
A few years back, I walked into a unicorn where every wall had a dashboard and every developer had two MacBooks. All the graphs were green. Meanwhile, PRs were marinating for days, CI was a roulette wheel, and onboarding took six weeks. The “DX dashboard” was a bespoke Python ETL pulling half-broken CSVs into a BI tool, maintained by one wizard who’d long since moved on. Leadership swore developer experience was improving because deploys per day were up. Turned out half were rollbacks.
I’ve seen this movie at banks, ad-tech, and hot AI startups. The pattern: too many panels, no shared definitions, and custom plumbing nobody wants to touch. Here’s what actually works: a paved road with a handful of outcome metrics, wired to your existing systems, versioned like code, reviewed like a product.
What to measure: outcomes, not activity
You don’t need to reinvent metrics. Blend DORA (delivery outcomes) with SPACE (human experience) and stop there.
- Lead time for changes (DORA): `PR opened -> merged -> deployed to prod`, median and p90.
- Deployment frequency (DORA): successful prod releases per day/week.
- Change failure rate (DORA): percentage of prod changes that trigger rollback/incident.
- MTTR (DORA): median time from incident start to resolved.
- PR cycle time (SPACE-ish): open-to-merge, broken down into “time to first review” and “time in review”.
- CI duration & flaky test rate: median pipeline time; percentage of re-runs that pass without code changes.
- DX pulse (SPACE): monthly 5-question survey: flow, cognitive load, tooling satisfaction, environment stability, and `eNPS`.
Rules I’ve learned the hard way:
- Keep it to 7–10 metrics. Add one only if you delete one.
- No vanity counts: lines of code, story points, commit counts—delete on sight.
- Define events at the edges of the path to prod: `PR opened`, `first review`, `merged`, `deploy started/finished`, `incident created/resolved`.
- Tag by team/service using your service catalog (`Backstage` or `OpsLevel`). Dashboards without ownership context turn into museum pieces.
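To make the tagging rule concrete, here is a minimal sketch of a view that stamps `team`/`service` onto raw PR events by joining a catalog export. The names (`dx.service_catalog`, `first_review_at`) are illustrative assumptions, not a vendor schema.
-- Sketch: tag raw PR events with team/service from the catalog (names are illustrative)
CREATE OR REPLACE VIEW dx.pr_events_tagged AS
SELECT
  pr.id,
  pr.repo,
  pr.created_at,        -- "PR opened" edge
  pr.first_review_at,   -- "first review" edge (assumed column)
  pr.merged_at,         -- "merged" edge
  cat.service,
  cat.team
FROM dx.github_pull_requests pr
JOIN dx.service_catalog cat USING (repo);
Every downstream view selects from tagged views like this one, so team and service filters come for free.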
The paved-road architecture (no bespoke science projects)
Favor boring, proven components. Your platform team should be curators, not tool inventors.
- Display: `Grafana` (OSS/Cloud) or `Looker Studio` if your org lives in Google. One URL. Pinned to Slack.
- Storage: `BigQuery` or `ClickHouse Cloud`. Use native connectors; avoid hand-rolled ETL.
- Sources: `GitHub`/`GitLab` for PRs/reviews/merges. `CircleCI Insights`/`GitHub Actions`/`GitLab CI` for pipeline timing and flake. `ArgoCD`/`Spinnaker`/`Octopus` for deploys. `PagerDuty`/`Incident.io` for incidents and MTTR. `Sentry`/`Honeycomb` for change-failure signals (release adoption, error spikes).
- Identity & taxonomy: map repos/services to teams via the `Backstage` catalog or a simple `services.yaml`.
- IaC: define dashboards via `terraform` and version them in `platform/observability`.
If your metrics pipeline requires a runbook to understand, it’s not a paved road. It’s a toll road.
Minimal wiring beats perfect data. You can get to “usefully actionable” in a week without a data lake.
- Use vendor APIs on a nightly schedule first; stream later if needed.
- Store raw events with a `team` and `service` tag. Derive metrics with SQL views (sketched below).
- Publish definitions in the repo. If someone can’t reproduce a number from SQL, don’t show it.
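For example, the review-response-time panel can be one view over that tagged data. A sketch, reusing the illustrative `dx.pr_events_tagged` view from earlier:
-- Sketch: review response time, hours from PR opened to first review, p50/p90 per team per week
CREATE OR REPLACE VIEW dx.review_response_time AS
SELECT
  DATE_TRUNC(DATE(created_at), WEEK) AS week,
  team,
  APPROX_QUANTILES(TIMESTAMP_DIFF(first_review_at, created_at, HOUR), 100)[OFFSET(50)] AS p50_hours,
  APPROX_QUANTILES(TIMESTAMP_DIFF(first_review_at, created_at, HOUR), 100)[OFFSET(90)] AS p90_hours
FROM dx.pr_events_tagged
WHERE first_review_at IS NOT NULL
GROUP BY week, team;
The SQL is the definition; if a panel and this query ever disagree, the panel is wrong.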
Implementation in a week: the seven-panel starter
Here’s the starter pack we deploy at GitPlumbers. No yak shaving required.
- Pick the surface: `Grafana Cloud` and `BigQuery` are the fastest path for most.
- Create a `services.yaml` mapping repos to services and teams.
- Ingest events using vendor connectors or small scheduled jobs: `GitHub` GraphQL for PR timestamps and reviews. `CircleCI Insights` or `GitHub Actions` API for pipeline duration and re-runs. `ArgoCD` application sync webhooks for deploys. `PagerDuty` Incidents API for MTTR. `Sentry` Releases API for change failure signals.
- Define SQL views for the core metrics (two examples below; sketches for the remaining views follow this list):
-- BigQuery: PR cycle time p50/p90 by day
CREATE OR REPLACE VIEW dx.pr_cycle_time AS
SELECT
  DATE(pr.created_at) AS day,
  pr.repo AS repo,
  APPROX_QUANTILES(TIMESTAMP_DIFF(pr.merged_at, pr.created_at, HOUR), 100)[OFFSET(50)] AS p50_hours,
  APPROX_QUANTILES(TIMESTAMP_DIFF(pr.merged_at, pr.created_at, HOUR), 100)[OFFSET(90)] AS p90_hours
FROM dx.github_pull_requests pr
WHERE pr.base_ref = 'main'
  AND pr.merged_at IS NOT NULL
GROUP BY day, repo;
-- Change failure rate: share of deploys followed by an incident within 2 hours
CREATE OR REPLACE VIEW dx.change_failure_rate AS
WITH flagged_deploys AS (
  SELECT
    d.id,
    d.service,
    LOGICAL_OR(i.started_at IS NOT NULL) AS failed
  FROM dx.deploy_events d
  LEFT JOIN dx.pagerduty_incidents i
    ON i.service = d.service
   AND i.started_at BETWEEN d.deployed_at AND TIMESTAMP_ADD(d.deployed_at, INTERVAL 2 HOUR)
  GROUP BY d.id, d.service
)
SELECT
  service,
  COUNTIF(failed) / COUNT(*) AS failure_rate
FROM flagged_deploys
GROUP BY service;
- Codify the dashboard with Terraform (use prebuilt JSON):
resource "grafana_dashboard" "dx_starter" {
config_json = file("dashboards/dx_starter.json")
folder = grafana_folder.platform.id
}
- Add the monthly DX pulse: five Likert questions in `Typeform` or `Google Forms`. Pipe results into `BigQuery` via native connectors, keyed by team.
- Publish the README: definitions, data lineage, and a “how to reproduce” query for each panel.
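The two views above cover PR cycle time and change failure rate. Here are hedged sketches for two more of the core metrics, assuming the same illustrative `dx.deploy_events` and `dx.pagerduty_incidents` tables (the `environment` and `resolved_at` columns are assumptions):
-- Sketch: deployment frequency, prod deploys per service per day
CREATE OR REPLACE VIEW dx.deployment_frequency AS
SELECT
  DATE(deployed_at) AS day,
  service,
  COUNT(*) AS deploys
FROM dx.deploy_events
WHERE environment = 'prod'   -- keep non-prod deploys out of the number
GROUP BY day, service;

-- Sketch: MTTR, median minutes from incident start to resolution, per service per week
CREATE OR REPLACE VIEW dx.mttr AS
SELECT
  DATE_TRUNC(DATE(started_at), WEEK) AS week,
  service,
  APPROX_QUANTILES(TIMESTAMP_DIFF(resolved_at, started_at, MINUTE), 100)[OFFSET(50)] AS p50_minutes
FROM dx.pagerduty_incidents
WHERE resolved_at IS NOT NULL
GROUP BY week, service;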
The seven panels we ship:
- PR cycle time (p50/p90) with a “time to first review” split.
- Review response time (hours until first comment/approval).
- CI duration p50/p90 and flaky rate (% of re-runs that pass).
- Deployment frequency per service per day.
- Change failure rate (deploys that triggered incidents/rollbacks).
- MTTR p50 by service and team.
- DX pulse: `eNPS` and satisfaction with code reviews/tooling (monthly).
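For that last panel, `eNPS` is the usual arithmetic: the percentage of promoters (scores 9-10) minus the percentage of detractors (0-6). A minimal sketch, assuming pulse responses land in a hypothetical `dx.pulse_responses` table keyed by team:
-- Sketch: eNPS per team per month, % promoters (9-10) minus % detractors (0-6)
CREATE OR REPLACE VIEW dx.enps AS
SELECT
  DATE_TRUNC(DATE(submitted_at), MONTH) AS month,
  team,
  100 * (COUNTIF(score >= 9) - COUNTIF(score <= 6)) / COUNT(*) AS enps
FROM dx.pulse_responses
WHERE score BETWEEN 0 AND 10
GROUP BY month, team;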
Before/after: real numbers, real trade-offs
At a Series C fintech with ~70 engineers, the first pass was a massacre of complexity. They had:
- Three dashboards (Datadog, Grafana, internal BI) with conflicting numbers.
- A bespoke Python ETL running on a snowflake EC2 box.
- 40+ panels, none owned, none trusted.
We replaced it with the paved road above. Two sprints later:
- PR p50 cycle time dropped from 26h -> 12h after instituting a reviewer SLA and Slack nudges.
- CI p90 went from 38m -> 17m after caching and test parallelization; flaky rate from 12% -> 3%.
- Deployment frequency doubled simply by moving to trunk-based and cutting long-lived branches.
- MTTR improved from 140m -> 55m after adding `runbook_url` and `owner` to the service catalog.
- DX eNPS moved from -10 -> +12 in two months.
- Infra spend went down because we killed the custom ETL EC2 and used `BigQuery` scheduled queries (approx $150/month net).
What we didn’t do:
- No Kafka. No stream processors. No custom front-ends. We resisted the siren song of “real-time everything.” Nightly is fine for most orgs.
- No scorecards for individual engineers. Team-level only. You want trust, not Hunger Games.
Another example: a public SaaS with 300+ engineers had dashboards everywhere. We consolidated to a single `Grafana` home with seven panels and tied the “time to first review” SLO to manager goals. Result: p50 first review time fell from 9h -> 2.5h in three weeks. No new tooling. Just visibility + Slack reminders with `/remind` and ownership mapping via `Backstage`.
Dashboard design principles that keep you honest
Make it boring. Make it obvious. Make it actionable.
- Single source of truth: one URL in Slack bookmarks. Everything else points there.
- Team slices by default: top-level filters for `team`, `service`, `repo`. No more averages across the org.
- p50/p90 always: medians for sanity, p90 to expose pain. Means lie in fat-tailed workflows.
- Annotations: show releases, policy changes, and incidents right on charts to tell the story.
- Link to action: each panel links to the backlog query (e.g., “PRs waiting >24h”; sketched after this list), runbook, or the “Fix Flaky Tests” epic.
- Redlines as SLOs: draw thresholds: “PR cycle p50 under 12h”, “CI p90 under 20m.”
- Ownership: every panel has an owner and a metric definition file. Broken numbers page the owner (lightly).
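The “PRs waiting >24h” link doesn’t need special tooling; it can be a saved query against the same tables. A sketch, reusing the illustrative `dx.pr_events_tagged` view from earlier:
-- Sketch: unreviewed PRs older than 24 hours, oldest first
SELECT
  team,
  repo,
  id AS pr_id,
  TIMESTAMP_DIFF(CURRENT_TIMESTAMP(), created_at, HOUR) AS hours_waiting
FROM dx.pr_events_tagged
WHERE merged_at IS NULL
  AND first_review_at IS NULL
  AND created_at < TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 24 HOUR)
ORDER BY hours_waiting DESC;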
Anti-patterns I’ve seen (and killed):
- Gamifying metrics (leaderboards) → people optimize the number, not the outcome.
- Activity masquerading as outcomes: commit counts, story points, PRs/week.
- Unverifiable transforms: hidden math in BI tools with no code review.
- Bespoke ETL that only one person understands. When they leave, your dashboard does too.
- Mixing prod and non-prod deploys. It will inflate your “frequency” and fool you.
Rollout and keeping it alive
Dashboards decay without ritual. Treat DX like product, not poster art.
- Weekly 30-min review: Eng + PM + SRE. Look at seven panels, pick one experiment (e.g., enable `actions/cache`, tighten `CODEOWNERS`, add `merge queue`).
- Monthly DX pulse: share results org-wide with one concrete follow-up. Keep surveys to five questions; rotate one.
- Quarterly hygiene: prune a panel for every new one, revalidate definitions, rotate the panel owner.
- Slack integration: post top deltas to `#dx-updates` and nudge reviewers when PRs breach SLOs. Use bots you already have (`GitHub`, `Linear`, `Jira`).
- Roadshow: 15-minute demos at team meetings showing how the dashboard answers real questions.
If you’re starting from a dashboard zoo, begin by turning monitors off. Ship the paved road, then re-add only what earns its place.
At GitPlumbers, we usually pair with a platform lead for two weeks to stand up the starter pack, then coach teams through three improvement cycles. It’s not magic—just ruthless simplicity, good definitions, and steady cadence.
Key takeaways
- Start with a paved-road dashboard: one place, seven panels, wired to your existing tools.
- Measure outcomes (DORA + SPACE), not activity. Keep it under 10 metrics.
- Instrument the path to production before you survey feelings—then correlate the two.
- Avoid bespoke ETL. Prefer native APIs and turnkey connectors; codify definitions in Terraform.
- Review the dashboard weekly with Eng + Product; kill vanity panels ruthlessly.
- Use simple before/after experiments (flags, CI cache, review SLAs) to prove value.
Implementation checklist
- Pick one display surface (Grafana or Looker Studio) and one storage (BigQuery or ClickHouse).
- Define seven starter metrics: PR cycle time, review response time, CI duration, flaky test rate, deploy frequency, MTTR, change failure rate.
- Create a monthly 5-question DX pulse (SPACE-aligned) with `eNPS` and map to teams.
- Wire data via native APIs: `GitHub`, `GitLab`, `CircleCI Insights`, `ArgoCD`, `PagerDuty`, `Sentry`—no custom scrapers.
- Codify dashboards with `terraform` and version them in `platform/observability` repo.
- Publish a README with metric definitions, SLOs, and alert thresholds.
- Run a 30-minute weekly review and a monthly retro to prune panels and pick one improvement experiment.
Questions we hear from teams
- How often should we update and review the dashboard?
- Nightly is enough for most teams. Add near-real-time only for deploy frequency and incident panels if you truly need it. Run a weekly 30-minute review to pick one experiment, and a monthly DX pulse to capture sentiment.
- How do we avoid Goodhart’s Law?
- Use outcomes, not activity metrics. Track p50 and p90, slice by team, and pair numbers with qualitative feedback from the DX pulse. Never tie individual compensation to these metrics; use them to guide system improvements.
- What if we’re a small team without BigQuery or Grafana?
- Use `Looker Studio` with `Google Sheets` connectors for a scrappy start, or `Grafana Cloud Free` with `GitHub` and `CircleCI Insights`. Keep the seven panels and codify definitions in the repo README until you grow into IaC.
- How do we calculate change failure rate without perfect incident data?
- Use deploy-triggered rollbacks, `Sentry` release error spikes, or `feature-flag off` events as proxies. Start with a simple two-hour window after deploy; refine later as incident hygiene improves. A rollback-based sketch follows these questions.
- Should we track individual engineer metrics?
- No. Team-level metrics drive systemic improvements. Individual scorecards breed sandbagging and fear, which worsens delivery and satisfaction.
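For the rollback proxy mentioned above, here is a minimal sketch that flags a deploy as failed when a rollback for the same service lands within two hours. The `is_rollback` flag on `dx.deploy_events` is an assumption; `Sentry` error-spike or feature-flag-off events slot in the same way.
-- Sketch: proxy change failure rate using rollbacks within two hours of a deploy
CREATE OR REPLACE VIEW dx.change_failure_rate_proxy AS
WITH flagged AS (
  SELECT
    d.id,
    d.service,
    LOGICAL_OR(r.id IS NOT NULL) AS failed
  FROM dx.deploy_events d
  LEFT JOIN dx.deploy_events r
    ON r.service = d.service
   AND r.is_rollback                -- assumed boolean flag
   AND r.deployed_at BETWEEN d.deployed_at AND TIMESTAMP_ADD(d.deployed_at, INTERVAL 2 HOUR)
  WHERE NOT d.is_rollback           -- only score the original deploys
  GROUP BY d.id, d.service
)
SELECT
  service,
  COUNTIF(failed) / COUNT(*) AS failure_rate
FROM flagged
GROUP BY service;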
Ready to modernize your codebase?
Let GitPlumbers help you transform AI-generated chaos into clean, scalable applications.