DX Dashboards Developers Trust: Paved‑Road Metrics Without the Surveillance Creep
Stop guessing. Ship a dashboard that shows where your developers actually lose time—using the tools you already run (Git provider, CI, ArgoCD, Prometheus, Grafana). No bespoke data lake, no creepy tracking.
If your DX dashboard requires a PhD to explain, it's a vanity mirror, not an instrument panel.
We kept guessing. The dashboard stopped the guessing.
I’ve watched too many orgs debate “developer productivity” in Slack like it’s astrology. One client (70 devs, B2B SaaS) had engineers swearing that code reviews were the bottleneck. Ops said CI was slow. Product blamed “shiny-toy refactors.” Nobody had numbers; just vibes.
We shipped a DX dashboard in a week using their existing stack: GitHub, GitHub Actions, ArgoCD, Prometheus, and Grafana. No new SaaS. No IDE spyware. The data embarrassed us all: p95 CI duration was 14 minutes with a 9% flake rate on two test suites. PR idle time, not review time, was the real drag—PRs sat unassigned for 11 hours on average. After we focused on those two, everything else got easier.
Point: if your dashboard isn’t showing where time actually burns, your roadmap is luck and politics. Build the instrument panel; then fly the plane.
Measure the work, not the workers
Keep it simple. If you can’t explain it in one sentence to a staff engineer, it’s a vanity metric.
- Cycle time (first commit to prod): “How long until value hits prod?”
- PR idle time: “How long between PR ready-for-review and first human touch?”
- CI p95 duration: “How long do devs wait for a green?” p95, not average.
- CI flake rate: “How often do tests fail for non-deterministic reasons?”
- Deploy frequency: “How often do we ship?” Per service and per team.
- MTTR (prod incidents): “How long to recover?”
- Monthly ENPS + 2 questions: “Would you recommend this dev environment?” with free-text: “What slowed you down last sprint?” and “What should we stop?”
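For the ENPS line specifically, the math is the standard net-promoter calculation applied to the 0–10 pulse scores. A minimal sketch (the `enps` helper is hypothetical, not from any survey tool):

```python
# Hypothetical helper: eNPS from monthly 0-10 pulse scores.
# Promoters score 9-10, detractors 0-6; eNPS is the percentage-point gap.
def enps(scores: list[int]) -> int:
    promoters = sum(1 for s in scores if s >= 9)
    detractors = sum(1 for s in scores if s <= 6)
    return round(100 * (promoters - detractors) / len(scores))

print(enps([9, 10, 8, 7, 3, 9, 10, 6]))  # → 25
```

Note eNPS can be negative; a shop where most devs score 0–6 lands well below zero, which is exactly the signal you want surfaced.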
What to skip (ask me how I know):
- Keystrokes/IDE telemetry/screen time. That’s surveillance theater. It’ll burn trust and tell you nothing useful.
- Story points and commit counts. Theater 2.0. Correlates poorly with value.
- Funnel pages of charts. One page, RAG thresholds, per-team filters. That’s it.
Ship a dashboard in a week with what you already own
Here’s a paved-road setup I’ve used repeatedly. No greenfield data lake. Minimal yak-shaving.
Pull Git provider events
Use `gh` (or the GitLab API) to extract PRs and reviews nightly. Land the raw JSON in object storage or a warehouse.

```bash
# GitHub PRs (closed/merged) with pagination
gh api -X GET \
  "repos/$OWNER/$REPO/pulls" \
  -F state=closed --paginate \
  -H "Accept: application/vnd.github+json" \
  | jq -c '.[] | {
      number,
      author: .user.login,
      created_at,
      merged_at,
      closed_at,
      draft,
      # proxy: exact ready_for_review events need the timeline API
      ready_for_review_at: .created_at,
      labels: [.labels[].name]
    }' > prs.ndjson

# Reviews (first-touch timestamps): loop over the PRs pulled above
jq -r '.number' prs.ndjson | while read -r pr; do
  gh api -X GET "repos/$OWNER/$REPO/pulls/$pr/reviews" --paginate \
    | jq -c '.[] | {pull: .pull_request_url, reviewer: .user.login, submitted_at}'
done > reviews.ndjson
```

Scrape CI and deployment metrics
- GitHub Actions: run `actions-exporter` to push workflow metrics to Prometheus. Jenkins and CircleCI have exporters too.
- ArgoCD exposes `/metrics` with sync status and durations.
- Ship everything into Prometheus and label by `repo`, `workflow`, and `service`.
Example PromQL for CI p95 and flake rate:
```promql
# CI p95 duration by workflow (last 24h)
histogram_quantile(0.95,
  sum by (workflow, le) (rate(ci_workflow_duration_seconds_bucket[24h]))
)

# CI flake rate: reruns that succeed after a failure within 24h
  sum by (workflow) (increase(ci_job_reruns_total[24h]))
/
  sum by (workflow) (increase(ci_jobs_total[24h]))
```

ArgoCD deploy frequency and MTTR (paired with incident labels):

```promql
# Successful syncs per service per day
sum by (app) (increase(argocd_app_sync_total{phase="Succeeded"}[1d]))

# MTTR if you expose incident start/end as counters
  rate(incidents_resolved_seconds_sum[7d])
/
  rate(incidents_resolved_seconds_count[7d])
```
Transform once, in SQL
Drop the JSON into your warehouse (BigQuery, Postgres, or ClickHouse). Define metric contracts with `dbt` so changes are versioned.

Example `dbt` model for PR lead time and idle time in BigQuery:

```sql
-- models/pr_metrics.sql
with prs as (
  select
    repo,
    number,
    author,
    timestamp(created_at) as created_at,
    timestamp(merged_at) as merged_at,
    timestamp(ready_for_review_at) as rfr_at
  from raw.github_prs
  where merged_at is not null
),

first_review as (
  select
    pull_number as number,
    min(timestamp(submitted_at)) as first_review_at
  from raw.github_reviews
  group by 1
)

select
  p.repo,
  p.number,
  p.author,
  -- timestamp_diff (not datetime_diff) since these columns are TIMESTAMPs
  timestamp_diff(p.merged_at, p.created_at, hour) as lead_time_hours,
  timestamp_diff(fr.first_review_at, p.rfr_at, hour) as pr_idle_hours
from prs p
left join first_review fr using (number)
```

Visualize in Grafana
- Hook Prometheus + warehouse datasource.
- One dashboard, four rows: Flow (cycle time), Code Review (idle), CI (p95 + flake), Delivery (deploys + MTTR).
- Use thresholds to flag regressions.
Example panel (JSON excerpt) for CI p95 with thresholds:
```json
{
  "title": "CI p95 by workflow",
  "type": "timeseries",
  "targets": [{
    "expr": "histogram_quantile(0.95, sum by (workflow, le) (rate(ci_workflow_duration_seconds_bucket[24h])))"
  }],
  "fieldConfig": {
    "defaults": {
      "thresholds": {
        "mode": "absolute",
        "steps": [
          { "color": "green", "value": null },
          { "color": "yellow", "value": 600 },
          { "color": "red", "value": 900 }
        ]
      }
    }
  }
}
```

Wire a monthly ENPS pulse
Keep it under 30 seconds. Post it in Slack and collect responses anonymously (Google Form or Polly). Publish the results and what you’re changing.
```bash
curl -X POST https://slack.com/api/chat.postMessage \
  -H "Authorization: Bearer $SLACK_BOT_TOKEN" \
  -H "Content-type: application/json" \
  -d '{
    "channel": "#eng",
    "text": "ENPS: Would you recommend our dev environment to a friend? 0-10. Reply to the form: <https://forms.gle/xyz>\nAlso: 1) What slowed you down last sprint? 2) What should we stop doing?"
  }'
```
That’s it. One pipe, one model, one dashboard. You can add Backstage analytics later if you truly need it.
Before/after: the week we earned trust
A real snapshot from that 70-dev org after 30 days of focusing on the data:
- CI p95 dropped from 14m to 5m after we parallelized tests and cached Docker layers. Cost: +$480/month in runners. ROI: devs reclaimed ~40 hours/week.
- CI flake rate fell from 9% to 1.5% by quarantining two suites and fixing a clock-skew race in a Python test. Visible in the flake panel within a day.
- PR idle time halved from 11h to 5h by auto-assigning reviewers via codeowners and Slack reminders.
- Deploy frequency doubled for three services after we unblocked canary in Argo Rollouts and added `rollback_on_failure: true`.
- ENPS rose from 12 to 30. The top free-text complaint (“CI roulette”) disappeared in the next pulse.
We didn’t buy anything. We didn’t rewrite the world. We just believed the graph and moved the levers that matter.
Design principles that keep you honest
- Favor paved-road defaults. Use `gh`, built-in CI exporters, `argocd-metrics`, Prometheus, and Grafana. Avoid building a bespoke event collector unless you’ve maxed out the basics.
- One transformation layer. Define metrics in `dbt` or SQL so changes are code-reviewed and diffed.
- Privacy by design. Aggregate at team/service level. No individual leaderboards. It poisons the well.
- SLOs for platform. Publish targets: CI p95 < 10m, flake rate < 2%, deploy success > 99%, MTTR < 30m for P2. Tie on-call rotations and backlog to these.
- Red/Amber/Green thresholds. Predefine what yellow vs red means. Avoid dashboard bikeshedding.
- Drilldowns, not dashboard sprawl. One landing page with links to focused drilldowns (CI, Review, Delivery).
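The “predefine yellow vs red” principle is simple enough to encode once and reuse everywhere. A minimal sketch, assuming a higher-is-worse metric and illustrative thresholds (600s amber, 900s red, matching the CI p95 panel’s numbers):

```python
# Minimal RAG evaluator for a higher-is-worse metric (e.g., CI p95 seconds).
# Threshold values are illustrative, not prescriptive.
def rag(value: float, amber: float, red: float) -> str:
    if value >= red:
        return "red"
    if value >= amber:
        return "amber"
    return "green"

print(rag(720, amber=600, red=900))  # CI p95 of 12 minutes → "amber"
```

Keeping this in one reviewed function (or one Grafana threshold config) is what prevents the “what counts as red?” bikeshedding.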
Surveys that don’t feel like HR homework
I’ve seen the 30-question quarterly “developer happiness” survey drift into irrelevance. Go light and frequent:
- ENPS 0–10 monthly, anonymous, with two open questions.
- Tag the pulse to changes. If you just changed CI runners or the dev container, ask a one-off “Did X help?” question.
- Close the loop. Publish outcomes: “We saw CI flake at 9%, we quarantined Suite A, flake is 2.1%.”
- Correlate sentiment with objective metrics. If ENPS dips but CI is green, look at PR idle or local dev environment.
If your survey results don’t trigger a backlog change within two sprints, stop surveying and fix your feedback loop.
What not to build (and what to kill fast)
- No IDE spyware. Don’t measure keystrokes. You’ll lose trust and gain nothing.
- No custom event tap on every service before you’ve mined Git/CI/Argo. Your change-data-capture pipeline can wait.
- No DORA cargo-culting without definitions. If you can’t explain how you compute Lead Time in 2 sentences, pause.
- No 20-graph dashboards. They’ll look impressive and change nothing.
- Kill vanity metrics. Commit counts, story points, “lines of code changed.” Delete and move on.
A one-week plan to get this live
Day 1–2
- Pick 5 metrics: cycle time, PR idle, CI p95, flake rate, deploy freq.
- Export Git provider PRs/reviews to your warehouse. Set up CI and ArgoCD exporters into Prometheus.
Day 3–4
- Create `dbt` models for PR metrics; validate with two known repos.
- Add PromQL panels for CI p95 and flake; add deploy frequency.
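The “validate with two known repos” step can be a quick Python sanity check before trusting the dbt model. A sketch assuming GitHub’s ISO-8601 timestamp format (the helper names are hypothetical):

```python
from datetime import datetime

def parse_ts(ts: str) -> datetime:
    # GitHub timestamps look like "2024-01-15T09:00:00Z"
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))

def pr_idle_hours(ready_for_review_at: str, first_review_at: str) -> float:
    # Idle time: ready-for-review until the first human review lands.
    delta = parse_ts(first_review_at) - parse_ts(ready_for_review_at)
    return delta.total_seconds() / 3600

print(pr_idle_hours("2024-01-15T09:00:00Z", "2024-01-15T20:00:00Z"))  # → 11.0
```

Spot-check a handful of PRs you remember (“that one sat all day”) against the model’s output; if they disagree, fix the definition before publishing the dashboard.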
Day 5
- Publish Grafana with team filters and thresholds; announce ENPS pulse; commit to two fixes driven by the data.
Week 2+
- Tackle the top two bottlenecks only; publish impact; iterate models as needed.
- Optional: add Backstage usage metrics, local build time sample (survey), or feature flag rollout health (LaunchDarkly metrics).
Key takeaways
- Track a small set of outcome metrics: cycle time, PR idle time, CI duration/flake rate, deploy frequency, and MTTR—plus a monthly ENPS pulse.
- Prefer paved‑road data sources you already have (Git provider, CI, ArgoCD, Prometheus). Avoid bespoke event collectors and IDE snooping.
- Define clear metric contracts in SQL/PromQL and version them. If the definition changes, everyone sees the diff.
- Tie dashboards to platform SLOs (e.g., CI p95 < 10m, Deployment success > 99%) to drive action, not vanity graphs.
- Ship a first dashboard in one week with incremental accuracy. Use it to kill your top two bottlenecks, then iterate.
Implementation checklist
- Pick 5 metrics max to start: cycle time, PR idle time, CI p95, CI flake rate, deploy frequency.
- Instrument data sources you already run: Git provider API, CI exporter, ArgoCD metrics, Prometheus.
- Create one transform layer (dbt/SQL) with versioned metric definitions.
- Provision a single Grafana dashboard with RAG thresholds and per‑team filters.
- Run a monthly ENPS pulse with two open‑text questions. Publish results and follow-ups.
- Attach platform SLOs to metrics and make them trackable in Grafana.
- Review weekly: fix top bottlenecks; delete vanity graphs.
Questions we hear from teams
- How do we avoid turning this into developer surveillance?
- Aggregate by team/service and time window (e.g., weekly). No individual leaderboards, no keystrokes, no IDE plugins. Publish definitions and use the data to improve platforms, not performance-review individuals.
- Do we need a data lake or a new SaaS to start?
- No. Use Git provider APIs, CI/ArgoCD Prometheus exporters, and your existing warehouse (BigQuery/Postgres). One transform layer (dbt/SQL), Grafana for visuals. You can always layer a vendor later if you outgrow this.
- Where do DORA metrics fit?
- Delivery frequency and MTTR are straight DORA. Lead time is your cycle time model (first commit to prod). Change failure rate can be derived from deployment failures and incident tags. Keep definitions explicit and versioned.
- How do we handle monorepos and multi-team ownership?
- Tag by directory or service label with CODEOWNERS and CI labels. In transforms, map paths to teams and services. Provide a team filter on the dashboard so ownership lines are clear without bespoke tracking.
- What’s the maintenance cost?
- Under a day per month if you keep to paved-road tooling. New repo? It’s just more Git events. New service? It’s just more Prom metrics. The expensive part is the fixes the dashboard reveals—worth it.
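The path-to-team mapping from the monorepo answer is a small transform, not a project. A sketch with hypothetical directory prefixes and team names, using longest-prefix-wins semantics like CODEOWNERS:

```python
# Hypothetical CODEOWNERS-style mapping: longest matching prefix wins.
OWNERS = {
    "services/payments/": "team-payments",
    "services/": "team-platform",
    "libs/": "team-core",
}

def team_for_path(path: str) -> str:
    matches = [prefix for prefix in OWNERS if path.startswith(prefix)]
    return OWNERS[max(matches, key=len)] if matches else "unowned"

print(team_for_path("services/payments/api.py"))  # → team-payments
print(team_for_path("docs/readme.md"))            # → unowned
```

In practice you’d keep this table in the dbt layer so ownership changes show up as reviewed diffs, same as any other metric definition.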
Ready to modernize your codebase?
Let GitPlumbers help you transform AI-generated chaos into clean, scalable applications.
