Stop the drift: ADRs and paved roads beat bespoke tooling every time

I’ve watched teams burn quarters rewriting YAML because nobody wrote down the decision and every service is a snowflake. Here’s the boring system that works: ADRs to capture why, paved roads to make the right thing the easy thing.

> The paved road isn’t about control. It’s about making the smart choice the lazy choice.

The refactor that died in Slack (and how we revived it)

Two years ago I walked into a team with four ingress controllers across three clusters, four flavors of CI, and five Node baselines. Everyone agreed to standardize… in a slide deck that got lost when the manager changed roles. Fast forward: a security patch in node:14 became a company-wide fire drill. Every service was a snowflake. The refactor dragged nine months because no one could answer a simple question: Why did we pick this stack, and what’s the blessed replacement?

We turned it around in three moves:

  • Wrote ADRs to capture the decisions and their context.
  • Shipped a paved-road template for services with GitHub Actions, Dockerfile, Helm, and default observability.
  • Enforced light guardrails and tracked adoption with scorecards.

Six weeks later, 80% of services were on the paved road. The next refactor (OpenTelemetry exporter change) took three months, not nine. Boring? Yes. Effective? Absolutely.

ADRs: stop tribal knowledge leaks

ADRs are the minimum viable governance. They don’t need committees or Confluence archaeology. A 10–20 minute Markdown note checked into Git is enough.

What works in practice:

  • Location: docs/adr/ADR-0007-adopt-otel-collector.md in the repo; cross-link to a platform repo ADR when org-wide.
  • Naming: incremental number + verb + object. Avoid bikeshedding the format.
  • Workflow: PR includes an ADR (or links to one). 24–48 hour review. Merged ADR sets the paved-road default. No ADR? No platform change.

A lightweight template:

# ADR 0007: Adopt OpenTelemetry Collector as default tracing pipeline

- Status: Accepted
- Date: 2025-04-12
- Owners: @platform-team
- Context
  We have inconsistent tracing: Jaeger direct from apps, Zipkin in legacy, vendor agent in payments. Difficult to debug cross-service latency; cost sprawl.
- Decision
  Route app traces to OpenTelemetry Collector via OTLP/HTTP. Export to Tempo (prod) and Jaeger (dev). Ship a default sidecar-free config.
- Consequences
  + Unified traces, one egress, vendor swap without app changes.
  - Apps must upgrade `opentelemetry-*` libs and set `OTEL_EXPORTER_OTLP_ENDPOINT`.
- Alternatives Considered
  Vendor agent per node; direct-to-vendor from apps.
- Rollout Plan
  1) Update paved-road Helm chart and base image. 2) Codemod for env var. 3) Scorecard to track adoption.

Cheap automation helps. We often add a CI guard that insists significant platform changes reference an ADR in the PR title, e.g. [ADR-0007].

# .github/workflows/adr-check.yml
name: require-adr-id
on:
  pull_request:
    types: [opened, edited, synchronize]
jobs:
  check:
    runs-on: ubuntu-latest
    steps:
      - name: Require ADR id in PR title
        env:
          # Pass the title through an env var instead of interpolating it
          # into the script, so a crafted title can't inject shell commands.
          TITLE: ${{ github.event.pull_request.title }}
        run: |
          if ! echo "$TITLE" | grep -Eq 'ADR-[0-9]+'; then
            echo "PR title must include ADR-#### for platform changes"
            exit 1
          fi

Use adr-tools or log4brains if you want index pages, but don’t overcomplicate it. The win is having the why in the same repo as the code.
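If you want even less tooling, a few lines of shell can mint the next numbered ADR file. This is a hypothetical helper, not part of adr-tools; the title is a placeholder:

```shell
# Hypothetical helper: mint the next numbered ADR without extra tooling.
mkdir -p docs/adr
# Find the highest existing ADR number (empty repo -> 0).
last=$(ls docs/adr | grep -Eo 'ADR-[0-9]+' | grep -Eo '[0-9]+' | sort -n | tail -1)
next=$(printf 'ADR-%04d' $(( ${last:-0} + 1 )))
title="Adopt OpenTelemetry Collector"   # placeholder title
# Lowercase the title and turn spaces into hyphens for the filename slug.
slug=$(echo "$title" | tr 'A-Z ' 'a-z-')
printf '# %s: %s\n\n- Status: Proposed\n- Date: %s\n' \
  "$next" "$title" "$(date +%F)" > "docs/adr/${next}-${slug}.md"
```

Run it from the repo root; the new file lands next to the code it governs.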

Paved roads: make the right thing the fast thing

A paved road (golden path if you’re Spotify-pilled) is a versioned template plus tooling that makes default choices automatic. You don’t need a giant platform team. You need one good template per mainstream stack.

What goes into a service paved road (Node example):

  • Dockerfile using an approved base, non-root user, and multi-stage build.
  • GitHub Actions workflow with lint → test → build → scan → sign → push → deploy.
  • Helm chart (or Kustomize) wired for Prometheus, Tempo/OTel, and Kubernetes best practices.
  • Renovate config to keep dependencies fresh.

Before (three bespoke services):

  • 3 different Docker bases (node:12, node:14, random Alpine).
  • 2 CI providers (CircleCI, Jenkins), flaky caches, no image signing.
  • Helm charts hand-edited per repo, no default probes or resources.

After (paved road):

  • Single Docker base ghcr.io/acme/node-base:18.19, signed with cosign.
  • Single actions workflow under 8 minutes, deterministic caches.
  • Helm chart exposes standard metrics and tracing, SLOs defined.

Representative snippets from the paved road:

# Dockerfile
FROM ghcr.io/acme/node-base:18.19 AS deps
WORKDIR /app
COPY package.json pnpm-lock.yaml ./
RUN corepack enable && pnpm i --frozen-lockfile

FROM ghcr.io/acme/node-base:18.19 AS build
WORKDIR /app
COPY --from=deps /app/node_modules ./node_modules
COPY . .
RUN pnpm build

FROM gcr.io/distroless/nodejs18-debian12
WORKDIR /app
COPY --from=build /app/dist ./dist
USER 10001:10001
CMD ["/app/dist/server.js"]
# .github/workflows/service.yml
name: ci
on: [push]
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with: { node-version: 18 }
      - uses: pnpm/action-setup@v4
      - run: pnpm i --frozen-lockfile
      - run: pnpm lint && pnpm test --ci
      - run: pnpm build
      - uses: aquasecurity/trivy-action@0.21.0
        with:
          scan-type: fs
          severity: CRITICAL,HIGH
      - name: Build and push image
        env:
          CR_PAT: ${{ secrets.CR_PAT }}
        run: |
          echo "$CR_PAT" | docker login ghcr.io -u "$GITHUB_ACTOR" --password-stdin
          docker build -t ghcr.io/acme/${{ github.event.repository.name }}:${{ github.sha }} .
          docker push ghcr.io/acme/${{ github.event.repository.name }}:${{ github.sha }}
      - uses: sigstore/cosign-installer@v3
      - env:
          COSIGN_KEY: ${{ secrets.COSIGN_KEY }}
        run: cosign sign --yes --key env://COSIGN_KEY ghcr.io/acme/${{ github.event.repository.name }}:${{ github.sha }}
      - name: Deploy via ArgoCD
        run: argocd app sync ${{ github.event.repository.name }} --prune
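The Helm side ships the same kind of defaults, so teams don't hand-edit probes and resources per repo. A sketch of what the values could look like; the keys and endpoints are illustrative, not the real chart:

```yaml
# values.yaml (illustrative defaults; key names are assumptions, not the real chart)
image:
  repository: ghcr.io/acme/my-service
  tag: ""   # set by CI to the git SHA
resources:
  requests: { cpu: 100m, memory: 128Mi }
  limits: { cpu: 500m, memory: 256Mi }
probes:
  liveness: { httpGet: { path: /healthz, port: http } }
  readiness: { httpGet: { path: /readyz, port: http } }
observability:
  metrics: { enabled: true, port: 9090 }
  tracing:
    otlpEndpoint: http://otel-collector.observability:4318
```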

Scaffolding it with Backstage or Cookiecutter keeps it consistent:

# backstage template (excerpt)
apiVersion: scaffolder.backstage.io/v1beta3
kind: Template
metadata:
  name: node-service
spec:
  owner: platform
  steps:
    - id: fetch
      action: fetch:template
      input:
        url: ./skeleton
    - id: publish
      action: publish:github
      input:
        repoUrl: github.com?owner=acme&repo=${{ parameters.name }}

Guardrails: keep teams on the road without a police state

You don’t need draconian gates. You need a few policy-as-code checks and a single `make verify` target that everyone runs locally and in CI.

  • Kubernetes policy: conftest with OPA to enforce base images, no privileged pods, required probes.
  • Infrastructure: tflint, tfsec, checkov for Terraform.
  • App layer: centralized eslint-config, pre-commit hooks, buf for Protobuf.

Example OPA policy for Kubernetes:

# policy/deployments.rego
package k8s

deny[msg] {
  input.kind == "Deployment"
  c := input.spec.template.spec.containers[_]
  not startswith(c.image, "ghcr.io/acme/")
  msg = "Containers must come from the approved ghcr.io/acme registry"
}

deny[msg] {
  input.kind == "Deployment"
  input.spec.template.spec.containers[_].securityContext.privileged == true
  msg = "Privileged containers are not allowed"
}
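As a sanity check, here is a manifest both rules above would deny; the image and flag were chosen to trip them:

```yaml
# bad-deployment.yaml — conftest reports two denials for this input
apiVersion: apps/v1
kind: Deployment
spec:
  template:
    spec:
      containers:
        - name: app
          image: docker.io/library/node:14   # not from the approved registry
          securityContext:
            privileged: true                 # privileged is not allowed
```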
# Makefile target wired in CI
verify:
	conftest test -p policy k8s/*.yaml
	tflint --config=./.tflint.hcl
	tfsec .
	pnpm lint && pnpm test

If your guardrails block more than ~5% of PRs, your paved road is wrong or your rules are too strict. Fix the defaults first.

Safe refactors at scale: ADR → pave → codemod → scorecard

Refactors die when they rely on optimism and calendar time. The loop that works:

  1. ADR: capture intent, scope, and rollback.
  2. Paved road update: new template, base image, and Helm config.
  3. Codemod/automation: make 80% of changes mechanical.
  4. Scorecard: visible progress per service; give teams a path and a deadline.

A concrete example: switching from direct Jaeger to OTLP via Collector.

  • ADR sets the decision and rollout plan.
  • Paved road’s Helm chart adds default OTLP exporter.
  • Codemod updates environment variables and deps.
  • Scorecard shows which services are still on the old exporter.
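Concretely, the paved-road chart can default the standard OpenTelemetry environment variables so apps only need the library upgrade; the endpoint below is an assumption about your collector’s service name:

```yaml
# Standard OTel env vars, defaulted by the chart (endpoint is an assumption)
env:
  - name: OTEL_EXPORTER_OTLP_ENDPOINT
    value: http://otel-collector.observability:4318
  - name: OTEL_EXPORTER_OTLP_PROTOCOL
    value: http/protobuf
  - name: OTEL_TRACES_EXPORTER
    value: otlp
```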

Codemod (TypeScript) example to migrate uuid usage to crypto.randomUUID() as part of a runtime upgrade:

// codemods/uuid-to-randomuuid.ts
import { API, FileInfo } from 'jscodeshift'
export default function transformer(file: FileInfo, api: API) {
  const j = api.jscodeshift
  const root = j(file.source)

  // Remove import { v4 as uuidv4 } from 'uuid'
  const uuidImports = root.find(j.ImportDeclaration, { source: { value: 'uuid' } })
  const hadUuid = uuidImports.size() > 0
  uuidImports.forEach(path => j(path).remove())

  // Add import { randomUUID } from 'node:crypto' — but only to files
  // that actually imported uuid, so untouched files stay untouched.
  if (hadUuid) {
    root.find(j.Program).forEach(p => {
      p.value.body.unshift(j.importDeclaration([
        j.importSpecifier(j.identifier('randomUUID'))
      ], j.literal('node:crypto')))
    })
  }

  // Replace uuidv4() calls with randomUUID()
  root.find(j.CallExpression, { callee: { name: 'uuidv4' } })
    .replaceWith(() => j.callExpression(j.identifier('randomUUID'), []))

  return root.toSource()
}

Backstage scorecard (conceptual) to track adoption:

# catalog-info.yaml (annotation-based checks)
metadata:
  annotations:
    scorecards.gitplumbers.io/has-otel: "true"
    scorecards.gitplumbers.io/on-paved-road: "true"

Or use a simple dashboard that reads repo signals (presence of Dockerfile label, workflow file hash, chart version) and renders green/yellow/red.
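The dashboard can be as small as a script that counts paved-road markers per checkout. A minimal sketch; the file names are assumptions about what your template ships:

```shell
# Minimal drift signal: count paved-road markers present in the current repo.
# (Marker file names are assumptions about the template.)
score=0
[ -f Dockerfile ] && score=$((score+1))
[ -f .github/workflows/service.yml ] && score=$((score+1))
[ -f renovate.json ] && score=$((score+1))
case "$score" in
  3) status=green ;;
  2) status=yellow ;;
  *) status=red ;;
esac
echo "$status" > scorecard.txt
```

Run it against a checkout of every repo and aggregate the results into the weekly dashboard.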

Finally, let Renovate do the nagging. Paved-road repos include default Renovate rules:

{
  "$schema": "https://docs.renovatebot.com/renovate-schema.json",
  "extends": ["config:recommended"],
  "packageRules": [
    { "matchDatasources": ["docker"], "groupName": "base-image", "schedule": ["after 9pm on sunday"] },
    { "matchUpdateTypes": ["minor", "patch"], "automerge": true }
  ]
}

Cost/benefit: the boring math

  • Without ADRs/paved roads: Every team spends 1–2 days per quarter rediscovering decisions; refactors require war rooms; incident MTTR is longer because telemetry varies. Velocity looks okay until security or compliance show up.
  • With ADRs/paved roads: One platform engineer maintains templates and guardrails. Teams click “Create service,” get a working baseline in minutes, and spend time on business logic.

A recent engagement (composite of two clients):

  • Paved road adoption: 15% → 78% in 8 weeks.
  • Average CI time: 18m → 8m after standardizing caches and build steps.
  • Refactor lead time (Node 16 → 18): 9 months → 3 months using codemods and scorecards.
  • MTTR on app incidents: 45m → 27m after standardizing logging/tracing.

The trade-off: the paved road won’t fit every edge case. That’s fine. Give teams an “escape hatch” with constraints: document the deviation in an ADR and plan the cost to re-merge later.

Lessons learned (and what I’d do differently)

  • Version your paved road (v1, v2, …) and publish a changelog. Upgrades become normal PRs, not migrations.
  • Don’t build a framework. Ship templates and modules. When we tried to ship a platform framework, we created tight coupling and slowed delivery.
  • Make exceptions boring: an ADR in the service repo explaining the deviation and an expiry date. Revisit it quarterly.
  • Guardrails before guidance: people read errors; they skim docs. Lint over lecture.
  • Scorecards visible to leadership: adoption moves when directors see red turning green.
  • Retire old roads: archive templates, mark deprecated chart versions, and remove old CI examples.

If your platform feels like a product with versions, release notes, and support windows—you’re doing it right.



Key takeaways

  • ADRs capture the why so refactors don’t stall when people rotate or memory fades.
  • Paved roads make the good path fast and boring; bespoke tooling multiplies cognitive and maintenance costs.
  • Guardrails (policy-as-code, linters, CI gates) keep the fleet on the road without becoming cops.
  • Safe refactors at scale require an ADR, a paved-road update, codemods, and a scorecard to track adoption.
  • Measure drift with visible scorecards and invest in defaults, not frameworks.

Implementation checklist

  • Adopt a lightweight ADR template and make it part of PR hygiene.
  • Publish a paved-road template per primary stack (service, job, lib) and version it.
  • Automate guardrails with OPA/Conftest, linters, and a single `make verify` target.
  • Plan refactors with codemods and runbooks; track progress with scorecards in Backstage.
  • Enable Renovate/Dependabot and pin base images and modules to paved-road defaults.
  • Report monthly on paved-road adoption, CI time, and drift hotspots.

Questions we hear from teams

What’s the minimum viable ADR process?
A Markdown template in the repo, a PR label like `architecture`, and a 24–48 hour review SLA. Merge the ADR, then ship the change in the paved road. No committees.
How do we handle teams that truly can’t use the paved road?
Let them deviate with an ADR explaining why, the cost to maintain the snowflake, and a review date. Provide a thin compatibility layer if they’re on critical path (e.g., a different base image) but keep guardrails.
Backstage vs Cookiecutter vs Projen/Nx?
Pick one that matches your stack and culture. Backstage is great for discovery and scorecards. Cookiecutter is dead simple for scaffolding. Projen works well in TypeScript shops. Don’t mix three—your paved road should be one lane.
How do we measure drift?
Scorecards that evaluate repos on signals: template version, workflow hash, chart version, presence of ADRs, policy passes. Publish a weekly dashboard. Aim for 70%+ on-road in 8–12 weeks.
What about legacy monoliths?
Create a paved road for the monolith, too: standardized build, observability, and deploy. Use ADRs to document module boundaries and change plans. Safe refactors still follow ADR → pave → codemod → scorecard.

Ready to modernize your codebase?

Let GitPlumbers help you transform AI-generated chaos into clean, scalable applications.

  • Talk to GitPlumbers about paved roads that stick
  • See our reference paved-road templates
