The Platform That Did Less and Shipped More: A Just‑Enough Paved Road for Unblocking Product Teams

Stop empire-building. Give teams a boring, reliable paved road with sane defaults, guardrails, and zero tickets. That’s the “just‑enough platform” that actually moves the needle.

Do the boring thing, repeatably. Your teams will thank you.

The anti-pattern I keep seeing: Platform as empire

I’ve watched smart teams burn a year building a bespoke "internal platform"—custom CLI, in-house UI, microservices mesh everywhere—only to land in the same place: product teams still file tickets to ship. The platform team becomes a help desk with root. The business still misses quarters.

What actually worked at the places that broke out—fintechs on AWS, SaaS companies on GKE, a health-tech on Azure—wasn’t a grand platform. It was a just‑enough paved road: boring, well-lit defaults, with guardrails you don’t notice unless you try to do something reckless.

What “just‑enough platform” actually means

A just‑enough platform isn’t a product you go buy or a 12-month program. It’s a minimal set of paved-road decisions that make the common path smooth and the weird path possible.

  • Standardize the deploy path: Git push → CI builds and scans → image published → GitOps sync to envs. No bespoke CLIs, no ticket approvals.

  • Offer two runtimes max: e.g., k8s (ArgoCD) for long-running services and Fargate/Cloud Run for simple web/cron. Everything else is “you build, you run,” with a clearly lower support SLA.

  • Self-service scaffolding: Backstage templates or GitHub repo templates generate a working service with Dockerfile, Helm/Kustomize, and a ci.yaml that calls a reusable workflow.

  • Guardrails, not gates: org policies, OPA/Conftest checks, and required CI checks. No "Platform Jira" for basic deploys.

  • Out-of-the-box observability: logs, metrics, traces via OpenTelemetry exporters baked into templates; dashboards pre-wired in Grafana and alerts in PagerDuty.

  • OIDC everywhere: ephemeral credentials from CI to cloud. Kill long-lived keys.
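Killing long-lived keys is mostly a one-time trust policy on the cloud side. A minimal sketch of the AWS IAM trust policy for GitHub’s OIDC provider, assuming an acme org and an illustrative role/account; in practice, tighten the sub claim per repo:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::123456789012:oidc-provider/token.actions.githubusercontent.com"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "token.actions.githubusercontent.com:aud": "sts.amazonaws.com"
        },
        "StringLike": {
          "token.actions.githubusercontent.com:sub": "repo:acme/*:ref:refs/heads/main"
        }
      }
    }
  ]
}
```

CI assumes the role via aws-actions/configure-aws-credentials and gets short-lived credentials: nothing to rotate, nothing to leak.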

Trade-offs (the honesty section):

  • You won’t please every team. That’s fine. Exceptions get a written RFC and an expiry date.

  • You’ll delay shiny toys—service mesh, Crossplane, custom UIs—until the basics are boring. Shipping product beats platform vanity metrics.

  • You’ll pick winners: two languages, two runtimes. That’s leadership, not authoritarianism.

Before/After: what changed when we did less

At a B2B SaaS (120 engineers, AWS + Kubernetes), we replaced a platform that tried to do everything with a paved road that did the least possible well.

Before

  • Ticket-driven deploys: platform merged release branches and kubectl’d prod.

  • Jenkinsfile per repo; 14 variations of "build and push" logic; static AWS keys in org secrets.

  • Three runtime patterns (EKS, Fargate, Lambda), five base images, four scanning tools; no shared templates.

  • Onboarding to first prod deploy: 4–6 weeks. Lead time for changes: 3+ days. MTTR: 2h+. Build minutes per change: ~70.

After (90 days)

  • GitHub Actions with a single reusable workflow; OIDC to AWS; images scanned with trivy; SBOM generated.

  • ArgoCD app-of-apps managed envs; per-team folders in iac-envs repo; no manual prod access.

  • Two blessed runtimes: k8s (EKS) and Fargate for simple jobs/cron. Everything else required an RFC.

  • Backstage for repo templates and service catalog; no custom UI.

  • Onboarding to first prod deploy: 2–3 days. Lead time: < 4 hours. MTTR: ~25 minutes. Build minutes per change: ~38. Change failure rate dropped 30%.

  • Platform team ticket volume dropped 60%. Infra spend per service down 12% by consolidating images and right-sizing defaults.

The paved road, in concrete pieces

You don’t need much. Here’s the minimal reference that’s worked at multiple clients.

  1. A service template per language

Each template ships with everything wired to the reusable CI, image scanning, and GitOps.

# bootstrap a new service (no UI required)
gh repo create acme/my-service --template acme/service-template-nodejs --private
# service-template/.github/workflows/ci.yaml
name: ci
on:
  push:
    branches: [main]
  pull_request:
permissions:
  contents: write # the reusable workflow pushes the GitOps values bump
  id-token: write
jobs:
  call:
    uses: acme/platform-workflows/.github/workflows/ci-reusable.yaml@v2
    with:
      app_name: my-service
# service-template/deploy/helm/values.yaml
image:
  repository: 123456789012.dkr.ecr.us-east-1.amazonaws.com/my-service
  tag: "latest" # placeholder; CI rewrites this to the git SHA (values files aren't templated)
resources:
  requests:
    cpu: "100m"
    memory: "128Mi"
  limits:
    cpu: "500m"
    memory: "256Mi"
securityContext:
  runAsNonRoot: true
  allowPrivilegeEscalation: false
  2. A single reusable CI workflow
# platform-workflows/.github/workflows/ci-reusable.yaml
name: reusable-ci
on:
  workflow_call:
    inputs:
      app_name:
        required: true
        type: string
jobs:
  build-test-scan:
    runs-on: ubuntu-latest
    permissions:
      contents: write # write so the GitOps step below can commit the values bump
      id-token: write
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version-file: '.nvmrc'
      - run: npm ci
      - run: npm test -- --ci
      - name: Build image
        run: docker build -t ${{ inputs.app_name }}:${{ github.sha }} .
      - name: Scan image
        uses: aquasecurity/trivy-action@0.20.0
        with:
          image-ref: ${{ inputs.app_name }}:${{ github.sha }}
          ignore-unfixed: true
      - name: Generate SBOM
        uses: anchore/sbom-action@v0
        with:
          artifact-name: sbom-${{ inputs.app_name }}-${{ github.sha }}.spdx.json
      - name: Configure AWS OIDC
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::123456789012:role/gha-oidc-deploy
          aws-region: us-east-1
      - name: Login to ECR
        uses: aws-actions/amazon-ecr-login@v2
      - name: Push image
        run: |
          IMAGE=$(aws ecr describe-repositories --repository-names ${{ inputs.app_name }} --query 'repositories[0].repositoryUri' --output text)
          docker tag ${{ inputs.app_name }}:${{ github.sha }} $IMAGE:${{ github.sha }}
          docker push $IMAGE:${{ github.sha }}
      - name: Update Helm values (GitOps)
        run: |
          sed -i "s/tag: .*/tag: \"${{ github.sha }}\"/" deploy/helm/values.yaml
          git config user.email "bot@acme.com"
          git config user.name "ci-bot"
          git commit -am "chore: bump image ${{ inputs.app_name }}@${{ github.sha }}"
          git push
  3. GitOps with ArgoCD
# iac-envs/apps/team-foo/my-service.yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: my-service
spec:
  project: team-foo
  source:
    repoURL: https://github.com/acme/my-service
    path: deploy/helm
    targetRevision: main
    helm:
      valueFiles:
        - values.yaml
  destination:
    server: https://kubernetes.default.svc
    namespace: team-foo
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true
  4. Guardrails via policy-as-code
# policy/k8s_security.rego
package k8s.security

deny[msg] {
  input.kind == "Deployment"
  not input.spec.template.spec.securityContext.runAsNonRoot
  msg := "Deployment must set securityContext.runAsNonRoot=true"
}

deny[msg] {
  input.kind == "Deployment"
  container := input.spec.template.spec.containers[_]
  not container.resources.requests.cpu
  msg := "Containers must specify resource requests"
}
# validate in CI (render the chart first; raw Helm templates aren't valid YAML)
helm template deploy/helm | conftest test -p policy -
  5. Light-touch onboarding with Backstage (optional)
# backstage/templates/service-nodejs.yaml
apiVersion: scaffolder.backstage.io/v1beta3
kind: Template
metadata:
  name: service-nodejs
spec:
  owner: platform
  type: service
  steps:
    - id: fetch
      action: fetch:template
      input:
        url: ./content
    - id: publish
      action: publish:github
      input:
        repoUrl: github.com?owner=acme&repo=${{ parameters.name }}

You can do the same with plain GitHub repo templates if Backstage is overkill for your size right now.

Guardrails without gates: keep humans out of the happy path

Approval queues and platform tickets feel “safe” and burn calendar time. Put safety in code and policy instead.

  • Branch protections: require status checks, enforce CODEOWNERS for sensitive folders, disallow force pushes across the org. Manage with Terraform’s github provider so drift is visible in PRs.

  • Sensible defaults: base images with USER nonroot, platform package that wires OpenTelemetry and HTTP timeouts. Ship it in the template, not a wiki.

  • Runtime SLOs: e.g., P95 <= 400ms, Availability 99.9%. Teams own their SLOs; platform owns the shared reliability. Alert on SLO burn, not CPU percent.

  • Exception process: the door for off-road exists, but requires an RFC with a rollback plan and a sunset date. No permanent snowflakes.
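The “alert on SLO burn” bullet translates directly into a multiwindow burn-rate rule. A sketch in Prometheus rule form, assuming a conventional http_requests_total counter and a 99.9% availability SLO (14.4x is the standard fast-burn factor; your metric names will differ):

```yaml
# page when the error budget burns ~14.4x too fast,
# checked over two windows to avoid flapping
groups:
  - name: slo-burn
    rules:
      - alert: HighErrorBudgetBurn
        expr: |
          (
            sum(rate(http_requests_total{code=~"5.."}[1h]))
              / sum(rate(http_requests_total[1h])) > 14.4 * 0.001
          and
            sum(rate(http_requests_total{code=~"5.."}[5m]))
              / sum(rate(http_requests_total[5m])) > 14.4 * 0.001
          )
        for: 2m
        labels:
          severity: page
```

Ship this in the template’s default dashboards so teams inherit it, rather than documenting it on a wiki.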

Quick example enforcing branch protection as code:

# terraform/github/branch_protection.tf
resource "github_branch_protection" "main" {
  repository_id  = github_repository.repo.node_id
  pattern        = "main"
  required_status_checks {
    strict   = true
    contexts = ["ci", "trivy", "conftest"]
  }
  required_pull_request_reviews {
    required_approving_review_count = 1
    require_code_owner_reviews      = true
  }
  enforce_admins = true
}

What to standardize vs. what to leave alone

Standardize the boring, high-leverage stuff. Leave product architecture and iteration speed to teams.

  • Standardize:
    • CI/release workflow (GitHub Actions reusable workflows, OIDC to cloud)
    • Deploy mechanism (ArgoCD GitOps, helm/kustomize)
    • Base images, security scanners (trivy), SBOM generation
    • Observability (OpenTelemetry exporter, Grafana dashboards, log schema)
    • Secrets management (SSO + AWS IAM/GCP IAM, Vault if you must)
    • Environments and namespaces conventions
  • Decentralize:
    • Service boundaries and data models
    • Language within a small allowed set (e.g., Go and Node.js)
    • Feature flags strategy (LaunchDarkly, OpenFeature) per team’s needs
    • Rollout tactics (blue/green vs. canary) within supported patterns

Things we explicitly chose not to centralize (at first):

  • Service mesh: we waited until mTLS and traffic shaping justified the complexity. Istio came later, with a single managed profile.
  • Crossplane/infra CRDs: started with Terraform modules. Introduced Crossplane once teams were fluent in GitOps.
  • Custom platform UIs: GitHub + docs + Backstage templates beat a one-off portal we’d have to maintain forever.

Rollout playbook: 30 / 60 / 90

You don’t need an org reorg. You need a tight, incremental rollout and hard metrics.

  1. First 30 days
    • Pick two teams as design partners. Lock in two runtimes and one language template.
    • Stand up the reusable CI workflow, OIDC, and ArgoCD. Ship the Node.js template.
    • Instrument DORA metrics: lead time, deployment frequency, change failure rate, MTTR.
  2. Days 31–60
    • Migrate 5–8 services onto the paved road. Kill static keys.
    • Enforce policy-as-code; add required checks to org.
    • Publish a migration guide and an RFC template for exceptions.
  3. Days 61–90
    • Expand templates (Go + cron jobs). Add baseline dashboards and SLOs.
    • Shut down legacy deploy paths (Jenkins, manual kubectl).
    • Review metrics in a public doc. Celebrate wins; publish misses and next steps.
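For the DORA instrumentation in the first 30 days, don’t wait for a fancy tool; a CSV of commit and deploy timestamps gets you a baseline. A minimal sketch, assuming epoch-second pairs exported from your CI (file name and format are illustrative):

```shell
# avg lead time for changes from a CSV of commit_ts,deploy_ts (epoch seconds)
awk -F, '{ sum += $2 - $1; n++ } END { printf "avg lead time: %.0f min\n", sum / n / 60 }' leadtimes.csv
```

Run it weekly and publish the trend; the number moving is what buys patience for the rest of the rollout.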

Targets I’ve used that didn’t get me laughed out of the room:

  • Onboarding time to first prod deploy: < 5 days
  • Lead time for changes: < 1 day
  • MTTR: < 45 minutes
  • Ticket count to platform team: -50% within 2 quarters

Signs you’ve nailed “just‑enough”—and what to do next

  • Engineers can spin up a service in an hour and ship to staging by lunch without asking anyone for permission.
  • The platform backlog is mostly templates and guardrails, not bespoke integrations.
  • Auditors are bored—in a good way—because SBOMs, OIDC, and policy checks show up in PRs.
  • You have one or two polite rebels. Their RFCs are good. You learn from each other.

Next steps once the basics are boring:

  • Add canary automation (Argo Rollouts), progressive delivery, and traffic mirroring if your incident review demands it.
  • Introduce Backstage Tech Insights for scorecards (SLO coverage, runtime parity).
  • Consider a service mesh when you hit real problems (zero-trust, multi-tenant MTLS, traffic shaping)—not before.
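When canaries do earn their keep, the Argo Rollouts strategy block is small. A sketch, assuming Argo Rollouts is installed; the weights and pauses here are illustrative, not recommendations:

```yaml
# progressive delivery: shift 10% of traffic, pause, then 50%, then full
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: my-service
spec:
  strategy:
    canary:
      steps:
        - setWeight: 10
        - pause: {duration: 5m}
        - setWeight: 50
        - pause: {duration: 10m}
```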

Do the boring thing, repeatably. Your teams will thank you. And your CFO will actually send you a holiday card.

If you want a pair of steady hands, this is the kind of work GitPlumbers does every week—standing up just‑enough platforms that unblock product teams without creating a new bottleneck.


Key takeaways

  • A just‑enough platform standardizes the 20% that unlocks 80% of delivery: pipelines, deploy flow, observability, and baseline security.
  • Favor a paved road with templates and reusable workflows over bespoke CLIs and ticket-driven “platform as a gate.”
  • Guardrails beat gates: policy-as-code, required checks, and minimal runtime choices reduce cognitive load without stifling autonomy.
  • Ship a thin slice in 30–60 days, then iterate. Measure lead time, MTTR, and onboarding time to keep the platform honest.
  • Centralize defaults and guardrails. Decentralize repo ownership, service design, and day‑2 operations inside clear SLOs.

Implementation checklist

  • Define two blessed runtimes (e.g., `k8s` and `Fargate` or `Cloud Run`). Kill all other snowflakes.
  • Publish one service template per language with `Dockerfile`, `Helm`/`Kustomize`, and a `ci.yaml` that calls a reusable workflow.
  • Adopt GitHub Actions OIDC to your cloud; remove static cloud keys from repos and CI.
  • Install ArgoCD and manage deploys via GitOps; no manual kubectl in prod.
  • Enforce guardrails with OPA/Conftest and org-level branch protections; no ticket approvals for common paths.
  • Stand up a lightweight Backstage (or plain repo templates) for self‑service scaffolding.
  • Instrument platform KPIs: onboarding time, build minutes per change, change failure rate, MTTR, infra spend per service.

Questions we hear from teams

How do I convince teams to use the paved road without a mandate?
Make the paved road obviously faster. Ship templates that get a service to staging in hours, not days. Publish side-by-side metrics (lead time, build minutes). Add nice-to-haves (auto dashboards, default alerts) that teams don’t want to rebuild. Keep the door open for exceptions via RFCs—but make the paved road the path of least resistance.
Is Backstage required for a just‑enough platform?
No. It helps at scale, but repo templates plus good docs are enough to start. We typically add Backstage once you hit 50–100 services and need a catalog, scorecards, and scaffolding UX.
What about multi-cloud or hybrid?
Treat each runtime as a product. If you truly must be multi-cloud, keep the paved road consistent (CI, policy, observability) and swap the deploy adapter (EKS vs. GKE vs. AKS). Don’t chase lowest common denominator abstractions early; they slow everyone down.
How do we keep security on side without gatekeeping?
Move controls left with policy-as-code (OPA/Conftest), SBOMs, and required checks. Give security visibility in PRs and GitOps repos. Reserve manual approvals for sensitive changes (e.g., PCI boundary), not routine deploys.

