Culture · Nov 22, 2025 · 10 minute read

The Tech Debt Budget Your CFO Won’t Kill: Turning Cleanup into ROI Your Board Can Read

Stop negotiating debt work sprint by sprint. Set a real budget, measure returns like a portfolio, and make it part of how you lead—not a side quest engineers beg for.

Evan Marshall

Partner, GitPlumbers (ex-Netlify, Capital One, AWS)

Evan has spent 20 years untangling legacy systems and calming on-call rotations. He’s led platform and SRE teams through PCI audits, SOX-driven rewrites, and more failed ‘stabilization sprints’ than he cares to admit. These days he helps teams turn tech debt into a funded portfolio with measurable ROI.

Treat technical debt like a portfolio with policy guardrails, not a side quest you beg permission to do.


The moment you realize “we’ll fix it later” is a lie

I’ve watched teams ship a hero release, then sink six months later under flaky tests, runaway build times, and an unpredictable on-call. At one fintech, our change fail rate hit 34% and MTTR ballooned past 7 hours. The CTO kept asking for a “one-time stabilization sprint.” There is no such thing. What actually worked was turning debt into a funded budget with ROI we could show the CFO and the board.

This is how you do it without theater or new buzzwords.

Define a debt taxonomy and keep a ledger

You can’t budget what you can’t name. Create a simple taxonomy so finance and engineering speak the same language.

Reliability: flaky tests, missing circuit breakers, noisy alerts, runbook gaps
Security/Compliance: outdated libs, missing SBOM, misconfigured IAM, SOX audit gaps
Performance/Scalability: N+1 queries, hot partitions, unbounded queues, cache misses
Infrastructure: drifted Terraform, orphaned resources, manual runbooks
Data: schema debt, missing lineage, inconsistent IDs, batch failures
Developer Experience: CI slowness, IDE/project setup pain, flaky local envs
AI-generated code cleanup: vibe-coded patches, duplicate logic, ambiguous contracts

Maintain a debt ledger that ties each item to an owner, system, cost, and expected return. Don’t overthink the tool—Jira, Linear, or Azure Boards is fine. Just be consistent.

-- Jira JQL examples
project = CORE AND labels in (debt, reliability) AND statusCategory != Done

-- Time-bound portfolio view
project = CORE AND labels = debt AND updated >= -30d ORDER BY priority DESC

If it’s not in the ledger, it doesn’t exist. If it’s in the ledger, it needs an exit criterion and a date.
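
As a minimal sketch of that rule, a ledger entry can be modelled as a record that refuses to exist without an exit criterion and a date. The `DebtItem` type, its field names, and the short category keys are illustrative assumptions; in practice they would map to fields in Jira, Linear, or Azure Boards.

```python
from dataclasses import dataclass
from datetime import date

# Categories mirror the taxonomy above; the short keys are illustrative.
CATEGORIES = {
    "reliability", "security", "performance", "infrastructure",
    "data", "devex", "ai-cleanup",
}


@dataclass(frozen=True)
class DebtItem:
    """One ledger row (hypothetical shape)."""
    title: str
    category: str
    owner: str
    system: str
    cost_hours: float        # estimated investment
    expected_return: str     # e.g. "MTTR 3h -> 45m"
    exit_criterion: str
    due: date

    def __post_init__(self) -> None:
        if self.category not in CATEGORIES:
            raise ValueError(f"unknown category: {self.category!r}")
        if not self.exit_criterion.strip():
            raise ValueError("every ledger item needs an exit criterion")


def overdue(items: list[DebtItem], today: date) -> list[DebtItem]:
    """Items past their date, oldest first."""
    return sorted((i for i in items if i.due < today), key=lambda i: i.due)
```

Running `overdue` before the weekly debt standup gives the team a short list of items that have slipped past their date.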

Set the budget like a portfolio, not a pity ask

Stop negotiating debt on the fly. Establish two guardrails that leadership signs:

Capacity budget: 15–25% of dev capacity every sprint goes to debt. Non-negotiable. You can flex by team maturity—greenfield at 10–15%, legacy heavy at 25–35%.
Quarterly investments: 1–2 funded debt epics per quarter that are too big for sprint slices (e.g., migrate legacy auth to OIDC, eliminate a snowflake pipeline, containerize the last VM holdouts).
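
To make the capacity budget concrete in planning, it helps to turn the percentages into hours. The sketch below does that for one sprint; the band names, the `debt_hours` helper, and the six focus hours per day are assumptions for illustration, so plug in whatever your teams actually plan with.

```python
# Budget bands from the policy above, as (floor, ceiling) shares of capacity.
BANDS = {
    "greenfield": (0.10, 0.15),
    "default": (0.15, 0.25),
    "legacy": (0.25, 0.35),
}


def debt_hours(engineers: int, sprint_days: int, profile: str = "default",
               focus_hours_per_day: float = 6.0) -> tuple[float, float]:
    """Return the (floor, ceiling) debt hours for one sprint.

    focus_hours_per_day is an assumption, not part of the policy.
    """
    low, high = BANDS[profile]
    capacity = engineers * sprint_days * focus_hours_per_day
    return (round(capacity * low, 1), round(capacity * high, 1))


# Six engineers on a two-week (10 working day) sprint.
print(debt_hours(engineers=6, sprint_days=10))
print(debt_hours(engineers=6, sprint_days=10, profile="legacy"))
```

For that six-person team the default band works out to between 54 and 90 hours per sprint, which is a number a PM can plan around.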

How we enforce it in reality:

Each team maintains a Debt Kanban column visible in standup.
Every PR labeled type:debt counts toward the sprint budget. If the ratio drops below the floor, we escalate in the sprint review.
The CTO and VP Eng publish a one-page Debt Policy and back it in QBRs. Product partners see the policy in planning, so there’s no surprise.

# .github/workflows/tech-debt-budget.yml
name: Enforce Tech Debt Budget
on:
  pull_request:
    types: [opened, edited, labeled, unlabeled, synchronize]
jobs:
  budget:
    runs-on: ubuntu-latest
    steps:
      - name: Compute debt ratio (merged PRs among the last 50 closed)
        id: budget  # lets the next step read steps.budget.outputs.ratio
        uses: actions/github-script@v7
        with:
          script: |
            // One page only: github.paginate would walk every closed PR in the repo.
            const { data: prs } = await github.rest.pulls.list({
              owner: context.repo.owner,
              repo: context.repo.repo,
              state: 'closed',
              sort: 'updated',
              direction: 'desc',
              per_page: 50
            });
            const merged = prs.filter(p => p.merged_at);
            const debt = merged.filter(p => (p.labels || []).some(l => /type:debt/i.test(l.name)));
            const ratio = debt.length / Math.max(1, merged.length);
            core.setOutput('ratio', ratio);
      - name: Fail if below policy floor
        # Outputs are strings, so parse the ratio before comparing.
        if: ${{ fromJSON(steps.budget.outputs.ratio) < 0.15 }}
        run: |
          # Single quotes stop the shell from running `type:debt` as a command substitution.
          echo 'Debt PR ratio below 15% policy floor. Add `type:debt` work or escalate.' && exit 1

Measure ROI with numbers finance respects

No CFO cares that “the codebase feels better.” Show deltas that hit margin, risk, and speed. Pick 3–5 metrics and automate them.

Delivery (DORA): lead time for changes, deployment frequency, change fail rate, MTTR
Reliability (SLO): error budget burn, incident minutes, paging volume per service
Cost: build minutes, cloud waste (orphaned EBS, idle RDS), license consolidation
Developer Experience: CI time, flaky test rate, PR review latency, local env setup time

A simple ROI model is enough:

ROI = (Hours saved × Loaded hourly rate) + (Incidents avoided × Cost per incident) + (Cloud cost reduction) - (Investment)

Example: We replaced a bespoke queue with SQS + proper DLQs.

Investment: 3 engineers × 3 weeks × 40 hrs × $140/hr loaded ≈ $50k
Outcomes after 60 days:
- MTTR from 3h → 45m (saved ~15 incident-hours over the period at ~$1,000/hr = $15k)
- Change fail rate from 22% → 9% (saved ~20 rollbacks; ~80 eng-hours ≈ $11k)
- Cloud cost down $4k/month (waste eliminated; $8k over 60 days)
60-day ROI: ~$15k + $11k + $8k - $50k = -$16k (expected this early). 6-month ROI, if savings hold at the same rate: ~$102k - $50k = $52k positive.
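
The model is simple enough to keep in a script next to the ledger. Here is a minimal sketch using the inputs from the SQS example; the `net_return` function and its parameter names are illustrative, the incident term is expressed in incident-hours to match the example, and the 40-hour week behind the investment figure is an assumption.

```python
def net_return(hours_saved: float, loaded_rate: float,
               incident_hours_avoided: float, cost_per_incident_hour: float,
               cloud_reduction: float, investment: float) -> float:
    """Net return over one reporting window, in dollars."""
    benefits = (hours_saved * loaded_rate
                + incident_hours_avoided * cost_per_incident_hour
                + cloud_reduction)
    return benefits - investment


# 3 engineers x 3 weeks x 40 hrs (assumed) x $140/hr loaded.
investment = 3 * 3 * 40 * 140

sixty_day = net_return(
    hours_saved=80, loaded_rate=140,            # rollbacks avoided
    incident_hours_avoided=15, cost_per_incident_hour=1_000,
    cloud_reduction=2 * 4_000,                  # two months at $4k/month
    investment=investment,
)
print(investment, sixty_day)
```

With unrounded inputs the 60-day figure comes out at -$16,200, in line with the rounded -$16k above. Re-running the same function at each 30/60/90-day checkpoint keeps the report consistent from one review to the next.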

Automate the feeds:

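# Error budget burn rate (4h window) for checkout service
# Error ratio = share of requests slower than 0.5s; dividing by the budget (1 - SLO) gives the burn rate.
# The 0.001 budget assumes a 99.9% target; adjust it to your SLO.
(
  1 - sum(rate(http_request_duration_seconds_bucket{service="checkout",le="0.5"}[4h]))
    / sum(rate(http_request_duration_seconds_count{service="checkout"}[4h]))
) / 0.001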

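-- BigQuery over GH Archive (public repos only): median PR lead time, last 30 days
-- PR details live in the JSON `payload` column, so extract them explicitly.
SELECT
  APPROX_QUANTILES(
    TIMESTAMP_DIFF(
      TIMESTAMP(JSON_EXTRACT_SCALAR(payload, '$.pull_request.merged_at')),
      TIMESTAMP(JSON_EXTRACT_SCALAR(payload, '$.pull_request.created_at')),
      HOUR),
    2)[OFFSET(1)] AS median_lead_hours
FROM `githubarchive.day.20*`
WHERE repo.name = 'yourorg/yourrepo'
  AND type = 'PullRequestEvent'
  AND JSON_EXTRACT_SCALAR(payload, '$.action') = 'closed'
  AND JSON_EXTRACT_SCALAR(payload, '$.pull_request.merged') = 'true'
  AND _TABLE_SUFFIX BETWEEN FORMAT_DATE('%y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY))
                        AND FORMAT_DATE('%y%m%d', CURRENT_DATE())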
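
GH Archive only records public activity, so private repos need another feed. One option is to compute the same median from PR records pulled out of your Git host. The sketch below assumes each record carries ISO 8601 `created_at` and `merged_at` values, which is how the GitHub REST API reports them; the `median_lead_hours` helper itself is hypothetical.

```python
from datetime import datetime
from statistics import median


def _parse(ts: str) -> datetime:
    """Parse an ISO 8601 timestamp, accepting a trailing 'Z' for UTC."""
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def median_lead_hours(prs: list[dict]) -> float:
    """Median hours from PR creation to merge; unmerged PRs are skipped."""
    hours = []
    for pr in prs:
        if not pr.get("merged_at"):
            continue
        delta = _parse(pr["merged_at"]) - _parse(pr["created_at"])
        hours.append(delta.total_seconds() / 3600)
    if not hours:
        raise ValueError("no merged PRs in the sample")
    return median(hours)
```

Feed it the last 30 days of closed PRs on a schedule and chart the result next to the other DORA metrics.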

Rituals and leadership behaviors that make it stick

Policies die without muscle memory. Here’s the cadence I’ve seen work at places like Capital One, Shopify, and a FAANG I can’t name without PR.

Weekly (15 min): Debt standup per team. Review top 3 ledger items, unblock, assign owners.
Sprint review: Show the “debt % of capacity” and before/after metrics. If below the floor, the EM explains why and how we catch up.
Monthly portfolio (45–60 min): CTO, VP Eng, Product, Finance. Review ROI on top 10 investments. Decide to continue/kill/spin up. Reallocate budget openly.
Quarterly demo day: Engineers demo a debt win live. We measure applause in milliseconds of MTTR removed.

Leadership behaviors that matter:

Make it policy, not preference. Publish a one-pager and repeat it in QBRs.
Tie to OKRs and SLOs. Example key result: “Reduce checkout MTTR from 2h → 30m by Q3.”
Protect on-call. If error budgets are burning, freeze features and fix reliability first. No exceptions.
Celebrate debt work. Promote engineers for removing entire classes of incidents, not just shipping features.
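
"Freeze when error budgets are burning" works best when the trigger is written down rather than argued in the moment. A minimal sketch of one possible rule follows: burn rate is the observed error ratio divided by the budget (1 minus the SLO target), and a freeze kicks in above a threshold. The function names and the default threshold of 1.0 are assumptions; the right threshold is a policy choice for your leadership team.

```python
def burn_rate(error_ratio: float, slo_target: float) -> float:
    """How fast the error budget is being spent; 1.0 means exactly on budget."""
    budget = 1.0 - slo_target
    if budget <= 0:
        raise ValueError("slo_target must be below 1.0")
    return error_ratio / budget


def should_freeze(error_ratio: float, slo_target: float,
                  threshold: float = 1.0) -> bool:
    """True when the budget is burning faster than the policy allows.

    The threshold is an assumed policy knob, not a standard value.
    """
    return burn_rate(error_ratio, slo_target) > threshold
```

With a 99.9% target, an error ratio of 0.2% burns the budget at roughly twice the sustainable pace, so `should_freeze(0.002, 0.999)` returns `True`.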

Template your PRs for clarity:

# Debt RFC PR Template

- Category: Reliability | Security | Performance | Infra | Data | DevEx | AI Cleanup
- Problem: What pain, with links to incidents/dashboards
- Change: What we’re doing, what we’re deleting
- Exit criteria: Definition of done and rollback plan
- ROI: Before/after metrics, expected timeline, cost
- Owner: Team, on-call rotation, runbook link
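
A template only helps if the fields get filled in. As a rough sketch, a CI step could check the PR body for each field before review. The `missing_fields` helper is hypothetical, and it assumes authors keep the `- Field: value` layout from the template above.

```python
import re

REQUIRED_FIELDS = ["Category", "Problem", "Change", "Exit criteria", "ROI", "Owner"]


def missing_fields(pr_body: str) -> list[str]:
    """Return template fields that are absent or left empty in a PR body."""
    missing = []
    for field in REQUIRED_FIELDS:
        # Matches "- Field: some text" with the text on the same line.
        pattern = rf"^\s*-\s*{re.escape(field)}:[ \t]*(\S.*)$"
        if not re.search(pattern, pr_body, flags=re.MULTILINE | re.IGNORECASE):
            missing.append(field)
    return missing
```

An empty list means the RFC is complete; anything else can be echoed back to the author as a review comment.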

Tooling examples that won’t derail your teams

You don’t need a platform team the size of Netflix. Use what you have, wire it together, and automate the boring parts.

Jira/Linear: Labels debt, ai-cleanup, reliability. Dashboards grouped by service and owner.
SonarQube/CodeQL/Snyk: Track vulnerability and maintainability trends; link to debt epics. Flag suspicious PRs from AI assist tools for review.
GitHub Actions: Enforce labeling, run Renovate/Dependabot nightly, block merges when type:debt quota not met (policy floor, not hard stop forever).
Backstage: A catalog entity annotation for debt score; show SLOs and ownership.

# Backstage catalog-info.yaml snippet
metadata:
  name: checkout-service
  annotations:
    tech-debt/score: "B-"
    tech-debt/owner: "payments-platform"
    slos/target: "99.9%"  # SLO target; the error budget is the remaining 0.1%

Terraform: Tag debt investments so Finance can see cost allocation.

# Terraform tags to attribute cleanup cost
resource "aws_instance" "ci_runner" {
  ami           = var.ami
  instance_type = "m6i.large"
  tags = {
    Project     = "devex-ci-speedup"
    CostCenter  = "ENG-DEBT"
    Owner       = "platform"
  }
}

Datadog/Prometheus/Grafana: Dashboards for MTTR, change fail rate, error budget burn. Tie widgets to debt epics via links.

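# Quick flaky-test triage report (assumes the pytest-rerunfailures plugin is installed)
# Rerun failures twice; -rR lists every rerun in the summary.
pytest -q --reruns 2 -rR --durations=25 | tee test_report.txt
rg "^RERUN" test_report.txt | wc -l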
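
If you already keep per-run results from CI, another way to spot flaky tests is to compare outcomes across runs of the same commit: a test that both passed and failed without a code change is a candidate. The sketch below assumes you can export each run as a mapping of test id to pass/fail; the `find_flaky` helper and that input shape are illustrative.

```python
from collections import defaultdict


def find_flaky(runs: list[dict[str, bool]]) -> list[str]:
    """Return tests that both passed and failed across runs of the same commit.

    Each run maps a test id to True (passed) or False (failed).
    """
    outcomes: dict[str, set[bool]] = defaultdict(set)
    for run in runs:
        for test_id, passed in run.items():
            outcomes[test_id].add(passed)
    return sorted(t for t, seen in outcomes.items() if seen == {True, False})
```

Tests that fail on every run are excluded on purpose: they are broken, not flaky, and belong in a different queue.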

AI code cleanup: If you’ve got “vibe code” from overzealous AI pair programmers, designate a weekly sweep. Use CodeQL duplicate code queries and Sonar maintainability thresholds to flag PRs.

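# .github/workflows/ai-vibe-cleanup.yml
name: Vibe Code Cleanup Gate
on: [pull_request]
jobs:
  codeql-dup:
    runs-on: ubuntu-latest
    permissions: { contents: read, security-events: write }
    steps:
      - uses: actions/checkout@v4
      - uses: github/codeql-action/init@v3
      - uses: github/codeql-action/analyze@v3
  sonar:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: sonarsource/sonarqube-scan-action@v1
        env: { SONAR_TOKEN: "${{ secrets.SONAR_TOKEN }}", SONAR_HOST_URL: "${{ secrets.SONAR_HOST_URL }}" }
        with:
          # The maintainability threshold (e.g. rating B) is set in the server-side quality gate.
          args: -Dsonar.qualitygate.wait=true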

What good looks like in 2 quarters (and the traps)

After 2 quarters at a global retailer, we saw:

Debt budget at 20% sustained across 8 teams
MTTR: 2h → 38m; Change fail rate: 19% → 8%
Deployment frequency: 2/week → 8/week; Lead time: 4d → 1.2d
CI time: 28m → 11m; Cloud waste reduced by $42k/month
Incident minutes down 63%; Pager noise down 55%

Common traps I’ve personally stepped on:

Budget theater: Teams relabel feature work as debt. Fix with reviews and an approval checklist.
Zombie epics: Long-running tickets with no exit criteria. Require milestones and kill switches.
Gold-plating: Engineers rewrite for sport. Demand a business case and a 90-day check-in.
Death by metrics: Too many charts, no decisions. Pick 3–5 and wire them to actions.
No product partner: Debt traded behind closed doors will get cut. Bring PMs and Finance to the portfolio review.

If your org is already drowning—incidents, compliance heat, or AI-generated code regressions—start with a crisis sprint to stop the bleeding, then lock in the budget.

If you want a partner who’s done this in the wild

At GitPlumbers, we don’t sell refactors for sport. We build the budget, instrumentation, and rituals so your teams can fix systems and ship faster—safely. We’ve cleaned up “vibe-coded” services, retired snowflake VMs with Terraform + ArgoCD, and moved error budgets back in the green without pausing feature velocity.

If you need help standing this up, we’ll start with your metrics, not ours. Then we’ll tune the budget to your realities—compliance, quarter-end peaks, and whatever your CFO calls “non-negotiable.”

Key takeaways

Set a fixed, leadership-backed technical debt budget (capacity-based and capex-like) and stop re-litigating it every sprint.
Treat debt like a portfolio: track investment size, expected return, and time horizon; measure ROI using DORA, SLO, and cost deltas.
Ritualize the work: 15-minute weekly debt standup, monthly portfolio review with CFO/CTO, quarterly demo day tied to OKRs.
Force visibility with a debt ledger and taxonomy; label and cost-allocate work across Jira, GitHub, and cloud accounts.
Automate measurement: dashboards that show cycle time, incident minutes, error budget burn, and infra cost before/after.
Lead from the front: executives protect the budget, require pre-merge quality gates, and celebrate debt paydown like features.

Implementation checklist

Define a debt taxonomy: reliability, security, performance, infra, data, developer experience, AI-generated code cleanup.
Set budget guardrails: 15–25% of capacity each sprint + 1–2 quarterly, funded debt epics.
Create a cross-functional debt board with clear owners and exit criteria.
Automate labels in Jira/GitHub and enforce via CI (block merges if debt budget violated).
Dashboard ROI: DORA metrics, SLO error budget, incident minutes, cloud cost, license spend.
Institute rituals: weekly 15-min, monthly portfolio review, quarterly debt demo.
Tie to OKRs and SLOs; freeze features when SLOs burn.
Publish a one-page debt policy and PR template for debt RFCs.

Questions we hear from teams

How big should our technical debt budget be? Start at 15–20% of engineering capacity per sprint. If you’re running hot (MTTR > 1h, change fail rate > 15%, error budgets burning), bump to 25–35% for two quarters and review. Pair with 1–2 quarterly debt epics funded like product features.
How do we prevent engineers from relabeling features as debt? Define a taxonomy, require a brief Debt RFC for anything >2 days, and have EM/PM co-approve. Audits during the monthly portfolio review catch drift.
What if product refuses to give time to debt? Make it a leadership policy tied to SLOs and risk. When error budgets burn, features freeze. Put PMs in the monthly portfolio review so they see the ROI.
How do we quantify ROI credibly? Automate DORA and SLO metrics, quantify incident minutes saved, and track cloud and licensing cost reductions. Use a simple model, show baselines, and report over 30/60/90 days.
We have a lot of AI-generated code. Where do we start? Run CodeQL and Sonar to identify duplication and low maintainability hotspots, create a weekly ‘AI cleanup’ lane, and enforce quality gates in CI. Tie cleanup to observed bugs and incident patterns for immediate ROI.
Can we do this without adding headcount? Yes. Use a fixed capacity budget, automate detection (Renovate/Dependabot, Sonar, CodeQL), and be disciplined about quarterly investments. You’ll reinvest the time you save.
