The Quality Gate That Paid For Itself In One Sprint: Paved-Road Defaults That Stop Tech Debt At The PR
If your CI doesn’t block low-signal code from merging, you’re paying compounding interest on technical debt. Here’s the minimal, boring set of automated gates that actually work—and the ones I’ve watched blow up teams.
Stop debating code style in PRs. Make the bot reject it so humans can talk about risk, complexity, and business value.
The PR that looked fine… until prod caught fire
I watched a team at a fintech merge a “simple refactor” on a Friday. No tests failed (because there weren’t many), lint warnings were “non-blocking,” and CI only checked that Docker built. Monday morning we had a lovely incident: a transaction reconciliation job was silently dropping rows due to an unchecked NaN. MTTR was 9 hours because we had zero safety rails in CI. I’ve seen this movie at SaaS unicorns and 50-person shops. Different logos, same pattern: if your quality gates don’t block merges, they’re a suggestion—aka technical debt with a bow on it.
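For illustration, the failure class in that incident reduces to a property of `NaN`: it fails every comparison, so a filter silently drops the row instead of raising. A contrived sketch (the real job was more involved; the data here is hypothetical):

```python
# Hypothetical reconciliation filter: NaN compares False against
# everything, so the bad row vanishes with no error and no log line.
rows = [{"id": 1, "amount": 10.0}, {"id": 2, "amount": float("nan")}]

# "keep valid amounts" silently discards the NaN row
kept = [r for r in rows if r["amount"] >= 0]

print([r["id"] for r in kept])  # [1] -- row 2 is gone, nothing raised
```

A unit test over a fixture with a `NaN` amount would have turned this into a red build instead of a Monday incident.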
Why this matters more in the AI-assisted era
AI-assisted commits are faster—and sloppier when unguarded. I’ve cleaned up too many “vibe coded” PRs where the AI hallucinated APIs, skipped edge cases, and copied insecure examples. When gates are strong, AI becomes a power tool. When they’re weak, you’re doing unpaid interest-only payments on tech debt.
Without blocking gates: lead time looks great until escaped defects, rework, and on-call pain eat your roadmap.
With boring, automated gates: devs ship confidently, and you stop arguing about style and nits. PR reviews focus on behavior and risk, not whitespace.
The minimal paved road that works (Day 1)
Here’s the set I deploy first. No bespoke magic. Just platform defaults and off-the-shelf linters.
- Tests + coverage threshold: fail if coverage on changed lines drops (diff coverage) and enforce a floor (start at 60–70%, ratchet up).
- Lint + format: `eslint --max-warnings 0` and `prettier --check` (or `ruff`/`black`, `golangci-lint`, `rustfmt`/`clippy`).
- Type-check: `tsc --noEmit`, `mypy`, `go vet` to catch "obviously wrong" code before runtime.
- SAST + SCA: CodeQL (or GitLab SAST) and Dependabot (or GitLab Dependency Scanning). Low false positives; blocking only for high/critical findings.
- Branch protection: required checks on `main` and release branches. No admin overrides without a change record.
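For illustration, the diff-coverage rule reduces to set intersection over changed lines. A minimal sketch (the data shapes are hypothetical; in practice a tool like diff-cover derives them from `git diff` and your coverage report):

```python
def diff_coverage(changed, covered):
    """Percent of changed lines that are covered by tests.

    changed: {path: set of changed line numbers} (e.g. from `git diff -U0`)
    covered: {path: set of executed line numbers} (e.g. from an lcov report)
    """
    total = hit = 0
    for path, lines in changed.items():
        total += len(lines)
        hit += len(lines & covered.get(path, set()))
    # No changed lines means nothing to gate on: treat as passing.
    return 100.0 if total == 0 else 100.0 * hit / total

# Hypothetical PR: four changed lines, three of them exercised by tests.
changed = {"src/recon.py": {10, 11, 12, 13}}
covered = {"src/recon.py": {10, 11, 12}}
pct = diff_coverage(changed, covered)
print(f"{pct:.1f}")  # 75.0
assert pct >= 70, "diff coverage below floor"
```

The point of gating on the diff rather than the whole repo is that legacy code doesn't block the PR; only the lines you just touched do.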
Keep it one command for dev ergonomics:
```make
# Makefile
.PHONY: check
check: test lint type format

.PHONY: test
test:
	npx jest --ci --coverage --runInBand

.PHONY: lint
lint:
	npx eslint . --max-warnings 0

.PHONY: type
type:
	npx tsc --noEmit

.PHONY: format
format:
	npx prettier --check .
```

Wire it into CI so it actually blocks
Don’t let this live in Confluence. Codify it in CI and branch protection. Default to GitHub Actions; GitLab CI works just as well. Keep stages parallel and fast.
```yaml
# .github/workflows/quality.yml
name: quality-gates
on:
  pull_request:
    types: [opened, synchronize, reopened]
jobs:
  node-ci:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
          cache: 'npm'
      - run: npm ci
      - name: Lint
        run: npx eslint . --max-warnings 0
      - name: Format check
        run: npx prettier --check .
      - name: Type check
        run: npx tsc --noEmit
      - name: Test with coverage
        run: npx jest --ci --coverage --coverageReporters=json-summary
      - name: Enforce coverage floor (70%)
        run: |
          COVER=$(node -e "console.log(require('./coverage/coverage-summary.json').total.lines.pct)")
          THRESHOLD=${COVERAGE_THRESHOLD:-70}
          awk -v c="$COVER" -v t="$THRESHOLD" 'BEGIN { if (c+0 < t+0) { printf "Coverage %.2f < %.2f\n", c, t; exit 1 } printf "Coverage %.2f >= %.2f\n", c, t }'
```

Add SAST and SCA without yak shaving:
```yaml
# .github/workflows/security.yml
name: security
on:
  push:
    branches: [main]
  schedule:
    - cron: '0 3 * * *'
jobs:
  codeql:
    runs-on: ubuntu-latest
    permissions:
      security-events: write
    steps:
      - uses: actions/checkout@v4
      - uses: github/codeql-action/init@v3
        with:
          languages: javascript
      - uses: github/codeql-action/analyze@v3
  deps:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with: { node-version: 20 }
      - run: npm audit --audit-level=high
```

Lock it down with branch protection (GitHub):
```shell
# require checks to pass on main
gh api -X PUT "repos/{owner}/{repo}/branches/main/protection" \
  -F 'required_status_checks[strict]=true' \
  -f 'required_status_checks[contexts][]=node-ci' \
  -F enforce_admins=true \
  -F 'required_pull_request_reviews[dismiss_stale_reviews]=true' \
  -F restrictions=null
```

Note that for GitHub Actions the required check context is the job name (`node-ci` here), not the workflow name. GitLab? Same idea.
```yaml
# .gitlab-ci.yml
include:
  - template: Security/SAST.gitlab-ci.yml  # enable built-in GitLab SAST jobs

stages: [lint, test, security]

lint:
  stage: lint
  image: node:20
  script:
    - npm ci
    - npx eslint . --max-warnings 0
    - npx prettier --check .

unit:
  stage: test
  image: node:20
  script:
    - npm ci
    - npx tsc --noEmit
    - npx jest --ci --coverage
  coverage: '/All files[^|]*\|[^|]*\s+([\d\.]+)/'
```

What to measure (and what to ignore)
I’ve seen teams chase a “Sonar quality gate” score for a quarter and still ship outages. Focus on signal that moves business outcomes.
- Must have: tests pass, coverage floor plus no regression on diffs, zero lint errors, type-check OK, SAST/SCA clean for high/critical.
- Nice to have: cyclomatic complexity within team-defined bounds, bundle size budgets on web, `docker scout` (successor to the deprecated `docker scan`) for base images.
- Ignore (at first): generic "code smells," magic maintainability numbers, automated architecture diagrams. They distract from blocking the obvious foot-guns.
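Gating SCA on severity takes only a few lines once you have JSON output. A sketch against `npm audit --json`, assuming the `metadata.vulnerabilities` summary shape (treat the exact shape as an assumption; verify against your npm version):

```python
import json
import sys

def high_critical_count(audit_json: str) -> int:
    # npm audit --json includes a severity summary under
    # metadata.vulnerabilities (shape assumed here).
    counts = json.loads(audit_json)["metadata"]["vulnerabilities"]
    return counts.get("high", 0) + counts.get("critical", 0)

# Hypothetical report: noisy low/moderate findings, nothing blocking.
report = '{"metadata": {"vulnerabilities": {"low": 3, "moderate": 1, "high": 0, "critical": 0}}}'
if high_critical_count(report) > 0:
    sys.exit("blocking: high/critical vulnerabilities found")
print("gate passed: no high/critical vulnerabilities")
```

The design choice matters: low/moderate findings stay visible but non-blocking, which keeps the gate quiet enough that people trust a red build.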
If you use SonarCloud, wire it in but keep rules lean:
```properties
# sonar-project.properties
sonar.organization=your-org
sonar.projectKey=your-repo
sonar.sources=src
sonar.exclusions=**/*.test.ts
sonar.tests=src
sonar.test.inclusions=**/*.test.ts
sonar.javascript.lcov.reportPaths=coverage/lcov.info
sonar.qualitygate.wait=true
```

Before/After: the boring gates that changed the curve
Real example: 80-engineer SaaS, monorepo (Node/TS + Go services). Baseline was “lint optional,” no coverage gates, manual dependency bumps quarterly. We paved the road (above configs) and ratcheted thresholds over 6 weeks.
Cost (Month 1): ~2 engineer-days to wire gates in monorepo + 1 day per laggard service; +5–7 minutes CI time per PR; GitHub Advanced Security license for 80 seats; minor flaky test quarantine work.
Outcomes (Quarter 1):
- Escaped defect rate down 38% (on-call incidents attributable to regressions).
- Median PR cycle time up about 11 minutes (because checks run), but overall lead time to prod down 22% thanks to fewer hotfixes and retries.
- MTTR improved 28%: failing tests reproduced issues fast.
- Time-to-fix for security vulnerabilities dropped from a quarterly batch to under 48 hours via Dependabot auto-merge for patch/minor bumps.
- Engineers reported fewer review bikesheds. Code style debates vanished because the bots rejected noncompliant code.
Smaller shop, Python backend (FastAPI). They feared gates would slow them down. We started at a 60% coverage floor and diff-only checks. Two sprints later they bumped to 75% without drama. One Friday regression that would've slipped through got caught by mypy on a bad `Optional[str]` path. That single catch paid for the setup.
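The class of bug mypy catches there looks like this. A contrived sketch (names are hypothetical): the unguarded version fails type-check because an `Optional[str]` may be `None` at the call site.

```python
from typing import Optional

def account_label(nickname: Optional[str]) -> str:
    # mypy flags the unguarded version (`return nickname.upper()`)
    # with: Item "None" of "Optional[str]" has no attribute "upper".
    # The None branch below is what makes the function type-safe.
    if nickname is None:
        return "UNKNOWN"
    return nickname.upper()

print(account_label(None))   # UNKNOWN
print(account_label("ops"))  # OPS
```

At runtime the unguarded version only blows up when a `None` actually arrives, which is exactly the kind of Friday-afternoon surprise a blocking `mypy` check removes.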
Paved road stack by ecosystem (copy/paste)
Pick boring defaults. Exceptions require a written reason and an owner.
TypeScript/Node: `jest` + `ts-jest`, `eslint` (TypeScript and security plugins), `prettier`, `tsc --noEmit`, CodeQL JS/TS, Dependabot. Optional: `knip` for dead code, `npm audit --omit=dev` (formerly `--production`) in release builds.

Python: `pytest --maxfail=1 --disable-warnings`, `coverage.py`, `ruff` (lint + format), `mypy`, CodeQL Python, `pip-audit` or `safety`.

Go: `go test ./... -race -coverprofile=cover.out`, `golangci-lint`, `go vet`, `gosec`, Dependabot or Renovate. Example gate:

```shell
# Go: fail if coverage < 75
COVER=$(go tool cover -func=cover.out | awk '/total:/ {print $3}' | sed 's/%//')
[ "${COVER%.*}" -ge 75 ] || { echo "Coverage $COVER < 75"; exit 1; }
```

Rust: `cargo test`, `cargo clippy -- -D warnings`, `cargo fmt --check`, `cargo audit`.

CI patterns: parallelize lint/test/scan; cache dependencies; post PR comments with top failures; optional nightly full scans. Keep total PR check time under 10 minutes. If you blow past that, fix tests or buy bigger runners before you add new gates.
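The coverage "ratchet" mentioned earlier (never let the floor regress; raise it whenever coverage improves) can be sketched as a small check against a committed baseline. The file name `coverage-baseline.json` and its layout are assumptions for this sketch:

```python
import json
import pathlib

def check_ratchet(current_pct: float, baseline_file: str = "coverage-baseline.json") -> float:
    """Fail the build if coverage drops below the recorded baseline;
    raise the baseline whenever coverage improves. Returns the new baseline."""
    path = pathlib.Path(baseline_file)
    baseline = json.loads(path.read_text())["lines_pct"] if path.exists() else 0.0
    if current_pct < baseline:
        raise SystemExit(f"Coverage {current_pct:.1f}% < baseline {baseline:.1f}%")
    new_baseline = max(baseline, current_pct)
    path.write_text(json.dumps({"lines_pct": new_baseline}))
    return new_baseline

print(check_ratchet(72.5))  # first run records the current reality as the floor
```

Commit the baseline file so the floor travels with the repo; the "bump 5% per sprint" policy then just means editing one number in a PR that itself passes the gates.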
Rollout without mutiny (legacy codebase survival guide)
I’ve rolled this out in codebases old enough to rent cars. It’s doable without halting delivery.
Baseline, then ratchet. Set coverage floor to the current reality (even 0 if you must) but fail on decreased diff coverage. Bump the floor 5% every sprint until you hit your target (usually 80%).
Quarantine flaky tests. Auto-tag flakes, move them to a quarantine job that’s non-blocking. Make de-flaking part of sprint planning. A single flaky test trains the org to ignore red builds.
Stage gates. Week 1: lint/format and type-check blocking. Week 2: unit tests/coverage. Week 3: SAST/SCA. Week 4: performance budgets or complexity checks if needed.
Make it easy locally. One `make check` mirrors CI. Pre-commit hooks are nice, but the server is the source of truth.

No bespoke dashboards. Use the platform's PR checks and repo badges. If you must report up, export metrics to Prometheus and put a Grafana board next to your SLOs.

Codify ownership. Each gate has an owner team. If it's noisy or flaky, they fix it or it becomes non-blocking until repaired. No orphaned gates.
AI-specific. Ban “paste from ChatGPT” without tests. Gate PRs with required test plan sections in the template. We’ve added a lightweight Danger rule to block PRs that add code without tests in the same diff.
```markdown
<!-- .github/pull_request_template.md -->
## Risk / Test Plan
- [ ] Unit tests added/updated
- [ ] Impacts feature flags? Canary? Rollback plan?
```

What I'd do differently after 20 years
Start with platform-native features. I’ve watched teams lose quarters building custom quality dashboards that nobody trusts.
Keep the gate list short and merciless. More rules != more quality. The right few, enforced, beat a wall of warnings.
Avoid “advisory” gates. If it’s not blocking, it will be ignored. If it’s too noisy to block, fix the noise or delete it.
Put exceptions behind change management. If a director wants to bypass a gate, tie it to a rollback plan and an explicit risk acceptance.
Measure outcomes quarterly and prune. If a gate doesn’t move escaped defects, MTTR, or PR cycle time, it’s cargo cult.
Stop debating code style in PRs. Make the bot reject it so humans can talk about risk, complexity, and business value.
Key takeaways
- Make the paved road boring and mandatory: tests, coverage, lint/format, type-check, SCA, and basic SAST.
- Block merges by default; ratchet quality on diffs so legacy systems can onboard without riots.
- Prefer native platform features (GitHub branch protection, CodeQL, Dependabot) before reaching for Sonar or custom scripts.
- Optimize for lead time and signal: short, parallel checks with fast feedback and a clear owner per gate.
- Track business outcomes: escaped defects, MTTR, PR cycle time—not vanity metrics like generic “code quality score.”
Implementation checklist
- Turn on branch protection with required checks for your main branches.
- Add a single `make check` (or `task`) target that runs test, coverage, lint, format, type-check consistently.
- Block on SAST and SCA with platform defaults (CodeQL, Dependabot/GitLab Dependency Scanning).
- Enforce `--max-warnings 0` and `--check` formatting; stop debating style in PRs.
- Use a ratchet: fail on decreased diff coverage; set a repo-level minimum your team can actually pass today.
- Quarantine flaky tests and auto-deflake before they train devs to ignore red builds.
- Publish results in comments and dashboards; don’t ship slideware. Codify it in the repo (GitOps).
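The quarantine item above amounts to partitioning the suite into a blocking job and a non-blocking one. A sketch (the helper and test ids are hypothetical; in practice pytest markers or Jest tags do this work):

```python
def split_suite(tests, quarantined):
    """Partition test ids into a blocking set and a non-blocking quarantine
    set, so known flakes can't train the team to ignore red builds."""
    blocking = [t for t in tests if t not in quarantined]
    flaky = [t for t in tests if t in quarantined]
    return blocking, flaky

# Hypothetical suite with one test tagged flaky last sprint.
tests = ["test_recon", "test_api_timeout", "test_ledger"]
quarantined = {"test_api_timeout"}
blocking, flaky = split_suite(tests, quarantined)
print(blocking)  # ['test_recon', 'test_ledger']
print(flaky)     # ['test_api_timeout']
```

CI then runs `blocking` as a required check and `flaky` as an advisory job; the quarantine list shrinking to empty is the exit criterion for the de-flaking work.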
Questions we hear from teams
- What coverage threshold should we start with?
- Pick a floor you can pass today (even 0 in a legacy repo), then ratchet +5% every sprint until you hit your target (80% is typical for business logic; lower is fine for glue code). Always fail on decreased diff coverage so you don’t backslide.
- Aren’t quality gates going to slow delivery?
- You’ll add 5–10 minutes of CI time, yes. But teams consistently report fewer hotfixes and lower MTTR, which reduces overall lead time. The goal is predictable, safe delivery, not raw PR merge speed.
- Do we need SonarQube to have a quality gate?
- No. Start with platform-native checks (lint/format/type, tests/coverage, CodeQL, Dependabot/GitLab SCA) and branch protection. Add SonarCloud/SonarQube later if you need deeper rules—keep rules lean to avoid noise.
- How do we handle AI-generated (vibe) code safely?
- Treat AI like a junior dev on turbo: require tests in the same PR, run the same gates, and add a PR template checkbox for risk/test plans. We’ve rescued teams buried by AI “vibe coding” by enforcing these gates and doing targeted refactors where the bot overfit copy/paste.
- We have a monorepo. Does this still apply?
- Yes. Scope checks by path (e.g., only run Go jobs when `go/` changes) and use a single `make check` per package. Keep jobs parallel and cache dependencies. The rules are the same; orchestration just matters more.
Ready to modernize your codebase?
Let GitPlumbers help you transform AI-generated chaos into clean, scalable applications.
