Code Review Automation That Doesn’t Grind Delivery to a Halt
Prune the bespoke bots. Standardize on a paved road that keeps PRs under 10 minutes end‑to‑end, while raising the floor on quality.
Fast by default, strict where it counts. If a check doesn’t change a merge decision on 95% of PRs, take it off the critical path.
The PR pileup you’ve lived through
You ship a feature Friday. By Monday, the pull requests are stuck in the same purgatory: slow CI, flaky UI tests, two missing approvals because the only code owner is on PTO, and another round of “nit: rename” comments. Lead time: 3.2 days. Everyone’s frustrated, your main branch is a crime scene of broken builds, and product’s asking why everything takes forever.
I’ve seen this movie at SaaS unicorns and banks alike. The pattern is always the same: bespoke bots, heavyweight scans on every PR, no path-aware defaults, and humans doing what robots should. The fix isn’t more tools—it’s a paved road that’s fast by default and strict where it counts.
Principles: fast by default, strict where it counts
Here’s what actually works when you’re optimizing for quality and speed:
- Paved road over bespoke: Prefer native platform features (GitHub Actions, CODEOWNERS, merge queue) and well‑supported tools (`pre-commit`, `reviewdog`, `Semgrep`) over custom bots.
- Shift left, keep it cheap: Run formatting, lint, and basic tests locally and in a sub‑5‑minute CI lane. Save heavy scanning for when paths warrant it.
- Path-aware checks: Don’t run Terraform checks when only docs changed. Use `paths` filters and labels to right‑size the pipeline.
- Automate the nits: Bots comment on style, size, and missing tests. Humans focus on architecture, risk, context, and product fit.
- Keep main green: Use merge queues and required checks. No YOLO merges, no “fix forward” roulette.
- Measure and prune: Track PR cycle time, flake rate, rework, and change failure rate. Kill checks that don’t pay their rent.
If you need a rallying cry: small PRs, fast checks, protected main, humans on the hard stuff.
The paved road: minimal viable guardrails
This is the opinionated setup we roll out at GitPlumbers in under a week. It’s intentionally boring. It works.
- Local fast feedback via `pre-commit` to catch trivia before CI.
- GitHub Actions split into fast checks (always) and deep checks (path‑gated).
- `reviewdog` to surface issues inline on PRs (no spelunking logs).
- CODEOWNERS with targeted review requirements.
- Merge queue and branch protection to serialize merges and keep `main` green.
- Secret scanning and Semgrep tuned to high‑signal rules.
Example pre-commit config:
```yaml
# .pre-commit-config.yaml
repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.6.0
    hooks:
      - id: trailing-whitespace
      - id: end-of-file-fixer
      - id: check-merge-conflict
  - repo: https://github.com/psf/black
    rev: 24.8.0
    hooks:
      - id: black
  - repo: https://github.com/charliermarsh/ruff-pre-commit
    rev: v0.6.9
    hooks:
      - id: ruff
  - repo: https://github.com/pre-commit/mirrors-eslint
    rev: v9.13.0
    hooks:
      - id: eslint
  - repo: https://github.com/zricethezav/gitleaks
    rev: v8.18.2
    hooks:
      - id: gitleaks
```

Fast path‑aware CI with annotations:
```yaml
# .github/workflows/pr-fast.yml
name: pr-fast
on:
  pull_request:
    types: [opened, synchronize, reopened]
    paths-ignore:
      - "**/*.md"
      - "docs/**"
jobs:
  fast-checks:
    runs-on: ubuntu-latest
    concurrency:
      group: pr-${{ github.event.pull_request.number }}-fast
      cancel-in-progress: true
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with: { node-version: '20' }
      - uses: actions/setup-python@v5
        with: { python-version: '3.12' }
      - name: Cache
        uses: actions/cache@v4
        with:
          path: |
            ~/.cache/pip
            ~/.npm
          key: ${{ runner.os }}-deps-${{ hashFiles('**/package-lock.json', '**/requirements.txt') }}
      - name: Pre-commit
        uses: pre-commit/action@v3.0.1
      - name: ESLint via reviewdog
        run: |
          npm ci --ignore-scripts
          npx eslint . -f json -o eslint.json || true
          curl -sL https://raw.githubusercontent.com/reviewdog/reviewdog/master/install.sh | sh -s -- -b ./bin
          ./bin/reviewdog -f=eslint -name=eslint -reporter=github-pr-review -fail-on-error=true < eslint.json
        env:
          REVIEWDOG_GITHUB_API_TOKEN: ${{ secrets.GITHUB_TOKEN }}
      - name: Ruff annotations
        run: |
          pip install ruff
          ruff check . --output-format=github
```

Deep checks only when relevant paths change:
```yaml
# .github/workflows/pr-deep.yml
name: pr-deep
on:
  pull_request:
    paths:
      - "infrastructure/**"
      - "Dockerfile"
      - "src/**"
jobs:
  changes:
    runs-on: ubuntu-latest
    outputs:
      infra: ${{ steps.filter.outputs.infra }}
    steps:
      - uses: actions/checkout@v4
      - uses: dorny/paths-filter@v3
        id: filter
        with:
          filters: |
            infra:
              - "infrastructure/**"
  infra:
    needs: changes
    if: needs.changes.outputs.infra == 'true'
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Infra linters
        run: |
          curl -sSfL https://raw.githubusercontent.com/terraform-linters/tflint/master/install_linux.sh | bash
          pip install checkov
          tflint --recursive
          checkov -d infrastructure/
  containers:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Dockerfile lint
        run: docker run --rm -i hadolint/hadolint < Dockerfile
  semgrep:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: returntocorp/semgrep-action@v1
        with:
          config: p/ci # curated ruleset; tune for signal
```

Targeted ownership:
```text
# CODEOWNERS
/infrastructure/  @platform-team
/src/payments/    @payments-owners
/src/auth/        @security-eng
/docs/            @docs-wg
```

Enable merge queue and branch protection with `gh`:
```bash
# Require 1 approval, code owner review, and passing status checks on main.
# The branch-protection endpoint takes a nested JSON body, so pipe it in.
REPO="org/repo"
MAIN="main"
gh api -X PUT "repos/$REPO/branches/$MAIN/protection" --input - <<'EOF'
{
  "required_pull_request_reviews": {
    "required_approving_review_count": 1,
    "require_code_owner_reviews": true
  },
  "enforce_admins": true,
  "required_status_checks": {
    "strict": true,
    "contexts": ["pr-fast", "pr-deep"]
  },
  "restrictions": null
}
EOF
# Merge queue: enable it via a branch ruleset (Settings → Rules → Rulesets);
# there is no flag for it on this branch-protection endpoint.
```

Use the platform’s features. I’ve ripped out a dozen homegrown “PR gatekeeper” bots because GitHub’s native merge queue and annotations did the job better.
Before/after: the team we rescued last quarter
A payments team came to us with a monorepo and vibes—lots of AI‑generated code, no paved road. Their numbers:
- PR cycle time (open to merge): 3.2 days (p90: 7.1 days)
- Flaky test re-runs per PR: 2.3
- Change failure rate (rollbacks/hotfixes): 18%
- Human comments mostly nits; senior devs stuck in the queue
What we changed in 10 days:
- Installed `pre-commit` and enforced it via CI; blocked on formatting only.
- Split CI into `pr-fast` (< 5 minutes) and `pr-deep` (path‑gated).
- Replaced the custom reviewer bot with `CODEOWNERS` and GitHub merge queue.
- Added `reviewdog` + `Danger` to auto‑comment on size, missing tests, and risky patterns.
- Quarantined the flaky E2E (Selenium) suite; ran it nightly with a canary subset on PRs.
- Tuned Semgrep to a slim, high‑signal rulepack; moved heavy SAST to nightly.
Their after state (30 days later):
- PR cycle time: 11.4 hours (p90: 22.8 hours)
- Fast checks: ~4m 20s median; deep checks only on 32% of PRs
- Flaky test re-runs: 0.4 (quarantined + retries + testcontainers)
- Change failure rate: 6%
- Senior dev review time spent on architecture and risk, not whitespace
Automation didn’t slow them down—it removed the sludge.
Right‑size checks to the PR shape
Run the right work at the right time. A few practical patterns:
- Size labels to gate deeper scrutiny on mega PRs:
```yaml
# .github/workflows/size-labeler.yml
name: size-labeler
on: [pull_request]
permissions:
  pull-requests: write
jobs:
  label:
    runs-on: ubuntu-latest
    steps:
      - name: Label by changed LOC
        env:
          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
        run: |
          # size/* labels must already exist in the repo
          loc=$(( ${{ github.event.pull_request.additions }} + ${{ github.event.pull_request.deletions }} ))
          if   [ "$loc" -lt 50 ];  then label="size/XS"
          elif [ "$loc" -lt 200 ]; then label="size/S"
          elif [ "$loc" -lt 400 ]; then label="size/M"
          elif [ "$loc" -lt 800 ]; then label="size/L"
          else                          label="size/XL"
          fi
          gh pr edit "${{ github.event.pull_request.number }}" \
            --repo "${{ github.repository }}" --add-label "$label"
```

- Dangerfile to warn/fail on anti‑patterns:
```ruby
# Dangerfile
big_pr = git.lines_of_code > 400
warn("Big PR (#{git.lines_of_code} LOC). Consider splitting.") if big_pr

# Require tests when src changes
if git.modified_files.any? { |f| f.start_with?('src/') } &&
   git.modified_files.none? { |f| f =~ /test|spec/ }
  fail('Code changed without tests. Add or explain.')
end

# Block known bad patterns (vibe code from AI)
fail('Possible insecure eval().') if git.diff.patch =~ /eval\(/
```

- Path filters in workflows to avoid global runs. Keep heavy SAST/DAST nightly; surface findings as PR annotations only when the relevant paths are touched.
- Retries + quarantine for flaky E2E; fail on new flakes only.
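In a Python shop, the quarantine pattern from the list above can be as small as a marker plus two invocations — a sketch assuming pytest and the `pytest-rerunfailures` plugin (the marker name is ours):

```ini
# pytest.ini — hypothetical quarantine marker
[pytest]
markers =
    quarantine: known-flaky tests, excluded from the PR lane

# PR lane (blocks merges):   pytest -m "not quarantine"
# Nightly (retries allowed): pytest -m quarantine --reruns 2
```

New flakes get the marker the day they appear; the nightly lane is where they either get fixed or deleted.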
The rule of thumb: if a check doesn’t change a merge decision on 95% of PRs, move it off the critical path.
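To find the freeloaders, audit how often a lane actually fails — a sketch using `gh` and `jq`; the workflow name `pr-deep` is a placeholder for whichever check you’re auditing:

```shell
# Tally run conclusions for the last 200 runs of one workflow
gh run list --workflow pr-deep --limit 200 --json conclusion \
  | jq 'map(select(.conclusion != null))
        | group_by(.conclusion)
        | map({(.[0].conclusion): length}) | add'
```

If the tally is nearly all `success` (or the failures are all flakes), the check isn’t changing merge decisions — demote it to nightly.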
Let humans review where humans add value
Bots catch trivia. Humans catch context.
- CODEOWNERS for risk areas only (auth, payments, infra). Don’t require owners for docs or demo apps.
- Review SLO: 90% of PRs get a human review within 4 working hours. Set a rotation. Use Slack reminders.
- Ban nitpicks: Style is automated. Humans focus on naming, architecture, risk, and trade‑offs.
- Checklists for reviewers:
  - Does this change increase blast radius?
  - Is there an operational plan (metrics, feature flag, rollback)?
  - Are SLOs/error budgets considered?
  - Any data privacy/PII concerns?
- Merge queue to serialize and rebase; fewer “works on my machine” merges.
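To make the review SLO more than a wish, a scheduled workflow can nudge stale PRs. A sketch — the workflow name, 4‑hour threshold, and comment text are ours; it leans on `gh`’s `reviewDecision` JSON field:

```yaml
# .github/workflows/review-nudge.yml — hypothetical SLO nudge
name: review-nudge
on:
  schedule:
    - cron: "0 */2 * * 1-5"  # every 2 hours, weekdays
permissions:
  pull-requests: write
jobs:
  nudge:
    runs-on: ubuntu-latest
    steps:
      - env:
          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
        run: |
          # Comment on open PRs still awaiting review after 4 hours (14400s)
          gh pr list --repo "${{ github.repository }}" --state open \
            --json number,createdAt,reviewDecision \
            | jq -r '.[] | select(.reviewDecision == "REVIEW_REQUIRED")
                     | select((now - (.createdAt | fromdateiso8601)) > 14400) | .number' \
            | xargs -r -I{} gh pr comment {} --repo "${{ github.repository }}" \
                --body "Over the review SLO: no review yet. Ping the rotation."
```

Crude, but it keeps the SLO visible without a bespoke bot; wire the same query into Slack if that’s where your team lives.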
When AI‑generated code sneaks in, have humans scrutinize boundaries: auth, money flows, data handling. We’ve done a lot of vibe code cleanup—automation can flag, but judgment keeps you safe.
Measure, iterate, and resist bespoke creep
Track these like a product manager tracks conversion:
- PR cycle time (open → merge) and its components (waiting for review, waiting for CI).
- Rework rate (PRs with follow‑up fixes in 7 days).
- Flake rate (failed jobs that pass on retry).
- Change failure rate and MTTR.
Instrument with the platform you have—GitHub Insights, simple gh queries, or a spreadsheet if that’s where you’re at:
```bash
# PR cycle time in seconds (merged in the last 30 days)
gh pr list --state merged --limit 500 \
  --search "merged:>$(date -d '30 days ago' +%Y-%m-%d)" \
  --json number,createdAt,mergedAt \
  | jq '.[] | (.mergedAt | fromdateiso8601) - (.createdAt | fromdateiso8601)' \
  | datamash mean 1 median 1 perc:90 1
```

Every quarter, prune. If a check rarely changes an outcome or is too flaky, fix it or move it out of the critical path. Resist the siren song of writing your own mega‑bot. Compose small, boring tools that each do one thing well.
If you’re drowning in AI‑generated code, build a short‑term lane for AI code refactoring and code rescue: Semgrep rules for insecure patterns, secret scanning, and a temporary two‑approver rule on high‑risk subsystems. Then remove the training wheels.
Key takeaways
- Default to a paved road: standard tools, minimal configs, and path-aware checks.
- Front‑load cheap checks locally and in fast CI; reserve heavy scans for riskier changes.
- Use branch protection + merge queues to keep main green without human babysitting.
- Automate the boring review comments; reserve humans for architecture, risk, and context.
- Measure PR cycle time, rework rate, flake rate, and change failure rate—then prune or tune.
- Avoid bespoke bots; compose native capabilities (Actions, annotations, CODEOWNERS, labels).
Implementation checklist
- Set a review SLO (e.g., 90% of PRs merged < 24h; checks < 10m).
- Adopt trunk-based development with small PRs (< 300 LOC changed).
- Install pre-commit and enforce in CI for fast feedback.
- Create CODEOWNERS for high‑risk paths; require code owner review only where needed.
- Wire fast path-aware CI (linters/tests) and defer heavy scanning for risky changes.
- Enable merge queue and branch protection; require status checks to pass.
- Automate size labels and bot comments; ban nitpicks from human review.
- Track PR cycle time, flaky test rate, rework rate; adjust quarterly.
Questions we hear from teams
- Won’t automation replace real code review?
- No. Automation should remove trivial feedback (style, missing tests, obvious bugs) so humans can spend their limited attention on architecture, risk, naming, and alignment. Think of bots as junior reviewers who handle the easy stuff consistently, 24/7.
- How do we handle a monorepo without slowing everything down?
- Use path filters and labels aggressively. Split workflows into fast and deep lanes. Run language/stack‑specific checks only on touched directories. CODEOWNERS should mirror your module boundaries, and merge queues should rebase/serialize to keep main green.
- What about regulated environments (PCI, HIPAA, SOX)?
- Codify policy as code: require code owner reviews on sensitive paths, retain audit logs via GitHub, and run tuned Semgrep and secret scanning. Heavy SAST/DAST can run nightly with PR annotations when relevant paths change. We’ve passed audits with this approach at fintechs.
- Our UI tests are flaky. How do we avoid blocking merges?
- Quarantine known flakes, run a canary subset on PRs, and move the rest to nightly with retries and failure triage. Fail PRs only on new flakes. Consider Playwright + Testcontainers over legacy Selenium if possible; it’s been a solid reliability upgrade for teams we’ve helped.
- We have lots of AI‑generated code. Any special guardrails?
- Yes: add Semgrep rules for insecure patterns, enforce secret scanning (`gitleaks` or GHAS), and require tests for code changes via Danger. Temporarily require two approvals on auth/payments/data modules. Then run a focused vibe code cleanup sprint to stabilize hotspots.
- Do we need SonarQube/Codecov/Everything?
- Only if they pay their rent. Start with fast linters, reviewdog annotations, and coverage thresholds that track delta coverage (Codecov’s patch coverage is useful). Add SonarQube when you need long‑term code health trends, not as a gate on every PR.
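For the delta-coverage gate, a minimal `codecov.yml` sketch — the thresholds are assumptions to tune per repo:

```yaml
# codecov.yml — gate on patch (changed-lines) coverage, not the whole repo
coverage:
  status:
    project:
      default:
        informational: true  # report project coverage, don't block on it
    patch:
      default:
        target: 80%          # assumed threshold for new/changed lines
```

This blocks PRs only when the lines they touch are under-tested, which is exactly the signal a reviewer needs.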
Ready to modernize your codebase?
Let GitPlumbers help you transform AI-generated chaos into clean, scalable applications.
