Code Review Automation That Doesn’t Grind Delivery to a Halt

Prune the bespoke bots. Standardize on a paved road that keeps PRs under 10 minutes end‑to‑end, while raising the floor on quality.

Fast by default, strict where it counts. If a check doesn’t change a merge decision on 95% of PRs, take it off the critical path.

The PR pileup you’ve lived through

You ship a feature Friday. By Monday, the pull requests are stuck in the same purgatory: slow CI, flaky UI tests, two missing approvals because the only code owner is on PTO, and another round of “nit: rename” comments. Lead time: 3.2 days. Everyone’s frustrated, your main branch is a crime scene of broken builds, and product’s asking why everything takes forever.

I’ve seen this movie at SaaS unicorns and banks alike. The pattern is always the same: bespoke bots, heavyweight scans on every PR, no path-aware defaults, and humans doing what robots should. The fix isn’t more tools—it’s a paved road that’s fast by default and strict where it counts.

Principles: fast by default, strict where it counts

Here’s what actually works when you’re optimizing for quality and speed:

  • Paved road over bespoke: Prefer native platform features (GitHub Actions, CODEOWNERS, merge queue) and well‑supported tools (pre-commit, reviewdog, Semgrep) over custom bots.
  • Shift left, keep it cheap: Run formatting, lint, and basic tests locally and in a sub‑5‑minute CI lane. Save heavy scanning for when paths warrant it.
  • Path-aware checks: Don’t run Terraform checks when only docs changed. Use paths filters and labels to right‑size the pipeline.
  • Automate the nits: Bots comment on style, size, missing tests. Humans focus on architecture, risk, context, and product fit.
  • Keep main green: Use merge queues and required checks. No YOLO merges, no “fix forward” roulette.
  • Measure and prune: Track PR cycle time, flake rate, rework, and change failure rate. Kill checks that don’t pay their rent.

If you need a rallying cry: small PRs, fast checks, protected main, humans on the hard stuff.

The paved road: minimal viable guardrails

This is the opinionated setup we roll out at GitPlumbers in under a week. It’s intentionally boring. It works.

  • Local fast feedback via pre-commit to catch trivia before CI.
  • GitHub Actions split into fast checks (always) and deep checks (path‑gated).
  • reviewdog to surface issues inline on PRs (no spelunking through CI logs).
  • CODEOWNERS with targeted review requirements.
  • Merge queue and branch protection to serialize merges and keep main green.
  • Secret scanning and Semgrep tuned to high‑signal rules.

Example pre-commit config:

# .pre-commit-config.yaml
repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.6.0
    hooks:
      - id: trailing-whitespace
      - id: end-of-file-fixer
      - id: check-merge-conflict
  - repo: https://github.com/psf/black
    rev: 24.8.0
    hooks:
      - id: black
  - repo: https://github.com/astral-sh/ruff-pre-commit
    rev: v0.6.9
    hooks:
      - id: ruff
  - repo: https://github.com/pre-commit/mirrors-eslint
    rev: v9.13.0
    hooks:
      - id: eslint
  - repo: https://github.com/gitleaks/gitleaks
    rev: v8.18.2
    hooks:
      - id: gitleaks

Fast path‑aware CI with annotations:

# .github/workflows/pr-fast.yml
name: pr-fast
on:
  pull_request:
    types: [opened, synchronize, reopened]
    paths-ignore:
      - "**/*.md"
      - "docs/**"

jobs:
  fast-checks:
    runs-on: ubuntu-latest
    concurrency:
      group: pr-${{ github.event.pull_request.number }}-fast
      cancel-in-progress: true
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with: { node-version: '20' }
      - uses: actions/setup-python@v5
        with: { python-version: '3.12' }
      - name: Cache
        uses: actions/cache@v4
        with:
          path: |
            ~/.cache/pip
            ~/.npm
          key: ${{ runner.os }}-deps-${{ hashFiles('**/package-lock.json', '**/requirements.txt') }}
      - name: Pre-commit
        uses: pre-commit/action@v3.0.1
      - name: ESLint via reviewdog
        run: |
          npm ci --ignore-scripts
          npx eslint . -f json -o eslint.json || true
          curl -sL https://raw.githubusercontent.com/reviewdog/reviewdog/master/install.sh | sh -s -- -b ./bin
          cat eslint.json | ./bin/reviewdog -f=eslint -name=eslint -reporter=github-pr-review -fail-on-error=true
        env:
          REVIEWDOG_GITHUB_API_TOKEN: ${{ secrets.GITHUB_TOKEN }}
      - name: Ruff (GitHub annotations)
        run: |
          pip install ruff
          ruff check . --output-format=github

Deep checks only when relevant paths change:

# .github/workflows/pr-deep.yml
name: pr-deep
on:
  pull_request:
    paths:
      - "infrastructure/**"
      - "Dockerfile"
      - "src/**"

jobs:
  infra:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # github.event.pull_request.changed_files is a count, not a file list,
      # so gate on actual paths with a filter action instead.
      - uses: dorny/paths-filter@v3
        id: filter
        with:
          filters: |
            infra:
              - "infrastructure/**"
      - name: Infra linters
        if: steps.filter.outputs.infra == 'true'
        run: |
          curl -sSfL https://raw.githubusercontent.com/terraform-linters/tflint/master/install_linux.sh | bash
          pip install checkov
          tflint --recursive
          checkov -d infrastructure/
  containers:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Dockerfile lint
        run: |
          docker run --rm -i hadolint/hadolint < Dockerfile
  semgrep:
    runs-on: ubuntu-latest
    container: semgrep/semgrep # semgrep-action is archived; run the CLI image
    steps:
      - uses: actions/checkout@v4
      - run: semgrep scan --config p/ci --error # curated ruleset; tune for signal

Targeted ownership:

# CODEOWNERS
/infrastructure/      @platform-team
/src/payments/        @payments-owners
/src/auth/            @security-eng
/docs/                @docs-wg

Enable merge queue and branch protection with gh:

# Require 1 approval, code owners, and status checks; enable merge queue
REPO="org/repo"
MAIN="main"

# Protect main. Require only always-run checks: a path-gated job like
# pr-deep's would leave PRs that skip it stuck on "expected" forever.
gh api \
  -X PUT "repos/$REPO/branches/$MAIN/protection" \
  -F "enforce_admins=true" \
  -F "required_pull_request_reviews[required_approving_review_count]=1" \
  -F "required_pull_request_reviews[require_code_owner_reviews]=true" \
  -F "required_status_checks[strict]=true" \
  -f "required_status_checks[contexts][]=fast-checks" \
  -F "restrictions=null"

# Merge queue: there is no branches/<branch>/queue REST endpoint. Enable it
# via a repository ruleset ("Require merge queue" under Settings → Rules →
# Rulesets) or the repository rulesets API.

Use the platform’s features. I’ve ripped out a dozen homegrown “PR gatekeeper” bots because GitHub’s native merge queue and annotations did the job better.

Before/after: the team we rescued last quarter

A payments team came to us with a monorepo and vibes—lots of AI‑generated code, no paved road. Their numbers:

  • PR cycle time (open to merge): 3.2 days (p90: 7.1 days)
  • Flaky test re-runs per PR: 2.3
  • Change failure rate (rollbacks/hotfixes): 18%
  • Human comments mostly nits; senior devs stuck in the queue

What we changed in 10 days:

  1. Installed pre-commit and enforced via CI; blocked on formatting only.
  2. Split CI into pr-fast (< 5 minutes) and pr-deep (path‑gated).
  3. Replaced custom reviewer bot with CODEOWNERS and GitHub merge queue.
  4. Added reviewdog + Danger to auto‑comment on size, missing tests, and risky patterns.
  5. Quarantined flaky E2E (Selenium) suite; ran it nightly with a canary subset on PRs.
  6. Tuned Semgrep to a slim, high‑signal rulepack; moved heavy SAST to nightly.

Their after state (30 days later):

  • PR cycle time: 11.4 hours (p90: 22.8 hours)
  • Fast checks: ~4m 20s median; deep checks only on 32% of PRs
  • Flaky test re-runs: 0.4 (quarantined + retries + testcontainers)
  • Change failure rate: 6%
  • Senior dev review time spent on architecture and risk, not whitespace

Automation didn’t slow them down—it removed the sludge.

Right‑size checks to the PR shape

Run the right work at the right time. A few practical patterns:

  • Size labels to gate deeper scrutiny on mega PRs. Note that actions/labeler matches paths, not diff size; use a size-specific action:
# .github/workflows/size-labeler.yml
name: size-labeler
on: [pull_request]
jobs:
  label:
    runs-on: ubuntu-latest
    steps:
      - uses: codelytv/pr-size-labeler@v1
        with:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
          xs_max_size: "50"   # size/xs: < 50 LOC
          s_max_size: "200"   # size/s:  50–200 LOC
          m_max_size: "400"   # size/m:  200–400 LOC
          l_max_size: "800"   # size/l:  400–800 LOC; above that, size/xl
          fail_if_xl: "false"
  • Dangerfile to warn/fail on anti‑patterns:
# Dangerfile
big_pr = git.lines_of_code > 400
warn("Big PR (#{git.lines_of_code} LOC). Consider splitting.") if big_pr

# Require tests when src changes
if git.modified_files.any? { |f| f.start_with?('src/') } &&
   git.modified_files.none? { |f| f =~ /test|spec/ }
  fail('Code changed without tests. Add or explain.')
end

# Block known bad patterns (vibe code from AI)
fail('Possible insecure eval().') if git.diff.patch =~ /eval\(/
  • Path filters in workflows to avoid global runs. Keep heavy SAST/DAST nightly, surface findings as PR annotations only when touched.
  • Retries + quarantine for flaky E2E; fail on new flakes only.
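To make the nightly-SAST pattern concrete, here is a sketch of a scheduled workflow; the file name, schedule, and `p/security-audit` ruleset are assumptions to adapt. Uploading SARIF to code scanning is what surfaces findings as annotations on later PRs that touch the flagged files.

```yaml
# Hypothetical .github/workflows/nightly-deep.yml (names and schedule are assumptions)
name: nightly-deep
on:
  schedule:
    - cron: "0 3 * * *"   # 03:00 UTC nightly
  workflow_dispatch: {}
jobs:
  sast:
    runs-on: ubuntu-latest
    permissions:
      security-events: write   # required to upload SARIF results
    steps:
      - uses: actions/checkout@v4
      - name: Full Semgrep scan with the broad ruleset
        run: |
          pip install semgrep
          semgrep scan --config p/security-audit --sarif --output semgrep.sarif || true
      - name: Publish to code scanning
        uses: github/codeql-action/upload-sarif@v3
        with:
          sarif_file: semgrep.sarif
```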

The rule of thumb: if a check doesn’t change a merge decision on 95% of PRs, move it off the critical path.

Let humans review where humans add value

Bots catch trivia. Humans catch context.

  • CODEOWNERS for risk areas only (auth, payments, infra). Don’t require owners for docs or demo apps.
  • Review SLO: 90% of PRs get a human review within 4 working hours. Set a rotation. Use Slack reminders.
  • Ban nitpicks: Style is automated. Humans focus on naming, architecture, risk, and trade‑offs.
  • Checklists for reviewers:
    • Does this change increase blast radius?
    • Is there an operational plan (metrics, feature flag, rollback)?
    • Are SLOs/SLO budgets considered?
    • Any data privacy/PII concerns?
  • Merge queue to serialize and rebase; fewer “works on my machine” merges.
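A minimal sketch of policing the review SLO: flag open PRs with no review that were opened before the cutoff. The jq filter matches the JSON shape that `gh pr list --state open --json number,title,createdAt,reviews` emits; inline sample data stands in for the live call so the snippet is self-contained, and the 4-hour cutoff math (GNU `date`) is left as a comment.

```shell
# Sketch: flag open PRs that have waited past the review SLO.
# Live data would come from:
#   gh pr list --state open --json number,title,createdAt,reviews
CUTOFF="2024-01-01T12:00:00Z"  # e.g. $(date -u -d '4 hours ago' +%Y-%m-%dT%H:%M:%SZ)
OUT=$(printf '%s' '[
  {"number":101,"title":"fix: rounding","createdAt":"2024-01-01T08:00:00Z","reviews":[]},
  {"number":102,"title":"feat: webhooks","createdAt":"2024-01-01T15:00:00Z","reviews":[]},
  {"number":103,"title":"chore: bump","createdAt":"2024-01-01T07:00:00Z","reviews":[{"state":"APPROVED"}]}
]' | jq -r --arg cutoff "$CUTOFF" \
  '.[] | select((.reviews | length) == 0 and .createdAt < $cutoff)
       | "#\(.number) \(.title)"')
echo "$OUT"   # -> #101 fix: rounding
```

Pipe the result to a Slack webhook on a schedule and the rotation polices itself.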

When AI‑generated code sneaks in, have humans scrutinize boundaries: auth, money flows, data handling. We’ve done a lot of vibe code cleanup—automation can flag, but judgment keeps you safe.

Measure, iterate, and resist bespoke creep

Track these like a product manager tracks conversion:

  • PR cycle time (open → merge) and its components (waiting for review, waiting for CI).
  • Rework rate (PRs with follow‑up fixes in 7 days).
  • Flake rate (failed jobs that pass on retry).
  • Change failure rate and MTTR.

Instrument with the platform you have—GitHub Insights, simple gh queries, or a spreadsheet if that’s where you’re at:

# PR cycle time in seconds (last 30 days, merged); needs GNU date, jq, datamash
gh pr list --state merged --limit 200 \
  --search "merged:>$(date -u -d '30 days ago' +%Y-%m-%d)" \
  --json number,createdAt,mergedAt \
  | jq '.[] | (.mergedAt | fromdateiso8601) - (.createdAt | fromdateiso8601)' \
  | datamash mean 1 median 1 perc:90 1

Every quarter, prune. If a check rarely changes an outcome or is too flaky, fix it or move it out of the critical path. Resist the siren song of writing your own mega‑bot. Compose small, boring tools that each do one thing well.

If you’re drowning in AI‑generated code, build a short‑term lane for AI code refactoring and code rescue: Semgrep rules for insecure patterns, secret scanning, and a temporary two‑approver rule on high‑risk subsystems. Then remove the training wheels.
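A starter rule for that lane, as a hedged sketch (the file path and rule id here are made up; in practice `p/ci` plus a handful of custom rules like this is usually enough):

```yaml
# Hypothetical .semgrep/ai-guardrails.yml — one illustrative rule, not a full pack
rules:
  - id: js-no-eval
    languages: [javascript, typescript]
    severity: ERROR
    message: eval() on dynamic input; use JSON.parse or an explicit dispatcher.
    pattern: eval(...)
```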



Key takeaways

  • Default to a paved road: standard tools, minimal configs, and path-aware checks.
  • Front‑load cheap checks locally and in fast CI; reserve heavy scans for riskier changes.
  • Use branch protection + merge queues to keep main green without human babysitting.
  • Automate the boring review comments; reserve humans for architecture, risk, and context.
  • Measure PR cycle time, rework rate, flake rate, and change failure rate—then prune or tune.
  • Avoid bespoke bots; compose native capabilities (Actions, annotations, CODEOWNERS, labels).

Implementation checklist

  • Set a review SLO (e.g., 90% of PRs merged < 24h; checks < 10m).
  • Adopt trunk-based development with small PRs (< 300 LOC changed).
  • Install pre-commit and enforce in CI for fast feedback.
  • Create CODEOWNERS for high‑risk paths; require code owner review only where needed.
  • Wire fast path-aware CI (linters/tests) and defer heavy scanning for risky changes.
  • Enable merge queue and branch protection; require status checks to pass.
  • Automate size labels and bot comments; ban nitpicks from human review.
  • Track PR cycle time, flaky test rate, rework rate; adjust quarterly.

Questions we hear from teams

Won’t automation replace real code review?
No. Automation should remove trivial feedback (style, missing tests, obvious bugs) so humans can spend their limited attention on architecture, risk, naming, and alignment. Think of bots as junior reviewers who handle the easy stuff consistently, 24/7.
How do we handle a monorepo without slowing everything down?
Use path filters and labels aggressively. Split workflows into fast and deep lanes. Run language/stack‑specific checks only on touched directories. CODEOWNERS should mirror your module boundaries, and merge queues should rebase/serialize to keep main green.
What about regulated environments (PCI, HIPAA, SOX)?
Codify policy as code: require code owner reviews on sensitive paths, retain audit logs via GitHub, and run tuned Semgrep and secret scanning. Heavy SAST/DAST can run nightly with PR annotations when relevant paths change. We’ve passed audits with this approach at fintechs.
Our UI tests are flaky. How do we avoid blocking merges?
Quarantine known flakes, run a canary subset on PRs, and move the rest to nightly with retries and failure triage. Fail PRs only on new flakes. Consider Playwright + Testcontainers over legacy Selenium if possible; it’s been a solid reliability upgrade for teams we’ve helped.
We have lots of AI‑generated code. Any special guardrails?
Yes: add Semgrep rules for insecure patterns, enforce secret scanning (`gitleaks` or GHAS), and require tests for code changes via Danger. Temporarily require two approvals on auth/payments/data modules. Then run a focused vibe code cleanup sprint to stabilize hotspots.
Do we need SonarQube/Codecov/Everything?
Only if they pay their rent. Start with fast linters, reviewdog annotations, and coverage thresholds that track delta coverage (Codecov’s patch coverage is useful). Add SonarQube when you need long‑term code health trends, not as a gate on every PR.

Ready to modernize your codebase?

Let GitPlumbers help you transform AI-generated chaos into clean, scalable applications.

Talk to GitPlumbers about paving your code review road. See how we cut a team’s PR cycle time from days to hours.
