The Code Review Bot That Didn’t Kill Velocity (and Still Caught the Bug)
Automate the boring, standardize the gates, and keep humans focused on architecture and risk—without turning every PR into a week-long compliance ritual.
If your “quality gates” take 40 minutes, developers will eventually treat them like CAPTCHA: something to click through, not a safety system.
The PR That Looked Fine Until Prod Fell Over
I’ve watched this movie more times than I can count. The PR was “clean”: green checks, two approvals, comments about naming… and then it quietly shipped a behavior change in an edge path. Two hours later, on-call is staring at a Grafana panel that looks like a ski slope.
The postmortem always has the same smell:
- Review focused on style and preferences, not risk.
- Automated checks were either missing (no test coverage) or so slow/noisy that everyone learned to ignore them.
- Someone added a “helpful” bespoke bot months ago, it broke, and now nobody owns it.
When GitPlumbers gets called, it’s rarely “we need more process.” It’s “we need a review system that catches dumb mistakes automatically and frees humans to do real engineering—without slowing delivery.”
What Automation Should (and Shouldn’t) Do
Here’s what actually works in practice: automate objective, repeatable checks and keep humans on judgment calls.
Automate this (high signal, low debate):
- `format` (Prettier, gofmt, `terraform fmt`) so you stop bikeshedding.
- `lint` for real bugs (eslint rules that catch unsafe async, `golangci-lint` for `errcheck`, `staticcheck`); see the config sketch after this list.
- `unit`/`integration` tests that run fast and don’t flake.
- Build/package sanity (`tsc --noEmit`, `go test ./...`, container build).
- Dependency + basic security scanning (Renovate + CodeQL/Semgrep baseline).
- Ownership enforcement (`CODEOWNERS`) so the right people see risky areas.
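For the lint layer, enable the bug-finders and skip the style debates. A minimal sketch, assuming golangci-lint’s YAML config format (the JS-side analog is eslint rules such as `@typescript-eslint/no-floating-promises` for unsafe async):

```yaml
# .golangci.yml: bug-catching linters only; formatting stays gofmt's job
linters:
  enable:
    - errcheck     # unchecked error returns
    - staticcheck  # correctness bugs and misuse of std APIs
```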
Don’t automate this as a gate (or you’ll create quality theater):
- “Complexity scores” that block merges (they become gaming exercises).
- Sentiment-based “AI reviewer” comments on every PR (noise machine).
- Mandatory design checklists for trivial changes.
The point is not to catch every defect. The point is to shift-left the boring failures and preserve reviewer attention for the stuff automation can’t judge: domain correctness, backwards compatibility, and operational risk.
Paved-Road Defaults: The 80/20 Review Gate
If you want quality without killing throughput, you need a small, consistent set of gates. Same checks for most repos, with a documented exception path.
A solid “paved road” baseline looks like:
- PR template that forces risk disclosure.
- Branch protection requiring:
- 1–2 approvals
- no stale reviews (dismiss on new commits)
- required checks: format/lint/test/build + security scan
- CODEOWNERS for hot paths: auth, billing, data migrations, Terraform.
- Fast CI: p95 completion < 10 minutes.
Here’s a PR template that nudges reviewers toward the right questions:
```markdown
<!-- .github/pull_request_template.md -->
## What changed?

## Why?

## Risk & rollout
- [ ] Behind a feature flag
- [ ] DB migration
- [ ] Backward compatible change
- [ ] Requires config/secret changes

## Observability
- [ ] Metrics updated
- [ ] Logs updated
- [ ] Alerts/SLO impact considered

## Rollback plan

## Screenshots / traces (if relevant)
```

This is boring, and that’s the point. It consistently surfaces “this needs a careful read” without inventing a new process per team.
Concrete Before/After: From Review Purgatory to Fast Signal
A real pattern we see when teams “add automation” the wrong way:
Before (common failure mode):
- 14 required checks, many redundant.
- p95 CI time: 38 minutes.
- Flaky integration suite blocks merges ~10% of the time.
- Review comments: mostly style, “can you rename this,” or AI-generated nitpicks.
- Median time-to-merge for normal PRs: 2.1 days.
People cope by:
- Making PRs bigger (“if I’m waiting anyway…”).
- Batch-merging risky changes late Friday (don’t do this).
- Clicking “re-run all jobs” until green.
After (paved road, tuned for throughput):
- 5 required checks that finish fast.
- p95 CI time: 8.5 minutes.
- Flaky tests quarantined (non-blocking) while fixed.
- Review focus shifts to: API contracts, migrations, failure modes.
- Median time-to-merge for normal PRs: 6.3 hours.
The “quality” win wasn’t more gates—it was better signal and less waiting. MTTR improved too, because changes shipped smaller and earlier, with clearer rollback notes.
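If you want to baseline your own repo before changing anything, here’s a rough sketch with the GitHub CLI (assumes `gh` and `jq` are installed; the 200-PR sample size is arbitrary):

```bash
# Rough median time-to-merge across the last 200 merged PRs
gh pr list --state merged --limit 200 --json createdAt,mergedAt \
  | jq -r 'map((.mergedAt | fromdateiso8601) - (.createdAt | fromdateiso8601))
           | sort | (length / 2 | floor) as $i | .[$i] / 3600 | round
           | "median time-to-merge: \(.) hours"'
```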
A Minimal GitHub Actions Pipeline That Doesn’t Hate You
Favor what GitHub/GitLab already does well. Avoid building a custom review bot when branch protections + required checks cover 80%.
Here’s a baseline GitHub Actions workflow that’s fast, parallel, and boring:
```yaml
# .github/workflows/pr-gates.yml
name: pr-gates
on:
  pull_request:
    branches: [ main ]
concurrency:
  group: pr-${{ github.event.pull_request.number }}
  cancel-in-progress: true
jobs:
  format_lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
          cache: 'npm'
      - run: npm ci
      - run: npm run format:check
      - run: npm run lint
  unit:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
          cache: 'npm'
      - run: npm ci
      - run: npm test -- --ci
  security:
    runs-on: ubuntu-latest
    permissions:
      security-events: write
      contents: read
    steps:
      - uses: actions/checkout@v4
      - uses: github/codeql-action/init@v3
        with:
          languages: javascript
      - uses: github/codeql-action/analyze@v3
```

And the CODEOWNERS file that makes sure changes hit the right eyeballs:
```
# .github/CODEOWNERS
# High-risk areas
/apps/payments/    @fintech-senior-team
/apps/auth/        @identity-platform
/infra/terraform/  @platform-sre
/db/migrations/    @data-platform
```

Branch protection then becomes the “automation wiring harness.” No custom glue code. In GitHub terms:
- Require status checks: `format_lint`, `unit`, `security`
- Require pull request reviews: 1–2
- Dismiss stale approvals on new commits
- Require review from Code Owners
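If you’d rather codify that than click through the settings UI, here’s a sketch against GitHub’s branch-protection REST endpoint (assumes admin rights; `acme/payments` is a placeholder):

```bash
# Mirror the branch-protection settings above via the REST API
gh api -X PUT repos/acme/payments/branches/main/protection --input - <<'JSON'
{
  "required_status_checks": { "strict": true, "contexts": ["format_lint", "unit", "security"] },
  "required_pull_request_reviews": {
    "dismiss_stale_reviews": true,
    "require_code_owner_reviews": true,
    "required_approving_review_count": 1
  },
  "enforce_admins": true,
  "restrictions": null
}
JSON
```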
That’s most of the value right there.
Keeping It Fast: The Part Everyone Forgets
I’ve seen “quality gates” die because leadership mandated them, and developers quietly built workarounds. Speed is not a nice-to-have; it’s the condition for adoption.
Here’s what actually keeps automation from becoming a tax:
- Path filtering: don’t run Terraform checks for README edits.

```yaml
on:
  pull_request:
    paths:
      - 'apps/**'
      - 'infra/**'
```

- Cache dependencies (`actions/setup-node` cache, `GOMODCACHE`, Gradle cache). Easy win.
- Split slow suites: make unit tests blocking, keep heavyweight e2e non-blocking until stabilized.
- Quarantine flakes: if a test flakes, treat it like a prod incident. Track it, fix it, then re-promote it to blocking.
- Use a merge queue / merge train when main becomes a traffic jam.
- GitHub Merge Queue prevents “green on PR, red on main” failures (trigger wiring sketched after this list).
- GitLab Merge Trains do the same.
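One wiring detail that bites teams here: with GitHub Merge Queue, your required workflows must also trigger on the `merge_group` event, or queued merges stall waiting for checks that never start. A minimal addition to the `pr-gates` triggers above:

```yaml
# In .github/workflows/pr-gates.yml: run the same gates for queued merges
on:
  pull_request:
    branches: [ main ]
  merge_group:
```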
The leadership move here is to set a CI performance SLO, like:
- p95 PR checks < 10 minutes
- flake rate < 2% per week
If you can’t meet that, don’t add more gates. Fix the pipeline first.
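You can’t hold an SLO you don’t measure. A rough p95 sketch with the GitHub CLI (assumes `gh` and `jq`; `updatedAt` approximates completion time):

```bash
# Rough p95 wall-clock duration (minutes) of the last 100 pr-gates runs
gh run list --workflow pr-gates.yml --limit 100 --json startedAt,updatedAt \
  | jq -r 'map((.updatedAt | fromdateiso8601) - (.startedAt | fromdateiso8601))
           | sort | (length * 0.95 | floor) as $i | .[$i] / 60 | round
           | "p95 CI minutes: \(.)"'
```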
Where Bespoke Tooling Is Worth It (And How to Not Regret It)
Sometimes you do need a custom check. The trick is to earn it.
Bespoke is justified when:
- You have a real, recurring defect class (example: “we keep breaking protobuf compatibility” or “we ship DB migrations without rollback”).
- The check is deterministic and fast.
- The output is actionable, not nagging.
Example: a simple migration safety check that blocks destructive SQL unless there’s an explicit override label.
```bash
#!/usr/bin/env bash
# Block destructive migrations unless the PR carries an explicit override label.
# Assumes this runs in a checkout of the PR branch (otherwise pass a PR number to gh pr view).
set -euo pipefail

if grep -Rqi "DROP COLUMN" db/migrations \
   && ! gh pr view --json labels \
        | jq -e 'any(.labels[]; .name == "breaking-migration")' >/dev/null; then
  echo "Destructive migration detected. Add label breaking-migration and include rollback plan."
  exit 1
fi
```

But here’s what I’ve seen fail: teams build an internal “review platform” with a DSL, a rules engine, and a dashboard. Six months later, it’s unmaintained, CI is slower, and everybody hates it.
A rule of thumb we use at GitPlumbers: if a check can’t be expressed as a normal CI job + branch protection within a day, you probably don’t need it as a gate.
The Goal: Humans Reviewing Design, Not Whitespace
The best code review automation doesn’t feel like bureaucracy. It feels like a paved road:
- The bot handles formatting, lint, tests, and basic security.
- Reviewers spend their time on:
- API and schema compatibility
- failure modes and retries (circuit breakers, timeouts)
- observability (metrics/logging)
- rollout strategy (canary deployment, feature flags)
- blast radius
If your team is drowning in AI-generated code or “vibe coding” PRs, this baseline becomes even more important. Automation won’t make the code good, but it will stop the bleeding: it enforces consistency, catches obvious bugs, and forces risk disclosure so seniors can focus where it matters.
If you want a second set of eyes on your current gates, GitPlumbers typically starts with a 1–2 week “PR pipeline triage”: we measure CI p95, flake rate, rule noise, and cycle time, then simplify the gates to a paved-road default that teams actually use.
Key takeaways
- Keep automation **fast and predictable** (<10 minutes) or developers will route around it.
- Use **native platform controls** (branch protection, required checks, CODEOWNERS) before inventing bots.
- Automate **objective checks**; keep humans on **risk, design, and business behavior**.
- Treat noisy rules as production incidents: measure false positives, fix or delete.
- Prefer a paved-road pipeline with path-based optimizations over per-team bespoke workflows.
Implementation checklist
- Define 4–6 required checks: `format`, `lint`, `unit`, `build`, `dependency/security scan`, `CODEOWNERS` review.
- Set an explicit pipeline performance SLO (example: p95 PR checks < 10 minutes).
- Make branch protection require: status checks + 1–2 approvals + no stale reviews + linear history (optional).
- Add a PR template that forces risk notes: migrations, flags, rollback, observability.
- Use path filters to avoid running Terraform checks on frontend-only changes.
- Introduce a merge queue (GitHub) or merge trains (GitLab) once main starts flaking.
- Review and prune rules monthly; delete checks that aren’t catching real defects.
Questions we hear from teams
- Should we require two approvals for every PR?
- Not by default. Two approvals everywhere is a blunt instrument that slows delivery and doesn’t guarantee quality. Use 1 approval for low-risk areas, and rely on `CODEOWNERS` + targeted protections for high-risk code paths (auth, billing, infra, migrations).
- Is CodeQL/Semgrep worth it if it adds minutes to CI?
- Yes, if you keep it scoped and stable. Start with a baseline ruleset and run it as a required check only if it completes consistently and produces actionable findings. If it’s noisy, make it informational while you tune it (one way is sketched after these questions).
- How do we handle flaky tests without lowering quality?
- Quarantine flaky tests as non-blocking with visible reporting, create a short SLA to fix them, and re-promote them to blocking. Treat flake rate like an availability metric; otherwise CI becomes a slot machine.
- What about AI reviewers or LLM-based PR comments?
- Use them off the critical path. LLM feedback can be useful for summarization or suggesting test cases, but as a required gate it tends to produce noise and false confidence. Keep your merge gate deterministic.
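On making a scan informational while you tune it (per the CodeQL answer above): one way, assuming you also drop it from the required checks in branch protection, is job-level `continue-on-error`, which keeps findings visible without failing the workflow run:

```yaml
# Same security job as in the pipeline above, demoted to non-blocking
security:
  runs-on: ubuntu-latest
  continue-on-error: true  # a failure here won't fail the run while the ruleset is tuned
  permissions:
    security-events: write
    contents: read
  steps:
    - uses: actions/checkout@v4
    - uses: github/codeql-action/init@v3
      with:
        languages: javascript
    - uses: github/codeql-action/analyze@v3
```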
Ready to modernize your codebase?
Let GitPlumbers help you transform AI-generated chaos into clean, scalable applications.
