Remote-First Without Rotten PRs: Rituals, Leadership, and Metrics That Keep Code Clean

What actually works to keep code quality high when your engineers aren’t in the same building—complete with guardrails, playbooks, and hard metrics.

In remote orgs, code review is your hallway. If the hallway is blocked, everything smells.

The remote trap: quiet Slack, noisy incidents

You’ve lived it: everyone’s “heads down” on Zoom, PRs pile up, releases slip, then Friday-night pages scream. I watched a unicorn go fully remote in 2020, keep the standups, and still let review latency balloon to 3.5 days. MTTR doubled. The fix wasn’t a tool—it was rituals, leadership behaviors, and automated guardrails that made good decisions the easy path.

Here’s the remote-first operating model we deploy at GitPlumbers when an enterprise can’t afford regressions but also can’t force everyone back into the office.

Rituals that keep code review alive when you’re not co-located

Remote teams don’t need more meetings—they need precise, repeatable touchpoints.

  • Daily async check-in (10 minutes, no meeting): Post a short update in #eng-daily with Yesterday, Today, Risks. Managers skim, not reply-all. Use threads.
  • PR review SLA: First response in <24h, merge/close in <3 business days. Aged PRs ping reviewers automatically (a minimal pinger sketch follows this list).
  • Rotating PR Sheriffs: Two engineers per squad dedicate 60–90 minutes/day to reviews. That’s cheaper than a stuck release train.
  • Weekly Merge Hour: Everyone merges small PRs with a shared Zoom/Slack huddle; reduces merge hell and addresses flaky tests in the moment.
  • RFCs as ADRs, not novels: Create short docs/adr/ADR-###.md for architectural decisions. Keep them under 1 page.

ADR template:

# ADR-012: Enforce small PRs

- Status: Accepted
- Context: Review latency spiked to 2.4 days; defects correlate with PRs > 600 LOC.
- Decision: Fail CI on PRs > 400 LOC altered (excluding vendor/ and snapshots/). Require 2 approvals on critical paths.
- Consequences: Faster reviews, higher merge frequency, slightly more coordination. Exceptions allowed with `#no-size-check` tag and TL approval.
  • Design office hours, not design meetings: Thursdays 2–4pm local time overlap. Drop in, screenshare, decide. No slide decks.
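
The “aged PRs ping reviewers automatically” part of the SLA bullet above is just a small script plus a scheduler. Here’s a minimal sketch, assuming a GITHUB_TOKEN with repo read scope and a SLACK_WEBHOOK_URL secret; the owner/repo values and file name are placeholders, and you’d run it from cron or a scheduled CI job:

// scripts/ping-stale-prs.ts
// Nudges reviewers on open PRs that have waited >24h for a first review.
import { Octokit } from '@octokit/rest';

const octokit = new Octokit({ auth: process.env.GITHUB_TOKEN });
const DAY_MS = 24 * 60 * 60 * 1000;

async function main() {
  const { data: prs } = await octokit.pulls.list({
    owner: 'acme', repo: 'app', state: 'open', per_page: 50, // placeholder org/repo
  });
  for (const pr of prs) {
    const { data: reviews } = await octokit.pulls.listReviews({
      owner: 'acme', repo: 'app', pull_number: pr.number,
    });
    const ageMs = Date.now() - new Date(pr.created_at).getTime();
    if (reviews.length === 0 && ageMs > DAY_MS) {
      // Node 18+ has global fetch; post a plain-text Slack message to the squad channel.
      await fetch(process.env.SLACK_WEBHOOK_URL!, {
        method: 'POST',
        headers: { 'content-type': 'application/json' },
        body: JSON.stringify({
          text: `PR #${pr.number} (${pr.title}) has waited ${(ageMs / DAY_MS).toFixed(1)}d for a first review: ${pr.html_url}`,
        }),
      });
    }
  }
}

main().catch((err) => { console.error(err); process.exit(1); });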

Leadership moves that actually raise the bar

I’ve seen this fail when leaders delegate quality to the linters. People follow what you measure and model.

  • Never merge red: Directors and staff engs must model waiting for green checks. No “just this once”. The exception becomes the policy.
  • Praise the small: Celebrate PRs < 200 LOC and designs that delete code. Surface these in all-hands.
  • Publish the scoreboard: Each Monday, post last week’s review latency, median PR size, change failure rate, and flaky tests. Make it boringly visible.
  • Protect focus time: Guard two 2-hour blocks per engineer for deep work. You can’t write tests in 15-minute fragments.
  • Debt gets a budget: 15% of capacity for defect backlog and test hardening, tracked like features. No phantom “someday”.
  • Skip-level diff reviews: Once a week, senior leaders scan merged diffs for safety. You spot patterns the team is too close to see.

Automated guardrails: quality that runs while you sleep

The machine should nag so humans don’t have to. Put checks in CI, not on a wiki page.

  • Pre-commit hooks for local feedback
# .pre-commit-config.yaml
repos:
  - repo: https://github.com/psf/black
    rev: 24.4.2
    hooks:
      - id: black
  - repo: https://github.com/pre-commit/mirrors-eslint
    rev: v9.12.0
    hooks:
      - id: eslint
        additional_dependencies: [eslint@9.12.0]
  - repo: https://github.com/pycqa/isort
    rev: 5.13.2
    hooks:
      - id: isort
  • Same checks in CI with GitHub Actions + reviewdog
# .github/workflows/ci.yml
name: ci
on:
  pull_request:
  push:
    branches: [main]
jobs:
  lint_test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with: { node-version: 20 }
      - uses: actions/setup-python@v5
        with: { python-version: '3.11' }
      - run: pip install black isort
      - run: npm ci
      - run: |
          black --check .
          isort --check-only .
          npm run lint
      - uses: reviewdog/action-eslint@v1
        with:
          reporter: github-pr-review
          level: error
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with: { node-version: 20 }
      - run: npm ci
      - run: npm test -- --ci --reporters=jest-junit --coverage
      - uses: actions/upload-artifact@v4
        with:
          name: coverage
          path: coverage/
  • Fail big PRs with DangerJS
// dangerfile.ts
import { danger, warn, fail } from 'danger';

const bigPR = danger.github.pr.additions + danger.github.pr.deletions > 400;
if (bigPR && !danger.github.pr.body?.includes('#no-size-check')) {
  fail('PR too large (>400 LOC changed). Split it or add `#no-size-check` with TL approval.');
}

const changedFiles = danger.git.modified_files.concat(danger.git.created_files);
const testTouched = changedFiles.some(f => f.includes('test') || f.includes('__tests__'));
if (!testTouched) {
  warn('No tests changed. If this is intentional, explain why.');
}
  • Codify ownership with CODEOWNERS
# CODEOWNERS
/apps/payments/**     @fin-payments @alice
/libs/auth/**         @platform-auth @bob
**/*.tf               @devops-core
  • Branch protections via Terraform
# github_branch_protection.tf
resource "github_branch_protection" "main" {
  repository_id = github_repository.app.id
  pattern       = "main"

  required_status_checks {
    strict   = true
    contexts = ["ci/test", "lint", "security/sast"]
  }

  required_pull_request_reviews {
    dismiss_stale_reviews           = true
    required_approving_review_count = 2
    require_code_owner_reviews      = true
  }

  enforce_admins = true
}
  • Deploy safely with flags and canaries: Use LaunchDarkly or OpenFeature, then canary with ArgoCD or your platform’s blue/green. Tie rollouts to SLOs so bad code auto-pauses.
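
On the flag side, the call site is small. A minimal sketch with the OpenFeature server SDK for Node, assuming a provider (LaunchDarkly, flagd, etc.) is registered at startup; the flag key, Order type, and processor functions are hypothetical:

// checkout/charge.ts
// Risky path behind a flag: a bad canary gets reversed by flipping the flag, not redeploying.
import { OpenFeature } from '@openfeature/server-sdk';

interface Order {
  customerId: string;
  amountCents: number;
}

const client = OpenFeature.getClient('checkout');

export async function chargeCustomer(order: Order): Promise<string> {
  const useNewProcessor = await client.getBooleanValue('new-payment-processor', false, {
    targetingKey: order.customerId, // lets you canary by customer segment
  });
  return useNewProcessor ? chargeViaNewProcessor(order) : chargeViaLegacyProcessor(order);
}

// Hypothetical implementations elided; both return a charge id.
declare function chargeViaNewProcessor(order: Order): Promise<string>;
declare function chargeViaLegacyProcessor(order: Order): Promise<string>;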

Make quality measurable: dashboards you'll actually watch

If it isn’t measured, it’s folklore. Start with a handful of metrics that correlate with outcomes.

  • Review latency (PR open -> first review comment): Target median < 24h.
  • Median PR size (changed LOC): Target < 300. Exceptions are real, but they’re exceptions.
  • Change failure rate (post-deploy rollback/incident): Target < 15%.
  • MTTR (service): Target < 60 minutes for P2; set real SLOs.
  • Flaky test rate: Target < 2% of CI runs with flake reruns.

Grab data from the GitHub GraphQL API, stash it in your warehouse, and chart it in whatever your org uses (Grafana/Looker).

# PR review latency (simplified)
query($owner: String!, $name: String!) {
  repository(owner: $owner, name: $name) {
    pullRequests(states: OPEN, first: 50, orderBy: {field: CREATED_AT, direction: DESC}) {
      nodes {
        number
        createdAt
        reviews(first: 1) {
          # reviews come back oldest-first; the reviews connection has no orderBy argument
          nodes { submittedAt }
        }
        additions
        deletions
      }
    }
  }
}
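
To turn that into a number, here’s a rough sketch using octokit’s GraphQL client, assuming a GITHUB_TOKEN and placeholder owner/name values; paste the query above into REVIEW_LATENCY_QUERY and ship the output to your warehouse.

// scripts/review-latency.ts
// Computes median hours-to-first-review from the query above.
import { graphql } from '@octokit/graphql';

const gql = graphql.defaults({
  headers: { authorization: `token ${process.env.GITHUB_TOKEN}` },
});

const REVIEW_LATENCY_QUERY = `...`; // the GraphQL query shown above

async function main() {
  const { repository }: any = await gql(REVIEW_LATENCY_QUERY, { owner: 'acme', name: 'app' });
  const hours = repository.pullRequests.nodes
    .filter((pr: any) => pr.reviews.nodes[0]?.submittedAt)
    .map(
      (pr: any) =>
        (new Date(pr.reviews.nodes[0].submittedAt).getTime() -
          new Date(pr.createdAt).getTime()) / 36e5
    )
    .sort((a: number, b: number) => a - b);
  const median = hours.length ? hours[Math.floor(hours.length / 2)] : 0;
  console.log(`median hours to first review: ${median.toFixed(1)} across ${hours.length} PRs`);
}

main().catch((err) => { console.error(err); process.exit(1); });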

Prometheus example for flaky tests (emit a metric from CI):

# Pseudo-emission in CI on flake detection (Prometheus Pushgateway)
# Pushgateway requires the body to end with a newline, hence --data-binary from stdin.
printf 'ci_flaky_tests_total{repo="app"} 1\n' | \
  curl --data-binary @- "$PROM_PUSH/metrics/job/ci"

Pro tip: put the metrics where leadership actually looks. I’ve watched "quality dashboards" die in Confluence. Wire them into the weekly staff deck.

Enterprises aren’t startups: do this with audit, SOX, and budgets

I’ve worked with banks and public retailers where “just push main” gets you escorted out. Remote-first can still be compliant and fast.

  • GitOps for traceability: All infra/app changes land via PR. Approvals + checks form the change record. ArgoCD syncs; CAB reviews the PR trail, not a Word doc.
  • Segregation of duties: Use CODEOWNERS to require review from a different team for prod-affecting dirs (infra/, apps/payments/). Terraform branch protections enforce it.
  • Policy-as-code with OPA: Gate merges on policies—e.g., disallow public S3. Keep policy repo separate and reviewed by security.
# .opa/policies/s3.rego
package s3

violation[msg] {
  input.resource.type == "aws_s3_bucket"
  input.resource.acl == "public-read"
  msg := sprintf("Public S3 bucket not allowed: %s", [input.resource.name])
}
  • Feature flags for audit: Tie flag changes to tickets and record who flipped what. This is your emergency brake that doesn’t need a redeploy.
  • Budget reality: Don’t boil the ocean. Pick your critical path (e.g., checkout), harden that first: reviews, tests, flags, canaries. Expand quarterly.

Operating cadence: week-by-week, quarter-by-quarter

Here’s the sequence I use when parachuting into a remote org with quality drift.

  1. Week 1–2: Stabilize PR flow
    • Introduce PR SLA and PR Sheriffs.
    • Turn on pre-commit + CI linters/tests.
    • Add CODEOWNERS and branch protections for main.
  2. Week 3–4: Make risk visible
    • Wire Danger size checks.
    • Stand up a basic quality dashboard (review latency, PR size, CFR).
    • Start Weekly Merge Hour.
  3. Month 2: Harden releases
    • Adopt flags on critical paths; establish canary.
    • Add OPA policies for obvious footguns.
    • Start skip-level diff reviews.
  4. Month 3: Institutionalize
    • Quarterly ADR review; pay down the 15% debt budget.
    • Expand guardrails to second-tier services.
    • Tie OKRs to quality metrics (e.g., median PR size, CFR).

Expected results (what we see at GitPlumbers when leadership holds the line):

  • Median PR size drops 35–60% in six weeks.
  • Review latency falls under 24h for 80%+ of PRs.
  • Change failure rate improves from ~25% to <12% in a quarter.
  • MTTR down 30–50% as flags/canaries allow quick reversals.
  • Team sentiment improves because people aren’t firefighting on Fridays.


Key takeaways

  • Small, reviewable changes plus strict PR SLAs beat any fancy toolchain.
  • Codify ownership (`CODEOWNERS`, branch protections) so quality isn’t optional.
  • Automate the boring: lint, tests, SAST, and PR-size checks in CI—every time.
  • Track review latency, PR size, and change failure rate like you track revenue.
  • Leaders must model quality: never merge-red, celebrate small PRs, and pay down debt on cadence.
  • Remote-first and audited (SOX/ISO) can coexist with GitOps, policy-as-code, and traceable approvals.

Implementation checklist

  • Set a PR SLA: <24h first response, <3 business days to merge or close.
  • Enforce max PR size (e.g., 400 LOC) with automation and team norms.
  • Require two approvals and code-owner review for critical paths.
  • Adopt pre-commit hooks and run the same checks in CI.
  • Publish a quality dashboard: review latency, PR size, flaky tests, change failure rate.
  • Schedule weekly “merge hour” and rotating PR sheriffs.
  • Use feature flags and canaries to de-risk remote releases.
  • Make ADRs mandatory for non-trivial changes; keep them short and searchable.

Questions we hear from teams

What if our legacy repo makes small PRs impossible?
Start by carving seams: introduce module boundaries and file-by-file lints so changes can be isolated. Enforce small PRs in greenfield areas first, then refactor on a scheduled debt budget. We’ve split monoliths by adding adapter layers and stranglers before enforcing PR limits across the board.
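
A minimal sketch of that seam (names are illustrative, not from any real codebase): new code depends on a small interface, an adapter wraps the legacy call, and each piece is reviewable in one small PR.

// billing/invoice-gateway.ts
// The seam: callers depend on this interface, not on the monolith.
export interface InvoiceGateway {
  createInvoice(orderId: string): Promise<string>;
}

// Stand-in for the legacy monolith function (in reality, an import from the old module).
declare function legacyCreateInvoice(orderId: string): Promise<string>;

// Adapter: keeps the old behavior behind the new boundary.
export class LegacyInvoiceAdapter implements InvoiceGateway {
  createInvoice(orderId: string): Promise<string> {
    return legacyCreateInvoice(orderId);
  }
}

// Strangler replacement lands later as its own small, flag-guarded PR.
export class NewBillingService implements InvoiceGateway {
  async createInvoice(orderId: string): Promise<string> {
    return `inv_${orderId}`; // placeholder implementation
  }
}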
We’re in multiple time zones. How do we hit 24h review SLAs?
Rotate PR Sheriffs across regions and use follow-the-sun handoffs. Automate Slack pings on aged PRs and set expectations that reviews top the queue after standup. A single overlap hour per region is usually enough.
Won’t two approvals slow us down?
Not if PRs are small. In our data, teams with small PRs and two approvals ship more frequently than one-approval teams with large PRs because review confidence is higher and rollbacks are rarer.
Can we do this on GitLab or Bitbucket?
Yes. Swap GitHub Actions with GitLab CI/CD or Bitbucket Pipelines, use their code owners/merge request approvals, and equivalent linters/review bots. The principles—small changes, enforced guardrails, clear SLAs—are tool-agnostic.

Ready to modernize your codebase?

Let GitPlumbers help you transform AI-generated chaos into clean, scalable applications.

Run a Remote-First Quality Tune-Up · Talk to an engineer (not sales)
