The Build That Saves Your UX: Catching Performance Regressions Before Users Feel Them

Perf doesn’t die in a fire; it bleeds out via tiny regressions that nobody notices—until your conversion graph does. Here’s how to make performance a merge blocker with metrics your CFO cares about.

If it doesn’t break the build, it doesn’t exist.

The slow drift you don’t notice—until the cart abandons you

I’ve never been paged for a 10-second spike. I’ve been paged for quarters of “weird”, where LCP crept from 2.3s to 3.1s and p95 checkout latency wandered north by 120ms. No single commit “broke” it. A dozen “harmless” merges did. One client’s conversion rate slid 0.7% over 10 weeks—roughly seven figures annualized. The graphs didn’t scream; revenue did.

If you’ve felt this, you don’t need another dashboard. You need performance to be a merge blocker, wired to metrics users feel and finance respects.

Measure what users feel, not what you can easily collect

Pick metrics that map to experience and business outcomes, and scope them to critical flows.

  • Frontend (Core Web Vitals): LCP, INP, CLS, plus TTFB for back-end influence.
  • Backend: p95/p99 latency per endpoint, error rate, and saturation (CPU/RAM/DB connections).
  • Business: conversion in critical steps (search → product view → add-to-cart → checkout), bounce rate, session length, cancellations.

Map them explicitly:

  • Search results page: LCP < 2.5s (mobile 4G, Moto G4 class), INP < 200ms, size budget < 200KB JS.
  • Add-to-cart API: p95 < 250ms, failure < 0.5%.
  • Checkout: TTFB < 200ms, LCP < 2.0s (returning users, cached).

Then set budgets and SLOs in versioned config. If a PR nudges LCP +150ms on the search page, the build fails. Not “warns”—fails.
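
One way to make that versioned config real is a plain module that both the Lighthouse and k6 jobs read. A minimal sketch; the file name, route keys, and exact shape here are illustrative, not a standard format:

// perf/budgets.js (hypothetical shared config; adjust routes and numbers to your flows)
module.exports = {
  '/search': {
    lcpMs: 2500,         // mobile 4G, mid-tier device
    inpMs: 200,
    jsBytes: 200 * 1024  // route-level JS budget
  },
  '/checkout': {
    lcpMs: 2000,         // returning users, warm cache
    ttfbMs: 200
  },
  api: {
    'POST /checkout': { p95Ms: 250, errorRate: 0.005 }
  }
};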

Make performance a build gate, not a dashboard ornament

Start simple: Lighthouse CI for Web Vitals deltas and k6 for API thresholds. Run them in CI on PRs and on main. Store baselines and compare.

# .github/workflows/perf-gate.yml
name: perf-gate
on: [pull_request]
jobs:
  web-vitals:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Install deps
        run: |
          npm ci
          npm i -g @lhci/cli@0.13.x
      - name: Start app
        run: |
          npm run start:test &
          npx wait-on http://localhost:3000 # don't let Lighthouse race the server
      - name: Lighthouse CI (mobile 4G)
        run: lhci autorun --config=./lighthouserc.js
  api-latency:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Setup k6
        uses: grafana/setup-k6-action@v1
      - name: Run k6
        # the k6 script reads __ENV.API; point it at a deployed test environment
        run: k6 run ./perf/k6-checkout.js

Example lighthouserc.js with mobile throttling and budget assertions (point upload at an LHCI server if you want per-PR diffs against main):

// lighthouserc.js
module.exports = {
  ci: {
    collect: {
      numberOfRuns: 3,
      settings: {
        formFactor: 'mobile',
        screenEmulation: { mobile: true },
        throttling: { rttMs: 150, throughputKbps: 1600, cpuSlowdownMultiplier: 4 },
        extraHeaders: JSON.stringify({ 'x-test-user': 'perf-ci' })
      },
      url: [
        'http://localhost:3000/search?q=shoes',
        'http://localhost:3000/checkout'
      ]
    },
    assert: {
      assertions: {
        'largest-contentful-paint': ['error', { maxNumericValue: 2500 }],
        'interactive': ['error', { maxNumericValue: 3500 }],
        'total-byte-weight': ['warn', { maxNumericValue: 300000 }],
        'uses-responsive-images': 'error'
      }
    },
    upload: { target: 'temporary-public-storage' }
  }
};

k6 gating example with p95 thresholds:

// perf/k6-checkout.js
import http from 'k6/http';
import { check } from 'k6';
export const options = {
  scenarios: { smoke: { executor: 'constant-vus', vus: 5, duration: '1m' } },
  thresholds: {
    http_req_failed: ['rate<0.005'],
    http_req_duration: [
      'p(95)<250', // p95 checkout API must be < 250ms
      'p(99)<500'
    ]
  }
};
export default function () {
  const res = http.post(`${__ENV.API}/checkout`, JSON.stringify({ items: [1,2] }), {
    headers: { 'Content-Type': 'application/json', 'x-test-user': 'perf-ci' }
  });
  check(res, { 'status is 200': (r) => r.status === 200 });
}

This isn’t perfect science, but it turns the silent drift into a red ❌ on the PR.

Control variance or drown in false positives

Lighthouse on a noisy runner will gaslight you. Stabilize your harness:

  • Run 3–5 iterations and use median; set budgets with headroom (e.g., LCP < 2.5s when typical is 2.1s).
  • Throttle consistently (network and CPU), and use stable test data and users.
  • Pin Node/Chrome versions in CI; consider a dedicated perf agent or an ephemeral runner with CPU pinning.
  • Warm caches for returning-user paths; explicitly test both cold and warm.
  • Mock flaky third parties (ads, chat widgets); block them with Lighthouse’s blockedUrlPatterns setting (--blocked-url-patterns on the CLI). See the sketch after this list.
  • Store artifacts (HTML reports, HAR, traces) so devs can self-serve.
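
Here’s how the blocking and median gating fold into the lighthouserc.js from earlier; a sketch, with example URL patterns you’d replace with your own third parties:

// lighthouserc.js (excerpt): variance controls
module.exports = {
  ci: {
    collect: {
      numberOfRuns: 5, // gate on the aggregate of several runs, not a single sample
      settings: {
        // keep flaky third parties out of the measurement
        blockedUrlPatterns: ['*googletagmanager.com*', '*intercom.io*', '*doubleclick.net*']
      }
    },
    assert: {
      assertions: {
        // compare the budget against the median run instead of the worst one
        'largest-contentful-paint': ['error', { maxNumericValue: 2500, aggregationMethod: 'median' }]
      }
    }
  }
};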

If you want less noise and trend analysis, add an external synthetic like SpeedCurve, Calibre, or WebPageTest. Use it to set baselines and detect slow drift; use CI gates to stop the bleeding at the source.

Fixes that actually move the needle (with measured outcomes)

What’s worked repeatedly at clients, with numbers you can take to finance:

  • Ship less JS:

    • Code splitting and route-level dynamic imports with Vite/Rollup; keep hydration small. Typical win: –200–500KB → LCP –200–600ms.
    • Kill dead dependencies; run source-map-explorer or webpack-bundle-analyzer in CI.
    • Prefer native browser features over polyfill-heavy libs.
  • Smarter assets:

    • Images: AVIF/WebP, sizes/srcset, lazy-load below the fold. Often 30–60% savings.
    • Fonts: font-display: swap, subset, preconnect to CDNs.
    • Enable Brotli and HTTP/2/3.
  • Cache like you mean it:

    • CDN (Cloudflare/Fastly/Akamai) with stale-while-revalidate. Serve cacheable responses from the edge so they never hit origin.
    • Service Worker for repeat visits (minimal sketch after this list).
  • Backend basics:

    • Remove N+1s with request-level tracing (OpenTelemetry + Jaeger/Tempo). p95 often –20–40%.
    • Add proper DB indexes and timeouts; watch EXPLAIN ANALYZE.
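
For the Service Worker bullet, a minimal cache-first sketch assuming fingerprinted files under /assets/; production setups usually reach for Workbox, but the idea fits in a few lines:

// sw.js (sketch): cache-first for immutable, fingerprinted assets
const CACHE = 'static-v1';

self.addEventListener('fetch', (event) => {
  const url = new URL(event.request.url);
  if (!url.pathname.startsWith('/assets/')) return; // let everything else hit the network

  event.respondWith(
    caches.open(CACHE).then(async (cache) => {
      const cached = await cache.match(event.request);
      if (cached) return cached;                    // repeat visits skip the network entirely
      const response = await fetch(event.request);
      cache.put(event.request, response.clone());   // populate the cache on first visit
      return response;
    })
  );
});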

Nginx Brotli + caching example:

# nginx.conf (excerpt)
http {
  brotli on; brotli_comp_level 5; brotli_types text/plain text/css application/json application/javascript image/svg+xml;
  gzip on; # keep gzip as a fallback for clients without Brotli; nginx negotiates, it won’t double-compress
  proxy_cache_path /var/cache/nginx levels=1:2 keys_zone=STATIC:100m inactive=7d use_temp_path=off; # attach with proxy_cache STATIC where you proxy cacheable responses
  server {
    location /assets/ {
      add_header Cache-Control "public, max-age=31536000, immutable";
      brotli_static on;
    }
    location /api/ {
      proxy_pass http://app;
      proxy_connect_timeout 2s; proxy_read_timeout 5s;
    }
  }
}

Vite code splitting sanity:

// vite.config.ts (excerpt)
import { defineConfig } from 'vite';

export default defineConfig({
  build: {
    sourcemap: true,
    rollupOptions: {
      output: {
        manualChunks(id) {
          if (id.includes('node_modules')) return 'vendor';
        }
      }
    }
  }
});
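
Code splitting only sticks if CI notices when vendor bloats again. A minimal bundle-size gate to run after the build; the dist path, script name, and the 200KB figure are assumptions, and source-map-explorer tells you what to cut when it trips:

// scripts/check-bundle-size.js (hypothetical): fail CI when shipped JS exceeds the budget
const fs = require('fs');
const path = require('path');

const DIST_DIR = path.resolve('dist/assets'); // adjust to your build output
const BUDGET_BYTES = 200 * 1024;              // total initial JS budget; tighten per route if you split builds

const totalJsBytes = fs
  .readdirSync(DIST_DIR)
  .filter((f) => f.endsWith('.js'))
  .reduce((sum, f) => sum + fs.statSync(path.join(DIST_DIR, f)).size, 0);

console.log(`Shipped JS: ${(totalJsBytes / 1024).toFixed(1)}KB (budget ${BUDGET_BYTES / 1024}KB)`);

if (totalJsBytes > BUDGET_BYTES) {
  console.error('Bundle size budget exceeded; run source-map-explorer to find the culprit.');
  process.exit(1); // red ❌ on the PR
}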

SQL plan check you can paste into psql:

EXPLAIN (ANALYZE, BUFFERS)
SELECT o.id, o.total, u.email
FROM orders o
JOIN users u ON u.id = o.user_id
WHERE o.created_at > now() - interval '7 days'
AND o.status = 'paid'
ORDER BY o.created_at DESC
LIMIT 50;

If Rows Removed by Filter explodes and there’s no index on (status, created_at), you’ve found your culprit.

Ship safely: canaries and flags catch what CI can’t

Even with solid CI gates, the real world hits differently. Add runtime guardrails:

  • Canary deploys with Argo Rollouts or Flagger on Kubernetes; watch p95, error rate, and Web Vitals proxies (TTFB via synthetic).
  • Feature flags (LaunchDarkly/Unleash) around heavy code paths; roll back without redeploys.
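
On the frontend, the flag-plus-dynamic-import pattern looks roughly like this; the flags client and flag name are placeholders, and with LaunchDarkly or Unleash you’d call their SDK’s variation/isEnabled instead:

// checkout-page.js (sketch): heavy widget behind a flag, so it can be switched off without a redeploy
import { flags } from './flag-client.js'; // thin wrapper over your flag SDK (placeholder)

export async function renderRecommendations(container) {
  if (!flags.isEnabled('checkout-recommendations')) {
    return; // flag off: ship nothing, pay nothing
  }
  // dynamic import keeps the widget out of the critical bundle even when the flag is on
  const { mountRecommendations } = await import('./recommendations-widget.js');
  mountRecommendations(container);
}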

Argo Rollouts analysis gating on p95:

apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: p95-latency
spec:
  metrics:
  - name: http-p95
    interval: 1m
    successCondition: result[0] < 0.25
    failureCondition: result[0] >= 0.25
    provider:
      prometheus:
        address: http://prometheus.monitoring:9090
        query: |
          histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket{job="checkout"}[5m])) by (le))

Tie this to a 10% → 25% → 50% rollout. If p95 tips over 250ms, the rollout pauses and auto-aborts. Your pager stays quiet.

Prove the dollars: connect perf to revenue

At a retail client, we cut search LCP from 3.0s → 2.2s with code splitting, image AVIF, and CDN tuning. A two-week switchback test (50/50, weekend-inclusive) showed:

  • +1.9% product views from search
  • +1.2% add-to-cart
  • +0.6% checkout completion
  • Net +$410k/month projected

Another B2B SaaS: shaving 120ms off p95 on the dashboard (query index + response shaping) reduced support tickets tagged “slow” by 18% and improved trial-to-paid by 0.4%.

Put this in your board deck. Performance isn’t a virtue signal; it’s a lever.

What we’ve learned after too many postmortems

  • If it doesn’t break the build, it doesn’t exist.
  • Guard the top 3 flows first; don’t chase synthetic perfect scores.
  • Control variance or your team will ignore the gate.
  • Keep perf configs in git and review them like code.
  • Watch for regressions from AI-generated code and “vibe coding”—we’ve seen Copilot add a “tiny” dependency that pulled in 400KB and a polyfill fiesta. Treat AI code like an intern’s PR: verify, measure, and gate.

If you need help wiring this end-to-end—or a fast vibe code cleanup after AI-assisted “refactors”—GitPlumbers has done this at unicorns and at 20-year-old monoliths. We’ll get you to green PR checks that actually protect revenue.

Key takeaways

  • Make performance a merge blocker using user-facing metrics like LCP, INP, and p95 endpoint latency.
  • Stabilize measurements with repeat runs, network/device emulation, and variance-aware thresholds.
  • Automate detection with Lighthouse CI and k6; store baselines and diff per PR.
  • Tie budgets to revenue: guard critical flows (search, add-to-cart, checkout) with per-route budgets.
  • Roll out safely with canaries and feature flags; abort if SLOs or Web Vitals drift.
  • Fix with high-ROI techniques: code splitting, image optimization, CDN caching, DB query plans, async offloads.
  • Measure outcome impact: connect perf improvements to conversion and retention with A/B or switchback tests.

Implementation checklist

  • Define top 3 user journeys and map to metrics: LCP/INP for frontend, p95/p99 per endpoint.
  • Set performance budgets and SLOs; document in versioned config.
  • Add CI gates with Lighthouse CI and k6; run 3–5 iterations and compare to main.
  • Capture artifacts (traces, HAR, flamegraphs) on failure; link in PR checks.
  • Control variance: throttle network/CPU, pin test hardware, mock unstable backends.
  • Add canary analysis with Argo Rollouts/Flagger and Prometheus before full rollout.
  • Continuously prune regressions: weekly perf triage and blameless postmortems.

Questions we hear from teams

How do I choose budgets that won’t create noise?
Use current medians on stable hardware as your baseline, then set budgets with 10–20% headroom. Run 3–5 iterations and gate on the median. Tweak thresholds per route; checkout deserves stricter budgets than blog pages.

Should we block merges on synthetic metrics only?
Block on synthetic for immediate feedback (Lighthouse CI, k6) and complement with RUM (e.g., INP from real users via Google Analytics 4 or Elastic RUM) for ongoing validation. If RUM drifts despite green builds, tighten the gate or adjust test realism.

Our stack is a legacy monolith. Is this only for SPAs on Kubernetes?
No. You can run Lighthouse against server-rendered pages and k6 against any HTTP endpoint. We’ve put this in Jenkins for on-prem monoliths and in Buildkite/GitHub Actions for microservices. Start with one route and one API.

What about AI-generated code and perf regressions?
Treat AI suggestions like junior dev PRs: benchmark, profile, and gate. We’ve seen Copilot add “harmless” client-side JSON parsing that doubled bundle size. Add bundle-size checks and Web Vitals gates to catch vibe coding regressions early.

How do we tie improvements to dollars?
Use switchback or A/B tests with holdout groups. Track conversion, retention, and support tickets alongside perf metrics. Even simple before/after analysis with seasonality-aware controls can show revenue impact.

Ready to modernize your codebase?

Let GitPlumbers help you transform AI-generated chaos into clean, scalable applications.

Get a performance gate wired into your CI
See how we cut checkout LCP by 800ms
