The Optimization Isn’t Real Until CI Says So: Automating Performance Proof with User-Centric Metrics
If it doesn’t move LCP, INP, and checkout latency for real users, it’s not an optimization. Wire performance tests into CI/CD and gate merges with budgets tied to revenue.
Performance isn’t faster until the customer says it is—and CI enforces it.
The release that felt faster but cost us revenue
I’ve watched teams “speed up” an app and ship a regression. One fintech client shaved 20% off server CPU after a gRPC hop refactor. Perf graphs looked great in Grafana… until revenue dipped. Turns out we had added a blocking font load and lazy-loaded the hero image incorrectly. LCP got worse on mid-tier Android in LATAM, p95 checkout time went up, and conversions fell 3.1% in two days. No one noticed until finance asked why the daily run rate was off.
I’ve seen this fail at unicorns and mom-and-pop SaaS: people optimize what’s easy to measure (CPU, GC, container cost) and ignore what users feel (LCP, INP, p95 checkout latency). The fix is boring but effective: automate performance tests around user-facing metrics and make merges fail if we don’t improve—or if we regress.
If your CI doesn’t gate on user-centric budgets, your “optimization” is a rumor.
Measure what users feel, not what servers brag about
Servers love to brag about CPU and throughput. Users care about when the page draws and when taps respond. Track these:
- Core Web Vitals: `LCP` (<2.5s p75), `INP` (<200ms p75), `CLS` (<0.1 p75).
- TTFB: p75 under ~200–400ms depending on region.
- Journey latency: p95 time from product page to payment success.
- API p95/p99 by endpoint and region (e.g., `/cart/checkout` p95 < 800ms).
- Apdex per user flow (search, add-to-cart, pay), not just per service.
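Apdex is worth pinning down, since we gate on it per flow: samples at or under a target T are satisfied, samples up to 4T are tolerating, anything slower is frustrated. A minimal sketch in plain JavaScript (the 500ms threshold and sample latencies are illustrative):

```javascript
// Apdex = (satisfied + tolerating / 2) / total for a target threshold T:
// satisfied <= T, tolerating <= 4T, frustrated > 4T.
function apdex(samplesMs, thresholdMs) {
  const satisfied = samplesMs.filter((t) => t <= thresholdMs).length;
  const tolerating = samplesMs.filter(
    (t) => t > thresholdMs && t <= 4 * thresholdMs
  ).length;
  return (satisfied + tolerating / 2) / samplesMs.length;
}

// Illustrative checkout-flow samples (ms) against a 500ms target:
const checkout = [320, 410, 480, 900, 1200, 2600];
console.log(apdex(checkout, 500).toFixed(2)); // → 0.67
```

Computing it per journey (search, add-to-cart, pay) instead of per service is the whole point: one number that degrades when any hop in the flow degrades.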
Map metrics to money:
- A 100ms faster `LCP` on product pages raised a retailer’s conversion by 1.6% (our client; 8-figure annual run-rate gain).
- Reducing p95 checkout from 1.2s to 700ms dropped abandonment by 2.4pp.
Set budgets per route/API and device class. Example budgets:
- `LCP` p75: Desktop 1.8s, Mid-Android 2.5s, Low-end 3.0s.
- `INP` p75: 150ms on product pages; 200ms on account pages.
- API `/pricing` p95: 300ms NA/EU; 450ms APAC.
These budgets become gates in CI and canary analysis. No green, no merge.
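One way to keep those gates maintainable is to store budgets as data rather than hard-coding thresholds in each CI job. A sketch using the example numbers above (the route keys and metric names are our own convention, not a standard format):

```javascript
// Budgets keyed by route × metric × device class/region, so CI jobs and
// dashboards read from one source of truth instead of duplicating numbers.
const budgets = {
  '/product': {
    lcpP75Ms: { desktop: 1800, 'mid-android': 2500, 'low-end': 3000 },
    inpP75Ms: 150,
  },
  '/account': { inpP75Ms: 200 },
  '/pricing': { apiP95Ms: { 'na-eu': 300, apac: 450 } },
};

// Returns true when no budget applies or the measured value is within it.
function withinBudget(route, metric, segment, valueMs) {
  const b = budgets[route] && budgets[route][metric];
  const limit = typeof b === 'object' ? b[segment] : b;
  return limit === undefined || valueMs <= limit;
}

console.log(withinBudget('/product', 'lcpP75Ms', 'mid-android', 2300)); // → true
console.log(withinBudget('/product', 'inpP75Ms', 'desktop', 180)); // → false
```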
Make tests repeatable and truthful
Synthetic tests catch regressions fast; Real User Monitoring (RUM) proves impact. Use both.
- Synthetic: `Lighthouse CI` for Web Vitals, `k6` for API latency and throughput. Run on fixed hardware or cloud workers with network shaping.
- RUM: the `web-vitals` library in your app; ship to `Prometheus` or a vendor (SpeedCurve, Splunk RUM, Datadog). Slice by device, OS, region, and release version.
- Truthful data: test with production-like payload sizes, images, and cache headers. Use real cookies/feature flags. Cover warm and cold cache scenarios.
- Traffic replay for APIs: sample production requests with `gor` (GoReplay) or service mesh mirrors (Istio `mirrorPercentage`) against staging/canary.
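On the RUM side, the `web-vitals` library hands each metric to a callback; the only code you own is a small reporter that attaches release/device/region labels before beaconing. A sketch, with the `/rum` endpoint path and label names as our assumptions:

```javascript
// Turn a web-vitals Metric ({ name, value, rating, id }) into a labeled
// payload so RUM can be sliced by release, device class, and region.
function toPayload(metric, ctx) {
  return {
    metric: metric.name.toLowerCase(), // 'lcp' | 'inp' | 'cls'
    value: metric.value,
    rating: metric.rating, // 'good' | 'needs-improvement' | 'poor'
    release: ctx.release,
    device: ctx.device,
    region: ctx.region,
  };
}

// Browser wiring (sketch): with the web-vitals library you would do
//   import { onLCP, onINP, onCLS } from 'web-vitals';
//   const ctx = { release: APP_VERSION, device: deviceClass(), region: userRegion() };
//   const report = (m) => navigator.sendBeacon('/rum', JSON.stringify(toPayload(m, ctx)));
//   onLCP(report); onINP(report); onCLS(report);
// where deviceClass()/userRegion() are your own helpers.

const p = toPayload(
  { name: 'LCP', value: 2180, rating: 'good', id: 'v3-123' },
  { release: '1.42.0', device: 'mid-android', region: 'latam' }
);
console.log(p.metric, p.value); // → lcp 2180
```

The release label is the part teams forget, and it is what lets you attribute a p75 shift to a specific deploy.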
Keep it cheap and sane:
- Run quick smoke perf checks on PRs, deeper runs nightly or on release candidates.
- Use small, deterministic datasets and seeded test accounts to remove noise.
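For deterministic datasets, a seeded PRNG is enough: the same seed yields the same SKUs and cart contents on every CI run. A sketch using the well-known mulberry32 generator (the SKU naming is illustrative):

```javascript
// mulberry32: tiny seeded PRNG; the same seed produces the same sequence,
// so perf runs request identical payloads and run-to-run noise stays low.
function mulberry32(seed) {
  return function () {
    seed = (seed + 0x6d2b79f5) | 0;
    let t = Math.imul(seed ^ (seed >>> 15), 1 | seed);
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Seed once per suite; derive test SKUs from it.
const rand = mulberry32(42);
const skus = Array.from({ length: 5 }, () => `sku-${Math.floor(rand() * 1000)}`);
console.log(skus); // same five SKUs on every run
```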
Wire it into CI/CD with hard budgets
Here’s the combo we deploy a lot at GitPlumbers: Lighthouse CI for web, k6 for APIs, Prometheus for budgets/SLOs, enforced through CI and canary.
A minimal k6 script with thresholds that fail the build on regression:
```javascript
// tests/perf/api-smoke.js
import http from 'k6/http';
import { check, group, sleep } from 'k6';

export const options = {
  thresholds: {
    'http_req_waiting{page:home}': ['p(95)<300'],
    'http_req_waiting{page:product}': ['p(95)<400'],
    checks: ['rate>0.99'],
  },
  scenarios: {
    smoke: {
      executor: 'constant-vus',
      vus: 10,
      duration: '1m',
    },
  },
};

export default function () {
  group('home', () => {
    const res = http.get(`${__ENV.BASE_URL}/`, { tags: { page: 'home' } });
    check(res, { '200': (r) => r.status === 200 });
  });
  group('product', () => {
    const res = http.get(`${__ENV.BASE_URL}/product/123`, { tags: { page: 'product' } });
    check(res, { '200': (r) => r.status === 200 });
  });
  sleep(1);
}
```

A Lighthouse CI config that enforces Web Vitals budgets:
```jsonc
// lighthouserc.json
{
  "ci": {
    "collect": {
      "url": ["https://staging.example.com/", "https://staging.example.com/product/123"],
      "numberOfRuns": 3,
      "settings": { "preset": "desktop" }
    },
    "assert": {
      "assertions": {
        "categories:performance": ["error", { "minScore": 0.9 }],
        "largest-contentful-paint": ["error", { "maxNumericValue": 2500 }],
        "cumulative-layout-shift": ["error", { "maxNumericValue": 0.1 }],
        "interactive": ["warn", { "maxNumericValue": 3800 }]
      }
    }
  }
}
```

A GitHub Actions job that runs both and fails the PR if budgets blow up:
```yaml
name: perf-gates
on: [pull_request]
jobs:
  web-vitals:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with: { node-version: '20' }
      - run: npm ci
      - name: Run Lighthouse CI
        run: npx @lhci/cli autorun --upload.target=temporary-public-storage
  api-latency:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Run k6
        uses: grafana/k6-action@v0.3.0
        with:
          filename: tests/perf/api-smoke.js
        env:
          BASE_URL: https://staging.example.com
```

If a change bumps LCP over 2.5s p75 or pushes API p95 beyond budget, the PR turns red. You didn’t “optimize”; you broke the budget. Fix it or flag-gate it.
Validate optimizations with experiments, not vibes
Pick an optimization. Prove it. Then ship.
- Hypothesis: “Serving images as `AVIF` with `srcset` and proper `sizes` will drop `LCP` 20% on product pages.”
- Implement behind a feature flag (e.g., `LaunchDarkly`, `Unleash`).
- Synthetic proof: `LHCI` drops LCP from 2.9s → 2.1s on staging.
- Canary to 5% of traffic, mobile-only, via `Argo Rollouts`.
- RUM proof: `web-vitals` shows p75 LCP 2.8s → 2.2s on mid Android.
- Business impact: conversion +1.4%, add-to-cart +2.1% in 48 hours.
- Roll out to 100%, then ratchet budgets 10% tighter.
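The flag-gated part of that experiment can be as small as a pure function the rendering path calls. A sketch (the `avif-images` flag name and file paths are hypothetical; in production the flag value comes from LaunchDarkly/Unleash and the Accept header from the request):

```javascript
// Pick the hero image variant: AVIF only when the flag is on AND the
// client advertises support, otherwise the safe JPEG fallback.
function heroImageSrc(flags, acceptHeader) {
  const supportsAvif = acceptHeader.includes('image/avif');
  if (flags['avif-images'] && supportsAvif) {
    return { src: '/img/hero.avif', type: 'image/avif' };
  }
  return { src: '/img/hero.jpg', type: 'image/jpeg' };
}

console.log(heroImageSrc({ 'avif-images': true }, 'image/avif,image/webp,*/*').src);
// → /img/hero.avif
console.log(heroImageSrc({ 'avif-images': false }, 'image/avif,*/*').src);
// → /img/hero.jpg
```

Keeping the decision pure makes the kill switch trivial: flip the flag off and every request is back on JPEG, no deploy needed.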
We often add edge improvements:
- TTFB: cache HTML for anonymous users at the CDN with `stale-while-revalidate`. Saved a publisher 90ms p75 TTFB globally.
- JavaScript diet: split `vendors.js` by route; defer third-party tags. INP p75 dropped 30–60ms.
- API hot paths: precompute price breakdowns into Redis; `/pricing` p95 fell from 480ms → 190ms.
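The `/pricing` precompute is a plain read-through/refresh-on-write pattern. A sketch with a `Map` standing in for Redis (`computeBreakdown` is a hypothetical stand-in for the expensive pricing logic):

```javascript
// Read-through cache for the /pricing hot path: serve reads from cache,
// compute on a miss, refresh on writes. A Map stands in for Redis here.
const cache = new Map();

function computeBreakdown(sku) {
  // Stand-in for the expensive computation (DB joins, tax rules, discounts).
  return { sku, base: 100, tax: 8, total: 108 };
}

function getPricing(sku) {
  if (!cache.has(sku)) {
    cache.set(sku, computeBreakdown(sku)); // in Redis: SET with a short TTL
  }
  return cache.get(sku);
}

// Writers refresh the entry whenever the underlying price changes,
// so readers never pay the recompute cost on the hot path.
function onPriceChange(sku) {
  cache.set(sku, computeBreakdown(sku));
}

console.log(getPricing('sku-1').total); // → 108
```

The p95 win comes from moving the recompute off the request path; the TTL (or explicit invalidation) bounds how stale a price can get.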
Document results in a perf changelog that ties code diffs to metric shifts and revenue. Future-you will thank you when finance asks “what changed?”
Guardrails: SLOs, canaries, and auto-rollback
CI gates stop bad merges; SLOs stop bad rollouts. Two concrete pieces:
- Prometheus alert based on RUM (not server CPU):
```yaml
# prom-rules.yaml
groups:
  - name: perf-slo
    rules:
      - record: slo:lcp_p75_seconds
        expr: histogram_quantile(0.75, sum(rate(web_vitals_lcp_bucket{service="web",env="prod"}[5m])) by (le))
      - alert: WebVitalsLCPSLOViolation
        expr: slo:lcp_p75_seconds > 2.5
        for: 10m
        labels:
          severity: page
        annotations:
          summary: "LCP p75 above SLO ({{ $value }}s > 2.5s)"
          runbook: https://gitplumbers.com/runbooks/web-lcp
```

- Argo Rollouts analysis that aborts a canary when LCP degrades:
```yaml
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: web-lcp
spec:
  metrics:
    - name: lcp-p75
      interval: 1m
      successCondition: result[0] < 2.5
      provider:
        prometheus:
          address: http://prometheus.monitoring:9090
          query: |
            histogram_quantile(0.75, sum(rate(web_vitals_lcp_bucket{service="web",env="canary"}[5m])) by (le))
```

If your canary pushes LCP above 2.5s for 10 minutes, the rollout halts automatically. No 2am firefight.
A one-week rollout plan
We’ve set this up in a week for teams from Series A startups to Fortune 100.
- Day 1: Pick journeys and KPIs. Define budgets for product page, search, checkout, and the `/pricing`, `/cart`, `/orders` APIs. Agree on p75/p95 targets by region and device class.
- Day 2: Add `web-vitals` to the app. Start shipping RUM to Prometheus with release labels.
- Day 3: Add `Lighthouse CI` to PRs for key pages. Fail if `LCP > 2.5s` or performance score < 0.9.
- Day 4: Add `k6` smoke with p95 thresholds for key APIs. Run against a staging env seeded with realistic data.
- Day 5: Create Prometheus SLO alerts and a Grafana dashboard that shows budgets vs actuals by release.
- Day 6: Wire Argo Rollouts or Flagger for canary + analysis based on RUM metrics.
- Day 7: Pilot an optimization (e.g., image format switch). Validate via CI → canary → RUM. Publish the perf changelog and tighten budgets 10%.
What I’d tell my past self: start with budgets and CI gates, not dashboards. Dashboards are for browsing; gates are for shipping safely.
Key takeaways
- If it doesn’t improve user-facing metrics (LCP, INP, p95 checkout latency), it didn’t happen.
- Automate performance checks in CI/CD with hard budgets that block merges.
- Use both synthetic tests and RUM; validate on canary before full rollout.
- Tie budgets to business targets (Apdex, conversion, retention) and measure the impact.
- Make rollback automatic when SLOs drift—no heroics required.
Implementation checklist
- Define user-centric KPIs: LCP, INP, CLS, p95 API latency, Apdex per journey.
- Create performance budgets per route/API and per region/device class.
- Add Lighthouse CI for web vitals in PRs; fail builds on budget breaches.
- Add k6 for API p95/p99 thresholds; run on ephemeral env or canary.
- Collect RUM (e.g., via Boomerang/Web-Vitals) and export to Prometheus.
- Use Argo Rollouts or Flagger to canary with automated analysis/abort.
- Wire Prometheus alerts to SLOs, not CPU/GC noise.
- Publish a weekly perf scorecard that maps tech changes to revenue/SLAs.
Questions we hear from teams
- Why not just rely on server-side metrics (CPU, GC, qps)?
- Because users don’t feel CPU. They feel LCP, INP, and checkout latency. Server metrics are necessary but insufficient. We’ve seen servers look healthy while RUM showed LCP p75 blowing past 3s on mid Android due to a font swap and 3rd-party tags.
- Synthetic or RUM—do I need both?
- Yes. Synthetic (Lighthouse, k6) is fast and deterministic for CI gating. RUM proves impact on real devices, networks, and geos, and powers SLO-based canaries and rollbacks.
- Won’t performance tests slow down CI?
- Run quick smoke checks (30–90 seconds) on PRs and deeper tests nightly. Gate only critical routes/APIs per PR; everything else can be async. The cost of a perf regression in prod is higher than a 2–3 minute CI job.
- How do I pick budgets?
- Start with Google’s Web Vitals guidance (LCP <2.5s, INP <200ms) and your current p75/p95. Set budgets 10–20% tighter than current, then ratchet down after each win. Make them per route, per device class, and per region.
- What about microservices backends?
- Budget at the edge of the user journey. Then add contracts for hot APIs (p95/p99 by region). Use k6 with tags per endpoint, and validate the journey in canary with RUM. Don’t drown in 200 service-level charts—users don’t care which microservice was slow.
Ready to modernize your codebase?
Let GitPlumbers help you transform AI-generated chaos into clean, scalable applications.
