Seven Performance Playbooks That Actually Move the Needle (Core Web Vitals to Token Throughput)

Reusable, testable playbooks that tie p95 latency to revenue — for SPA+BFF, Monolith+DB, Microservices, Kafka, Serverless, and LLM inference.

You don’t need a silver bullet — you need seven boring playbooks you can run in your sleep.

The playbook mindset: from latency to revenue

I’ve seen the same movie at a dozen companies: teams run “performance sprints,” speed up a few endpoints, and six months later regressions creep back. The fix isn’t heroics — it’s playbooks. For each architecture you run, you need a repeatable checklist that ties user-facing metrics to business outcomes, with clear rollout and rollback.

  • Metrics that matter: LCP, TTI, CLS (web), p95/p99 latency (APIs), TTFT and TPOT (LLM), Apdex, consumer_lag (Kafka), error rate, and cache hit ratio.
  • Business linkage: conversion rate, revenue per minute, abandonment, retention, support tickets, infra spend.
  • Proof: A/B or holdout groups; compare conversion and performance before/after. Don’t ship “faster” — ship “+0.8% conversion at p95 -200ms”.

Amazon and Google have beaten this into us for years: an extra 100ms can cost measurable revenue. I’ve watched an LCP drop from 3.1s to 1.9s lift mobile conversion 1–2% at a mid-market retail client. That’s the language finance understands.
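The math behind that pitch is simple enough to sketch. This is a back-of-the-envelope calculator, not a model — the session, conversion, and AOV numbers below are illustrative, not client data:

```python
def revenue_impact(sessions_per_min, conv_control, conv_variant, aov):
    """Translate a conversion-rate delta (from an A/B or holdout test)
    into revenue per minute. All inputs are hypothetical examples."""
    lift = conv_variant - conv_control          # e.g. 2.0% -> 2.8% = +0.8pp
    extra_orders = sessions_per_min * lift      # additional orders per minute
    return extra_orders * aov                   # dollars per minute

# 1,000 sessions/min, 2.0% -> 2.8% conversion, $60 average order value
dollars_per_min = revenue_impact(1000, 0.020, 0.028, 60)
```

Put that number next to the p95 delta on the same dashboard and the “buy another 500ms” conversation starts itself.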

SPA + BFF + CDN: win the first interaction

This is the stack where Core Web Vitals pay the bills. Your goal is LCP < 2.5s, TTI < 2s, stable layouts, and predictable navigations.

  1. Measure

    • Run Lighthouse CI and web-vitals RUM. Export to Prometheus and chart in Grafana.
    • Capture server-timing headers in the BFF for TTFB breakdown.
  2. Quick wins (often 30–60% LCP improvement in a week)

    • Push static assets to a CDN with Cache-Control: public, max-age=31536000, immutable and ETag.
    • Enable Brotli and HTTP/3 (QUIC) at the edge (Cloudflare/Akamai/Fastly).
    • Serve images as AVIF/WebP, use sizes/srcset, and lazy-load below-the-fold.
    • Inline critical CSS (<14KB) and defer the rest; eliminate render-blocking JS.
    • In Next.js/Remix, prefer app/ routing and next/image; adopt React Server Components where feasible.
  3. Deeper fixes

    • Move HTML to the edge with stale-while-revalidate. Cookie-aware bypass for logged-in users.
    • Collapse N+1 BFF calls into a single aggregate endpoint; reuse upstream connections via HTTP/2 multiplexing.
    • Ship priority hints: <link rel="preload" as="image" imagesrcset="..."> and <link rel="preconnect" href="https://api.yourbff.com">.
  4. Config snippets

    • NGINX Brotli + cache (requires the ngx_brotli module):
      brotli on;
      brotli_comp_level 6;
      brotli_types text/css application/javascript application/json image/svg+xml;
      location /assets/ {
        expires 1y;
        add_header Cache-Control "public, max-age=31536000, immutable";
      }
    • Edge worker (Cloudflare) for SWR:
      const ttl = 60, swr = 300
      return new Response(html, {headers: { 'Cache-Control': `public, max-age=${ttl}, stale-while-revalidate=${swr}` }})
  5. Expected outcomes

    • LCP -500–1200ms, TTI -300–800ms; 0.5–2.0% conversion lift on mobile product pages.
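The Server-Timing capture in step 1 is worth automating: parse the header your BFF emits into named durations so you can export them to Prometheus per hop. A minimal parser sketch (the `db`/`bff` entry names are hypothetical — use whatever your BFF actually emits):

```python
def parse_server_timing(header: str) -> dict:
    """Parse a Server-Timing header like 'db;dur=42.5, bff;dur=11'
    into {metric_name: duration_ms}. Entries without dur= report 0.0."""
    timings = {}
    for entry in header.split(","):
        parts = [p.strip() for p in entry.split(";")]
        name, dur = parts[0], 0.0
        for attr in parts[1:]:
            if attr.startswith("dur="):
                dur = float(attr[4:])
        timings[name] = dur
    return timings

# parse_server_timing("db;dur=42.5, bff;dur=11") -> {"db": 42.5, "bff": 11.0}
```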

Monolith + RDBMS: stop making the database cry

90% of the wins are query shape, indexes, and caching. I’ve watched teams throw read replicas at a SELECT * with a bad filter. Don’t be that team.

  1. Measure

    • Turn on pg_stat_statements and sample slow queries; trace with OpenTelemetry.
    • EXPLAIN (ANALYZE, BUFFERS) your top 20 queries.
  2. Quick wins

    • Add covering indexes for hot paths. Example:
      CREATE INDEX CONCURRENTLY idx_orders_user_status_created ON orders(user_id, status, created_at DESC) INCLUDE (total);
    • Kill ORM-generated N+1s; batch IN queries or add data loaders.
    • Introduce a read-through cache (Redis) for idempotent reads:
      GET order:123
      # miss -> fetch DB -> SETEX order:123 300 <json>
    • Use PgBouncer in transaction mode; cap the app-side pool to protect the DB.
  3. Deeper fixes

    • Move heavy reads to a materialized view refreshed by a job.
    • Denormalize joins serving product pages into a precomputed document.
    • Tune Postgres for your hardware: shared_buffers ~25% RAM, work_mem per sort/hash, effective_cache_size realistic.
  4. Guardrails

    • Timeouts everywhere: app query timeout <= 3000ms; cancel long-running queries.
    • Circuit-breaker around Redis; prefer stale reads over hard failures for catalog pages.
  5. Expected outcomes

    • API p95 -30–70%; DB CPU -40%; cache hit ratio 80–95%; infra spend -20–30%.
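The read-through pattern from step 2 fits in a few lines. Here’s a sketch with an in-memory dict standing in for Redis (`GET`/`SETEX` become dict lookups with an expiry timestamp); in production you’d swap in a Redis client and wrap `fetch_fn` with the timeout and circuit breaker from the guardrails:

```python
import time

class ReadThroughCache:
    """Read-through cache sketch: serve from cache on hit, otherwise
    fetch from the DB and cache the result for `ttl` seconds
    (the equivalent of Redis SETEX key ttl value)."""
    def __init__(self, fetch_fn, ttl=300):
        self.fetch_fn = fetch_fn   # hits the database on a miss
        self.ttl = ttl
        self.store = {}            # key -> (value, expires_at)

    def get(self, key):
        entry = self.store.get(key)
        if entry and entry[1] > time.time():
            return entry[0]                                # cache hit
        value = self.fetch_fn(key)                         # cache miss -> DB
        self.store[key] = (value, time.time() + self.ttl)  # SETEX equivalent
        return value
```

Idempotent reads only — anything that must be read-your-writes fresh goes around the cache.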

Microservices over REST: tame the network, not the team

When a request fans out to 6 services, you don’t optimize one handler — you control blast radius. The toolkit: timeouts, backpressure, retries, and bulkheads.

  1. Measure

    • Distributed traces (OpenTelemetry -> Tempo/Jaeger), service mesh metrics (Istio/Envoy) for p95, error_rate, and retry storms.
  2. Quick wins

    • Set sane defaults in Envoy:
      route:
        timeout: 2s
      retry_policy:
        retry_on: 5xx,connect-failure,reset
        num_retries: 2
        per_try_timeout: 0.5s
      circuit_breakers:
        thresholds:
          max_connections: 1024
          max_pending_requests: 512
      outlier_detection:
        consecutive_5xx: 5
        base_ejection_time: 30s
    • Stop synchronous chains for non-critical work; hand it to a queue instead.
  3. Deeper fixes

    • Introduce a BFF to reduce client fan-out; cache GETs at edge with ETag/max-age.
    • Apply bulkheads: separate pools for third-party calls. Never let a flaky payment API starve product detail requests.
    • Use canaries + SLO guards to prevent a bad deploy from burning the error budget in minutes.
  4. Expected outcomes

    • p95 -20–50% on user flows; MTTR -30–60% thanks to fewer retry storms; fewer brownouts under peak.
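If you’re not on a mesh yet, the circuit-breaker half of the toolkit is small enough to sketch in application code. This is a minimal consecutive-failure breaker (thresholds are illustrative; libraries like resilience4j or Envoy’s outlier detection do the production version):

```python
import time

class CircuitBreaker:
    """Minimal circuit breaker: open after `threshold` consecutive
    failures, reject calls for `cooldown` seconds, then allow a probe."""
    def __init__(self, threshold=5, cooldown=30.0):
        self.threshold, self.cooldown = threshold, cooldown
        self.failures, self.opened_at = 0, None

    def allow(self):
        if self.opened_at is None:
            return True                                   # closed: all good
        if time.time() - self.opened_at >= self.cooldown:
            return True                                   # half-open: one probe
        return False                                      # open: shed load

    def record(self, ok):
        if ok:
            self.failures, self.opened_at = 0, None       # success closes it
        else:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = time.time()              # trip open
```

Wrap each third-party dependency in its own breaker instance — that, plus separate pools, is the bulkhead.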

Event-driven with Kafka: throughput without surprise lag

Kafka fixes fan-out costs, then adds its own footguns. The playbook is about producer batching, consumer backpressure, and observability of lag.

  1. Measure

    • Track consumer_lag, end-to-end time from event to user-visible effect, and DLQ rate.
  2. Quick wins

    • Producer config:
      linger.ms=10
      batch.size=131072
      compression.type=zstd
      acks=all
    • Consumers: set max.poll.records and process in batches; checkpoint on success.
  3. Deeper fixes

    • Increase partitions to scale, but align to consumer concurrency and keying semantics.
    • For read models, switch hot-path projections to incremental updates rather than full recomputes.
    • Apply backpressure: pause consumption when downstream latency spikes; expose it in metrics.
  4. Guardrails

    • DLQ with retention and alerts; replay tested via a staging topic.
    • Idempotency keys at sinks to tolerate retries.
  5. Expected outcomes

    • End-to-end “write-to-visible” p95 from 2.5s -> 800ms; backlog recovery 5x faster; fewer paging incidents during spikes.
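The consumer side of step 2 — bounded batches, checkpoint on success — looks like this in sketch form. The in-memory list stands in for a partition; with a real client (e.g. confluent-kafka) the commit would be an offset commit after the batch succeeds:

```python
def consume_in_batches(records, process_batch, max_poll_records=100):
    """Batch-consume sketch: pull up to `max_poll_records` at a time and
    only advance the committed offset after the batch succeeds, so a
    failure replays the batch instead of losing it."""
    committed = 0
    while committed < len(records):
        batch = records[committed:committed + max_poll_records]
        process_batch(batch)        # raises on failure -> offset not advanced
        committed += len(batch)     # checkpoint on success
    return committed
```

Because failures replay the whole batch, the sinks need the idempotency keys from the guardrails.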

Serverless APIs (Lambda/Cloud Functions): kill cold starts and thundering herds

Serverless is great until cold starts meet chatty DBs. You need to pre-warm, proxy DBs, and right-size memory.

  1. Measure

    • Split initDuration from duration in logs. Track p95 and error rate under load tests (k6).
  2. Quick wins

    • Enable Provisioned Concurrency or SnapStart (Java). Use Lambda Power Tuning to pick memory that yields best ms/$.
    • Put RDS behind RDS Proxy; reuse TCP connections. For DynamoDB, lean on adaptive capacity and add DAX for read-heavy workloads.
    • Trim bundles and reuse connections (keep-alive); cache secrets at init instead of fetching per invocation.
  3. Deeper fixes

    • Precompute expensive data into Redis/ElastiCache with TTLs; serve stale on timeout.
    • Fan-out heavy work to Step Functions/SQS to avoid synchronous timeouts.
  4. Expected outcomes

    • Cold start p95 -60–90%; API p95 -30–50%; 20–40% cost reduction at same throughput.
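The “serve stale on timeout” move from step 3 is a one-function pattern. Sketch below, with a plain dict standing in for Redis/ElastiCache; `fetch_fn` is whatever hits your origin and raises on timeout:

```python
def read_with_stale_fallback(key, cache, fetch_fn):
    """Try the origin; on success, refresh the cache and return fresh.
    On any origin failure, fall back to the last cached value instead
    of failing the request. Returns (value, is_fresh)."""
    try:
        value = fetch_fn(key)
        cache[key] = value
        return value, True          # fresh read, cache refreshed
    except Exception:
        if key in cache:
            return cache[key], False  # stale but usable
        raise                         # no fallback available
```

Pair it with a TTL (or timestamp the entries) so “stale” has a bounded age, and emit a metric on the stale path so you notice when the origin is degrading.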

LLM/AI inference: tokens per second is your throughput

LLM latency feels different: users notice TTFT and tokens/sec more than raw request latency. Your levers are batching, KV cache, quantization, and admission control.

  1. Measure

    • Track TTFT, TPOT (ms/token), throughput (tokens/sec), GPU utilization, and rejection rate.
  2. Quick wins

    • Use an inference server with batching and paged KV cache (vLLM, Triton).
    • Quantize to 4/8-bit (bitsandbytes/AWQ) if quality allows; enable FlashAttention.
    • Cap context length; cache system prompts; stream tokens to improve perceived latency.
  3. Deeper fixes

    • Batch by length; tune max_batch_size and max_tokens to keep GPUs >80% utilized.
    • Preload popular models; pin to GPUs with sufficient VRAM to avoid swaps.
    • Admission control: shed long prompts during load; queue with user feedback.
  4. Config sketch

    python -m vllm.entrypoints.openai.api_server \
      --model meta-llama/Meta-Llama-3-8B-Instruct \
      --tensor-parallel-size 2 \
      --max-num-batched-tokens 8192 \
      --gpu-memory-utilization 0.9
  5. Expected outcomes

    • TTFT -40–70%; tokens/sec +2–5x; infra cost per 1k tokens -30–60% with quantization and batching.
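The TTFT/TPOT arithmetic is worth having as code on your dashboard, because the two metrics trade off under batching: bigger batches raise aggregate tokens/sec but can stretch TTFT. Two helper formulas (straight definitions, no vendor specifics assumed):

```python
def stream_latency_ms(ttft_ms, tpot_ms, n_tokens):
    """Perceived latency of a streamed response: time-to-first-token
    plus per-token time for each remaining token."""
    return ttft_ms + tpot_ms * max(n_tokens - 1, 0)

def tokens_per_sec(tpot_ms, batch_size):
    """Aggregate decode throughput when `batch_size` sequences are
    decoding concurrently at `tpot_ms` per token each."""
    return batch_size * 1000.0 / tpot_ms

# 200ms TTFT, 30ms/token, 101 tokens -> 3200ms to finish streaming;
# 8 concurrent sequences at 25ms/token -> 320 tokens/sec aggregate.
```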

Operationalize your playbooks so they stick

A playbook isn’t a Confluence page. It’s code, alerts, and dashboards that survive reorgs.

  • Versioned playbooks in a repo: /playbooks/<pattern>/README.md with metrics, quick wins, configs, runbooks, dashboards links.
  • GitOps delivery: Terraform for infra, ArgoCD for manifests; PRs update timeouts, autoscaling, or CDN rules.
  • SLOs and error budgets: define p95/TTFT SLOs; alerts fire when burn rate > 2x budget.
  • CI gates: k6 smoke tests and Lighthouse budgets block regressions.
  • Dashboards: Grafana folders per playbook; “before vs after” panels and ROI overlays.
  • Cadence: monthly load tests and quarterly GameDays. Archive learnings and update the playbooks.
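The burn-rate alert in the SLO bullet is one division, and encoding it keeps teams from alerting on CPU graphs instead. A sketch, assuming a simple (non-multiwindow) burn-rate rule:

```python
def burn_rate(error_rate, error_budget):
    """How fast the error budget is burning: observed error rate divided
    by the budgeted rate. 1.0 = exactly on budget."""
    return error_rate / error_budget

def should_alert(error_rate, slo=0.999, factor=2.0):
    """Fire when the budget burns faster than `factor`x.
    A 99.9% SLO budgets 0.1% errors, so 0.3% errors burns at 3x."""
    return burn_rate(error_rate, 1.0 - slo) > factor
```

Production setups usually layer multiwindow rules (fast burn over 5m, slow burn over 1h) on top of this to cut flapping.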

When GitPlumbers runs these with clients, we usually see: fewer firefights, clearer ROI on infra spend, and product managers asking when they can “buy another 500ms.” That’s when you know the playbooks are doing their job.


Key takeaways

  • Tie every optimization to a user-facing metric (LCP, TTI, p95, TTFT) and a business KPI (conversion, churn, AOV).
  • Create architecture-specific playbooks with quick wins, deeper fixes, and guardrails; version them like code.
  • Measure with synthetic and RUM; enforce with SLOs and error budgets; prove impact with A/B or holdout tests.
  • Automate rollout via GitOps (ArgoCD/Terraform), and memorialize learnings in runbooks and dashboards.
  • Expect diminishing returns; chase the biggest deltas first (network, cache, query shape) before exotic tuning.

Implementation checklist

  • Map each service/page to user-facing metrics (LCP, TTI, p95, Apdex) and business KPIs.
  • Baseline with Lighthouse, k6, production traces (OpenTelemetry), and RUM.
  • Prioritize quick wins: CDN, caching, timeouts, indexes, image formats, compression.
  • Set SLOs and budgets; wire alerts to user-impacting thresholds, not CPU graphs.
  • Automate changes with IaC and GitOps; ship with canaries and feature flags.
  • Prove value with A/B or holdouts and publish before/after dashboards.
  • Schedule regular load/regression tests and GameDays to keep playbooks fresh.

Questions we hear from teams

How do I tie performance to revenue credibly?
Use holdouts or A/B. For the impacted pages or APIs, track both performance (LCP, p95) and business KPIs (conversion, revenue per session). Compare deltas between test/control. Share a single dashboard that shows latency down, conversion up, with confidence intervals.
What if my biggest bottleneck is a third-party API?
Bulkhead it with separate connection pools, strict timeouts, retries with jitter, and a circuit breaker. Cache idempotent responses and design graceful degradation (placeholders, queued actions). Negotiate rate and latency SLOs with the vendor and monitor them as if they were your own service.
We’re already on a service mesh. Isn’t that enough?
Meshes give you the knobs; the playbook tells you where to set them and how to verify the business impact. You still need sane timeouts, retry budgets, SLOs, and canaries wired into rollout policy.
How often should we run load tests?
At minimum, before major launches and monthly thereafter. Automate a short k6 smoke in CI, plus a heavier off-peak run that mimics traffic mix. Re-baseline after big dependency upgrades (framework, runtime, DB).
Can we standardize these across teams without becoming a platform bottleneck?
Yes. Ship opinionated defaults as reusable Terraform/Helm modules and mesh policies, with escape hatches. Guard the SLOs, not the implementation details. Empower teams to deviate only with data.

Ready to modernize your codebase?

Let GitPlumbers help you transform AI-generated chaos into clean, scalable applications.

Talk to GitPlumbers about your next performance push
See how we instrument playbooks end-to-end
