Stop Paying for p99 You Don’t Need: A Framework That Balances Performance and Cost

Tie user-facing SLOs to unit economics, then optimize where it pays back. No heroics, just a repeatable loop that keeps customers happy and your COGS sane.

If a change doesn’t move a user metric and a cost metric in the same direction, it’s not optimization—it’s vanity.

Key takeaways

  • Optimize for user-facing metrics (LCP, INP, p95 API latency) tied to revenue, not abstract CPU graphs.
  • Set performance SLOs and cost budgets per user journey; measure cost-per-transaction alongside latency.
  • Use a tight control loop: instrument, budget, experiment, right-size, and gate changes with error budgets.
  • Focus on proven wins: CDN + cache TTLs, SQL query plans, compression, autoscaling targets, and instance selection.
  • Report results in business terms: conversion lift, COGS reduction, capacity unlocked.

Implementation checklist

  • Define 3–5 critical user journeys and attach performance SLOs (LCP/INP/p95) and cost-per-action budgets.
  • Instrument RUM and backend tracing; centralize in Grafana with Prometheus and a cost source (Kubecost/Cloud billing).
  • Create Prometheus recording rules for latency and cost; alert on SLO burn rates and cost budget breaches (see the first sketch after this checklist).
  • Implement autoscaling guardrails (target utilization 55–65%, min/max replicas, p95-based scale-out); see the second sketch below.
  • Ship quick wins: CDN cache rules, Brotli, SQL indexing, Redis hot keys, image format swaps (WebP/AVIF).
  • Adopt canary/feature flags; roll changes under error-budget guardrails.
  • Review weekly: SLO status, cost per journey, top regressions, and next experiments.
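
To make the recording-rule and burn-rate item concrete, here is a minimal sketch, assuming an illustrative `http_request_duration_seconds` histogram with a `journey` label and a checkout SLO of 99% of requests under 300 ms:

```yaml
# Sketch only: metric names, the `journey` label, and the 14.4x fast-burn factor
# are illustrative; adapt them to your own histograms and SLO windows.
groups:
  - name: journey-slos
    rules:
      - record: journey:latency_p95:5m
        expr: |
          histogram_quantile(0.95,
            sum by (journey, le) (rate(http_request_duration_seconds_bucket[5m])))
      # Fast-burn page: fraction of checkout requests slower than 300ms over 1h
      # exceeding 14.4x the 1% slow-request budget.
      - alert: CheckoutLatencySLOFastBurn
        expr: |
          (
            sum(rate(http_request_duration_seconds_count{journey="checkout"}[1h]))
            - sum(rate(http_request_duration_seconds_bucket{journey="checkout", le="0.3"}[1h]))
          )
          / sum(rate(http_request_duration_seconds_count{journey="checkout"}[1h]))
          > (14.4 * 0.01)
        for: 5m
        labels:
          severity: page
```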

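A matching sketch of the autoscaling guardrails, assuming the recorded p95 is exposed to the HPA through a metrics adapter such as prometheus-adapter (resource names and thresholds are illustrative):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: checkout-api
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: checkout-api
  minReplicas: 4            # floor so scale-out never starts from zero warm capacity
  maxReplicas: 40           # ceiling that protects the cost budget
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 60        # middle of the 55–65% guardrail
    - type: External
      external:
        metric:
          name: checkout_latency_p95_seconds   # recorded p95 exposed via the adapter
        target:
          type: Value
          value: "300m"                        # scale out when p95 exceeds ~300 ms
```
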
Questions we hear from teams

Should we target p99 or p95?
Target p95 for most journeys. p99 is expensive and often dominated by outliers (mobile radio, cold caches). Invest in p99 only for flows where a single slow request kills revenue (e.g., payment auth).
How do we measure cost per action accurately?
Tag resources by journey/service in Terraform, ingest cloud cost (Kubecost, AWS CUR) into Prometheus, and divide by action counters (e.g., `checkout_completed_total`). Smooth over a 7–30 day window to reduce noise.
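
As a sketch of that arithmetic, assuming a hypothetical `journey_cost_dollars_total` counter fed from Kubecost or CUR data (exact cost metric names vary by setup) and the `checkout_completed_total` action counter:

```promql
# Cost per completed checkout over a rolling 7-day window,
# smoothing both billing lag and traffic noise.
sum(increase(journey_cost_dollars_total{journey="checkout"}[7d]))
/
sum(increase(checkout_completed_total[7d]))
```
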
Will Graviton/ARM break our stack?
Most modern runtimes are fine. Build multi-arch images, verify native deps (e.g., `sharp`, `pg-native`, `grpc`), and canary 10%. We’ve transitioned Node, Python, Java, and Go services with minimal work.
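
A minimal multi-arch build sketch (the image name and tag are placeholders):

```bash
# Build one tag that runs on both x86 and Graviton/ARM nodes, then push it.
docker buildx build \
  --platform linux/amd64,linux/arm64 \
  --tag registry.example.com/checkout-api:1.42.0 \
  --push .
```
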
What if RUM makes the privacy team nervous?
Collect metrics, not PII. Sample aggressively, hash IDs, and honor consent. Tools like Datadog RUM, OpenTelemetry SDKs, or bare `web-vitals` let you control payloads precisely.
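
A minimal sketch using the bare `web-vitals` package mentioned above (the `/rum` endpoint and the 10% sample rate are illustrative choices):

```ts
import { onCLS, onINP, onLCP, type Metric } from 'web-vitals';

// Sample aggressively up front so most sessions send nothing at all.
const SAMPLED = Math.random() < 0.1;

function report(metric: Metric): void {
  if (!SAMPLED) return;
  const body = JSON.stringify({
    name: metric.name,        // "LCP" | "INP" | "CLS"
    value: metric.value,
    rating: metric.rating,    // "good" | "needs-improvement" | "poor"
    page: location.pathname,  // path only: no query strings, no user identifiers
  });
  navigator.sendBeacon('/rum', body); // fire-and-forget, survives page unload
}

onLCP(report);
onINP(report);
onCLS(report);
```
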
Do we need a service mesh to do this?
No. Mesh helps with retries and circuit breaking at scale, but you can start with NGINX/Envoy ingress and app-level timeouts. Don’t introduce a mesh unless you have multi-team needs that justify the complexity.
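
For example, per-route timeouts and bounded retries at an NGINX ingress cover much of what teams reach for a mesh to get (annotation values are illustrative; tune them per journey SLO):

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: checkout-api
  annotations:
    nginx.ingress.kubernetes.io/proxy-connect-timeout: "2"   # seconds
    nginx.ingress.kubernetes.io/proxy-read-timeout: "5"
    nginx.ingress.kubernetes.io/proxy-next-upstream: "error timeout http_502"
    nginx.ingress.kubernetes.io/proxy-next-upstream-tries: "2"
spec:
  ingressClassName: nginx
  rules:
    - host: api.example.com
      http:
        paths:
          - path: /checkout
            pathType: Prefix
            backend:
              service:
                name: checkout-api
                port:
                  number: 80
```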

Ready to modernize your codebase?

Let GitPlumbers help you transform AI-generated chaos into clean, scalable applications.

Get a 30‑day performance–cost blueprint
See how we instrument cost per action
