Stop Paying for p99 You Don’t Need: A Framework That Balances Performance and Cost
Tie user-facing SLOs to unit economics, then optimize where it pays back. No heroics, just a repeatable loop that keeps customers happy and your COGS sane.
If a change doesn’t move a user metric and a cost metric in the same direction, it’s not optimization; it’s vanity.
Key takeaways
- Optimize for user-facing metrics (LCP, INP, p95 API) tied to revenue, not abstract CPU graphs.
- Set performance SLOs and cost budgets per user journey; measure cost-per-transaction alongside latency.
- Use a tight control loop: instrument, budget, experiment, right-size, and gate changes with error budgets.
- Focus on proven wins: CDN + cache TTLs, SQL query plans, compression, autoscaling targets, and instance selection.
- Report results in business terms: conversion lift, COGS reduction, capacity unlocked.
Implementation checklist
- Define 3–5 critical user journeys and attach performance SLOs (LCP/INP/p95) and cost-per-action budgets.
- Instrument RUM and backend tracing; centralize in Grafana with Prometheus and a cost source (Kubecost or cloud billing exports).
- Create Prometheus recording rules for latency and cost; alert on SLO burn rates and cost budget breaches.
- Implement autoscaling guardrails (target utilization 55–65%, min/max replicas, p95-based scale-out).
- Ship quick wins: CDN cache rules, Brotli, SQL indexing, Redis hot keys, image format swaps (WebP/AVIF).
- Adopt canary/feature flags; roll changes under error-budget guardrails.
- Review weekly: SLO status, cost per journey, top regressions, and next experiments.
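The recording-rule and alerting item above can be sketched as a single Prometheus rule group. This is a minimal illustration, assuming a latency histogram named `http_request_duration_seconds`, a request counter `http_requests_total` with `journey` and `code` labels, a 99.9% availability SLO (so a 0.1% error budget, with 14.4x the standard fast-burn multiplier over 1h), and a hypothetical cost gauge `journey:cost_per_checkout:usd` fed by your cost source; adapt the metric names and thresholds to your stack.

```yaml
groups:
  - name: journey-slo
    rules:
      # Record p95 latency per journey so dashboards and alerts stay cheap.
      - record: journey:request_latency_seconds:p95
        expr: histogram_quantile(0.95, sum by (le, journey) (rate(http_request_duration_seconds_bucket[5m])))
      # Fast-burn alert: error rate consuming the 30-day budget ~14x too fast.
      - alert: CheckoutSLOFastBurn
        expr: |
          (1 - (sum(rate(http_requests_total{journey="checkout",code!~"5.."}[1h]))
              / sum(rate(http_requests_total{journey="checkout"}[1h])))) > (14.4 * 0.001)
        for: 5m
        labels:
          severity: page
      # Cost budget breach: cost per checkout above budget for a full day.
      - alert: CheckoutCostBudgetBreach
        expr: journey:cost_per_checkout:usd > 0.05
        for: 24h
        labels:
          severity: ticket
```

Note the asymmetry: the SLO burn pages a human, while the cost breach files a ticket; latency regressions hurt users in minutes, cost regressions hurt margins over weeks.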
Questions we hear from teams
- Should we target p99 or p95?
- Target p95 for most journeys. p99 is expensive and often dominated by outliers (mobile radio, cold caches). Invest in p99 only for flows where a single slow request kills revenue (e.g., payment auth).
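To see why p99 is so much more expensive to defend, consider how few outliers it takes to move it. A quick sketch with hypothetical numbers: 1,000 requests where just 1.5% are slow mobile/cold-cache outliers.

```python
import random

random.seed(42)
# 1,000 request latencies in ms: mostly ~120ms, with 15 slow outliers (2-5s)
latencies = [random.gauss(120, 20) for _ in range(985)] + \
            [random.uniform(2000, 5000) for _ in range(15)]

def percentile(values, p):
    """Nearest-rank percentile: the value below which p% of samples fall."""
    ordered = sorted(values)
    k = max(0, min(len(ordered) - 1, round(p / 100 * len(ordered)) - 1))
    return ordered[k]

p95 = percentile(latencies, 95)
p99 = percentile(latencies, 99)
# p95 sits in the well-behaved bulk; p99 lands squarely on the outliers.
print(f"p95={p95:.0f}ms  p99={p99:.0f}ms")
```

Engineering p99 down means chasing that long tail (retries, hedged requests, cache warming), while p95 responds to ordinary capacity and code-path work.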
- How do we measure cost per action accurately?
- Tag resources by journey/service in Terraform, ingest cloud cost (Kubecost, AWS CUR) into Prometheus, and divide by action counters (e.g., `checkout_completed_total`). Smooth over a 7–30 day window to reduce noise.
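The divide-and-smooth step can be sketched in a few lines. The numbers and the `checkout` journey here are illustrative; in practice the inputs come from your tagged cost source and your Prometheus action counters.

```python
from collections import deque

def smoothed_cost_per_action(daily_cost, daily_actions, window=7):
    """Trailing-window cost per action: sum(cost) / sum(actions) over `window` days."""
    costs, actions = deque(maxlen=window), deque(maxlen=window)
    series = []
    for cost, n in zip(daily_cost, daily_actions):
        costs.append(cost)
        actions.append(n)
        # Divide summed spend by summed actions, not the daily ratios,
        # so low-traffic days don't spike the metric.
        series.append(sum(costs) / max(sum(actions), 1))
    return series

# e.g. daily $ spend tagged to the checkout journey vs. checkout_completed_total deltas
spend = [410, 395, 480, 500, 420, 405, 415]
checkouts = [9800, 9400, 11200, 11800, 10100, 9700, 9900]
series = smoothed_cost_per_action(spend, checkouts)
print(f"${series[-1]:.4f} per checkout")
```

Widen the window toward 30 days for spiky journeys; a tight 7-day window reacts faster but alerts noisier.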
- Will Graviton/ARM break our stack?
- Most modern runtimes are fine. Build multi-arch images, verify native deps (e.g., `sharp`, `pg-native`, `grpc`), and canary 10%. We’ve transitioned Node, Python, Java, and Go services with minimal work.
- What if RUM makes our privacy team nervous?
- Collect metrics, not PII. Sample aggressively, hash IDs, and honor consent. Tools like Datadog RUM, OpenTelemetry SDKs, or bare `web-vitals` let you control payloads precisely.
- Do we need a service mesh to do this?
- No. Mesh helps with retries and circuit breaking at scale, but you can start with NGINX/Envoy ingress and app-level timeouts. Don’t introduce a mesh unless you have multi-team needs that justify the complexity.
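The app-level alternative is just bounded timeouts plus bounded retries with backoff. A minimal sketch, with illustrative names: `call` stands in for any network dependency, and the flaky stub simulates an upstream that times out twice before recovering.

```python
import time

def with_timeout_and_retries(call, attempts=3, timeout_s=0.25, backoff_s=0.05):
    """Invoke `call(timeout)` up to `attempts` times, backing off between tries."""
    last_err = None
    for attempt in range(attempts):
        try:
            return call(timeout_s)
        except TimeoutError as err:
            last_err = err
            time.sleep(backoff_s * (2 ** attempt))  # exponential backoff
    raise last_err  # budget exhausted: surface the failure, don't retry forever

# Simulated flaky dependency: times out twice, then succeeds.
state = {"calls": 0}
def flaky(timeout):
    state["calls"] += 1
    if state["calls"] < 3:
        raise TimeoutError("upstream too slow")
    return "200 OK"

print(with_timeout_and_retries(flaky))
```

The key guardrail is the hard cap on attempts: unbounded retries are how a slow dependency becomes a retry storm, which is the failure mode a mesh's circuit breaking exists to prevent.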
Ready to modernize your codebase?
Let GitPlumbers help you transform AI-generated chaos into clean, scalable applications.
