Stop Buying CPUs for Bad Code: A Pragmatic Framework to Balance Performance and Cloud Spend

If your p95 is creeping up and your AWS bill is doing the same, you don’t need a new instance type—you need a resource optimization framework that ties user experience to dollars.

Performance that doesn’t move a business metric is vanity—cost that doesn’t improve user-perceived latency is waste.

The incident that made our CFO learn what p95 means

We were staring at a Datadog dashboard at 9:12 AM on a Tuesday. Marketing had just launched a promo; sessions doubled; p95 TTFB jumped from 380ms to 1.2s, and conversion cratered. AWS auto scaling dutifully spun up 40% more nodes. Revenue still dropped. The CFO asked why we were paying more to sell less. That’s the day we stopped treating performance as a “nice to have” and built a framework that balances user experience with cost—no heroics, no silver bullets.

I’ve seen this movie at VC-backed SaaS, old-guard enterprises, and unicorn marketplaces. The pattern is consistent: teams watch CPU graphs, finance watches the bill, users experience the lag. The fix is aligning user-facing metrics with a cost target and optimizing the hot path ruthlessly.

What to measure: user-facing performance tied to money

If you can’t map performance to revenue, you’ll lose the budget argument every time.

  • User metrics: LCP, INP, CLS from RUM (Datadog RUM, New Relic Browser, SpeedCurve, GA4). Track mobile separately. Tie segments (geo, device) to vitals and conversion.

  • Service metrics: p95/p99 latency, error rate, saturation (CPU, memory, connection pool usage), queue depth. SRE 101, but sampled by endpoint and customer tier.

  • Business metrics: conversion rate, AOV, churn/retention, $/order, cost per acquisition. Correlate perf changes with these directly.

  • Cost metrics: $ per 1k requests, $ per transaction, egress, storage IOPS. Tag infra (aws:cost-allocation-tag) so you can apportion cost per service.

As famously observed by Amazon and others, 100ms can move revenue. In practice, your multiplier is yours—instrument it. We usually see 1–4% conversion lift per 100–300ms TTFB/LCP improvement on mobile in transactional flows.

# Example: enable AWS cost allocation tags for a service via Terraform
resource "aws_ce_cost_allocation_tag" "svc" {
  tag_key = "service"
  status  = "Active"
}

The resource optimization framework

Here’s what actually works when the heat is on:

  1. Set SLOs with a cost ceiling. Example: p95 TTFB <= 500ms, 99.9% availability, error rate < 0.5%, and cost <= $0.50 per 1k requests for the storefront. Publish them.

  2. Define efficiency SLIs. RPS/vCPU, cache hit ratio, DB CPU/query, queue age, $ per request. Review weekly with product + finance.

  3. Instrument the hot path. Trace end-to-end with OpenTelemetry into Prometheus/Jaeger/Datadog. Add user_id/plan tags to see who suffers.

  4. Optimize before scaling. Edge cache, compress, eliminate N+1, batch outbound calls, pool DB connections. Only then, right-size infra.

  5. Automate safety. Canary with Argo Rollouts, guardrails on p95 latency and error budgets, automatic rollback on regression.

  6. Continuously validate. Run k6 load tests each release candidate; compare cost and latency to baselines.

We add a simple equation to keep teams honest:

$ per request = (compute + storage + egress + third-party APIs) / requests.

You don’t need CPAs in standup—just make the number visible next to p95 latency.
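The equation above is trivial to automate. A minimal sketch (all dollar figures and the 50M-request month are hypothetical; pull real numbers from your cost allocation tags):

```javascript
// Compute $ per 1k requests from tagged cost components.
// All figures below are hypothetical examples, not benchmarks.
function costPer1kRequests({ compute, storage, egress, thirdParty, requests }) {
  const totalDollars = compute + storage + egress + thirdParty;
  return (totalDollars / requests) * 1000;
}

// Example month: $18k compute, $2k storage, $4k egress, $1k third-party APIs, 50M requests
const dollarsPer1k = costPer1kRequests({
  compute: 18000,
  storage: 2000,
  egress: 4000,
  thirdParty: 1000,
  requests: 50_000_000,
});
console.log(dollarsPer1k.toFixed(2)); // 0.50
```

Emit this number from the same pipeline that builds your latency dashboard so the two always appear side by side.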

Tactical playbook with measurable outcomes

When you only have a week until the next promo, these are the moves that pay.

  • CDN/edge caching (CloudFront/Fastly)
    • Cache HTML for anon traffic; use stale-while-revalidate to keep the edge warm.
    • Push API responses that are cacheable (product lists, price snapshots) with short TTLs.
    • Expect: 30–70% origin offload; p95 TTFB drops 100–400ms; egress bill down 20–50%.
# Nginx behind Fastly/CloudFront: compression and cache-friendly headers
brotli on;
brotli_comp_level 5;
# text/html is compressed by default; listing it again triggers a duplicate-MIME warning
brotli_types text/css application/javascript application/json;

gzip on;

location /catalog {
    add_header Cache-Control "public, s-maxage=300, stale-while-revalidate=600";
    try_files $uri @app;
}

location @app {
    proxy_pass http://app_upstream;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
}
  • Kill N+1 and slow queries (PostgreSQL)
    • Turn on pg_stat_statements, capture top 10 by total time.
    • Add missing composite indexes; avoid wildcard ILIKE on hot paths.
    • Expect: 2–10x query speedups, DB CPU down 20–60%, fewer scaling events.
-- Find the worst offenders (columns are total_exec_time/mean_exec_time on
-- PostgreSQL 13+; use total_time/mean_time on 12 and earlier)
SELECT query, calls, total_exec_time, mean_exec_time
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;

-- Example fix: product lookup by category + status
CREATE INDEX CONCURRENTLY idx_products_category_status
  ON products (category_id, status)
  INCLUDE (price);

-- Verify improvement
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM products WHERE category_id = 42 AND status = 'active';
  • Compress and shrink assets

    • Switch JS/CSS to brotli, images to WebP/AVIF with sharp/imgproxy.
    • Inline critical CSS; defer non-critical JS; kill 3rd-party tags that don’t convert.
    • Expect: LCP down 200–600ms on mobile; CDN egress down 15–30%.
  • Async the non-essential

    • Queue write-heavy or slow third-party calls (email, CRM, fraud checks) via SQS/Kafka.
    • Use idempotency keys and retries with jitter; cap concurrency.
    • Expect: p95 API latency down 100–300ms; steadier CPU; fewer tail spikes.
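A sketch of the two safety pieces those bullets depend on: an idempotency guard and full-jitter backoff. `send` stands in for your real producer call (e.g. SQS SendMessage), and the base/cap values are assumptions to tune:

```javascript
// In production the dedupe set lives in Redis/DynamoDB, not process memory.
const seen = new Set();

function enqueueOnce(idempotencyKey, payload, send) {
  if (seen.has(idempotencyKey)) return false; // duplicate: drop silently
  seen.add(idempotencyKey);
  send(payload); // e.g. SQS SendMessage / Kafka produce in real code
  return true;
}

// Full jitter: uniform delay in [0, min(cap, base * 2^attempt)) to avoid retry stampedes
function backoffWithJitter(attempt, baseMs = 100, capMs = 10_000) {
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.random() * ceiling;
}
```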
  • Connection pooling and timeouts

    • pgbouncer in transaction mode; sane connect_timeout and statement_timeout.
    • Circuit breakers (Envoy, Istio) to shed load instead of death by thundering herd.
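A pgbouncer starting point for the pooling bullet (pool sizes and timeouts are assumptions; tune against your connection count and slowest legitimate query):

```ini
; Sketch: pgbouncer in transaction mode with bounded timeouts
[pgbouncer]
pool_mode = transaction
default_pool_size = 20
max_client_conn = 2000
server_connect_timeout = 3
query_timeout = 30
```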
# k6 smoke to find the knee of the curve
import http from 'k6/http';
import { check, sleep } from 'k6';

export const options = { stages: [
  { duration: '2m', target: 100 },
  { duration: '3m', target: 300 },
  { duration: '3m', target: 600 },
]};

export default function () {
  const res = http.get('https://yourdomain.com/api/checkout');
  check(res, { 'status is 200': (r) => r.status === 200 }); // record failures instead of aborting the VU
  sleep(1);
}

Right-size infra without guesswork

Scaling is not strategy. Right-sizing is.

  • Autoscale on meaningful signals. Use HPA on RPS or queue depth via Prometheus Adapter, not just CPU.
# HPA based on requests-per-second (custom metric) via Prometheus Adapter
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api
  minReplicas: 4
  maxReplicas: 30
  metrics:
    - type: Pods
      pods:
        metric:
          name: requests_per_second
        target:
          type: AverageValue
          averageValue: "50"  # 50 RPS per pod
  • Pick the right silicon. m7g (Graviton3) often yields 20–40% better price/perf than x86. Validate with your workload.

  • Bin-pack smartly. Fewer, larger nodes improve pod density; use VPA to shrink memory hogs; Cluster Autoscaler with Pod Priority for critical paths.

  • Use Spot where it’s safe. Mix on-demand + spot for stateless tiers; respect interruption budgets and PodDisruptionBudget.

  • Keep the data close. Put caches/DB in the same AZ; egress and cross-AZ chatter burns cash and latency.
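The Spot guidance above hinges on a PodDisruptionBudget so interruptions drain gracefully. A minimal sketch (the app label and threshold are hypothetical):

```yaml
# Keep at least 70% of the stateless tier up through Spot reclaims and node rollouts
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: api-pdb
spec:
  minAvailable: 70%
  selector:
    matchLabels:
      app: api
```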

Cache TTLs are a finance decision

I’ve seen teams argue for hours about max-age. Here’s how to end it: price the trade-off.

  • Every 10% increase in cache hit ratio is usually 50–150ms faster TTFB and 5–15% less origin cost.
  • Use short s-maxage with stale-while-revalidate to get 80% of the benefit with low risk.
  • Add cache busting by key on writes (e.g., product update invalidates /product/:id).
HTTP/1.1 200 OK
Cache-Control: public, s-maxage=300, stale-while-revalidate=600
ETag: "p-12345-v4"

Measure origin offload in Fastly/CloudFront and map it to $ per 1k requests. Once finance sees the slope, TTL debates get short.

Ship safely: canaries wired to user metrics

Canary releases stop surprises if they’re tied to the numbers that matter.

  • Use Argo Rollouts canaries with Prometheus checks on p95 latency, error rate, and LCP (from RUM export).
  • Auto-abort if p95 degrades by >10% or error budget burn >2x.
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: api
spec:
  strategy:
    canary:
      steps:
        - setWeight: 10
        - pause: { duration: 120 }
        - analysis:
            templates:
              - templateName: latency-check
            args:
              - name: p95
                value: "<500"  # guardrail
        - setWeight: 50
        - pause: { duration: 300 }
        - setWeight: 100

With GitOps (ArgoCD), your rollout configs live next to code. Post-merge, your pipeline runs load tests, updates HPA targets, and flips feature flags gradually (LaunchDarkly/OpenFeature).

Mini case study: rescuing a marketplace during a promo

A two-sided marketplace came to GitPlumbers two weeks before a flash sale. Mobile LCP was 3.2s, p95 TTFB 600ms, EC2 spend up 35% month-over-month.

What we changed in 9 days:

  • Fastly edge caching with stale-while-revalidate for anon HTML and catalog responses (300s TTL). Origin offload moved from 22% to 71%.

  • Killed two N+1s and added a composite index on orders(user_id, status); pgbouncer in transaction mode.

  • Brotli for text assets, WebP for hero images; trimmed three marketing tags that added 180ms on mobile.

  • HPA based on RPS (Prometheus Adapter) instead of CPU; swapped half the pool to m7g.large.

Results (measured over the 48h promo):

  • p95 TTFB: 600ms -> 280ms (-53%)

  • Mobile LCP: 3.2s -> 2.1s (-1.1s)

  • Conversion: +3.1% absolute on mobile PDP -> checkout

  • EC2 compute: -28% cost; egress: -18%

  • $ per 1k requests: $0.90 -> $0.50 (-44%)

The CFO didn’t care about Brotli vs. gzip. They cared that revenue per visit improved while spend went down.

What I’d do again (and what I’d skip)

Do again:

  • Establish cost SLOs alongside latency SLOs; publish the dashboard in the all-hands deck.

  • Attack the hot path first; caching + DB fixes usually beat infra changes on ROI.

  • Canary with performance guardrails; avoid big-bang releases during promos.

Skip:

  • Chasing exotic instance types before fixing N+1s and asset bloat.

  • CPU-only autoscaling; it lags reality and overscales on noisy neighbors.

  • Month-long tuning of a system without a baseline load test. Find the knee of the curve in a day with k6.

Put it in motion this sprint

If you only have a week:

  1. Turn on RUM, chart LCP/INP/CLS next to conversion and device type.
  2. Pick SLOs plus a cost SLO; make $ per 1k requests visible.
  3. Add CDN caching with SWR for the top 2 endpoints; measure offload.
  4. Kill the top 5 queries in pg_stat_statements; pool connections.
  5. Switch to Brotli and WebP; drop deadweight third-party tags.
  6. HPA on RPS with a modest ceiling; canary the change with Argo Rollouts.
  7. Re-run your k6 test; compare to the baseline and decide what to scale—not guess.

When you’re ready to institutionalize this, GitPlumbers can help you wire the dashboards, budgets, and guardrails, and ship the boring, reliable changes that move both performance and the P&L.


Key takeaways

  • Tie performance SLOs directly to a cost SLO like $/request or $/transaction.
  • Measure the user—not the server—using RUM and Core Web Vitals (LCP, INP, CLS) and connect to revenue metrics.
  • Optimize the hot path first: cache at the edge, kill N+1 queries, compress aggressively, and right-size infra via autoscaling and load tests.
  • Make changes safe with canaries, error budgets, and automatic rollbacks wired to performance KPIs.
  • Track efficiency SLIs (RPS per vCPU, cache hit ratio, $/req) and review them weekly with product and finance.

Implementation checklist

  • Instrument RUM for LCP/INP/CLS and map to conversion rate by segment.
  • Define SLOs: p95 TTFB, LCP, error rate, plus a cost SLO ($/req).
  • Introduce efficiency SLIs: RPS/vCPU, cache hit ratio, DB CPU/query, queue age.
  • Put CloudFront/Fastly in front of origin; set SWR/S-maxage; measure origin offload.
  • Kill top 5 slow DB queries using pg_stat_statements and CREATE INDEX.
  • Load test with k6 to find the knee of the throughput/latency curve.
  • Enable HPA/VPA with custom metrics (RPS/queue depth) via Prometheus Adapter.
  • Ship with canaries (Argo Rollouts) and guardrails on p95 latency and error budgets.

Questions we hear from teams

What’s the fastest way to prove ROI on performance work?
Instrument RUM and correlate LCP/TTFB with conversion by device/geo. Then edge-cache the hottest unauthenticated endpoints and kill the top 5 slow queries. You’ll usually see measurable conversion lift within a week and reduced origin cost from offload.
How do we pick a cost SLO?
Start with historical spend divided by requests over a stable period—say $0.70 per 1k requests. Set an initial SLO 10–20% lower to force optimization. Track egress separately; it often hides big wins via caching and image compression.
Is switching to Graviton worth the churn?
Often yes for stateless services. We regularly see 20–40% better price/perf. Validate with a canary pool under production traffic. For JVM stacks, ensure you’re on recent JDKs and tune GC; for Go/Node, changes are typically minimal.
Can we autoscale on RPS in Kubernetes without vendor lock-in?
Yes. Use Prometheus metrics and the Prometheus Adapter to expose custom metrics to HPA. Scale on requests_per_second or queue depth instead of CPU. Keep CPU as a fallback to avoid sudden drops.
How do we keep optimization from becoming a one-off project?
Make efficiency SLIs first-class: put $/req and RPS/vCPU on the same dashboard as p95. Review in a weekly perf standup with product and finance. Tie promotions and feature rollouts to error budgets and performance guardrails.

Ready to modernize your codebase?

Let GitPlumbers help you transform AI-generated chaos into clean, scalable applications.

Talk to GitPlumbers about a Performance + Cost Audit.
Download our Performance Guardrails checklist.
