Your CDN Isn’t “On” — It’s Misconfigured (And Your Global Users Are Paying the Price)
CDNs don’t magically fix latency. Cache keys, TTLs, compression, image variants, and origin shielding do. Here’s the playbook we use at GitPlumbers to turn “we have a CDN” into measurable LCP and conversion wins across continents.
A CDN doesn’t make you fast. A disciplined cache key, sane TTLs, and a strategy for misses do.
The pattern I keep seeing: “We have a CDN” (but LCP in APAC is still brutal)
I’ve lost count of the times a team told me, “We’re on CloudFront/Cloudflare, so performance should be fine,” while their users in Sydney are staring at a blank page for 4–6 seconds.
What’s happening under the hood is usually boring and fixable:
- The HTML isn’t cached (or can’t be), so every request pays the origin RTT.
- The cache key is fragmented by query params or headers, so the hit rate is trash.
- Images are served as one-size-fits-none `jpeg` monsters, and every region pays for it.
- The origin gets thundering-herded on deploys because there’s no shielding or stale serving.
When you fix those, the wins show up where leadership actually cares:
- p75 LCP drops (often by 500ms–2000ms globally)
- conversion rate and activation tick up (even small % moves are huge at scale)
- origin load drops (infra cost + fewer incident pages)
At GitPlumbers, we treat CDN work like plumbing: it’s invisible when it works, catastrophic when it doesn’t.
Measure like you mean it: tie edge changes to user-facing KPIs
Before touching config, pick the metrics that will keep you honest:
- Core Web Vitals (RUM): `LCP`, `INP`, `CLS`
- Network/user-perceived: `p75 TTFB` by region, `p95` for your top routes
- CDN health: cache hit rate, `origin_fetches`, `5xx`, shield hit rate
- Business: checkout completion, signup conversion, session depth, support tickets tagged “slow”
A lightweight baseline workflow:
- Run a synthetic check from multiple geos.
- Compare with real users (RUM) because synthetic lies by omission.
```bash
# Quick-and-dirty: check TTFB and cache status
curl -s -o /dev/null -w '\nTTFB:%{time_starttransfer}s Total:%{time_total}s\n' \
  -H 'Cache-Control: no-cache' \
  https://www.example.com/

# Inspect cache headers
curl -I https://www.example.com/assets/app.js
```

What you want to see (varies by CDN):
- `Cache-Control` is explicit (not defaulted)
- `Age` increases on hot objects
- `CF-Cache-Status: HIT` or `X-Cache: Hit from cloudfront`
If you don’t have RUM, you’re tuning a race car in a dark garage. Get some client-side measurement running before you declare victory.
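If you need a starting point, here is a minimal client-side sketch using the open-source `web-vitals` library. The `/rum` endpoint and the payload fields are assumptions for illustration; wire it to whatever your analytics pipeline already ingests.

```javascript
// Minimal RUM sketch with the `web-vitals` library (npm i web-vitals).
// The /rum endpoint and payload shape are illustrative assumptions.
import { onCLS, onINP, onLCP, onTTFB } from 'web-vitals';

function report(metric) {
  const body = JSON.stringify({
    name: metric.name,        // "LCP", "INP", "CLS", "TTFB"
    value: metric.value,      // milliseconds (unitless score for CLS)
    route: location.pathname, // lets you slice p75 by route
  });
  // sendBeacon survives page unloads; fall back to a keepalive fetch
  if (!(navigator.sendBeacon && navigator.sendBeacon('/rum', body))) {
    fetch('/rum', { method: 'POST', body, keepalive: true });
  }
}

onLCP(report);
onINP(report);
onCLS(report);
onTTFB(report);
```

Aggregate these by route and region, and report p75 rather than averages so a handful of fast sessions cannot hide the tail.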
Cache behavior is the whole game: separate HTML, static assets, and APIs
Most “CDN didn’t help” stories boil down to mixing content types under one caching policy.
A practical split that works in the real world:
- Static hashed assets (`/assets/app.8c1f3d.js`): cache hard (1 year), immutable
- Images: cache for days/weeks, but control variants (see next section)
- HTML: cache cautiously (minutes), use `stale-while-revalidate`
- APIs: cache selectively (GETs for public data), otherwise don’t
Example Cache-Control guidance:
- Hashed static: `public, max-age=31536000, immutable`
- HTML (SSR): `public, max-age=60, stale-while-revalidate=300, stale-if-error=86400`
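The CDN can only honor what the origin actually sends, so make these headers explicit at the source. Here is a minimal origin-side sketch assuming a Node/Express app; the paths and the `renderPage` stand-in are illustrative, not prescriptive.

```javascript
// Origin-side sketch (Express assumed; adapt to your framework).
// Hashed assets: cache hard for a year. HTML: short TTL plus stale fallbacks.
const express = require('express');
const app = express();

// Stand-in for a real SSR step, just so the sketch runs.
const renderPage = (req) => `<!doctype html><h1>Rendered ${req.path}</h1>`;

// Fingerprinted bundles under /assets never change, so "immutable" is safe.
app.use('/assets', express.static('dist/assets', {
  setHeaders(res) {
    res.set('Cache-Control', 'public, max-age=31536000, immutable');
  },
}));

// Catch-all for server-rendered HTML.
app.use((req, res) => {
  res.set('Cache-Control',
    'public, max-age=60, stale-while-revalidate=300, stale-if-error=86400');
  res.send(renderPage(req));
});

app.listen(3000);
```

Whatever your framework, the shape is the same: aggressive and immutable for hashed assets, short TTL with stale fallbacks for HTML.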
Here’s a CloudFront distribution snippet (Terraform) showing different behaviors:

```hcl
resource "aws_cloudfront_distribution" "site" {
  enabled = true

  origin {
    domain_name = "origin.example.com"
    origin_id   = "origin"

    custom_origin_config {
      http_port              = 80
      https_port             = 443
      origin_protocol_policy = "https-only"
      origin_ssl_protocols   = ["TLSv1.2"]
    }
  }

  default_cache_behavior {
    target_origin_id       = "origin"
    viewer_protocol_policy = "redirect-to-https"

    # HTML: short TTL, allow stale patterns via origin headers
    cache_policy_id            = aws_cloudfront_cache_policy.html.id
    origin_request_policy_id   = aws_cloudfront_origin_request_policy.minimal.id
    response_headers_policy_id = aws_cloudfront_response_headers_policy.security.id
  }

  ordered_cache_behavior {
    path_pattern           = "/assets/*"
    target_origin_id       = "origin"
    viewer_protocol_policy = "redirect-to-https"

    cache_policy_id          = aws_cloudfront_cache_policy.static_immutable.id
    origin_request_policy_id = aws_cloudfront_origin_request_policy.none.id
  }
}
```

Measurable outcomes we commonly see after this split (assuming you had “one policy to rule them all” before):
- CDN hit rate up from ~40–60% to 85–95% on static content
- Origin CPU down 20–50% (less TLS, less app rendering, fewer IO spikes)
- p75 TTFB improvement of 150–500ms for cached HTML in far regions (depends on origin geo)
Fix your cache key (or enjoy paying for the same bytes 10,000 different ways)
Cache keys are where good CDNs go to die. I’ve seen teams accidentally include:
- `Authorization` headers on public assets (instant miss)
- marketing query params (`utm_*`, `fbclid`, `gclid`) on HTML or images
- device headers (`User-Agent`) that explode variants
Rules of thumb that keep performance sane:
- Only vary on what truly changes the response
- Strip junk query params at the edge
- Prefer a small allowlist over a “forward all” default
Cloudflare Workers example: normalize query params so the cache key doesn’t fragment:

```javascript
export default {
  async fetch(request) {
    const url = new URL(request.url)
    // Keep only the params that actually affect content
    const allowed = new Set(["lang", "currency"])
    for (const key of [...url.searchParams.keys()]) {
      if (!allowed.has(key)) url.searchParams.delete(key)
    }
    const normalized = new Request(url.toString(), request)
    return fetch(normalized, {
      cf: {
        cacheTtl: 300,
        cacheEverything: true,
      },
    })
  },
}
```

What this buys you in business terms:
- Higher hit rate → lower origin spend
- Fewer cache misses → lower tail latency (p95/p99) → fewer rage clicks and abandoned carts
If you only do one thing this quarter: audit cache keys on your top 20 routes and remove variant explosions.
Images and fonts: the usual suspects behind global LCP
In almost every Core Web Vitals review I’ve done since 2020, the biggest LCP offenders are:
- hero images shipped at 2–5MB
- unoptimized `png` where `avif` would be ~10–30% of the bytes
- fonts that block rendering and aren’t cached aggressively
Concrete tactics that work:
- Serve AVIF/WebP with a controlled variant set (don’t vary on every header)
- Pre-size images and use `srcset` so mobile doesn’t download desktop
- Cache fonts for a year and use `font-display: swap`
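One way to cap the variant set is to decide the format at the edge instead of trusting every client hint. Here is a hedged sketch for Cloudflare Workers, assuming Image Resizing is enabled on the zone; without it, the `cf.image` options are ignored and the original bytes pass through.

```javascript
// Sketch: serve AVIF/WebP from a capped variant set at the edge.
// Assumes Cloudflare Image Resizing is enabled on the zone.
export default {
  async fetch(request) {
    const accept = request.headers.get('Accept') || '';
    // At most three variants per image: avif, webp, or the original.
    const image = accept.includes('image/avif') ? { format: 'avif', quality: 75 }
                : accept.includes('image/webp') ? { format: 'webp', quality: 75 }
                : undefined;
    return fetch(request, {
      cf: {
        ...(image ? { image } : {}),
        cacheEverything: true,
        cacheTtl: 604800, // a week at the edge for images; tune per asset class
      },
    });
  },
};
```

The point is the cap: a few predictable variants per image, not one per `User-Agent`.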
Nginx snippet (origin-side) that sets sane caching for fonts and hashed assets:

```nginx
location ~* \.(?:js|css)$ {
  add_header Cache-Control "public, max-age=31536000, immutable";
}

location ~* \.(?:woff2|woff)$ {
  add_header Cache-Control "public, max-age=31536000, immutable";
  add_header Access-Control-Allow-Origin "*";
}
```

Expected outcomes when you get image strategy right:
- LCP improves 300ms–1500ms depending on how bad it was
- Bandwidth drops 20–60% on image-heavy pages
- Mobile conversion improves measurably (this is where slow hurts the most)
This is also where “AI-generated frontends” (yes, the vibe-coded ones) tend to be pathological: five different image components, each inventing its own query params and sizes. GitPlumbers ends up doing a lot of vibe code cleanup here.
Reduce origin distance and blast radius: origin shield, tiered caching, and stale serving
Even with a perfect cache policy, you’ll still have cache misses. The question is whether misses:
- quietly hit a nearby shield and return fast, or
- dogpile your origin until it keels over
What actually works in production:
- Origin Shield / Tiered Cache (CloudFront Origin Shield, Fastly Shielding, Cloudflare Tiered Cache)
- `stale-while-revalidate` so users don’t wait on revalidation
- `stale-if-error` (or CDN serve-stale) so deploys/incidents don’t turn into global outages
Typical “we stopped paging” configuration shape:
- Choose a shield region close to the origin (or close to your database if that’s the real bottleneck)
- Enable serve-stale on 5xx for 1–24 hours depending on content risk
A pragmatic header for HTML that tolerates brief origin issues:
```http
Cache-Control: public, max-age=60, stale-while-revalidate=300, stale-if-error=86400
```

Business impact I’ve seen (especially for e-commerce and SaaS dashboards):
- Lower error rate during deploys (customers see slightly stale pages instead of failures)
- Lower MTTR because you’re not firefighting a global stampede
- Better p95 TTFB because shields reduce long-haul origin trips
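Not every CDN honors `stale-if-error` identically, so verify your vendor’s behavior before relying on it. If you want the fallback to be explicit, here is a hedged sketch of serve-stale-on-error as a Cloudflare Worker using the Cache API; most CDNs expose the same idea as a config flag.

```javascript
// Sketch: explicit serve-stale-on-error at the edge (Cloudflare Worker, Cache API).
export default {
  async fetch(request, env, ctx) {
    // Only GETs are safely cacheable; pass everything else straight through.
    if (request.method !== 'GET') return fetch(request);

    const cache = caches.default;
    const cached = await cache.match(request);
    try {
      const fresh = await fetch(request);
      if (fresh.status < 500) {
        // Refresh the edge copy in the background on success.
        if (fresh.ok) ctx.waitUntil(cache.put(request, fresh.clone()));
        return fresh;
      }
      return cached || fresh; // origin 5xx: prefer a slightly stale page
    } catch (err) {
      if (cached) return cached; // origin unreachable: serve stale
      throw err;
    }
  },
};
```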
Protocols and compression: the “boring” wins that stack up
This stuff won’t make a splashy demo, but it’s cumulative:
- Enable Brotli (`br`) for text (`html`, `js`, `css`, `json`)
- Prefer HTTP/2 and HTTP/3 where supported
- Keep TLS modern and fast (ECDSA certs can help, but don’t start a holy war)
Quick verification:
```bash
# Check Brotli
curl -I -H 'Accept-Encoding: br' https://www.example.com/assets/app.js

# Confirm HTTP/3 (depends on curl build)
curl --http3 -I https://www.example.com/
```

Typical measurable outcomes:
- 10–25% smaller transfers for text assets vs gzip (varies)
- Faster start render on high-latency links due to protocol improvements
The catch: if your CDN is doing Brotli but your cache key varies on `Accept-Encoding` incorrectly, you can blow your hit rate. Make sure your CDN is handling encoding variants sanely.
The playbook we use at GitPlumbers (so you can ship it without a quarter-long yak shave)
If you want this tight and predictable, run it like an engineering project, not a “CDN tuning week.”
- Pick 3–5 key routes (homepage, pricing, signup, checkout, top API GET)
- Baseline: p75 LCP/TTFB by region, CDN hit rate, conversion KPI
- Implement in this order:
- Cache split by content type
- Cache key normalization
- Image/font strategy
- Shielding + stale policies
- Compression/protocol cleanup
- Validate with:
- RUM deltas (not just lab)
- CDN logs (hit/miss, origin fetch)
- Business KPI movement (even a small lift matters)
What “good” looks like for global sites after the dust settles:
- Static asset hit rate 90%+
- HTML hit rate 60–90% (depends on personalization)
- p75 LCP under ~2.5s for major geos (your mileage varies by app)
- Fewer deploy-related incidents and fewer “site is slow” tickets
If you’re stuck in the middle — half legacy, half AI-assisted rewrites, three caching layers, and nobody remembers why `Vary: User-Agent` exists — that’s exactly the sort of mess GitPlumbers gets called into. We don’t sell silver bullets; we fix the plumbing, measure the deltas, and make sure you can maintain it after we leave.
Key takeaways
- A CDN only reduces latency if your **cache hit rate** is high and your **cache key** isn’t exploding variants.
- Prioritize **LCP** (images/fonts/HTML) and **TTFB** (origin shielding, tiered caching, keep-alive) because they show up in conversion and retention.
- Use **stale-while-revalidate** and **serve-stale-on-error** to improve both perceived performance and resilience during deploys/incidents.
- Compress correctly (**Brotli for text**, optimized formats for images) and avoid cache fragmentation with clean `Cache-Control` + disciplined `Vary`.
- Instrument with **RUM** + CDN logs; don’t guess. Tie improvements to a KPI (checkout completion, signups, revenue).
Implementation checklist
- Define success metrics: **p75 LCP**, **p75 TTFB**, **INP**, CDN **hit rate**, and a business KPI (conversion/activation).
- Split caching by asset type: HTML vs static assets vs APIs; set explicit `Cache-Control` for each.
- Fix cache keys: remove irrelevant query params, normalize headers, avoid `Vary: *` behavior.
- Enable **Brotli** and confirm `Content-Encoding: br` at the edge for text assets.
- Implement **image variants** (AVIF/WebP), cache them at the edge, and cap variant explosion.
- Add **origin shield/tiered caching** and protect the origin from thundering herds.
- Adopt `stale-while-revalidate` + `stale-if-error` where safe.
- Set up RUM dashboards + CDN log sampling to track regressions by region and route.
Questions we hear from teams
- Should we use a multi-CDN setup to reduce global latency?
- Sometimes, but it’s rarely your first win. Multi-CDN adds real operational complexity (routing, cache warming, WAF parity, observability). Get one CDN performing well first: high hit rate, clean cache keys, shielding, and stale policies. Then consider multi-CDN for business reasons (regional reach, vendor risk, negotiated pricing, DDoS posture).
- Can we cache HTML safely if the site is personalized?
- Yes, but you need segmentation discipline. Cache the anonymous shell (or edge-render the shell), vary only on the minimum (e.g., `Accept-Language`), and move personalization to client-side calls or edge compute where it’s safe. The failure mode is varying on cookies/headers that create per-user cache entries.
- What’s the quickest way to see if the CDN is helping?
- Check `TTFB` and cache status headers by region. If `TTFB` is still high and you see consistent misses, your HTML isn’t cached or your cache key is fragmented. Then confirm with RUM: p75 LCP by geography should move if you fixed the right thing.
- What headers matter most for CDN caching?
- `Cache-Control`, `ETag`/`Last-Modified`, and a disciplined `Vary`. Most incidents come from missing/incorrect `Cache-Control` on HTML or accidental `Vary` values that explode cache variants (like `User-Agent`).
Ready to modernize your codebase?
Let GitPlumbers help you transform AI-generated chaos into clean, scalable applications.
