Blue‑Green Without the Drama: Zero‑Downtime Releases That Don’t Spike Your CFR

A pragmatic playbook for designing, operating, and scaling blue‑green so your change failure rate, lead time, and recovery time trend the right way.

“Rollback is a feature. If it isn’t one command, it doesn’t exist.”

The Friday release that converted us

We once flipped a minor nginx config on a Friday at 5:12 PM and watched checkout latency go vertical. Rolling update was “successful,” but half the pods were on a new OpenSSL build that didn’t like our ALB idle timeout. No clean rollback, DB migrations already applied, and caches cold. MTTR? Ninety minutes. I’ve seen this movie at unicorns and banks. The fix that stuck was blue‑green done properly—two production‑grade stacks, a health‑gated switch, and a rehearsed rollback.

If your rollback plan starts with “rebuild artifacts,” you don’t have a rollback plan.

What blue‑green actually is (and where it fails)

  • Blue = currently serving traffic. Green = new version, fully provisioned, not yet receiving production traffic.
  • The flip happens at the router (L7 preferred: Ingress, ALB, Envoy, Istio VirtualService). DNS can work but adds TTL pain and slower rollback.
  • The usual failure modes:
    • Databases: incompatible schema changes force downtime. Fix with expand‑contract, online migrations (gh-ost, pt-online-schema-change), and feature flags.
    • State & caches: cold caches, missing sessions. Pre‑warm or share cache with proper key versioning.
    • Health checks: green passes readiness but fails under real load; add synthetic checks and shadow traffic (see the sketch after this list).
    • Config drift: blue and green built differently; fix by making Terraform/Pulumi, Helm, or Kustomize the single source of truth for both stacks.
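
None of these fixes are exotic. For the health‑check gap in particular, even a crude synthetic check against green buys a lot of confidence. A minimal sketch (the hostname matches the runbook below; the checkout path is illustrative):
# refuse to flip if green can't serve the critical paths end to end
for path in /healthz /api/v1/checkout/preview; do
  code=$(curl -s -o /dev/null -w '%{http_code}' "https://green.api.prod.internal${path}")
  if [ "$code" -lt 200 ] || [ "$code" -ge 400 ]; then
    echo "green failed ${path} with HTTP ${code} - do not flip" >&2
    exit 1
  fi
done
echo "green passed synthetic checks"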

Reference architectures that don’t bite

  • Kubernetes L7 switch
    • Two Deployments: api-blue and api-green. Two Services: api-blue and api-green. One Ingress points to the active Service.
    • Switch = patch Ingress backend from api-blue to api-green. Keep HPA/scaling parity.
  • AWS ALB target-group switch
    • One ALB, two target groups. Blue instances in TG‑A, green in TG‑B. Flip listener rule to TG‑B when green is ready.
  • NGINX/Envoy map switch
    • Upstreams blue and green defined; a small config include or an env var selects upstream; nginx -s reload is the switch.
  • Service mesh (Istio/Linkerd)
    • Use VirtualService to route 100% to blue or green (or do a short weighted ramp for extra confidence); a minimal manifest follows this list.
  • Database strategy
    • Expand‑contract migrations, dual‑reads/writes with a feature flag, and online schema tools. Keep replication lag in mind when flipping.
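
For the mesh option, the whole flip is a VirtualService whose weights you swap. A minimal sketch, assuming Services named api-blue and api-green in the prod namespace and a client-facing host of api.prod.svc.cluster.local:
cat <<'EOF' | kubectl -n prod apply -f -
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: api
spec:
  hosts:
  - api.prod.svc.cluster.local
  http:
  - route:
    - destination:
        host: api-blue
      weight: 100   # set to 0 at cutover
    - destination:
        host: api-green
      weight: 0     # set to 100 at cutover (or ramp briefly)
EOF
Rollback is re‑applying the same manifest with the weights swapped back.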

The runbook (use this checklist)

1) Pre‑flight (CI/CD gates)

  1. Build green from a tagged commit. Immutable image: api:2025-09-24.1.
  2. Provision green with IaC (same as blue): terraform apply -var env=prod-green.
  3. Run smoke + contract tests against green URL (green.api.prod.internal).
  4. Warm caches: run synthetic traffic with k6 or vegeta (vegeta sketch below).
  5. Verify SLO preconditions in Prometheus: error rate < 1%, p95 latency < 300ms.
  6. Confirm DB migration is backward‑compatible and complete.
# Example: online MySQL migration with gh-ost
gh-ost \
  --database=checkout \
  --table=orders \
  --alter='ADD COLUMN checkout_version INT NOT NULL DEFAULT 1' \
  --host=primary.db.prod \
  --user=ghost --password="$GHOST_PASS" \
  --allow-on-master \
  --cut-over=default --default-retries=120 \
  --execute   # drop --execute first for a dry run
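
For step 4, a warm‑up sketch with vegeta (the product endpoint is illustrative; hit whatever primes your caches, JIT, and connection pools):
cat <<'EOF' > targets.txt
GET https://green.api.prod.internal/healthz
GET https://green.api.prod.internal/api/v1/products
EOF
vegeta attack -targets=targets.txt -rate=50 -duration=2m | vegeta report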

2) Switch (L7 preferred)

  • Kubernetes Ingress patch:
kubectl -n prod patch ingress api \
  --type=json \
  -p='[{"op":"replace","path":"/spec/rules/0/http/paths/0/backend/service/name","value":"api-green"}]'
  • AWS ALB listener flip:
aws elbv2 modify-listener \
  --listener-arn $LISTENER \
  --default-actions '[{"Type":"forward","ForwardConfig":{"TargetGroups":[{"TargetGroupArn":"'$TG_GREEN'","Weight":1}]}}]'
  • NGINX upstream toggle:
# upstreams for both stacks; a one-line include picks the active one
upstream blue  { server 10.0.1.10:8080; }
upstream green { server 10.0.2.10:8080; }
server {
  listen 80;
  include /etc/nginx/active_upstream.conf;   # contains: set $upstream_name "blue";
  location / { proxy_pass http://$upstream_name; }
}
# flip the include, validate, and reload
sed -i 's/"blue"/"green"/' /etc/nginx/active_upstream.conf && nginx -t && nginx -s reload

3) Monitor and hold

  • Watch golden signals for 10–30 minutes (a polling sketch follows this list):
    • 5m error_rate < 1%, p95 latency < SLO, CPU < 70%, GC pauses stable.
    • Business metrics: checkout conversion, auth success, queue depth.
  • Keep blue hot with traffic cut to 0%. Sync writes if needed (dual‑write window).
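
The hold doesn't have to be a human staring at Grafana. A minimal polling sketch (the histogram metric and the 300ms threshold are illustrative; match them to your SLO):
# poll green's p95 once a minute for the hold window; bail loudly if it breaches
for i in $(seq 1 15); do
  P95=$(curl -s "$PROM/api/v1/query" --data-urlencode \
    'query=histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket{job="api"}[5m])) by (le))' \
    | jq -r '.data.result[0].value[1]')
  echo "minute ${i}: p95=${P95}s"
  over=$(awk -v p="$P95" 'BEGIN{ print (p > 0.3) ? 1 : 0 }')
  [ "$over" -eq 1 ] && { echo "p95 over SLO on green - consider rollback" >&2; exit 1; }
  sleep 60
done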

4) Rollback (don’t reinvent under pressure)

  • The rollback is the same command as the flip, pointing back to blue.
  • Keep a one‑liner ready in ChatOps or a runbooks/rollback.sh script (sketch below the commands).
  • If DB writes diverged, keep the dual‑write flag on until the rollback completes and blue is consistent again.
# K8s
kubectl -n prod patch ingress api \
  --type=json \
  -p='[{"op":"replace","path":"/spec/rules/0/http/paths/0/backend/service/name","value":"api-blue"}]'

# ALB
aws elbv2 modify-listener --listener-arn $LISTENER \
  --default-actions '[{"Type":"forward","TargetGroupArn":"'$TG_BLUE'"}]'
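
If you want that packaged as the runbooks/rollback.sh mentioned above, here's a sketch that reuses the same Ingress patch, so flip and rollback are literally the same command with a different argument:
#!/usr/bin/env bash
# runbooks/rollback.sh: point the prod Ingress at the named stack (blue|green)
set -euo pipefail
TARGET="${1:?usage: rollback.sh blue|green}"
kubectl -n prod patch ingress api --type=json \
  -p="[{\"op\":\"replace\",\"path\":\"/spec/rules/0/http/paths/0/backend/service/name\",\"value\":\"api-${TARGET}\"}]"
echo "ingress api now routes to api-${TARGET}"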

Metrics as gates, not just dashboards

  • Change failure rate (CFR): releases that require hotfix/rollback ÷ total releases. Target single‑digit %. Gate flips on synthetic + partial live checks.
  • Lead time: commit → production. Automate green provisioning so green is a routine path, not a snowflake.
  • Recovery time (MTTR): time from incident start → full recovery. Make rollback one command with pre‑warmed blue.
  • Wire gates into CI/CD:
# Fail the pipeline if error rate exceeds threshold
ERR=$(curl -s "$PROM/api/v1/query" \
  --data-urlencode 'query=sum(rate(http_requests_total{job="api",status=~"5.."}[5m]))/sum(rate(http_requests_total{job="api"}[5m]))' \
  | jq -r '.data.result[0].value[1]')
awk -v e="$ERR" 'BEGIN{ exit (e > 0.01) ? 1 : 0 }'
  • Burn‑rate alerts (SRE style) during the hold:
# 5m burn rate for a 99.9% availability SLO
(sum(rate(http_requests_total{status=~"5..",job="api"}[5m]))
 /
 sum(rate(http_requests_total{job="api"}[5m]))) / (1-0.999)
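# during the hold, page when this exceeds ~14 (fast‑burn threshold for a 99.9% SLO); warn above ~6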

Tooling you can copy

  • Argo Rollouts (blueGreen); manual promote/abort commands follow this list
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: api
spec:
  replicas: 6
  strategy:
    blueGreen:
      activeService: api-blue
      previewService: api-green
      autoPromotionEnabled: false
      scaleDownDelaySeconds: 600
  selector:
    matchLabels: { app: api }
  template:
    metadata: { labels: { app: api } }
    spec:
      containers:
      - name: api
        image: registry.local/api:2025-09-24.1
  • Terraform dual target groups (AWS): define both TGs and an aws_lb_listener_rule that you switch via terraform apply or an out‑of‑band CLI (faster rollback).
  • Istio VirtualService with 100/0 split you can toggle to 0/100; keep a canary ramp (1‑5 minutes) if you want a last‑second escape hatch.
  • Feature flags (LaunchDarkly, Unleash) for dual‑writes and behavior toggles during schema transitions.
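
With autoPromotionEnabled: false, promotion is an explicit step. These commands come from the kubectl-argo-rollouts plugin:
kubectl argo rollouts get rollout api -n prod --watch   # compare preview (green) against active (blue)
kubectl argo rollouts promote api -n prod               # flip the active Service to the new version
kubectl argo rollouts abort api -n prod                 # bail out and stay on the stable version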

Scaling to 50 teams without chaos

  • Golden templates: a cookiecutter/Backstage template with blue‑green Ingress, health checks, dashboards, alerts, and runbook skeleton.
  • Guardrails: platform‑owned ALB/Ingress, mandatory probes, default SLOs, and a one‑click rollback in ChatOps (/deploy flip green, /deploy rollback).
  • Change windows that aren’t theater: allow 24x7, but require a rollback owner online and a 60‑minute freeze after flip.
  • Drills: quarterly game days. Simulate a bad SSL chain or a crashing JVM on green; measure MTTR.
  • Cost policy: tag blue/green resources; auto‑expire blue N minutes after success unless a hold label is present (cleanup sketch after this list).
  • Reporting: DORA metrics per team in one pane. Highlight CFR regressions after major architectural changes.
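
For the cost policy, a hypothetical cleanup job that runs N minutes after a successful flip (deployment and label names are illustrative):
# scale blue to zero after the hold window unless someone set a hold label
if [ -z "$(kubectl -n prod get deploy api-blue -o jsonpath='{.metadata.labels.hold}')" ]; then
  kubectl -n prod scale deploy api-blue --replicas=0
  echo "api-blue scaled to zero; manifests and images retained for fast re-provisioning"
fi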

Results and trade‑offs (real numbers)

  • A fintech we helped moved checkout and auth to blue‑green on EKS + ALB:
    • CFR dropped from 28% → 6% in 8 weeks (most failures became non‑events with instant rollback).
    • MTTR from 90 min → 9 min (median). Rollback was one kubectl patch.
    • Lead time from 2 days → same‑day (avg 6 hours) because green was templatized.
    • Infra cost +15% on average due to duplication, offset by reduced incident cost and increased ship cadence.
  • Trade‑offs:
    • You’re paying for two stacks during the flip/hold. Use HPA min‑replicas smartly.
    • Databases remain the tricky bit. Without expand‑contract, you don’t have blue‑green—you have hope.

If you want a second set of eyes on your runbooks, we’ve done this at scale at SaaS unicorns and old‑guard enterprises. GitPlumbers lives in the messy middle between “works on my cluster” and “wakes up the CFO.”


Key takeaways

  • Blue‑green is an infrastructure and runbook pattern, not just a switch; treat it as code and as a rehearsed operation.
  • Optimize for three metrics: **change failure rate**, **lead time**, **recovery time (MTTR)**—wire them into gates, not just dashboards.
  • Databases make or break zero‑downtime; use **expand‑contract** and shadow traffic to de‑risk cutovers.
  • Prefer L7 switchovers (Ingress/ALB/Envoy) with health‑gated flips and fast rollback paths kept warm.
  • Standardize templates and checklists to scale across many teams; automate the boring, make the scary reversible.
  • Measure the cost of duplicate capacity and set TTLs to decommission blue safely after green proves itself.

Implementation checklist

  • Define blue/green boundaries: app pods, load balancer targets, configs, and data paths.
  • Implement health‑gated switch at the router (DNS is last resort).
  • Design DB changes as **backward‑compatible** with expand‑contract.
  • Pre‑warm green (caches, JIT, connection pools) and run synthetic checks.
  • Instrument Prometheus SLOs for error rate and latency; add pipeline gates.
  • Automate switch and rollback with idempotent scripts and documented steps.
  • Keep blue hot for a TTL (e.g., 30–120 minutes) with data sync before teardown.
  • Post‑release: capture CFR, lead time, MTTR deltas; feed into templates.

Questions we hear from teams

Do I still need canaries if I’m doing blue‑green?
Often yes. Blue‑green handles fast, reversible cutovers. A brief canary (1–5 minutes, 1–5% traffic) on the green stack catches obvious regressions before the full flip. Tools like Argo Rollouts or Istio make a short canary trivial.
What about databases—can I do blue‑green with a single primary?
Yes, but only with backward‑compatible changes. Use expand‑contract migrations, dual‑writes behind a feature flag, and online tools like gh-ost or pt-online-schema-change. Avoid destructive changes until blue is retired and all callers are upgraded.
Is DNS switching acceptable for blue‑green?
It works in a pinch but is slower and riskier due to TTL and resolver caching. Prefer L7 switches (ALB/Ingress/Envoy). If you must use DNS, set low TTLs (30s), pre‑warm green, and accept slower rollback.
How do I control costs of running two stacks?
Keep blue hot only during the hold (e.g., 30–120 minutes), then scale it to zero or destroy. Use HPA with a low min-replicas and tag resources for automatic cleanup. The cost is usually offset by fewer incidents and faster delivery.
How do I measure success?
Track DORA metrics: change failure rate should drop toward single digits; MTTR should trend under 15 minutes; lead time should shorten as blue‑green becomes templatized. Add business metrics (conversion, error budgets burned) to confirm customer impact.

