Rollback-First: The Boring Friday Deploy Playbook
If you can’t reverse a bad change in five minutes, you don’t have continuous delivery—you have continuous roulette.
The Friday deploy that didn’t page anyone
It’s 4:50 p.m. Friday. We ship a payments service to 5% with Argo Rollouts. Five minutes later, the error-burn check spikes—2.4x over our SLO window. The rollout auto-aborts, traffic shifts back, Slack posts “Canary failed, rollback complete.” No PagerDuty page. No war room. The only action item? Monday fix.
That’s not luck. It’s design. We built the rollback path first, then the deploy path. And we measure success with three north-star metrics: change failure rate, lead time, and recovery time.
Rollback is a design choice, not a panic button
I’ve watched smart teams ship beautiful pipelines that crumble when a change misbehaves. The patterns are consistent:
- Rollback steps live in someone’s head or a stale wiki
- Databases can’t go backward without data loss
- Canary is “logs look fine” instead of SLO-driven gates
- Artifacts aren’t immutable; the “latest” tag points who-knows-where
What actually works:
- Treat rollback as a product requirement with acceptance criteria
- Make rollbacks operationally trivial (one command or toggle)
- Instrument deployments with SLO-aware guards that auto-abort on risk
Why it matters to your P&L:
- Lower change failure rate shrinks incident volume and support load
- Faster lead time increases feature throughput and learning cycles
- Shorter recovery time reduces revenue leakage and brand damage
Set aggressive but achievable targets:
- Change failure rate: <5% (from DORA)
- Lead time (commit-to-prod): <60 minutes for normal changes
- Recovery time (bad change to user impact resolved): <10 minutes
Patterns that make rollbacks trivial
If rollback isn’t easy, you won’t do it fast. Bake these in from day one.
Immutable, versioned artifacts
- Publish containers by SHA, not latest
- Track deploy provenance (git SHA, build ID) in Helm annotations or Deployment labels (sketch below)
- Keep N-2 artifacts hot in the registry and cache warm
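A minimal sketch of what that can look like on a Kubernetes Deployment; the annotation keys, label values, and registry path are illustrative, not a standard:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: api
  annotations:
    # Illustrative provenance keys; use whatever your CI already emits
    deploy.example.com/git-sha: 9f2c1d7
    deploy.example.com/build-id: "8412"
spec:
  replicas: 3
  selector:
    matchLabels:
      app: api
  template:
    metadata:
      labels:
        app: api
        deploy-id: 9f2c1d7
    spec:
      containers:
        - name: api
          # Pinned by git SHA tag, never :latest
          image: registry.example.com/api:9f2c1d7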
Canary with SLO gates
- Argo Rollouts, Flagger, or Spinnaker + Kayenta
- Weights: 5% → 20% → 50% → 100% with pauses, analysis, and auto-abort (see the sketch below)
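Here’s a minimal sketch of those weights as an Argo Rollouts canary strategy, assuming an AnalysisTemplate named http-5xx-burn like the one shown later in the Kubernetes playbook; names and durations are illustrative:

apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: api
spec:
  replicas: 5
  selector:
    matchLabels:
      app: api
  template:
    metadata:
      labels:
        app: api
    spec:
      containers:
        - name: api
          image: registry.example.com/api:9f2c1d7
  strategy:
    canary:
      # Background analysis; any failed measurement aborts and shifts traffic back
      analysis:
        templates:
          - templateName: http-5xx-burn
      steps:
        - setWeight: 5
        - pause: {duration: 10m}
        - setWeight: 20
        - pause: {duration: 10m}
        - setWeight: 50
        - pause: {duration: 10m}

Promotion past the last pause takes the new version to 100%; an aborted analysis run does the traffic shift back for you.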
Blue/green or shadow traffic
- Two target groups or two Kubernetes services; flip DNS/weights in seconds
- Mirror a slice of traffic to the new stack for read-only validation (sketch below)
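If you run a mesh, one way to do the mirroring; a sketch assuming Istio with stable and canary subsets already defined in a DestinationRule:

apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: api
spec:
  hosts:
    - api
  http:
    - route:
        - destination:
            host: api
            subset: stable
          weight: 100
      # Copy 10% of live requests to the new stack; mirrored responses are discarded
      mirror:
        host: api
        subset: canary
      mirrorPercentage:
        value: 10.0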
Feature flags for business risk
- LaunchDarkly or Unleash for logic toggles and kill switches
- Separate release toggles from experiment flags; assign owners and TTLs
Database expand/contract
- Add columns/tables first, backfill, dual-write, cut over, then drop later
- Use gh-ost or pt-online-schema-change for MySQL; Liquibase/Flyway for versioning (expand-step sketch below)
- Design for roll-forward; rollback means reverting the code path, not dropping data
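A sketch of the expand step as a Liquibase changeset in YAML format; the table, column, and changeset names are illustrative:

databaseChangeLog:
  - changeSet:
      id: expand-add-currency-code
      author: platform
      changes:
        # Expand: additive and nullable, so the old code path keeps working
        - addColumn:
            tableName: payments
            columns:
              - column:
                  name: currency_code
                  type: varchar(3)
                  constraints:
                    nullable: true
        # The contract step (dropColumn) ships much later as its own changeset,
        # after reads have switched and the old path is dead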
Schema-compatible events
- Enforce backward-compat in Avro or Protobuf registries (one CI-side option sketched below)
- Keep consumers tolerant to unknown fields
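For Protobuf, one CI-side option is buf’s breaking-change check, a complement to a registry rather than a replacement; a minimal buf.yaml sketch, assuming the v1 config format:

version: v1
breaking:
  use:
    # Fail CI when a change would break wire or JSON compatibility for consumers
    - WIRE_JSON
lint:
  use:
    - DEFAULT

CI then runs buf breaking --against a ref of the last released schemas; Avro shops get the equivalent by setting BACKWARD compatibility on the registry subject.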
Concrete playbooks by stack
Here’s the stuff you put in runbooks, not in slide decks.
Kubernetes (GKE/EKS/AKS)
- Roll back a deployment:
kubectl rollout undo deployment/api --to-revision=3
helm rollback api 23
kubectl argo rollouts undo api
- Canary with Prometheus guard (snippet):
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: http-5xx-burn
spec:
  metrics:
    - name: error-burn
      interval: 1m
      count: 5
      successCondition: result[0] < 1.5
      provider:
        prometheus:
          address: http://prometheus.monitoring:9090
          # Error ratio divided by the error budget of an assumed 99.9% SLO (0.001),
          # i.e., the burn-rate multiple the success condition checks
          query: |
            (sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m]))) / 0.001
ECS/EC2 with ALB
- Blue/green using two target groups; rollback = flip weights:
aws elbv2 modify-listener --listener-arn ... --default-actions '[{"Type":"forward","ForwardConfig":{"TargetGroups":[{"TargetGroupArn":"blue","Weight":100},{"TargetGroupArn":"green","Weight":0}]}}]'
Serverless (AWS Lambda)
- Use aliases and weighted routing; rollback = move alias back:
aws lambda update-alias --function-name checkout --name prod --function-version 42 --routing-config AdditionalVersionWeights={}
- With CodeDeploy:
aws deploy stop-deployment --deployment-id d-123 --auto-rollback-enabled
Edge configs (Fastly/Cloudflare)
- Versioned configs; rollback = activate prior version:
fastly service-version activate --service-id $ID --version 67
Terraform-managed infra
- Keep launch_template_version pinned; rollback by applying the prior version:
terraform apply -var lt_version=42
- Use state locks and plan files; never hotfix in the console
Databases
- Phase 1: Add nullable column, dual-write, backfill (idempotent)
- Phase 2: Switch reads, monitor, keep old path intact
- Phase 3: Remove old column in a separate, reversible change window
- If you must revert: toggle the code path off; never drop the newly written data
Guardrails that auto-abort the bad stuff
Make the safe path the default path.
SLO-driven automated rollback
- Argo Rollouts with Prometheus queries; abort on error rate, p95 latency, or saturation spikes
- Spinnaker + Kayenta does automated canary analysis across metric sets
Circuit breakers and timeouts
- Envoy/Istio outlier_detection, resilience4j for JVMs, and exponential backoff (sketch below)
- Prevent a bad deploy from melting neighbors while your pipeline rolls back
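A sketch of the Istio flavor; the host and thresholds are illustrative assumptions, tune them to your traffic:

apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: api
spec:
  host: api.default.svc.cluster.local
  trafficPolicy:
    outlierDetection:
      # Eject an endpoint after 5 consecutive 5xx responses...
      consecutive5xxErrors: 5
      interval: 30s
      # ...for at least 60s, but never more than half the endpoints at once
      baseEjectionTime: 60s
      maxEjectionPercent: 50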
Policy as code and GitOps
- OPA/Kyverno reject non-rollbackable manifests (e.g., latest tags, missing probes; policy sketch below)
- ArgoCD enforces desired state; revert = git revert + sync
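One concrete version of that guardrail, sketched as a Kyverno policy that rejects Deployments shipping a latest image; adjust the kinds and scope to your cluster:

apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: disallow-latest-tag
spec:
  validationFailureAction: Enforce
  rules:
    - name: require-pinned-image
      match:
        any:
          - resources:
              kinds:
                - Deployment
      validate:
        message: "Pin images by SHA or version; :latest is not rollbackable."
        pattern:
          spec:
            template:
              spec:
                containers:
                  - image: "!*:latest"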
Observability you can trust
- Golden signals are pre-wired into dashboards and alerts: error rate, latency, saturation, correctness
- Track deploy IDs in logs/traces (X-Deploy-Id) so you can correlate impact quickly; see the sketch below
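One way to wire that on Kubernetes: stamp a deploy-id label at deploy time, surface it via the downward API, and have the app attach it to every log line, span, and the X-Deploy-Id response header. The label key and values are illustrative, and the same env fragment works inside a Deployment’s pod template:

apiVersion: v1
kind: Pod
metadata:
  name: api
  labels:
    deploy-id: 9f2c1d7          # set by CI to the git SHA being deployed
spec:
  containers:
    - name: api
      image: registry.example.com/api:9f2c1d7
      env:
        # Downward API: expose the label so the app can stamp logs, traces,
        # and the X-Deploy-Id response header
        - name: DEPLOY_ID
          valueFrom:
            fieldRef:
              fieldPath: metadata.labels['deploy-id']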
Checklists that scale with team size
Print these. Tape them to the wall. Automate them next.
Pre-deploy (every change)
- Does this change have a one-step rollback? List the exact command/toggle/weight.
- Is the database step backward compatible? If not, split the change.
- Are SLO gates enabled for the canary? Which metrics?
- Are feature flags and kill switches wired and owned?
- Announce scope and rollback plan in #deploys with links to dashboards.
During deploy
- Start at 5% traffic; pause 5–10 minutes.
- Watch error burn, p95 latency, and key business events (checkout success, login rate).
- Promote only if metrics pass; otherwise let automation abort.
Rollback (execute within 5 minutes)
- Kubernetes: helm rollback <release> <rev> or kubectl rollout undo ...
- Lambda: move prod alias back to prior version
- ALB/NGINX: shift weights to stable target group
- Feature flags: kill switch OFF for new path
- Verify recovery via dashboards; post the “resolved” note in Slack
Monthly drills
- Randomly select a service; break a synthetic canary; time rollback to stable
- Track MTTR; create issues for any manual steps
- Rotate who runs the drill so the knowledge scales
Metrics that prove it’s working
If you can’t measure it, you’ll argue about it in the retro.
Change Failure Rate (CFR)
- Numerator: deployments that require rollback, hotfix, or flag kill
- Denominator: total production deployments
- Goal: trend down toward <5%
Lead Time for Changes
- From merge to production traffic at 100%
- Instrument via pipeline events + rollout promotion logs
- Goal: <60 minutes for routine changes
Recovery Time (MTTR)
- From detection (alert/gate fail) to user impact resolved
- Goal: <10 minutes for rollback-capable services
Dashboards
- Grafana: CFR by service, deployment frequency, MTTR P50/P90, error budget burn
- Tag traces/logs with deploy_id so queries like “errors by deploy” are one click
I’ve seen teams cut CFR by half in a quarter just by shipping SLO-gated canaries and practicing rollback drills. The engineering hours you save stop going into firefighting and start going into features.
Case notes: what changed and what we got back
Fintech on GKE (payments + auth)
- Before: CFR 22%, MTTR 97m, informal Friday freeze
- After GitPlumbers’ 4-week rollback-first push (Argo Rollouts + Prometheus gates, Helm provenance, LD kill switches, expand/contract DB): CFR 6%, MTTR 9m, freeze removed. Lead time from 2 days to 45 minutes.
E-comm on ECS/ALB (search + checkout)
- Before: Blue/green by hand in the console, database changes tied to deploys, no flags
- After: Terraform-managed target weights, CodeDeploy canary with auto-rollback, Unleash for risky logic. Result: 3 consecutive holiday Fridays shipped with zero pages; on-call costs down ~30%.
SaaS analytics on Lambda + CloudFront
- Before: Alias drift, hard rollbacks, broken dashboards
- After: Aliased, versioned deploys with weighted canary, Fastly rollback runbook, OPA policy to ban latest images. MTTR from 40m → 6m.
None of this is flashy. It’s plumbing. But boring plumbing is why Friday deploys become boring, too.
Key takeaways
- Design rollback first. Make it a product requirement, not an afterthought.
- Use DORA metrics as north stars: change failure rate, lead time, and recovery time.
- Prefer canary + SLO-driven automated aborts over manual guesswork.
- Keep rollbacks operationally simple: one command, one toggle, or one traffic weight change.
- Make databases rollback-safe with expand/contract and roll-forward mindset.
- Codify checklists and drills so any engineer can safely revert at 2 a.m. or 4:55 p.m. Friday.
Implementation checklist
- Every deploy must have a documented rollback path (command, toggle, or traffic shift).
- Artifacts are immutable, versioned, and quickly addressable (image SHA, Helm revision, Lambda alias).
- All database changes follow expand/contract; destructive steps are isolated and slow.
- SLO/metric gates enforce auto-abort on canaries; humans approve promotions, not rescues.
- Run monthly rollback drills; track MTTR from “bad deploy detected” to “user impact resolved.”
- Feature flag debt has owners, TTLs, and cleanup tasks on the board.
Questions we hear from teams
- Should we stop Friday deploys?
- No. You should stop unsafe deploys. If you can auto-abort canaries, flip traffic, and toggle features within five minutes, Friday is just another day. If you can’t, the day isn’t the problem—your rollback design is.
- How do we handle database rollbacks?
- Don’t. Design for roll-forward. Use expand/contract: add new structures, dual-write, backfill, then switch reads. Rollback means toggling code paths off. Only perform destructive drops in a separate, low-risk change after stability is proven.
- What about feature flag debt?
- Treat flags like code. Each flag has an owner, a TTL, and a cleanup ticket. Separate kill switches (operational) from experiments (product). Instrument flags in logs and dashboards so you can correlate behavior with toggle states.
- Who owns rollback in the org?
- Platform owns the tooling and guardrails. Service teams own their rollback runbooks and drills. SRE validates SLOs and gates. Execs track CFR, lead time, and MTTR. Everybody practices.
- How do we test rollbacks without scaring customers?
- Shadow traffic, synthetic transactions, and monthly drills on non-critical windows. Use canary releases with tiny weights (1–5%), real SLO gates, and automatic aborts. Make practice boring so production is boring.