The Release Train That Finally Worked: Automating Multi‑Service Deploys Without Spiking CFR
What we ship is a system, not a service. Here’s the playbook we use to automate multi‑service releases, cut change failure rate, and make rollback boring.
Releases don’t break prod—unchecked assumptions do. Your job is to automate those assumptions into gates the computer enforces every time.
The outage that taught us what matters
Three years ago, a client tried to ship a “simple” feature that touched 12 services, including two Go APIs behind `istio`, a Node.js edge service, and a Kafka consumer, plus a Postgres migration. Staging looked clean. Production? PagerDuty started singing 90 seconds after the deploy. The migration added a nullable column that one service assumed was non‑null, error rates spiked, and we spent 47 minutes rolling back by hand because the orchestration was… a Google Doc.
I’ve seen this movie. The fix isn’t heroics; it’s treating the release as a first‑class artifact with automation that respects dependency graphs and SLOs. When we rebuilt their release pipeline, we focused on three numbers: change failure rate, lead time, and recovery time. Six weeks later, CFR dropped from 23% to 6%, lead time fell from days to hours, and MTTR went from “hope” to 8 minutes.
Pick metrics that don’t lie
If your process doesn’t move these, it’s noise:
- Change failure rate (CFR): Percentage of deploys causing incidents or rollback. Target: <10% and trending down.
- Lead time for changes: Time from code commit to running in prod. Target: hours, not days.
- Mean time to recovery (MTTR): Time to restore normal service after a bad change. Target: single‑digit minutes.
Make the pipeline emit these automatically:
- Record each release as a `Release` CR or a record in your metrics store with: manifest SHA, services and versions, start/end timestamps, result (success/rollback), and an incident link.
- Emit Prometheus events or push to `OpenTelemetry` so Grafana can chart CFR, lead time, and MTTR alongside error budgets (a recording-rule sketch follows this list).
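One way to wire this up is a pair of Prometheus recording rules. This is a minimal sketch, assuming your pipeline increments a counter named `release_events_total` with `result` and `environment` labels; the metric name and labels are illustrative, not a standard.

```yaml
# Hypothetical recording rules; assumes the pipeline increments
# release_events_total{result="success"|"rollback", environment="..."} when a rollout finishes.
groups:
  - name: release-engineering
    rules:
      # Change failure rate over a rolling 30 days, per environment.
      - record: release:change_failure_rate:ratio_30d
        expr: |
          sum by (environment) (increase(release_events_total{result="rollback"}[30d]))
          /
          sum by (environment) (increase(release_events_total[30d]))
      # Release throughput per day, useful for trending lead time alongside volume.
      - record: release:count:rate1d
        expr: sum by (environment) (increase(release_events_total[1d]))
```

Grafana can then chart these series next to error-budget burn with no manual bookkeeping.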
If your dashboards don’t show CFR, lead time, and MTTR per environment, you’re not doing release engineering—you’re doing theater.
One object to rule them all: the release manifest
Stop orchestrating N services with N pipelines. Orchestrate one release object. We use a human‑readable manifest checked into a dedicated `env` repo.
```yaml
# release.yaml
release: 2025.10.02-rc1
services:
  accounts-api: 1.23.4
  billing-api: 2.8.0
  edge-gateway: 5.1.2
  ledger-consumer: 0.14.7
migrations:
  postgres: db/2025-10-02_add-ledger-nullable.sql  # expand step
config:
  feature-flags:
    ledger_nullable: true
  configs:
    billing/limits.yaml: 4a2f9c3  # git sha of config change
policies:
  requires:
    - contracts.ok
    - security.signed
    - perf.baseline_ok
strategy:
  order: graph
  progressive: canary-25-50-100
```
Principles that make this work:
- Pin versions. No `latest`. Everything is semver and immutable.
- Treat config and DB changes as part of the release. No hidden state.
- Store once, apply many. The same manifest drives dev → staging → prod via GitOps (see the ApplicationSet sketch below).
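To make “apply many” concrete, here is a minimal ArgoCD `ApplicationSet` sketch that points each environment at the same env repo. The repo URL, paths, and namespaces are placeholders, not the actual setup.

```yaml
# Sketch only: one Application per environment, all reading the same env repo.
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: release-train
  namespace: argocd
spec:
  generators:
    - list:
        elements:
          - env: staging
          - env: prod
  template:
    metadata:
      name: 'release-{{env}}'
    spec:
      project: default
      source:
        repoURL: https://github.com/example/env-repo  # placeholder
        targetRevision: main
        path: 'envs/{{env}}'  # each env directory holds the manifests rendered from release.yaml
      destination:
        server: https://kubernetes.default.svc
        namespace: 'release-{{env}}'
      syncPolicy:
        automated:
          prune: true
          selfHeal: true
```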
Orchestrate the graph, not a list
Multi‑service releases fail when you “just deploy in order.” Don’t guess; compute the dependency graph from metadata and contracts.
- Add metadata to each service repo (e.g., a `service.yaml`) with `dependsOn`, `exposesContracts`, and `consumesContracts` (a sketch follows this list).
- Use contract tests (`Pact`, OpenAPI with `schemathesis`) in CI to assert backward compatibility.
- Build a DAG and enforce orchestrated waves: DB expand → producers → consumers → edges. The DB contract step happens in a later release, after adoption.
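Here is a sketch of what that per‑repo metadata could look like. The field names mirror the list above, but the exact schema is whatever your DAG builder expects.

```yaml
# service.yaml — hypothetical per-repo metadata consumed by the DAG builder.
name: billing-api
team: payments
dependsOn:
  - accounts-api            # must be healthy at its pinned version before this service rolls
exposesContracts:
  - type: openapi
    path: contracts/billing-v2.yaml
consumesContracts:
  - type: pact
    provider: accounts-api
    path: pacts/billing-api-accounts-api.json
ownsMigrations: true        # this service owns the postgres expand/contract steps
```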
An example GitHub Actions job that composes the graph and opens a PR to the `env` repo for ArgoCD to pick up:
```yaml
name: release-train
on: workflow_dispatch
jobs:
  plan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Compute DAG
        run: |
          ./scripts/build_dag.py --manifest release.yaml > plan.json
      - name: Validate contracts
        run: ./scripts/validate_contracts.sh plan.json
      - name: Open env PR
        run: ./scripts/open_env_pr.sh release.yaml prod
```
On the cluster side, let `ArgoCD` apply the manifests and let `Argo Workflows` (or your own controller) walk the DAG to coordinate rollouts per service.
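A rough sketch of that DAG walk as an Argo Workflow follows. The task names mirror the example services above; the image and commands are placeholders standing in for real migration and promotion steps.

```yaml
# Sketch only: a Workflow whose DAG mirrors plan.json. Image and commands are placeholders.
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: release-train-
spec:
  entrypoint: waves
  templates:
    - name: waves
      dag:
        tasks:
          - name: db-expand
            template: step
            arguments: {parameters: [{name: cmd, value: "apply expand migration"}]}
          - name: billing-api
            dependencies: [db-expand]
            template: step
            arguments: {parameters: [{name: cmd, value: "promote billing-api canary"}]}
          - name: ledger-consumer
            dependencies: [billing-api]
            template: step
            arguments: {parameters: [{name: cmd, value: "promote ledger-consumer canary"}]}
          - name: edge-gateway
            dependencies: [billing-api, ledger-consumer]
            template: step
            arguments: {parameters: [{name: cmd, value: "promote edge-gateway canary"}]}
    - name: step
      inputs:
        parameters:
          - name: cmd
      container:
        image: alpine:3.20  # placeholder; real steps call your migration and rollout tooling
        command: [sh, -c]
        args: ["echo {{inputs.parameters.cmd}}"]
```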
Gates, canaries, and instant rollback
Make promotions boring by defaulting to progressive delivery with hard gates. We typically use `Argo Rollouts` (or `Flagger`) with `Prometheus` analysis templates, plus `LaunchDarkly`/`OpenFeature` to decouple behavior switches from deploys.
A minimal `Argo Rollouts` analysis template aligned to SLOs:
```yaml
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: error-rate-and-latency
spec:
  metrics:
    - name: http-error-rate
      interval: 1m
      successCondition: result[0] < 0.02
      failureLimit: 1
      provider:
        prometheus:
          address: http://prometheus.monitoring:9090
          query: >-
            sum(rate(http_requests_total{service="billing-api",status=~"5.."}[1m]))
            /
            sum(rate(http_requests_total{service="billing-api"}[1m]))
    - name: p95-latency
      interval: 1m
      successCondition: result[0] < 0.300
      provider:
        prometheus:
          address: http://prometheus.monitoring:9090
          query: >-
            histogram_quantile(0.95,
              sum(rate(http_request_duration_seconds_bucket{service="billing-api"}[1m])) by (le))
```
And a rollout spec that references it:
```yaml
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: billing-api
spec:
  strategy:
    canary:
      steps:
        - setWeight: 25
        - pause: {duration: 60}
        - analysis: {templates: [{templateName: error-rate-and-latency}]}
        - setWeight: 50
        - pause: {duration: 120}
        - analysis: {templates: [{templateName: error-rate-and-latency}]}
        - setWeight: 100
```
Rollback needs to be a first‑class path:
- Keep the previous manifest and container image signed and ready; run `cosign verify` before promote and before rollback.
- Automate `kubectl argo rollouts undo` or flip traffic with `istio`/`nginx` blue/green (a CI sketch follows this list).
- For the DB: use expand/contract migrations with a reversible `down.sql` that is safe at 0% traffic. Tools: `atlas`, `liquibase`, `golang-migrate`.
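As an illustration, an on‑demand rollback job could look like the sketch below: verify the previous image’s signature, undo the rollout, and emit the event that feeds CFR/MTTR. The registry, key, namespace, and helper script are placeholders.

```yaml
# Sketch only: manual-trigger rollback. Registry, key path, namespace, and scripts are placeholders.
name: rollback
on: workflow_dispatch
jobs:
  rollback:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Verify previous release artifacts
        run: |
          # Confirm the image we are rolling back to is still signed before touching traffic.
          cosign verify --key "$COSIGN_PUB_KEY" registry.example.com/billing-api:"$PREVIOUS_SHA"
      - name: Undo rollout
        run: |
          # Argo Rollouts keeps the previous ReplicaSet around, so undo is near-instant.
          kubectl argo rollouts undo billing-api -n payments
      - name: Emit rollback event
        run: |
          # Hypothetical helper that records release_events_total{result="rollback"} for CFR/MTTR.
          ./scripts/emit_release_event.sh --result rollback --release "$RELEASE_ID"
```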
If a gate fails, the system rolls back automatically and emits a structured event that your incident tooling consumes. That’s how you get MTTR under 10 minutes.
Checklists that scale (runbook‑as‑code)
Checklists prevent expensive surprises when the team triples. Don’t hide them in Confluence; run them in CI.
Pre‑release checklist (automated):
- `build -> sign -> scan`: SBOM via `syft`, vulnerability scan via `grype` or your platform, `cosign sign-blob` on images/manifests.
- Contract tests pass for all producer/consumer pairs; schema diffs are backward compatible.
- Performance baseline vs. last release within threshold (locust/k6 smoke on staging).
- Config drift check: `terraform plan` and `kubectl diff` show only intended changes (see the sketch after this list).
- Secrets validity and rotation windows verified; no tokens expiring inside the release window.
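One simple way to automate the drift check is to fail the pipeline when anything unexpected shows up before the release applies its own changes. The steps below are a sketch you could drop into the `plan` job above; directories and rendered‑manifest paths are placeholders, and a richer version would diff against the planned change set instead of failing on any change.

```yaml
# Sketch only: preflight drift checks. Paths and workspaces are placeholders.
- name: Terraform drift check
  run: |
    cd infra/prod
    terraform init -input=false
    # -detailed-exitcode: 0 = no changes, 2 = changes present, 1 = error.
    terraform plan -input=false -detailed-exitcode -out=tfplan || code=$?
    if [ "${code:-0}" -eq 1 ]; then exit 1; fi
    if [ "${code:-0}" -eq 2 ]; then echo "Unexpected infra drift detected"; exit 1; fi
- name: Kubernetes drift check
  run: |
    # kubectl diff exits non-zero when live state differs from the rendered manifests.
    if ! kubectl diff -f rendered-manifests/ > drift.txt; then
      echo "Cluster state differs from rendered manifests:"
      cat drift.txt
      exit 1
    fi
```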
Release checklist (automated):
- Apply the DB expand migration behind `feature_flag=false`.
- Wave 1: internal producers at 25/50/100 with SLO gates.
- Wave 2: consumers.
- Edge services last; flip feature flags gradually.
- Bake time and synthetic checks.
Rollback checklist (automated first, manual if needed):
- Trigger `undo` back to the previous release manifest.
- Revert feature flags to safe defaults.
- If the DB change is harmful, run the reversible down migration only after traffic is zeroed.
- Post‑incident marker emitted for CFR and MTTR tracking.
Post‑release checklist (fast and blameless):
- Capture lead time from PR merged → prod healthy.
- CFR updated if any guardrail tripped or a manual rollback occurred.
- 30‑minute retro focused on what automation should’ve caught. Action items become pipeline tests.
Implementation blueprint you can steal tomorrow
What we ship at GitPlumbers when a client says “multi‑service releases keep biting us” looks like this:
- Git model: `trunk-based development` with short‑lived branches; `release.yaml` lives in an env repo. Promotion is a PR to `env/prod`.
- Build pipeline: `GitHub Actions` or `GitLab CI` builds, signs, and scans. Example job:
```yaml
- name: Build and sign
  run: |
    docker build -t registry/billing-api:${GIT_SHA} .
    cosign sign --key $COSIGN_KEY registry/billing-api:${GIT_SHA}
    syft registry/billing-api:${GIT_SHA} -o spdx-json > sbom.json
    grype registry/billing-api:${GIT_SHA} --fail-on=high
```
- Contracts: OpenAPI lint + `Pact` tests in CI to guarantee compatibility across the versions pinned in the manifest.
- Infra as code: `Terraform` for cloud, `Helm`/`Kustomize` for app manifests, managed by `ArgoCD` with an `ApplicationSet` per service and a `Release` controller that reads `release.yaml`.
- Orchestration: `Argo Workflows` (or a custom controller) computes the DAG, triggers rollouts, and coordinates waves.
- Progressive delivery: `Argo Rollouts`/`Flagger` with Prometheus queries mapped to SLOs. Tie into incident tooling (`PagerDuty`, `Opsgenie`).
- Observability: `Prometheus` + `Loki` + `Tempo` (or your stack) with golden signals per service and per release. Emit `release_id` as a label on metrics and logs.
- Security/compliance: SLSA provenance attestation, image signature verification at admission (`cosigned`/`kyverno`), SBOM retention per release (see the admission-policy sketch after this list).
- Backstage: surface release status, the DAG view, and runbooks to devs. A boring button that says “Promote to staging → prod.”
This is not theory—we’ve rolled exactly this at fintechs, marketplaces, and health tech. The specifics change, the shape doesn’t.
What good looks like in 90 days
Real numbers from a team that moved to this model with us:
- CFR: 22% → 7% (8 weeks) → 4% (12 weeks).
- Lead time: 2–3 days → 3.5 hours median (p90: 7 hours).
- MTTR: 42 minutes → 9 minutes. Most rollbacks auto‑triggered by gates.
- Throughput: 3 prod releases/week → release train every business day.
Qualitative wins:
- Releases don’t require the staff engineer to be online. New teammates follow the same runway as veterans.
- Product stopped batching “big bangs” and now rides the daily train via flags.
- Security stopped chasing SBOMs; they’re attached to every release.
The litmus test: can a new hire safely promote a 10‑service release at 3 p.m. on a Wednesday without paging you? If not, your process won’t scale.
What I’d do differently next time
- Start with the `release.yaml` and SLO gates before touching tools. The tooling picks itself once the shape is right.
- Invest early in contract tests; they remove the guesswork from DAG ordering.
- Make rollback muscle memory. We schedule quarterly game days and practice.
- Treat database changes as their own service with strict expand/contract discipline.
If you want help getting from “we hope staging is representative” to “we ship daily without fear,” GitPlumbers has done this enough times to skip the yak‑shaving and go straight to results.
Key takeaways
- Treat releases as first‑class objects with a manifest that pins versions across services.
- Automate orchestration using a dependency graph, not a hand‑rolled run order.
- Gate promotions with SLO‑aligned metrics and make rollback a one‑click, pre‑rehearsed path.
- Measure change failure rate, lead time for changes, and mean time to recovery. Optimize the pipeline to move those numbers.
- Codify checklists as reusable jobs and templates so scaling the team doesn’t multiply human error.
- Decouple risky changes via feature flags and expand/contract DB migrations to keep deploys boring.
Implementation checklist
- Create a `release.yaml` that pins versions for all services, migrations, and config deltas.
- Automate per‑service build, sign, scan: `cosign` signatures, SBOM via `syft`, vulnerability checks with `grype` or your scanner.
- Run contract tests (`Pact`/OpenAPI) to prove backward compatibility before orchestration.
- Compute the dependency graph from service metadata and gate deploy order accordingly.
- Use progressive delivery (`Argo Rollouts`, `Flagger`) with Prometheus queries aligned to SLOs.
- Implement preflight checks: schema drift, config drift, secrets validity, resource budgets.
- Make rollback a first‑class path with pre‑baked manifests and data migration reversibility.
- Instrument change tracking so CFR, lead time, and MTTR show up on a single dashboard.
Questions we hear from teams
- Do we need ArgoCD/Argo Rollouts, or will Spinnaker/Flux work?
- Use what fits your stack. We’ve implemented the same pattern with Spinnaker plus Prometheus canaries, and with Flux + Flagger. The key is: GitOps for desired state, progressive delivery with metric gates, and a controller that understands a release manifest. Tools are interchangeable if they support those capabilities.
- How do you handle cross‑service DB migrations safely?
- Use expand/contract. Release A does the expand (add nullable columns, backfill if needed) and ships behind a feature flag. Services read/write compatibly. Once traffic proves stable, Release B removes the old code path and performs the contract (drop/make non‑null). Never do destructive changes in the same release that introduces new readers/writers.
- Feature flags vs canaries—do we need both?
- Yes, they solve different problems. Canaries reduce blast radius for a new binary. Feature flags decouple risky behavior changes from deploys. We deploy with canaries and then ramp behavior with flags, so rollbacks are binary or flag flips rather than emergency patches.
- How do we measure change failure rate without a lot of manual bookkeeping?
- Emit a release event when a rollout starts and finishes, including `release_id`, versions, and result. Integrate your incident tool to auto‑tag releases that trigger alerts or policy violations as failures. Grafana/Looker can then compute CFR automatically as failed releases divided by total releases.