Stop Breaking Clients: A Field‑Tested API Versioning Playbook That Actually Preserves Backward Compatibility

I’ve seen more outages from “harmless” API tweaks than from full rewrites. Here’s the pragmatic versioning strategy we use with clients to ship changes without lighting up PagerDuty.


The outage we caused by adding one field

I watched a Fortune 500 team ship a “minor” change: add currency to an invoice response. They didn’t bump the version or default the value. Half their Python clients deserialized into strict dataclasses and started throwing. Cue a Friday night rollback.

If you’ve been burned like this, you know the rule: with APIs, compatibility is a business SLO. This guide is the versioning playbook we use at GitPlumbers to ship changes without breaking revenue.

  • Goal: maintain backward compatibility while evolving fast
  • Scope: REST/JSON primarily, with notes on GraphQL and gRPC
  • Tooling: OpenAPI 3.1, Spectral, Prism, Kong/Envoy, Pact, Prometheus, ArgoCD

Step 1: Write down your versioning policy

Pick one approach and commit. I’ve seen teams paralyze themselves bikeshedding. The real failures happen when three styles coexist.

Common styles:

  • Path-based: /v1/customers/123 — simplest, caches love it, explicit.
  • Media type (Accept header): Accept: application/vnd.acme.v2+json — great for evolution without URL churn; needs gateway support.
  • Subdomain: v1.api.example.com — workable but adds DNS/SSL overhead.

My default for public APIs: path-based. For internal/mobile where SDKs can negotiate: media type.

Version identifiers:

  • Major only (v1, v2). Patch/minor changes are backward compatible and don’t change the version identifier.
  • Tie releases to a changelog and Deprecation/Sunset schedule.

Checkpoint:

  • A one-page “API Versioning Policy” in the repo. Include examples, headers, and routing rules.

Metric:

  • Track percent of traffic by version: v1, v2, unknown. Unknown should be <1%.

Step 2: Define compatibility rules (and enforce them)

Compatibility is a contract. The rules we put in every client’s style guide:

  • Additive changes only: you can add new optional fields, endpoints, and query params.
  • Never rename, remove, or repurpose fields. Don’t change types. Don’t change meanings.
  • New fields must be nullable or have sane defaults.
  • Maintain idempotency and pagination shapes.
  • Writers are strict; readers are tolerant. Servers accept known inputs only; clients ignore unknown outputs.
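
The tolerant-reader half of that rule is easy to make concrete in client code. A minimal sketch, assuming an invoice shape like the one used later in this guide (the field names and `readInvoice` helper are illustrative):

```javascript
// Tolerant reader: pick out only the fields this client understands,
// apply defaults for newly added optional fields, and ignore the rest.
function readInvoice(json) {
  const data = typeof json === 'string' ? JSON.parse(json) : json;
  return {
    id: String(data.id),
    total: Number(data.total),
    // 'currency' was added in a later minor; default it instead of throwing.
    currency: data.currency ?? 'USD',
  };
}

// A newer response with an unknown field still parses cleanly:
const invoice = readInvoice('{"id":"inv_1","total":42,"currency":"EUR","tax_code":"X"}');
// invoice.currency is 'EUR'; the unknown 'tax_code' is simply dropped
```

The strict dataclasses from the opening anecdote are the anti-pattern this avoids: the reader never rejects a payload just because the server learned a new word.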

OpenAPI 3.1 governance with Spectral to enforce this in CI:

# spectral.yml
extends: ["spectral:oas"]
rules:
  no-breaking-removal:
    description: "Do not remove response properties in existing versions"
    given: "$..responses..content.*.schema.properties"
    then:
      function: falsy
      field: "x-removed"
      message: "Use a new version; do not remove properties"
  new-fields-are-optional:
    description: "New properties must allow null (OpenAPI 3.1 type array) or declare a default"
    given: "$.components.schemas[*].properties[*]"
    then:
      function: schema
      functionOptions:
        schema:
          anyOf:
            - properties:
                type:
                  type: array
                  contains: { const: "null" }
            - required: [default]

CI step:

npx @stoplight/spectral-cli lint openapi.yaml -r spectral.yml

Use Prism to mock the spec and verify that clients tolerate unknown fields:

npx @stoplight/prism-cli mock openapi.yaml --cors

Checkpoint:

  • CI fails on breaking changes. Developers can’t merge without a new version or a waiver.

Metric:

  • Count of blocked PRs due to breaking rules; should trend down after first month.

Step 3: Implement routing and negotiation at the edge

Centralize version routing so application code stays boring. Two proven patterns:

  1. Path-based with NGINX map
map $request_uri $api_version {
  default v1;
  ~^/v2/ v2;
}

upstream api_v1 { server 10.0.0.10:8080; }
upstream api_v2 { server 10.0.0.11:8080; }

location ~ ^/v[12]/customers {
  proxy_set_header X-API-Version $api_version;
  proxy_pass http://api_$api_version;
}
  2. Media-type with Kong declarative config
_format_version: "3.0"
routes:
  - name: customers-v2
    paths: ["/customers"]
    headers:
      Accept:
        - "application/vnd.acme.v2+json"
    strip_path: false
    service: customers_v2
  - name: customers-v1
    paths: ["/customers"]
    strip_path: false
    service: customers_v1
plugins:
  - name: response-transformer
    config:
      add:
        headers:
          - "X-API-Version:v2"

App code can still double-check for safety:

// Express example
app.get('/v1/customers/:id', getCustomerV1);
app.get('/v2/customers/:id', getCustomerV2);
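
If you run the media-type style instead, negotiation can live in one small middleware. A sketch (the vendor media type matches the Kong example above; `negotiateVersion` is an illustrative name):

```javascript
// Parse a vendor media type like application/vnd.acme.v2+json from the
// Accept header and stash the negotiated version on the request.
function negotiateVersion(req, res, next) {
  const accept = req.get('Accept') || '';
  const match = accept.match(/application\/vnd\.acme\.v(\d+)\+json/);
  req.apiVersion = match ? `v${match[1]}` : 'v1'; // default to v1
  res.setHeader('X-API-Version', req.apiVersion); // label for metrics/logs
  next();
}
```

Mounted before your routes, this gives every request the same version label the gateway would have applied.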

Checkpoint:

  • Every request gets a version label at the edge (X-API-Version) for metrics and logs.

Metrics:

  • 4xx and 5xx by version
  • P95 latency by version
  • Request mix by version (adoption curve)

Step 4: Evolve schemas without breaking consumers

This is where most teams slip. Show your engineers the before/after explicitly.

OpenAPI example (add currency safely with default):

openapi: 3.1.0
info:
  title: Billing API
  version: 1.3.0
paths:
  /v1/invoices/{id}:
    get:
      responses:
        '200':
          description: Invoice found
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/InvoiceV1_1'
components:
  schemas:
    InvoiceV1:
      type: object
      required: [id, total]
      properties:
        id: { type: string }
        total: { type: number }
    InvoiceV1_1:
      allOf:
        - $ref: '#/components/schemas/InvoiceV1'
        - type: object
          properties:
            currency:
              type: string
              default: USD
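
A schema default only documents intent; the server should actually fill the value so strict deserializers never see a missing field. A minimal sketch (`serializeInvoiceV1` is an illustrative name, not part of any framework):

```javascript
// Serialize an invoice for the v1 surface, guaranteeing the documented
// default so older, stricter clients keep working unchanged.
function serializeInvoiceV1(invoice) {
  return {
    id: invoice.id,
    total: invoice.total,
    currency: invoice.currency ?? 'USD', // default from the OpenAPI schema
  };
}
```

This is the fix for the outage in the opening story: the field ships with a value on day one, whether or not the client knows about it.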

gRPC notes:

  • Only add fields with new numbers; never change types.
  • Use reserved for removed fields to avoid reuse.
  • Use wrapper types for optionals.
syntax = "proto3";
import "google/protobuf/wrappers.proto";

message Customer {
  string id = 1;
  string name = 2;
  reserved 3; // old 'age' field, do not reuse
  google.protobuf.StringValue email = 4; // optional
}

GraphQL notes:

  • Don’t version the endpoint. Use @deprecated and additive fields.
  • Communicate removal dates and provide query examples.
type Invoice {
  id: ID!
  total: Float!
  currency: String @deprecated(reason: "Use money.amount and money.currency")
  money: Money!
}
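
While the deprecated field is still served, resolve it and its successor from the same source so the two cannot drift. A resolver sketch for the type above (the resolver-map shape assumes a typical JavaScript GraphQL server):

```javascript
// Resolvers that keep the deprecated 'currency' field consistent with
// the new 'money' object during the migration window.
const resolvers = {
  Invoice: {
    money: (invoice) => ({
      amount: invoice.total,
      currency: invoice.currency ?? 'USD',
    }),
    // Deprecated field delegates to the same source of truth.
    currency: (invoice) => invoice.currency ?? 'USD',
  },
};
```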

Checkpoint:

  • For each change, the PR includes: OpenAPI diff, migration notes, and mock server examples.

Metric:

  • Consumer breakage caught pre-merge via contract tests > 90% of the time.

Step 5: Prove compatibility with contract tests

Relying on Postman collections and hope isn’t enough. Wire in consumer-driven contracts for critical consumers.

Pact example (Node):

// pact.test.js
const { Pact, Matchers } = require('@pact-foundation/pact');

const provider = new Pact({ consumer: 'MobileApp', provider: 'CustomersAPI' });

describe('v1 customer contract', () => {
  beforeAll(() => provider.setup());
  afterAll(() => provider.finalize());

  it('returns a customer with additive fields ignored', async () => {
    await provider.addInteraction({
      state: 'customer 123 exists',
      uponReceiving: 'a request for customer 123',
      withRequest: { method: 'GET', path: '/v1/customers/123' },
      willRespondWith: {
        status: 200,
        headers: { 'Content-Type': 'application/json' },
        body: {
          id: '123',
          name: 'Ada',
          // currency can appear in future; consumer must ignore
          currency: Matchers.like('USD')
        }
      }
    });
    // invoke your client here, then check the interaction was exercised
    await provider.verify();
  });
});

CI wiring (GitHub Actions):

name: contract-tests
on: [push, pull_request]
jobs:
  pact:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with: { node-version: 20 }
      - run: npm ci && npm test

Checkpoint:

  • All high-traffic consumers publish contracts; provider verifies on PRs.

Metrics:

  • Number of contract verifications per release
  • Time to detect breaking change in CI (target <10 minutes)

Step 6: Roll out safely with canaries and SLOs

If your rollout plan is “flip the switch,” you’re volunteering for an incident review. Use progressive delivery.

  • Put version into metrics and logs at the gateway.
  • Canary v2 to 1–5% via gateway routing, then ramp.
  • Watch SLOs for 30–60 minutes at each step.

Istio VirtualService example:

apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: customers
spec:
  hosts: ["api.example.com"]
  http:
    - match:
        - uri: { prefix: "/v2/customers" }
      route:
        - destination: { host: customers-v2 }
          weight: 5
        - destination: { host: customers-v1 }
          weight: 95

PromQL you’ll actually use:

sum(rate(http_requests_total{route="/customers",version!=""}[5m])) by (version)
sum(rate(http_requests_errors_total{version="v2"}[5m]))
  /
sum(rate(http_requests_total{version="v2"}[5m]))
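
Those queries can back a canary alert so rollback isn’t a judgment call at 2 a.m. A sketch of a Prometheus rule (metric and label names follow the instrumentation assumed above):

```yaml
groups:
  - name: api-canary
    rules:
      - alert: V2ErrorRateHigh
        expr: |
          sum(rate(http_requests_errors_total{version="v2"}[5m]))
            /
          sum(rate(http_requests_total{version="v2"}[5m])) > 0.01
        for: 10m
        labels:
          severity: page
        annotations:
          summary: "v2 canary error rate above 1% for 10m; consider rollback"
```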

Checkpoint:

  • Runbooks define rollback conditions: error rate deltas, latency regressions, saturation.

Metrics:

  • Error budget burn by version (v2 must not exceed SLO)
  • MTTR when canary fails (target <15 minutes)

Step 7: Deprecation and sunset that won’t burn goodwill

Deprecation is communication + instrumentation + time. Headers help, but they’re not the plan.

  • Add Deprecation: true (IETF draft) and Sunset (RFC 8594) headers months in advance.
  • Return Link: </v2/customers>; rel="successor-version".
  • Publish a calendar and changelog; email known integrators; update SDKs.
  • Provide shims/adapters in SDKs to ease migration.

Express example adding headers:

app.use('/v1', (req, res, next) => {
  res.setHeader('Deprecation', 'true');
  res.setHeader('Sunset', 'Tue, 31 Dec 2025 23:59:59 GMT');
  res.setHeader('Link', '</v2/customers>; rel="successor-version"');
  next();
});
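
Clients should surface those headers rather than silently ignore them. A small sketch of a response-header checker (assumes lowercased header keys, as Node's `http` and Express expose them; the function name is illustrative):

```javascript
// Inspect deprecation metadata on an API response and return a
// structured warning the caller can log or alert on.
function checkDeprecation(headers) {
  const deprecated = (headers['deprecation'] || '').toLowerCase() === 'true';
  if (!deprecated) return null;
  return {
    deprecated: true,
    sunset: headers['sunset'] || null,    // HTTP-date, per RFC 8594
    successor: headers['link'] || null,   // rel="successor-version"
  };
}
```

Wire this into the SDK’s response path and the migration deadline shows up in every integrator’s logs, not just your changelog.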

Sunsetting checklist:

  1. 95% of traffic off old version for 30 days
  2. All known high-value clients migrated
  3. Deprecation headers present for 90 days
  4. No elevated error rates on successor version

Metrics:

  • Weekly adoption trend per version
  • Count of active API keys hitting deprecated endpoints
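
The second metric is queryable directly if requests carry an API-key label. A PromQL sketch (the `http_requests_total` metric and `api_key` label are assumptions about your instrumentation):

```promql
count(
  sum by (api_key) (increase(http_requests_total{version="v1"}[7d])) > 0
)
```

This counts distinct keys that hit v1 in the last week; when it reaches zero and stays there, the sunset date is safe.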

I’ve never seen a deprecation go sideways when the team had per-version metrics, contracts, and a public calendar. I’ve seen plenty fail when it was “just a Jira ticket.”

What good looks like (and what we’d change next time)

What works:

  • One-page policy, backed by OpenAPI + Spectral + CI gates
  • Edge routing with per-version metrics; canary + rollback guardrails
  • Contract tests for top consumers; mock servers for dev velocity
  • Clear deprecation headers, docs, SDK shims, and a published timeline

What we’d change:

  • Add dashboards early; don’t wait for v2 to realize you can’t measure adoption
  • Bake migration examples into SDK READMEs with copy/paste diffs
  • Make “ignore unknown fields” a lint rule in client repos

If you need a second set of eyes or someone to dig through your messy, lived-in stack, this is GitPlumbers’ bread and butter. We’ve done this for fintechs on Kong, SaaS on Envoy, and marketplaces on API Gateway + Lambda.

Key takeaways

  • Pick one versioning style and write it down. Consistency beats purity.
  • Compatibility is a contract: additive changes only, strict writers/tolerant readers, and never repurpose fields.
  • Govern with OpenAPI + linters + contract tests. Don’t rely on tribal knowledge.
  • Route by version at the edge. Use canaries and error budgets to de-risk rollouts.
  • Measure adoption and breakage by version. If you can’t see it, you can’t sunset it.
  • Deprecation is a process, not a header. Communicate, instrument, and provide SDKs/migrations.

Implementation checklist

  • Document your versioning policy and compatibility rules in the repo README and API style guide.
  • Choose one versioning style (path, header/media type, or subdomain) and implement gateway routing.
  • Adopt OpenAPI 3.1 and add Spectral rules to block breaking changes in CI.
  • Stand up consumer-driven contracts (Pact) for key consumers and wire into CI.
  • Expose per-version metrics and logs; label traffic with `version` and `client`.
  • Add Deprecation and Sunset headers; publish a deprecation calendar and changelog.
  • Canary new versions by 1–5%, watch SLOs for 30–60 minutes, then gradually ramp.
  • Provide migration guides and SDK shims; track adoption weekly until 95%+ migrated.

Questions we hear from teams

Path versioning or Accept header—what should we pick?
If your API is public or cache-heavy, choose path versioning (`/v1`). If you control clients (mobile/web SDKs) and want smoother evolution without URL churn, use media types (`Accept: application/vnd.acme.v2+json`). Don’t mix styles within the same surface area.
When do we bump the major version?
Only when you need to remove, rename, or repurpose fields; change types; or alter semantics that break tolerant readers. Additive changes (new optional fields, endpoints, query params) should not bump major.
How do we do this with GraphQL?
Don’t version the endpoint. Use `@deprecated` on fields/enum values, add new types/fields, and communicate removal timelines. Provide query examples and codemods in client repos.
What about gRPC?
Use proto3 with additive fields; never change field numbers or types. Mark removed fields as `reserved`. For optional semantics, use wrapper types like `google.protobuf.*Value`.

Ready to modernize your codebase?

Let GitPlumbers help you transform AI-generated chaos into clean, scalable applications.

Talk to an engineer about your API versioning plan
See how we instrument per-version SLOs
