The Friday Night Supply‑Chain Attack We Didn’t Ship

How a payments platform avoided a seven‑figure breach by baking security into dev, not bolting it on later.

“We didn’t slow down shipping. We slowed down attackers.” — CTO, anonymized fintech client

The Friday Night We Didn’t Get Owned

If you’ve ever shipped on a Friday, you know the feeling. This client, a Series D payments platform processing eight-figure daily volume, had just kicked off a routine release. Ten minutes later, Slack lit up: a popular utility package they depended on had been hijacked upstream. A malicious PR was merged and a new npm tarball was published inside the same semver range. Classic supply-chain play.

Here’s the difference: nothing reached prod.

The pipeline built the container, generated an SBOM, signed both, pushed to registry, and ArgoCD attempted to sync. The cluster’s admission controller verified signatures and provenance, matched SBOM contents against vulnerability policy, and refused the deployment. Alert fired to #sec-dev, deployment auto-rolled back to the prior digest. Merchants saw nothing. Finance slept. Legal never got that call.

That wasn’t luck. It was a year of unglamorous work to make security a default behavior, not a meeting.

Context: Payments, PCI, and a Kubernetes Footgun

Constraints we walked into:

  • Regulatory pressure: PCI DSS 4.0, SOC 2 Type II. Auditors wanted evidence, not vibes.
  • Zero downtime tolerance: checkout SLOs were tight (p99 < 300ms, error rate < 0.1% during peak).
  • Stack: EKS on AWS, ArgoCD GitOps, GitHub Actions, Go services, Node for ops tooling, Terraform, Postgres on RDS, Istio for mTLS and traffic shaping.
  • Team reality: 35 engineers, 2.5 SREs, a security team of 1. Everyone shipping. AI-assisted PRs increasing throughput—and risk.

Before we arrived, they had scanners, but they were theater:

  • CI scanned images but didn’t block merges. Findings piled up in Jira.
  • Long-lived cloud keys in repo secrets. Rotations were “quarterly.”
  • Cluster allowed privileged pods “temporarily.” Temporary is forever.
  • No SBOMs, no signatures, no provenance. Supply chain was “trust me, bro.”

What We Changed: Security-First Without Slowing Delivery

We didn’t add gates that humans babysit. We automated evidence and enforcement:

  1. Provenance and SBOM as first-class artifacts

    • syft to generate SPDX for every build.
    • cosign to sign images and attest SBOMs into the transparency log (Rekor).
  2. Admission control with policy-as-code

    • Kyverno to enforce "only signed images" and ban privileged pods.
    • policy-controller (cosign) to verify signatures and attestations on admission.
  3. Least-privilege CI with OIDC

    • GitHub Actions id-token to assume short-lived AWS roles; no stored cloud creds.
    • permissions: read-all ➜ explicit per-job permissions (GitHub Actions scopes permissions at the workflow or job level, not per step).
  4. Shift-left scanning that blocks, not nags

    • Semgrep rules tuned to their codebase and CodeQL for deeper analysis.
    • Trivy for image and SBOM vuln checks with allowlist windows tied to SLAs.
  5. Network and runtime hardening

    • PodSecurity standards enforced, default NetworkPolicy deny, Istio mTLS strict.
    • Read-only root FS, no privilege escalation, and resource limits required.
  6. Infra as code with guardrails

    • tfsec and Conftest on Terraform plans; no merges on critical misconfigs.
    • ArgoCD app-of-apps with protected main; change windows for high-risk infra.
  7. Secret discipline

    • gitleaks pre-commit + server-side hooks; External Secrets backed by AWS Secrets Manager + KMS.
  8. Dashboards that matter

    • SLOs for security pipeline latency (< 2 min p95 added to build), unsigned-image block rate, and critical vuln burn-down.
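
The "allowlist windows tied to SLAs" idea from item 4 fits in a few lines. This is a minimal sketch, not the client's implementation: the SLA day counts, the `Finding` shape, and `is_blocking` are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical remediation SLAs (days) for the severities that can block a build.
SLA_DAYS = {"CRITICAL": 7, "HIGH": 30}


@dataclass(frozen=True)
class Finding:
    vuln_id: str
    severity: str      # e.g. "CRITICAL"
    first_seen: date   # when the scanner first reported it


def is_blocking(finding: Finding, allowlist: set, today: date) -> bool:
    """Return True if the finding should fail the build today."""
    sla = SLA_DAYS.get(finding.severity)
    if sla is None:
        # Severity is below the blocking threshold; this gate ignores it.
        return False
    if finding.vuln_id not in allowlist:
        # No waiver on file: block immediately.
        return True
    # A waiver only holds while the SLA window is still open.
    deadline = finding.first_seen + timedelta(days=sla)
    return today > deadline
```

The point of the expiry is that a waiver buys time, not permanence: once the window closes, the same finding starts failing builds again without anyone having to remember it.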

Implementation Details You Can Steal

Hardened GitHub Actions with OIDC and minimal permissions:

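name: build-and-sign
on:
  push:
    branches: [main]
# Least privilege: read the repo, mint an OIDC token, push to GHCR.
permissions:
  contents: read
  id-token: write
  packages: write
jobs:
  build:
    runs-on: ubuntu-22.04
    env:
      IMAGE: ghcr.io/acme/payments
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0
      - uses: actions/setup-go@v5
        with:
          go-version: '1.22'
      # Tool installers are pinned to major tags for brevity; pin to commit SHAs in production.
      - uses: sigstore/cosign-installer@v3
      - uses: anchore/sbom-action/download-syft@v0
      - name: Configure AWS (OIDC)
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::123456789012:role/gha-build
          aws-region: us-east-1
      - name: Log in to GHCR
        uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - name: Build
        run: go build -ldflags "-s -w" -o app ./cmd/app
      - name: SBOM
        # Source-tree SBOM; scan the built image instead if base-image OS packages must be covered.
        run: syft dir:. -o spdx-json=sbom.json
      - name: Build image
        run: |
          docker build -t "$IMAGE:${{ github.sha }}" .
          docker push "$IMAGE:${{ github.sha }}"
          # Resolve the pushed digest so we sign immutable content, not a mutable tag.
          echo "DIGEST_REF=$(docker inspect --format '{{index .RepoDigests 0}}' "$IMAGE:${{ github.sha }}")" >> "$GITHUB_ENV"
      - name: Sign and attest
        run: |
          # Keyless signing (cosign v2): the job's OIDC token is exchanged for a short-lived cert.
          cosign sign --yes -a git_sha=${{ github.sha }} "$DIGEST_REF"
          cosign attest --yes --predicate sbom.json --type spdxjson "$DIGEST_REF"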
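
The workflow above only works if the `gha-build` role trusts GitHub's OIDC provider and nothing else. Here is a hedged sketch of what that trust policy could look like, generated from Python so it can be reviewed and tested. The account ID comes from the workflow; the repo name `acme/payments` is inferred from the image name and the function name is hypothetical.

```python
import json


def github_oidc_trust_policy(account_id: str, repo: str, branch: str) -> dict:
    """Build an IAM trust policy that lets one repo and branch assume a role via OIDC."""
    provider = "token.actions.githubusercontent.com"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    "Federated": f"arn:aws:iam::{account_id}:oidc-provider/{provider}"
                },
                "Action": "sts:AssumeRoleWithWebIdentity",
                "Condition": {
                    "StringEquals": {
                        # Audience used by aws-actions/configure-aws-credentials.
                        f"{provider}:aud": "sts.amazonaws.com",
                        # Pin the subject so forks and other branches cannot assume the role.
                        f"{provider}:sub": f"repo:{repo}:ref:refs/heads/{branch}",
                    }
                },
            }
        ],
    }


if __name__ == "__main__":
    policy = github_oidc_trust_policy("123456789012", "acme/payments", "main")
    print(json.dumps(policy, indent=2))
```

The `sub` condition is the part that matters: without it, any workflow in any repository that can reach the provider could assume the role.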

Admission policy to block privileged containers (Kyverno):

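apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: disallow-privileged
spec:
  validationFailureAction: Enforce
  rules:
    - name: privileged-containers
      match:
        any:
          - resources:
              kinds: ["Pod"]
      validate:
        message: "Privileged containers and privilege escalation are not allowed"
        pattern:
          spec:
            # Init and ephemeral containers are covered too; skipping them leaves a bypass.
            =(ephemeralContainers):
              - securityContext:
                  =(privileged): "false"
                  allowPrivilegeEscalation: "false"
            =(initContainers):
              - securityContext:
                  =(privileged): "false"
                  allowPrivilegeEscalation: "false"
            containers:
              # allowPrivilegeEscalation defaults to true when unset, so it is
              # required explicitly; privileged defaults to false, so it is only
              # checked when present.
              - securityContext:
                  =(privileged): "false"
                  allowPrivilegeEscalation: "false"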

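Default-deny network policy with egress exceptions for RDS and cluster DNS:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: deny-by-default
  namespace: payments
spec:
  podSelector: {}
  policyTypes:
    - Ingress
    - Egress
  ingress: []
  egress:
    # RDS lives outside the cluster, so it is matched by CIDR, not by namespace.
    - to:
        - ipBlock:
            cidr: 10.0.0.0/16  # placeholder: replace with your RDS subnet CIDR
      ports:
        - protocol: TCP
          port: 5432
    # Pods still need DNS to resolve the RDS endpoint.
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53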
Terraform policy (Conftest/OPA) to require S3 encryption:

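package terraform.security

import rego.v1

# Assumes each resource has been flattened to {resource_type, name, encrypted}
# before evaluation; raw `terraform show -json` output has a different shape.
# Run with: conftest test --namespace terraform.security <input>
deny contains msg if {
  input.resource_type == "aws_s3_bucket"
  not input.encrypted
  msg := sprintf("S3 bucket %s must enable default encryption", [input.name])
}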

Secret scanning at commit time (pre-commit with gitleaks):

# .pre-commit-config.yaml
repos:
  - repo: https://github.com/gitleaks/gitleaks
    rev: v8.18.0
    hooks:
      - id: gitleaks
        args: ["--no-banner", "--redact"]

And for verification at deploy, we used cosign policy-controller with ArgoCD so only signed-and-attested images with clean SBOMs make it past the door. No human approvals, no tickets—just crypto.
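
Stripped of YAML, the decision the cluster makes at admission is a small pure function. The sketch below is illustrative only: `ImageFacts` and `admit` are hypothetical names and do not mirror the policy-controller or Kyverno APIs. It assumes signature and attestation verification have already happened and only their results are passed in.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ImageFacts:
    digest: str
    signature_verified: bool        # signature checked against the trusted identity
    sbom_attested: bool             # a signed SBOM attestation exists for this digest
    critical_findings: tuple = ()   # IDs of critical vulns not covered by a waiver


def admit(image: ImageFacts) -> tuple:
    """Return (allowed, reason). Every check fails closed."""
    if not image.signature_verified:
        return False, "missing or invalid signature"
    if not image.sbom_attested:
        return False, "missing SBOM attestation"
    if image.critical_findings:
        return False, "critical findings: " + ", ".join(image.critical_findings)
    return True, "ok"
```

Note that a valid signature is necessary but not sufficient. A correctly signed image that carries a critical finding still gets denied, which is the case that mattered in the Friday incident.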

The Attempted Breach and Why It Failed

The attack path looked like dozens we’ve seen:

  • Maintainer account of a utility library compromised; malicious version published with a payload that attempted credential exfil and container breakout (using CAP_SYS_ADMIN when available, outbound beacon on 443).
  • Renovate opened a PR to bump the lib. AI-assisted review LGTM’d it. CI built the image.
  • Trivy flagged a critical CVE from the SBOM delta; policy marked the image "quarantined". Build artifact still pushed (we keep evidence), but:
    • cosign signatures and attestations were present, yet
    • Kyverno verifyImages rule checked the SBOM attestation findings against an allowlist SLA. Critical issues within the cardholder data path are blockers, so admission was denied and ArgoCD rolled back.
  • Egress controls plus Istio mTLS prevented any pod-to-pod lateral move and blocked the beacon’s IPs.

Security didn’t depend on someone noticing a Slack alert. The system enforced the decision.
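
The "SBOM delta" that surfaced the bad dependency is just a diff of two package lists. A minimal sketch, assuming SPDX 2.x JSON documents with a top-level `packages` array whose entries carry `name` and `versionInfo`; the function names and the package names in the usage note are hypothetical.

```python
def package_versions(sbom: dict) -> dict:
    """Map package name -> set of versions found in an SPDX JSON document."""
    versions = {}
    for pkg in sbom.get("packages", []):
        name = pkg.get("name")
        if name is None:
            continue
        versions.setdefault(name, set()).add(pkg.get("versionInfo", "unknown"))
    return versions


def sbom_delta(base: dict, head: dict) -> dict:
    """Compare two SBOMs and report added, removed, and version-changed packages."""
    before, after = package_versions(base), package_versions(head)
    return {
        "added": sorted(n for n in after if n not in before),
        "removed": sorted(n for n in before if n not in after),
        "changed": {
            n: (sorted(before[n]), sorted(after[n]))
            for n in sorted(after)
            if n in before and before[n] != after[n]
        },
    }
```

Load the base-branch and PR SBOMs with `json.load`, pass both to `sbom_delta`, and scan only what shows up under `added` and `changed`. A bump of a hypothetical `util-lib` from 1.0.0 to 1.0.1 appears as a single `changed` entry instead of being buried in hundreds of unchanged packages.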

Measurable Outcomes (90 Days and 12 Months)

First 90 days:

  • 83% reduction in critical vulns in running workloads (from 236 to 40) measured via Trivy + SBOM ingest in Security Hub.
  • 0 secrets merged to main after enabling pre-commit + server hooks (from ~3/month).
  • Deployment lead time unchanged (p50 stayed ~45 minutes from merge ➜ prod); security checks added +92 seconds p95 to builds.
  • Unsigned image blocks: 100% of attempted unsigned deploys were denied (7 incidents, all from ad-hoc test images).

At 12 months:

  • Three supply-chain attempts blocked at admission (including the Friday incident). No customer impact, no downtime.
  • MTTR for security-related rollbacks: 6 minutes p50 using ArgoCD auto-sync and progressive delivery.
  • Cardholder data environment met PCI on first attempt with auditors accepting SBOMs, signatures, and policy logs as evidence—cutting audit prep time by ~60%.
  • Conservatively $2.7M in risk avoided across incident response, legal, fines, and merchant churn, based on benchmarks and the company’s transaction volume. No ransomware, no breach disclosure.

What Bit Us, What We’d Do Differently

  • We initially over-blocked on medium CVEs without exploit paths, which throttled dev. Fix: tie policies to exploitability (EPSS), reachability (code paths), and environment (is it in the CDE?).
  • First pass SBOMs were noisy. We standardized on SPDX 2.3, pinned syft versions, and diffed SBOMs PR-to-PR to focus on deltas.
  • GitHub Actions GITHUB_TOKEN had broader scopes than needed in a couple workflows. We moved to explicit permissions per job and audited actions/* versions. Lock to SHAs when feasible.
  • Teams tried to bypass ArgoCD for “hotfixes.” We eliminated kubectl write access to prod except for break-glass, logged, time-bound roles via aws-iam-authenticator and SSO.
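
The fix for over-blocking in the first bullet comes down to scoring each finding on more than severity. A rough sketch under stated assumptions: the 0.10 EPSS cut-off, the `VulnContext` fields, and the three outcomes are illustrative choices, not the client's actual thresholds.

```python
from dataclasses import dataclass

EPSS_BLOCK_THRESHOLD = 0.10  # hypothetical cut-off; EPSS scores range from 0 to 1


@dataclass(frozen=True)
class VulnContext:
    severity: str     # "CRITICAL", "HIGH", "MEDIUM", or "LOW"
    epss: float       # estimated probability of exploitation for the CVE
    reachable: bool   # vulnerable code is on a call path the service uses
    in_cde: bool      # workload runs in the cardholder data environment


def decision(v: VulnContext) -> str:
    """Return "block", "ticket", or "ignore" for a single finding."""
    if v.severity == "CRITICAL" and v.in_cde:
        return "block"  # criticals in the CDE always block
    likely_exploited = v.epss >= EPSS_BLOCK_THRESHOLD
    if v.severity in ("CRITICAL", "HIGH") and v.reachable and likely_exploited:
        return "block"
    if v.severity in ("CRITICAL", "HIGH", "MEDIUM") and (v.reachable or likely_exploited):
        return "ticket"  # track against the SLA without stopping the build
    return "ignore"
```

Under these rules a medium CVE with no reachable code path and a low EPSS score no longer stops a build, which is exactly the class of finding that was throttling developers.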

Actionable Next Steps You Can Implement This Sprint

  • Add SBOM + signing to your CI. Start with syft and cosign.
  • Enforce admission. If Kyverno is too heavy right now, start with policy-controller verify on signatures.
  • Flip GitHub Actions to OIDC. Remove stored cloud creds. Set permissions explicitly.
  • Default-deny the network. Add one NetworkPolicy per namespace this week.
  • Block secrets at commit and server-side. Audit your history with gitleaks detect --source . --log-opts="--all" (gitleaks v8 syntax).
  • Wire tfsec and Conftest into your Terraform PR checks. Fail on critical.
  • Track two KPIs: unsigned deploy blocks/week and build latency added by security. If latency > 2 min p95, optimize before adding more checks.
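
The latency KPI in the last bullet is easy to compute from per-build timings. A small sketch, assuming you can export the seconds each build spent in security steps; `p95` uses the nearest-rank method and both function names are hypothetical.

```python
def p95(samples: list) -> float:
    """Nearest-rank 95th percentile of a non-empty list of numbers."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    # ceil(0.95 * n) in integer arithmetic to avoid float rounding surprises.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def over_budget(added_seconds: list, budget_seconds: float = 120.0) -> bool:
    """True when security checks add more than the budget at p95."""
    return p95(added_seconds) > budget_seconds
```

Run it weekly over the security-step durations from CI. If `over_budget` returns True, optimize existing checks before adding new ones.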

You don’t need a zero trust keynote. You need signatures, policies, and the spine to enforce them. The rest is press release.

Key takeaways

  • Security-first development can ship as fast—or faster—than trust-me pipelines when you automate verification, not paperwork.
  • Provenance and policy at admission (not just scanning) is the line between a scary Slack ping and a headline.
  • Treat infra and security as code: SBOMs, signatures, policies, and network controls live in Git and are enforced automatically.
  • Adopt OIDC, short-lived creds, and secret scanning to starve attackers of the easiest vector: leaked tokens.
  • Measure it: track blocked attempts, reduced critical vulns, zero secret commits, unchanged lead time, and MTTR.

Implementation checklist

  • Generate and attest SBOMs for every artifact (`syft`, `cosign attest`).
  • Sign images and verify at admission (`cosign`, Kyverno/`policy-controller`).
  • Harden CI with OIDC and least-privileged `permissions` in GitHub Actions.
  • Shift-left with `Semgrep`, `CodeQL`, `Trivy`, `tfsec`, and prevent merges on critical findings.
  • Enforce cluster policy: Pod Security Standards, `NetworkPolicy`, `disallow-privileged` via Kyverno.
  • Adopt GitOps (ArgoCD) with protected branches and change windows for infra.
  • Scan and block secrets at commit time (`gitleaks`/`detect-secrets`) and use External Secrets + KMS.
  • Continuously test with chaos/security drills and track SLOs for security pipeline latency.

Questions we hear from teams

Will admission policies and signing slow my team down?
Not if you design for automation. In this case, we added ~92 seconds p95 to builds and 0 seconds to deploy time. The key is to treat signatures and SBOMs as artifacts produced by CI and verified by the cluster—no manual approvals.
Do I need to rebuild my entire stack to get these benefits?
No. Start with SBOM generation and image signing in CI, then add admission verification in one cluster/namespace. Roll out `NetworkPolicy` defaults and pre-commit secret scanning in parallel. You can phase this in over 2–4 sprints.
What about AI-generated code and dependency drift?
Shift-left rules (`Semgrep`, `CodeQL`) catch common AI-introduced issues. Pair that with `Renovate`/`Dependabot`, SBOM deltas, and policy that blocks risky updates in sensitive services. Automation lets you accept safe changes quickly and quarantine the rest.
How does this help with PCI and audits?
Auditors accepted signed SBOMs, admission logs, and CI logs as evidence for software integrity and change control. It reduced audit prep by ~60% because the “paper trail” is generated automatically on every change.

Ready to modernize your codebase?

Let GitPlumbers help you transform AI-generated chaos into clean, scalable applications.
