The Friday Night Supply‑Chain Attack We Didn’t Ship
How a payments platform avoided a seven‑figure breach by baking security into dev, not bolting it on later.
“We didn’t slow down shipping. We slowed down attackers.” (CTO, anonymized fintech client)
The Friday Night We Didn’t Get Owned
If you’ve ever shipped on a Friday, you know the feeling. This client—a Series D payments platform processing eight-figure daily volume—had just kicked off a routine release. Ten minutes later, Slack lit up: a popular utility package they depended on had been hijacked upstream. Malicious PR merged, new npm tarball published, same semver. Classic supply-chain play.
Here’s the difference: nothing reached prod.
The pipeline built the container, generated an SBOM, signed both, pushed to registry, and ArgoCD attempted to sync. The cluster’s admission controller verified signatures and provenance, matched SBOM contents against vulnerability policy, and refused the deployment. Alert fired to #sec-dev, deployment auto-rolled back to the prior digest. Merchants saw nothing. Finance slept. Legal never got that call.
That wasn’t luck. It was a year of unglamorous work to make security a default behavior, not a meeting.
Context: Payments, PCI, and a Kubernetes Footgun
Constraints we walked into:
- Regulatory pressure: PCI DSS 4.0, SOC 2 Type II. Auditors wanted evidence, not vibes.
- Zero downtime tolerance: checkout SLOs were tight (p99 < 300ms, error rate < 0.1% during peak).
- Stack: `EKS` on AWS, `ArgoCD` GitOps, `GitHub Actions`, `Go` services, `Node` for ops tooling, `Terraform`, `Postgres` on `RDS`, `Istio` for mTLS and traffic shaping.
- Team reality: 35 engineers, 2.5 SREs, a security team of 1. Everyone shipping. AI-assisted PRs increasing throughput, and risk.
Before we arrived, they had scanners, but they were theater:
- CI scanned images but didn’t block merges. Findings piled up in Jira.
- Long-lived cloud keys in repo secrets. Rotations were “quarterly.”
- Cluster allowed privileged pods “temporarily.” Temporary is forever.
- No SBOMs, no signatures, no provenance. Supply chain was “trust me, bro.”
What We Changed: Security-First Without Slowing Delivery
We didn’t add gates that humans babysit. We automated evidence and enforcement:
Provenance and SBOM as first-class artifacts
- `syft` to generate SPDX for every build.
- `cosign` to sign images and attest SBOMs into the transparency log (Rekor).
Admission control with policy-as-code
- `Kyverno` to enforce "only signed images" and ban privileged pods.
- `policy-controller` (cosign) to verify signatures and attestations on admission.
Least-privilege CI with OIDC
- GitHub Actions `id-token` to assume short-lived AWS roles; no stored cloud creds.
- `permissions: read-all` ➜ explicit per-job permissions.
Shift-left scanning that blocks, not nags
- `Semgrep` rules tuned to their codebase and `CodeQL` for deeper analysis.
- `Trivy` for image and SBOM vuln checks with allowlist windows tied to SLAs.
Network and runtime hardening
- `PodSecurity` standards enforced, default `NetworkPolicy` deny, Istio mTLS strict.
- Read-only root FS, no privilege escalation, and resource limits required.
Infra as code with guardrails
- `tfsec` and `Conftest` on Terraform plans; no merges on critical misconfigs.
- `ArgoCD` app-of-apps with protected main; change windows for high-risk infra.
Secret discipline
- `gitleaks` pre-commit + server-side hooks; External Secrets backed by AWS Secrets Manager + KMS.
Dashboards that matter
- SLOs for security pipeline latency (< 2 min p95 added to build), unsigned-image block rate, and critical vuln burn-down.
Implementation Details You Can Steal
Hardened GitHub Actions with OIDC and minimal permissions:
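    # Assumes syft and cosign are already installed on the runner.
    name: build-and-sign
    on:
      push:
        branches: [main]
    permissions:
      contents: read
      id-token: write   # lets the job request an OIDC token (AWS role, keyless signing)
      packages: write   # needed to push the image to ghcr.io
    jobs:
      build:
        runs-on: ubuntu-22.04
        steps:
          - uses: actions/checkout@v4
            with:
              fetch-depth: 0
          - uses: actions/setup-go@v5
            with:
              go-version: '1.22'
          - name: Configure AWS (OIDC)
            uses: aws-actions/configure-aws-credentials@v4
            with:
              role-to-assume: arn:aws:iam::123456789012:role/gha-build
              aws-region: us-east-1
          - name: Build
            run: go build -ldflags "-s -w" -o app ./cmd/app
          - name: SBOM
            run: syft dir:. -o spdx-json=sbom.json
          - name: Log in to GHCR
            run: echo "${{ secrets.GITHUB_TOKEN }}" | docker login ghcr.io -u ${{ github.actor }} --password-stdin
          - name: Build image
            run: |
              docker build -t ghcr.io/acme/payments:${{ github.sha }} .
              docker push ghcr.io/acme/payments:${{ github.sha }}
          - name: Sign and attest
            env:
              # Only needed for keyless signing on cosign 1.x; cosign 2.x is
              # keyless by default and needs --yes to skip the prompt in CI.
              COSIGN_EXPERIMENTAL: 1
            run: |
              cosign sign --oidc-provider=github-actions -a git_sha=${{ github.sha }} ghcr.io/acme/payments:${{ github.sha }}
              # The SBOM is SPDX JSON, so the predicate type is spdxjson.
              cosign attest --predicate sbom.json --type spdxjson ghcr.io/acme/payments:${{ github.sha }}

Admission policy to block privileged containers (Kyverno):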
    apiVersion: kyverno.io/v1
    kind: ClusterPolicy
    metadata:
      name: disallow-privileged
    spec:
      validationFailureAction: Enforce   # reject violating pods instead of only auditing
      rules:
        - name: privileged-containers
          match:
            any:
              - resources:
                  kinds: ["Pod"]
          validate:
            message: "Privileged containers are not allowed"
            pattern:
              spec:
                containers:
                  # =(...) anchors: if the field is set, it must match the value
                  - =(securityContext):
                      =(privileged): "false"
                      =(allowPrivilegeEscalation): "false"

Default-deny network policy with a single egress exception to RDS:
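    apiVersion: networking.k8s.io/v1
    kind: NetworkPolicy
    metadata:
      name: deny-by-default
      namespace: payments
    spec:
      podSelector: {}          # applies to every pod in the namespace
      policyTypes:
        - Ingress
        - Egress
      ingress: []              # no ingress rules: all inbound traffic is denied
      egress:
        # namespaceSelector only matches in-cluster pods, so this assumes the
        # database is reached through pods in a namespace labeled name=infra.
        # An RDS endpoint outside the cluster would need an ipBlock rule instead.
        # DNS egress is also denied unless another policy allows it.
        - to:
            - namespaceSelector:
                matchLabels:
                  name: infra
          ports:
            - protocol: TCP
              port: 5432

Terraform policy (Conftest/OPA) to require S3 encryption: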
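    # Simplified sketch: assumes the input has already been flattened to one
    # resource per document with resource_type, name, and encrypted fields.
    # Raw `terraform show -json` output nests resources under resource_changes.
    # Conftest evaluates package main by default, so run this with
    # --namespace terraform.security.
    package terraform.security

    deny[msg] {
      input.resource_type == "aws_s3_bucket"
      not input.encrypted
      msg := sprintf("S3 bucket %s must enable default encryption", [input.name])
    }

Secret scanning at commit time (pre-commit with gitleaks):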
    # .pre-commit-config.yaml
    repos:
      - repo: https://github.com/gitleaks/gitleaks
        rev: v8.18.0
        hooks:
          - id: gitleaks
            args: ["--no-banner", "--redact"]

And for verification at deploy, we used cosign policy-controller with ArgoCD so only signed-and-attested images with clean SBOMs make it past the door. No human approvals, no tickets, just crypto.
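To make the "signed, attested, clean SBOM" rule concrete, here is a minimal Python sketch of the decision an admission policy encodes. It is not the client's policy or the policy-controller API: the `Finding` and `ImageEvidence` types, the `admission_decision` function, and the CVE ID are hypothetical names for illustration, and the allowlist is assumed to be a simple map from CVE ID to the date its exception expires.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Finding:
    cve_id: str
    severity: str  # scanner severity, e.g. "CRITICAL", "HIGH", "MEDIUM"


@dataclass
class ImageEvidence:
    digest: str
    signature_verified: bool   # outcome of signature verification
    sbom_attested: bool        # an SBOM attestation exists for this digest
    findings: list = field(default_factory=list)


def admission_decision(evidence, allowlist, today, blocking=("CRITICAL",)):
    """Return (allowed, reasons).

    allowlist maps a CVE ID to the date its time-boxed exception expires.
    """
    reasons = []
    if not evidence.signature_verified:
        reasons.append("image signature not verified")
    if not evidence.sbom_attested:
        reasons.append("no SBOM attestation for digest")
    for finding in evidence.findings:
        if finding.severity not in blocking:
            continue
        expiry = allowlist.get(finding.cve_id)
        # An exception only counts while its window is still open.
        if expiry is None or expiry < today:
            reasons.append(f"{finding.cve_id} ({finding.severity}) has no active exception")
    return (not reasons, reasons)


# Hypothetical example: signed and attested, but one critical finding.
evidence = ImageEvidence("sha256:abc", True, True, [Finding("CVE-0000-0001", "CRITICAL")])
allowed, reasons = admission_decision(evidence, allowlist={}, today=date(2024, 1, 5))
```

The point of the shape is that a valid signature alone is not enough: the image in the example is signed and attested, yet it is still denied because nothing in the allowlist covers the critical finding.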
The Attempted Breach and Why It Failed
The attack path looked like dozens we’ve seen:
- Maintainer account of a utility library compromised; malicious version published with a payload that attempted credential exfil and container breakout (using `CAP_SYS_ADMIN` when available, outbound beacon on 443).
- `Renovate` opened a PR to bump the lib. AI-assisted review LGTM’d it. CI built the image.
- `Trivy` flagged a critical CVE from the SBOM delta; policy marked the image "quarantined". Build artifact still pushed (we keep evidence), but:
  - `cosign` signatures and attestations were present, yet the `Kyverno` `verifyImages` rule checked the SBOM attestation findings against an allowlist SLA: critical issues within the cardholder data path are blockers. Admission denied. ArgoCD rolled back.
- Egress controls plus Istio mTLS prevented any pod-to-pod lateral move and blocked the beacon’s IPs.
Security didn’t depend on someone noticing a Slack alert. The system enforced the decision.
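The "SBOM delta" that tripped the scanner can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the client's tooling: it reads the `packages`, `versionInfo`, and `checksums` fields from SPDX 2.x JSON, the function names and package names are hypothetical, and it only catches a republished tarball under the same version if the SBOM generator actually recorded checksums for that package.

```python
import json


def load_sbom(path):
    """Load an SPDX JSON document from disk."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def package_set(spdx_doc):
    """Reduce an SPDX document to a set of (name, version, checksums) tuples."""
    packages = set()
    for pkg in spdx_doc.get("packages", []):
        checksums = tuple(sorted(
            (c.get("algorithm", ""), c.get("checksumValue", ""))
            for c in pkg.get("checksums", [])
        ))
        packages.add((pkg.get("name", ""), pkg.get("versionInfo", ""), checksums))
    return packages


def sbom_delta(base_doc, head_doc):
    """Return entries that appear only in the head SBOM or only in the base SBOM."""
    base, head = package_set(base_doc), package_set(head_doc)
    return {"added": sorted(head - base), "removed": sorted(base - head)}


# Hypothetical example: same name and version, different checksum.
base = {"packages": [{"name": "example-util", "versionInfo": "1.4.2",
                      "checksums": [{"algorithm": "SHA256", "checksumValue": "aaa"}]}]}
head = {"packages": [{"name": "example-util", "versionInfo": "1.4.2",
                      "checksums": [{"algorithm": "SHA256", "checksumValue": "bbb"}]}]}
delta = sbom_delta(base, head)
```

Because the checksum is part of the comparison key, the example reports `example-util` as both removed and added even though the version string never changed, which is the signal you want a scanner to look at first.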
Measurable Outcomes (90 Days and 12 Months)
First 90 days:
- 83% reduction in critical vulns in running workloads (from 236 to 40) measured via `Trivy` + SBOM ingest in `Security Hub`.
- 0 secrets merged to `main` after enabling pre-commit + server hooks (from ~3/month).
- Deployment lead time unchanged (p50 stayed ~45 minutes from merge ➜ prod); security checks added +92 seconds p95 to builds.
- Unsigned image blocks: 100% of attempted unsigned deploys were denied (7 incidents, all from ad-hoc test images).
At 12 months:
- Three supply-chain attempts blocked at admission (including the Friday incident). No customer impact, no downtime.
- MTTR for security-related rollbacks: 6 minutes p50 using ArgoCD auto-sync and progressive delivery.
- Cardholder data environment met PCI on first attempt with auditors accepting SBOMs, signatures, and policy logs as evidence—cutting audit prep time by ~60%.
- Conservatively $2.7M in risk avoided across incident response, legal, fines, and merchant churn, based on benchmarks and the company’s transaction volume. No ransomware, no breach disclosure.
What Bit Us, What We’d Do Differently
- We initially over-blocked on medium CVEs without exploit paths, which throttled dev. Fix: tie policies to exploitability (`EPSS`), reachability (code paths), and environment (is it in the CDE?).
- First-pass SBOMs were noisy. We standardized on SPDX 2.3, pinned `syft` versions, and diffed SBOMs PR-to-PR to focus on deltas.
- GitHub Actions `GITHUB_TOKEN` had broader scopes than needed in a couple of workflows. We moved to explicit `permissions` per job and audited `actions/*` versions. Lock to SHAs when feasible.
- Teams tried to bypass ArgoCD for “hotfixes.” We eliminated kubectl write access to prod except for break-glass, logged, time-bound roles via `aws-iam-authenticator` and SSO.
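The first fix above, gating on exploitability, reachability, and environment rather than raw severity, could look something like this Python sketch. The `VulnContext` type, the `gate` function, the three outcomes, and the 0.1 EPSS threshold are all illustrative assumptions; the only rule taken directly from this engagement is that a critical finding in the cardholder data environment always blocks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VulnContext:
    cve_id: str
    severity: str     # scanner severity, e.g. "CRITICAL", "HIGH", "MEDIUM"
    epss: float       # EPSS probability in [0, 1]
    reachable: bool   # vulnerable code sits on a reachable call path
    in_cde: bool      # workload runs in the cardholder data environment


def gate(vuln, epss_threshold=0.1):
    """Return "block", "ticket", or "allow" for a single finding."""
    # Critical findings in the cardholder data path always block.
    if vuln.severity == "CRITICAL" and vuln.in_cde:
        return "block"
    # Reachable and likely to be exploited: block if severe, otherwise track it.
    if vuln.reachable and vuln.epss >= epss_threshold:
        return "block" if vuln.severity in ("CRITICAL", "HIGH") else "ticket"
    # Severe but not currently exploitable: track it without stopping the build.
    if vuln.severity in ("CRITICAL", "HIGH"):
        return "ticket"
    return "allow"


# Hypothetical example: a medium CVE with no exploit path no longer blocks.
decision = gate(VulnContext("CVE-0000-0002", "MEDIUM", 0.01, False, False))
```

Splitting the outcome into block, ticket, and allow is one way to keep the burn-down visible without turning every medium finding into a failed build.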
Actionable Next Steps You Can Implement This Sprint
- Add SBOM + signing to your CI. Start with `syft` and `cosign`.
- Enforce admission. If `Kyverno` is too heavy right now, start with `policy-controller` verify on signatures.
- Flip GitHub Actions to OIDC. Remove stored cloud creds. Set `permissions` explicitly.
- Default-deny the network. Add one `NetworkPolicy` per namespace this week.
- Block secrets at commit and server-side. Audit your history with `gitleaks detect --source . --log-opts="--all"`.
- Wire `tfsec` and `Conftest` into your Terraform PR checks. Fail on critical.
- Track two KPIs: unsigned deploy blocks/week and build latency added by security. If latency > 2 min p95, optimize before adding more checks.
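The two KPIs are cheap to compute once you export the raw events. A minimal Python sketch, assuming you can pull per-build seconds spent in security steps and the dates of unsigned-image denials from your CI and admission logs; the function names and the nearest-rank percentile method are choices made for illustration, not something the platform prescribes.

```python
from collections import Counter
from datetime import date


def p95(samples):
    """Nearest-rank 95th percentile of a non-empty list of numbers."""
    if not samples:
        raise ValueError("p95 needs at least one sample")
    ordered = sorted(samples)
    rank = (95 * len(ordered) + 99) // 100  # integer ceil(0.95 * n)
    return ordered[rank - 1]


def latency_within_budget(added_seconds, budget_seconds=120):
    """True when the p95 of security-added build time is inside the budget."""
    return p95(added_seconds) <= budget_seconds


def blocks_per_week(block_dates):
    """Count unsigned-deploy denials per ISO (year, week)."""
    return Counter(tuple(d.isocalendar())[:2] for d in block_dates)


# Hypothetical data: 20 builds, one slow outlier; three denials over two weeks.
added = [60] * 19 + [200]
ok = latency_within_budget(added)
weekly = blocks_per_week([date(2024, 1, 2), date(2024, 1, 3), date(2024, 1, 9)])
```

With nearest-rank p95 over only 20 builds, a single outlier does not move the number, so treat small samples with care and look at the maximum as well before declaring the budget safe.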
You don’t need a zero trust keynote. You need signatures, policies, and the spine to enforce them. The rest is press release.
Key takeaways
- Security-first development can ship as fast—or faster—than trust-me pipelines when you automate verification, not paperwork.
- Provenance and policy at admission (not just scanning) is the line between a scary Slack ping and a headline.
- Treat infra and security as code: SBOMs, signatures, policies, and network controls live in Git and are enforced automatically.
- Adopt OIDC, short-lived creds, and secret scanning to starve attackers of the easiest vector: leaked tokens.
- Measure it: track blocked attempts, reduced critical vulns, zero secret commits, unchanged lead time, and MTTR.
Implementation checklist
- Generate and attest SBOMs for every artifact (`syft`, `cosign attest`).
- Sign images and verify at admission (`cosign`, Kyverno/`policy-controller`).
- Harden CI with OIDC and least-privileged `permissions` in GitHub Actions.
- Shift-left with `Semgrep`, `CodeQL`, `Trivy`, `tfsec`, and prevent merges on critical findings.
- Enforce cluster policy: Pod Security Standards, `NetworkPolicy`, `disallow-privileged` via Kyverno.
- Adopt GitOps (ArgoCD) with protected branches and change windows for infra.
- Scan and block secrets at commit time (`gitleaks`/`detect-secrets`) and use External Secrets + KMS.
- Continuously test with chaos/security drills and track SLOs for security pipeline latency.
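For the CI-hardening item, one quick audit is to flag `uses:` references that are not pinned to a full commit SHA. The Python sketch below is a rough line-based scan rather than a real YAML parser, and the function name and sample repository names are hypothetical; it assumes each `uses:` sits on its own line, as in the workflow shown earlier.

```python
import re

# Matches "uses: owner/repo@ref" with an optional leading list dash.
USES_RE = re.compile(r"^\s*-?\s*uses:\s*([^\s#]+)")
# A pinned reference ends in a full 40-character commit SHA.
FULL_SHA_RE = re.compile(r"@[0-9a-f]{40}$")


def unpinned_actions(workflow_text):
    """Return action references that are not pinned to a full commit SHA."""
    unpinned = []
    for line in workflow_text.splitlines():
        match = USES_RE.match(line)
        if not match:
            continue
        ref = match.group(1).strip("'\"")
        # Local actions and docker:// references are out of scope for this check.
        if ref.startswith("./") or ref.startswith("docker://"):
            continue
        if not FULL_SHA_RE.search(ref):
            unpinned.append(ref)
    return unpinned


# Hypothetical workflow fragment: one tag reference, one SHA-pinned reference.
sample = "\n".join([
    "steps:",
    "  - uses: actions/checkout@v4",
    "  - name: pinned",
    "    uses: example-org/example-action@" + "a" * 40,
])
findings = unpinned_actions(sample)
```

Running something like this over `.github/workflows/` in a PR check gives a list to work through, and a tag such as `@v4` shows up in it because the tag can be moved to a different commit later.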
Questions we hear from teams
- Will admission policies and signing slow my team down?
- Not if you design for automation. In this case, we added ~92 seconds p95 to builds and 0 seconds to deploy time. The key is to treat signatures and SBOMs as artifacts produced by CI and verified by the cluster—no manual approvals.
- Do I need to rebuild my entire stack to get these benefits?
- No. Start with SBOM generation and image signing in CI, then add admission verification in one cluster/namespace. Roll out `NetworkPolicy` defaults and pre-commit secret scanning in parallel. You can phase this in over 2–4 sprints.
- What about AI-generated code and dependency drift?
- Shift-left rules (`Semgrep`, `CodeQL`) catch common AI-introduced issues. Pair that with `Renovate`/`Dependabot`, SBOM deltas, and policy that blocks risky updates in sensitive services. Automation lets you accept safe changes quickly and quarantine the rest.
- How does this help with PCI and audits?
- Auditors accepted signed SBOMs, admission logs, and CI logs as evidence for software integrity and change control. It reduced audit prep by ~60% because the “paper trail” is generated automatically on every change.
