The Audit That Stopped Our Releases: Codifying Least‑Privilege, Rotation, and Dependency Risk as Code
Translate policy into guardrails, checks, and automated proofs your auditors actually accept, without grinding delivery to a halt.
If it isn’t codified, enforced, and leaving a cryptographically verifiable trail, it’s not a control; it’s a suggestion.
I’ve watched a fintech freeze all deploys for two weeks because an auditor asked, "Show me where you enforce least‑privilege and how you prove secrets are rotated." The team had a wiki page and a few Terraform modules. Not enough. We turned policy into code, wired it into CI, and shipped automated proofs with every release. They passed the audit, reduced mean time to remediate critical IAM issues from days to hours, and didn’t tank velocity.
Translate policy into guardrails, checks, and proofs
- Guardrails: Pre‑approved modules, policy boundaries, and admission controls that make the secure path the path of least resistance.
- Checks: OPA/Conftest on terraform plan, Kyverno/Gatekeeper on K8s manifests, license and CVE scanners in CI.
- Automated proofs: Sigstore signatures, in‑toto/SLSA attestations, OPA decision logs, and evidence stored in WORM object storage.
The pattern is simple:
- Express the rule once as code.
- Run it where drift happens (pre‑merge, at admission, on deploy).
- Emit artifacts that a human (or auditor) can verify without trusting your word.
Least‑privilege as code (AWS + K8s)
You can’t lint your way out of wildcard IAM or cluster‑admin in prod. Enforce at the boundary and test it before merge.
Pre‑merge checks with OPA on Terraform plans
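# policy/iam.rego
package terraform.iam

import rego.v1

# An Action is a wildcard when it is "*" or a list containing "*".
is_wildcard(action) if action == "*"

is_wildcard(action) if action[_] == "*"

# aws_iam_policy carries its document as a JSON string in the plan.
deny contains msg if {
    some rc in input.resource_changes
    rc.type == "aws_iam_policy"
    doc := json.unmarshal(rc.change.after.policy)
    some stmt in doc.Statement
    stmt.Effect == "Allow"
    is_wildcard(stmt.Action)
    msg := sprintf("IAM policy %s grants wildcard action", [rc.address])
}

# Roles must set a boundary; a value only known after apply counts as set.
deny contains msg if {
    some rc in input.resource_changes
    rc.type == "aws_iam_role"
    is_object(rc.change.after) # skip roles that are being deleted
    object.get(rc.change.after, "permissions_boundary", null) == null
    not rc.change.after_unknown.permissions_boundary
    msg := sprintf("Role %s missing permissions_boundary", [rc.address])
}

terraform init
terraform plan -out tf.plan
# Conftest only evaluates the "main" package unless told otherwise
terraform show -json tf.plan | conftest test -p policy --all-namespaces -

Enforce IAM permissions boundaries in Terraform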
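# A permissions boundary is a ceiling: the role can use only what both the
# boundary and its own policies allow. A boundary with no Allow permits nothing.
resource "aws_iam_policy" "boundary" {
  name = "app-permissions-boundary"
  policy = jsonencode({
    Version = "2012-10-17",
    Statement = [
      {
        # Illustrative ceiling; replace with the services your apps need.
        Effect   = "Allow",
        Action   = ["s3:*", "sqs:*", "logs:*"],
        Resource = "*"
      },
      {
        Effect   = "Deny",
        Action   = ["iam:*", "kms:ScheduleKeyDeletion"],
        Resource = "*"
      }
    ]
  })
}

resource "aws_iam_role" "app" {
  name                 = "svc-foo"
  assume_role_policy   = data.aws_iam_policy_document.assume.json # defined elsewhere
  permissions_boundary = aws_iam_policy.boundary.arn
}

Admission‑time enforcement in Kubernetes with Kyverno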
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: restrict-privileged
spec:
  validationFailureAction: Enforce
  rules:
    - name: no-privileged
      match:
        any:
          - resources:
              kinds:
                - Pod
      validate:
        message: 'Privileged pods are not allowed'
        pattern:
          spec:
            # =() anchors check a field only when it is present, so pods
            # that omit securityContext are not rejected by accident.
            =(initContainers):
              - =(securityContext):
                  =(privileged): "false"
            containers:
              - =(securityContext):
                  =(privileged): "false"

Between Conftest and Kyverno, the bad patterns never hit prod. And yes, we still run AWS SCPs to keep the blast radius small for stray accounts.
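The plan-time rules operate on the JSON that `terraform show -json` emits, and it helps to see that shape once. Here is a minimal Python sketch of the wildcard check, assuming the plan exposes `resource_changes[].change.after` and that `aws_iam_policy` stores its document as a JSON string under `policy`. The function name and the sample plan fragment are made up for illustration; this is a way to reason about fixtures, not a replacement for the policy engine.

```python
import json


def wildcard_policies(plan: dict) -> list[str]:
    """Return addresses of aws_iam_policy resources that allow Action "*"."""
    flagged = []
    for rc in plan.get("resource_changes", []):
        if rc.get("type") != "aws_iam_policy":
            continue
        after = (rc.get("change") or {}).get("after") or {}
        raw = after.get("policy")
        if not isinstance(raw, str):
            continue  # unknown until apply, or the resource is being deleted
        statements = json.loads(raw).get("Statement", [])
        if isinstance(statements, dict):
            statements = [statements]  # IAM also accepts a single statement object
        for stmt in statements:
            actions = stmt.get("Action", [])
            if isinstance(actions, str):
                actions = [actions]
            if stmt.get("Effect") == "Allow" and "*" in actions:
                flagged.append(rc["address"])
                break
    return flagged


def _policy(action):
    """Build a one-statement policy document as a JSON string (test helper)."""
    return json.dumps({
        "Version": "2012-10-17",
        "Statement": [{"Effect": "Allow", "Action": action, "Resource": "*"}],
    })


# Hypothetical plan fragment for illustration.
plan = {
    "resource_changes": [
        {"address": "aws_iam_policy.too_broad", "type": "aws_iam_policy",
         "change": {"after": {"policy": _policy("*")}}},
        {"address": "aws_iam_policy.scoped", "type": "aws_iam_policy",
         "change": {"after": {"policy": _policy(["s3:GetObject"])}}},
    ]
}
flagged = wildcard_policies(plan)
print(flagged)
```

Keeping a handful of plan fragments like this next to the policy makes it cheap to check that a rule change still catches what it is supposed to catch.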
Secret rotation that actually rotates
If your "rotation" is a calendar reminder, you’re one pager away from a breach report. Prefer dynamic credentials; fall back to scheduled rotation when the target doesn’t support them.
Dynamic DB creds with Vault + External Secrets Operator
# Configure a DB connection in Vault
vault write database/config/prod-database \
    plugin_name=postgresql-database-plugin \
    connection_url='postgresql://{{username}}:{{password}}@db.prod:5432/postgres?sslmode=verify-full' \
    allowed_roles='app-readwrite' \
    username='vault' password='REDACTED'

# Define a role that issues short-lived creds
# (double-quoted so the SQL keeps its single quotes around the password and expiry)
vault write database/roles/app-readwrite \
    db_name=prod-database \
    creation_statements="CREATE ROLE \"{{name}}\" WITH LOGIN PASSWORD '{{password}}' VALID UNTIL '{{expiration}}'; GRANT SELECT,INSERT,UPDATE,DELETE ON ALL TABLES IN SCHEMA public TO \"{{name}}\";" \
    default_ttl=1h max_ttl=4h

# Kubernetes pulls dynamic creds on a 15m refresh schedule
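# Caveat: every read of database/creds/... mints a new lease. If your ESO
# version fetches the two properties separately, the username and password may
# not match; ESO's VaultDynamicSecret generator is built for dynamic engines.
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: app-db-creds
spec:
  refreshInterval: 15m
  secretStoreRef:
    name: vault
    kind: ClusterSecretStore
  target:
    name: app-db-creds
    creationPolicy: Owner
  data:
    - secretKey: username
      remoteRef:
        key: database/creds/app-readwrite
        property: username
    - secretKey: password
      remoteRef:
        key: database/creds/app-readwrite
        property: password

AWS Secrets Manager with rotation Lambda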
resource "aws_secretsmanager_secret" "db" {
  name = "prod/app/db"
}

resource "aws_secretsmanager_secret_rotation" "db" {
  secret_id           = aws_secretsmanager_secret.db.id
  rotation_lambda_arn = aws_lambda_function.rotate_db.arn

  rotation_rules {
    automatically_after_days = 7
  }
}

Track rotation as an SLO: e.g., 99% of secrets have age < 30 days. Emit a daily report from Vault/Secrets Manager and fail the build if the app depends on a secret older than your SLO.
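The SLO check itself is small. Here is a rough Python sketch, assuming you have already exported a mapping of secret name to last-rotation timestamp from Vault or Secrets Manager. The export step is not shown, and the function name, secret names, and dates are illustrative.

```python
from datetime import datetime, timedelta, timezone


def rotation_report(last_rotated, max_age, now):
    """Return (fraction of secrets younger than max_age, sorted stale names)."""
    stale = sorted(n for n, ts in last_rotated.items() if now - ts >= max_age)
    total = len(last_rotated)
    compliant = 1.0 if total == 0 else (total - len(stale)) / total
    return compliant, stale


# Hypothetical inventory; in practice this comes from your secrets backend.
now = datetime(2024, 6, 1, tzinfo=timezone.utc)
last_rotated = {
    "prod/app/db": now - timedelta(days=3),
    "prod/app/api-key": now - timedelta(days=45),
}

compliant, stale = rotation_report(last_rotated, timedelta(days=30), now)

# Secrets this build depends on (assumed to be declared somewhere in the repo).
app_secrets = {"prod/app/db"}
build_fails = bool(app_secrets & set(stale))

print(f"{compliant:.0%} compliant; stale: {stale}; fail build: {build_fails}")
```

Splitting the fleet-wide ratio from the per-app gate keeps the two concerns apart: the ratio feeds the SLO dashboard, while the gate only blocks teams that actually depend on a stale secret.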
Dependency and supply chain risk as code
Your SBOM shouldn’t be a one‑off. Bake it into CI, sign artifacts, attach attestations, and block merges on criticals.
CI example with OSV‑Scanner, SBOM, and Cosign (GitHub Actions)
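name: supply-chain
on:
  pull_request:
    paths:
      - '**/package.json'
      - '**/requirements.txt'
      - '**/go.mod'
jobs:
  osv:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-go@v5
        with:
          go-version: '1.22'
      - name: OSV scan
        # Check the action's README for its current entry point and inputs
        uses: google/osv-scanner-action@v1
        with:
          path: .
  sbom:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Generate SBOM
        uses: anchore/sbom-action@v0
        with:
          path: .
          format: cyclonedx-json
          output-file: sbom.cdx.json
      - uses: sigstore/cosign-installer@v3
      - name: Sign image and attest SBOM
        # Assumes the image is already pushed, the job is logged in to the
        # registry, and the key pair lives in repo secrets (placeholder names).
        env:
          IMAGE: ghcr.io/acme/api:${{ github.sha }}
          COSIGN_PRIVATE_KEY: ${{ secrets.COSIGN_PRIVATE_KEY }}
          COSIGN_PASSWORD: ${{ secrets.COSIGN_PASSWORD }}
        run: |
          cosign sign --yes --key env://COSIGN_PRIVATE_KEY "$IMAGE"
          cosign attest --yes --key env://COSIGN_PRIVATE_KEY --predicate sbom.cdx.json --type cyclonedx "$IMAGE"

Layer on policy‑as‑code to block known‑bad: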
- Reject images without a valid cosign signature from your key.
- Reject dependencies with critical CVEs unless an approved waiver exists.
- Enforce license policy (e.g., no AGPL in backend services) using license_finder or pip-licenses in CI.
If you’re further along, add SLSA provenance (slsa-framework/slsa-github-generator) and in‑toto attestations.
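The "critical CVEs unless an approved waiver exists" rule is easy to get subtly wrong, usually by forgetting that waivers expire. A minimal Python sketch of the gate follows, assuming findings and waivers have been normalized into simple records. The record shape, IDs, and function name are invented for illustration and are not any scanner's native output format.

```python
from datetime import date


def blocking_findings(findings, waivers, today):
    """Return critical findings not covered by an unexpired waiver."""
    active = {
        w["id"] for w in waivers
        if date.fromisoformat(w["expires"]) >= today
    }
    return [
        f for f in findings
        if f["severity"] == "CRITICAL" and f["id"] not in active
    ]


# Hypothetical records; real scanner output would need mapping into this shape.
findings = [
    {"id": "EXAMPLE-0001", "package": "libfoo", "severity": "CRITICAL"},
    {"id": "EXAMPLE-0002", "package": "libbar", "severity": "CRITICAL"},
    {"id": "EXAMPLE-0003", "package": "libbaz", "severity": "LOW"},
]
waivers = [
    {"id": "EXAMPLE-0001", "expires": "2024-09-30", "approved_by": "security"},
    {"id": "EXAMPLE-0002", "expires": "2024-01-31", "approved_by": "security"},
]

blocked = blocking_findings(findings, waivers, today=date(2024, 6, 1))
for f in blocked:
    print(f"BLOCK {f['id']} in {f['package']}")
```

Here the first critical is waived, the second is blocked because its waiver lapsed, and the low-severity finding never reaches the gate. Keeping the waiver list in the repo means every exception arrives through a reviewed commit.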
Automated proofs auditors accept
Don’t email screenshots. Produce machine‑readable evidence linked to change IDs and store it in append‑only buckets.
- OPA decision logs: Enable and upload to your artifact store.
- Cosign signatures and attestations: Verifiable supply chain evidence.
- SBOMs and scan reports: Attach to releases.
- Rotation reports: Export secret age histograms; file them with the change.
# Example: publish OPA and scan artifacts to an Object-Locked S3 bucket
mkdir -p artifacts && cp opa.json osv.json sbom.cdx.json artifacts/
for f in opa.json osv.json sbom.cdx.json; do
  aws s3api put-object --bucket audit-artifacts --key "builds/$GITHUB_RUN_ID/$f" --body "artifacts/$f" \
    --object-lock-mode COMPLIANCE --object-lock-retain-until-date 2030-01-01T00:00:00Z
done

Pro tip: create a bot comment or a Jira link in the PR that points at these immutable artifacts. Your auditors will stop asking for screenshots when they can verify signatures and timestamps.
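One way to tie artifacts to a change ID is a small manifest of SHA-256 digests stored alongside them. The Python sketch below assumes the artifacts sit in a local directory before upload. The manifest layout, function name, and the change ID `CHG-1234` are illustrative, not a standard format.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def build_manifest(artifact_dir: Path, change_id: str) -> dict:
    """Map each artifact file name to its SHA-256 digest, keyed by change ID."""
    digests = {}
    for path in sorted(artifact_dir.iterdir()):
        if path.is_file():
            digests[path.name] = hashlib.sha256(path.read_bytes()).hexdigest()
    return {"change_id": change_id, "sha256": digests}


# Demo with throwaway files standing in for real evidence.
with tempfile.TemporaryDirectory() as tmp:
    artifacts = Path(tmp)
    (artifacts / "opa.json").write_bytes(b'{"result": "allow"}')
    (artifacts / "sbom.cdx.json").write_bytes(b'{"bomFormat": "CycloneDX"}')
    manifest = build_manifest(artifacts, change_id="CHG-1234")

print(json.dumps(manifest, indent=2))
```

Upload the manifest under the same retention settings as the artifacts. An auditor can then recompute the digests and confirm that nothing was swapped after the fact.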
Balance regulated‑data constraints with delivery speed
Security leaders want field‑level masking; engineers want deploys under 15 minutes. You can have both by moving decisions left and making the happy path fast.
- Golden paths: Pre‑approved Terraform modules (VPC, RDS, KMS, IAM role) and K8s base charts with sane defaults. Publish them in an internal registry with version pins.
- Admission guardrails, not review marathons: Let Kyverno/Gatekeeper auto‑reject bad manifests. Human reviews focus on design, not regex policing.
- Data minimization by default: Use tokenization/synthetic data in dev. For analytics, enforce row/column‑level security in BigQuery or masking policies in Snowflake as code.
- Progressive delivery: Couple these controls with Argo Rollouts canaries and feature flags (LaunchDarkly/Unleash) so risk stays low while velocity stays high.
What to measure and how to roll it out
You’ll only keep this discipline if you can show it improves outcomes.
Operational KPIs
- Change failure rate, MTTR, lead time (DORA)
- % of workloads failing policy in pre‑merge vs at admission (aim for regressions caught pre‑merge)
- Secret age distribution; rotation SLO compliance
- CVE remediation latency (median and P95)
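For the latency KPI, median and P95 tell different stories, so it is worth computing both from the same data. A small Python sketch follows, assuming you can export hours-to-fix per remediated CVE from your tracker. The sample numbers are made up, and the percentile uses the standard library's inclusive interpolation, which is one of several reasonable definitions.

```python
import statistics


def latency_summary(hours):
    """Return the median and 95th percentile of remediation latencies."""
    median = statistics.median(hours)
    # quantiles(n=100) yields 99 cut points; index 94 is the 95th percentile.
    p95 = statistics.quantiles(hours, n=100, method="inclusive")[94]
    return {"median": median, "p95": p95}


# Hypothetical hours from detection to merged fix.
remediation_hours = [4, 12, 20, 30, 36, 40, 44, 48, 96, 240]
summary = latency_summary(remediation_hours)
print(f"median={summary['median']}h p95={summary['p95']:.1f}h")
```

In this sample the median sits under 48 hours while the P95 is several days. That long tail is exactly what a median-only report hides.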
Rollout approach we’ve seen work
- Inventory controls and owners. Map each policy to a repo, CI job, and artifact.
- Write policies that fail on egregious issues first (wildcards, cluster‑admin, no boundary).
- Wire policies to run on plans and manifests in CI. Start in warn mode; publish a weekly scoreboard.
- Add auto‑remediation where possible (e.g., inject boundaries and mandatory tags via Terraform modules).
- Flip to enforce per team once their failure rate drops below an agreed threshold.
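The warn-to-enforce flip can be driven from the same numbers that feed the weekly scoreboard. A minimal Python sketch, assuming you count policy evaluations and failures per team over a trailing window. Team names, counts, and the 5% threshold are placeholders to agree on with each team.

```python
def enforcement_mode(failures: int, evaluations: int, threshold: float) -> str:
    """Return "enforce" once the failure rate is below threshold, else "warn"."""
    if evaluations == 0:
        return "warn"  # no data yet: stay in warn mode
    return "enforce" if failures / evaluations < threshold else "warn"


# Hypothetical trailing-window counts: team -> (failures, evaluations).
scoreboard = {
    "payments": (2, 200),
    "search": (30, 150),
    "new-team": (0, 0),
}

modes = {
    team: enforcement_mode(failed, total, threshold=0.05)
    for team, (failed, total) in scoreboard.items()
}
for team, mode in modes.items():
    print(f"{team}: {mode}")
```

Defaulting to warn when there is no data avoids flipping a team to enforce before any of its changes have been through the policy.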
In one regulated client, this cut critical IAM misconfigurations by 83% in 60 days, took secret rotation SLO from 180 days to 7, and reduced CVE patch latency from 10 days median to under 48 hours without increasing lead time.
Key takeaways
- Policies don’t matter unless they’re expressed as code that blocks bad changes and proves good ones.
- Enforce least‑privilege at the boundary: IAM permissions boundaries, SCPs, and cluster admission policies, with the same rules checked pre‑merge.
- Rotate secrets by default using dynamic creds or scheduled rotation, and verify rotation with evidence artifacts.
- Treat supply chain risk like any other test: SBOM, vulnerability scan, signature, and attestation in CI.
- Store audit evidence in append‑only storage and wire it to change tickets automatically.
- Balance speed with guardrails via golden paths, pre‑approved modules, and progressive enforcement.
Implementation checklist
- Inventory controls and map them to code owners, repos, and CI jobs.
- Write OPA/Kyverno policies that fail on known‑bad patterns; run them on plans and manifests pre‑merge.
- Implement permissions boundaries for IAM roles; deny wildcards and require tags/owners.
- Move to dynamic secrets (Vault) or scheduled rotation (Secrets Manager) and track rotation SLOs.
- Add SBOM + OSV scan + signature + attestation to every build.
- Publish decision logs and artifacts to an Object‑Locked bucket; link them to change records.
- Roll out in warn mode, measure drift, then flip to enforce with auto‑remediation.
Questions we hear from teams
- We’re not on AWS/K8s. Does this still apply?
- Yes. The pattern is portable: policy-as-code (OPA), plan-time checks (Pulumi/Terraform JSON, ARM/Bicep), admission controls (OpenShift/Gatekeeper), dynamic secrets (Vault), and signed artifacts (Cosign) exist across clouds and platforms.
- Won’t this slow down delivery?
- Done right, it speeds you up. Fail fast in CI, pre-approved modules, and auto-remediation prevent slow human review cycles. Our clients typically hold or improve lead time while cutting security toil.
- We’re air-gapped. Can we still use Sigstore?
- Yes. Run a private Fulcio/Rekor or use key-based cosign signatures and store attestations in your internal registry or WORM object storage. The point is verifiable evidence, not calling public services.
- How do we handle exceptions?
- Codify them, too. Use signed waivers with expiration, referenced in policy (OPA can check a waiver list). Exceptions become traceable, time-bound, and auditable.
- What’s the first control to implement?
- Least-privilege at plan-time with OPA/Conftest. It catches high-severity issues early and establishes the pattern you’ll reuse for rotation and supply chain.
