Ship Fast, Pass Audit: Turning Policies into Pipeline Guardrails That Don’t Kill Velocity
If auditors still email you CSVs while prod deploys by hand-wavy Slack approvals, you’re one Sev-1 away from a public postmortem. Bake compliance into the pipeline, generate proofs automatically, and keep shipping without drama.
Compliance should be a compiler error, not a calendar event.
The day an auditor ran kubectl in prod
Two summers ago, I watched a Big 4 auditor run kubectl get pods -A in a shared cluster and find an image running :latest with hostPath mounted. You can guess the rest: freeze on changes, retroactive evidence requests, and a seven-week quarter-end death march. I've seen this movie. The fix isn’t bigger binders. It’s turning policies into guardrails, checks, and automated proofs wired into your pipeline.
This is how we do it at GitPlumbers without turning your engineers into compliance clerks.
Translate policy to code: guardrails, checks, proofs
Policies written as prose won’t save you at 2 a.m. You need three things:
- Guardrails (prevent): Admission and PR gates that stop bad changes. Think kyverno policies in the cluster, conftest on Terraform plans, and checkov for IaC.
- Checks (detect): Scans and drift detection that flag gaps fast: trivy for images, kube-bench for CIS, kube-hunter, tfsec/checkov in CI.
- Proofs (attest): Machine-readable evidence tied to a commit, build, and environment; signed and retained. Use oscal mappings, cosign/in-toto attestations, and immutable storage.
Map your controls (SOC 2, HIPAA, PCI DSS, NIST 800-53) to concrete pipeline steps. Example mapping:
- NIST SC-7 (boundary protection) → block public S3 buckets in Terraform plans; deny egress to 0.0.0.0/0 in security groups (a Rego sketch for the egress check follows this list).
- HIPAA 164.312 (encryption) → require efs/rds encryption flags in IaC; verify TLS annotations on services.
- PCI 1.1.6 (change control) → signed attestations per deploy; GitOps-only changes to prod.
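The SC-7 egress half of that mapping is only a few lines of Rego. A sketch, assuming the same terraform show -json plan format used later in this post and inline egress blocks on aws_security_group (standalone aws_security_group_rule resources would need a sibling rule):

package terraform.aws.network

# Sketch: deny security groups whose inline egress rules allow 0.0.0.0/0
deny[msg] {
  rc := input.resource_changes[_]
  rc.type == "aws_security_group"
  egress := rc.change.after.egress[_]
  egress.cidr_blocks[_] == "0.0.0.0/0"
  msg := sprintf("Security group %s allows egress to 0.0.0.0/0", [rc.address])
}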
Wire it into the pipeline without killing speed
I’ve seen teams bolt on scanners everywhere and tank their lead time by 50%. What works:
- Pre-commit (fast feedback): Local pre-commit hooks for tflint, yamllint, kubeconform. Don't block teammates with cloud auth. (A sample config follows this list.)
- Pull request (blockers): Run deterministic IaC policy checks on plans/manifests. Fail the PR on criticals. Keep it under 3–5 minutes.
- Build (supply chain): SBOM (syft), image scan (trivy), sign artifacts (cosign). Attach a pass/fail attestation.
- Deploy (admission): kyverno or OPA Gatekeeper in clusters; deny noncompliant manifests. GitOps with ArgoCD so changes are declarative and diffable.
- Runtime (detect/drift): Continuous scans (Falco, kube-bench), drift detection for cloud (Cloud Custodian, Steampipe), and nightly evidence rollups.
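Here's the sample pre-commit config. It's a sketch using local hooks so nothing needs cloud auth; it assumes tflint, yamllint, and kubeconform are already on the developer's PATH, and that Terraform lives under infrastructure/ as in the workflow below.

# .pre-commit-config.yaml — sketch; assumes the binaries are installed locally
repos:
  - repo: local
    hooks:
      - id: tflint
        name: tflint
        entry: tflint --chdir=infrastructure   # layout assumption: Terraform under infrastructure/
        language: system
        pass_filenames: false
        files: \.tf$
      - id: yamllint
        name: yamllint
        entry: yamllint
        language: system
        types: [yaml]
      - id: kubeconform
        name: kubeconform
        entry: kubeconform -strict
        language: system
        files: ^manifests/.*\.ya?ml$    # hypothetical path for K8s manifests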
Keep regulated workloads (PHI/PCI) on a stricter path: additional checks, pinned base images, tighter admission, slower rollback allowance. Everyone else gets the fast lane.
Concrete examples: Terraform + OPA, Kubernetes + Kyverno
Let’s start with Terraform. Deny public S3 buckets at PR time using conftest and Rego on the Terraform plan.
package terraform.aws.s3

# Deny buckets with public ACLs
deny[msg] {
  some i
  rc := input.resource_changes[i]
  rc.type == "aws_s3_bucket"
  after := rc.change.after
  acl := lower(after.acl)
  # set membership check: the ACL is one of the public values
  {"public-read", "public-read-write"}[acl]
  msg := sprintf("S3 bucket %s has public ACL '%s'", [after.bucket, acl])
}

# Require a public access block resource for each bucket
deny[msg] {
  some i
  rc := input.resource_changes[i]
  rc.type == "aws_s3_bucket"
  bucket_name := rc.change.after.bucket
  not public_block_exists(bucket_name)
  msg := sprintf("S3 bucket %s missing aws_s3_bucket_public_access_block", [bucket_name])
}

public_block_exists(bucket_name) {
  some j
  rb := input.resource_changes[j]
  rb.type == "aws_s3_bucket_public_access_block"
  rb.change.after.bucket == bucket_name
}

A minimal GitHub Actions PR workflow:
name: policy-check
on:
  pull_request:
    branches: [ main ]
jobs:
  terraform-policy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
        with:
          terraform_wrapper: false   # keep `terraform show -json` output clean for redirection
      - name: Terraform init/plan
        run: |
          terraform -chdir=infrastructure init -input=false
          terraform -chdir=infrastructure plan -out=tfplan -input=false -lock=false
          terraform -chdir=infrastructure show -json tfplan > infrastructure/plan.json
      - name: Conftest
        uses: instrumenta/conftest-action@v1
        with:
          files: infrastructure/plan.json
          policy: policy/
      - name: Upload evidence
        if: always()
        run: |
          mkdir -p evidence
          jq -n --arg sha "${{ github.sha }}" --arg run "${{ github.run_id }}" --arg result "${{ job.status }}" \
            '{control:"NIST-SC-7", result:$result, sha:$sha, run:$run, tool:"conftest"}' \
            > evidence/policy.json
          aws s3 cp evidence/policy.json "s3://compliance-evidence/${{ github.run_id }}.json" \
            --sse aws:kms --sse-kms-key-id "$EVIDENCE_KMS_KEY"

For Kubernetes, kyverno admission rules give you fast, explainable guardrails. Example policy to block :latest and require CPU/memory:
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: secure-pod-standards
spec:
  validationFailureAction: enforce
  rules:
    - name: disallow-latest-tag
      match:
        resources:
          # Matching Pod is enough: Kyverno auto-generates equivalent rules
          # for Deployments, StatefulSets, and DaemonSets.
          kinds: [Pod]
      validate:
        message: "Image tag ':latest' is not allowed."
        pattern:
          spec:
            containers:
              - image: "!*:latest"
    - name: require-limits-requests
      match:
        resources:
          kinds: [Pod]
      validate:
        message: "CPU and memory requests/limits are required."
        pattern:
          spec:
            containers:
              - resources:
                  limits:
                    memory: "?*"
                    cpu: "?*"
                  requests:
                    memory: "?*"
                    cpu: "?*"

With ArgoCD, add a policy step in the sync pipeline or rely on admission; either way, your Git history is the source of truth.
Make proofs automatic: signed, mapped, and retained
Auditors don’t want screenshots; they want consistency. You want something you can regenerate without tears. The loop:
- Produce machine-readable results per run: JSON from conftest, checkov, trivy SARIF; JUnit if that's your test harness.
- Map each result to a control. Keep a small oscal mapping repo so you can trace from control → policy → check.
- Sign the evidence for integrity: cosign/in-toto attestations bound to the image digest or commit SHA.
- Store in an immutable bucket with lifecycle rules and KMS. Auditors get read-only presigned URLs (see the CLI snippet after this list).
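Two hedged AWS CLI examples for that storage step — the bucket and key names are placeholders, and the bucket must have been created with Object Lock enabled:

# Default retention so evidence objects are write-once for ~13 months
aws s3api put-object-lock-configuration \
  --bucket compliance-evidence \
  --object-lock-configuration '{"ObjectLockEnabled":"Enabled","Rule":{"DefaultRetention":{"Mode":"COMPLIANCE","Days":400}}}'

# Read-only, expiring link for an auditor
aws s3 presign s3://compliance-evidence/12345.json --expires-in 3600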
Tiny OSCAL-ish mapping snippet:
# repo: policy/mappings/oscal.yaml
controls:
  - id: NIST-SC-7
    title: Boundary Protection
    implemented-by:
      - policy: terraform.aws.s3.deny
        tool: conftest
        artifact: s3://${EVIDENCE_BUCKET}/${RUN_ID}.json
  - id: PCI-1.1.6
    implemented-by:
      - policy: kyverno.secure-pod-standards
        tool: kyverno
        artifact: s3://${EVIDENCE_BUCKET}/${CLUSTER}/${DATE}.json

Attest a container image after scans pass:
# After build and scan succeed
cosign attest \
  --predicate evidence/policy.json \
  --type https://gitplumbers.dev/policy/v1 \
  --key env://COSIGN_PRIVATE_KEY \
  "$IMAGE_DIGEST"

Now your deploy step gates on “image has a passing policy attestation.” No attestation, no deploy.
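On the cluster side, that gate can be a Kyverno verifyImages rule. A sketch — the registry prefix is a placeholder, field names shift slightly across Kyverno versions, and the predicate type matches the cosign attest command above:

apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-policy-attestation
spec:
  validationFailureAction: enforce
  rules:
    - name: check-policy-attestation
      match:
        resources:
          kinds: [Pod]
      verifyImages:
        - imageReferences:
            - "registry.example.com/*"    # placeholder registry prefix
          attestations:
            - predicateType: https://gitplumbers.dev/policy/v1
              attestors:
                - entries:
                    - keys:
                        publicKeys: |-
                          -----BEGIN PUBLIC KEY-----
                          ...cosign public key...
                          -----END PUBLIC KEY-----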
Balance regulated data constraints with delivery speed
This is where I’ve seen teams crash. They try to force PCI/HIPAA controls everywhere. Instead:
- Segment environments and repos. apps-regulated/* go through stricter workflows; apps-unregulated/* move faster. Different admission policies, base images, and runtime monitors.
- Version policies. Policy v1.4 applies to all new services as of date X. Existing services get a sunset period with a remediation backlog.
- Exceptions-as-code. PR a waiver with owner, risk, expiry, and link to a ticket. Store next to the service, evaluated by policy.
Example exception file consumed by Rego:
# exceptions/waivers.yaml
waivers:
  - id: WVR-123
    policy: terraform.aws.s3.deny
    resource: aws_s3_bucket.my_legacy_export
    owner: data-platform
    expires: "2025-01-31T00:00:00Z"   # RFC 3339 so the policy can parse it
    risk: "Legacy vendor integration requires public ACL; fronted by signed URLs."
package terraform.aws.s3

default waived = false

waived {
  w := input.waivers[_]
  w.policy == "terraform.aws.s3.deny"
  time.now_ns() < time.parse_rfc3339_ns(w.expires)
}

# Example use in deny rules
deny[msg] {
  not waived
  # ... your deny conditions ...
}

- Data-aware pipelines. Use labels/annotations like data.gitplumbers.dev/classification=phi to route workloads to the regulated path. Admission denies missing labels.
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-data-classification
spec:
  validationFailureAction: enforce
  rules:
    - name: require-classification-label
      match:
        resources:
          kinds: [Deployment]
      validate:
        message: "Workloads must declare data classification."
        pattern:
          metadata:
            labels:
              data.gitplumbers.dev/classification: "?*"

What to measure and how to tune
You’ll only keep speed if you instrument the system and prune false positives ruthlessly.
- Policy pass rate per repo/service (target >95% after 2 sprints). Alert on regressions.
- Median time-to-remediate policy failures (keep <24h for non-prod, <72h for prod parity issues).
- Exception debt: active waivers count and average age. Trend down.
- False positive rate: ratio of dismissed findings to total. Anything >10% demands rule tuning.
- Drift incidents: prod resources that bypass Git (should be near zero with GitOps).
Dashboards: Prometheus counters from CI runs, Grafana for trends, and weekly triage with eng + security. We’ve cut “policy-induced developer time” by 30–40% just by fixing chatty rules.
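One lightweight way to get those counters out of CI and into Prometheus is the Pushgateway. A sketch — the Pushgateway host and the repo label are placeholders:

# At the end of the CI policy job
cat <<EOF | curl --data-binary @- http://pushgateway.internal:9091/metrics/job/policy_check/repo/payments-api
# TYPE policy_check_failures_total counter
policy_check_failures_total 2
# TYPE policy_check_duration_seconds gauge
policy_check_duration_seconds 138
EOF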
What this looks like in the wild (numbers that matter)
On a recent GitPlumbers engagement for a fintech handling PCI and PII:
- PR policy checks averaged 2m18s, blocking on criticals only.
- Deployment lead time stayed within historical variance (p50 +6%).
- Evidence generation and signing added 14 seconds per build.
- Audit request cycle shrank from “please export everything” to links with signed JSON, reducing prep from 3 weeks to 3 days.
- First-quarter false positives dropped from 22% to 6% after two tuning passes.
No heroics. Just boring, deterministic automation.
Start small, then ratchet
If you’re starting from zero, here’s the crawl-walk-run that’s actually worked for us:
- Crawl: Add checkov on Terraform and trivy on images in PRs. Fail on criticals. Store JSON to S3. (A trivy step sketch follows this list.)
- Walk: Introduce conftest with 3–5 high-impact Rego rules. Add Kyverno for :latest and resource limits. Sign evidence with cosign.
- Run: GitOps-only deploys with ArgoCD; admission gated by policies; exceptions-as-code with expiry; dashboards and weekly tune-ups.
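The trivy half of the crawl stage is one step in the PR workflow from earlier. A sketch using aquasecurity/trivy-action; the image reference is a placeholder, and exit-code plus severity keep it blocking on criticals only:

# Add to the PR workflow's steps (pin the action to a release tag in real use)
- name: Scan image for criticals
  uses: aquasecurity/trivy-action@master
  with:
    image-ref: registry.example.com/payments-api:${{ github.sha }}   # placeholder image
    format: json
    output: evidence/trivy.json   # store next to the other evidence
    exit-code: '1'                # fail the job on findings...
    severity: CRITICAL            # ...but only criticals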
Don’t try to boil the ocean in sprint one. Pick the two controls most likely to land you on the front page (public buckets and plaintext secrets), automate them, and build from there.
Key takeaways
- Translate policies into code: guardrails (prevent), checks (detect), and proofs (attest).
- Put policy gates where they hurt least and help most: PR, build, deploy, and runtime.
- Automate evidence: store machine-readable results mapped to controls (OSCAL) and sign them.
- Separate regulated and unregulated paths, version policies, and time-box exceptions.
- Measure policy pass rate, time-to-remediate, and false positive rate; tune weekly.
Implementation checklist
- Stand up OPA/Kyverno policies for infra and K8s resources.
- Add Terraform/Helm scans to PR checks and block on criticals.
- Generate and store signed JSON evidence per run, mapped to controls (OSCAL).
- Gate deployments with ArgoCD/Gatekeeper or Kyverno admission policies.
- Implement exception-as-code with expiry, owner, and linked risk ticket.
- Segment regulated workloads with stricter pipelines and image baselines.
- Dashboards: policy pass rate, median remediation time, false-positive rate, drift incidents.
Questions we hear from teams
- We already run scanners. Why add policy-as-code?
- Scanners tell you what’s wrong. Policy-as-code decides what’s allowed. The former produces lists; the latter creates gates tied to your business rules and generates signed evidence per change. That’s the difference between a noisy report and a reliable audit trail.
- Will this slow down our deploys?
- Done right, PR checks add 2–5 minutes and deploy gates add milliseconds (admission) to seconds (attestation verify). We keep builds parallelized, run deterministic checks on static artifacts, and fail fast only on criticals. Our clients’ lead times typically stay within ±10%.
- Do we need OPA/Rego if we already use Kyverno?
- Kyverno is great for Kubernetes resource policies. You’ll still want OPA/Rego or tools like Checkov for Terraform/CloudFormation, and sometimes Gatekeeper for cross-resource constraints. Many teams run both: Kyverno for K8s admission + Conftest/Checkov for IaC.
- How do we handle legacy workloads that can’t comply yet?
- Use exceptions-as-code with expiry, risk owner, and a remediation plan. Segment those workloads into a stricter enclave, add compensating controls (e.g., WAF, egress blocks), and track exception debt. The pipeline should surface and time-box the debt, not hide it.
Ready to modernize your codebase?
Let GitPlumbers help you transform AI-generated chaos into clean, scalable applications.
