Ship Fast, Pass Audit: Turning Policies into Pipeline Guardrails That Don’t Kill Velocity

If auditors still email you CSVs while prod deploys by hand-wavy Slack approvals, you’re one Sev-1 away from a public postmortem. Bake compliance into the pipeline, generate proofs automatically, and keep shipping without drama.

Compliance should be a compiler error, not a calendar event.

The day an auditor ran kubectl in prod

Two summers ago, I watched a Big 4 auditor run kubectl get pods -A in a shared cluster and find an image running :latest with hostPath mounted. You can guess the rest: a change freeze, retroactive evidence requests, and a seven-week quarter-end death march. I've seen this movie. The fix isn’t bigger binders. It’s turning policies into guardrails, checks, and automated proofs wired into your pipeline.

This is how we do it at GitPlumbers without turning your engineers into compliance clerks.

Translate policy to code: guardrails, checks, proofs

Policies written as prose won’t save you at 2 a.m. You need three things:

  • Guardrails (prevent): Admission and PR gates that stop bad changes. Think kyverno policies in the cluster, conftest on Terraform plans, and checkov for IaC.
  • Checks (detect): Scans and drift detection that flag gaps fast: trivy for images, kube-bench for CIS, kube-hunter, tfsec/checkov in CI.
  • Proofs (attest): Machine-readable evidence tied to a commit, build, and environment; signed and retained. Use oscal mappings, cosign/in-toto attestations, and immutable storage.

Map your controls (SOC 2, HIPAA, PCI DSS, NIST 800-53) to concrete pipeline steps. Example mapping:

  • NIST SC-7 (boundary protection) → block public S3 buckets in Terraform plans; deny egress to 0.0.0.0/0 in security groups.
  • HIPAA 164.312 (encryption) → require efs/rds encryption flags in IaC; verify TLS annotations on services.
  • PCI DSS 6.4 (change control) → signed attestations per deploy; GitOps-only changes to prod.

Wire it into the pipeline without killing speed

I’ve seen teams bolt on scanners everywhere and tank their lead time by 50%. What works:

  1. Pre-commit (fast feedback): Local pre-commit hooks for tflint, yamllint, kubeconform (config sketch after this list). Don’t require cloud credentials in local hooks.
  2. Pull request (blockers): Run deterministic IaC policy checks on plans/manifests. Fail the PR on criticals. Keep it under 3–5 minutes.
  3. Build (supply chain): SBOM (syft), image scan (trivy), sign artifacts (cosign). Attach a pass/fail attestation.
  4. Deploy (admission): kyverno or OPA Gatekeeper in clusters; deny noncompliant manifests. GitOps with ArgoCD so changes are declarative and diffable.
  5. Runtime (detect/drift): Continuous scans (Falco, kube-bench), drift detection for cloud (Cloud Custodian, Steampipe), and nightly evidence rollups.
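
A minimal .pre-commit-config.yaml sketch for step 1. The repo revs are placeholders to pin to tags you’ve vetted, and the kubeconform hook assumes the binary is already on developers’ machines:

# .pre-commit-config.yaml -- fast local feedback, no cloud credentials required
repos:
  - repo: https://github.com/antonbabenko/pre-commit-terraform
    rev: v1.92.0              # placeholder; pin to a vetted tag
    hooks:
      - id: terraform_fmt
      - id: terraform_tflint
  - repo: https://github.com/adrienverge/yamllint
    rev: v1.35.1              # placeholder; pin to a vetted tag
    hooks:
      - id: yamllint
  - repo: local
    hooks:
      - id: kubeconform
        name: kubeconform
        entry: kubeconform -strict -summary
        language: system
        files: ^(k8s|manifests)/.*\.ya?ml$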

Keep regulated workloads (PHI/PCI) on a stricter path: additional checks, pinned base images, tighter admission, slower rollback allowance. Everyone else gets the fast lane.

Concrete examples: Terraform + OPA, Kubernetes + Kyverno

Let’s start with Terraform. Deny public S3 buckets at PR time using conftest and Rego on the Terraform plan.

package terraform.aws.s3

public_acls := {"public-read", "public-read-write"}

# Deny buckets with public ACLs
deny[msg] {
  some i
  rc := input.resource_changes[i]
  rc.type == "aws_s3_bucket"
  after := rc.change.after
  acl := lower(after.acl)
  public_acls[acl]
  msg := sprintf("S3 bucket %s has public ACL '%s'", [after.bucket, acl])
}

# Require public access block resource for each bucket
deny[msg] {
  some i
  rc := input.resource_changes[i]
  rc.type == "aws_s3_bucket"
  bucket_name := rc.change.after.bucket
  not public_block_exists(bucket_name)
  msg := sprintf("S3 bucket %s missing aws_s3_bucket_public_access_block", [bucket_name])
}

public_block_exists(bucket_name) {
  some j
  rb := input.resource_changes[j]
  rb.type == "aws_s3_bucket_public_access_block"
  rb.change.after.bucket == bucket_name
}
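
You can run the same check locally before it ever hits CI; a quick sketch assuming the Rego above lives in policy/:

# Render the plan to JSON and test it against the policies above
terraform -chdir=infrastructure plan -out=tfplan -input=false
terraform -chdir=infrastructure show -json tfplan > plan.json
conftest test --policy policy/ plan.json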

A minimal GitHub Actions PR workflow:

name: policy-check
on:
  pull_request:
    branches: [ main ]

jobs:
  terraform-policy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
      - name: Terraform init/plan
        run: |
          terraform -chdir=infrastructure init -input=false
          terraform -chdir=infrastructure plan -out=tfplan -input=false -lock=false
          terraform -chdir=infrastructure show -json tfplan > infrastructure/plan.json
      - name: Conftest
        run: |
          docker run --rm -v "$PWD:/project" -w /project \
            openpolicyagent/conftest test --policy policy/ infrastructure/plan.json
      - name: Upload evidence
        if: always()
        run: |
          mkdir -p evidence
          jq -n --arg sha "${{ github.sha }}" --arg run "${{ github.run_id }}" \
            --arg result "${{ job.status }}" \
            '{control:"NIST-SC-7", result:$result, sha:$sha, run:$run, tool:"conftest"}' \
            > evidence/policy.json
          aws s3 cp evidence/policy.json s3://compliance-evidence/${{ github.run_id }}.json \
            --sse aws:kms --sse-kms-key-id "$EVIDENCE_KMS_KEY"

For Kubernetes, kyverno admission rules give you fast, explainable guardrails. Example policy to block :latest and require CPU/memory:

apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: secure-pod-standards
spec:
  validationFailureAction: enforce
  rules:
    - name: disallow-latest-tag
      match:
        resources:
          kinds: [Pod]  # Kyverno autogen extends this rule to Deployments, StatefulSets, DaemonSets
      validate:
        message: "Image tag ':latest' is not allowed."
        pattern:
          spec:
            containers:
              - image: "!*:latest"
    - name: require-limits-requests
      match:
        resources:
          kinds: [Pod]  # autogen covers the pod controllers
      validate:
        message: "CPU and memory requests/limits are required."
        pattern:
          spec:
            containers:
              - resources:
                  limits:
                    memory: "?*"
                    cpu: "?*"
                  requests:
                    memory: "?*"
                    cpu: "?*"

With ArgoCD, add a policy step in the sync pipeline or rely on admission; either way, your Git history is the source of truth.

Make proofs automatic: signed, mapped, and retained

Auditors don’t want screenshots; they want consistency. You want something you can regenerate without tears. The loop:

  • Produce machine-readable results per run: JSON from conftest, checkov, trivy SARIF; JUnit if that’s your test harness.
  • Map each result to a control. Keep a small oscal mapping repo so you can trace from control → policy → check.
  • Sign the evidence for integrity. cosign/in-toto attestations bound to the image digest or commit SHA.
  • Store evidence in an immutable bucket with lifecycle rules and KMS encryption. Auditors get read-only access via presigned URLs.
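
Granting that read-only access is a one-liner; a sketch with an illustrative bucket and key:

# Time-boxed, read-only link to a single evidence object (7 days max)
aws s3 presign s3://compliance-evidence/2024-10-01/run-4242.json --expires-in 604800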

Tiny OSCAL-ish mapping snippet:

# repo: policy/mappings/oscal.yaml
controls:
  - id: NIST-SC-7
    title: Boundary Protection
    implemented-by:
      - policy: terraform.aws.s3.deny
        tool: conftest
        artifact: s3://${EVIDENCE_BUCKET}/${RUN_ID}.json
  - id: PCI-6.4
    implemented-by:
      - policy: kyverno.secure-pod-standards
        tool: kyverno
        artifact: s3://${EVIDENCE_BUCKET}/${CLUSTER}/${DATE}.json

Attest a container image after scans pass:

# After build and scan succeed
cosign attest \
  --predicate evidence/policy.json \
  --type https://gitplumbers.dev/policy/v1 \
  --key env://COSIGN_PRIVATE_KEY \
  $IMAGE_DIGEST

Now your deploy step gates on “image has a passing policy attestation.” No attestation, no deploy.
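
A sketch of that gate using cosign verify-attestation, reusing the type URI from above and assuming the matching public key is available to the deploy job:

# Refuse to deploy unless a signed policy attestation exists for this digest
cosign verify-attestation \
  --type https://gitplumbers.dev/policy/v1 \
  --key cosign.pub \
  "$IMAGE_DIGEST" || { echo "no passing policy attestation; refusing to deploy"; exit 1; }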

Balance regulated data constraints with delivery speed

This is where I’ve seen teams crash. They try to force PCI/HIPAA controls everywhere. Instead:

  • Segment environments and repos. apps-regulated/* go through stricter workflows; apps-unregulated/* move faster. Different admission policies, base images, and runtime monitors.
  • Version policies. Policy v1.4 applies to all new services as of date X. Existing services get a sunset period with a remediation backlog.
  • Exceptions-as-code. PR a waiver with owner, risk, expiry, and link to a ticket. Store next to the service, evaluated by policy.

Example exception file consumed by Rego:

# exceptions/waivers.yaml
waivers:
  - id: WVR-123
    policy: terraform.aws.s3.deny
    resource: aws_s3_bucket.my_legacy_export
    owner: data-platform
    expires: 2025-01-31
    risk: "Legacy vendor integration requires public ACL; fronted by signed URLs."

And Rego that respects it:

package terraform.aws.s3

default waived = false

# A change is waived if an unexpired waiver names this policy
# (a real check would also match the waiver's resource against the resource address)
waived {
  w := input.waivers[_]
  w.policy == "terraform.aws.s3.deny"
  time.now_ns() < time.parse_ns("2006-01-02", w.expires)
}

# Example use in deny rules
deny[msg] {
  not waived
  # ... your deny conditions ...
}

  • Data-aware pipelines. Use labels/annotations like data.gitplumbers.dev/classification=phi to route workloads to the regulated path. Admission denies workloads that don’t declare a classification:

apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-data-classification
spec:
  validationFailureAction: enforce
  rules:
    - name: require-classification-label
      match:
        resources:
          kinds: [Deployment]
      validate:
        message: "Workloads must declare data classification."
        pattern:
          metadata:
            labels:
              data.gitplumbers.dev/classification: "?*"

What to measure and how to tune

You’ll only keep speed if you instrument the system and prune false positives ruthlessly.

  • Policy pass rate per repo/service (target >95% after 2 sprints). Alert on regressions.
  • Median time-to-remediate policy failures (keep <24h for non-prod, <72h for prod parity issues).
  • Exception debt: active waivers count and average age. Trend down.
  • False positive rate: ratio of dismissed findings to total. Anything >10% demands rule tuning.
  • Drift incidents: prod resources that bypass Git (should be near zero with GitOps).

Dashboards: Prometheus counters from CI runs, Grafana for trends, and weekly triage with eng + security. We’ve cut “policy-induced developer time” by 30–40% just by fixing chatty rules.
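
One low-friction way to get those counters out of CI is a Prometheus Pushgateway; a sketch, assuming one is reachable at pushgateway:9091 and using metric names of our own invention:

# Push per-run policy results so Grafana can trend pass rate and false positives
cat <<EOF | curl --data-binary @- http://pushgateway:9091/metrics/job/policy-check/repo/payments-api
policy_checks_total 42
policy_check_failures_total 3
policy_findings_dismissed_total 1
EOF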

What this looks like in the wild (numbers that matter)

On a recent GitPlumbers engagement for a fintech handling PCI and PII:

  • PR policy checks averaged 2m18s, blocking on criticals only.
  • Deployment lead time stayed within historical variance (p50 +6%).
  • Evidence generation and signing added 14 seconds per build.
  • Audit request cycle shrank from “please export everything” to links with signed JSON, reducing prep from 3 weeks to 3 days.
  • First-quarter false positives dropped from 22% to 6% after two tuning passes.

No heroics. Just boring, deterministic automation.

Start small, then ratchet

If you’re starting from zero, here’s the crawl-walk-run that’s actually worked for us:

  1. Crawl: Add checkov on Terraform and trivy on images in PRs. Fail on criticals. Store JSON to S3 (commands sketched after this list).
  2. Walk: Introduce conftest with 3–5 high-impact Rego rules. Add Kyverno for :latest and resource limits. Sign evidence with cosign.
  3. Run: GitOps-only deploys with ArgoCD; admission gated by policies; exceptions-as-code with expiry; dashboards and weekly tune-ups.
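
The crawl step really is just a couple of commands; a sketch with an illustrative image name and evidence bucket:

# Crawl: scan IaC and images on the PR, keep machine-readable output as evidence
mkdir -p evidence
checkov -d infrastructure -o json > evidence/checkov.json   # non-zero exit on failed checks
trivy image --severity CRITICAL --exit-code 1 \
  --format json --output evidence/trivy.json registry.example.com/payments-api:${GIT_SHA}
aws s3 cp evidence/ s3://compliance-evidence/${GIT_SHA}/ --recursive --sse aws:kms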

Don’t try to boil the ocean in sprint one. Pick the two controls most likely to land you on the front page (public buckets and plaintext secrets), automate them, and build from there.

Key takeaways

  • Translate policies into code: guardrails (prevent), checks (detect), and proofs (attest).
  • Put policy gates where they hurt least and help most: PR, build, deploy, and runtime.
  • Automate evidence: store machine-readable results mapped to controls (OSCAL) and sign them.
  • Separate regulated and unregulated paths, version policies, and time-box exceptions.
  • Measure policy pass rate, time-to-remediate, and false positive rate; tune weekly.

Implementation checklist

  • Stand up OPA/Kyverno policies for infra and K8s resources.
  • Add Terraform/Helm scans to PR checks and block on criticals.
  • Generate and store signed JSON evidence per run, mapped to controls (OSCAL).
  • Gate deployments with ArgoCD/Gatekeeper or Kyverno admission policies.
  • Implement exception-as-code with expiry, owner, and linked risk ticket.
  • Segment regulated workloads with stricter pipelines and image baselines.
  • Dashboards: policy pass rate, median remediation time, false-positive rate, drift incidents.

Questions we hear from teams

We already run scanners. Why add policy-as-code?
Scanners tell you what’s wrong. Policy-as-code decides what’s allowed. The former produces lists; the latter creates gates tied to your business rules and generates signed evidence per change. That’s the difference between a noisy report and a reliable audit trail.
Will this slow down our deploys?
Done right, PR checks add 2–5 minutes and deploy gates add milliseconds (admission) to seconds (attestation verify). We keep builds parallelized, run deterministic checks on static artifacts, and fail fast only on criticals. Our clients’ lead times typically stay within ±10%.
Do we need OPA/Rego if we already use Kyverno?
Kyverno is great for Kubernetes resource policies. You’ll still want OPA/Rego or tools like Checkov for Terraform/CloudFormation, and sometimes Gatekeeper for cross-resource constraints. Many teams run both: Kyverno for K8s admission + Conftest/Checkov for IaC.
How do we handle legacy workloads that can’t comply yet?
Use exceptions-as-code with expiry, risk owner, and a remediation plan. Segment those workloads into a stricter enclave, add compensating controls (e.g., WAF, egress blocks), and track exception debt. The pipeline should surface and time-box the debt, not hide it.

Talk to GitPlumbers about policy-as-code that won’t slow you down, or see how we wire OPA/Kyverno into ArgoCD.