Stop Waving Policy PDFs: Turn GDPR/CCPA Into Guardrails Your CI Understands
Translate privacy law into tests, configs, and automated proofs — without grinding delivery to a halt.
> Auditors don’t read your Confluence. They read your logs and artifacts.
The audit that hurt more than the outage
I’ve sat in the room where the breach wasn’t the headline — the audit was. Adtech startup, BigQuery, and a rushed data pipeline pushed raw email hashes into a public dataset for three days. No PII left prod, but the audit trail was a murder scene: no data classification, no retention controls, zero proofs the fix stayed fixed. Legal had a 40-page GDPR policy. Engineering had… a Notion doc and crossed fingers.
Here’s what actually works: translate GDPR/CCPA into guardrails, checks, and automated proofs that your CI/CD and runtime can enforce. You keep shipping. Auditors get receipts. Nobody prints a binder.
Map law to controls engineers can ship
You don’t need to be a lawyer. You need crisp control objectives and owners. Boil GDPR/CCPA down to 6 things you can test:
- Data minimization & purpose limitation: Only collect what’s needed; only use it for declared purposes.
- Consent/opt-out: Respect EU consent and CCPA “do not sell/share” signals in code paths.
- Security of processing: Encryption, access control, monitoring. Prove it.
- Retention & deletion: Keep it only as long as needed; delete comprehensively.
- Data Subject Rights (DSR): Access, delete, portability; respond on time.
- Transfers & processors: Track where data goes and who touches it.
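Those six objectives can be captured as a typed control registry that CI evaluates on every build. A minimal sketch; the `Control` shape, owners, and checks are illustrative, not a real library:

```typescript
// Hypothetical control registry: each legal objective maps to an owner
// and an automated check CI runs on every build. Names are illustrative.
type ControlId =
  | 'minimization' | 'consent' | 'security'
  | 'retention' | 'dsr' | 'transfers'

interface Control {
  id: ControlId
  owner: string            // team accountable for this control
  check: () => boolean     // automated test, e.g. "TTLs exist on tagged tables"
}

const controls: Control[] = [
  { id: 'retention', owner: 'data-platform', check: () => true /* verify TTLs */ },
  { id: 'consent', owner: 'growth', check: () => true /* verify opt-out paths */ },
]

// CI gate: list every objective whose check fails, so the build can block on it.
export function failedControls(cs: Control[]): ControlId[] {
  return cs.filter(c => !c.check()).map(c => c.id)
}
```

The point is the mapping, not the code: every objective gets a named owner and a check that can fail a build.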
Translate those into guardrails:
- Classification tags at schema boundaries: Add `classification=pii|sensitive|public`, `purpose=analytics|support`, `retention=30d|1y` to protobuf/Avro/DB schemas and topics.
- Policy-as-code: OPA/Rego checks on Terraform, K8s, and pipeline configs. Block anti-patterns (public buckets, unencrypted topics, PII in debug logs).
- Runtime controls: Egress proxies, DLP, field-level encryption/tokenization.
- Workflow automation: Orchestrated DSR and deletion jobs with proofs.
- Evidence generation: Attestations in CI/CD stored immutably.
Guardrails as code: block bad, bless good
You don’t win arguments with Slack threads. You win with failing builds.
- Terraform + OPA (Conftest): Kill public buckets and missing encryption.
```rego
package terraform.s3

# Deny S3 buckets without SSE or with public ACL
violation[msg] {
  some r
  input.resources[r].type == "aws_s3_bucket"
  not input.resources[r].values.server_side_encryption_configuration
  msg := sprintf("S3 bucket %s missing SSE", [input.resources[r].name])
}

violation[msg] {
  some r
  input.resources[r].type == "aws_s3_bucket_acl"
  input.resources[r].values.acl == "public-read"
  msg := sprintf("Bucket ACL public-read not allowed: %s", [input.resources[r].name])
}
```

Run it in CI:

```bash
terraform show -json plan.out > plan.json
conftest test plan.json --policy policy/
```

- Kubernetes + Gatekeeper: Require data classification labels and block `pii` workloads outside trusted namespaces.
```yaml
apiVersion: templates.gatekeeper.sh/v1beta1
kind: ConstraintTemplate
metadata:
  name: k8srequiredlabels
spec:
  crd:
    spec:
      names:
        kind: K8sRequiredLabels
  targets:
    - target: admission.k8s.gatekeeper.sh
      rego: |
        package k8srequiredlabels
        violation[{"msg": msg}] {
          missing := {l | l := input.parameters.labels[_]; not input.review.object.metadata.labels[l]}
          count(missing) > 0
          msg := sprintf("Missing labels: %v", [missing])
        }
---
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sRequiredLabels
metadata:
  name: require-data-classification
spec:
  match:
    kinds:
      - apiGroups: [""]
        kinds: ["Pod"]
  parameters:
    labels: ["data-classification", "purpose"]
```

- GitOps: Deploy policies with `Argo CD` so drift is visible and changes are reviewed. No snowflake clusters, no one-off exceptions.
Handle regulated data in code and storage
If your devs can accidentally log a raw email, they will — on a Friday.
- Classification in the type system: Annotate models with PII metadata and block unsafe sinks.
```typescript
// privacy.ts
export type Classification = 'pii' | 'sensitive' | 'public'
export interface FieldMeta { cls: Classification; purpose: ('analytics' | 'support')[]; retentionDays: number }

export function redact<T>(obj: T, meta: Record<keyof T, FieldMeta>): Partial<T> {
  const out: any = {}
  for (const k in obj) {
    if (meta[k]?.cls === 'pii') out[k] = 'REDACTED'
    else out[k] = (obj as any)[k]
  }
  return out
}

// usage in a logger wrapper
logger.info('event', redact(userEvent, userEventMeta))
```

- Field-level encryption/tokenization: Use deterministic encryption for join keys, random for everything else. Prefer `KMS`/`Vault`-managed keys with rotation:
  - HashiCorp Vault Transit or Google Cloud KMS.
  - A tokenization service issuing format-preserving tokens for emails/phones.
- Egress/DLP:
  - Block outbound traffic by default; allowlist domains via an egress proxy (`Envoy` plus cloud egress controls).
  - Turn on DLP in logs: Datadog Sensitive Data Scanner or Splunk DLP to catch PII in traces/logs.
- Third-party processors: Tag outbound events (`purpose`, `classification`) and route with a policy router; drop events when consent is missing.
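Dropping events without consent can be a few lines in the router. A sketch, assuming a hypothetical `ConsentStore` capability check:

```typescript
// Hypothetical policy router: forward an outbound event to a processor
// only when the user's consent covers the event's declared purpose.
type Purpose = 'analytics' | 'support'

interface OutboundEvent {
  userId: string
  purpose: Purpose
  classification: 'pii' | 'sensitive' | 'public'
  payload: unknown
}

interface ConsentStore {
  can(userId: string, purpose: Purpose): boolean  // capability lookup
}

export function routeEvent(ev: OutboundEvent, consent: ConsentStore): 'forward' | 'drop' {
  // Public, non-identifying events flow regardless; everything else needs consent.
  if (ev.classification === 'public') return 'forward'
  return consent.can(ev.userId, ev.purpose) ? 'forward' : 'drop'
}
```

Log every `drop` decision with region and purpose; those logs become your consent-enforcement evidence.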
Automate DSR and retention so it doesn’t page SRE
Manual DSRs don’t scale. Build a pipeline once and reuse it.
- Identity graph: Map user IDs across systems (auth ID, billing ID, device IDs). Store in a privacy service, not a spreadsheet.
- Orchestrator: On a `DELETE` request, fan out to systems with idempotent delete functions. Keep proofs.
- Retention: Prefer native TTLs over cron greps.
Examples:
- S3 lifecycle
```json
{
  "Rules": [
    {
      "ID": "delete-analytics-30d",
      "Filter": {"Tag": {"Key": "purpose", "Value": "analytics"}},
      "Status": "Enabled",
      "Expiration": {"Days": 30}
    }
  ]
}
```

- BigQuery partition expiration

```sql
ALTER TABLE analytics.events
SET OPTIONS (
  partition_expiration_days = 30
);
```

- GitHub Actions: DSR job + proof artifact
```yaml
name: dsrcycle
on:
  workflow_dispatch:
    inputs:
      subject_id:
        description: Data subject to delete
        required: true
jobs:
  delete:
    runs-on: ubuntu-latest
    steps:
      - name: Run DSR delete
        run: |
          ./dsr delete --subject ${{ inputs.subject_id }} --evidence dsr-evidence.json
      - name: Attest evidence
        run: |
          cosign attest --predicate dsr-evidence.json --predicate-type https://example.com/dsr \
            --key ${{ secrets.COSIGN_KEY }} ${{ github.sha }}
      - name: Upload immutable evidence
        run: |
          aws s3 cp dsr-evidence.json s3://privacy-evidence/${{ github.run_id }}/ --endpoint-url ... \
            --server-side-encryption aws:kms
```

Track the SLOs: DSR MTTR, success rate, and deletion coverage across systems.
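Those SLOs fall straight out of the evidence records. A minimal sketch, assuming a record shape like the one below:

```typescript
// Compute DSR SLOs from evidence records. The record shape is an assumption;
// adapt the field names to whatever your evidence JSON actually contains.
interface DsrEvidence {
  requestedAt: number          // epoch ms
  completedAt: number | null   // null while the request is still open
  systemsTargeted: number
  systemsDeleted: number
}

export function dsrSlos(records: DsrEvidence[]) {
  const done = records.filter(r => r.completedAt !== null)
  const mttrMs = done.length
    ? done.reduce((s, r) => s + (r.completedAt! - r.requestedAt), 0) / done.length
    : 0
  const successRate = records.length ? done.length / records.length : 1
  const coverage = done.length
    ? done.reduce((s, r) => s + r.systemsDeleted / r.systemsTargeted, 0) / done.length
    : 1
  return { mttrMs, successRate, coverage }
}
```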
Proof beats promises: attestations on every deploy
Auditors don’t read your Confluence. They read your logs and artifacts.
- Prove policy checks ran: Store OPA test results and Terraform plan diffs as build artifacts.
- Prove supply chain integrity: Use `SLSA`-style provenance and `Sigstore cosign` for attestations, and sign your container images.
- Immutability: Store evidence in S3 with Object Lock (WORM) or in a write-once bucket with restricted roles.
Minimal CI stage:
```bash
# policy checks
conftest test plan.json --policy policy/ --output json > opa-results.json

# sign and attest
cosign sign --key $COSIGN_KEY $IMAGE
cosign attest --predicate opa-results.json --predicate-type https://openpolicyagent.org/predicate $IMAGE

# ship to evidence bucket
aws s3 cp opa-results.json s3://privacy-evidence/$GIT_SHA/ --sse aws:kms
```

Make proofs discoverable: link evidence IDs in your change log, and expose a read-only dashboard for privacy and security teams.
Keep speed: golden paths, fast exceptions, no heroics
I’ve seen teams crater velocity by turning compliance into a ticket queue. Don’t.
- Golden paths: Provide templates that are already compliant:
- Terraform modules with SSE, lifecycle, and bucket policies baked in.
- Service scaffolds with logging redaction, consent checks, and classification annotations.
- Exceptions with expiry: Allow a temporary bypass only with a reason, approver, and TTL. Auto-expire and page on overdue cleanup.
- Just-in-time access: Ephemeral IAM via `sts:AssumeRole` with session tags and ABAC; log every elevation.
- Shift-left privacy: Add simple linters to block `console.log(user.email)` at PR time. Teach devs the 10 rules that matter.
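The PR-time linter does not need to be clever to catch the common case. A naive regex sketch (a real setup would be a custom ESLint rule; the field list here is an assumption):

```typescript
// Naive PR-time lint: flag lines that appear to log likely-PII fields raw.
// Regex-based on purpose; good enough to start, replace with an ESLint rule later.
const PII_LOG_PATTERN = /\b(console\.log|logger\.\w+)\([^)]*\.(email|phone|ssn)\b/

export function findPiiLogs(source: string): number[] {
  // Returns 1-based line numbers that look like raw PII logging.
  return source
    .split('\n')
    .flatMap((line, i) => (PII_LOG_PATTERN.test(line) ? [i + 1] : []))
}
```

Wire it into CI so a flagged line fails the check with the offending line number in the output.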
If it’s slower than a Slack DM to security, it won’t get used.
The 30-day plan we run at GitPlumbers
You don’t need a year-long program to stop the bleeding. Here’s the starter plan we ship in real orgs:
- Week 1: Inventory and classify
- Pull the schema registry, Terraform state, and data catalogs; label the top-20 assets with `classification`, `purpose`, `retention`.
- Turn on DLP scanning in logs and tracing (Datadog/Splunk) and fix the top 5 offenders.
- Week 2: Block the obvious
- Add OPA policies for encryption, public access, and K8s labels; wire into CI on Terraform and Helm.
- Ship golden Terraform modules and service templates.
- Week 3: DSR + retention automation
- Implement a DSR orchestrator and wire 3 core systems (auth DB, data warehouse, object store).
- Add lifecycle policies (S3) and partition expirations (BigQuery/Snowflake).
- Week 4: Proofs and exceptions
- Add cosign attestations, evidence bucket with Object Lock, and exception workflow with TTL.
- Define 3 privacy SLOs and put them on the exec dashboard.
By day 30, audits stop being fire drills and engineers stop tiptoeing around PII.
Key takeaways
- Translate policies into testable control objectives tied to code and infra.
- Classify data at the schema boundary and enforce with policy-as-code.
- Automate DSR and retention with orchestrated deletes and lifecycle policies.
- Generate machine-verifiable proofs (attestations) on every deploy.
- Use golden paths and exceptions-with-expiry to keep dev velocity high.
Implementation checklist
- Inventory regulated data and tag schemas/streams with purpose, retention, and classification.
- Enforce OPA/Rego policies in CI on Terraform/K8s manifests; block risky patterns.
- Implement field-level encryption/tokenization for PII at rest and in transit.
- Automate DSR and retention: deletion pipelines, lifecycle policies, proof logs.
- Add egress/DLP controls to stop PII from leaking into logs and third parties.
- Produce attestations and store evidence immutably with rotation and access controls.
- Track privacy SLOs: DSR MTTR, policy violation rate, time-to-exception-closure.
Questions we hear from teams
- How do we balance GDPR consent with CCPA opt-out in code?
- Model consent as a capability matrix per user. EU users require purpose-specific opt-in before data flows to processors; CCPA users can opt-out of sale/share. In code, check `consent.can('analytics')` before emitting. For CCPA, maintain a denylist and drop/route events accordingly. Log decisions with user region and purpose to generate proofs.
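A sketch of that capability matrix; the `Region` values and consent shape are illustrative:

```typescript
// Hypothetical capability matrix reconciling GDPR opt-in with CCPA opt-out.
type Region = 'EU' | 'CA' | 'OTHER'
type Purpose = 'analytics' | 'sale_share'

interface UserConsent {
  region: Region
  optIns: Set<Purpose>    // GDPR model: explicit, purpose-specific opt-in
  optOuts: Set<Purpose>   // CCPA model: do-not-sell/share opt-out
}

export function can(user: UserConsent, purpose: Purpose): boolean {
  if (user.region === 'EU') return user.optIns.has(purpose)  // opt-in required
  return !user.optOuts.has(purpose)                          // allowed unless opted out
}
```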
- What if we can’t encrypt everything at field level?
- Prioritize high-risk fields (email, phone, government IDs) and use deterministic encryption for join keys. For the rest, ensure at-rest encryption (KMS) and strong access controls. Add DLP and log redaction to reduce exposure risk, then incrementally expand coverage.
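Deterministic tokens for join keys can be as simple as a keyed HMAC; a sketch using Node's `crypto` (key management via KMS/Vault is out of scope here):

```typescript
import { createHmac, randomBytes } from 'node:crypto'

// Deterministic token for join keys: same input + same key => same token,
// so datasets can still be joined without ever storing the raw email.
export function joinToken(value: string, key: Buffer): string {
  return createHmac('sha256', key).update(value.toLowerCase()).digest('hex')
}

// Random token for everything else: no correlation across datasets.
export function randomToken(): string {
  return randomBytes(16).toString('hex')
}
```

Rotate the HMAC key like any other secret, and remember that rotating it breaks joins across the rotation boundary; plan a re-tokenization pass.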
- Do we need a privacy tech vendor (OneTrust, DataGrail)?
- They help with portals and workflows. We still wire the back-end deletes and evidence into your systems. Start with an internal orchestrator and adopt a vendor if scale or legal/comms cadence warrants it.
- How do we prove deletions actually happened?
- Have each system return a verifiable receipt (row count affected, object versions deleted, timestamps). Aggregate into an evidence JSON, sign it with Sigstore, and store in an immutable bucket. Sample deletions and run reconciliation jobs to catch drift.
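The aggregation step can be sketched with a hypothetical receipt shape; the signed output is what lands in the evidence bucket:

```typescript
// Hypothetical per-system deletion receipt, aggregated into signable evidence.
interface DeletionReceipt {
  system: string        // e.g. "authdb", "warehouse"
  rowsAffected: number
  deletedAt: string     // ISO timestamp
}

export function buildEvidence(
  subjectId: string,
  receipts: DeletionReceipt[],
  expected: string[],   // every system the identity graph says holds data
) {
  const seen = new Set(receipts.map(r => r.system))
  const missing = expected.filter(s => !seen.has(s))  // systems that never reported
  return {
    subjectId,
    receipts,
    complete: missing.length === 0,
    missing,
  }
}
```

A `complete: false` record is exactly what the reconciliation job should page on.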
- Will these checks slow our CI/CD?
- OPA/Conftest on plans/manifests adds seconds, not minutes. Cache Terraform providers and parallelize checks. For heavy scans (full DLP), run post-deploy with kill switches and alerts instead of blocking builds.
- What about data transfers and SCCs?
- Track geo residency and processor locations in your asset catalog. Enforce region-pinned storage and routing in infra (e.g., GCP VPC Service Controls, AWS SCPs). Produce a register of processors with data categories and purposes — your legal team handles SCCs; engineering enforces the routing.
Ready to modernize your codebase?
Let GitPlumbers help you transform AI-generated chaos into clean, scalable applications.
