CI/CD Security Gates That Catch Real Bugs (Without Killing Your Velocity)
What we wire into pipelines at companies that can’t afford a breach — and how to deploy it in a week without wrecking dev flow.
If it doesn’t break the build, it’s just a dashboard.
I’ve seen the same movie a dozen times: a harmless-looking PR sails through CI, merges Friday, and by Monday your cluster is mining Monero because someone copy-pasted a "helpful" Dockerfile from a blog. Security dashboards looked green, but nothing actually blocked the deploy.
This guide is the wiring diagram we use at GitPlumbers when we harden pipelines for teams that need wins in days, not quarters. It’s opinionated, tool-agnostic enough, and focused on what to gate vs. what to just alert — so you catch real issues without grinding dev velocity to a halt.
Map the Security Gates Across Your Pipeline
You don’t need 20 tools. You need the right tripwires in the right places.
- PR-time: SAST and secret scanning. Fast feedback, block obvious footguns.
- Build-time: SBOM generation and dependency/image scans. Decide what blocks production.
- Pre-deploy: IaC/K8s policy checks and Dockerfile lint; prevent root, `latest` tags, and dangerous capabilities.
- Deploy-time: Admission controls verify signatures and enforce runtime policies.
- Post-deploy: DAST against staging; runtime alerts (Falco/eBPF) in prod (alert-only at first).
Metrics to watch:
- Gate pass rate (% of builds passing each security stage)
- Time-to-green after a security fail (median hours)
- Critical vulns older than 30 days (count)
- False positive rate (% of findings dismissed as noise)
Decision rules:
- Fail on leaked secrets, unsigned images, and Critical vulns in production artifacts
- Warn on Mediums while you tune noise for the first 2-4 weeks
- Waive with expiry, enforced in-repo, not by email (sketch below)
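If you want these decision rules versioned next to the code instead of living in a wiki, a minimal sketch of a gate-policy file looks like this — the filename, schema, and the pipeline script that would read it are assumptions, not part of any particular tool:

```yaml
# .security-gates.yaml (hypothetical; read by your own pipeline script)
gates:
  pr:
    fail_on: [secrets, critical, high]   # block the PR
    warn_on: [medium]                    # PR comment only
  build:
    fail_on: [critical, high]            # production artifacts
    warn_on: [medium]
  deploy:
    fail_on: [unsigned_image, policy_violation]
waivers:
  require_expiry: true       # no open-ended exceptions
  require_codeowners: true   # approval lives in the repo, not in email
```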
PR-Time: SAST and Secret Scanning That Doesn’t Nag
Start where developers live. Keep it sub-3 minutes.
Recommended tools:
- SAST: `Semgrep` (fast, customizable), `CodeQL` (deeper, slower), or `SonarQube` if you already run it.
- Secrets: `gitleaks` (fast), or native GitHub/GitLab secret scanning if you pay for it.
Example: GitHub Actions for Semgrep + gitleaks on pull_request:
```yaml
name: security-pr
on:
  pull_request:
    branches: [ main ]
permissions:
  contents: read
  pull-requests: write
jobs:
  semgrep:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: returntocorp/semgrep-action@v1
        with:
          config: p/ci # start curated; add custom rules later
          generateSarif: true
          publishToken: ${{ secrets.SEMGREP_TOKEN }} # optional
      - uses: github/codeql-action/upload-sarif@v3
        with:
          sarif_file: semgrep.sarif
  gitleaks:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: zricethezav/gitleaks-action@v2
        with:
          args: detect --source=. --redact --no-git
```
GitLab CI variant:
```yaml
stages: [test]
semgrep:
  stage: test
  image: returntocorp/semgrep:latest
  script:
    - semgrep ci --config=p/ci --sarif --output semgrep.sarif
  artifacts:
    reports:
      sast: semgrep.sarif
secrets:
  stage: test
  image: zricethezav/gitleaks:latest
  script:
    - gitleaks detect --source=. --redact --no-git --report-format sarif --report-path gitleaks.sarif
  artifacts:
    reports:
      secret_detection: gitleaks.sarif
```
Checkpoints:
- Keep runtime < 3 minutes per PR
- Fail PRs on secrets and Critical/High SAST only; leave Mediums as PR comments
- Track: PR block rate < 10% after first week; false positives < 10%
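The `p/ci` pack above is Semgrep's curated starting point; when you outgrow it, custom rules are just YAML checked into the repo. A minimal sketch, with an illustrative rule id, pattern, and file path:

```yaml
# .semgrep/rules.yml (illustrative path); pass it alongside the curated pack,
# e.g. in the GitLab job above: semgrep ci --config p/ci --config .semgrep/rules.yml
rules:
  - id: no-subprocess-shell-true   # illustrative rule id
    languages: [python]
    severity: ERROR
    message: Avoid shell=True; pass an argument list instead
    pattern: subprocess.run(..., shell=True, ...)
```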
Build-Time: SBOM, Dependency, and Image Scans With Real Gates
This is where you catch supply-chain issues and set policies that matter.
Tools that work:
- SBOM: `Syft` (CycloneDX, SPDX) — fast and scriptable
- Image/OS deps: `Trivy` or `Grype`
- App deps: `Trivy fs`, `OWASP Dependency-Check`, or `Snyk` if you have it
- License: `FOSSA` or Snyk's license scanning if legal cares (they do)
Example GitHub Actions job (build + SBOM + scan):
```yaml
name: build-and-scan
on: [push]
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Build image
        run: |
          docker build -t ghcr.io/acme/api:${{ github.sha }} .
      - name: Generate SBOM (CycloneDX JSON)
        uses: anchore/sbom-action@v0
        with:
          artifact-name: sbom-${{ github.sha }}.json
          format: cyclonedx-json
          image: ghcr.io/acme/api:${{ github.sha }}
      - name: Trivy image scan (fail on High/Critical)
        uses: aquasecurity/trivy-action@0.21.0
        with:
          image-ref: ghcr.io/acme/api:${{ github.sha }}
          format: 'sarif'
          output: 'trivy-results.sarif' # write the report the upload step expects
          exit-code: '1'
          ignore-unfixed: true
          vuln-type: 'os,library'
          severity: 'HIGH,CRITICAL'
      - name: Upload SARIF
        uses: github/codeql-action/upload-sarif@v3
        with:
          sarif_file: trivy-results.sarif
      - name: Push image (only if scan passed)
        if: success()
        run: docker push ghcr.io/acme/api:${{ github.sha }}
```
Jenkinsfile snippet (for the “we’re not on Actions” crowd):
```groovy
pipeline {
  agent any
  stages {
    stage('Build') { steps { sh 'docker build -t registry/api:$GIT_COMMIT .' } }
    stage('SBOM')  { steps { sh 'syft registry/api:$GIT_COMMIT -o cyclonedx-json > sbom.json' } }
    stage('Scan') {
      steps { sh 'trivy image --ignore-unfixed --severity HIGH,CRITICAL --exit-code 1 registry/api:$GIT_COMMIT' }
    }
    stage('Push') {
      when { expression { currentBuild.resultIsBetterOrEqualTo('SUCCESS') } }
      steps { sh 'docker push registry/api:$GIT_COMMIT' }
    }
  }
  post { always { archiveArtifacts artifacts: 'sbom.json' } }
}
```
Policy choices that won’t backfire:
- Production images: fail on Critical/High; staging: warn on High; ignore Medium for first month (per-branch sketch below)
- Require SBOM artifact on every build; keep 90 days in object storage
- License scan: fail only on copyleft or disallowed licenses; warn on “unknown” while you tag
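One way to express the prod-vs-staging split is to vary the Trivy exit code by branch. This is a sketch that assumes `main` builds are your production candidates; it drops in where the Trivy step sits in the build-and-scan workflow above:

```yaml
      # Findings are always reported; only main-branch (prod candidate) builds hard-fail.
      - name: Trivy scan with per-branch gate
        uses: aquasecurity/trivy-action@0.21.0
        with:
          image-ref: ghcr.io/acme/api:${{ github.sha }}
          ignore-unfixed: true
          severity: 'HIGH,CRITICAL'
          exit-code: ${{ github.ref == 'refs/heads/main' && '1' || '0' }}
```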
Metrics:
- Builds blocked by High/Critical < 5% (after 2 weeks)
- Median fix time for blocked builds < 24 hours
- Critical vulns older than 30 days: zero for prod images
Infrastructure and Container Policies: Stop Bad Manifests at the Door
The fastest way to get hacked is a good app running in a bad container. Enforce basics in CI and at admission.
Tools:
- Dockerfile: `Hadolint`
- IaC: `Checkov` or `tfsec` for Terraform; `kube-linter` for K8s manifests
- Admission: `Kyverno` or `OPA Gatekeeper`
CI checks (GitHub Actions):
```yaml
name: iac-and-dockerfile
on: [pull_request]
jobs:
  dockerfile:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: hadolint/hadolint-action@v3.1.0
        with:
          dockerfile: Dockerfile
  iac:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: bridgecrewio/checkov-action@v12
        with:
          directory: .
          framework: terraform,kubernetes
          soft_fail: true # start warn-only, then flip to false
```
Cluster gate (Kyverno): block `latest` tags and running as root:
```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: safe-images
spec:
  validationFailureAction: enforce
  rules:
    - name: disallow-latest
      match:
        resources:
          # match Pods; Kyverno auto-generates equivalent rules for
          # Deployments, StatefulSets, and DaemonSets
          kinds: [Pod]
      validate:
        message: "Images must be pinned (no :latest)"
        pattern:
          spec:
            containers:
              - image: "!*:latest"
    - name: require-nonroot
      match:
        resources:
          kinds: [Pod]
      validate:
        message: "Containers must not run as root"
        pattern:
          spec:
            securityContext:
              runAsNonRoot: true
            containers:
              - securityContext:
                  runAsUser: ">=10000"
```
Checkpoints:
- CI fails on Dockerfile High severities; IaC starts as soft-fail, becomes hard gate in week 3
- Admission blocks manifests with `latest`, privileged containers, or hostPath volumes (see the policy sketch below)
- Track: policy block count trends down week over week; median remediation < 1 day
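The safe-images policy above covers `latest` and root but not privileged mode or hostPath. A minimal sketch for those two, closely following Kyverno's published sample policies (policy name and messages are illustrative):

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: disallow-privileged-hostpath
spec:
  validationFailureAction: enforce
  rules:
    - name: no-privileged
      match:
        resources:
          kinds: [Pod]
      validate:
        message: "Privileged containers are not allowed"
        pattern:
          spec:
            containers:
              # equality anchor: if securityContext.privileged is set, it must be false
              - =(securityContext):
                  =(privileged): "false"
    - name: no-hostpath
      match:
        resources:
          kinds: [Pod]
      validate:
        message: "hostPath volumes are not allowed"
        pattern:
          spec:
            # negation anchor: hostPath must be unset on any declared volume
            =(volumes):
              - X(hostPath): "null"
```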
DAST and Runtime: Test What Users Actually Hit
Don’t point ZAP at prod. Use ephemeral or staging envs, tied to release candidates.
Tools:
- DAST: `OWASP ZAP` (baseline scan); `Burp` if you’re fancy or the security team manages it
- Perf + sanity: `k6` can double as a smoke harness
- Runtime detection: `Falco` (eBPF) for alerts on suspicious syscalls; alert-only early on (rule sketch at the end of this section)
Example nightly ZAP run (GitHub Actions):
```yaml
name: dast-nightly
on:
  schedule:
    - cron: '0 2 * * *'
jobs:
  zap:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4 # needed so .zap/rules.tsv is available to the scan
      - name: ZAP baseline scan
        uses: zaproxy/action-baseline@v0.10.0
        with:
          target: https://staging.acme.internal
          rules_file_name: .zap/rules.tsv
          cmd_options: '-a -m 5'
```
`.zap/rules.tsv` example (downgrade known false positives):
```text
10021   IGNORE  Cache control
40012   WARN    X-Frame-Options header missing
```
What to gate:
- For GA releases: if DAST finds Critical auth/session issues, block release
- Otherwise: treat DAST as a nightly alert stream and ticket the results
Metrics:
- Nightly scan completion success rate > 95%
- Critical DAST findings MTTR < 72 hours
- False-positive rate < 15% after first month
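On the runtime side, Falco rules are plain YAML. A minimal alert-only sketch; the image name is illustrative, and the `spawned_process`/`container` macros come from Falco's default ruleset:

```yaml
- rule: Shell spawned in api container
  desc: An interactive shell started inside the api container, usually debugging in prod or something worse. Alert only for now.
  condition: >
    spawned_process and container
    and container.image.repository = "ghcr.io/acme/api"
    and proc.name in (bash, sh)
  output: "Shell in api container (user=%user.name command=%proc.cmdline container=%container.id)"
  priority: WARNING
  tags: [runtime, shell]
```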
Supply Chain: Sign Everything and Verify at Admission
If it isn’t signed, it’s just a blob from the internet. Use Cosign with keyless signing and enforce verification in the cluster and your GitOps controller.
Signing in CI (GitHub Actions):
```bash
# build image first
docker build -t ghcr.io/acme/api:${GITHUB_SHA} .
docker push ghcr.io/acme/api:${GITHUB_SHA}

# keyless sign with the workflow's OIDC identity
COSIGN_EXPERIMENTAL=1 cosign sign ghcr.io/acme/api:${GITHUB_SHA}

# attach SLSA provenance as a Cosign attestation
cosign attest --type slsaprovenance --predicate slsa-provenance.json ghcr.io/acme/api:${GITHUB_SHA}
```
Admission verification (Kyverno policy fragment):
```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: verify-signatures
spec:
  validationFailureAction: enforce
  rules:
    - name: require-cosign
      match:
        resources:
          kinds: [Pod, Deployment]
      verifyImages:
        - image: "ghcr.io/acme/*"
          attestors:
            - entries:
                - keyless:
                    issuer: "https://token.actions.githubusercontent.com"
                    # must match the Fulcio certificate identity (SAN) of the signing workflow
                    subject: "https://github.com/acme/api/*"
```
GitOps verification (ArgoCD image updater or Flux):
- `Flux` can verify Cosign signatures before it deploys; its source controller supports signature verification on OCI artifacts via `spec.verify` (sketch below)
- `ArgoCD` can run a pre-sync hook job to verify signatures; or use Kyverno/Gatekeeper to enforce it at admission
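A sketch of the Flux side, assuming you publish manifests as a signed OCI artifact (the URL and names are illustrative): verification is declared on the source, and reconciliation fails if the signature is missing or invalid.

```yaml
apiVersion: source.toolkit.fluxcd.io/v1beta2
kind: OCIRepository
metadata:
  name: acme-manifests
  namespace: flux-system
spec:
  interval: 5m
  url: oci://ghcr.io/acme/manifests   # illustrative artifact location
  ref:
    semver: ">=1.0.0"
  verify:
    provider: cosign   # keyless verification when no secretRef is given
```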
Checkpoints:
- 100% of production images signed and verified
- Admission rejects unsigned images and missing provenance
- Track: zero unsigned images deployed; time to fix signature config < 1 day
Make It Stick: Waivers, Dashboards, and Sane Alerts
The tech is the easy part. Process prevents backsliding.
Operational habits that work:
- Waivers in-repo with expiry; require `CODEOWNERS` approval
- PR comments over Slack spam; aggregate alerts to channels with an error budget (e.g., < 5 messages/day)
- One dashboard your execs can read: Criticals by service, time-to-green, unsigned images blocked
- Weekly triage with dev leads; close stale findings; fix noisy rules quickly
Example waiver pattern (YAML checked into repo):
```yaml
# .security-waivers.yaml
- id: TRIVY-APK-2023-1234
  severity: HIGH
  expires: 2025-01-31
  reason: Base image patch pending vendor release
  owner: team-payments
```
And a tiny script to enforce expiry in CI:
```bash
python scripts/check_waivers.py .security-waivers.yaml || exit 1
```
KPI starter set:
- Gate pass rate by stage (PR, build, deploy)
- Median time-to-green after security fail
- Critical vulns > 30 days (goal: 0 in prod artifacts)
- False positive rate (goal: < 10%)
Results we typically see after 4-6 weeks:
- 70–85% reduction in leaked-secrets incidents
- Time-to-green for blocked builds down to same day
- Near-zero unsigned images in prod
- Developers still shipping: lead time unchanged within margin of error
Key takeaways
- Start with PR-time SAST and secret scanning; fail on secrets and critical issues, only warn on stylistic rules.
- Generate an SBOM on every build and gate on High/Critical vulns for production images; make Mediums visible but non-blocking initially.
- Scan Dockerfiles and IaC (Terraform/K8s) in CI; enforce non-root and pinned tags with Kyverno or Gatekeeper in the cluster.
- Sign images and attestations with Cosign; verify signatures and SLSA provenance at admission and in GitOps (ArgoCD/Flux).
- Run lightweight DAST nightly against staging; only block deployment if critical auth or P1 issues are detected.
- Track a small KPI set: gate pass rate, time-to-green, critical vulns >30 days, false-positive rate, waiver SLA.
- Tight feedback beats big-bang rollouts: iterate thresholds and suppress noisy rules quickly to keep dev trust.
Implementation checklist
- Map your control points: PR, build, image, deploy, runtime.
- Pick tools per layer: Semgrep, gitleaks, Trivy/Grype, Syft (SBOM), Checkov/tfsec, Kyverno, ZAP, Cosign.
- Create a baseline and thresholds: fail on secrets/critical; warn on others for the first two weeks.
- Wire PR scanning first; add SBOM + image scan next; then IaC/container policies; then signing; then DAST.
- Publish SBOMs and scan reports to an artifact store with retention.
- Implement waivers with expiry in-repo; require codeowner approval for High/Critical exemptions.
- Add admission policies to verify Cosign signatures and block risky pod specs.
- Measure and iterate weekly; tune rules to keep false positives under 10%.
Questions we hear from teams
- Will this tank our CI times?
- Not if you stage it right. PR-time SAST/secrets should stay under 3 minutes. SBOM + Trivy adds ~60–120s per image with caching. DAST is nightly against staging. Net effect on median CI time is usually +2–4 minutes, which is acceptable if you’re blocking only on Critical/High early on.
- Open source or commercial tools?
- Start open source: Semgrep, gitleaks, Trivy/Grype, Syft, Checkov, Kyverno. If you need portfolio views, SSO, or developer assignment, layer Snyk/Prisma/FOSSA later. The gate logic lives in your pipeline and cluster, not the vendor UI.
- How do we manage false positives?
- Keep rulesets tight (`p/ci` in Semgrep, tuned policies in Checkov), suppress noisy checks in-repo, and enforce waiver expirations. Track false positive rate; if it’s over 10%, fix rules before adding new gates.
- We’re a monorepo with polyglot services — does this still work?
- Yes. Run language-specific SAST jobs by path filters, generate one SBOM per image, and report at the service label. Use a shared pipeline template (GitHub composite actions, GitLab includes, Jenkins shared library) so teams don’t hand-roll configs.
- What about AI-generated code and “vibe coding”?
- Treat it like any other risky input: SAST on PR, secrets scan, dependency pinning, and mandatory reviews on critical paths. We’ve had to clean up Copilot-spawned Dockerfiles that ran as root and referenced `latest`. The gates above catch that before it hits prod.
Ready to modernize your codebase?
Let GitPlumbers help you transform AI-generated chaos into clean, scalable applications.
