Security Scanning in CI/CD That Engineers Don’t Hate: A Step‑By‑Step Playbook
A pragmatic, battle-tested way to wire SAST, SCA, IaC, container, SBOM, and DAST into your pipeline—without grinding deploys to a halt.
If it’s not automated, it didn’t happen. (Every on-call SRE, after a 2 a.m. page.)
The Friday 4 p.m. Deploy That Taught Me to Gate on Risk
A unicorn fintech asked us why their EKS cluster kept bleeding secrets. Root cause: a well-meaning engineer copy-pasted AI-generated code that logged JWTs at debug and pushed a PR that sailed through because “tests passed.” Dependency scan? Ran nightly. SAST? Only on main. By Monday, Incident Response was in full cosplay.
I’ve seen this movie across banks, SaaS, and healthtech. The fix isn’t a $500k platform. It’s putting the right scanners at the right checkpoints and gating on risk, not noise. Here’s the playbook we actually ship at GitPlumbers.
What to Scan, Where in the Pipeline
Think in checkpoints:
- Pre-commit/PR: fast feedback, low false positives
  - SAST: semgrep or codeql (language-dependent)
  - Secrets: gitleaks (fast)
  - IaC: checkov/tfsec on changed files
- Build: deeper scans with baselines
  - SBOM: syft → scan with trivy or grype
  - SCA: deps via trivy fs/snyk per language
- Container image:
  - Image scan with trivy image or grype
  - Sign with cosign, verify in admission
- Deploy gate:
  - Policy enforcement with OPA (conftest) and SLSA provenance checks
- Nightly/Off-path:
  - DAST: OWASP ZAP baseline against staging
  - Full CodeQL/Sonar + long-running fuzz on critical services
Rule of thumb: add <8 minutes to the PR path; move heavy scans off the critical path.
A Tooling Baseline That Works (Open-Source First)
- SAST: Semgrep (fast, great rulesets) and/or CodeQL (deeper, slower). For JVM, consider SonarQube LTS.
- Secrets: Gitleaks (better signal than trufflehog v2; both are fine).
- SCA & Containers: Trivy (files, images, and SBOM scan); Grype is a solid alternative.
- SBOM: Syft to generate CycloneDX or SPDX.
- IaC: Checkov or tfsec for Terraform/K8s/CloudFormation.
- DAST: OWASP ZAP baseline for PRs/nightly.
- Policy: OPA with conftest for deploy-time gates.
- Signing: cosign + Sigstore keyless where possible.
- Supply chain posture: OpenSSF Scorecards, SLSA provenance in build.
Versions that haven’t burned us recently:
- semgrep ≥ 1.63, returntocorp/semgrep-action@v1
- aquasecurity/trivy-action@0.20.0
- anchore/syft ≥ 1.1.0, anchore/grype ≥ 0.77.0
- gitleaks/gitleaks-action@v2
- bridgecrewio/checkov ≥ 3.2
- open-policy-agent/conftest ≥ 0.51
- sigstore/cosign ≥ 2.2
Reference Implementation: GitHub Actions YAML You Can Paste
Keep PR latency low and gate on new high/critical findings. Upload SARIF so devs get inline annotations.
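name: ci-security
on:
  pull_request:
  push:
    branches: [ main ]
jobs:
  sast-and-secrets:
    runs-on: ubuntu-22.04
    permissions:
      security-events: write
      contents: read
    steps:
      - uses: actions/checkout@v4
        with: { fetch-depth: 0 }
      # Upstream has deprecated semgrep-action in favor of running `semgrep ci`
      # in the semgrep/semgrep container; check the Semgrep docs before copying.
      - name: Semgrep SAST (PR)
        uses: returntocorp/semgrep-action@v1
        with:
          config: p/ci
          generateSarif: true
          sarifFile: semgrep.sarif
          auditOn: pull_request
          baselineRef: origin/main
      - name: Upload Semgrep SARIF
        if: always()   # upload results even when the scan fails the job
        uses: github/codeql-action/upload-sarif@v3
        with: { sarif_file: semgrep.sarif }
      # gitleaks-action@v2 has no `args` input, so run the pinned gitleaks CLI image directly.
      - name: Gitleaks (secrets)
        run: |
          docker run --rm --user "$(id -u):$(id -g)" -v "$PWD:/repo" zricethezav/gitleaks:v8.18.2 \
            detect --source /repo --no-git -v \
            --report-format sarif --report-path /repo/gitleaks.sarif
      - name: Upload Gitleaks SARIF
        if: always()
        uses: github/codeql-action/upload-sarif@v3
        with: { sarif_file: gitleaks.sarif }
  sbom-and-sca:
    runs-on: ubuntu-22.04
    # No `needs:` here, so this job runs in parallel with SAST/secrets.
    permissions:
      security-events: write
      contents: read
    steps:
      - uses: actions/checkout@v4
      - name: Install Syft
        run: |
          curl -sSfL https://raw.githubusercontent.com/anchore/syft/main/install.sh | sudo sh -s -- -b /usr/local/bin v1.1.0
          syft version
      - name: Generate CycloneDX SBOM
        run: syft dir:. -o cyclonedx-json > sbom.cdx.json
      # Trivy fails on every fixable HIGH/CRITICAL it sees; "new only" needs a baseline diff.
      - name: Trivy scan SBOM (fail on fixable highs)
        uses: aquasecurity/trivy-action@0.20.0
        with:
          scan-type: 'sbom'
          scan-ref: 'sbom.cdx.json'
          severity: 'CRITICAL,HIGH'
          ignore-unfixed: true
          exit-code: '1'
          format: 'sarif'
          output: 'trivy-sbom.sarif'
          limit-severities-for-sarif: true   # otherwise SARIF output ignores `severity`
      - name: Upload Trivy SARIF
        if: always()
        uses: github/codeql-action/upload-sarif@v3
        with: { sarif_file: trivy-sbom.sarif }
  image-scan-sign:
    runs-on: ubuntu-22.04
    needs: [sbom-and-sca]
    permissions:
      contents: read
      packages: write   # push to GHCR
    env:
      # GHCR requires lowercase names; lowercase the repository if yours has capitals.
      IMAGE: ghcr.io/${{ github.repository }}:${{ github.sha }}
    steps:
      - uses: actions/checkout@v4
      - name: Build image
        run: docker build -t "$IMAGE" .
      - name: Trivy image scan
        uses: aquasecurity/trivy-action@0.20.0
        with:
          image-ref: ${{ env.IMAGE }}
          severity: 'CRITICAL,HIGH'
          ignore-unfixed: true
          exit-code: '1'
      - name: Push image
        run: |
          echo "${{ secrets.GITHUB_TOKEN }}" | docker login ghcr.io -u "${{ github.actor }}" --password-stdin
          docker push "$IMAGE"
      - uses: sigstore/cosign-installer@v3
      # cosign stores the signature in the registry, so push first and sign the digest, not the tag.
      - name: Sign image with cosign
        env:
          COSIGN_KEY: ${{ secrets.COSIGN_KEY }}
          COSIGN_PASSWORD: ${{ secrets.COSIGN_PASSWORD }}
        run: |
          DIGEST_REF=$(docker inspect --format '{{index .RepoDigests 0}}' "$IMAGE")
          echo "$COSIGN_KEY" > cosign.key
          cosign sign --yes --key cosign.key "$DIGEST_REF"
          rm -f cosign.key
  iac-policy:
    runs-on: ubuntu-22.04
    steps:
      - uses: actions/checkout@v4
      - name: Checkov IaC
        run: |
          pipx install checkov
          # --soft-fail is a bare flag: report-only on PRs, hard fail on main.
          SOFT_FAIL=""
          if [ "${{ github.event_name }}" = "pull_request" ]; then SOFT_FAIL="--soft-fail"; fi
          checkov -d infra/ --framework terraform kubernetes --compact $SOFT_FAIL
Nightly job for DAST (off the critical path):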
name: nightly-dast
on:
  schedule: [ { cron: '0 3 * * *' } ]
jobs:
  zap:
    runs-on: ubuntu-22.04
    steps:
      - uses: actions/checkout@v4   # makes the rules file available to the action
      - name: ZAP Baseline Scan
        uses: zaproxy/action-baseline@v0.12.0
        with:
          target: https://staging.example.com
          rules_file_name: '.zap-baseline-rules.tsv' # tune false positives
          fail_action: true
GitLab/Jenkins folks: the same checkpoints apply. In GitLab CI, use the Semgrep and Trivy templates; in Jenkins, isolate scanners in parallel stages and cache vuln DBs (mount /root/.cache/trivy).
Gating on Risk: Policy as Code and Expiring Waivers
Block the stuff that matters; everywhere else, warn and create a ticket.
- Fail PRs on new findings with severity >= high (CVSS ≥ 7), or any secret detected.
- Soft-fail on existing known issues; track burn-down.
- Time-bound waivers: owner, reason, expiry (e.g., 30 days). Store them in the repo under security_waivers.yaml and validate with OPA.
- CODEOWNERS must approve any waiver touching their area.
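Before OPA ever sees the waivers, a pre-merge lint can reject malformed ones. A minimal sketch, assuming each entry of security_waivers.yaml has already been parsed (for example with PyYAML, not shown) into a dict with `id` and `expires` (the fields the Rego policy reads) plus hypothetical `owner` and `reason` fields, and that `expires` is an RFC 3339 timestamp with a timezone:

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Optional

MAX_WAIVER_DAYS = 30  # assumption: mirrors the 30-day example above
REQUIRED_FIELDS = ("id", "owner", "reason", "expires")  # owner/reason are hypothetical field names


def parse_rfc3339(ts: str) -> datetime:
    # datetime.fromisoformat() only accepts a trailing "Z" from Python 3.11 on.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def waiver_problems(waivers: List[Dict[str, str]], now: Optional[datetime] = None) -> List[str]:
    """Return one message per problem; an empty list means every waiver is valid."""
    now = now or datetime.now(timezone.utc)
    problems = []
    for i, w in enumerate(waivers):
        missing = [field for field in REQUIRED_FIELDS if not w.get(field)]
        if missing:
            problems.append(f"waiver #{i}: missing {', '.join(missing)}")
            continue
        expires = parse_rfc3339(w["expires"])
        if expires <= now:
            problems.append(f"{w['id']}: expired on {w['expires']}")
        elif expires - now > timedelta(days=MAX_WAIVER_DAYS):
            # Long expiries are how "temporary" waivers become permanent debt.
            problems.append(f"{w['id']}: expiry more than {MAX_WAIVER_DAYS} days out")
    return problems
```

Failing the job whenever `waiver_problems(...)` is non-empty keeps the Rego side simple: it only has to check expiry at gate time, while ownership and length limits are enforced when the waiver is proposed.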
Example conftest Rego to enforce waivers and severities:
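package ci.policy

# rego.v1 enables `in`, `if`, and `contains`, and works on both OPA 0.59+ and OPA 1.x.
import rego.v1

# New CRITICAL findings always block; no waiver can cover them.
violation contains msg if {
	some f in input.findings
	f.new == true
	f.severity == "CRITICAL"
	msg := sprintf("New critical finding: %s", [f.id])
}

# New HIGH findings block unless an unexpired waiver covers them.
violation contains msg if {
	some f in input.findings
	f.severity == "HIGH"
	f.new == true
	not waived(f.id)
	msg := sprintf("New high finding without waiver: %s", [f.id])
}

# A waiver only counts until its RFC 3339 expiry timestamp.
waived(id) if {
	some w in input.waivers
	w.id == id
	time.now_ns() < time.parse_rfc3339_ns(w.expires)
}
Gate in CI: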
conftest test results.json --policy policy/ --namespace ci.policy --output table
Here, results.json is normalized Trivy/Semgrep output plus a waivers array. conftest only evaluates package main by default, hence --namespace. Keep it boring and deterministic.
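One way to produce that results.json, sketched under assumptions: Trivy is run twice with `--format json` (once on the PR's SBOM, once on a baseline SBOM from origin/main; file names are up to you), and a finding's identity is a hypothetical `VulnerabilityID:PkgName` key. Trivy's JSON report lists findings under `Results[].Vulnerabilities[]`. Semgrep output would need its own mapping, which is not shown.

```python
import json
from typing import Dict, List


def trivy_findings(report: dict) -> Dict[str, str]:
    """Map a finding key (VulnerabilityID:PkgName) to its severity for a Trivy JSON report."""
    findings = {}
    for result in report.get("Results") or []:
        # Targets with no vulnerabilities may omit the list or set it to null.
        for vuln in result.get("Vulnerabilities") or []:
            key = f"{vuln['VulnerabilityID']}:{vuln['PkgName']}"
            findings[key] = vuln["Severity"]
    return findings


def build_conftest_input(current: dict, baseline: dict, waivers: List[dict]) -> dict:
    """Shape the input the Rego policy expects: findings[] with a `new` flag, plus waivers[]."""
    base = trivy_findings(baseline)
    findings = [
        {"id": key, "severity": sev, "new": key not in base}
        for key, sev in sorted(trivy_findings(current).items())
    ]
    return {"findings": findings, "waivers": waivers}


def write_results(path: str, data: dict) -> None:
    """Write deterministic JSON (sorted findings, stable indentation) for conftest."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(data, fh, indent=2)
```

Because the policy matches waivers on `f.id`, waiver entries must use the same key format as the findings (here `CVE-…:package`). Anything already present on main is marked `"new": false`, so it shows up in the report but never trips the gate.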
Performance Tricks So Devs Don’t Revolt
- Cache vuln DBs:
  - Trivy: mount the cache at ~/.cache/trivy; pre-warm it in a job that runs daily.
- Parallelize: run SAST, secrets, and IaC in separate jobs.
- Scope scans: only changed paths on PRs (git diff --name-only "$BASE_SHA"...HEAD).
- Baseline comparisons: Semgrep baselineRef; Trivy SBOM scan of the current branch vs origin/main.
- Pin versions and base images: reduce churn from upstream CVE noise.
- Move heavy scans off the PR path: full CodeQL, ZAP active scans, fuzzing.
Keep the time added to the PR path under 5–8 minutes. Exceed that and you’ll get shadow deploys.
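The "scope scans" trick needs a small router that turns the changed-file list into per-scanner targets. A minimal sketch, assuming the input is the output of `git diff --name-only "$BASE_SHA"...HEAD` split into lines; the suffix table, lockfile names, and the `trivy-fs` label are hypothetical and should be adjusted to your stack:

```python
from pathlib import PurePosixPath
from typing import Dict, List, Set

# Hypothetical routing table: which file suffixes should trigger which scanner.
SCANNER_SUFFIXES = {
    "checkov": {".tf", ".yaml", ".yml"},
    "semgrep": {".go", ".py", ".ts", ".js", ".java"},
}
LOCKFILES = {"go.sum", "package-lock.json", "poetry.lock", "requirements.txt"}


def plan_pr_scans(changed_files: List[str]) -> Dict[str, List[str]]:
    """Map each scanner to the sorted list of directories the PR touched."""
    plan: Dict[str, Set[str]] = {}
    for name in changed_files:
        path = PurePosixPath(name)
        tools = [tool for tool, exts in SCANNER_SUFFIXES.items() if path.suffix in exts]
        if path.name in LOCKFILES:
            tools.append("trivy-fs")  # dependency changes trigger an SCA scan of that directory
        for tool in tools:
            plan.setdefault(tool, set()).add(str(path.parent))
    return {tool: sorted(dirs) for tool, dirs in plan.items()}
```

Scanning directories rather than single files keeps module-level context (Terraform modules, Go packages) intact. Secrets scanning is the exception: leave Gitleaks on the full diff, since a leaked key is a blocker wherever it lands.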
Metrics That Actually Drive Behavior
Stop bragging about “10k vulnerabilities found.” Track:
- MTTP (Mean Time to Patch) criticals: target < 72h for internet-exposed services.
- New criticals per PR: goal is zero; alert if >0.
- False-positive rate: < 10% or engineers stop trusting the tools.
- Exception debt: count of active waivers and their median age; SLO: < 30 days.
- Pipeline added time: p95 added latency per PR; SLO: < 8 minutes.
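These metrics are simple aggregates. A minimal sketch, assuming you already export findings as dicts with hypothetical `severity`, `detected_at`, and `fixed_at` (datetime or None) fields, and collect per-PR added latency in seconds; p95 here uses the nearest-rank method, so it always reports a latency that actually occurred:

```python
from datetime import datetime, timedelta
from statistics import mean
from typing import List, Optional


def hours_between(start: datetime, end: datetime) -> float:
    return (end - start) / timedelta(hours=1)


def mttp_hours(findings: List[dict]) -> Optional[float]:
    """Mean hours from detection to fix, over CRITICAL findings that have been fixed."""
    durations = [
        hours_between(f["detected_at"], f["fixed_at"])
        for f in findings
        if f["severity"] == "CRITICAL" and f.get("fixed_at") is not None
    ]
    return mean(durations) if durations else None


def p95(values: List[float]) -> float:
    """Nearest-rank 95th percentile: no interpolation, always an observed value."""
    if not values:
        raise ValueError("p95 of an empty sample is undefined")
    ordered = sorted(values)
    rank = (95 * len(ordered) + 99) // 100  # integer ceil(0.95 * n), 1-based
    return ordered[rank - 1]


def false_positive_rate(triaged: int, false_positives: int) -> float:
    """Share of triaged findings dismissed as false positives (0.0 if none triaged)."""
    return false_positives / triaged if triaged else 0.0
```

Note that open criticals are excluded from MTTP by design; pair it with a count of open criticals older than 72h so a backlog cannot hide behind a good average.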
How to implement quickly:
- Export SARIF to GitHub Security; use the API to compute deltas.
- Push a summary to Prometheus via pushgateway:
cat <<EOF | curl -s --data-binary @- http://pushgateway:9091/metrics/job/ci_security/instance/${GITHUB_RUN_ID}
ci_new_criticals ${NEW_CRITICALS}
ci_pipeline_added_seconds ${ADDED_SECONDS}
EOF
- Grafana dashboard with panels: “New Criticals per PR (7d)”, “MTTP by repo”, “Waivers expiring soon”.
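If you would rather compute deltas in the pipeline than through the GitHub API, you can diff two SARIF files directly. A minimal sketch, assuming one SARIF log from the PR and one from main; it keys each result on `ruleId` plus `locations[].physicalLocation.artifactLocation.uri` and deliberately ignores line numbers, which shift between commits. This is coarser than GitHub's own matching (which uses `partialFingerprints`): a second hit of the same rule in the same file is not counted as new.

```python
from typing import Dict, Set, Tuple

Key = Tuple[str, str]  # (ruleId, file URI)


def sarif_keys(sarif: dict) -> Set[Key]:
    """Collect (ruleId, file URI) pairs from every run in a SARIF log."""
    keys: Set[Key] = set()
    for run in sarif.get("runs", []):
        for result in run.get("results", []):
            rule = result.get("ruleId", "<no-rule>")
            # Results without locations still count once, with an empty URI.
            for loc in result.get("locations") or [{}]:
                uri = loc.get("physicalLocation", {}).get("artifactLocation", {}).get("uri", "")
                keys.add((rule, uri))
    return keys


def sarif_delta(baseline: dict, current: dict) -> Dict[str, Set[Key]]:
    """Findings introduced by `current` and findings it fixed, relative to `baseline`."""
    before, after = sarif_keys(baseline), sarif_keys(current)
    return {"new": after - before, "fixed": before - after}
```

`len(sarif_delta(main_log, pr_log)["new"])`, filtered to your critical rules, is one way to fill NEW_CRITICALS before pushing to the Pushgateway.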
Rollout Plan That Won’t Trigger Mutiny
- Week 1–2 (Audit mode): add scanners with soft-fail; publish dashboards; triage the top 20 issues.
- Week 3 (Gating on new highs): fail PRs on new HIGH/CRITICAL + any secrets; enforce IaC on changed files.
- Week 4–5 (Expand): add container image gates and cosign verification; require an SBOM artifact.
- Week 6 (DAST nightly): wire ticket creation to Jira/Linear; tag security and the service team.
- Ongoing: monthly rule tuning; retire stale waivers; rotate base images.
Have a break-glass path (a SECURITY_BYPASS env var) that requires approval and auto-creates an incident. Use it sparingly; audit it.
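The break-glass rule can be enforced by the gate itself. A minimal sketch, assuming `SECURITY_BYPASS=1` requests the bypass and two hypothetical variables, `SECURITY_BYPASS_APPROVER` and `SECURITY_BYPASS_INCIDENT`, carry the approver and incident ID. An environment variable is not proof of approval; real sign-off should come from something like a GitHub environment with required reviewers. This only refuses a bypass that has no recorded approver and incident.

```python
from typing import Mapping, Tuple


def bypass_decision(env: Mapping[str, str], gate_failed: bool) -> Tuple[bool, str]:
    """Return (may the pipeline proceed?, audit message to log or attach to the incident)."""
    if not gate_failed:
        return True, "security gate passed"
    if env.get("SECURITY_BYPASS") != "1":
        return False, "security gate failed; no bypass requested"
    # Hypothetical variable names: adapt to however your approvals are recorded.
    approver = env.get("SECURITY_BYPASS_APPROVER", "").strip()
    incident = env.get("SECURITY_BYPASS_INCIDENT", "").strip()
    if not approver or not incident:
        return False, "bypass requested without approver and incident id"
    return True, f"BYPASS by {approver}, incident {incident}"
```

Call it as `bypass_decision(os.environ, gate_failed=...)` and always emit the message, including on success, so every bypass leaves an audit trail.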
Don’t Ignore Supply Chain: Signing and Admission
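- Sign every image by digest (cosign 2.x no longer needs COSIGN_EXPERIMENTAL):
cosign sign --yes --key "$COSIGN_KEY" "$IMAGE"
cosign verify --key "$COSIGN_PUB" "$IMAGE"
Here $IMAGE should be a digest reference (registry/repo@sha256:<digest>), not a tag.
- Verify at cluster admission with Sigstore’s policy-controller or Kyverno: only allow signed images from trusted registries, pinned by digest rather than tag.
- Emit SBOMs with images and store them; scan them periodically as CVE data updates.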
- Run OpenSSF Scorecards weekly on repos; fix high-risk findings (unpinned actions, no branch protection).
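Admission control is the enforcement point, but the "trusted registries and digests, not tags" rule is cheap to lint earlier, at PR time, on image references pulled from your manifests. A minimal sketch; the allow-list prefix `ghcr.io/your-org` is hypothetical, and unlike policy-controller or Kyverno this does not verify signatures:

```python
import re
from typing import List

TRUSTED_PREFIXES = {"ghcr.io/your-org"}  # hypothetical allow-list of registry/namespace prefixes
DIGEST_RE = re.compile(r"@sha256:[0-9a-f]{64}$")


def image_ref_problems(ref: str) -> List[str]:
    """Return policy problems for one image reference; an empty list means it passes."""
    problems = []
    if not DIGEST_RE.search(ref):
        problems.append("not pinned by sha256 digest")
    # Require the trailing "/" so "ghcr.io/your-org-evil/..." does not match "ghcr.io/your-org".
    if not any(ref.startswith(prefix + "/") for prefix in TRUSTED_PREFIXES):
        problems.append("registry not in allow-list")
    return problems
```

Catching a tag-only reference in review is friendlier than having the admission controller reject the rollout at deploy time.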
Results You Can Expect (And the Gotchas)
What we’ve seen after 60–90 days:
- 90% reduction in new criticals making it to main.
- MTTP for criticals drops from weeks to <72h.
- Pipeline time increase of 4–7 minutes p95 on PRs.
Common pitfalls:
- Turning on every rulepack on day one → revolt. Start lean.
- Ignoring exceptions hygiene → waivers become permanent debt.
- No owner for remediation → findings rot. Use CODEOWNERS + auto-assign.
- AI-generated “vibe coding” sneaks in insecure patterns. Scanners catch many of them; we’ve done plenty of vibe-code cleanup and AI code refactoring after incidents, and prevention is cheaper.
If you want templates tuned for your stack (Go monorepo on Bazel? Polyglot Node/Java/Python with GitLab? Jenkins + ArgoCD? Been there.), GitPlumbers can help you wire this up without killing velocity.
Key takeaways
- Place scanners at the right checkpoints: pre-commit/PR, build, image, IaC, SBOM, DAST, and deploy gates.
- Gate on risk, not volume: fail on new findings with CVSS ≥7, warn otherwise; escalate over time.
- Use open-source first (Semgrep, Trivy, Checkov, Gitleaks, Syft) and add commercial where it makes sense.
- Cache and parallelize to keep added time under 5–8 minutes per pipeline; run heavy scans nightly.
- Track MTTP (mean time to patch), new-critical-per-PR, and false-positive rate—not vanity counts.
- Create a time-bound exception workflow with CODEOWNERS + OPA policies so waivers don’t become forever debt.
- Sign and verify artifacts with cosign; ship an SBOM and scan it at each stage.
Implementation checklist
- Define severity thresholds and gating rules (CVSS ≥7 blocks merges).
- Add PR-time SAST and secrets scanning; soft-fail for two weeks while you work down the backlog.
- Generate and scan SBOMs (Syft + Trivy) in build; fail on new criticals only.
- Scan containers and base images; pin digests and sign with cosign.
- Add IaC checks (Checkov/tfsec) and enforce on new violations.
- Run ZAP baseline DAST nightly against staging; open tickets automatically.
- Instrument metrics: MTTP, new criticals per PR, false-positive rate, pipeline added time.
- Stand up exception workflow with expiring waivers and owner approvals.
Questions we hear from teams
- How do I keep false positives from tanking developer trust?
- Start with lean rulepacks (e.g., Semgrep `p/ci`), run in soft-fail for 1–2 weeks, and tune. Track false-positive rate and drop rules over 10%. Baseline against `main` so you only gate on new issues.
- Is CodeQL worth the runtime hit?
  - For critical repos, yes. Run lightweight SAST (Semgrep) on PRs and full CodeQL nightly or on merges to `main`. Use language packs only where they matter; there is no need to scan Ruby in a Go service.
- Open-source or commercial scanners?
- Start open-source (Semgrep, Trivy, Checkov, Gitleaks, Syft). Add commercial (Snyk, Sonar, Prisma) when you need enterprise policies, ticketing depth, or specific compliance reporting.
- What about monorepos?
- Scope scans to changed paths and use a matrix per language. Generate per-directory SBOMs to avoid 100MB artifacts. Cache per-tool per-language to keep runtime down.
- How do I enforce signed images in Kubernetes?
- Install Sigstore’s policy-controller or Kyverno. Write a policy that only admits images signed by your key and pinned by digest. Verify `cosign` signatures and reject tag-only images.
