Security-compliance · Oct 23, 2025 · 10 minute read

The IAM Architecture That Won’t Collapse Under Real-World Complexity

Translating policy into guardrails, checks, and automated proofs—without grinding delivery to a halt.

Alex Porter

Principal Engineer, GitPlumbers

20 years shipping and fixing distributed systems. Ex-consultant to Fortune 500 and high-growth startups; led IAM and platform programs through PCI, SOX, and HIPAA audits. Works with teams to turn security from roadblock into paved road.



The scene you’ve lived: identity sprawl meets audit season

Two weeks before a PCI audit, a fintech we worked with had four IdPs (Okta, Azure AD, Google, and a rogue Keycloak), three generations of AWS accounts, and engineers sharing read-only prod creds because the access request system took 5 days. Classic. I’ve seen this fail repeatedly: policy docs in Confluence, IAM in Terraform (mostly), ad-hoc break-glass, and zero automated proof of anything. Auditors don’t care about your intent; they care about repeatable controls and evidence.

Here’s what actually works in complex orgs with M&A baggage and regulated data: treat IAM as a product. Model the org and its data, codify policy as guardrails and checks, generate proofs by default, and keep engineers moving with JIT access and GitOps workflows.

What good looks like (and how you know you’re getting there)

Single source of identity truth: People and groups live in Okta/Azure AD; service/workload identities live in cloud-native systems (AWS IAM/GCP Workload Identity/Azure Managed Identities) or SPIFFE/SPIRE.
Federated authN everywhere: OIDC/SAML to SaaS; OIDC to cloud from CI; no long-lived keys.
Policy as code: OPA/Rego or Cedar rules in version control; CI blocks non-compliant changes; prod proves compliance with logs/attestations.
Guardrails over gates: Org-level deny lists and permissions boundaries prevent high-risk moves; app teams self-serve inside the lanes.
JIT, time-bound access: PIM or custom workflows grant temporary, MFA-gated elevation; approvals logged to a system of record.
Evidence on tap: Decision logs, CloudTrail/Audit Logs, and IaC provenance stitched into an audit bundle. Audits become exports, not archaeology.

KPIs that matter:

Lead time for access requests: target minutes, not days.
Percent of identities with least-privilege verified by policy checks: >95%.
Zero long-lived human access keys; zero shared accounts.
Audit evidence export time: under 1 hour.
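Each of these KPIs reduces to simple arithmetic over data you likely already export. A minimal Python sketch, assuming hypothetical record shapes (an `AccessRequest` with request and grant timestamps, and one pass/fail flag per identity from your policy checks); the thresholds are the targets listed above:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median


@dataclass
class AccessRequest:
    """One fulfilled access request (hypothetical export from your request system)."""
    requested_at: datetime
    granted_at: datetime


def median_lead_time(requests: list[AccessRequest]) -> timedelta:
    """Median request-to-grant time; the target is minutes, not days."""
    return median(r.granted_at - r.requested_at for r in requests)


def least_privilege_coverage(verified: list[bool]) -> float:
    """Percent of identities whose least-privilege policy check passed."""
    return 100.0 * sum(verified) / len(verified) if verified else 0.0


def kpi_report(requests, verified, long_lived_human_keys: int) -> dict:
    coverage = least_privilege_coverage(verified)
    return {
        "median_access_lead_time": median_lead_time(requests),
        "least_privilege_coverage_pct": coverage,
        "meets_coverage_target": coverage > 95.0,
        "zero_long_lived_human_keys": long_lived_human_keys == 0,
    }


# Example: three requests granted after 4, 7, and 30 minutes; 97 of 100 identities verified
t0 = datetime(2025, 10, 1, 9, 0)
requests = [AccessRequest(t0, t0 + timedelta(minutes=m)) for m in (4, 7, 30)]
report = kpi_report(requests, [True] * 97 + [False] * 3, long_lived_human_keys=0)
print(report["median_access_lead_time"])  # 0:07:00
```

The median is used for lead time so that one stuck request doesn't hide an otherwise fast process; track the slow tail separately if it matters to you.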

Model identities, trust, and data before buying tools

Skip this and you’ll pave cow paths. Do it once, keep it current.

Inventory identities
- Humans: employees, contractors, auditors. Source: Okta or Azure AD with SCIM to downstream apps.
- Services: AWS IAM roles, GCP SAs, Azure SPNs; Kubernetes SA + projected service account tokens; SPIFFE IDs if you run SPIRE.
- Machines/robots: CI/CD, data pipelines, ETL tools.
Map trust boundaries
- Tenants/accounts/projects; VPCs/VNets; prod vs non-prod; regulated (PHI/PCI/PII) vs general.
- Identify control planes: GitHub/GitLab, Cloud providers, Kubernetes, SaaS with admin APIs.
Classify data
- P0 PHI/PCI; P1 PII; P2 internal; P3 public. Tag resources with data_classification via IaC. Enforce tags in CI and at runtime.
Choose a model
- Start RBAC for clarity; add ABAC for scale: group + attributes like team, env, data_classification, region.
- Plan for relationship-based access (Zanzibar-style) if you have complex sharing models, but don’t start there.
Decide the source of truth
- People, groups: IdP.
- App/service entitlements: code + policy repo.
- Environment ownership: GitOps repos and cloud org structure.

Write this down in a 1-page ADR. Revisit quarterly.
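The classification scheme and the ABAC attributes above translate directly into CI checks. A minimal Python sketch under stated assumptions: it reads `terraform show -json` output, the set of resource types that must carry a `data_classification` tag is hypothetical, and the subject/resource attribute names (`team`, `envs`, `clearance`) are illustrative rather than a fixed schema. In practice you would express the same rules in your policy engine; this only shows the logic.

```python
import json

# Classification scheme from the list above: P0 PHI/PCI, P1 PII, P2 internal, P3 public
ALLOWED_CLASSIFICATIONS = {"P0", "P1", "P2", "P3"}
# Hypothetical: the resource types your org requires to carry a classification tag
TAGGED_TYPES = {"aws_s3_bucket", "aws_db_instance", "aws_dynamodb_table"}


def missing_classification(plan: dict) -> list[str]:
    """Addresses of tagged resource types in a plan that lack a valid data_classification tag."""
    problems = []
    for rc in plan.get("resource_changes", []):
        change = rc["change"]
        if rc["type"] not in TAGGED_TYPES or change["actions"] == ["delete"]:
            continue
        after = change.get("after") or {}
        # tags_all (when known) also carries provider default_tags; explicit tags win here
        tags = {**(after.get("tags_all") or {}), **(after.get("tags") or {})}
        if tags.get("data_classification") not in ALLOWED_CLASSIFICATIONS:
            problems.append(rc["address"])
    return problems


def abac_allows(subject: dict, resource: dict) -> bool:
    """Coarse ABAC: same team, permitted env, and clearance at least as sensitive as the data.

    Lower number = more sensitive, so a "P1" clearance may read P1, P2, and P3 data.
    """
    return (
        subject["team"] == resource["team"]
        and resource["env"] in subject["envs"]
        and int(resource["data_classification"][1:]) >= int(subject["clearance"][1:])
    )


plan = json.loads("""{"resource_changes": [
  {"address": "aws_s3_bucket.ledger", "type": "aws_s3_bucket",
   "change": {"actions": ["create"], "after": {"tags": {"data_classification": "P0"}}}},
  {"address": "aws_s3_bucket.scratch", "type": "aws_s3_bucket",
   "change": {"actions": ["create"], "after": {"tags": {"owner": "data"}}}}
]}""")
print(missing_classification(plan))  # ['aws_s3_bucket.scratch']

analyst = {"team": "finance", "envs": {"prod"}, "clearance": "P1"}
print(abac_allows(analyst, {"team": "finance", "env": "prod", "data_classification": "P2"}))  # True
```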

Translate policy into guardrails, checks, and proofs

Policies are useless until they’re code. Use three layers:

Preventive guardrails (can’t do the wrong thing)
- AWS: Organizations SCP + IAM permissions_boundary.
- GCP: Organization Policies (e.g., restrict public IPs), IAM Conditions.
- Azure: Management Group Policy + Blueprints.
Detective checks (you did a thing; we validate)
- OPA/Rego with conftest against Terraform plans and Kubernetes manifests.
- Drift detectors: Cloud Custodian, Steampipe + mods.
Automated proofs (we can show it)
- Decision logs from OPA, CloudTrail/Audit Logs, and IaC provenance (SLSA/in-toto) stored immutably.
- Periodic evidence pack generation for audits.

Example: enforce permissions boundaries on every AWS IAM role with an OPA policy that runs in CI against the Terraform plan.

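package main

import rego.v1

# Fail any IAM role created or updated without a permissions boundary.
# conftest evaluates the "main" package by default (other packages need --namespace).
deny contains msg if {
  some rc in input.resource_changes
  rc.type == "aws_iam_role"
  rc.change.actions != ["delete"]
  # Unset attributes are null in plan JSON, and null is truthy in Rego, so compare explicitly
  object.get(rc.change.after, "permissions_boundary", null) == null
  # A boundary created in the same plan is unknown until apply and shows up in after_unknown
  not rc.change.after_unknown.permissions_boundary
  msg := sprintf("Role %s missing permissions_boundary", [rc.address])
}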

And wire it into GitHub Actions:

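name: iam-guardrails
on: [pull_request]
jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
        with:
          terraform_wrapper: false   # the wrapper adds extra stdout lines that break `terraform show -json > file`
      # Assumes backend/provider credentials for the plan are already available to the job
      - run: terraform init
      - run: terraform plan -out=tfplan.bin
      - run: terraform show -json tfplan.bin > tfplan.json
      - name: conftest
        run: docker run --rm -v "$PWD":/project openpolicyagent/conftest test tfplan.json --policy policy/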

If someone sneaks a role in without the boundary, the PR is blocked before it ever hits prod.
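To make the plan-JSON details concrete, here is the same check sketched in Python against a hand-written fragment of `terraform show -json` output, which also makes edge cases easy to unit-test. It covers three cases worth handling in the Rego rule too: deletes (no `after` state), unset boundaries (`null`), and boundaries whose ARN is unknown until apply (marked in `after_unknown`). The fragment and resource names are illustrative.

```python
import json


def roles_missing_boundary(plan: dict) -> list[str]:
    """Flag aws_iam_role resources created or updated without a permissions boundary."""
    violations = []
    for rc in plan.get("resource_changes", []):
        change = rc["change"]
        if rc["type"] != "aws_iam_role" or change["actions"] == ["delete"]:
            continue  # deletes have no "after" state to check
        after = change.get("after") or {}
        unknown = change.get("after_unknown") or {}
        # Unset attributes are null; a boundary created in the same plan is unknown until
        # apply and is marked true in after_unknown, so it is not a violation.
        if after.get("permissions_boundary") is None and not unknown.get("permissions_boundary"):
            violations.append(f"Role {rc['address']} missing permissions_boundary")
    return violations


# Hand-written plan fragment: only the fields the check reads are included
plan = json.loads("""{"resource_changes": [
  {"address": "aws_iam_role.svc", "type": "aws_iam_role",
   "change": {"actions": ["create"], "after": {"name": "svc-analytics"},
              "after_unknown": {"permissions_boundary": true}}},
  {"address": "aws_iam_role.legacy", "type": "aws_iam_role",
   "change": {"actions": ["update"],
              "after": {"name": "legacy-etl", "permissions_boundary": null},
              "after_unknown": {}}},
  {"address": "aws_iam_role.old", "type": "aws_iam_role",
   "change": {"actions": ["delete"], "after": null, "after_unknown": {}}}
]}""")
print(roles_missing_boundary(plan))  # ['Role aws_iam_role.legacy missing permissions_boundary']
```

Reporting `rc["address"]` rather than the role's `name` keeps the message useful even when the name is computed or set via `name_prefix`.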

Keep delivery fast: federate CI, use JIT, and kill static creds

The fastest way to crater velocity is tickets for credentials. The fix is modern federation and time-bound elevation.

CI/CD to cloud via OIDC
- GitHub example: configure aws-actions/configure-aws-credentials and lock the trust policy to repo and environment.

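# Job-level settings: OIDC needs id-token: write, and the job's environment must match the trust policy's "sub"
environment: prod
permissions: { id-token: write, contents: read }
steps:
  - uses: aws-actions/configure-aws-credentials@v4
    with:
      role-to-assume: arn:aws:iam::123456789012:role/ci-deploy-prod
      aws-region: us-east-1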

Trust policy on the role:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {"Federated": "arn:aws:iam::123456789012:oidc-provider/token.actions.githubusercontent.com"},
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {"token.actions.githubusercontent.com:aud": "sts.amazonaws.com"},
        "StringLike": {"token.actions.githubusercontent.com:sub": "repo:your-org/your-repo:environment:prod"}
      }
    }
  ]
}
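Why pin the environment in `sub`? GitHub's OIDC `sub` claim is `repo:ORG/REPO:environment:NAME` for jobs that reference an environment and `repo:ORG/REPO:pull_request` for pull_request-triggered runs, so a loose wildcard lets PR jobs assume the prod role. A minimal Python sketch that approximates IAM's `StringLike` (`*` and `?` wildcards, case-sensitive) and evaluates the two trust-policy conditions against hypothetical token claims; it is an illustration, not how STS is implemented:

```python
import re


def string_like(pattern: str, value: str) -> bool:
    """Approximate IAM StringLike: '*' matches any run of characters, '?' exactly one."""
    regex = "".join(".*" if c == "*" else "." if c == "?" else re.escape(c) for c in pattern)
    return re.fullmatch(regex, value, flags=re.DOTALL) is not None


def trust_allows(claims: dict, sub_pattern: str) -> bool:
    """Check a GitHub OIDC token's claims against the aud/sub conditions of the trust policy."""
    return claims.get("aud") == "sts.amazonaws.com" and string_like(sub_pattern, claims.get("sub", ""))


prod_deploy = {"aud": "sts.amazonaws.com", "sub": "repo:your-org/your-repo:environment:prod"}
pr_run = {"aud": "sts.amazonaws.com", "sub": "repo:your-org/your-repo:pull_request"}

pinned = "repo:your-org/your-repo:environment:prod"
loose = "repo:your-org/your-repo:*"
print(trust_allows(pr_run, pinned), trust_allows(pr_run, loose))  # False True
```

Pairing the pinned `sub` with GitHub environment protection rules (required reviewers) means the prod role can only be assumed by jobs that passed those rules.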

JIT access with PIM
- Azure AD PIM or Okta + custom workflow grants Admin for 1 hour, requires MFA, Slack approval, and tickets the request automatically.
- GCP: IAM Conditions on request.time for time-bound role bindings, or Privileged Access Manager for JIT grants.
Ephemeral human access
- SSH: use HashiCorp Boundary or Teleport with short-lived certs; no static bastion keys.
- Databases: IAM auth where possible (RDS/Aurora/GCP Cloud SQL).
Secrets
- Vault or cloud-native secrets; tie leases to identities; rotate aggressively.
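If you build the "custom workflow" route rather than using PIM, the grant step is where least-privilege rules are easiest to get subtly wrong. A minimal sketch of that step, assuming a hypothetical `request_elevation` function called once the approval arrives; it is not the PIM or Okta API, and the no-self-approval rule is our assumption. It refuses without MFA, refuses self-approval, and clamps any requested duration to the 1-hour window so expiry lives in the data rather than in someone's memory.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

MAX_ELEVATION = timedelta(hours=1)  # the 1-hour window described above


@dataclass(frozen=True)
class Grant:
    user: str
    role: str
    approver: str
    ticket: str
    starts_at: datetime
    expires_at: datetime

    def is_active(self, now: datetime) -> bool:
        return self.starts_at <= now < self.expires_at


def request_elevation(user: str, role: str, mfa_verified: bool, approver: str,
                      ticket: str, requested: timedelta, now: datetime) -> Grant:
    """Hypothetical workflow step: require MFA and a second person, cap the duration."""
    if not mfa_verified:
        raise PermissionError("MFA required for elevation")
    if approver == user:
        raise PermissionError("self-approval is not allowed")
    return Grant(user, role, approver, ticket, now, now + min(requested, MAX_ELEVATION))


now = datetime(2025, 10, 23, 14, 0, tzinfo=timezone.utc)
grant = request_elevation("ana", "Admin", True, "raj", "SEC-1234", timedelta(hours=4), now)
print(grant.expires_at - grant.starts_at)  # 1:00:00
```

Recording the approver and ticket on the grant itself is what makes each elevation provable later.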

Net effect: engineers push buttons, approvals are quick, and every elevation is provable.

Concrete guardrails for regulated data

You don’t need a thousand policies; you need a handful of sharp ones.

Deny risky data access at the org level (AWS SCP example; with a Principal added, the same statement also works as an S3 bucket policy)

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DenyPIIOutsideVpc",
      "Effect": "Deny",
      "Action": ["s3:GetObject", "s3:PutObject"],
      "Resource": "arn:aws:s3:::corp-pii-*/*",
      "Condition": {
        "StringNotEqualsIfExists": {"aws:SourceVpce": ["vpce-123", "vpce-456"]}
      }
    }
  ]
}
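The subtle part of that statement is the `...IfExists` operator: when a request carries no `aws:SourceVpce` key (for example, it arrives over the public internet rather than through a VPC endpoint), IAM treats the condition as true, so the Deny still applies. A minimal Python sketch of that evaluation for a single request, ignoring every other policy in play; it approximates IAM wildcard matching with `fnmatch` and is not an AWS API.

```python
from fnmatch import fnmatchcase

# Values copied from the DenyPIIOutsideVpc statement above
DENIED_ACTIONS = {"s3:getobject", "s3:putobject"}
DENIED_RESOURCE = "arn:aws:s3:::corp-pii-*/*"
ALLOWED_VPCES = {"vpce-123", "vpce-456"}


def pii_deny_applies(action: str, resource: str, context: dict) -> bool:
    """Approximate evaluation of the Deny statement for one request."""
    # IAM action names are case-insensitive
    if action.lower() not in DENIED_ACTIONS or not fnmatchcase(resource, DENIED_RESOURCE):
        return False
    vpce = context.get("aws:SourceVpce")
    if vpce is None:
        # ...IfExists: a missing key makes the condition true, so the Deny applies
        return True
    # StringNotEquals against a list is true only when the value matches none of the entries
    return vpce not in ALLOWED_VPCES


obj = "arn:aws:s3:::corp-pii-claims/2025/10/claims.csv"
print(pii_deny_applies("s3:GetObject", obj, {"aws:SourceVpce": "vpce-123"}))  # False
print(pii_deny_applies("s3:GetObject", obj, {}))  # True
```

The same model also shows what the statement does not cover: `s3:ListBucket` on the bucket ARN is untouched, so widen `Action` and `Resource` if listing PII bucket contents is sensitive too.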

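IAM permissions boundary that forbids all S3 actions on PII buckets. A boundary only caps what identity policies grant, so it needs an Allow statement; with a Deny alone, the role can do nothing.

resource "aws_iam_policy" "boundary" {
  name = "gp-permissions-boundary"
  policy = jsonencode({
    Version = "2012-10-17",
    Statement = [
      {
        # Upper bound only: identity policies still decide what is actually granted
        Sid      = "AllowWithinBoundary",
        Effect   = "Allow",
        Action   = ["*"],
        Resource = ["*"]
      },
      {
        Sid      = "DenyPiiBuckets",
        Effect   = "Deny",
        Action   = ["s3:*"],
        Resource = ["arn:aws:s3:::corp-pii-*", "arn:aws:s3:::corp-pii-*/*"]
      }
    ]
  })
}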

resource "aws_iam_role" "svc" {
  name                 = "svc-analytics"
  assume_role_policy   = data.aws_iam_policy_document.assume.json
  permissions_boundary = aws_iam_policy.boundary.arn
}
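A permissions boundary caps permissions; it never grants them. An action is allowed only when the identity policy allows it, the boundary allows it, and neither explicitly denies it. A simplified Python model of that intersection (no conditions, SCPs, session policies, or resource policies; `fnmatch` stands in for IAM wildcard matching) shows why a boundary needs an Allow statement alongside its Deny:

```python
from fnmatch import fnmatchcase
from typing import Optional


def _matches(stmt: dict, action: str, resource: str) -> bool:
    # IAM action names are case-insensitive; resource ARNs are matched case-sensitively here
    return any(fnmatchcase(action.lower(), a.lower()) for a in stmt["Action"]) and any(
        fnmatchcase(resource, r) for r in stmt["Resource"]
    )


def decide(policy: list[dict], action: str, resource: str) -> Optional[str]:
    """'Deny' (explicit), 'Allow', or None (implicit deny) for one policy; conditions ignored."""
    effects = {s["Effect"] for s in policy if _matches(s, action, resource)}
    if "Deny" in effects:
        return "Deny"
    return "Allow" if "Allow" in effects else None


def effective_allow(identity_policy, boundary, action: str, resource: str) -> bool:
    """Both the identity policy and the boundary must allow; an explicit Deny in either wins."""
    return (decide(identity_policy, action, resource) == "Allow"
            and decide(boundary, action, resource) == "Allow")


pii = ["arn:aws:s3:::corp-pii-*", "arn:aws:s3:::corp-pii-*/*"]
identity = [{"Effect": "Allow", "Action": ["s3:*"], "Resource": ["*"]}]
deny_only = [{"Effect": "Deny", "Action": ["s3:*"], "Resource": pii}]
allow_and_deny = [{"Effect": "Allow", "Action": ["*"], "Resource": ["*"]}] + deny_only

raw = "arn:aws:s3:::analytics-raw/events.parquet"
print(effective_allow(identity, deny_only, "s3:GetObject", raw))       # False: the boundary allows nothing
print(effective_allow(identity, allow_and_deny, "s3:GetObject", raw))  # True
```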

Policy-as-code for data-classified access (OPA/Rego)

package authz.pii

import rego.v1

default allow := false

# Only finance analysts with MFA and an active JIT window can read PII datasets.
# jit_window.start/end are assumed to be nanosecond timestamps, matching time.now_ns().
allow if {
  input.action == "read"
  input.resource.data_classification == "PII"
  input.subject.group == "finance-analysts"
  input.subject.mfa == true
  now := time.now_ns()
  now >= input.subject.jit_window.start
  now <= input.subject.jit_window.end
}

Prefer Cedar if you’re deep in AWS Verified Permissions; OPA/Rego if you want provider-agnostic control.

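// Assumes the caller passes context.now and the JIT window bounds as epoch-second Longs
permit (
  principal in Group::"finance-analysts",
  action == Action::"read",
  resource in Dataset::"pii"
) when {
  context.mfa == true && principal has jit_window &&
  context.now >= principal.jit_window.start &&
  context.now <= principal.jit_window.end
};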

Automated proofs: evidence or it didn’t happen

Auditors ask, “Show me where you enforce and how you know it’s working.” Build evidence as a byproduct.

Decision logs
- OPA decision logs shipped to S3/GCS/Blob with object lock; indexed by your SIEM.
- Include policy_id, input.hash, decision, user, timestamp.
Change provenance
- Signed commits and build provenance (SLSA/in-toto). Artifact and IaC digests attached to change requests.
Control conformance reports
- Nightly job evaluates fleet state against Rego policies (e.g., roles with wildcards, buckets without tags). Stores results and trends.
Audit bundles
- One-click exports: IaC repos + CI logs + OPA decisions + CloudTrail/Audit Logs + ticket links. We wire these together with a small Go service and a Makefile target.
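The decision-log fields and the export bundle above can be sketched in a few lines. This is a Python illustration, not the Go service mentioned above; `decision_record` and `evidence_manifest` are hypothetical names, and the field names follow the list above rather than OPA's built-in decision-log schema. The two ideas are hashing a canonical form of the input, so identical inputs always produce the same `input_hash`, and shipping a digest manifest with the bundle so tampering is detectable.

```python
import hashlib
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def decision_record(policy_id: str, input_doc: dict, decision: str, user: str,
                    now: datetime) -> dict:
    """Decision-log entry with the fields listed above (field names are our own convention)."""
    # Canonical JSON (sorted keys, no whitespace) so equal inputs always hash the same
    canonical = json.dumps(input_doc, sort_keys=True, separators=(",", ":")).encode()
    return {
        "policy_id": policy_id,
        "input_hash": hashlib.sha256(canonical).hexdigest(),
        "decision": decision,
        "user": user,
        "timestamp": now.isoformat(),
    }


def evidence_manifest(files: list[Path]) -> dict:
    """SHA-256 digest per evidence file, so an auditor can verify the bundle is unaltered."""
    return {p.name: hashlib.sha256(p.read_bytes()).hexdigest() for p in sorted(files)}


now = datetime(2025, 10, 23, 12, 0, tzinfo=timezone.utc)
record = decision_record("authz.pii", {"action": "read", "user": "ana"}, "deny", "ana", now)

with tempfile.TemporaryDirectory() as tmp:
    log = Path(tmp) / "decisions.jsonl"
    log.write_bytes((json.dumps(record) + "\n").encode())
    manifest = evidence_manifest([log])
print(manifest)
```

Storing the manifest next to the bundle in object-locked storage gives auditors a cheap integrity check without any special tooling.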

If you can’t export evidence in under an hour, you don’t have repeatable controls—you have heroics.

Rollout plan that won’t blow up your quarter

Pick two paved roads: (a) CI federation with OIDC and (b) permissions boundaries. Ship them org-wide in 2 sprints.
Stand up a policy repo: OPA/Rego starter pack, conftest wiring, sample tests. Block only on critical issues; warn on the rest.
JIT access pilot: Choose one high-sensitivity env. PIM, Slack approvals, 1-hour windows. Measure lead time drop.
Evidence pipeline: Enable decision logs, wire to SIEM, add weekly conformance report.
Decommission static creds: Track down long-lived keys; replace with federation. Enforce via guardrails after grace period.
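For tracking down long-lived keys on AWS, the IAM credential report (a CSV you can generate from the IAM console or API) is a convenient inventory. A minimal sketch that parses it, assuming the report's `access_key_N_active` and `access_key_N_last_rotated` columns with ISO 8601 timestamps; the sample rows and the 90-day grace period are illustrative. Every active key is listed, since the target is zero, and keys older than the grace period are flagged for the deny policy.

```python
import csv
import io
from datetime import datetime, timedelta, timezone


def long_lived_keys(report_csv: str, now: datetime, grace: timedelta) -> list[dict]:
    """Every active access key in an IAM credential report, flagged if older than `grace`."""
    findings = []
    for row in csv.DictReader(io.StringIO(report_csv)):
        for slot in ("1", "2"):
            if row.get(f"access_key_{slot}_active") != "true":
                continue
            rotated = row.get(f"access_key_{slot}_last_rotated", "N/A")
            if rotated in ("", "N/A"):
                age = None  # unknown age: treat as past grace to be safe
            else:
                age = now - datetime.fromisoformat(rotated.replace("Z", "+00:00"))
            findings.append({
                "user": row["user"],
                "key": f"access_key_{slot}",
                "age_days": None if age is None else age.days,
                "past_grace": age is None or age > grace,
            })
    return findings


report = """user,arn,access_key_1_active,access_key_1_last_rotated,access_key_2_active,access_key_2_last_rotated
ci-legacy,arn:aws:iam::123456789012:user/ci-legacy,true,2024-01-15T10:00:00+00:00,false,N/A
ana,arn:aws:iam::123456789012:user/ana,false,N/A,true,2025-10-01T09:00:00+00:00
"""
now = datetime(2025, 10, 23, tzinfo=timezone.utc)
for finding in long_lived_keys(report, now, grace=timedelta(days=90)):
    print(finding)
```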

What we’ve seen: 30–60 days to get from “ticket hell + audit dread” to “federated CI, boundaries, basic proofs.” Another 60–90 for org-wide JIT and evidence bundles. Velocity improves because engineers aren’t waiting on people for access.

What we’d do differently next time

Start with ABAC tagging discipline earlier; retrofitting data_classification is always painful.
Don’t try to unify every IdP on day one. Federate first, migrate later.
Train auditors by showing policy and decisions, not just screenshots. They adapt fast when the evidence is clean.
Publish SLOs: access lead time, evidence export time, and drift remediation MTTR. What gets measured gets maintained.

If you want a seasoned crew to build the paved roads and leave you owning them, that’s literally what GitPlumbers does.


Key takeaways

Model identities, trust, and data first—then choose tools. Avoid vendor-driven architectures.
Turn policy into code: preventive guardrails (SCPs/permissions boundaries), detective checks (OPA/Conftest), and automated proofs (decision logs, attestations).
Use ABAC for scale and JIT access via PIM to balance least-privilege with delivery speed.
Make CI/CD and infra the primary enforcement points—developers should feel velocity, not gates.
Measure what matters: lead time for access, drift in IAM, and auditability SLAs.

Implementation checklist

Inventory human, service, and third-party identities; map trust boundaries and data classifications.
Adopt an IdP as source of truth (Okta/AAD/Keycloak) and enforce SCIM for lifecycle management.
Implement cloud guardrails: AWS SCPs + permissions boundaries, GCP org policies, Azure management groups.
Codify IAM in `terraform` with OPA `conftest` checks in CI for every MR/PR.
Federate CI/CD to cloud via OIDC with tight trust policies; kill long-lived keys.
Roll out JIT access with PIM (AAD/Okta/GCP) and enforce time-bound, MFA-gated sessions.
Emit decision logs and compliance evidence automatically; store immutably and index in your SIEM.
Practice break-glass procedures quarterly; audit every use.

Questions we hear from teams

Do we need OPA if we’re all-in on AWS?

Not strictly. If you’re deep on AWS, Cedar with Verified Permissions plus SCPs/permissions boundaries can cover a lot. We still use OPA for cross-cloud and for checking Terraform/Kubernetes because it’s provider-agnostic and fits CI nicely.

How do we balance least-privilege with developer autonomy?

ABAC + JIT. Use attributes like team, env, and data_classification for coarse access, and grant time-bound elevation via PIM for sensitive actions. Put strong guardrails around the edges so teams can self-serve safely.

What’s the fastest path off long-lived keys?

Turn on OIDC federation for CI/CD first (GitHub → AWS/GCP/Azure). For humans, move to short-lived sessions via SSO and enforce via guardrails. Then rotate and delete remaining keys with a deadline and a deny policy after grace.

How do we prove compliance without a GRC tool?

Emit decision logs from policy engines, retain CloudTrail/Audit Logs with object lock, and generate periodic conformance reports. We bundle these with IaC provenance into an exportable evidence pack. Most auditors accept this if it’s consistent and complete.

What about break-glass?

Keep a minimal, MFA-enforced emergency role with session recording and tight monitoring. Practice quarterly. Every use opens an incident, captures context, and is reviewed in a blameless postmortem.
