Stop Saying “20% Time”: A Real Playbook for Innovation Without Blowing Your Roadmap
Innovation time that executives won’t cut, teams won’t resent, and customers won’t feel.
Innovation that survives QBRs looks boring on the calendar and exciting in the metrics.
The problem you’ve lived through
I watched a Fortune 100 try to copy Google’s “20% time.” Within a quarter, on-call pages doubled, SLOs drifted red, and the CFO’s office started sniffing around infra bills. Innovation didn’t fail—governance did. Unlabeled experiments leaked into prod, feature flags were an afterthought, and “learning” was a slide, not a metric.
At GitPlumbers, we’ve rehabbed this a dozen times. The pattern that works: treat exploration like product work. Capacity is explicit, WIP is limited, learning is measured, and blast radius is fenced. Here’s the playbook I’d put my name on.
Capacity that doesn’t lie: 90/10 baseline, scheduled 70/30 spikes
Forget blanket “20% time.” In most enterprises with on-call, vendor dependencies, and quarterly commitments, the math doesn’t close.
- Baseline: Start at 90/10 (delivery/exploration). Publish it, codify it in planning, and keep it stable for two quarters.
- Spikes: Pre-plan 70/30 sprints aligned to events (e.g., before re-arch OKRs, new product bets). Never ad-hoc.
- WIP limit: Max 2 innovation items per team. If you start a third, you kill or finish one.
- Time banking: Track exploration capacity like story points. If a sprint burns only 6% on exploration, the remaining 4% rolls forward—until the next review.
- Rotations: Assign a rotating “Explorer” per squad per sprint. That engineer protects context and unlocks blockers.
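The time-banking rule is easy to get wrong in spreadsheets, so here's a minimal sketch of the math (hypothetical numbers; `banked_capacity` is an illustrative helper, not a tool we ship):

```python
# Minimal sketch of exploration "time banking": unused exploration
# capacity rolls forward to the next sprint, until the next review resets it.

BASELINE = 0.10  # 10% of each sprint reserved for exploration

def banked_capacity(burn_per_sprint, baseline=BASELINE):
    """Given actual exploration burn per sprint (as fractions of capacity),
    return the allowance available each sprint including rolled-forward time."""
    bank = 0.0
    available = []
    for burned in burn_per_sprint:
        allowance = baseline + bank
        available.append(round(allowance, 4))
        bank = max(0.0, allowance - burned)  # whatever wasn't used rolls forward
    return available

# A sprint that burns only 6% leaves 4% in the bank for the next one.
print(banked_capacity([0.06, 0.10, 0.14]))  # [0.1, 0.14, 0.14]
```

The point of making this explicit: if the bank keeps growing, delivery is eating the exploration time and the debt should show up in the Portfolio Review.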
I’ve seen this keep a fintech’s MTTR stable while shipping three search prototypes in a quarter. They killed two by week 4 and scaled the third under a flag.
Rituals that make it real
Rituals remove ambiguity and stop “innovation theater.” Keep these light but relentless:
- Weekly 30-min Demo+Decision
- 10-min demos, 5-min decision: continue, pivot, or kill. No “carryover” without an explicit next learning goal.
- Required attendees: EM, PM, tech lead, ops/SRE. Optional: finance partner once a month.
- Biweekly Portfolio Review
- Review exploration WIP across teams. Rebalance capacity, identify dependencies, confirm kill candidates.
- Exploration Standup (15 min, twice weekly)
- Only active explorers and leads. Focus: blockers, data needed, risk.
- RFCs with a kill-switch owner
- Every exploration has an RFC with a named owner who can kill without committee.
Here’s the RFC template we use (paste into `docs/rfcs/xxxx-initiative.md`):

```markdown
# RFC: {Name}
- Owner: {Engineer}
- Sponsor: {EM/Director}
- Capacity: {N% for {sprints}}
- Hypothesis: {What do we believe?}
- First Signal: {What’s the earliest measurable sign it’s worth more time?}
- Guardrails: {Flags, canaries, sandbox, data policies}
- Kill Criteria: {If not X by date Y, we stop}
- Dependencies: {Teams/systems}
- Metrics: {Lead time to first signal, SLO impact, cost}
```

Guardrails in code and infra: limit blast radius by default
If your “experiment” changes prod behavior without a kill switch, it’s not exploration—it’s a forked release train.
- Feature flags default off (Unleash, LaunchDarkly, Flagsmith). Canary to 1–5% cohorts.
- Namespace fences with GitOps so experiments can’t touch the wrong clusters.
- Sandbox budgets so costs can’t run away.
- Auto-labeling and checks in CI to enforce the above.
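In application code, the flag gate plus a hard fallback is what actually caps the blast radius. A minimal dependency-free sketch of the pattern (a real service would ask Unleash or LaunchDarkly instead of an in-memory dict; `with_flag` and `FLAGS` are illustrative names):

```python
# Flag-gated exploration code: the experiment runs only when the flag is on,
# and any failure falls back to the stable path.
FLAGS = {"exploration.warm-cache": False}  # default off, always

def with_flag(flag, experiment, fallback):
    """Run the experimental path only when the flag is enabled; on any
    exception, fall back to the stable path so prod behavior is unchanged."""
    if not FLAGS.get(flag, False):
        return fallback()
    try:
        return experiment()
    except Exception:
        return fallback()  # the experiment can never take prod down

print(with_flag("exploration.warm-cache",
                experiment=lambda: "warm",
                fallback=lambda: "cold"))  # prints "cold" while the flag is off
```

Flipping the flag is the only thing that changes behavior, which is exactly what makes the kill switch a one-line operation.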
Examples you can rip today:
Unleash flag for a small-cohort rollout:
```json
{
  "name": "exploration.warm-cache",
  "type": "release",
  "enabled": false,
  "strategies": [
    {
      "name": "gradualRolloutUserId",
      "parameters": { "percentage": "5", "groupId": "explore" }
    }
  ],
  "tags": [{ "type": "simple", "value": "exploration" }]
}
```

ArgoCD AppProject to fence experiments to `sandbox-*` namespaces only:
```yaml
apiVersion: argoproj.io/v1alpha1
kind: AppProject
metadata:
  name: exploration
spec:
  destinations:
    - server: https://kubernetes.default.svc
      namespace: sandbox-*
  clusterResourceWhitelist: []
  namespaceResourceWhitelist:
    - group: "*"
      kind: "*"
```

Terraform budget for AWS sandboxes so finance doesn’t panic:
```hcl
resource "aws_budgets_budget" "sandbox" {
  name         = "sandbox-monthly"
  budget_type  = "COST"
  limit_amount = "1000"
  limit_unit   = "USD"
  time_unit    = "MONTHLY"

  # Scope to resources tagged env=sandbox (tag filters use the "user:" prefix)
  cost_filter {
    name   = "TagKeyValue"
    values = ["user:env$sandbox"]
  }

  notification {
    comparison_operator        = "GREATER_THAN"
    threshold                  = 80
    threshold_type             = "PERCENTAGE"
    notification_type          = "FORECASTED"
    subscriber_email_addresses = ["finops@example.com", "platform@example.com"]
  }
}
```

GitHub Actions check to enforce the `innovation` label on PRs touching `exploration/`:
```yaml
name: enforce-innovation-label
on:
  pull_request:
    # labeled/unlabeled so the check re-runs when someone adds the label
    types: [opened, synchronize, labeled, unlabeled]
jobs:
  check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/github-script@v7
        with:
          script: |
            const files = await github.paginate(github.rest.pulls.listFiles, {
              owner: context.repo.owner,
              repo: context.repo.repo,
              pull_number: context.payload.pull_request.number,
            });
            const touchesExploration = files.some(f => f.filename.startsWith('exploration/'));
            const labels = context.payload.pull_request.labels.map(l => l.name);
            if (touchesExploration && !labels.includes('innovation')) {
              core.setFailed('PR touches exploration code but lacks the innovation label');
            }
```

Leadership behaviors that make or break it
I’ve seen VPs declare “innovation is everyone’s job” and then pack calendars with status meetings. Don’t do that. Do this:
- Protect the calendar: Guard the 10% on team calendars. If a launch crunch steals a week, repay it within the quarter.
- Tie to business outcomes: Every exploration maps to a customer pain, cost driver, or SLO. “Cool tech” isn’t a business case.
- Kill loudly, celebrate kills: A 50% kill rate is healthy. Praise the team that kills early and writes the learnings.
- Make it visible: Portfolio board, shared metrics, short Loom demos posted in a #exploration-demos channel.
- Credit fairly: Explorers shouldn’t be punished in perf reviews because they shipped less production code. Evaluate on learning velocity and impact.
- No “shadow bets”: Single leaders can’t spawn infinite side projects. Everything goes through the same capacity gate.
If you can’t point to the calendar block, the RFC, and the guardrails, it isn’t sanctioned exploration—it’s scope creep.
Metrics that matter (and don’t)
Measure learning and blast radius, not vanity output.
- % Capacity Used (vs. planned): Target 80–110% of planned exploration capacity per quarter. Swinging wildly = poor planning.
- Lead Time to First Signal: Time from start to first measurable result (e.g., canary conversion delta, latency improvement). Shorter is better.
- Kill Rate: Healthy is 40–60%. Below 30%? You’re not taking risks. Above 70%? Bad idea funnel or bad enablement.
- Option Value: Number of explorations promoted to roadmap items with validated impact.
- SLO Impact: Zero SLO regressions attributable to exploration in prod. If non-zero, pause and fix guardrails.
- Cost per Learning: Cloud spend + labor per exploration until first signal.
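Kill rate and lead time to first signal are simple enough to compute from your tracker exports; a minimal sketch with hypothetical records (in practice the rows come from Jira/GitHub, not hand-written dicts):

```python
from datetime import date

# Hypothetical exploration records exported from the portfolio board.
explorations = [
    {"start": date(2024, 1, 8),  "first_signal": date(2024, 1, 22), "killed": True},
    {"start": date(2024, 1, 8),  "first_signal": date(2024, 2, 5),  "killed": False},
    {"start": date(2024, 2, 12), "first_signal": date(2024, 2, 26), "killed": True},
]

# Kill rate: fraction of bets stopped; healthy is roughly 40-60%.
kill_rate = sum(e["killed"] for e in explorations) / len(explorations)

# Lead time to first signal: days from start to the first measurable result.
lead_times = [(e["first_signal"] - e["start"]).days for e in explorations]
median_lead = sorted(lead_times)[len(lead_times) // 2]

print(f"kill rate: {kill_rate:.0%}")                 # 67%
print(f"median days to first signal: {median_lead}") # 14
```

Trend these weekly; a single quarter's snapshot tells you much less than the direction of the lines.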
Turn it into queries your PMO and SREs can read:
Jira filter for active exploration items:
```
project = CORE AND labels in (innovation, exploration) AND statusCategory != Done ORDER BY priority DESC
```

Simple SQL for GitHub PRs labeled innovation merged behind flags:
```sql
-- Assumes a prs table with labels stored as jsonb
SELECT DATE_TRUNC('week', merged_at) AS week,
       COUNT(*) FILTER (WHERE labels ? 'innovation') AS innovation_prs,
       COUNT(*) FILTER (WHERE title ILIKE '%flag%' OR body ILIKE '%flag%') AS flagged_prs
FROM prs
WHERE merged_at >= NOW() - INTERVAL '90 days'
GROUP BY 1
ORDER BY 1;
```

A 30-60-90 day rollout that survives finance and security
- Days 1–7: Publish the model
  - Announce the 90/10 baseline and WIP=2. Create `innovation`, `rfc`, and `kill-candidate` labels in Jira/GitHub.
  - Stand up the weekly Demo+Decision and biweekly Portfolio Review.
- Days 8–14: Install guardrails
  - Turn on feature flags (Unleash/LaunchDarkly). Default off.
  - Create the `exploration` ArgoCD AppProject and sandbox budgets via Terraform.
- Days 15–30: Seed pilots
  - 1–2 explorations per team. Require RFCs with kill criteria.
  - Add the GitHub Action to enforce labeling.
- Days 31–60: Prove the loop
  - Track capacity burn, first signals, and kill rate. Keep SLOs green.
  - Demo+Decision every week. Kill ruthlessly.
- Days 61–90: Institutionalize
  - Adjust the baseline if needed (don’t exceed 15% without exec signoff).
  - Add exploration metrics to the quarterly business review.
This is the cadence we used at a public SaaS when we piloted an LLM-based triage bot. Two spikes, one shipped. MTTR dropped 14% in 60 days, and infra stayed flat because we fenced inference behind a flag and budgets.
Common failure modes (and how we fix them)
- Time stealth taxes: Innovation time gets eaten by meetings. Fix: calendar holds + EM audits + leadership reminders in staff meeting.
- Zombie prototypes: Nobody kills them; they slowly rot. Fix: kill criteria in the RFC, an owner empowered to kill, and Portfolio Review highlights for `kill-candidate` items.
- Security surprises: PII in sandboxes. Fix: data classification in the RFC, masked datasets only, namespace policies enforced by ArgoCD.
- Finance freakouts: Spiky spend during spikes. Fix: forecast spikes, apply Terraform budgets and alerts, invite FinOps monthly.
- Flag sprawl: Experiments leave flags forever. Fix: `innovation` flags must have an expiry date and a Jira cleanup ticket at creation.
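The flag-expiry rule is easy to automate; a minimal audit sketch with a hypothetical registry (real data would come from the Unleash or LaunchDarkly API, with the expiry stored as a tag or naming convention):

```python
from datetime import date

# Hypothetical flag registry; each exploration flag carries an expiry date.
flags = [
    {"name": "exploration.warm-cache", "expires": date(2024, 3, 1)},
    {"name": "exploration.llm-triage", "expires": date(2024, 6, 1)},
]

def expired(flags, today):
    """Return flags past their expiry date: candidates for cleanup tickets."""
    return [f["name"] for f in flags if f["expires"] < today]

print(expired(flags, today=date(2024, 4, 1)))  # ['exploration.warm-cache']
```

Run something like this on a schedule and fail loudly; an expired flag that nobody notices is exactly how flag sprawl starts.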
If you’ve already botched it and trust is low, bring in a neutral party to reset the system. Yes, that’s a GitPlumbers plug, because we’ve unwound this mess more times than I care to count.
Key takeaways
- Treat innovation like product work: time-boxed, visible, and governed with WIP limits.
- Use a 90/10 baseline with scheduled 70/30 spikes—don’t let “20% time” become 0% or 50%.
- Make learning the unit of value: measure lead time to first signal, kill rate, and option value—not vanity prototypes.
- Rituals matter: weekly Demo+Decision, biweekly portfolio review, RFCs with a kill-switch owner.
- Guardrail the blast radius: feature flags default off, canaries, sandbox budgets, and namespace fences.
- Leaders must protect the calendar, tie to business outcomes, and celebrate kills as much as ships.
Implementation checklist
- Pick a baseline capacity (start with 90/10), publish it, and stick to it for 2 quarters.
- Limit innovation WIP to 2 per team; require a named kill-switch owner per item.
- Set up weekly 30-min Demo+Decision and biweekly Portfolio Review.
- Instrument with labels: `innovation`, `spike`, `rfc`, `kill-candidate` across Jira/GitHub.
- Enforce flags/canaries for all innovation code; default disabled behind `Unleash`/`LaunchDarkly`.
- Create sandbox budgets and namespace fences for experiments.
- Track metrics: % capacity used, lead time to first signal, kill rate, SLO impact, cost per learning.
Questions we hear from teams
- How much capacity should we allocate to innovation?
- Start with 10% and prove you can keep SLOs green. Add planned 70/30 spikes for specific bets. Don’t exceed 15% baseline without executive commitment and evidence that delivery is stable.
- What if urgent delivery needs keep stealing the time?
- Make exploration a first-class capacity line. If a week gets cannibalized, repay it within the quarter. Track stolen time as debt and report it in the Portfolio Review.
- How do we prevent experiments from leaking into prod?
- Require flags by default, canaries for rollouts, namespace fences for infra, and CI checks that enforce labels and guardrails. Give each exploration a kill-switch owner.
- What metrics convince finance and product this is working?
- Lead time to first signal, kill rate, option value (bets promoted), zero SLO regressions, and cost per learning. Show 90-day trends, not anecdotes.
- Where does AI exploration fit?
- Exactly the same. Flagged endpoints, budgeted inference, red-team evals before exposure, and an explicit data policy in the RFC. The guardrails matter even more with LLMs.
Ready to modernize your codebase?
Let GitPlumbers help you transform AI-generated chaos into clean, scalable applications.
