Self‑Service Analytics Without the Monday Morning Pager: Building a Data Viz Platform That Actually Holds Up
Dashboards people trust, without a hero culture. The blueprint we use to ship self‑service analytics that don’t burn the team.
“Stop treating dashboards as art projects; treat them like products with SLOs.”
The Scene You’ve Lived Through
Two quarters into your “self‑service” push, Finance has three versions of revenue, Product can’t find churn, and the Monday AM Looker dashboard 500s because a downstream model changed a column from INT to STRING. I’ve seen this movie at a Series C fintech on BigQuery and at a Fortune 100 on Snowflake. Same plot: heroic analysts, vibe dashboards, and a data team playing whack‑a‑mole.
Here’s the version that doesn’t page you: treat dashboards as products, data as APIs with contracts, and your platform as code. No silver bullets—just the boring, repeatable stuff that works.
The Platform Pattern That Doesn’t Flake
Self‑service analytics that stick share the same spine:
- Contracts at the edges: producers publish schemas with SLAs; consumers rely on stable shapes. Use data contracts and CDC (e.g., Debezium + Kafka) into Delta Lake/Iceberg.
- Transformations as code: dbt models with tests, versioned in Git, deployed via Argo CD/GitOps or CI.
- Quality gates: Great Expectations + dbt tests as blockers, not FYIs.
- Observability and lineage: OpenLineage + Marquez or DataHub; pipeline and dataset metrics to Prometheus and Grafana.
- Metrics/semantic layer: fix definitions in code, not in slide decks (dbt metrics, Lightdash, or Looker’s semantic model).
- Thin BI: tools like Apache Superset, Metabase, or Looker consuming governed models; no rogue SQL against raw tables.
- Security at the warehouse: row-level security and column masking in Snowflake/BigQuery/Databricks, not per-dashboard.
You can swap vendors, but the contract→quality→metrics→BI flow is non‑negotiable if you want reliability.
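Data contracts don’t require special tooling to start. A minimal contract can be a versioned YAML file that producers own and review in PRs; the sketch below is hypothetical (the field names and SLA keys are assumptions, not a standard), but it captures the shape most teams converge on:

```yaml
# contracts/orders_events.yaml -- hypothetical contract file; keys are illustrative
dataset: orders_events
owner: payments-team@yourorg.com
version: 1.2.0
sla:
  freshness_minutes: 15    # max lag before consumers should alert
  completeness_pct: 99     # required share of non-null critical fields
schema:
  - name: order_id
    type: STRING
    constraints: [not_null, unique]
  - name: order_total
    type: NUMERIC
    constraints: [not_null, non_negative]
  - name: order_status
    type: STRING
    allowed_values: [pending, paid, shipped, canceled]
```

Because it lives in Git, a schema change (like that INT-to-STRING column from the intro) becomes a reviewable diff instead of a Monday-morning surprise.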
Make It Measurable: SLOs for Data Products
If you don’t measure reliability, “self‑service” will regress to “ping the data team.” Define 3-4 SLOs and wire alerts.
- Freshness SLO: e.g., the orders model updates within 15 minutes, 95% of the time.
- Completeness SLO: <1% missing critical fields per day.
- Accuracy SLO: reconciles to source within 0.5% daily.
- Timeliness SLO: key dashboards render in <5s at p95.
Expose metrics from your jobs and datasets to Prometheus. Don’t be fancy at first—export “last load timestamp” and “row count” labels, then write an alert. Example alert rule:
```yaml
# prometheus/alerts/data-freshness.yaml
groups:
  - name: data-freshness
    rules:
      - alert: OrdersModelStale
        expr: (time() - dataset_last_load_timestamp_seconds{dataset="orders_model"}) > 900
        for: 10m
        labels:
          severity: page
        annotations:
          summary: "Orders model freshness SLO violation"
          description: "Orders model has not updated in >15m. Check Airflow DAG and upstream CDC."
```

Keep SLOs visible in Grafana, next to run logs and lineage. Your on‑call will thank you.
Quality Gates That Actually Block Bad Data
I’ve lost count of teams that “monitor quality” but ship broken dashboards because tests don’t fail the build. Fix that.
- Write dbt schema tests for shape and known constraints.
- Use Great Expectations for richer field‑level expectations and distribution checks.
- Fail fast: wire both into CI so bad data never reaches BI.
Example dbt tests:
```yaml
# models/orders.yml
version: 2
models:
  - name: orders
    columns:
      - name: order_id
        tests:
          - not_null
          - unique
      - name: order_total
        tests:
          - not_null
          # accepted_values checks set membership, not ranges;
          # use dbt_utils.accepted_range for "order_total >= 0"
          - dbt_utils.accepted_range:
              min_value: 0
      - name: order_status
        tests:
          - accepted_values:
              values: ["pending", "paid", "shipped", "canceled"]
```

A simple Great Expectations suite:
```json
{
  "dataset_name": "orders_model",
  "expectations": [
    {"expectation_type": "expect_column_values_to_not_be_null", "kwargs": {"column": "order_id"}},
    {"expectation_type": "expect_column_values_to_be_between", "kwargs": {"column": "order_total", "min_value": 0}},
    {"expectation_type": "expect_column_values_to_be_in_set", "kwargs": {"column": "order_status", "value_set": ["pending", "paid", "shipped", "canceled"]}}
  ]
}
```

And wire it into a pipeline that fails on violations. Airflow example with OpenLineage:
```python
# dags/orders_pipeline.py
import os
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

# OpenLineage emits run events to Marquez for lineage tracking
os.environ["OPENLINEAGE_URL"] = "http://marquez:5000"
os.environ["OPENLINEAGE_NAMESPACE"] = "analytics"

with DAG(
    dag_id="orders_pipeline",
    start_date=datetime(2025, 1, 1),
    schedule_interval="*/15 * * * *",
    catchup=False,
) as dag:
    dbt_run = BashOperator(
        task_id="dbt_run",
        bash_command="dbt build --project-dir /opt/dbt --profiles-dir /opt/dbt",
    )
    ge_validate = BashOperator(
        task_id="ge_validate",
        bash_command="great_expectations checkpoint run orders_checkpoint",
    )
    dbt_run >> ge_validate
```

If ge_validate fails, nothing moves downstream. That’s the point.
Stop Debating “Revenue” in Slack: Freeze It in a Metrics Layer
I’ve seen entire quarters lost to semantic drift. A sane metrics layer stops it.
- Pick your layer: dbt metrics (with MetricFlow), Lightdash, or Looker’s semantic model.
- Put metrics in Git: names, grain, dimensions, filters—reviewed in PRs.
- Expose as APIs: BI tools query the layer, not raw tables.
A simple dbt metric:
```yaml
# models/metrics.yml
metrics:
  - name: revenue
    label: Revenue
    model: ref('orders')
    calculation_method: sum
    expression: order_total
    timestamp: order_date
    time_grains: [day, week, month]
    dimensions: [country, channel]
    filters:
      - field: order_status
        operator: is
        value: paid
```

Pair this with row‑level security at the warehouse. Example in Snowflake:
```sql
-- Restrict rows to a user's region. security.region_map is an assumed
-- role-to-region mapping table; ANALYST_GLOBAL bypasses the filter.
CREATE OR REPLACE ROW ACCESS POLICY region_rls AS (region STRING) RETURNS BOOLEAN ->
  CURRENT_ROLE() = 'ANALYST_GLOBAL'
  OR EXISTS (
    SELECT 1 FROM security.region_map m
    WHERE m.role_name = CURRENT_ROLE() AND m.region = region
  );

ALTER TABLE analytics.orders ADD ROW ACCESS POLICY region_rls ON (region);
```

Now Finance, Sales, and Product all pull “Revenue” with the same filter semantics and security guarantees.
GitOps the Whole Stack: Version, Review, Deploy
The day we put Superset dashboards, dbt models, GE suites, and alert rules under Git with review gates, the 6AM pages stopped.
- Version everything: SQL, metrics, dashboard JSON, alert rules, even Superset database config.
- Deploy with ArgoCD: Argo watches the repo; changes flow to staging then prod with approvals.
- Canary datasets: materialize orders_canary off a subset and run dashboards against it in parallel before flipping.
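The “flip” on a canary dataset should be a gate, not a gut call. A minimal sketch of a promotion check that compares a canary model to production on a couple of aggregates; `fetch_stats` is a stand-in for a warehouse query, and the table names and 0.5% tolerance are assumptions:

```python
# canary_check.py -- hypothetical promotion gate; fetch_stats() stands in for
# a warehouse query like: SELECT count(*), sum(order_total) FROM <table>

def fetch_stats(table: str) -> dict:
    """Stub: replace with a real warehouse query per table."""
    fake = {
        "analytics.orders":        {"row_count": 1_000_000, "revenue": 5_400_000.0},
        "analytics.orders_canary": {"row_count":   998_900, "revenue": 5_395_200.0},
    }
    return fake[table]

def within_tolerance(prod: float, canary: float, pct: float) -> bool:
    """True when canary is within pct percent of the production value."""
    return prod == 0 or abs(prod - canary) / prod <= pct / 100.0

def canary_ok(prod_table: str, canary_table: str) -> bool:
    prod, canary = fetch_stats(prod_table), fetch_stats(canary_table)
    return all([
        within_tolerance(prod["row_count"], canary["row_count"], pct=0.5),
        within_tolerance(prod["revenue"], canary["revenue"], pct=0.5),
    ])

if __name__ == "__main__":
    ok = canary_ok("analytics.orders", "analytics.orders_canary")
    print("promote" if ok else "hold")  # in CI, exit nonzero on "hold"
```

Run it as a CI step after the canary materializes; the pipeline promotes only when the check passes.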
Example ArgoCD Application for Superset config:
```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: superset
spec:
  project: default
  source:
    repoURL: 'https://github.com/yourorg/analytics-infra'
    path: k8s/superset
    targetRevision: main
  destination:
    server: 'https://kubernetes.default.svc'
    namespace: analytics
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true
```

Pair it with SRE basics: canary deployment of transformations, circuit breakers on flaky sources, and a rollback playbook you can run half‑asleep.
What “Good” Looks Like in 90 Days
This is the rollout we run at GitPlumbers when we’re called to rescue a “self‑service” effort that’s bleeding trust.
Days 0‑30: Contracts and Gates
- Map the top 10 datasets that drive the business (ARR, orders, churn).
- Define SLOs and publish them in Grafana.
- Add dbt tests + GE suites to those datasets; fail the build on violations.
- Turn on lineage (OpenLineage + Marquez/DataHub).
Days 31‑60: Metrics Layer and GitOps
- Stand up metrics in dbt/Lightdash/Looker; codify 5 core metrics with PR review.
- GitOps the BI tool and pipeline configs with ArgoCD.
- Implement RLS and masking in the warehouse; remove BI‑tool‑level hacks.
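For the masking half of that step, a warehouse-side policy can look like this in Snowflake (a sketch; the role name, table, and regex are assumptions for illustration):

```sql
-- Mask customer emails for everyone except a PII-cleared role
CREATE OR REPLACE MASKING POLICY email_mask AS (val STRING) RETURNS STRING ->
  CASE
    WHEN CURRENT_ROLE() = 'ANALYST_PII' THEN val
    ELSE REGEXP_REPLACE(val, '.+@', '*****@')  -- keep the domain, hide the user
  END;

ALTER TABLE analytics.customers
  MODIFY COLUMN email SET MASKING POLICY email_mask;
```

Because the policy lives on the column, every BI tool and ad‑hoc query inherits it; there is nothing to re‑implement per dashboard.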
Days 61‑90: Adoption and Cleanup
- Publish 10 “golden” dashboards; deprecate duplicates.
- Office hours, docs in the catalog (OpenMetadata/DataHub), and dashboard ownership.
- Instrument p95 render time; add alerts on slow queries.
- Tackle vibe code cleanup—replace LLM‑generated spaghetti SQL with tested models.
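The p95 render-time alert can mirror the freshness rule from earlier. This sketch assumes you export a render-duration histogram (the `dashboard_render_seconds_bucket` metric name is an assumption about your own instrumentation or a proxy in front of the BI tool):

```yaml
# prometheus/alerts/dashboard-latency.yaml -- sketch; metric name assumed
groups:
  - name: dashboard-latency
    rules:
      - alert: GoldenDashboardSlow
        expr: |
          histogram_quantile(0.95,
            sum by (le, dashboard) (rate(dashboard_render_seconds_bucket[10m]))
          ) > 5
        for: 15m
        labels:
          severity: ticket
        annotations:
          summary: "Dashboard p95 render time over 5s"
```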
KPIs we track:
- MTTR for broken dashboards: target <60 minutes.
- SLO compliance: >95% freshness, >99% completeness on tier‑1 datasets.
- Adoption: +30% MAU on golden dashboards, -50% zombie dashboard views.
- Time‑to‑answer for core questions (e.g., “yesterday’s revenue by channel”): <10s.
Receipts: A Real Outcome, Not Hype
At a consumer fintech (Snowflake + dbt + Superset), we:
- Reduced broken-dashboard incidents by 68% in 8 weeks by failing builds on dbt/GE tests.
- Cut MTTR from ~9h to 45m with OpenLineage + Prometheus run metrics in Grafana.
- Collapsed 14 revenue definitions into 1 metric in Git; Finance stopped arguing after two sprints.
- Improved dashboard p95 render time from 11.2s to 3.8s by materializing aggregates + warehouse tuning (virtual warehouse auto‑suspend 60s, result cache on).
No heroics. Just good engineering and a platform that enforces reality over vibes.
What I’d Do Differently (and What to Do Tomorrow)
- Don’t start with the BI tool. Start with contracts, quality, and semantics.
- Keep the first SLOs blunt and achievable. Fancy comes later.
- Treat LLM‑generated “starter” SQL as scaffolding, not production. Do a vibe code cleanup pass.
- Make product owners own their metrics. Engineering can’t arbitrate “ARR” forever.
Tomorrow morning:
- Pick one dataset and one dashboard. Add tests, wire a freshness alert, define a metric. Ship the change via Git.
- In two weeks, measure MTTR and adoption. If they’re not moving, call us. We’ve pulled a lot of teams out of this ditch.
Key takeaways
- Self‑service works only when data products have explicit contracts and SLOs for freshness, completeness, and accuracy.
- Quality gates belong in the pipeline, not in a PM’s calendar—use dbt tests and Great Expectations to block bad data.
- A metrics layer (dbt metrics, Lightdash, or Looker’s semantic layer) prevents “N different definitions of revenue.”
- GitOps your analytics: version dashboards, tests, and alerts; deploy with ArgoCD; observe with Prometheus and lineage tooling.
- Track business value: adoption, time‑to‑insight, and MTTR. Kill zombie dashboards and celebrate the ones that move the needle.
Implementation checklist
- Define 3-4 data SLOs (freshness, completeness, accuracy, timeliness) and wire Prometheus alerts.
- Add dbt schema tests + Great Expectations suites; fail the pipeline on contract violations.
- Stand up a metrics layer and freeze business definitions in code.
- Version dashboards and semantic configs; deploy via GitOps (ArgoCD).
- Implement row‑level security and PII policies at the warehouse, not in the BI tool.
- Instrument lineage (OpenLineage + Marquez/DataHub) to cut MTTR on incidents.
- Run a 90‑day adoption program: office hours, golden datasets, and dashboard deprecation.
Questions we hear from teams
- Which BI tool should we pick for self‑service?
- Pick the one that best fits your metrics layer and security model. Superset and Metabase are great for open tooling and GitOps; Looker is strong if you’ll commit to its semantic model. The tool is secondary to contracts, tests, and a metrics layer. Thin BI on top of governed models beats feature‑rich BI on raw data every time.
- dbt tests or Great Expectations?
- Both. Use dbt tests for structural guarantees (nulls, uniques, referential integrity). Use Great Expectations for richer distribution and field‑level checks. Run both in CI and fail the pipeline on violations.
- How do we measure if self‑service is delivering business value?
- Track adoption (MAU on golden dashboards), time‑to‑answer for key questions, MTTR for incidents, and SLO compliance on tier‑1 datasets. Tie dashboards to OKRs—if a dashboard doesn’t support a decision, it’s a candidate for deprecation.
- We have a lot of LLM‑generated SQL. Safe to use?
- Treat it as scaffolding. Run a vibe code cleanup: refactor into dbt models, add tests, and codify metrics. We routinely see 20–40% performance and reliability gains by replacing AI‑generated ad‑hoc SQL with modeled transformations.
- Do we need data catalog and lineage from day one?
- Turn on lineage early (OpenLineage + Marquez/DataHub). A catalog (DataHub/OpenMetadata) becomes important as you scale past ~50 models or multiple teams. Lineage is crucial for cutting MTTR during incidents and for safe deprecations.
Ready to modernize your codebase?
Let GitPlumbers help you transform AI-generated chaos into clean, scalable applications.
