Stop Shipping Dashboards on Sand: Building a Self‑Service Analytics Platform That Won’t Wake You at 2 a.m.
You don’t get self‑service by buying a BI license. You get it by treating data reliability like SRE treats uptime—and by wiring your stack so analysts don’t need to beg a data engineer for every column.
Self‑service without SLOs is just shadow IT with prettier charts.
If you’ve ever been paged because the CFO’s dashboard is blank on QBR morning, you know the feeling. I’ve watched teams roll out Looker/Power BI/Superset, announce “self‑service,” and then drown in Slack pings: columns missing, numbers don’t match, dashboards time out, Snowflake spend doubling overnight. The problem wasn’t the BI tool. It was everything under it.
Self‑service only works when the platform makes the reliable thing the easy thing. That means contracts, tests, lineage, a semantic layer, and guardrails. GitPlumbers has rebuilt a half‑dozen of these in the last two years, cleaning up AI‑generated “vibe code” along the way. Here’s the pattern that sticks.
What goes wrong (and why your “self‑service” isn’t)
- No contracts: Upstream teams rename `user_id` to `customer_id`, your ELT silently casts nulls, dashboards lie.
- Dashboards before semantics: Three definitions of MRR in production; finance, product, and ops fight over who’s “right.”
- Invisible data debt: No lineage, no tests, so MTTR is a guessing game.
- Governance theater: SSO without RBAC or row‑level security; everyone queries everything; warehouse melts.
- BI as a compute engine: Users write 400‑line SQL with CROSS JOINs; your `XS` warehouse weeps.
I’ve seen these exact failures at startups and at FAANG‑adjacent orgs. The fix isn’t a new tool. It’s a small set of hard edges and a cultural shift backed by infrastructure.
The architecture that actually delivers
You don’t need every logo on a slide. You need a clean path from contract to metric to chart, with SLOs and rollback points.
- Ingestion & contracts: `Kafka` + `Debezium` for CDC (or Fivetran if you must), schemas in `Confluent Schema Registry` (Avro/Protobuf/JSON Schema). Enforce backward compatibility in CI (a sample check is sketched after this list).
- Lake/warehouse: `Snowflake` or `BigQuery` for serving; `Delta Lake` or `Iceberg` if you already have lake gravity.
- Transformations: `dbt` for ELT with tests; `Great Expectations` for richer suites on critical facts.
- Orchestration & lineage: `Airflow` or `Dagster` with `OpenLineage` enabled. Emit lineage so users can self‑debug.
- Semantic/metric layer: dbt metrics, `Cube`, or Looker’s model. One place to define “MRR,” “Active Users,” “Churn.”
- BI & access: Looker, Superset, or Power BI wired to the semantic layer. OAuth/SSO, RBAC, and row‑level security.
- Catalog & docs: `DataHub` or `OpenMetadata` indexing tables, lineage, owners, and glossary.
- GitOps: Everything declarative via `Terraform`; deployments via `ArgoCD` or CI runners; no click‑ops in prod.
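To make the contract gate real, run the compatibility check on every pull request. A minimal sketch, assuming GitHub Actions, a candidate Avro schema at schemas/orders.avsc, and a Confluent Schema Registry reachable via a SCHEMA_REGISTRY_URL secret (the subject name, path, and runner are placeholders, not part of any standard setup):
# .github/workflows/schema-compat.yml (illustrative; paths and subject are placeholders)
name: schema-compatibility
on: [pull_request]
jobs:
  check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Check backward compatibility against Schema Registry
        env:
          SCHEMA_REGISTRY_URL: ${{ secrets.SCHEMA_REGISTRY_URL }}
        run: |
          # POST the candidate schema to the registry's compatibility endpoint,
          # then fail the build if it would break existing consumers.
          ESCAPED=$(jq -Rs . < schemas/orders.avsc)
          RESP=$(curl -s -X POST \
            -H "Content-Type: application/vnd.schemaregistry.v1+json" \
            --data "{\"schema\": ${ESCAPED}}" \
            "$SCHEMA_REGISTRY_URL/compatibility/subjects/orders-value/versions/latest")
          echo "$RESP"
          echo "$RESP" | jq -e '.is_compatible == true'
Note the job checks the is_compatible flag in the response body; the compatibility endpoint returns 200 even when the schema would break consumers, so HTTP status alone isn’t enough.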
This stack lets you treat data like a product with SLOs, owners, and error budgets—exactly how your SREs run prod.
Build it with guardrails: configs and code
Talk is cheap. Here’s what the controls look like in code.
- Provision Snowflake with governance baked in (Terraform)
# Terraform
resource "snowflake_warehouse" "bi" {
name = "BI_WH"
warehouse_size = "XSMALL"
auto_suspend = 60
auto_resume = true
statement_timeout_in_seconds = 300
}
resource "snowflake_role" "bi_analyst" { name = "BI_ANALYST" }
resource "snowflake_role" "bi_admin" { name = "BI_ADMIN" }
resource "snowflake_database" "analytics" { name = "ANALYTICS" }
resource "snowflake_schema" "mart" { database = snowflake_database.analytics.name name = "MART" }
resource "snowflake_warehouse_grant" "bi_use" {
warehouse_name = snowflake_warehouse.bi.name
privilege = "USAGE"
roles = [snowflake_role.bi_analyst.name]
}
resource "snowflake_database_grant" "db_usage" {
database_name = snowflake_database.analytics.name
privilege = "USAGE"
roles = [snowflake_role.bi_analyst.name]
}
resource "snowflake_schema_grant" "mart_select" {
database_name = snowflake_database.analytics.name
schema_name = snowflake_schema.mart.name
privilege = "SELECT"
roles = [snowflake_role.bi_analyst.name]
}
# Cost guardrail
resource "snowflake_resource_monitor" "bi_cap" {
name = "BI_CAP"
credit_quota = 200
notify_triggers = [50, 75, 90]
suspend_triggers = [95]
suspend_immediate_triggers = [100]
warehouses = [snowflake_warehouse.bi.name]
}
- Lock in quality with `dbt` tests and metrics
# models/marts/fct_subscriptions.yml
version: 2
models:
  - name: fct_subscriptions
    description: Daily subscription facts
    columns:
      - name: user_id
        tests: [not_null]
      - name: day
        tests: [not_null]
      - name: mrr
        tests:
          - not_null
          - dbt_utils.expression_is_true:
              expression: ">= 0"
    tests:
      - dbt_utils.unique_combination_of_columns:
          combination_of_columns: [user_id, day]

metrics:
  - name: mrr
    label: Monthly Recurring Revenue
    model: ref('fct_subscriptions')
    type: sum
    sql: mrr
    time_grains: [day, week, month]
    dimensions: [plan, region]
- Deeper checks with Great Expectations on critical facts
# great_expectations/expectations/fct_subscriptions_suite.json
{
  "expectation_suite_name": "fct_subscriptions_suite",
  "meta": {},
  "expectations": [
    {
      "expectation_type": "expect_table_row_count_to_be_between",
      "kwargs": {"min_value": 100000}
    },
    {
      "expectation_type": "expect_column_values_to_be_between",
      "kwargs": {"column": "mrr", "min_value": 0}
    },
    {
      "expectation_type": "expect_column_values_to_not_be_null",
      "kwargs": {"column": "user_id"}
    }
  ]
}
- Row‑level security that doesn’t require duct tape
-- Snowflake RLS: restrict by region
create or replace row access policy rlp_region as (region string) returns boolean ->
case
when current_role() in ('BI_ADMIN','SECURITYADMIN') then true
when current_role() = 'BI_ANALYST_US' then region = 'US'
when current_role() = 'BI_ANALYST_EU' then region = 'EU'
else false
end;
alter table ANALYTICS.MART.DIM_CUSTOMER add row access policy rlp_region on (region);
Wire lineage via OpenLineage so when a model fails a test, you can see exactly which dashboards are impacted. Airflow and Dagster both have plugins that emit lineage without you rewriting DAGs.
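The OpenLineage side is mostly configuration. A minimal sketch of the client config, assuming the default HTTP transport pointed at a Marquez (or any OpenLineage-compatible) backend; the URL is a placeholder for your environment:
# openlineage.yml (backend URL is a placeholder)
transport:
  type: http
  url: http://marquez:5000
  endpoint: api/v1/lineage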
Make reliability measurable: SLOs for data
If you don’t measure it, it will rot. Treat pipelines like services.
- Freshness SLOs: e.g., “95% of hourly models under 15 min late in any rolling 7‑day window.” Use `dbt source freshness` and a small exporter to push latency to `Prometheus`; alert in `PagerDuty` when you breach (a freshness config is sketched after this list).
- Quality SLOs: “<0.1% nulls in `user_id` for `fct_subscriptions`.” Fail the job when the budget is exhausted; roll back to the last green snapshot.
- MTTR: Instrument lineage and tests so you can get MTTR for data incidents under 30 minutes. We track “time to detect” (TTD) via test failures and “time to recover” (TTR) via successful reruns.
- Error budgets for experiments: For canary tables, allow tighter budgets; automatically drop canary exposure to BI if error budget is spent.
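Here’s what the freshness piece looks like in dbt; a minimal sketch with illustrative source names, columns, and thresholds:
# models/staging/sources.yml (source names and thresholds are illustrative)
version: 2
sources:
  - name: billing
    database: RAW
    schema: BILLING
    loaded_at_field: _loaded_at
    freshness:
      warn_after: {count: 15, period: minute}
      error_after: {count: 60, period: minute}
    tables:
      - name: subscriptions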
We’ve put these on real on‑call rotations. After a month, pages drop because the platform pushes bad data out of prod by default.
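When a breach does page someone, it should come from a plain Prometheus rule rather than a human noticing a stale chart. A sketch, assuming the small exporter mentioned above publishes a gauge named dbt_source_freshness_delay_seconds per source (the metric name is our assumption, not a standard integration):
# prometheus/rules/data-freshness.yml (metric name assumes a custom exporter)
groups:
  - name: data-freshness-slo
    rules:
      - alert: SourceFreshnessSLOBreach
        expr: dbt_source_freshness_delay_seconds{source="billing.subscriptions"} > 900
        for: 10m
        labels:
          severity: page
        annotations:
          summary: "{{ $labels.source }} is more than 15 minutes behind its freshness SLO"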
Turn access into business value
Self‑service isn’t “everybody runs SQL in prod.” It’s:
- Publish certified datasets: Only `MART` schemas with owners and SLOs are BI‑visible (see the exposure sketch after this list).
- Centralize metrics: The metric layer is the API. BI models reference metrics, not arbitrary SQL. When finance updates MRR logic, everyone gets it at once.
- Curate explores/datasets: Ship a minimal set of explores with dimensions users actually use. Kill or archive the rest.
- Usage telemetry: Track P95 dashboard load time (<5s), daily active explorers, and % ad‑hoc queries hitting raw tables. If P95 creeps up, scale warehouses or optimize models.
- Guard costs: Put resource monitors on warehouses; auto‑suspend; block BI roles from creating transient tables in raw.
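One lightweight way to make “certified” concrete is a dbt exposure per published dashboard, so the owner and upstream dependencies are declared in the repo next to the models. A sketch with illustrative names, URL, and owner:
# models/marts/exposures.yml (names, URL, and owner are illustrative)
version: 2
exposures:
  - name: executive_mrr_dashboard
    type: dashboard
    maturity: high
    url: https://bi.example.com/dashboards/42
    owner:
      name: Finance Analytics
      email: finance-data@example.com
    depends_on:
      - ref('fct_subscriptions')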
This is where we clean up a lot of AI‑generated transform code. “Vibe coding” is great for a prototype; in prod we refactor with tests, contracts, and ownership. GitPlumbers does the vibe code cleanup and code rescue so your team ships safely.
What changed when we did this for a subscription app
At a late‑stage consumer subscription company (~5M MAU), we replaced dashboard sprawl with a contract‑to‑metric flow in 8 weeks.
- Replaced ad‑hoc SQL with `dbt` models + metrics; Looker pointed only at `MART`.
- Added Great Expectations on three critical facts; wired alerts to Slack and PagerDuty.
- Enforced CDC schemas with Schema Registry; blocked incompatible changes in CI.
- Cataloged everything in DataHub; owners and descriptions on every certified table.
- Terraform’d Snowflake RBAC and resource monitors; added row‑level security by region.
Results in first full quarter:
- -62% data incidents reaching BI (from 34 to 13); MTTR down from 2h12m to 28m.
- P95 dashboard load from 9.4s to 3.1s; warehouse credits for BI down 31%.
- Ad‑hoc data requests to data engineering down 45%; 72% of analysts active weekly in BI explores.
- C‑level trust recovered: finance and product reporting variance on MRR under 0.5%.
No heroics. Just guardrails and discipline.
A 60‑day rollout plan that holds
- Days 1–10: Define top 10 business metrics; write contracts; set freshness/quality SLOs. Terraform your warehouse RBAC and monitors.
- Days 11–25: Build bronze/silver/gold (`dbt` + tests); wire Great Expectations to gold facts. Stand up lineage and catalog (a sample catalog ingestion recipe is sketched after this list). Ship 1–2 certified datasets and owners.
- Days 26–45: Bring up the semantic layer (dbt metrics or Cube). Point BI at `MART` only. Add RLS. Publish a glossary and a “how to get a chart in prod” doc.
- Days 46–60: Migrate the top 10 dashboards to metrics; set deprecation windows; add alerting; train champions. Start killing orphaned content.
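For the catalog step, a DataHub ingestion recipe is a few lines of YAML. A rough sketch for Snowflake; field names shift between DataHub versions, and the account, credentials, and server below are placeholders:
# datahub/snowflake_recipe.yml (account, credentials, and server are placeholders)
source:
  type: snowflake
  config:
    account_id: my_account
    warehouse: BI_WH
    username: ${SNOWFLAKE_USER}
    password: ${SNOWFLAKE_PASSWORD}
    include_table_lineage: true
    database_pattern:
      allow: ["ANALYTICS"]
sink:
  type: datahub-rest
  config:
    server: http://datahub-gms:8080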
If you do nothing else: define metrics centrally and enforce contracts. Everything good cascades from that.
If you want help untangling AI‑generated ELT or migrating Looker models without breaking finance’s quarter, GitPlumbers has done this before. See our case study and the playbook we use with teams like yours.
Key takeaways
- Self‑service starts with reliability: define freshness and quality SLOs before enabling access.
- Use data contracts, tests, and lineage to keep upstream changes from breaking dashboards.
- Adopt a semantic/metric layer so “MRR” means one thing everywhere—then wire BI to it.
- Govern with RBAC and row‑level security; automate everything via IaC (Terraform) and GitOps.
- Measure success with adoption and toil reductions: fewer ad‑hoc asks, lower MTTR, faster dashboard loads.
Implementation checklist
- Stand up a semantic/metric layer (dbt metrics, Cube, or Looker) before opening BI floodgates.
- Instrument quality with `dbt` tests and Great Expectations; alert on SLO breaches.
- Lock schemas with data contracts (Avro/JSON Schema) and enforce via CI/CD.
- Automate RBAC, warehouses, and resource monitors with Terraform.
- Expose lineage (OpenLineage) and a catalog (DataHub/OpenMetadata) so users self‑debug.
- Roll out dashboards with owners, SLOs, and deprecation windows; kill orphaned content monthly.
- Publish business KPIs and definitions in one place; ban ad‑hoc definitions in prod BI.
Questions we hear from teams
- Which BI tool should we pick for self‑service?
- Pick the one your team can model in and govern. Looker excels with a strong semantic model. Superset + dbt metrics is solid and OSS. Power BI lands well in Microsoft shops. The tool matters less than enforcing a metric layer, RBAC/RLS, and performance SLOs.
- How do we prevent dashboard sprawl?
- Publish only certified datasets, require owners on every dashboard, and enforce a 90‑day auto‑archive for unused content. Track usage, deprecate loudly, and wire a monthly “content cleanup” job. Tie BI visibility to the `MART` schema only.
- Can we do self‑service without Looker or a commercial semantic layer?
- Yes. Start with dbt metrics or Cube, expose metrics via SQL/REST, and point Superset/Metabase at those models. You lose some UX polish but keep definitions centralized.
- How do we control Snowflake/BigQuery costs as access grows?
- Resource monitors, auto‑suspend, and warehouse sizing by persona. Cap BI warehouses with credit quotas, block raw access for BI roles, and pre‑aggregate heavy explores. Track P95 dashboard load and cost per 1k queries; optimize models when thresholds breach.
- Where does AI‑generated code fit in?
- Use AI to scaffold models and tests, but enforce contracts and CI. We regularly refactor AI‑generated SQL in code rescue engagements—convert ad‑hoc queries into dbt models, add tests, and attach owners. Treat AI output like junior dev code: review and harden it.
Ready to modernize your codebase?
Let GitPlumbers help you transform AI-generated chaos into clean, scalable applications.
