Here’s a complete guide to architecting and implementing data governance using Unity Catalog on Databricks — the unified governance layer designed to manage access, lineage, compliance, and auditing across all workspaces and data assets.


✅ Why Unity Catalog for Governance?

Unity Catalog offers:

FeaturePurpose
Centralized metadataUnified across all workspaces
Fine-grained access controlTable, column, row-level security
Data lineageVisual trace from source to consumption
Tags & classificationPII/Sensitivity tagging, policy enforcement
Audit logsWho accessed what, when, how

🏗️ Unity Catalog Governance Framework – Architecture

🔸 Architecture Overview:

                 +---------------------+
                 |      Users/Roles    |
                 +----------+----------+
                            |
                     [Access Policies]
                            ↓
          +-----------------+------------------+
          |                                    |
     Unity Catalog                         Audit Logs
  (Access + Lineage)                      (Compliance)
          |
   +------+-------+-------+-------+
   |      |       |       |       |
Catalogs Schemas Tables Views Tags

🔧 Governance Setup Components

ComponentDescription
CatalogTop-level container for schemas and tables (e.g., sales_catalog)
SchemaLogical grouping of tables (e.g., finance_schema)
Tables/ViewsActual datasets to govern
TagsAdd metadata like PII, sensitivity, classification
Access ControlSQL-based GRANT/REVOKE system
Audit LogsLog access events (workspace + Unity Catalog)

✅ Step-by-Step: Implement Data Governance Framework


🔹 Step 1: Enable Unity Catalog (One-Time Setup)

  1. Enable Unity Catalog from your Azure Databricks admin console.
  2. Set up metastore (shared governance layer across workspaces).
  3. Assign Metastore Admins and Workspace Bindings.

🔹 Step 2: Define Catalog Hierarchy

Organize your data assets like this:

sales_catalog
├── raw_schema
│   └── sales_raw
├── clean_schema
│   └── sales_clean
└── curated_schema
    └── sales_summary

Create them:

CREATE CATALOG IF NOT EXISTS sales_catalog;
CREATE SCHEMA IF NOT EXISTS sales_catalog.clean_schema;

🔹 Step 3: Apply Fine-Grained Access Control

🔸 Example 1: Table-Level Permissions

GRANT SELECT ON TABLE sales_catalog.clean_schema.sales_clean TO `data_analyst_group`;
GRANT ALL PRIVILEGES ON TABLE sales_catalog.curated_schema.sales_summary TO `finance_admins`;

🔸 Example 2: Column Masking for PII

CREATE MASKING POLICY mask_email
AS (email STRING) RETURNS STRING ->
  CASE
    WHEN is_account_group_member('pii_viewers') THEN email
    ELSE '*** MASKED ***'
  END;

ALTER TABLE sales_catalog.customer_schema.customers
  ALTER COLUMN email
  SET MASKING POLICY mask_email;

🔹 Step 4: Add Tags for Classification

ALTER TABLE sales_catalog.customer_schema.customers
  ALTER COLUMN email
  SET TAGS ('classification' = 'pii', 'sensitivity' = 'high');

ALTER SCHEMA customer_schema SET TAGS ('owner' = 'data-steward@company.com');

🔹 Step 5: Enable and Use Data Lineage

Once enabled, Unity Catalog automatically tracks full lineage:

  • View lineage in Data Explorer
  • Shows source → intermediate → gold table/view
  • Tracks jobs, notebooks, queries

🔹 Step 6: Monitor & Audit Usage

Enable audit logging:

  • At the workspace and Unity Catalog level
  • Export logs to Azure Log Analytics, Storage, or SIEM

Includes:

  • Query execution history
  • Access violations
  • Data masking attempts

🧰 Optional Add-ons for Governance

FeatureTool / Method
Row-level securityDynamic views with WHERE clause
External data catalogSync with Purview or Collibra
CI/CD policiesUse Terraform to manage UC config
Alerts & notificationsADF alerts, Azure Monitor, notebooks

🧪 Example: Row-Level Security (Dynamic View)

CREATE OR REPLACE VIEW sales_catalog.curated_schema.secure_sales_summary AS
SELECT * FROM sales_summary
WHERE region = current_user_region();

Where current_user_region() is a UDF tied to user metadata.


🧾 Best Practices

CategoryTip
TaggingTag all PII fields early (email, phone, address)
Separation of dutiesCatalogs per department (e.g., sales, HR)
Schema versioningAdd _v1, _v2 suffix for critical datasets
AutomationUse Terraform or APIs to manage grants, tags
Access reviewsQuarterly role audits with logs

✅ Summary Framework Checklist

AreaCovered?
🔹 Catalog/Schema Design
🔹 Access Policies
🔹 Data Classification
🔹 Auditing + Logging
🔹 Lineage Tracking
🔹 Masking & RLS
🔹 Integration with BI


Discover more from HintsToday

Subscribe to get the latest posts sent to your email.

Posted in

Leave a Reply

Discover more from HintsToday

Subscribe now to keep reading and get access to the full archive.

Continue reading