Here’s a complete guide to architecting and implementing data governance using Unity Catalog on Databricks — the unified governance layer designed to manage access, lineage, compliance, and auditing across all workspaces and data assets.
✅ Why Unity Catalog for Governance?
Unity Catalog offers:
Feature | Purpose |
---|---|
Centralized metadata | Unified across all workspaces |
Fine-grained access control | Table, column, row-level security |
Data lineage | Visual trace from source to consumption |
Tags & classification | PII/Sensitivity tagging, policy enforcement |
Audit logs | Who accessed what, when, how |
🏗️ Unity Catalog Governance Framework – Architecture
🔸 Architecture Overview:
+---------------------+
| Users/Roles |
+----------+----------+
|
[Access Policies]
↓
+-----------------+------------------+
| |
Unity Catalog Audit Logs
(Access + Lineage) (Compliance)
|
+------+-------+-------+-------+
| | | | |
Catalogs Schemas Tables Views Tags
🔧 Governance Setup Components
Component | Description |
---|---|
Catalog | Top-level container for schemas and tables (e.g., sales_catalog ) |
Schema | Logical grouping of tables (e.g., finance_schema ) |
Tables/Views | Actual datasets to govern |
Tags | Add metadata like PII, sensitivity, classification |
Access Control | SQL-based GRANT/REVOKE system |
Audit Logs | Log access events (workspace + Unity Catalog) |
✅ Step-by-Step: Implement Data Governance Framework
🔹 Step 1: Enable Unity Catalog (One-Time Setup)
- Enable Unity Catalog from your Azure Databricks admin console.
- Set up metastore (shared governance layer across workspaces).
- Assign Metastore Admins and Workspace Bindings.
🔹 Step 2: Define Catalog Hierarchy
Organize your data assets like this:
sales_catalog
├── raw_schema
│ └── sales_raw
├── clean_schema
│ └── sales_clean
└── curated_schema
└── sales_summary
Create them:
CREATE CATALOG IF NOT EXISTS sales_catalog;
CREATE SCHEMA IF NOT EXISTS sales_catalog.clean_schema;
🔹 Step 3: Apply Fine-Grained Access Control
🔸 Example 1: Table-Level Permissions
GRANT SELECT ON TABLE sales_catalog.clean_schema.sales_clean TO `data_analyst_group`;
GRANT ALL PRIVILEGES ON TABLE sales_catalog.curated_schema.sales_summary TO `finance_admins`;
🔸 Example 2: Column Masking for PII
CREATE MASKING POLICY mask_email
AS (email STRING) RETURNS STRING ->
CASE
WHEN is_account_group_member('pii_viewers') THEN email
ELSE '*** MASKED ***'
END;
ALTER TABLE sales_catalog.customer_schema.customers
ALTER COLUMN email
SET MASKING POLICY mask_email;
🔹 Step 4: Add Tags for Classification
ALTER TABLE sales_catalog.customer_schema.customers
ALTER COLUMN email
SET TAGS ('classification' = 'pii', 'sensitivity' = 'high');
ALTER SCHEMA customer_schema SET TAGS ('owner' = 'data-steward@company.com');
🔹 Step 5: Enable and Use Data Lineage
Once enabled, Unity Catalog automatically tracks full lineage:
- View lineage in Data Explorer
- Shows source → intermediate → gold table/view
- Tracks jobs, notebooks, queries
🔹 Step 6: Monitor & Audit Usage
Enable audit logging:
- At the workspace and Unity Catalog level
- Export logs to Azure Log Analytics, Storage, or SIEM
Includes:
- Query execution history
- Access violations
- Data masking attempts
🧰 Optional Add-ons for Governance
Feature | Tool / Method |
---|---|
Row-level security | Dynamic views with WHERE clause |
External data catalog | Sync with Purview or Collibra |
CI/CD policies | Use Terraform to manage UC config |
Alerts & notifications | ADF alerts, Azure Monitor, notebooks |
🧪 Example: Row-Level Security (Dynamic View)
CREATE OR REPLACE VIEW sales_catalog.curated_schema.secure_sales_summary AS
SELECT * FROM sales_summary
WHERE region = current_user_region();
Where current_user_region()
is a UDF tied to user metadata.
🧾 Best Practices
Category | Tip |
---|---|
Tagging | Tag all PII fields early (email, phone, address) |
Separation of duties | Catalogs per department (e.g., sales, HR) |
Schema versioning | Add _v1 , _v2 suffix for critical datasets |
Automation | Use Terraform or APIs to manage grants, tags |
Access reviews | Quarterly role audits with logs |
✅ Summary Framework Checklist
Area | Covered? |
---|---|
🔹 Catalog/Schema Design | ✅ |
🔹 Access Policies | ✅ |
🔹 Data Classification | ✅ |
🔹 Auditing + Logging | ✅ |
🔹 Lineage Tracking | ✅ |
🔹 Masking & RLS | ✅ |
🔹 Integration with BI | ✅ |
Leave a Reply