Here’s a complete, practical guide to integrating Azure Data Factory (ADF) with Unity Catalog (UC) in Azure Databricks. The integration enables secure, governed, and scalable data workflows that comply with enterprise data governance policies.
✅ Why Integrate ADF with Unity Catalog?
Benefit | Description |
---|---|
🔐 Centralized Governance | Enforce data access using Unity Catalog policies |
🧾 Audit & Lineage | Full traceability and lineage via Unity Catalog |
🔄 Secure Pipeline Execution | Use ADF to trigger Databricks notebooks that access UC |
🧠 Role-Based Access Control (RBAC) | Column- and row-level access control via Unity Catalog |
🏗️ ADF + Unity Catalog: High-Level Architecture
+----------------------------+
| Azure Data Factory         |
|----------------------------|
| - Pipeline                 |
| - Trigger                  |
| - Databricks Notebook      |
+----------------------------+
              ↓
+----------------------------+
| Azure Databricks           |
|----------------------------|
| - Cluster (UC-enabled)     |
| - Notebooks                |
| - SQL Warehouses           |
+----------------------------+
              ↓
+----------------------------+
| Unity Catalog (Metastore)  |
|----------------------------|
| - Access policies          |
| - Audit logs               |
| - Tags & masking           |
+----------------------------+
              ↓
    Delta Tables in ADLS Gen2
🪜 Step-by-Step Guide to Integrate ADF with Unity Catalog
✅ Step 1: Set Up Unity Catalog in Databricks
- Create Unity Catalog Metastore in Azure Databricks.
- Assign the metastore to all relevant workspaces.
- Create a catalog and schemas, e.g.:
CREATE CATALOG IF NOT EXISTS finance_catalog;
CREATE SCHEMA IF NOT EXISTS finance_catalog.sales_schema;
- Create Delta tables under Unity Catalog:
df.write.format("delta").saveAsTable("finance_catalog.sales_schema.sales_data")
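If you want to script all of Step 1 from a single notebook, here is a minimal PySpark sketch. It assumes a UC-enabled cluster where the built-in spark session is available; the sample columns and values are purely illustrative:
# Create the catalog and schema if they do not exist yet
spark.sql("CREATE CATALOG IF NOT EXISTS finance_catalog")
spark.sql("CREATE SCHEMA IF NOT EXISTS finance_catalog.sales_schema")

# Sample data standing in for your real source (columns are illustrative)
df = spark.createDataFrame(
    [(1, "2024-01-15", 250.0), (2, "2024-01-16", -10.0)],
    ["order_id", "order_date", "amount"],
)

# A three-level name registers the table in Unity Catalog
df.write.format("delta").mode("overwrite").saveAsTable("finance_catalog.sales_schema.sales_data")

# Quick sanity check that the table landed in the right schema
spark.sql("SHOW TABLES IN finance_catalog.sales_schema").show()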
✅ Step 2: Enable Access Policies in Unity Catalog
- Assign access rights to data consumers:
GRANT SELECT ON TABLE finance_catalog.sales_schema.sales_data TO `data_analyst_group`;
- Optionally apply column masking (in Unity Catalog, a column mask is a SQL UDF attached to the column):
CREATE OR REPLACE FUNCTION finance_catalog.sales_schema.mask_email(email STRING)
RETURNS STRING
RETURN CASE
  WHEN is_account_group_member('email_viewers') THEN email
  ELSE '***'
END;
ALTER TABLE finance_catalog.sales_schema.customers
ALTER COLUMN email SET MASK finance_catalog.sales_schema.mask_email;
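To verify that Step 2 took effect, you can inspect the grants and the masked column from a notebook. A small verification sketch using the same example objects:
# List the privileges currently granted on the table
spark.sql("SHOW GRANTS ON TABLE finance_catalog.sales_schema.sales_data").show(truncate=False)

# See what the masked column returns for the current user
spark.table("finance_catalog.sales_schema.customers").select("email").show(5)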
✅ Step 3: Create ADF Linked Service to Azure Databricks (Unity-Enabled)
🧾 JSON (ADF → Linked Service to UC Cluster)
{
  "type": "AzureDatabricks",
  "typeProperties": {
    "domain": "https://<databricks-instance>.azuredatabricks.net",
    "authentication": "MSI",
    "workspaceResourceId": "/subscriptions/xxx/resourceGroups/xxx/providers/Microsoft.Databricks/workspaces/xxx",
    "existingClusterId": "<UC-enabled-cluster-id>"
  }
}
✅ Ensure:
- The cluster is Unity Catalog-enabled (configured in the Databricks workspace).
- The ADF Managed Identity has permission to access the workspace and trigger jobs (typically the Contributor role on the Databricks workspace resource).
- If you prefer token-based authentication instead of MSI, supply an "accessToken" in the linked service rather than "authentication": "MSI".
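If the notebook run executes under the ADF managed identity, that principal also needs Unity Catalog privileges on the objects the notebook touches. A hedged sketch, assuming the managed identity has already been added to the Databricks account as a service principal; the application-ID placeholder is hypothetical:
# Grant Unity Catalog privileges to the ADF managed identity (a service principal
# identified by its Entra ID application ID -- replace the placeholder)
adf_principal = "<adf-managed-identity-application-id>"
for stmt in [
    f"GRANT USE CATALOG ON CATALOG finance_catalog TO `{adf_principal}`",
    f"GRANT USE SCHEMA ON SCHEMA finance_catalog.sales_schema TO `{adf_principal}`",
    f"GRANT SELECT, MODIFY ON TABLE finance_catalog.sales_schema.sales_data TO `{adf_principal}`",
]:
    spark.sql(stmt)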
✅ Step 4: Configure Databricks Notebook to Read Unity Catalog Table
Example notebook code:
# Ensure the cluster is Unity Catalog-enabled
df = spark.read.table("finance_catalog.sales_schema.sales_data")
df_filtered = df.filter("amount > 0")
df_filtered.write.format("delta").mode("overwrite") \
.saveAsTable("finance_catalog.sales_schema.sales_filtered")
✅ Tables created via saveAsTable() with a three-level name are automatically registered in Unity Catalog.
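Optionally, you can set the session's default catalog and schema once and use shorter names afterwards. A small sketch:
# Set session defaults so shorter table names resolve inside Unity Catalog
spark.sql("USE CATALOG finance_catalog")
spark.sql("USE SCHEMA sales_schema")

# One-level names now resolve against the defaults above
df = spark.read.table("sales_data")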
✅ Step 5: Trigger Notebook from ADF Pipeline
- Use Databricks Notebook activity in ADF pipeline.
- Configure the notebook path, parameters (if any), and linked service (from Step 3).
Example ADF activity config:
{
  "type": "DatabricksNotebook",
  "typeProperties": {
    "notebookPath": "/Repos/finance/load_sales_data",
    "baseParameters": {
      "env": "prod"
    }
  },
  "linkedServiceName": {
    "referenceName": "LS_Databricks_UC",
    "type": "LinkedServiceReference"
  }
}
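On the Databricks side, baseParameters arrive as notebook widgets. A sketch of how the env value above could drive which catalog the notebook reads from; the dev/test catalog names below are hypothetical:
# Read the parameter passed from the ADF Databricks Notebook activity
dbutils.widgets.text("env", "dev")  # default value when run interactively
env = dbutils.widgets.get("env")

# Map the environment to a Unity Catalog catalog (names are illustrative)
catalog = {"dev": "finance_catalog_dev", "test": "finance_catalog_test", "prod": "finance_catalog"}[env]

df = spark.read.table(f"{catalog}.sales_schema.sales_data")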
✅ Step 6: Validate Security with Unity Catalog
- Use Unity Catalog audit logs to track (see the query sketch below):
  - What data was accessed
  - Who accessed it
  - What policy was enforced
- View data lineage in Data Explorer > Table > Lineage
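One hedged way to query the audit trail from a notebook, assuming the Databricks system tables (system.access.audit) are enabled for your account:
# Recent Unity Catalog audit events (the service_name filter is an assumption;
# adjust the lookback window and filters to your needs)
spark.sql("""
    SELECT event_time, user_identity.email AS user, action_name
    FROM system.access.audit
    WHERE event_date >= date_sub(current_date(), 7)
      AND service_name = 'unityCatalog'
    ORDER BY event_time DESC
    LIMIT 50
""").show(truncate=False)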
🧠 Best Practices
Area | Practice |
---|---|
Security | Use Managed Identity to connect ADF to Databricks |
Table Naming | Always use 3-level naming: catalog.schema.table |
Environment | Use config/params to switch across dev/test/prod |
Tags | Apply sensitivity, owner, and pii tags (see the sketch below) |
Debugging | Use dbutils.notebook.exit() to return status to ADF |
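For the tagging row above, a small sketch of how tags can be applied to a table and a sensitive column; the tag names mirror the table and should be adapted to your own taxonomy:
# Tag the table and a sensitive column so they are discoverable and governable
spark.sql("""
    ALTER TABLE finance_catalog.sales_schema.customers
    SET TAGS ('sensitivity' = 'high', 'owner' = 'finance-team')
""")
spark.sql("""
    ALTER TABLE finance_catalog.sales_schema.customers
    ALTER COLUMN email SET TAGS ('pii' = 'true')
""")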
🧾 Example: Returning Success/Failure to ADF
try:
    # Main pipeline logic goes here
    dbutils.notebook.exit("Success")
except Exception as e:
    dbutils.notebook.exit(f"Failure: {e}")
Use this in ADF to conditionally trigger alerts or retries.
✅ Final Summary Table
Component | What You Do |
---|---|
Unity Catalog | Manage access, policies, tags, lineage |
ADF Pipeline | Orchestrate data workflows |
Linked Service | Securely connect ADF to Unity Catalog cluster |
Notebooks | Use spark.read.table() to access UC tables |
Triggers | Schedule or event-driven orchestration |
Audit + Lineage | Use Databricks logs + Data Explorer |