Here’s a complete, practical guide to integrating Azure Data Factory (ADF) with Unity Catalog (UC) in Azure Databricks. The integration enables secure, governed, and scalable data workflows that comply with enterprise data governance policies.


✅ Why Integrate ADF with Unity Catalog?

| Benefit | Description |
|---|---|
| 🔐 Centralized Governance | Enforce data access using Unity Catalog policies |
| 🧾 Audit & Lineage | Full traceability and lineage via Unity Catalog |
| 🔄 Secure Pipeline Execution | Use ADF to trigger Databricks notebooks that access UC |
| 🧠 Role-Based Security (RBAC) | Column- and row-level access control from Unity Catalog |

🏗️ ADF + Unity Catalog: High-Level Architecture

+------------------------+
|   Azure Data Factory   |
|------------------------|
| - Pipeline             |
| - Trigger              |
| - Databricks Notebook  |
+------------------------+
           |
           ↓
+------------------------+
| Azure Databricks       |
|------------------------|
| - Cluster (UC-enabled) |
| - Notebooks            |
| - SQL Warehouses       |
+------------------------+
           ↓
+---------------------------+
| Unity Catalog (Metastore) |
| - Access policies         |
| - Audit logs              |
| - Tags & masking          |
+---------------------------+
           ↓
        Delta Tables
        in ADLS Gen2

🪜 Step-by-Step Guide to Integrate ADF with Unity Catalog


✅ Step 1: Set Up Unity Catalog in Databricks

  1. Create a Unity Catalog metastore in Azure Databricks.
  2. Assign the metastore to all relevant workspaces.
  3. Create a catalog and schemas, e.g.:

CREATE CATALOG IF NOT EXISTS finance_catalog;
CREATE SCHEMA IF NOT EXISTS finance_catalog.sales_schema;

  4. Create Delta tables under Unity Catalog:

df.write.format("delta").saveAsTable("finance_catalog.sales_schema.sales_data")
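To confirm the objects are registered in Unity Catalog, you can run a quick check from any UC-enabled cluster or SQL warehouse (object names match the examples above):

-- List tables registered in the new schema
SHOW TABLES IN finance_catalog.sales_schema;

-- Read back a few rows using the three-level name
SELECT * FROM finance_catalog.sales_schema.sales_data LIMIT 5;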

✅ Step 2: Enable Access Policies in Unity Catalog

  • Assign access rights to data consumers:
GRANT SELECT ON TABLE finance_catalog.sales_schema.sales_data TO `data_analyst_group`;
  • Optionally apply column masking. In Unity Catalog, a column mask is a SQL UDF attached to the column:

CREATE OR REPLACE FUNCTION finance_catalog.sales_schema.mask_email(email STRING)
RETURNS STRING
RETURN CASE
    WHEN is_account_group_member('email_viewers') THEN email
    ELSE '***'
  END;

ALTER TABLE finance_catalog.sales_schema.customers
ALTER COLUMN email SET MASK finance_catalog.sales_schema.mask_email;
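Note that SELECT alone is not enough for consumers: they also need catalog- and schema-level privileges. And for the row-level control mentioned in the benefits table, Unity Catalog attaches a SQL UDF as a row filter. A minimal sketch, assuming a region column exists on sales_data and an admins group should see all rows (both are illustrative assumptions):

-- Prerequisite privileges for the group granted SELECT above
GRANT USE CATALOG ON CATALOG finance_catalog TO `data_analyst_group`;
GRANT USE SCHEMA ON SCHEMA finance_catalog.sales_schema TO `data_analyst_group`;

-- Row filter: admins see everything, everyone else only EMEA rows
-- (assumes a 'region' column on sales_data; adjust to your schema)
CREATE OR REPLACE FUNCTION finance_catalog.sales_schema.region_filter(region STRING)
RETURNS BOOLEAN
RETURN is_account_group_member('admins') OR region = 'EMEA';

ALTER TABLE finance_catalog.sales_schema.sales_data
SET ROW FILTER finance_catalog.sales_schema.region_filter ON (region);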

✅ Step 3: Create ADF Linked Service to Azure Databricks (Unity-Enabled)

🧾 JSON (ADF → Linked Service to UC Cluster)

{
  "name": "LS_Databricks_UC",
  "properties": {
    "type": "AzureDatabricks",
    "typeProperties": {
      "domain": "https://<databricks-instance>.azuredatabricks.net",
      "authentication": "MSI",
      "workspaceResourceId": "/subscriptions/xxx/resourceGroups/xxx/providers/Microsoft.Databricks/workspaces/xxx",
      "existingClusterId": "<UC-enabled-cluster-id>"
    }
  }
}

To authenticate with a personal access token instead of the managed identity, omit authentication/workspaceResourceId and supply accessToken (ideally referenced from Azure Key Vault).

✅ Ensure:

  • The cluster is Unity Catalog-enabled (created with the Single user or Shared access mode in the Databricks workspace).
  • The ADF Managed Identity has permission on the workspace to connect and trigger jobs (see the role-assignment sketch below).
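A minimal sketch of granting that permission with the Azure CLI, assuming MSI authentication and that the Contributor role on the workspace resource fits your environment (the object ID and resource path are placeholders):

# Grant the ADF system-assigned managed identity Contributor on the Databricks workspace
az role assignment create \
  --assignee "<adf-managed-identity-object-id>" \
  --role "Contributor" \
  --scope "/subscriptions/xxx/resourceGroups/xxx/providers/Microsoft.Databricks/workspaces/xxx"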

✅ Step 4: Configure Databricks Notebook to Read Unity Catalog Table

Example notebook code:

# Read the Unity Catalog table by its three-level name (catalog.schema.table);
# this requires a Unity Catalog-enabled cluster.
df = spark.read.table("finance_catalog.sales_schema.sales_data")

# Keep only positive amounts and write the result back as a managed UC table
df_filtered = df.filter("amount > 0")
df_filtered.write.format("delta").mode("overwrite") \
    .saveAsTable("finance_catalog.sales_schema.sales_filtered")

✅ Tables created via saveAsTable() with a three-level name (catalog.schema.table) are automatically registered in Unity Catalog.


✅ Step 5: Trigger Notebook from ADF Pipeline

  • Use Databricks Notebook activity in ADF pipeline.
  • Configure the notebook path, parameters (if any), and linked service (from Step 3).

Example ADF activity config:

{
  "type": "DatabricksNotebook",
  "typeProperties": {
    "notebookPath": "/Repos/finance/load_sales_data",
    "baseParameters": {
      "env": "prod"
    }
  },
  "linkedServiceName": {
    "referenceName": "LS_Databricks_UC",
    "type": "LinkedServiceReference"
  }
}
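Inside the notebook, the baseParameters sent by ADF are read as widgets. A minimal sketch, assuming the env parameter from the activity above selects the target catalog (the per-environment catalog names are illustrative):

# Read the parameter passed from the ADF Databricks Notebook activity
dbutils.widgets.text("env", "dev")   # default for interactive runs
env = dbutils.widgets.get("env")

# Map the environment to a Unity Catalog catalog (names are illustrative)
catalogs = {"dev": "finance_dev", "test": "finance_test", "prod": "finance_catalog"}
catalog = catalogs.get(env, "finance_dev")

df = spark.read.table(f"{catalog}.sales_schema.sales_data")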

✅ Step 6: Validate Security with Unity Catalog

  • Use Unity Catalog audit logs (example query below) to track:
    • What data was accessed
    • Who accessed it
    • What policy was enforced
  • View data lineage in Data Explorer > Table > Lineage
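If system tables are enabled on your account (an assumption about your setup), audit events can be queried directly with SQL against system.access.audit, for example:

-- Recent Unity Catalog access events from the audit system table
SELECT event_time,
       user_identity.email AS user_email,
       action_name,
       request_params
FROM system.access.audit
WHERE service_name = 'unityCatalog'
  AND event_time >= date_sub(current_date(), 7)
ORDER BY event_time DESC;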

🧠 Best Practices

| Area | Practice |
|---|---|
| Security | Use Managed Identity to connect ADF to Databricks |
| Table Naming | Always use 3-level naming: catalog.schema.table |
| Environment | Use config/params to switch across dev/test/prod (see the widget sketch in Step 5) |
| Tags | Apply sensitivity, owner, and pii tags (example below) |
| Debugging | Use dbutils.notebook.exit() to return status to ADF |
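A minimal sketch of applying governance tags in Unity Catalog (the tag keys and values are assumptions; adjust them to your tagging standard):

-- Tag a table and a column for governance and discovery
ALTER TABLE finance_catalog.sales_schema.customers
  SET TAGS ('sensitivity' = 'confidential', 'owner' = 'finance_team');

ALTER TABLE finance_catalog.sales_schema.customers
  ALTER COLUMN email SET TAGS ('pii' = 'true');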

🧾 Example: Returning Success/Failure to ADF

try:
    # Notebook logic goes here (e.g., the load from Step 4)
    dbutils.notebook.exit("Success")
except Exception as e:
    dbutils.notebook.exit(f"Failure: {e}")

Use this in ADF to conditionally trigger alerts or retries.
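In the pipeline, the exit value surfaces as the activity's runOutput. A hedged sketch of an If Condition expression that branches on it (the activity name RunNotebook is an assumption):

@equals(activity('RunNotebook').output.runOutput, 'Success')

Route the false branch to an alert (e.g., a Web or Fail activity) or a retry path.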


✅ Final Summary Table

| Component | What You Do |
|---|---|
| Unity Catalog | Manage access, policies, tags, lineage |
| ADF Pipeline | Orchestrate data workflows |
| Linked Service | Securely connect ADF to the Unity Catalog cluster |
| Notebooks | Use spark.read.table() to access UC tables |
| Triggers | Schedule or event-driven orchestration |
| Audit + Lineage | Use Databricks logs + Data Explorer |
