Tuesday, January 6, 2026

How IdP Groups Are Tied to Databricks Groups (Unity Catalog)

 

🔑 Key Principle (Read This First)

Databricks does NOT “map” IdP groups to Databricks groups manually.
The linkage happens through SCIM provisioning.

SCIM = the binding glue between IdP and Databricks.


1️⃣ High-Level Flow

Identity Provider (Okta / Azure AD)
        |
        |  SCIM Provisioning
        v
Databricks Account Console
        |
        |  Group sync
        v
Unity Catalog Authorization
  • IdP creates & owns the group

  • SCIM syncs it into Databricks

  • Unity Catalog grants privileges to the group

  • Databricks enforces access
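
Under the hood, SCIM is just a REST API exchanging JSON. As a rough sketch (not the exact payload your IdP emits), this is the shape of the SCIM 2.0 Group object an IdP POSTs to the Databricks SCIM endpoint; the member IDs here are illustrative:

```python
import json

def scim_group_payload(display_name, member_ids):
    """Build a SCIM 2.0 Group payload like the one an IdP sends during provisioning."""
    return {
        "schemas": ["urn:ietf:params:scim:schemas:core:2.0:Group"],
        "displayName": display_name,
        "members": [{"value": mid} for mid in member_ids],
    }

# Hypothetical member IDs for a user and a service principal
payload = scim_group_payload("finance-readers", ["user-123", "sp-456"])
print(json.dumps(payload, indent=2))
```

The IdP owns this payload end to end; Databricks only materializes the group it describes.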


2️⃣ Where Each Thing Is Defined

Item                        Where It Lives
Groups (finance-readers)    IdP (Okta / Azure AD)
Group membership            IdP
Group sync                  SCIM
Group visibility            Databricks Account Console
Data privileges             Unity Catalog (SQL)

3️⃣ Step-by-Step: Tie IdP Groups to Databricks


STEP 1: Create Groups in the IdP

Example: Azure AD / Okta

Create these groups:

  • finance-readers

  • finance-writers

  • finance-engineers

  • finance-data-owners

Add users and service principals only in the IdP.

📌 Databricks should never be the source of truth.


STEP 2: Enable SCIM Provisioning in Databricks

In Databricks Account Console:

  1. Go to User Management

  2. Enable SCIM provisioning

  3. Generate SCIM Token

  4. Copy SCIM endpoint URL

📌 This is one-time setup.


STEP 3: Configure SCIM in the IdP

Example: Azure AD

  • Add Databricks SCIM app

  • Configure:

    • SCIM endpoint

    • Bearer token

  • Assign:

    • Groups

    • Users

    • Service principals

Example: Okta

  • Enable SCIM provisioning

  • Assign groups to the Databricks app

  • Push groups & memberships

✔ Groups now auto-sync.


STEP 4: Verify Groups in Databricks

Since you’re not an admin, ask an admin to verify in:

Databricks Account Console → User Management → Groups

Or verify yourself using SQL:

SHOW GROUPS;

You should now see:

finance-readers
finance-writers
finance-engineers

These groups are:
✔ SCIM-managed
✔ Read-only in Databricks
✔ Governed by IdP


STEP 5: Grant Unity Catalog Privileges to SCIM Groups

Now comes the binding to data.

GRANT USE CATALOG ON CATALOG finance TO `finance-readers`;
GRANT USE SCHEMA ON SCHEMA finance.gl TO `finance-readers`;
GRANT SELECT ON TABLE finance.gl.transactions TO `finance-readers`;
GRANT SELECT, MODIFY ON TABLE finance.gl.transactions TO `finance-writers`;
GRANT CREATE TABLE ON SCHEMA finance.gl TO `finance-engineers`;

🎯 This is where “roles” become real.


4️⃣ How Membership Changes Are Enforced (Important)

Change                 Where Done    Result
User added to group    IdP           Access granted automatically
User removed           IdP           Access revoked automatically
User terminated        IdP           Immediate loss of access
New user onboarded     IdP           Group membership applies

🚀 No Databricks admin action required.
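
Conceptually, each SCIM sync reconciles Databricks membership against the IdP. A toy sketch of that delta logic (names illustrative; the real protocol uses PATCH operations, not set math):

```python
def reconcile(idp_members, databricks_members):
    """Return (to_add, to_remove) so Databricks ends up matching the IdP,
    which is the single source of truth."""
    idp, dbx = set(idp_members), set(databricks_members)
    return sorted(idp - dbx), sorted(dbx - idp)

# alice was onboarded in the IdP; carol was terminated there
to_add, to_remove = reconcile({"alice", "bob"}, {"bob", "carol"})
print(to_add, to_remove)  # ['alice'] ['carol']
```

Because grants attach to the group, membership changes translate directly into access changes with no Databricks-side action.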


5️⃣ Service Principals (ETL / Genie)

Same exact model.

In IdP:

  • Create service account / app registration

  • Add to group finance-etl-sp

SCIM:

  • Syncs service principal

Databricks:

GRANT MODIFY ON TABLE finance.gl.transactions TO `finance-etl-sp`;

Jobs and Genie now run securely.


6️⃣ How to Tell If a Group Is SCIM-Managed

In SQL:

DESCRIBE GROUP `finance-readers`;

You’ll see:

  • External ID

  • Read-only membership

📌 If it’s editable → it’s a local group (anti-pattern).


7️⃣ Common Mistakes (Avoid These 🚫)

❌ Manually creating groups in Databricks for prod
❌ Adding users directly in Databricks
❌ Granting privileges to individual users
❌ Using workspace-local groups
❌ Mixing SCIM and local groups


8️⃣ One-Screen Mental Model

IdP (truth) → SCIM → Databricks Groups → Unity Catalog Grants → Enforcement

Authorization Using Unity Catalog Security Model

Author: Harvinder Singh — Resident Solution Architect, Databricks
 


1. Overview

Unity Catalog (UC) is Databricks’ centralized governance and authorization layer for all data and AI assets across the Lakehouse.
It enforces fine-grained, secure access control on:

  • Catalogs, Schemas, Tables, Views

  • Volumes & Files

  • Functions & Models

  • Clusters, SQL Warehouses, External Locations

  • AI/ML assets, Feature Tables, Vector Indexes

  • Databricks Agents (Genie)

This document defines:

  1. Authorization architecture

  2. Role-based access control (RBAC) structure

  3. Identity and resource model

  4. Step-by-step implementation

  5. Best practices for enterprise deployment

  6. Operational processes (audit, lineage, monitoring)


2. Authorization Architecture

Unity Catalog Authorization operates at 4 layers:

2.1 Identity Layer

  • Users (human identities)

  • Service Principals (machine identities)

  • Groups (SCIM/SSO synced)

  • Workspace-local groups (limited usage)

Identity is managed through:
✔ SCIM Provisioning
✔ SSO (Okta, Azure AD, Ping, ADFS)
✔ Databricks Account Console


2.2 Resource Layer

Unity Catalog governs:

Resource Type                               Examples
Metastore                                   Unified governance boundary
Catalog                                     finance, sales, engineering
Schema                                      sales.crm, finance.gl
Table / View                                Managed or external Delta tables
Volumes                                     Unstructured files
Functions                                   SQL/Python functions
Models                                      MLflow models
Vector Index                                For RAG/AI
External Locations / Storage Credentials    S3/ADLS locations
Clean Rooms                                 Cross-organization sharing

2.3 Privileges (Authorization Rules)

Privileges define what an identity can perform on a resource:

High-level privileges

  • USE CATALOG

  • USE SCHEMA

  • SELECT (read rows)

  • MODIFY (update, delete, merge)

  • CREATE TABLE

  • CREATE FUNCTION

  • EXECUTE (for models/functions/AI tasks)

Advanced privileges

  • READ FILES, WRITE FILES (Volumes)

  • BYPASS GOVERNANCE (Admin only)

  • APPLY TAG / MANAGE TAG

  • MANAGE GRANTS

  • OWNERSHIP


2.4 Enforcement Engine

Unity Catalog enforces authorization:

  • At SQL execution time

  • At API call time

  • At Notebook execution time

  • Inside Genie / Agents

  • Through lineage and audit logs

  • Across all workspaces connected to the same metastore

Because UC is part of the Databricks control plane, enforcement is real-time and cannot be bypassed.
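
To read a table, a principal needs the full privilege chain: USE CATALOG on the catalog, USE SCHEMA on the schema, and SELECT on the table. A simplified model of that hierarchical check (the grants dictionary is illustrative, not an actual UC API):

```python
# Toy grant store: (principal, securable kind, securable name) -> privileges.
GRANTS = {
    ("finance-readers", "CATALOG", "finance"): {"USE CATALOG"},
    ("finance-readers", "SCHEMA", "finance.gl"): {"USE SCHEMA"},
    ("finance-readers", "TABLE", "finance.gl.transactions"): {"SELECT"},
}

def can_select(group, table):
    """True only if the group holds a privilege at every level of the hierarchy."""
    catalog, schema, _ = table.split(".")
    checks = [
        ("CATALOG", catalog, "USE CATALOG"),
        ("SCHEMA", f"{catalog}.{schema}", "USE SCHEMA"),
        ("TABLE", table, "SELECT"),
    ]
    return all(priv in GRANTS.get((group, kind, name), set())
               for kind, name, priv in checks)

print(can_select("finance-readers", "finance.gl.transactions"))  # True
print(can_select("finance-writers", "finance.gl.transactions"))  # False: no grants
```

This is why granting SELECT on a table alone is not enough: a missing USE CATALOG or USE SCHEMA breaks the chain and the query is denied.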


3. RBAC Design Patterns

Below are the best-practice authorization models.


3.1 Layered RBAC Structure

Administrative Roles

Role               Purpose
Account Admin      Controls accounts, workspaces, identity
Metastore Admin    Manages governance boundary
Data Steward       Applies tags, controls lineage
Data Owners        Own schemas/tables

Data Access Roles

Role             Privileges
Data Reader      SELECT on tables/views
Data Writer      SELECT, MODIFY, INSERT, DELETE
Data Engineer    CREATE TABLE, MODIFY
BI Analyst       SELECT + USE SCHEMA
ML Engineer      EXECUTE models + SELECT

Job / Service Roles

Identity                       Use Case
Workflow Service Principal     ETL jobs
Dashboard Service Principal    Materialized view refresh
Genie Agent Principal          Agentic workflows

Each service principal receives only the minimum privileges needed.
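
The role tables above can be expressed as data and turned into GRANT statements mechanically, which keeps roles consistent across objects. A sketch, with illustrative role names from this document:

```python
# Hypothetical role map: each data-access role and the privileges it should hold.
ROLE_PRIVILEGES = {
    "finance-readers": ["SELECT"],
    "finance-writers": ["SELECT", "MODIFY"],
}

def grant_statements(obj_type, obj_name):
    """Emit one GRANT per role for the given securable object."""
    return [
        f"GRANT {', '.join(privs)} ON {obj_type} {obj_name} TO `{role}`;"
        for role, privs in ROLE_PRIVILEGES.items()
    ]

stmts = grant_statements("TABLE", "finance.gl.transactions")
for s in stmts:
    print(s)
```

Generating grants from a role map like this (by hand, or via Terraform/CI) avoids one-off grants drifting away from the intended RBAC design.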


4. Detailed Implementation Steps (Databricks)

This section walks through exact steps to implement authorization in UC.


STEP 1: Enable Unity Catalog and Create a Metastore

  1. Log into Databricks Account Console

  2. Create a Metastore

  3. Assign root storage (S3/ADLS secure path)

  4. Assign Metastore to one or more workspaces

# Validate metastore assignment
databricks metastores get --metastore-id <ID>

STEP 2: Configure Identity Sync (Okta / Azure AD)

Enable SCIM provisioning

  • Sync users & groups to Databricks

  • Assign proper roles such as data-analysts, bi-users, etl-jobs

Validate Groups:

databricks groups list

STEP 3: Create Catalogs & Schemas and Assign Ownership

CREATE CATALOG finance;
CREATE SCHEMA finance.gl;
CREATE SCHEMA finance.ap;

-- Assign ownership to Finance Data Owner group
ALTER CATALOG finance OWNER TO `finance-data-owners`;

Catalog ownership allows delegated grants.


STEP 4: Define Access Roles

Example groups (from SCIM):

  • finance-readers

  • finance-writers

  • finance-engineers

  • etl-service-principals


STEP 5: Grant Privileges (RBAC Implementation)

Catalog Level

GRANT USE CATALOG ON CATALOG finance TO `finance-readers`;

Schema Level

GRANT USE SCHEMA ON SCHEMA finance.gl TO `finance-readers`;
GRANT CREATE TABLE ON SCHEMA finance.gl TO `finance-writers`;

Table Level

GRANT SELECT ON TABLE finance.gl.transactions TO `finance-readers`;
GRANT MODIFY ON TABLE finance.gl.transactions TO `finance-writers`;

Volume Level

GRANT READ FILES ON VOLUME finance.rawfiles TO `finance-readers`;

Function & Model Level

GRANT EXECUTE ON FUNCTION finance.gl.clean_data TO `etl-service-principals`;

STEP 6: Implement Data Masking / Row-Level Security (Optional)

Example: PII Masking

CREATE OR REPLACE VIEW finance.gl.transaction_masked AS
SELECT
  account_id,
  CASE
    -- is_account_group_member checks the caller's account-level group membership
    WHEN is_account_group_member('finance-data-owners') THEN ssn
    ELSE '***-**-****'
  END AS ssn_masked,
  amount
FROM finance.gl.transactions;

Then grant:

GRANT SELECT ON VIEW finance.gl.transaction_masked TO `finance-readers`;
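
The view's CASE logic can be pictured in plain Python (a simulation only, not Databricks code; the boolean flag stands in for the group-membership check the engine evaluates per caller):

```python
def mask_ssn(ssn, is_owner):
    """Mirror the view's CASE: owners see the raw SSN, everyone else a mask."""
    return ssn if is_owner else "***-**-****"

print(mask_ssn("123-45-6789", is_owner=True))   # 123-45-6789
print(mask_ssn("123-45-6789", is_owner=False))  # ***-**-****
```

Because the check runs at query time for each caller, the same view safely serves both privileged and unprivileged readers.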

STEP 7: Configure External Locations (Secure Access to S3/ADLS)

Create a storage credential (IAM role or SAS token):

CREATE STORAGE CREDENTIAL finance_sc WITH IAM_ROLE = 'arn:aws:iam::123456789012:role/finance-access';

Create external location:

CREATE EXTERNAL LOCATION finance_loc
URL 's3://company-data/finance/'
WITH (STORAGE CREDENTIAL finance_sc);

Grant access to engineers:

GRANT READ FILES, WRITE FILES ON EXTERNAL LOCATION finance_loc TO `finance-engineers`;

STEP 8: Configure Job/Workflow Authorization

Service principal access:

GRANT SELECT, MODIFY ON TABLE finance.gl.transactions TO `etl-service-principals`;

Workflow must run under the service principal identity.


STEP 9: Audit Logging, Lineage, and Monitoring

Databricks automatically logs:

  • Permission changes

  • Data accesses

  • Notebook executions

  • Model inferences

  • Workflow runs

  • Genie agent actions

Enable audit log delivery:

AWS → S3 bucket
Azure → Monitor / EventHub

Query audit logs:

SELECT * FROM system.access.audit WHERE user_name = 'john.doe@example.com';
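
The same filter can be applied to exported audit events, for example in a downstream alerting script. A sketch over illustrative events (the real system.access.audit rows carry many more fields):

```python
# Illustrative, simplified audit events; field names follow the query above.
events = [
    {"user_name": "john.doe@example.com", "action": "SELECT",
     "object": "finance.gl.transactions"},
    {"user_name": "jane.roe@example.com", "action": "MODIFY",
     "object": "finance.gl.transactions"},
]

# Equivalent of: WHERE user_name = 'john.doe@example.com'
johns_events = [e for e in events if e["user_name"] == "john.doe@example.com"]
print(len(johns_events))  # 1
```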

Lineage Tracking

Unity Catalog automatically tracks lineage across:

  • SQL

  • Notebooks

  • Jobs

  • Delta Live Tables

  • ML pipelines

  • Databricks Agents

No extra configuration needed.


5. Operational Governance Model

Daily Operations

  • Assign users to appropriate SCIM groups

  • Keep least-privilege enforcement

  • Review lineage before modifying objects

Monthly

  • Permission review with data owners

  • Audit policy approvals

  • Schema evolution reviews

Quarterly

  • Access certification

  • Tagging verification (e.g., PII, Restricted)

  • AI Agent resource permission review


6. Best Practices

Access Control

✔ Use Groups, never direct user grants
✔ Grant only required privileges (principle of least privilege)
✔ Use views for masking sensitive columns
✔ Use Metastore-level admins sparingly
✔ Delegate schema/table ownership to data owners (not IT)


Storage

✔ Use External Locations with IAM roles
✔ One S3/ADLS path per domain
✔ Disable direct bucket access; enforce UC controls


Automation

✔ Use service principals for jobs, dashboards, and agents
✔ Do not run jobs as human users
✔ Enforce cluster policies that restrict external hosts


AI / Genie Authorization

Genie can only access:

  • Catalogs/Schemas/Tables the agent identity has rights to

  • Notebooks the identity can EXECUTE

  • Volumes the identity can READ FILES / WRITE FILES

No privilege escalation is possible.


7. Example End-to-End Setup Script

-- Create catalog
CREATE CATALOG sales;

-- Schema
CREATE SCHEMA sales.orders;

-- Ownership
ALTER CATALOG sales OWNER TO `sales-data-owner`;

-- Readers/Writers
GRANT USE CATALOG ON CATALOG sales TO `sales-readers`;
GRANT USE SCHEMA ON SCHEMA sales.orders TO `sales-readers`;
GRANT SELECT ON TABLE sales.orders.order_delta TO `sales-readers`;
GRANT MODIFY ON TABLE sales.orders.order_delta TO `sales-writers`;
GRANT CREATE TABLE ON SCHEMA sales.orders TO `sales-engineers`;

-- Service principal (ETL)
GRANT SELECT, MODIFY ON TABLE sales.orders.order_delta TO `etl-service-principal`;

8. Final Architecture Diagram (Text-Based)

+------------------------------+
|        Identity Layer        |
|  Users, Groups, SSO, SCIM    |
+---------------+--------------+
                |
                v
+-------------------------------------------+
|       Unity Catalog Authorization         |
|  Catalog → Schema → Table/View → Column   |
|  External Locations → Volumes → Models    |
+---------------------+---------------------+
                      |
        +-------------+-------------+
        v             v             v
+------------------+ +--------------------+ +---------------------+
| Analytics & SQL  | | ETL / ML Jobs      | | AI/Genie Agents     |
| Warehouses       | | Service Principals | | Notebooks/Functions |
+------------------+ +--------------------+ +---------------------+
