Tuesday, January 6, 2026

How IdP Groups Are Tied to Databricks Groups (Unity Catalog)

 

🔑 Key Principle (Read This First)

Databricks does NOT “map” IdP groups to Databricks groups manually.
The linkage happens through SCIM provisioning.

SCIM = the binding glue between IdP and Databricks.


1️⃣ High-Level Flow

Identity Provider (Okta / Azure AD)
        |
        |  SCIM Provisioning
        v
Databricks Account Console
        |
        |  Group sync
        v
Unity Catalog Authorization
  • IdP creates & owns the group

  • SCIM syncs it into Databricks

  • Unity Catalog grants privileges to the group

  • Databricks enforces access
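
Under the hood, SCIM is just a REST API exchanging JSON. As a rough sketch (not the exact payload your IdP emits), this is the shape of the SCIM 2.0 Group object an IdP POSTs to the Databricks SCIM endpoint; the member IDs here are illustrative:

```python
import json

def scim_group_payload(display_name, member_ids):
    """Build a SCIM 2.0 Group payload like the one an IdP sends during provisioning."""
    return {
        "schemas": ["urn:ietf:params:scim:schemas:core:2.0:Group"],
        "displayName": display_name,
        "members": [{"value": mid} for mid in member_ids],
    }

# Hypothetical member IDs for a user and a service principal
payload = scim_group_payload("finance-readers", ["user-123", "sp-456"])
print(json.dumps(payload, indent=2))
```

The IdP owns this payload end to end; Databricks only materializes the group it describes.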


2️⃣ Where Each Thing Is Defined

Item                        Where It Lives
Groups (finance-readers)    IdP (Okta / Azure AD)
Group membership            IdP
Group sync                  SCIM
Group visibility            Databricks Account Console
Data privileges             Unity Catalog (SQL)

3️⃣ Step-by-Step: Tie IdP Groups to Databricks


STEP 1: Create Groups in the IdP

Example: Azure AD / Okta

Create these groups:

  • finance-readers

  • finance-writers

  • finance-engineers

  • finance-data-owners

Add users and service principals only in the IdP.

📌 Databricks should never be the source of truth.


STEP 2: Enable SCIM Provisioning in Databricks

In Databricks Account Console:

  1. Go to User Management

  2. Enable SCIM provisioning

  3. Generate SCIM Token

  4. Copy SCIM endpoint URL

📌 This is one-time setup.


STEP 3: Configure SCIM in the IdP

Example: Azure AD

  • Add Databricks SCIM app

  • Configure:

    • SCIM endpoint

    • Bearer token

  • Assign:

    • Groups

    • Users

    • Service principals

Example: Okta

  • Enable SCIM provisioning

  • Assign groups to the Databricks app

  • Push groups & memberships

✔ Groups now auto-sync.


STEP 4: Verify Groups in Databricks

Since you’re not an admin, ask an admin to verify in:

Databricks Account Console → User Management → Groups

Or verify yourself using SQL:

SHOW GROUPS;

You should now see:

finance-readers
finance-writers
finance-engineers

These groups are:
✔ SCIM-managed
✔ Read-only in Databricks
✔ Governed by IdP


STEP 5: Grant Unity Catalog Privileges to SCIM Groups

Now comes the binding to data.

GRANT USE CATALOG ON CATALOG finance TO `finance-readers`;
GRANT USE SCHEMA ON SCHEMA finance.gl TO `finance-readers`;
GRANT SELECT ON TABLE finance.gl.transactions TO `finance-readers`;
GRANT SELECT, MODIFY ON TABLE finance.gl.transactions TO `finance-writers`;
GRANT CREATE TABLE ON SCHEMA finance.gl TO `finance-engineers`;

🎯 This is where “roles” become real.


4️⃣ How Membership Changes Are Enforced (Important)

Change                 Where Done    Result
User added to group    IdP           Access granted automatically
User removed           IdP           Access revoked automatically
User terminated        IdP           Immediate loss of access
New user onboarded     IdP           Group membership applies

🚀 No Databricks admin action required.
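
Conceptually, each SCIM sync reconciles Databricks membership against the IdP. A toy sketch of that delta logic (names illustrative; the real protocol uses PATCH operations, not set math):

```python
def reconcile(idp_members, databricks_members):
    """Return (to_add, to_remove) so Databricks ends up matching the IdP,
    which is the single source of truth."""
    idp, dbx = set(idp_members), set(databricks_members)
    return sorted(idp - dbx), sorted(dbx - idp)

# alice was onboarded in the IdP; carol was terminated there
to_add, to_remove = reconcile({"alice", "bob"}, {"bob", "carol"})
print(to_add, to_remove)  # ['alice'] ['carol']
```

Because grants attach to the group, membership changes translate directly into access changes with no Databricks-side action.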


5️⃣ Service Principals (ETL / Genie)

Same exact model.

In IdP:

  • Create service account / app registration

  • Add to group finance-etl-sp

SCIM:

  • Syncs service principal

Databricks:

GRANT MODIFY ON TABLE finance.gl.transactions TO `finance-etl-sp`;

Jobs and Genie now run securely.


6️⃣ How to Tell If a Group Is SCIM-Managed

In SQL:

DESCRIBE GROUP `finance-readers`;

You’ll see:

  • External ID

  • Read-only membership

📌 If it’s editable → it’s a local group (anti-pattern).


7️⃣ Common Mistakes (Avoid These 🚫)

❌ Manually creating groups in Databricks for prod
❌ Adding users directly in Databricks
❌ Granting privileges to individual users
❌ Using workspace-local groups
❌ Mixing SCIM and local groups


8️⃣ One-Screen Mental Model

IdP (truth) → SCIM → Databricks Groups → Unity Catalog Grants → Enforcement

Authorization Using Unity Catalog Security Model

Author: Harvinder Singh — Resident Solution Architect, Databricks
 


1. Overview

Unity Catalog (UC) is Databricks’ centralized governance and authorization layer for all data and AI assets across the Lakehouse.
It enforces fine-grained, secure access control on:

  • Catalogs, Schemas, Tables, Views

  • Volumes & Files

  • Functions & Models

  • Clusters, SQL Warehouses, External Locations

  • AI/ML assets, Feature Tables, Vector Indexes

  • Databricks Agents (Genie)

This document defines:

  1. Authorization architecture

  2. Role-based access control (RBAC) structure

  3. Identity and resource model

  4. Step-by-step implementation

  5. Best practices for enterprise deployment

  6. Operational processes (audit, lineage, monitoring)


2. Authorization Architecture

Unity Catalog Authorization operates at 4 layers:

2.1 Identity Layer

  • Users (human identities)

  • Service Principals (machine identities)

  • Groups (SCIM/SSO synced)

  • Workspace-local groups (limited usage)

Identity is managed through:
✔ SCIM Provisioning
✔ SSO (Okta, Azure AD, Ping, ADFS)
✔ Databricks Account Console


2.2 Resource Layer

Unity Catalog governs:

Resource Type                               Examples
Metastore                                   Unified governance boundary
Catalog                                     finance, sales, engineering
Schema                                      sales.crm, finance.gl
Table / View                                Managed or external Delta tables
Volumes                                     Unstructured files
Functions                                   SQL/Python functions
Models                                      MLflow models
Vector Index                                For RAG/AI
External Locations / Storage Credentials    S3/ADLS locations
Clean Rooms                                 Cross-organization sharing

2.3 Privileges (Authorization Rules)

Privileges define what an identity can perform on a resource:

High-level privileges

  • USE CATALOG

  • USE SCHEMA

  • SELECT (read rows)

  • MODIFY (update, delete, merge)

  • CREATE TABLE

  • CREATE FUNCTION

  • EXECUTE (for models/functions/AI tasks)

Advanced privileges

  • READ FILES, WRITE FILES (Volumes)

  • BYPASS GOVERNANCE (Admin only)

  • APPLY TAG / MANAGE TAG

  • MANAGE GRANTS

  • OWNERSHIP


2.4 Enforcement Engine

Unity Catalog enforces authorization:

  • At SQL execution time

  • At API call time

  • At Notebook execution time

  • Inside Genie / Agents

  • Through lineage and audit logs

  • Across all workspaces connected to the same metastore

Because UC is part of the Databricks control plane, enforcement is real-time and cannot be bypassed.
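
To read a table, a principal needs the full privilege chain: USE CATALOG on the catalog, USE SCHEMA on the schema, and SELECT on the table. A simplified model of that hierarchical check (the grants dictionary is illustrative, not an actual UC API):

```python
# Toy grant store: (principal, securable kind, securable name) -> privileges.
GRANTS = {
    ("finance-readers", "CATALOG", "finance"): {"USE CATALOG"},
    ("finance-readers", "SCHEMA", "finance.gl"): {"USE SCHEMA"},
    ("finance-readers", "TABLE", "finance.gl.transactions"): {"SELECT"},
}

def can_select(group, table):
    """True only if the group holds a privilege at every level of the hierarchy."""
    catalog, schema, _ = table.split(".")
    checks = [
        ("CATALOG", catalog, "USE CATALOG"),
        ("SCHEMA", f"{catalog}.{schema}", "USE SCHEMA"),
        ("TABLE", table, "SELECT"),
    ]
    return all(priv in GRANTS.get((group, kind, name), set())
               for kind, name, priv in checks)

print(can_select("finance-readers", "finance.gl.transactions"))  # True
print(can_select("finance-writers", "finance.gl.transactions"))  # False: no grants
```

This is why granting SELECT on a table alone is not enough: a missing USE CATALOG or USE SCHEMA breaks the chain and the query is denied.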


3. RBAC Design Patterns

Below are the best-practice authorization models.


3.1 Layered RBAC Structure

Administrative Roles

Role               Purpose
Account Admin      Controls accounts, workspaces, identity
Metastore Admin    Manages governance boundary
Data Steward       Applies tags, controls lineage
Data Owners        Own schemas/tables

Data Access Roles

Role             Privileges
Data Reader      SELECT on tables/views
Data Writer      SELECT, MODIFY, INSERT, DELETE
Data Engineer    CREATE TABLE, MODIFY
BI Analyst       SELECT + USE SCHEMA
ML Engineer      EXECUTE models + SELECT

Job / Service Roles

Identity                       Use Case
Workflow Service Principal     ETL jobs
Dashboard Service Principal    Materialized view refresh
Genie Agent Principal          Agentic workflows

Each service principal receives only the minimum privileges needed.
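
The role tables above can be expressed as data and turned into GRANT statements mechanically, which keeps roles consistent across objects. A sketch, with illustrative role names from this document:

```python
# Hypothetical role map: each data-access role and the privileges it should hold.
ROLE_PRIVILEGES = {
    "finance-readers": ["SELECT"],
    "finance-writers": ["SELECT", "MODIFY"],
}

def grant_statements(obj_type, obj_name):
    """Emit one GRANT per role for the given securable object."""
    return [
        f"GRANT {', '.join(privs)} ON {obj_type} {obj_name} TO `{role}`;"
        for role, privs in ROLE_PRIVILEGES.items()
    ]

stmts = grant_statements("TABLE", "finance.gl.transactions")
for s in stmts:
    print(s)
```

Generating grants from a role map like this (by hand, or via Terraform/CI) avoids one-off grants drifting away from the intended RBAC design.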


4. Detailed Implementation Steps (Databricks)

This section walks through exact steps to implement authorization in UC.


STEP 1: Enable Unity Catalog and Create a Metastore

  1. Log into Databricks Account Console

  2. Create a Metastore

  3. Assign root storage (S3/ADLS secure path)

  4. Assign Metastore to one or more workspaces

# Validate metastore assignment
databricks metastores get --metastore-id <ID>

STEP 2: Configure Identity Sync (Okta / Azure AD)

Enable SCIM provisioning

  • Sync users & groups to Databricks

  • Assign proper roles such as data-analysts, bi-users, etl-jobs

Validate Groups:

databricks groups list

STEP 3: Create Catalogs & Schemas and Assign Ownership

CREATE CATALOG finance;
CREATE SCHEMA finance.gl;
CREATE SCHEMA finance.ap;

-- Assign ownership to Finance Data Owner group
ALTER CATALOG finance OWNER TO `finance-data-owners`;

Catalog ownership allows delegated grants.


STEP 4: Define Access Roles

Example groups (from SCIM):

  • finance-readers

  • finance-writers

  • finance-engineers

  • etl-service-principals


STEP 5: Grant Privileges (RBAC Implementation)

Catalog Level

GRANT USE CATALOG ON CATALOG finance TO `finance-readers`;

Schema Level

GRANT USE SCHEMA ON SCHEMA finance.gl TO `finance-readers`;
GRANT CREATE TABLE ON SCHEMA finance.gl TO `finance-writers`;

Table Level

GRANT SELECT ON TABLE finance.gl.transactions TO `finance-readers`;
GRANT MODIFY ON TABLE finance.gl.transactions TO `finance-writers`;

Volume Level

GRANT READ FILES ON VOLUME finance.rawfiles TO `finance-readers`;

Function & Model Level

GRANT EXECUTE ON FUNCTION finance.gl.clean_data TO `etl-service-principals`;

STEP 6: Implement Data Masking / Row-Level Security (Optional)

Example: PII Masking

CREATE OR REPLACE VIEW finance.gl.transaction_masked AS
SELECT
  account_id,
  CASE
    -- is_account_group_member checks the caller's account-level group membership
    WHEN is_account_group_member('finance-data-owners') THEN ssn
    ELSE '***-**-****'
  END AS ssn_masked,
  amount
FROM finance.gl.transactions;

Then grant:

GRANT SELECT ON VIEW finance.gl.transaction_masked TO `finance-readers`;
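
The view's CASE logic can be pictured in plain Python (a simulation only, not Databricks code; the boolean flag stands in for the group-membership check the engine evaluates per caller):

```python
def mask_ssn(ssn, is_owner):
    """Mirror the view's CASE: owners see the raw SSN, everyone else a mask."""
    return ssn if is_owner else "***-**-****"

print(mask_ssn("123-45-6789", is_owner=True))   # 123-45-6789
print(mask_ssn("123-45-6789", is_owner=False))  # ***-**-****
```

Because the check runs at query time for each caller, the same view safely serves both privileged and unprivileged readers.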

STEP 7: Configure External Locations (Secure Access to S3/ADLS)

Create a storage credential (IAM role or SAS token):

CREATE STORAGE CREDENTIAL finance_sc WITH IAM_ROLE = 'arn:aws:iam::123456789012:role/finance-access';

Create external location:

CREATE EXTERNAL LOCATION finance_loc
URL 's3://company-data/finance/'
WITH (STORAGE CREDENTIAL finance_sc);

Grant access to engineers:

GRANT READ FILES, WRITE FILES ON EXTERNAL LOCATION finance_loc TO `finance-engineers`;

STEP 8: Configure Job/Workflow Authorization

Service principal access:

GRANT SELECT, MODIFY ON TABLE finance.gl.transactions TO `etl-service-principals`;

Workflow must run under the service principal identity.


STEP 9: Audit Logging, Lineage, and Monitoring

Databricks automatically logs:

  • Permission changes

  • Data accesses

  • Notebook executions

  • Model inferences

  • Workflow runs

  • Genie agent actions

Enable audit log delivery:

AWS → S3 bucket
Azure → Monitor / EventHub

Query audit logs:

SELECT * FROM system.access.audit WHERE user_name = 'john.doe@example.com';
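
The same filter can be applied to exported audit events, for example in a downstream alerting script. A sketch over illustrative events (the real system.access.audit rows carry many more fields):

```python
# Illustrative, simplified audit events; field names follow the query above.
events = [
    {"user_name": "john.doe@example.com", "action": "SELECT",
     "object": "finance.gl.transactions"},
    {"user_name": "jane.roe@example.com", "action": "MODIFY",
     "object": "finance.gl.transactions"},
]

# Equivalent of: WHERE user_name = 'john.doe@example.com'
johns_events = [e for e in events if e["user_name"] == "john.doe@example.com"]
print(len(johns_events))  # 1
```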

Lineage Tracking

Unity Catalog automatically tracks lineage across:

  • SQL

  • Notebooks

  • Jobs

  • Delta Live Tables

  • ML pipelines

  • Databricks Agents

No extra configuration needed.


5. Operational Governance Model

Daily Operations

  • Assign users to appropriate SCIM groups

  • Keep least-privilege enforcement

  • Review lineage before modifying objects

Monthly

  • Permission review with data owners

  • Audit policy approvals

  • Schema evolution reviews

Quarterly

  • Access certification

  • Tagging verification (e.g., PII, Restricted)

  • AI Agent resource permission review


6. Best Practices

Access Control

✔ Use Groups, never direct user grants
✔ Grant only required privileges (principle of least privilege)
✔ Use views for masking sensitive columns
✔ Use Metastore-level admins sparingly
✔ Delegate schema/table ownership to data owners (not IT)


Storage

✔ Use External Locations with IAM roles
✔ One S3/ADLS path per domain
✔ Disable direct bucket access; enforce UC controls


Automation

✔ Use service principals for jobs, dashboards, and agents
✔ Do not run jobs as human users
✔ Enforce cluster policies that restrict external hosts


AI / Genie Authorization

Genie can only access:

  • Catalogs/Schemas/Tables the agent identity has rights to

  • Notebooks the identity can EXECUTE

  • Volumes the identity can READ FILES / WRITE FILES

No privilege escalation is possible.


7. Example End-to-End Setup Script

-- Create catalog
CREATE CATALOG sales;

-- Schema
CREATE SCHEMA sales.orders;

-- Ownership
ALTER CATALOG sales OWNER TO `sales-data-owner`;

-- Readers/Writers
GRANT USE CATALOG ON CATALOG sales TO `sales-readers`;
GRANT USE SCHEMA ON SCHEMA sales.orders TO `sales-readers`;
GRANT SELECT ON TABLE sales.orders.order_delta TO `sales-readers`;
GRANT MODIFY ON TABLE sales.orders.order_delta TO `sales-writers`;
GRANT CREATE TABLE ON SCHEMA sales.orders TO `sales-engineers`;

-- Service principal (ETL)
GRANT SELECT, MODIFY ON TABLE sales.orders.order_delta TO `etl-service-principal`;

8. Final Architecture Diagram (Text-Based)

+------------------------------+
|        Identity Layer        |
|  Users, Groups, SSO, SCIM    |
+---------------+--------------+
                |
                v
+-------------------------------------------+
|       Unity Catalog Authorization         |
|  Catalog → Schema → Table/View → Column   |
|  External Locations → Volumes → Models    |
+---------------------+---------------------+
                      |
        +-------------+-------------+
        v             v             v
+------------------+ +--------------------+ +---------------------+
| Analytics & SQL  | | ETL / ML Jobs      | | AI/Genie Agents     |
| Warehouses       | | Service Principals | | Notebooks/Functions |
+------------------+ +--------------------+ +---------------------+
