Tuesday, January 6, 2026

Authorization Using Unity Catalog Security Model

Author: Harvinder Singh — Resident Solution Architect, Databricks
 


1. Overview

Unity Catalog (UC) is Databricks’ centralized governance and authorization layer for all data and AI assets across the Lakehouse.
It enforces fine-grained, secure access control on:

  • Catalogs, Schemas, Tables, Views

  • Volumes & Files

  • Functions & Models

  • Clusters, SQL Warehouses, External Locations

  • AI/ML assets, Feature Tables, Vector Indexes

  • Databricks Agents (Genie)

This document defines:

  1. Authorization architecture

  2. Role-based access control (RBAC) structure

  3. Identity and resource model

  4. Step-by-step implementation

  5. Best practices for enterprise deployment

  6. Operational processes (audit, lineage, monitoring)


2. Authorization Architecture

Unity Catalog authorization operates across four layers:

2.1 Identity Layer

  • Users (human identities)

  • Service Principals (machine identities)

  • Groups (SCIM/SSO synced)

  • Workspace-local groups (limited usage)

Identity is managed through:
✔ SCIM Provisioning
✔ SSO (Okta, Azure AD, Ping, ADFS)
✔ Databricks Account Console


2.2 Resource Layer

Unity Catalog governs:

Resource Type                                Examples
Metastore                                    Unified governance boundary
Catalog                                      finance, sales, engineering
Schema                                       sales.crm, finance.gl
Table / View                                 Managed or external Delta tables
Volumes                                      Unstructured files
Functions                                    SQL/Python functions
Models                                       MLflow models
Vector Index                                 For RAG/AI
External Locations / Storage Credentials     S3/ADLS locations
Clean Rooms                                  Cross-organization sharing

2.3 Privileges (Authorization Rules)

Privileges define which operations an identity can perform on a resource; a short grant example follows the lists below.

High-level privileges

  • USE CATALOG

  • USE SCHEMA

  • SELECT (read rows)

  • MODIFY (update, delete, merge)

  • CREATE TABLE

  • CREATE FUNCTION

  • EXECUTE (for models/functions/AI tasks)

Advanced privileges

  • READ VOLUME, WRITE VOLUME (Volumes); READ FILES, WRITE FILES (External Locations)

  • BYPASS GOVERNANCE (Admin only)

  • APPLY TAG / MANAGE TAG

  • MANAGE GRANTS

  • OWNERSHIP
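
For example, a privilege is attached to a securable with GRANT and removed with REVOKE. A minimal sketch; the table and group names are illustrative and match the examples later in this document:

-- Illustrative object and group names
GRANT SELECT ON TABLE finance.gl.transactions TO `finance-readers`;
REVOKE SELECT ON TABLE finance.gl.transactions FROM `finance-readers`;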


2.4 Enforcement Engine

Unity Catalog enforces authorization:

  • At SQL execution time

  • At API call time

  • At Notebook execution time

  • Inside Genie / Agents

  • Through lineage and audit logs

  • Across all workspaces connected to the same metastore

Because UC is part of the Databricks control plane, enforcement is real-time and cannot be bypassed.
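
Because grants live in the metastore, effective privileges can also be reviewed centrally from SQL. A minimal check, assuming the finance catalog used in the examples later in this document:

-- Illustrative: catalog and table names from the examples below
SELECT grantee, privilege_type
FROM finance.information_schema.table_privileges
WHERE table_schema = 'gl' AND table_name = 'transactions';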


3. RBAC Design Patterns

Below are the best-practice authorization models.


3.1 Layered RBAC Structure

Administrative Roles

Role              Purpose
Account Admin     Controls accounts, workspaces, identity
Metastore Admin   Manages the governance boundary
Data Steward      Applies tags, controls lineage
Data Owners       Own schemas/tables

Data Access Roles

Role            Privileges
Data Reader     SELECT on tables/views
Data Writer     SELECT, MODIFY, INSERT, DELETE
Data Engineer   CREATE TABLE, MODIFY
BI Analyst      SELECT + USE SCHEMA
ML Engineer     EXECUTE on models + SELECT

Job / Service Roles

Identity                       Use Case
Workflow Service Principal     ETL jobs
Dashboard Service Principal    Materialized view refresh
Genie Agent Principal          Agentic workflows

Each service principal receives only the minimum privileges needed.
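
As a sketch of how one of these roles maps to concrete grants, a Data Reader role is typically just a SCIM group holding the minimum read path. The group and object names below are illustrative and match the examples in Section 4:

-- Illustrative names; see Section 4 for the full walkthrough
GRANT USE CATALOG ON CATALOG finance TO `finance-readers`;
GRANT USE SCHEMA ON SCHEMA finance.gl TO `finance-readers`;
GRANT SELECT ON TABLE finance.gl.transactions TO `finance-readers`;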


4. Detailed Implementation Steps (Databricks)

This section walks through the exact steps to implement authorization in UC.


STEP 1: Enable Unity Catalog and Create a Metastore

  1. Log into Databricks Account Console

  2. Create a Metastore

  3. Assign root storage (S3/ADLS secure path)

  4. Assign Metastore to one or more workspaces

# Validate metastore assignment
databricks metastores get --metastore-id <ID>
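
From a workspace attached to the metastore, the assignment can also be confirmed in SQL with a minimal sanity check:

SELECT current_metastore();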

STEP 2: Configure Identity Sync (Okta / Azure AD)

Enable SCIM provisioning

  • Sync users & groups to Databricks

  • Assign users to groups such as data-analysts, bi-users, etl-jobs

Validate Groups:

databricks groups list
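
Once SCIM sync completes, membership can also be verified from SQL; the group name below is illustrative:

-- Illustrative group name
SELECT is_account_group_member('finance-readers');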

STEP 3: Create Catalogs & Schemas and Assign Ownership

CREATE CATALOG finance;
CREATE SCHEMA finance.gl;
CREATE SCHEMA finance.ap;

-- Transfer ownership to the Finance Data Owner group
ALTER CATALOG finance OWNER TO `finance-data-owners`;

Catalog ownership allows delegated grants.
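
To confirm the new owner, the catalog metadata can be inspected (a minimal check):

DESCRIBE CATALOG EXTENDED finance;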


STEP 4: Define Access Roles

Example groups (from SCIM):

  • finance-readers

  • finance-writers

  • finance-engineers

  • etl-service-principals


STEP 5: Grant Privileges (RBAC Implementation)

Catalog Level

GRANT USE CATALOG ON CATALOG finance TO `finance-readers`;

Schema Level

GRANT USE SCHEMA ON SCHEMA finance.gl TO `finance-readers`;
GRANT CREATE TABLE ON SCHEMA finance.gl TO `finance-writers`;

Table Level

GRANT SELECT ON TABLE finance.gl.transactions TO `finance-readers`;
GRANT MODIFY ON TABLE finance.gl.transactions TO `finance-writers`;

Volume Level

GRANT READ VOLUME ON VOLUME finance.gl.rawfiles TO `finance-readers`;

Function & Model Level

GRANT EXECUTE ON FUNCTION finance.gl.clean_data TO `etl-service-principals`;
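
After granting, effective privileges can be reviewed per object or per principal (names as used above):

SHOW GRANTS ON SCHEMA finance.gl;
SHOW GRANTS `finance-readers` ON TABLE finance.gl.transactions;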

STEP 6: Implement Data Masking / Row-Level Security (Optional)

Example: PII Masking

CREATE OR REPLACE VIEW finance.gl.transaction_masked AS
SELECT
  account_id,
  CASE
    WHEN is_account_group_member('finance-data-owners') THEN ssn
    ELSE '***-**-****'
  END AS ssn_masked,
  amount
FROM finance.gl.transactions;

Then grant:

GRANT SELECT ON VIEW finance.gl.transaction_masked TO `finance-readers`;
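
As an alternative to masking views, Unity Catalog also supports column masks attached directly to the table. A minimal sketch, assuming the same group used for catalog ownership above:

-- Masking function; the group name is illustrative
CREATE OR REPLACE FUNCTION finance.gl.ssn_mask(ssn STRING)
  RETURN CASE WHEN is_account_group_member('finance-data-owners') THEN ssn ELSE '***-**-****' END;

-- Attach the mask to the column
ALTER TABLE finance.gl.transactions
  ALTER COLUMN ssn SET MASK finance.gl.ssn_mask;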

STEP 7: Configure External Locations (Secure Access to S3/ADLS)

Create a storage credential (IAM role or SAS token):

CREATE STORAGE CREDENTIAL finance_sc WITH IAM_ROLE = 'arn:aws:iam::123456789012:role/finance-access';

Create external location:

CREATE EXTERNAL LOCATION finance_loc
URL 's3://company-data/finance/'
WITH (STORAGE CREDENTIAL finance_sc);

Grant access to engineers:

GRANT READ FILES, WRITE FILES ON EXTERNAL LOCATION finance_loc TO `finance-engineers`;
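
Access can be sanity-checked by describing the location and listing files through it (the path matches the example above):

DESCRIBE EXTERNAL LOCATION finance_loc;
LIST 's3://company-data/finance/';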

STEP 8: Configure Job/Workflow Authorization

Service principal access:

GRANT SELECT, MODIFY ON TABLE finance.gl.transactions TO `etl-service-principals`;

The workflow must run under the service principal identity.
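
Note that the table grant alone is not sufficient; the service principal also needs the USE privileges on the parent catalog and schema:

GRANT USE CATALOG ON CATALOG finance TO `etl-service-principals`;
GRANT USE SCHEMA ON SCHEMA finance.gl TO `etl-service-principals`;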


STEP 9: Audit Logging, Lineage, and Monitoring

Databricks automatically logs:

  • Permission changes

  • Data accesses

  • Notebook executions

  • Model inferences

  • Workflow runs

  • Genie agent actions

Enable audit log delivery:

AWS → S3 bucket
Azure → Monitor / EventHub

Query audit logs:

SELECT *
FROM system.access.audit
WHERE user_identity.email = 'john.doe@example.com';

Lineage Tracking

Unity Catalog automatically tracks lineage across:

  • SQL

  • Notebooks

  • Jobs

  • Delta Live Tables

  • ML pipelines

  • Databricks Agents

No extra configuration needed.
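
Lineage can also be queried from the lineage system tables. A minimal sketch, assuming the system.access.table_lineage schema and the finance table from the earlier examples:

-- Illustrative target table; column names assume system.access.table_lineage
SELECT source_table_full_name, target_table_full_name, event_time
FROM system.access.table_lineage
WHERE target_table_full_name = 'finance.gl.transactions';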


5. Operational Governance Model

Daily Operations

  • Assign users to appropriate SCIM groups

  • Keep least-privilege enforcement

  • Review lineage before modifying objects

Monthly

  • Permission review with data owners

  • Audit policy approvals

  • Schema evolution reviews

Quarterly

  • Access certification

  • Tagging verification (e.g., PII, Restricted)

  • AI Agent resource permission review


6. Best Practices

Access Control

✔ Use Groups, never direct user grants
✔ Grant only required privileges (principle of least privilege)
✔ Use views for masking sensitive columns
✔ Use Metastore-level admins sparingly
✔ Delegate schema/table ownership to data owners (not IT)


Storage

✔ Use External Locations with IAM roles
✔ One S3/ADLS path per domain
✔ Disable direct bucket access; enforce UC controls


Automation

✔ Use service principals for jobs, dashboards, and agents
✔ Do not run jobs as human users
✔ Enforce cluster policies that restrict external hosts


AI / Genie Authorization

Genie can only access:

  • Catalogs/Schemas/Tables the agent identity has rights to

  • Notebooks the identity can EXECUTE

  • Volumes where the identity has READ VOLUME / WRITE VOLUME

No privilege escalation is possible.


7. Example End-to-End Setup Script

-- Create catalog
CREATE CATALOG sales;

-- Schema
CREATE SCHEMA sales.orders;

-- Ownership
ALTER CATALOG sales OWNER TO `sales-data-owner`;

-- Readers/Writers
GRANT USE CATALOG ON CATALOG sales TO `sales-readers`;
GRANT USE SCHEMA ON SCHEMA sales.orders TO `sales-readers`;
GRANT SELECT ON TABLE sales.orders.order_delta TO `sales-readers`;
GRANT MODIFY ON TABLE sales.orders.order_delta TO `sales-writers`;
GRANT CREATE TABLE ON SCHEMA sales.orders TO `sales-engineers`;

-- Service principal (ETL)
GRANT SELECT, MODIFY ON TABLE sales.orders.order_delta TO `etl-service-principal`;

8. Final Architecture Diagram (Text-Based)

+------------------------------+
|        Identity Layer        |
|   Users, Groups, SSO, SCIM   |
+--------------+---------------+
               |
               v
+---------------------------------------------+
|        Unity Catalog Authorization          |
|   Catalog → Schema → Table/View → Column    |
|   External Locations → Volumes → Models     |
+----------------------+----------------------+
                       |
                       v
+-------------------+  +--------------------+  +----------------------+
|  Analytics & SQL  |  |   ETL / ML Jobs    |  |  AI / Genie Agents   |
|    Warehouses     |  | Service Principals |  | Notebooks/Functions  |
+-------------------+  +--------------------+  +----------------------+
