Authorization Using Unity Catalog Security Model
Author: Harvinder Singh — Resident Solution Architect, Databricks
1. Overview
Unity Catalog (UC) is Databricks’ centralized governance and authorization layer for all data and AI assets across the Lakehouse.
It enforces fine-grained, secure access control on:
- Catalogs, Schemas, Tables, Views
- Volumes & Files
- Functions & Models
- Clusters, SQL Warehouses, External Locations
- AI/ML assets, Feature Tables, Vector Indexes
- Databricks Agents (Genie)
This document defines:
- Authorization architecture
- Role-based access control (RBAC) structure
- Identity and resource model
- Step-by-step implementation
- Best practices for enterprise deployment
- Operational processes (audit, lineage, monitoring)
2. Authorization Architecture
Unity Catalog authorization operates across four layers:
2.1 Identity Layer
Identity is managed through:
✔ SCIM Provisioning
✔ SSO (Okta, Azure AD, Ping, ADFS)
✔ Databricks Account Console
2.2 Resource Layer
Unity Catalog governs:
| Resource Type | Examples |
|---|---|
| Metastore | Unified governance boundary |
| Catalog | finance, sales, engineering |
| Schema | sales.crm, finance.gl |
| Table / View | Managed or External Delta Tables |
| Volumes | Unstructured files |
| Functions | SQL/Python functions |
| Models | MLflow models |
| Vector Index | For RAG/AI |
| External Locations / Storage Credentials | S3/ADLS locations |
| Clean Rooms | Cross-organization sharing |
2.3 Privileges (Authorization Rules)
Privileges define which actions an identity can perform on a resource:
High-level privileges
- USE CATALOG
- USE SCHEMA
- SELECT (read rows)
- MODIFY (update, delete, merge)
- CREATE TABLE
- CREATE FUNCTION
- EXECUTE (for models/functions/AI tasks)
Advanced privileges
- READ FILES / WRITE FILES (external locations)
- READ VOLUME / WRITE VOLUME (volumes)
- CREATE SCHEMA / CREATE VOLUME / CREATE MODEL
- APPLY TAG
- BROWSE (metadata visibility without data access)
- ALL PRIVILEGES
2.4 Enforcement Engine
Unity Catalog enforces authorization:
- At SQL execution time
- At API call time
- At notebook execution time
- Inside Genie / Agents
- Through lineage and audit logs
- Across all workspaces connected to the same metastore
Because UC is part of the Databricks control plane, enforcement is real-time and cannot be bypassed.
3. RBAC Design Patterns
Below are the best-practice authorization models.
3.1 Layered RBAC Structure
Administrative Roles
| Role | Purpose |
|---|---|
| Account Admin | Controls accounts, workspaces, identity |
| Metastore Admin | Manages governance boundary |
| Data Steward | Applies tags, controls lineage |
| Data Owners | Own schemas/tables |
Data Access Roles
| Role | Privileges |
|---|---|
| Data Reader | SELECT on tables/views |
| Data Writer | SELECT, MODIFY (MODIFY covers INSERT, UPDATE, DELETE) |
| Data Engineer | CREATE TABLE, MODIFY |
| BI Analyst | SELECT + USE SCHEMA |
| ML Engineer | EXECUTE models + SELECT |
Job / Service Roles
| Identity | Use Case |
|---|---|
| Workflow Service Principal | ETL jobs |
| Dashboard Service Principal | Materialized view refresh |
| Genie Agent Principal | Agentic workflows |
Each service principal receives only the minimum privileges needed.
4. Detailed Implementation Steps (Databricks)
This section walks through the exact steps to implement authorization in UC.
STEP 1: Enable Unity Catalog and Create a Metastore
- Log into the Databricks Account Console
- Create a Metastore
- Assign root storage (a secure S3/ADLS path)
- Assign the Metastore to one or more workspaces
STEP 2: Configure Identity Sync (Okta / Azure AD)
Enable SCIM provisioning:
- Sync users & groups to Databricks
- Assign proper roles such as data-analysts, bi-users, etl-jobs
Validate groups:
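A quick validation sketch from a SQL editor (the user email shown is illustrative):

```sql
-- List groups visible to the workspace; SCIM-synced groups should appear
SHOW GROUPS;

-- Inspect which groups a specific user belongs to
SHOW GROUPS WITH USER `user@example.com`;
```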
STEP 3: Create Catalogs & Schemas and Assign Ownership
Catalog ownership allows delegated grants.
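A minimal sketch; the catalog, schema, and owner group names are illustrative:

```sql
-- Create the governance containers
CREATE CATALOG IF NOT EXISTS finance;
CREATE SCHEMA IF NOT EXISTS finance.gl;

-- Delegate ownership so the owning team can manage its own grants
ALTER CATALOG finance OWNER TO `finance-data-owners`;
ALTER SCHEMA finance.gl OWNER TO `finance-data-owners`;
```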
STEP 4: Define Access Roles
Example groups (from SCIM):
- finance-readers
- finance-writers
- finance-engineers
- etl-service-principals
STEP 5: Grant Privileges (RBAC Implementation)
Catalog Level
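A minimal sketch (catalog and group names are illustrative):

```sql
-- Visibility into the catalog is the prerequisite for all lower-level access
GRANT USE CATALOG ON CATALOG finance TO `finance-readers`;
```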
Schema Level
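Continuing the sketch at schema scope:

```sql
-- USE SCHEMA is required alongside table-level privileges
GRANT USE SCHEMA ON SCHEMA finance.gl TO `finance-readers`;
-- SELECT at schema level is inherited by all current and future tables
GRANT SELECT ON SCHEMA finance.gl TO `finance-readers`;
```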
Table Level
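Table-scoped grants for finer control (the table name is illustrative):

```sql
GRANT SELECT ON TABLE finance.gl.transactions TO `finance-readers`;
GRANT SELECT, MODIFY ON TABLE finance.gl.transactions TO `finance-writers`;
```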
Volume Level
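A sketch for volumes (the volume name is illustrative):

```sql
GRANT READ VOLUME ON VOLUME finance.gl.raw_files TO `finance-readers`;
GRANT READ VOLUME, WRITE VOLUME ON VOLUME finance.gl.raw_files TO `finance-engineers`;
```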
Function & Model Level
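A sketch for functions and registered models (object names are illustrative):

```sql
GRANT EXECUTE ON FUNCTION finance.gl.mask_ssn TO `finance-readers`;
-- Models registered in UC are likewise governed with EXECUTE
GRANT EXECUTE ON MODEL finance.ml.churn_model TO `ml-engineers`;
```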
STEP 6: Implement Data Masking / Row-Level Security (Optional)
Example: PII Masking
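A minimal sketch combining a column mask and a row filter; the table, columns, and group names are illustrative:

```sql
-- Column mask: only finance-admins see the raw SSN
CREATE OR REPLACE FUNCTION finance.gl.mask_ssn(ssn STRING)
RETURN CASE
  WHEN is_account_group_member('finance-admins') THEN ssn
  ELSE '***-**-****'
END;

ALTER TABLE finance.gl.customers
  ALTER COLUMN ssn SET MASK finance.gl.mask_ssn;

-- Row filter: users outside global-analysts see only US rows
CREATE OR REPLACE FUNCTION finance.gl.us_rows_only(region STRING)
RETURN is_account_group_member('global-analysts') OR region = 'US';

ALTER TABLE finance.gl.customers
  SET ROW FILTER finance.gl.us_rows_only ON (region);
```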
Then grant:
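The corresponding grant, continuing the sketch above:

```sql
-- Readers query the table normally; the mask and filter apply transparently
GRANT SELECT ON TABLE finance.gl.customers TO `finance-readers`;
```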
STEP 7: Configure External Locations (Secure Access to S3/ADLS)
Create a storage credential (IAM role or SAS token) in Catalog Explorer or via the Databricks API, then create the external location on top of it and grant access to engineers, as sketched below:
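This sketch assumes a storage credential named finance_cred already exists; the bucket path and group name are illustrative:

```sql
-- Bind the external S3 path to Unity Catalog through the credential
CREATE EXTERNAL LOCATION IF NOT EXISTS finance_raw
  URL 's3://corp-finance-raw/landing'
  WITH (STORAGE CREDENTIAL finance_cred);

-- Engineers may read and write files under this location
GRANT READ FILES, WRITE FILES ON EXTERNAL LOCATION finance_raw TO `finance-engineers`;
-- Optionally allow external tables on top of the location
GRANT CREATE EXTERNAL TABLE ON EXTERNAL LOCATION finance_raw TO `finance-engineers`;
```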
STEP 8: Configure Job/Workflow Authorization
Service principal access:
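A sketch assuming a service principal named etl-sp (service principals are referenced by name or application ID):

```sql
GRANT USE CATALOG ON CATALOG finance TO `etl-sp`;
GRANT USE SCHEMA ON SCHEMA finance.gl TO `etl-sp`;
GRANT SELECT, MODIFY ON SCHEMA finance.gl TO `etl-sp`;
```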
Configure the workflow's "Run as" setting to this service principal so the job executes under its identity rather than a human user's.
STEP 9: Audit Logging, Lineage, and Monitoring
Databricks automatically logs:
- Permission changes
- Data accesses
- Notebook executions
- Model inferences
- Workflow runs
- Genie agent actions
Enable audit log delivery:
- AWS → S3 bucket
- Azure → Azure Monitor / Event Hubs
Query audit logs:
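A minimal sketch against the audit system table (assumes the system.access schema is enabled for the metastore; the action names shown are examples):

```sql
-- Recent permission changes and table reads
SELECT event_time,
       user_identity.email AS actor,
       action_name,
       request_params
FROM system.access.audit
WHERE action_name IN ('updatePermissions', 'getTable')
ORDER BY event_time DESC
LIMIT 100;
```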
Lineage Tracking
Unity Catalog automatically tracks lineage across:
- SQL
- Notebooks
- Jobs
- Delta Live Tables
- ML pipelines
- Databricks Agents
No extra configuration is needed. Lineage can also be queried from system tables, as sketched below.
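A minimal lineage query sketch (the target table name is illustrative):

```sql
-- Which upstream entities wrote into this table, and when
SELECT source_table_full_name,
       entity_type,
       event_time
FROM system.access.table_lineage
WHERE target_table_full_name = 'finance.gl.transactions'
ORDER BY event_time DESC;
```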
5. Operational Governance Model
Daily Operations
- Assign users to appropriate SCIM groups
- Keep least-privilege enforcement
- Review lineage before modifying objects
Monthly
Quarterly
- Access certification
- Tagging verification (e.g., PII, Restricted)
- AI Agent resource permission review
6. Best Practices
Access Control
✔ Use Groups, never direct user grants
✔ Grant only required privileges (principle of least privilege)
✔ Use views for masking sensitive columns
✔ Use Metastore-level admins sparingly
✔ Delegate schema/table ownership to data owners (not IT)
Storage
✔ Use External Locations with IAM roles
✔ One S3/ADLS path per domain
✔ Disable direct bucket access; enforce UC controls
Automation
✔ Use service principals for jobs, dashboards, and agents
✔ Do not run jobs as human users
✔ Enforce cluster policies that restrict external hosts
AI / Genie Authorization
Genie can only access:
- Catalogs/Schemas/Tables the agent identity has rights to
- Notebooks the identity can EXECUTE
- Volumes on which the identity holds READ VOLUME / WRITE VOLUME
No privilege escalation is possible.
7. Example End-to-End Setup Script
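A consolidated sketch of the grants from Section 4; every catalog, schema, group, and principal name is illustrative:

```sql
-- 1. Governance containers and delegated ownership
CREATE CATALOG IF NOT EXISTS finance;
CREATE SCHEMA IF NOT EXISTS finance.gl;
ALTER SCHEMA finance.gl OWNER TO `finance-data-owners`;

-- 2. Readers: browse and query only
GRANT USE CATALOG ON CATALOG finance TO `finance-readers`;
GRANT USE SCHEMA ON SCHEMA finance.gl TO `finance-readers`;
GRANT SELECT ON SCHEMA finance.gl TO `finance-readers`;

-- 3. Writers: readers plus data modification
GRANT USE CATALOG ON CATALOG finance TO `finance-writers`;
GRANT USE SCHEMA ON SCHEMA finance.gl TO `finance-writers`;
GRANT SELECT, MODIFY ON SCHEMA finance.gl TO `finance-writers`;

-- 4. Engineers: may also create tables
GRANT USE CATALOG ON CATALOG finance TO `finance-engineers`;
GRANT USE SCHEMA, CREATE TABLE ON SCHEMA finance.gl TO `finance-engineers`;

-- 5. ETL service principal: least privilege for scheduled jobs
GRANT USE CATALOG ON CATALOG finance TO `etl-sp`;
GRANT USE SCHEMA ON SCHEMA finance.gl TO `etl-sp`;
GRANT SELECT, MODIFY ON SCHEMA finance.gl TO `etl-sp`;
```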
8. Final Architecture Diagram (Text-Based)
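A simple text sketch of the four layers described in Section 2:

```
+--------------------------------------------------------------+
|            Identity Layer (SCIM / SSO / Account)             |
|          users, groups, service principals                   |
+-------------------------------+------------------------------+
                                |
                                v
+--------------------------------------------------------------+
|  Unity Catalog Metastore: privileges, tags, audit, lineage   |
+-------------------------------+------------------------------+
                                |
              grants evaluated at query / API time
                                |
                                v
+-----------+  +-----------+  +------------+  +----------------+
| Catalogs  |  |  Volumes  |  | Functions  |  | External       |
| Schemas   |  |  & Files  |  | & Models   |  | Locations      |
| Tables    |  |           |  |            |  | (S3 / ADLS)    |
+-----------+  +-----------+  +------------+  +----------------+
                                |
                                v
+--------------------------------------------------------------+
|  Consumers: SQL, Notebooks, Jobs, Genie / Agents, BI tools   |
+--------------------------------------------------------------+
```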