databricks-unity-catalog

$npx mdskill add databricks/databricks-agent-skills/databricks-unity-catalog

Query Unity Catalog system tables and manage volume files.

  • Access audit logs, lineage, billing, and compute metrics.
  • Integrates with Unity Catalog system tables and volume APIs.
  • Executes queries based on user intent and data requirements.
  • Returns structured results for system tables and file lists.
SKILL.md
.github/skills/databricks-unity-catalogView on GitHub ↗
---
name: databricks-unity-catalog
description: "Unity Catalog system tables and volumes. Use when querying system tables (audit, lineage, billing) or working with volume file operations (upload, download, list files in /Volumes/)."
---

# Unity Catalog

Guidance for Unity Catalog system tables, volumes, and governance.

## When to Use This Skill

Use this skill when:
- Working with **volumes** (upload, download, list files in `/Volumes/`)
- Querying **lineage** (table dependencies, column-level lineage)
- Analyzing **audit logs** (who accessed what, permission changes)
- Monitoring **billing and usage** (DBU consumption, cost analysis)
- Tracking **compute resources** (cluster usage, warehouse metrics)
- Reviewing **job execution** (run history, success rates, failures)
- Analyzing **query performance** (slow queries, warehouse utilization)
- Profiling **data quality** (data profiling, drift detection, metric tables)

## Reference Files

| Topic | File | Description |
|-------|------|-------------|
| System Tables | [references/5-system-tables.md](references/5-system-tables.md) | Lineage, audit, billing, compute, jobs, query history |
| Volumes | [references/6-volumes.md](references/6-volumes.md) | Volume file operations, permissions, best practices |
| Data Profiling | [references/7-data-profiling.md](references/7-data-profiling.md) | Data profiling, drift detection, profile metrics |

## Quick Start

### Create Unity Catalog Objects (CLI)

**IMPORTANT**: Use `--json` for creating UC objects. Positional args vary by command and version.

```bash
# Create a catalog
databricks catalogs create my_catalog

# Create a schema  (args: NAME CATALOG_NAME — positional, name first)
databricks schemas create my_schema my_catalog

# Create a volume  (args: CATALOG_NAME SCHEMA_NAME NAME VOLUME_TYPE — catalog first)
databricks volumes create my_catalog my_schema my_volume MANAGED

# List catalogs, schemas, volumes
databricks catalogs list
databricks schemas list my_catalog
databricks volumes list my_catalog.my_schema
```

### Volume File Operations (CLI)

`databricks fs` requires the `dbfs:` scheme prefix even for UC Volume paths — without it the CLI treats the path as local filesystem and errors with `no such directory`.

```bash
# List files in a volume
databricks fs ls dbfs:/Volumes/catalog/schema/volume/path/

# Upload a directory's contents to a volume (-r copies contents, not the directory itself)
databricks fs cp -r --overwrite /tmp/data dbfs:/Volumes/catalog/schema/volume/dest

# Download a file from a volume
databricks fs cp dbfs:/Volumes/catalog/schema/volume/file.csv /tmp/file.csv

# Create a directory in a volume
databricks fs mkdirs dbfs:/Volumes/catalog/schema/volume/new_folder
```

### Enable System Tables Access

```sql
-- Grant access to system tables
GRANT USE CATALOG ON CATALOG system TO `data_engineers`;
GRANT USE SCHEMA ON SCHEMA system.access TO `data_engineers`;
GRANT SELECT ON SCHEMA system.access TO `data_engineers`;
```

### Common Queries

```sql
-- Table lineage: What tables feed into this table?
SELECT source_table_full_name, source_column_name
FROM system.access.table_lineage
WHERE target_table_full_name = 'catalog.schema.table'
  AND event_date >= current_date() - 7;

-- Audit: Recent permission changes
SELECT event_time, user_identity.email, action_name, request_params
FROM system.access.audit
WHERE action_name LIKE '%GRANT%' OR action_name LIKE '%REVOKE%'
ORDER BY event_time DESC
LIMIT 100;

-- Billing: DBU usage by workspace
SELECT workspace_id, sku_name, SUM(usage_quantity) AS total_dbus
FROM system.billing.usage
WHERE usage_date >= current_date() - 30
GROUP BY workspace_id, sku_name;
```

## SQL Queries via CLI

Use `databricks experimental aitools tools query` for system table queries:

```bash
# Query lineage via CLI
databricks experimental aitools tools query --warehouse WAREHOUSE_ID "
  SELECT source_table_full_name, target_table_full_name
  FROM system.access.table_lineage
  WHERE event_date >= current_date() - 7
"
```

## Best Practices

1. **Filter by date** - System tables can be large; always use date filters
2. **Use appropriate retention** - Check your workspace's retention settings
3. **Grant minimal access** - System tables contain sensitive metadata
4. **Schedule reports** - Create scheduled queries for regular monitoring

## Related Skills

- **databricks-pipelines** - for pipelines that write to Unity Catalog tables
- **databricks-jobs** - for job execution data visible in system tables
- **[databricks-synthetic-data-gen](../databricks-synthetic-data-gen/SKILL.md)** - for generating data stored in Unity Catalog Volumes
- **[databricks-aibi-dashboards](../databricks-aibi-dashboards/SKILL.md)** - for building dashboards on top of Unity Catalog data

## Resources

- [Unity Catalog System Tables](https://docs.databricks.com/administration-guide/system-tables/)
- [Audit Log Reference](https://docs.databricks.com/administration-guide/account-settings/audit-logs.html)
More from databricks/databricks-agent-skills