sql-queries
$
npx mdskill add anthropics/knowledge-work-plugins/sql-queriesWrites and optimizes SQL queries across major data warehouse dialects like Snowflake and BigQuery.
- Helps with writing, optimizing, translating, and building complex analytical queries.
- Integrates with data warehouse services such as PostgreSQL, Databricks, and Snowflake.
- Decides recommendations based on dialect-specific syntax and performance best practices.
- Presents results as formatted SQL code with examples and tips for readability.
SKILL.md
.github/skills/sql-queriesView on GitHub ↗
---
name: sql-queries
description: Write correct, performant SQL across all major data warehouse dialects (Snowflake, BigQuery, Databricks, PostgreSQL, etc.). Use when writing queries, optimizing slow SQL, translating between dialects, or building complex analytical queries with CTEs, window functions, or aggregations.
user-invocable: false
---
# SQL Queries Skill
Write correct, performant, readable SQL across all major data warehouse dialects.
## Dialect-Specific Reference
### PostgreSQL (including Aurora, RDS, Supabase, Neon)
**Date/time:**
```sql
-- Current date/time
CURRENT_DATE, CURRENT_TIMESTAMP, NOW()
-- Date arithmetic
date_column + INTERVAL '7 days'
date_column - INTERVAL '1 month'
-- Truncate to period
DATE_TRUNC('month', created_at)
-- Extract parts
EXTRACT(YEAR FROM created_at)
EXTRACT(DOW FROM created_at) -- 0=Sunday
-- Format
TO_CHAR(created_at, 'YYYY-MM-DD')
```
**String functions:**
```sql
-- Concatenation
first_name || ' ' || last_name
CONCAT(first_name, ' ', last_name)
-- Pattern matching
column ILIKE '%pattern%' -- case-insensitive
column ~ '^regex_pattern$' -- regex
-- String manipulation
LEFT(str, n), RIGHT(str, n)
SPLIT_PART(str, delimiter, position)
REGEXP_REPLACE(str, pattern, replacement)
```
**Arrays and JSON:**
```sql
-- JSON access
data->>'key' -- text
data->'nested'->'key' -- json
data#>>'{path,to,key}' -- nested text
-- Array operations
ARRAY_AGG(column)
ANY(array_column)
array_column @> ARRAY['value']
```
**Performance tips:**
- Use `EXPLAIN ANALYZE` to profile queries
- Create indexes on frequently filtered/joined columns
- Use `EXISTS` over `IN` for correlated subqueries
- Partial indexes for common filter conditions
- Use connection pooling for concurrent access
---
### Snowflake
**Date/time:**
```sql
-- Current date/time
CURRENT_DATE(), CURRENT_TIMESTAMP(), SYSDATE()
-- Date arithmetic
DATEADD(day, 7, date_column)
DATEDIFF(day, start_date, end_date)
-- Truncate to period
DATE_TRUNC('month', created_at)
-- Extract parts
YEAR(created_at), MONTH(created_at), DAY(created_at)
DAYOFWEEK(created_at)
-- Format
TO_CHAR(created_at, 'YYYY-MM-DD')
```
**String functions:**
```sql
-- Case-insensitive by default (depends on collation)
column ILIKE '%pattern%'
REGEXP_LIKE(column, 'pattern')
-- Parse JSON
column:key::string -- dot notation for VARIANT
PARSE_JSON('{"key": "value"}')
GET_PATH(variant_col, 'path.to.key')
-- Flatten arrays/objects
SELECT f.value FROM table, LATERAL FLATTEN(input => array_col) f
```
**Semi-structured data:**
```sql
-- VARIANT type access
data:customer:name::STRING
data:items[0]:price::NUMBER
-- Flatten nested structures
SELECT
t.id,
item.value:name::STRING as item_name,
item.value:qty::NUMBER as quantity
FROM my_table t,
LATERAL FLATTEN(input => t.data:items) item
```
**Performance tips:**
- Use clustering keys on large tables (not traditional indexes)
- Filter on clustering key columns for partition pruning
- Set appropriate warehouse size for query complexity
- Use `RESULT_SCAN(LAST_QUERY_ID())` to avoid re-running expensive queries
- Use transient tables for staging/temp data
---
### BigQuery (Google Cloud)
**Date/time:**
```sql
-- Current date/time
CURRENT_DATE(), CURRENT_TIMESTAMP()
-- Date arithmetic
DATE_ADD(date_column, INTERVAL 7 DAY)
DATE_SUB(date_column, INTERVAL 1 MONTH)
DATE_DIFF(end_date, start_date, DAY)
TIMESTAMP_DIFF(end_ts, start_ts, HOUR)
-- Truncate to period
DATE_TRUNC(created_at, MONTH)
TIMESTAMP_TRUNC(created_at, HOUR)
-- Extract parts
EXTRACT(YEAR FROM created_at)
EXTRACT(DAYOFWEEK FROM created_at) -- 1=Sunday
-- Format
FORMAT_DATE('%Y-%m-%d', date_column)
FORMAT_TIMESTAMP('%Y-%m-%d %H:%M:%S', ts_column)
```
**String functions:**
```sql
-- No ILIKE, use LOWER()
LOWER(column) LIKE '%pattern%'
REGEXP_CONTAINS(column, r'pattern')
REGEXP_EXTRACT(column, r'pattern')
-- String manipulation
SPLIT(str, delimiter) -- returns ARRAY
ARRAY_TO_STRING(array, delimiter)
```
**Arrays and structs:**
```sql
-- Array operations
ARRAY_AGG(column)
UNNEST(array_column)
ARRAY_LENGTH(array_column)
value IN UNNEST(array_column)
-- Struct access
struct_column.field_name
```
**Performance tips:**
- Always filter on partition columns (usually date) to reduce bytes scanned
- Use clustering for frequently filtered columns within partitions
- Use `APPROX_COUNT_DISTINCT()` for large-scale cardinality estimates
- Avoid `SELECT *` -- billing is per-byte scanned
- Use `DECLARE` and `SET` for parameterized scripts
- Preview query cost with dry run before executing large queries
---
### Redshift (Amazon)
**Date/time:**
```sql
-- Current date/time
CURRENT_DATE, GETDATE(), SYSDATE
-- Date arithmetic
DATEADD(day, 7, date_column)
DATEDIFF(day, start_date, end_date)
-- Truncate to period
DATE_TRUNC('month', created_at)
-- Extract parts
EXTRACT(YEAR FROM created_at)
DATE_PART('dow', created_at)
```
**String functions:**
```sql
-- Case-insensitive
column ILIKE '%pattern%'
REGEXP_INSTR(column, 'pattern') > 0
-- String manipulation
SPLIT_PART(str, delimiter, position)
LISTAGG(column, ', ') WITHIN GROUP (ORDER BY column)
```
**Performance tips:**
- Design distribution keys for collocated joins (DISTKEY)
- Use sort keys for frequently filtered columns (SORTKEY)
- Use `EXPLAIN` to check query plan
- Avoid cross-node data movement (watch for DS_BCAST and DS_DIST)
- `ANALYZE` and `VACUUM` regularly
- Use late-binding views for schema flexibility
---
### Databricks SQL
**Date/time:**
```sql
-- Current date/time
CURRENT_DATE(), CURRENT_TIMESTAMP()
-- Date arithmetic
DATE_ADD(date_column, 7)
DATEDIFF(end_date, start_date)
ADD_MONTHS(date_column, 1)
-- Truncate to period
DATE_TRUNC('MONTH', created_at)
TRUNC(date_column, 'MM')
-- Extract parts
YEAR(created_at), MONTH(created_at)
DAYOFWEEK(created_at)
```
**Delta Lake features:**
```sql
-- Time travel
SELECT * FROM my_table TIMESTAMP AS OF '2024-01-15'
SELECT * FROM my_table VERSION AS OF 42
-- Describe history
DESCRIBE HISTORY my_table
-- Merge (upsert)
MERGE INTO target USING source
ON target.id = source.id
WHEN MATCHED THEN UPDATE SET *
WHEN NOT MATCHED THEN INSERT *
```
**Performance tips:**
- Use Delta Lake's `OPTIMIZE` and `ZORDER` for query performance
- Leverage Photon engine for compute-intensive queries
- Use `CACHE TABLE` for frequently accessed datasets
- Partition by low-cardinality date columns
---
## Common SQL Patterns
### Window Functions
```sql
-- Ranking
ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY created_at DESC)
RANK() OVER (PARTITION BY category ORDER BY revenue DESC)
DENSE_RANK() OVER (ORDER BY score DESC)
-- Running totals / moving averages
SUM(revenue) OVER (ORDER BY date_col ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as running_total
AVG(revenue) OVER (ORDER BY date_col ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) as moving_avg_7d
-- Lag / Lead
LAG(value, 1) OVER (PARTITION BY entity ORDER BY date_col) as prev_value
LEAD(value, 1) OVER (PARTITION BY entity ORDER BY date_col) as next_value
-- First / Last value
FIRST_VALUE(status) OVER (PARTITION BY user_id ORDER BY created_at ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
LAST_VALUE(status) OVER (PARTITION BY user_id ORDER BY created_at ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
-- Percent of total
revenue / SUM(revenue) OVER () as pct_of_total
revenue / SUM(revenue) OVER (PARTITION BY category) as pct_of_category
```
### CTEs for Readability
```sql
WITH
-- Step 1: Define the base population
base_users AS (
SELECT user_id, created_at, plan_type
FROM users
WHERE created_at >= DATE '2024-01-01'
AND status = 'active'
),
-- Step 2: Calculate user-level metrics
user_metrics AS (
SELECT
u.user_id,
u.plan_type,
COUNT(DISTINCT e.session_id) as session_count,
SUM(e.revenue) as total_revenue
FROM base_users u
LEFT JOIN events e ON u.user_id = e.user_id
GROUP BY u.user_id, u.plan_type
),
-- Step 3: Aggregate to summary level
summary AS (
SELECT
plan_type,
COUNT(*) as user_count,
AVG(session_count) as avg_sessions,
SUM(total_revenue) as total_revenue
FROM user_metrics
GROUP BY plan_type
)
SELECT * FROM summary ORDER BY total_revenue DESC;
```
### Cohort Retention
```sql
WITH cohorts AS (
SELECT
user_id,
DATE_TRUNC('month', first_activity_date) as cohort_month
FROM users
),
activity AS (
SELECT
user_id,
DATE_TRUNC('month', activity_date) as activity_month
FROM user_activity
)
SELECT
c.cohort_month,
COUNT(DISTINCT c.user_id) as cohort_size,
COUNT(DISTINCT CASE
WHEN a.activity_month = c.cohort_month THEN a.user_id
END) as month_0,
COUNT(DISTINCT CASE
WHEN a.activity_month = c.cohort_month + INTERVAL '1 month' THEN a.user_id
END) as month_1,
COUNT(DISTINCT CASE
WHEN a.activity_month = c.cohort_month + INTERVAL '3 months' THEN a.user_id
END) as month_3
FROM cohorts c
LEFT JOIN activity a ON c.user_id = a.user_id
GROUP BY c.cohort_month
ORDER BY c.cohort_month;
```
### Funnel Analysis
```sql
WITH funnel AS (
SELECT
user_id,
MAX(CASE WHEN event = 'page_view' THEN 1 ELSE 0 END) as step_1_view,
MAX(CASE WHEN event = 'signup_start' THEN 1 ELSE 0 END) as step_2_start,
MAX(CASE WHEN event = 'signup_complete' THEN 1 ELSE 0 END) as step_3_complete,
MAX(CASE WHEN event = 'first_purchase' THEN 1 ELSE 0 END) as step_4_purchase
FROM events
WHERE event_date >= CURRENT_DATE - INTERVAL '30 days'
GROUP BY user_id
)
SELECT
COUNT(*) as total_users,
SUM(step_1_view) as viewed,
SUM(step_2_start) as started_signup,
SUM(step_3_complete) as completed_signup,
SUM(step_4_purchase) as purchased,
ROUND(100.0 * SUM(step_2_start) / NULLIF(SUM(step_1_view), 0), 1) as view_to_start_pct,
ROUND(100.0 * SUM(step_3_complete) / NULLIF(SUM(step_2_start), 0), 1) as start_to_complete_pct,
ROUND(100.0 * SUM(step_4_purchase) / NULLIF(SUM(step_3_complete), 0), 1) as complete_to_purchase_pct
FROM funnel;
```
### Deduplication
```sql
-- Keep the most recent record per key
WITH ranked AS (
SELECT
*,
ROW_NUMBER() OVER (
PARTITION BY entity_id
ORDER BY updated_at DESC
) as rn
FROM source_table
)
SELECT * FROM ranked WHERE rn = 1;
```
## Error Handling and Debugging
When a query fails:
1. **Syntax errors**: Check for dialect-specific syntax (e.g., `ILIKE` not available in BigQuery, `SAFE_DIVIDE` only in BigQuery)
2. **Column not found**: Verify column names against schema -- check for typos, case sensitivity (PostgreSQL is case-sensitive for quoted identifiers)
3. **Type mismatches**: Cast explicitly when comparing different types (`CAST(col AS DATE)`, `col::DATE`)
4. **Division by zero**: Use `NULLIF(denominator, 0)` or dialect-specific safe division
5. **Ambiguous columns**: Always qualify column names with table alias in JOINs
6. **Group by errors**: All non-aggregated columns must be in GROUP BY (except in BigQuery which allows grouping by alias)
More from anthropics/knowledge-work-plugins
- accessibility-reviewRun a WCAG 2.1 AA accessibility audit on a design or page. Trigger with "audit accessibility", "check a11y", "is this accessible?", or when reviewing a design for color contrast, keyboard navigation, touch target size, or screen reader behavior before handoff.
- account-research"Research a company using Common Room data. Triggers on 'research [company]', 'tell me about [domain]', 'pull up signals for [account]', 'what's going on with [company]', or any account-level question."
- analyzeAnswer data questions -- from quick lookups to full analyses. Use when looking up a single metric, investigating what's driving a trend or drop, comparing segments over time, or preparing a formal data report for stakeholders.
- architectureCreate or evaluate an architecture decision record (ADR). Use when choosing between technologies (e.g., Kafka vs SQS), documenting a design decision with trade-offs and consequences, reviewing a system design proposal, or designing a new component from requirements and constraints.
- audit-supportSupport SOX 404 compliance with control testing methodology, sample selection, and documentation standards. Use when generating testing workpapers, selecting audit samples, classifying control deficiencies, or preparing for internal or external audits.
- brand-reviewReview content against your brand voice, style guide, and messaging pillars, flagging deviations by severity with specific before/after fixes. Use when checking a draft before it ships, when auditing copy for voice consistency and terminology, or when screening for unsubstantiated claims, missing disclaimers, and other legal flags.
- brand-voice-enforcement>
- briefGenerate contextual briefings for legal work — daily summary, topic research, or incident response. Use when starting your day and need a scan of legal-relevant items across email, calendar, and contracts, when researching a specific legal question across internal sources, or when a developing situation (data breach, litigation threat, regulatory inquiry) needs rapid context.
- build-dashboardBuild an interactive HTML dashboard with charts, filters, and tables. Use when creating an executive overview with KPI cards, turning query results into a shareable self-contained report, building a team monitoring snapshot, or needing multiple charts with filters in one browser-openable file.
- build-zoom-botBuild a Zoom meeting bot, recorder, or real-time media workflow. Use when joining meetings programmatically, processing live media or transcripts, or combining Meeting SDK, RTMS, and backend services.