Data analysis is a high-value use case for AI agents — but without a skill, the output is inconsistent. One run gives you a detailed statistical breakdown; the next gives you a paragraph of observations. A skill standardises the procedure: what to compute, how to structure the output, what format to return.
Here's a look at the most useful data analysis skills for AI coding agents in 2026.
What data skills actually do
A data analysis skill doesn't run queries itself; it defines the procedure the agent follows when you ask it to analyse data. That procedure typically specifies:
- Which statistical measures to compute (mean, median, percentiles, outliers)
- How to describe distributions and skew
- What to flag as anomalous
- What format to return (table, JSON, prose summary)
- What to recommend as next steps
The consistency comes from the procedure being explicit and version-controlled rather than re-derived each time.
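To make "explicit procedure" concrete, here is a minimal Python sketch of the computation a skill might pin down. The function name `summarise`, the sample values, and the 1.5 × IQR outlier rule are illustrative assumptions, not part of any particular skill.

```python
import statistics


def summarise(values):
    """Return a fixed set of measures in a fixed shape (hypothetical helper)."""
    ordered = sorted(values)
    # quantiles(n=4) returns the three quartile cut points: p25, p50, p75.
    q1, _, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    iqr = q3 - q1
    # Assumed outlier rule: anything beyond 1.5 * IQR from the quartiles.
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {
        "count": len(ordered),
        "mean": statistics.mean(ordered),
        "median": statistics.median(ordered),
        "p25": q1,
        "p75": q3,
        "outliers": [v for v in ordered if v < low or v > high],
    }


# Made-up sample data with one obvious outlier.
summary = summarise([12, 14, 15, 15, 16, 18, 19, 95])
print(summary)
```

Because the measures and the return shape are fixed, two runs over the same data produce the same keys in the same order, which is the kind of consistency a skill is meant to enforce.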
SQL query generation skills
SQL generation is where inconsistency causes real problems. An agent that writes slightly different query patterns each time — different join styles, different CTE usage, different aggregation approaches — produces a codebase that's hard to review and maintain.
A SQL generation skill specifies your team's conventions:
| Convention | Example specification |
|---|---|
| Join style | Always use explicit JOIN ON, never implicit WHERE joins |
| CTE vs subquery | Use CTEs for anything referenced more than once |
| Aggregations | Always alias aggregate columns explicitly |
| Filters | Date filters always use half-open intervals (>= start, < end) |
| Naming | snake_case, no abbreviations, table alias is the first letter of the table name |
A skill that encodes these generates queries that look like the rest of your codebase — not like they came from five different agents.
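As a rough illustration, a query that follows all five conventions might look like the one below. It is wrapped in Python's built-in `sqlite3` module so it can be run as-is; the `customers` and `orders` tables and their contents are hypothetical and exist only to make the sketch executable.

```python
import sqlite3

# Hypothetical schema and rows, used only to make the query runnable.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, region TEXT);
CREATE TABLE orders (
    order_id INTEGER PRIMARY KEY,
    customer_id INTEGER,
    ordered_at TEXT,
    amount REAL
);
INSERT INTO customers VALUES (1, 'north'), (2, 'south');
INSERT INTO orders VALUES
    (1, 1, '2026-01-05', 40.0),
    (2, 1, '2026-01-31', 60.0),
    (3, 2, '2026-01-15', 25.0),
    (4, 2, '2026-02-01', 99.0);
""")

QUERY = """
WITH january_orders AS (               -- CTE keeps the date filter in one place
    SELECT o.customer_id, o.amount
    FROM orders AS o                   -- alias is the table's first letter
    WHERE o.ordered_at >= :start_date  -- half-open interval: start inclusive
      AND o.ordered_at <  :end_date    -- end exclusive
)
SELECT
    c.region,
    COUNT(*)      AS order_count,      -- aggregates are always aliased
    SUM(j.amount) AS total_amount
FROM january_orders AS j
JOIN customers AS c                    -- explicit JOIN ... ON
    ON c.customer_id = j.customer_id
GROUP BY c.region
ORDER BY c.region
"""

rows = conn.execute(
    QUERY, {"start_date": "2026-01-01", "end_date": "2026-02-01"}
).fetchall()
print(rows)
```

The order dated 2026-02-01 is excluded because the end of the interval is exclusive, which is the point of the half-open filter convention: consecutive monthly windows never double-count a row.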
Data exploration skills
Exploration skills give the agent a procedure for profiling a dataset before analysis:
# data-profiler
## Purpose
Profile a dataset to understand its shape, quality, and statistical properties
before analysis. Use when given a CSV, dataframe, or query result to examine.
## Instructions
1. Count rows and columns
2. For each column: data type, null count, unique value count
3. For numeric columns: min, max, mean, median, std dev, p25/p75/p95
4. For string columns: top 10 values by frequency
5. Flag columns with >5% nulls as data quality issues
6. Flag columns where >90% of values are the same (near-constant columns)
7. Identify likely primary key candidates (unique, non-null)
## Output format
## Dataset overview
Rows: N | Columns: N | Memory: ~X MB
## Column profiles
| Column | Type | Nulls | Unique | Notes |
[table rows]
## Data quality flags
[list issues found]
## Suggested next steps
[2-3 specific analysis directions based on what was found]
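For a sense of what the agent ends up doing when it follows this skill, the sketch below implements a subset of the steps (1, 2, 4, 5, 6, 7 and part of 3) in plain Python. The `profile` function, the list-of-dicts input shape, and the sample rows are assumptions made for illustration; a real run would more likely go through a dataframe library.

```python
from collections import Counter
from statistics import mean, median


def profile(rows):
    """Profile a list of dict rows; None marks a null (hypothetical helper)."""
    columns = list(rows[0]) if rows else []
    report = {"rows": len(rows), "columns": len(columns),
              "profiles": {}, "flags": []}
    for col in columns:
        values = [r[col] for r in rows]
        present = [v for v in values if v is not None]
        nulls = len(values) - len(present)
        counts = Counter(present)
        info = {"nulls": nulls, "unique": len(counts)}
        if present and all(isinstance(v, (int, float)) for v in present):
            # Step 3 (partial): basic numeric summary.
            info.update(min=min(present), max=max(present),
                        mean=mean(present), median=median(present))
        else:
            # Step 4: top 10 values by frequency.
            info["top_values"] = counts.most_common(10)
        # Step 5: more than 5% nulls is a data quality issue.
        if rows and nulls / len(rows) > 0.05:
            report["flags"].append(f"{col}: {nulls / len(rows):.0%} nulls")
        # Step 6: a single value covering more than 90% of rows.
        if counts and counts.most_common(1)[0][1] / len(rows) > 0.90:
            report["flags"].append(f"{col}: dominated by a single value")
        # Step 7: unique and non-null suggests a primary key candidate.
        if rows and nulls == 0 and len(counts) == len(rows):
            info["key_candidate"] = True
        report["profiles"][col] = info
    return report


# Made-up sample rows.
sample = [
    {"id": 1, "country": "DE", "score": 10.0},
    {"id": 2, "country": "DE", "score": None},
    {"id": 3, "country": "FR", "score": 30.0},
    {"id": 4, "country": "DE", "score": 20.0},
]
report = profile(sample)
print(report["flags"])
```

The thresholds (5% and 90%) come straight from the skill text, so changing the skill changes the behaviour without touching the agent's prompt.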
Pipeline audit skills
Data pipelines degrade silently — a schema change upstream, a new null pattern in source data, a business rule that changed. A pipeline audit skill gives the agent a procedure for checking pipeline health on demand.
What a pipeline audit skill covers:
- Row count trends (is today's volume within expected range?)
- Null rates by column (are new nulls appearing?)
- Duplicate detection (are records being written more than once?)
- Schema drift (are column types or names changing?)
- Freshness (when did the most recent record arrive?)
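The checks above could be sketched roughly as follows. Everything here is an assumption for illustration: the `audit` function, its parameters, the sample batch, and the simplification of "row count trends" to a fixed expected range.

```python
from datetime import datetime, timedelta


def audit(rows, expected_types, key, ts_field, now, row_range, max_age):
    """Return a list of findings; an empty list means no issue was detected."""
    findings = []
    low, high = row_range
    # Row count: a fixed range stands in for a proper trend comparison.
    if not low <= len(rows) <= high:
        findings.append(f"row count {len(rows)} outside {low}..{high}")

    # Schema drift (names): columns added or missing versus the expectation.
    seen_columns = set().union(*(r.keys() for r in rows))
    for col in sorted(seen_columns ^ set(expected_types)):
        findings.append(f"schema drift: column {col!r} added or missing")

    for col, expected in expected_types.items():
        if col not in seen_columns:
            continue
        values = [r.get(col) for r in rows]
        # Null rate per column.
        nulls = sum(v is None for v in values)
        if nulls:
            findings.append(f"{col}: {nulls}/{len(rows)} nulls")
        # Schema drift (types): any non-null value of an unexpected type.
        if any(v is not None and not isinstance(v, expected) for v in values):
            findings.append(
                f"schema drift: {col!r} is not always {expected.__name__}")

    # Duplicate detection on the assumed key column.
    keys = [r.get(key) for r in rows]
    duplicates = len(keys) - len(set(keys))
    if duplicates:
        findings.append(f"{duplicates} duplicate value(s) in {key!r}")

    # Freshness: age of the most recent record.
    timestamps = [r[ts_field] for r in rows if r.get(ts_field) is not None]
    if timestamps and now - max(timestamps) > max_age:
        findings.append(f"stale: newest record is {now - max(timestamps)} old")
    return findings


# Made-up batch with a null, a type change, a duplicate id, and stale data.
batch = [
    {"id": 1, "amount": 9.5, "loaded_at": datetime(2026, 1, 8, 9, 0)},
    {"id": 2, "amount": None, "loaded_at": datetime(2026, 1, 8, 9, 5)},
    {"id": 2, "amount": "7.25", "loaded_at": datetime(2026, 1, 8, 9, 5)},
]
findings = audit(
    batch,
    expected_types={"id": int, "amount": float, "loaded_at": datetime},
    key="id",
    ts_field="loaded_at",
    now=datetime(2026, 1, 10, 12, 0),
    row_range=(1, 1000),
    max_age=timedelta(hours=24),
)
for finding in findings:
    print(finding)
```

Passing `now` in explicitly rather than reading the clock inside the function keeps the audit reproducible, which matters when the same skill is run on demand against historical batches.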
Statistical summary skills
For reporting and stakeholder communication, a statistical summary skill tells the agent what level of detail to include, what to omit, and how to frame findings:
| Audience | Skill should specify |
|---|---|
| Technical (data team) | Full distribution, outlier analysis, correlation matrix |
| Product | Key metrics only, comparison to prior period, trend direction |
| Executive | 3-5 headline numbers, anomalies, recommendation |
The underlying data is the same in each case; the active skill determines the output format.
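One way to picture this is a single statistics dictionary rendered through different field lists. The sketch below is deliberately simplified: the `AUDIENCE_FIELDS` mapping, the `render` function, and all numbers are invented for illustration and only loosely mirror the table above.

```python
# Hypothetical mapping from audience to the fields its skill asks for.
AUDIENCE_FIELDS = {
    "technical": ["count", "mean", "median", "p95", "outliers"],
    "product": ["mean", "change_vs_prior"],
    "executive": ["change_vs_prior"],
}


def render(stats, audience):
    """Format the same stats differently depending on the audience."""
    fields = AUDIENCE_FIELDS[audience]
    return " | ".join(f"{name}: {stats[name]}" for name in fields)


# Made-up figures; only the selection of fields changes per audience.
stats = {"count": 1200, "mean": 42.5, "median": 40.0,
         "p95": 88.0, "outliers": 3, "change_vs_prior": "+4%"}

for audience in AUDIENCE_FIELDS:
    print(f"{audience}: {render(stats, audience)}")
```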
Finding data skills in the directory
Search for what you need:
npx mdskill search "data analysis"
npx mdskill search "sql generation"
npx mdskill search "data quality"
Or browse the data category for top-rated skills with security scores, then add the one you want:
npx mdskill add owner/repo/skill-name
Building a custom data skill
If your team works with specific schemas, databases, or reporting formats, a custom skill is worth the investment. Encode your conventions — query style, output format, the specific metrics your stakeholders care about.
See how to build an agent skill for the full process.
What's next?
- Browse data skills in the directory
- Read about MCP vs SKILL.md — MCP database connectors and SKILL.md procedures work well together
- Build a custom SQL skill for your team's conventions