exploring-data

$ npx mdskill add oaustegard/claude-skills/exploring-data

Analyze uploaded datasets and generate interactive reports in seconds.

  • Handles CSV, Excel, JSON, and Parquet uploads for immediate profiling.
  • Depends on the ydata-profiling library for statistical analysis and visualization.
  • Runs in minimal or full analysis mode, depending on the user's request.
  • Delivers an interactive HTML report and a machine-readable JSON report.

SKILL.md

.github/skills/exploring-data
---
name: exploring-data
description: Exploratory data analysis using ydata-profiling. Use when users upload .csv/.xlsx/.json/.parquet files or request "explore data", "analyze dataset", "EDA", "profile data". Generates interactive HTML or JSON reports with statistics, visualizations, correlations, and quality alerts.
metadata:
  version: 0.0.3
---

# Exploring Data

## Workflow

### 1. Check if installed (instant)
```bash
bash /mnt/skills/user/exploring-data/scripts/check_install.sh
```
Prints `installed` or `not_installed` to stdout.
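`check_install.sh` itself is not reproduced here. As a rough sketch of what the check amounts to, assuming "installed" simply means the `ydata_profiling` package is importable, an equivalent Python check could look like this (the helper name `install_status` is hypothetical):

```python
import importlib.util


def install_status(module: str = "ydata_profiling") -> str:
    """Return "installed" if `module` can be found, else "not_installed".

    Hypothetical stand-in for check_install.sh: find_spec only locates the
    package on sys.path without importing it, so the check stays fast.
    """
    found = importlib.util.find_spec(module) is not None
    return "installed" if found else "not_installed"


if __name__ == "__main__":
    # Print the same two tokens the shell script is documented to emit.
    print(install_status())
```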

### 2. Install if needed (one-time, ~19s)
```bash
if [ "$(bash /mnt/skills/user/exploring-data/scripts/check_install.sh)" = "not_installed" ]; then
    bash /mnt/skills/user/exploring-data/scripts/install_ydata.sh
fi
```

### 3. Run analysis (generates HTML + JSON by default)
```bash
bash /mnt/skills/user/exploring-data/scripts/analyze.sh <filepath> [minimal|full] [html|json]
```

**Defaults:** minimal + html (also generates JSON)
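The positional arguments resolve to these defaults roughly as follows. This is a hypothetical Python sketch that mirrors the documented `<filepath> [minimal|full] [html|json]` interface of `analyze.sh`; the function name `parse_args` and the error handling are assumptions, not the script's actual code:

```python
import sys
from typing import List, Tuple

MODES = ("minimal", "full")
FORMATS = ("html", "json")


def parse_args(argv: List[str]) -> Tuple[str, str, str]:
    """Resolve <filepath> [minimal|full] [html|json] using the documented defaults."""
    if not argv:
        raise ValueError("usage: analyze.sh <filepath> [minimal|full] [html|json]")
    filepath = argv[0]
    mode = argv[1] if len(argv) > 1 else "minimal"  # default mode
    fmt = argv[2] if len(argv) > 2 else "html"      # default format
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    if fmt not in FORMATS:
        raise ValueError(f"unknown format: {fmt!r}")
    return filepath, mode, fmt


if __name__ == "__main__":
    print(parse_args(sys.argv[1:]))
```

Because the format is the third positional argument, asking for JSON-only output means naming the mode as well, e.g. `analyze.sh data.csv minimal json`.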

**Output** (in `/mnt/user-data/outputs/`):
- `eda_report.html`: interactive report for the user (skipped when the format is `json`)
- `eda_report.json`: machine-readable report for Claude's analysis
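One way these files could be produced with ydata-profiling is sketched below. It assumes pandas and ydata-profiling are installed and that reports go to `/mnt/user-data/outputs/` (the path used in step 4); the helpers `reader_for`, `planned_outputs` and `profile` and the report title are hypothetical. `ProfileReport(df, minimal=...)` and `ProfileReport.to_file()` are ydata-profiling calls, and `to_file` chooses JSON or HTML from the file extension.

```python
from pathlib import Path
from typing import Dict, List

# Extension -> name of the pandas reader, for the formats this skill accepts.
READERS: Dict[str, str] = {
    ".csv": "read_csv",
    ".xlsx": "read_excel",
    ".json": "read_json",
    ".parquet": "read_parquet",
}

OUTPUT_DIR = Path("/mnt/user-data/outputs")  # assumed output location


def reader_for(filepath: str) -> str:
    """Pick the pandas reader by (case-insensitive) file extension."""
    suffix = Path(filepath).suffix.lower()
    if suffix not in READERS:
        raise ValueError(f"unsupported file type: {suffix or '(none)'}")
    return READERS[suffix]


def planned_outputs(fmt: str) -> List[str]:
    """JSON is always written; HTML is added only when fmt == "html"."""
    if fmt == "html":
        return ["eda_report.json", "eda_report.html"]
    return ["eda_report.json"]


def profile(filepath: str, mode: str = "minimal", fmt: str = "html") -> List[Path]:
    """Load the file, profile it, and write the planned reports."""
    # Imported lazily because both libraries are heavy to load.
    import pandas as pd
    from ydata_profiling import ProfileReport

    df = getattr(pd, reader_for(filepath))(filepath)
    report = ProfileReport(df, title="EDA Report", minimal=(mode == "minimal"))
    OUTPUT_DIR.mkdir(parents=True, exist_ok=True)
    written: List[Path] = []
    for name in planned_outputs(fmt):
        path = OUTPUT_DIR / name
        report.to_file(path)  # .json suffix -> JSON, .html suffix -> HTML
        written.append(path)
    return written
```

Computing the profile once and serializing it twice avoids running the analysis a second time just to get the JSON.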

### 4. If Claude needs to interpret the results (e.g., the user asks "what do you think?")
```bash
python /mnt/skills/user/exploring-data/scripts/summarize_insights.py /mnt/user-data/outputs/eda_report.json
```

**Reads:** `eda_report.json` (comprehensive ydata output)  
**Writes:** `eda_insights_summary.md` (condensed for Claude)  
**Outputs to stdout:** Formatted markdown summary

Claude should read the stdout markdown summary, NOT the full JSON report.
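`summarize_insights.py` is not shown here either. The sketch below only illustrates the general idea of condensing the report into a short markdown summary. It assumes the ydata-profiling JSON has a `table` object with counts such as `n`, `n_var` and `p_cells_missing` (a 0 to 1 fraction) and an `alerts` list of strings; those key names and the function `summarize` are assumptions to verify against a real `eda_report.json`. Unlike the real script, it only prints and does not write `eda_insights_summary.md`.

```python
import json
import sys
from typing import Any, Dict, List


def summarize(report: Dict[str, Any], max_alerts: int = 10) -> str:
    """Condense a profiling-report dict into a short markdown summary."""
    table = report.get("table", {})
    lines: List[str] = ["# EDA summary", ""]
    lines.append(f"- Rows: {table.get('n', '?')}")
    lines.append(f"- Columns: {table.get('n_var', '?')}")
    p_missing = table.get("p_cells_missing")
    if p_missing is not None:
        lines.append(f"- Missing cells: {p_missing:.1%}")
    alerts = report.get("alerts", [])
    if alerts:
        lines += ["", f"## Alerts ({len(alerts)})"]
        lines += [f"- {a}" for a in alerts[:max_alerts]]
        if len(alerts) > max_alerts:
            # Keep the summary short: report how many alerts were cut.
            lines.append(f"- ... {len(alerts) - max_alerts} more")
    return "\n".join(lines)


if __name__ == "__main__":
    with open(sys.argv[1], encoding="utf-8") as fh:
        print(summarize(json.load(fh)))
```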

## Invocation Examples

```bash
# Scripts live in the skill directory; call them by full path
SCRIPTS=/mnt/skills/user/exploring-data/scripts

# Standard workflow (user views HTML)
bash "$SCRIPTS/analyze.sh" /mnt/user-data/uploads/data.csv
# Produces: eda_report.html + eda_report.json
# Link user to: computer:///mnt/user-data/outputs/eda_report.html

# User asks Claude to analyze
bash "$SCRIPTS/analyze.sh" /mnt/user-data/uploads/data.csv
python "$SCRIPTS/summarize_insights.py" /mnt/user-data/outputs/eda_report.json
# Claude reads the stdout markdown summary
# and can then provide analysis based on patterns/insights

# Full mode for comprehensive analysis
bash "$SCRIPTS/analyze.sh" /mnt/user-data/uploads/data.csv full

# JSON-only output (skip HTML generation)
bash "$SCRIPTS/analyze.sh" /mnt/user-data/uploads/data.csv minimal json
```

## Modes

**Minimal (default, 5-10s):**
Dataset overview, variable analysis, correlations, missing values, alerts

**Full (10-20s):**
Everything in minimal, plus scatter matrices, sample data, character analysis, and additional visualizations

## User Triggers for Full Mode
Use full mode when the user asks for a "comprehensive analysis", "detailed EDA", "full profiling", or "deep analysis".

Otherwise use minimal.
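The mode choice amounts to a keyword check on the user's request. A hypothetical sketch, assuming a plain case-insensitive substring match against the trigger phrases listed above (the names `FULL_MODE_TRIGGERS` and `choose_mode` are illustrative):

```python
FULL_MODE_TRIGGERS = (
    "comprehensive analysis",
    "detailed eda",
    "full profiling",
    "deep analysis",
)


def choose_mode(request: str) -> str:
    """Return "full" if the request contains a full-mode trigger, else "minimal"."""
    text = request.lower()
    return "full" if any(t in text for t in FULL_MODE_TRIGGERS) else "minimal"
```

Substring matching is deliberately simple: a paraphrase such as "analyze everything thoroughly" falls through to minimal.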
