investigate-dataset

Name: investigate-dataset
Author: UKGovernmentBEIS/inspect_evals

$ npx mdskill add UKGovernmentBEIS/inspect_evals/investigate-dataset

Explore HuggingFace, CSV, and JSON datasets for structure and quality.

Analyzes raw data files to reveal schema, fields, and integrity issues.
Integrates with HuggingFace Hub, pandas, and inspect_ai libraries.
Selects analysis methods based on detected file format and content patterns.
Outputs structured reports detailing discovered columns and data anomalies.

SKILL.md

.github/skills/investigate-dataset

---
name: investigate-dataset
description: Investigate datasets from HuggingFace, CSV, or JSON files to understand their structure, fields, and data quality. Trigger whenever you need to explore or inspect a dataset yourself without using pre-written scripts.
---

# Investigate Dataset

This workflow helps you explore and understand datasets used in evaluations. It covers HuggingFace datasets, CSV files, and JSON/JSONL files.

## Key Concepts

For detailed information on Inspect's dataset types (`datasets.Dataset` vs `inspect_ai.dataset.Dataset`), the `hf_dataset()` pipeline, caching behaviour, and test utilities, see `references/inspect-dataset-patterns.md`.

### Common Patterns in Evals

Evals typically define:

- `DATASET_PATH`: HuggingFace repo path (e.g., `"qiaojin/PubMedQA"`)
- `DATASET_REVISION`: Optional git revision/tag for reproducibility
- `record_to_sample()`: Function converting raw records to `Sample` objects

## Prerequisites

- Access to the evaluation code to find dataset configuration
- Python environment with `datasets`, `pandas`, and `inspect_ai` installed

## Steps

### 1. Identify the Dataset Source

Look for these patterns in the evaluation code:

```python
# HuggingFace dataset
DATASET_PATH = "org/dataset-name"
DATASET_REVISION = "v1.0"  # optional
hf_dataset(path=DATASET_PATH, name="subset", split="train", ...)

# CSV dataset
csv_dataset("path/to/file.csv", ...)
load_csv_dataset("https://example.com/file.csv", eval_name="myeval", ...)

# JSON/JSONL dataset
json_dataset("path/to/file.json", ...)
load_json_dataset("https://example.com/file.jsonl", eval_name="myeval", ...)
```
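
As a rough aid for this step, the patterns above can be searched for mechanically. The following is a minimal sketch, not part of the skill itself: `find_dataset_sources` and `example_source` are hypothetical names, and the regular expressions only catch the simple forms shown above (string constants assigned at the start of a line, direct calls to the loader functions).

```python
import re

# Loader function names taken from the patterns listed above.
LOADER_CALLS = (
    "hf_dataset",
    "csv_dataset",
    "load_csv_dataset",
    "json_dataset",
    "load_json_dataset",
)

# Matches e.g. DATASET_PATH = "org/dataset-name" at the start of a line.
CONSTANT_RE = re.compile(
    r'^\s*(DATASET_PATH|DATASET_REVISION)\s*=\s*["\']([^"\']+)["\']',
    re.MULTILINE,
)
# Matches a call to any of the loader functions, e.g. hf_dataset(...).
CALL_RE = re.compile(r"\b(" + "|".join(LOADER_CALLS) + r")\s*\(")


def find_dataset_sources(source: str) -> dict:
    """Return dataset constants and loader calls found in eval source code."""
    constants = dict(CONSTANT_RE.findall(source))
    loaders = sorted(set(CALL_RE.findall(source)))
    return {"constants": constants, "loaders": loaders}


# Hypothetical eval source used only to demonstrate the helper.
example_source = '''
DATASET_PATH = "qiaojin/PubMedQA"
DATASET_REVISION = "v1.0"

def my_task():
    return hf_dataset(path=DATASET_PATH, split="train", revision=DATASET_REVISION)
'''

found = find_dataset_sources(example_source)
print(found)
```

In practice you would pass the contents of the eval's Python file (for example via `pathlib.Path(...).read_text()`) instead of the inline string.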

### 2. Load the Raw Dataset

For investigation, load the raw data directly rather than through Inspect's `sample_fields` transformation, so that every original field stays visible. Use `datasets.load_dataset()` for HuggingFace datasets, `pd.read_csv()` for CSV, and `pd.read_json()` for JSON (pass `lines=True` for JSONL). For gated datasets, ensure `HF_TOKEN` is set or run `huggingface-cli login`.
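
The loading calls named above might be wrapped as follows. This is a sketch under assumptions: the helper names (`load_raw_hf`, `load_raw_csv`, `load_raw_json`) are hypothetical, and the CSV and JSONL inputs are small in-memory strings so the example runs without network access. The HuggingFace helper imports `datasets` lazily and is not called here.

```python
import io

import pandas as pd


def load_raw_hf(path, name=None, split="train", revision=None):
    """Load a HuggingFace dataset as-is, with no Inspect conversion applied."""
    # Imported lazily so the CSV/JSON helpers work without `datasets` installed.
    from datasets import load_dataset

    return load_dataset(path, name=name, split=split, revision=revision)


def load_raw_csv(source):
    """Load a CSV file, URL, or file-like object into a DataFrame."""
    return pd.read_csv(source)


def load_raw_json(source, lines=False):
    """Load JSON into a DataFrame; set lines=True for JSONL input."""
    return pd.read_json(source, lines=lines)


# Made-up in-memory data standing in for real files.
csv_text = "id,question,answer\n1,Is water wet?,yes\n2,Is fire cold?,no\n"
jsonl_text = '{"id": 1, "answer": "yes"}\n{"id": 2, "answer": "no"}\n'

csv_df = load_raw_csv(io.StringIO(csv_text))
jsonl_df = load_raw_json(io.StringIO(jsonl_text), lines=True)

print(csv_df.dtypes)
print(jsonl_df.head())
```

For a real eval, pass the file path or URL found in Step 1 instead of the `io.StringIO` objects, and call `load_raw_hf(DATASET_PATH, revision=DATASET_REVISION)` for HuggingFace sources.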

### 3. Explore Structure and Quality

Use standard pandas/datasets methods to explore:

- **Schema**: `ds.features` (HF) or `df.dtypes` (pandas)
- **Shape**: `len(ds)`, `ds.column_names` (HF) or `df.shape`, `df.info()`, `df.columns` (pandas)
- **Sample data**: `ds[:3]` (HF, returns a dict mapping each column to a list of values) or `df.head()` (pandas)
- **Missing values**: Check for `None`, empty strings, empty lists
- **Duplicates**: Check ID uniqueness if an ID field exists
- **Value distributions**: `value_counts()` for categorical columns, length stats for text fields

For converting an Inspect `Dataset` (which has no `.to_pandas()`) to a DataFrame, see `references/inspect-dataset-patterns.md`.

### 4. Understand the Sample Conversion

Look at the `record_to_sample` function to understand how raw data maps to Inspect samples. Key questions:

- Which fields become `input`? Are they combined/formatted?
- What is the `target` format? (letter, text, JSON, etc.)
- Are there `choices` for multiple choice?
- What goes into `metadata`?
- Are any records filtered out?
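
One way to answer the first and fourth questions empirically is to record which raw fields the conversion function actually reads. The sketch below is illustrative only: `AccessLoggingRecord` and `unused_fields` are hypothetical helpers, and the `record_to_sample` shown here is a made-up stand-in that returns a plain dict rather than an Inspect `Sample`. A real eval's function can be passed to `unused_fields` in the same way, provided it reads fields via `record[...]` or `record.get(...)`.

```python
class AccessLoggingRecord(dict):
    """A dict that remembers which keys were read via [] or .get()."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.accessed = set()

    def __getitem__(self, key):
        self.accessed.add(key)
        return super().__getitem__(key)

    def get(self, key, default=None):
        self.accessed.add(key)
        return super().get(key, default)


def unused_fields(record: dict, convert) -> list:
    """Return the raw fields that `convert` never reads for this record."""
    tracked = AccessLoggingRecord(record)
    convert(tracked)
    return sorted(set(record) - tracked.accessed)


def record_to_sample(record):
    # Hypothetical conversion; a plain dict stands in for inspect_ai's Sample.
    return {
        "input": f"{record['question']}\n\nContext: {record['context']}",
        "target": record["answer"],
        "metadata": {"id": record["id"]},
    }


# Made-up raw record for illustration only.
raw_record = {
    "id": 1,
    "question": "Is water wet?",
    "context": "Water is a liquid.",
    "answer": "yes",
    "long_answer": "Yes, by most everyday definitions.",
}

ignored = unused_fields(raw_record, record_to_sample)
print(ignored)
```

Fields reported as unused are dropped during conversion, which is worth confirming as intentional.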

### 5. Test the Inspect Loading Pipeline

See `references/inspect-dataset-patterns.md` for the pattern to load through Inspect's `hf_dataset()` and verify sample conversion works correctly.

## Quick Reference Commands

```bash
# View HF dataset info without downloading
uv run python -c "from datasets import load_dataset_builder; b = load_dataset_builder('org/name'); print(b.info)"

# List available configs/subsets
uv run python -c "from datasets import get_dataset_config_names; print(get_dataset_config_names('org/name'))"

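# List available splits without loading the full dataset (pass a config name as the second argument for multi-config datasets)
uv run python -c "from datasets import get_dataset_split_names; print(get_dataset_split_names('org/name'))"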
```

## Caching and Troubleshooting

For cache locations (HuggingFace native, Inspect AI, Inspect Evals), force re-download commands, and test utilities, see `references/inspect-dataset-patterns.md`.

- **Gated dataset**: Run `huggingface-cli login` or set `HF_TOKEN`
- **Rate limited**: The `hf_dataset` wrapper in `inspect_evals.utils.huggingface` has built-in retry with backoff
- **Large dataset**: Use `streaming=True` or `split="train[:1000]"` for sampling
- **Missing revision**: Check the dataset's "Files and versions" tab on HuggingFace
