ngs-bcl-to-fastq

Name: ngs-bcl-to-fastq
Author: openai/plugins

$npx mdskill add openai/plugins/ngs-bcl-to-fastq

Validate Illumina BCL runs and convert to FASTQ with demux planning.

Handles folder validation, demux planning, and conversion execution.
Depends on bcl-convert, bcl2fastq, and ngs_preflight scripts.
Decides tool usage based on license status and legacy compatibility.
Delivers metrics, license boundaries, and installation plans.

SKILL.md

.github/skills/ngs-bcl-to-fastqView on GitHub ↗

---
name: ngs-bcl-to-fastq
description: Validate Illumina BCL run folders and sample sheets, plan demultiplexing, review index/UMI/lane choices, run BCL-to-FASTQ conversion, and interpret demux metrics while surfacing license/download boundaries.
---

# BCL To FASTQ

Use this skill when the input is an Illumina BCL run folder or the user asks to demultiplex a sequencing run. This is a deep demultiplexing and run-validation skill, not only a command wrapper.

## Essential Inputs

Confirm:

- run folder path with `RunInfo.xml`
- sample sheet path and format
- output directory
- instrument/run metadata from `RunInfo.xml` and `RunParameters.xml`
- lane handling: split by lane or combine lanes
- index mismatch tolerance
- index read structure and dual-index orientation
- UMI layout, if any
- whether adapter trimming/masking should happen during conversion
- whether undetermined reads and demultiplexing metrics should be reviewed before downstream analysis

## Public Tool Boundary

Prefer `bcl-convert` if it is already installed. It is free for local use but proprietary and RPM-distributed by Illumina, so do not auto-download without explicit user approval.

Legacy `bcl2fastq` may exist in older environments. Use it only when BCL Convert is unavailable or the run requires legacy compatibility.

## Preflight

```bash
python plugins/ngs-analysis/scripts/ngs_preflight.py --pipeline bcl_to_fastq --emit-install-plan
```

Also check run-folder structure:

```bash
test -f /path/to/run/RunInfo.xml
test -f /path/to/SampleSheet.csv
find /path/to/run -maxdepth 4 -type d -name BaseCalls
```

## Local Execution Package

Use the plugin-owned runner when the user provides a local run folder and sample sheet:

```bash
python plugins/ngs-analysis/scripts/run_bcl_to_fastq.py \
--run-folder /path/to/run \
--sample-sheet /path/to/SampleSheet.csv \
--output-directory /path/to/fastq_out
```

Add `--execute` only when conversion is requested. The runner validates `RunInfo.xml`, optional `RunParameters.xml`, the BaseCalls directory, sample-sheet rows, duplicate lane/index combinations, and index length compatibility. With `--execute`, it uses installed `bcl-convert`, then legacy `bcl2fastq` if available; if neither exists, it records the blocker instead of downloading proprietary software.

## Validation Checklist

Before conversion, validate:

- `RunInfo.xml` exists and its read structure matches the expected sequencing design.
- `SampleSheet.csv` exists, is the intended version, and has no duplicate sample/index combinations within each lane.
- Index sequence lengths match the index reads and any trimming/masking requested by the sample sheet.
- Dual-index orientation is explicit for the instrument and library prep; do not infer i5 orientation from filenames.
- UMI bases are assigned to the intended read or index read and carried through to FASTQ headers or output metadata as needed.
- Lane-splitting, sample-name normalization, and output directory behavior are agreed before running.
- Disk space is sufficient for output FASTQs, reports, and temporary files.

## Kickoff Pattern

First produce a preflight plan with paths and sample sheet validation. Then run conversion only after the user confirms:

```bash
bcl-convert \
--bcl-input-directory /path/to/run \
--output-directory /path/to/fastq_out \
--sample-sheet /path/to/SampleSheet.csv
```

## Metrics Review

After conversion, inspect and report:

- total clusters, clusters passing filter, and yield by lane
- percent assigned by sample and percent undetermined by lane
- top undetermined index sequences when available
- per-sample FASTQ counts and read-pair consistency
- unexpected index hopping, barcode collision, or sample-sheet mismatch signals

Record software version, command, sample sheet checksum, run-folder path, output path, and conversion metrics. Do not start downstream analysis until severe demultiplexing anomalies are surfaced.

More from openai/plugins

Skill	Description
accessibility-and-inclusive-visualization	Make data visualizations accessible and inclusive. Use when the user needs chart or diagram accessibility guidance, text alternatives for complex visuals, color and contrast review, keyboard support, reduced-motion behavior for animation or parallax, or an accessibility QA workflow for exported figures, UML-like diagrams, and dashboards.
agent-browser	Browser automation CLI for AI agents. Use when the user needs to interact with websites, verify dev server output, test web apps, navigate pages, fill forms, click buttons, take screenshots, extract data, or automate any browser task. Also triggers when a dev server starts so you can verify it visually.
agent-browser-verify	Automated browser verification for dev servers. Triggers when a dev server starts to run a visual gut-check with agent-browser — verifies the page loads, checks for console errors, validates key UI elements, and reports pass/fail before continuing.
agents-sdk	Build AI agents on Cloudflare Workers using the Agents SDK. Load when creating stateful agents, durable workflows, real-time WebSocket apps, scheduled tasks, MCP servers, or chat applications. Covers Agent class, state management, callable RPC, Workflows integration, and React hooks. Biases towards retrieval from Cloudflare docs over pre-trained knowledge.
ai-elements	AI Elements component library guidance — pre-built React components for AI interfaces built on shadcn/ui. Use when building chat UIs, message displays, tool call rendering, streaming responses, reasoning panels, or any AI-native interface with the AI SDK.
ai-gateway	Vercel AI Gateway expert guidance. Use when configuring model routing, provider failover, cost tracking, or managing multiple AI providers through a unified API.
ai-generation-persistence	AI generation persistence patterns — unique IDs, addressable URLs, database storage, and cost tracking for every LLM generation
ai-sdk	Vercel AI SDK expert guidance. Use when building AI-powered features — chat interfaces, text generation, structured output, tool calling, agents, MCP integration, streaming, embeddings, reranking, image generation, or working with any LLM provider.
aiq-deploy	\|
aiq-research	\|