---
name: bio-workflow-management-cwl-workflows
description: Create portable, standards-based bioinformatics pipelines with Common Workflow Language (CWL). Use when building workflows that need maximum portability across execution platforms, sharing pipelines with collaborators using different systems, or contributing to community workflow registries.
tool_type: cli
primary_tool: cwltool
---
## Version Compatibility
Reference examples tested with: FastQC 0.12+, Nextflow 23.10+, Salmon 1.10+, Snakemake 8.0+, fastp 0.23+

Before using code patterns, verify installed versions match. If versions differ:
- CLI: `<tool> --version` then `<tool> --help` to confirm flags

If code throws ImportError, AttributeError, or TypeError, introspect the installed
package and adapt the example to match the actual API rather than retrying.
# CWL Workflows
**"Write a portable CWL workflow for my analysis"** → Define tools and workflows in YAML using the Common Workflow Language standard for maximum cross-platform portability and sharing through workflow registries.
- CLI: `cwltool` for local execution of CWL documents
- YAML: CWL v1.2 CommandLineTool and Workflow class definitions
## Basic Tool Definition
```yaml
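# fastqc.cwl
cwlVersion: v1.2
class: CommandLineTool
baseCommand: fastqc

# FastQC writes reports next to the input file by default; send them to the
# job's output directory so the globs below can find them.
arguments: ["--outdir", "$(runtime.outdir)"]

inputs:
  fastq:
    type: File
    inputBinding:
      position: 1

outputs:
  html:
    type: File
    outputBinding:
      glob: "*_fastqc.html"
  zip:
    type: File
    outputBinding:
      glob: "*_fastqc.zip"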
```
## Tool with Parameters
```yaml
# bwa_mem.cwl
cwlVersion: v1.2
class: CommandLineTool
baseCommand: [bwa, mem]

requirements:
  DockerRequirement:
    dockerPull: biocontainers/bwa:v0.7.17
  ResourceRequirement:
    coresMin: 8
    ramMin: 16000

inputs:
  threads:
    type: int
    default: 8
    inputBinding:
      prefix: -t
      position: 1
  reference:
    type: File
    secondaryFiles:
      - .amb
      - .ann
      - .bwt
      - .pac
      - .sa
    inputBinding:
      position: 2
  reads_1:
    type: File
    inputBinding:
      position: 3
  reads_2:
    type: File?
    inputBinding:
      position: 4

stdout: aligned.sam

outputs:
  sam:
    type: stdout
```
## Basic Workflow
```yaml
# rnaseq.cwl
cwlVersion: v1.2
class: Workflow

inputs:
  fastq_1: File
  fastq_2: File
  salmon_index: Directory

outputs:
  quant_results:
    type: Directory
    outputSource: salmon/quant_dir

steps:
  fastp:
    run: fastp.cwl
    in:
      reads_1: fastq_1
      reads_2: fastq_2
    out: [trimmed_1, trimmed_2, json_report]
  salmon:
    run: salmon_quant.cwl
    in:
      index: salmon_index
      reads_1: fastp/trimmed_1
      reads_2: fastp/trimmed_2
    out: [quant_dir]
```
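The workflow above runs `fastp.cwl`, which is not shown in this document. A minimal sketch of what such a tool could look like follows. The output file names (`trimmed_R1.fq.gz`, `trimmed_R2.fq.gz`, `fastp.json`, `fastp.html`) are assumptions chosen for illustration; the input and output ids simply mirror what the workflow step expects.

```yaml
# fastp.cwl (illustrative sketch; output file names are assumptions)
cwlVersion: v1.2
class: CommandLineTool
baseCommand: fastp

inputs:
  reads_1:
    type: File
    inputBinding:
      prefix: -i
  reads_2:
    type: File
    inputBinding:
      prefix: -I

# Fixed output names keep the globs below predictable
arguments:
  - prefix: -o
    valueFrom: trimmed_R1.fq.gz
  - prefix: -O
    valueFrom: trimmed_R2.fq.gz
  - prefix: -j
    valueFrom: fastp.json
  - prefix: -h
    valueFrom: fastp.html

outputs:
  trimmed_1:
    type: File
    outputBinding:
      glob: trimmed_R1.fq.gz
  trimmed_2:
    type: File
    outputBinding:
      glob: trimmed_R2.fq.gz
  json_report:
    type: File
    outputBinding:
      glob: fastp.json
```

Confirm the flags against `fastp --help` for the installed version before relying on this sketch.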
## Scatter (Parallel Execution)
```yaml
cwlVersion: v1.2
class: Workflow

requirements:
  ScatterFeatureRequirement: {}

inputs:
  fastq_files:
    type: File[]
  reference: File

outputs:
  sam_files:
    type: File[]
    outputSource: align/sam

steps:
  align:
    run: bwa_mem.cwl
    # Scatter over the tool's reads_1 input: one alignment job per FASTQ
    scatter: reads_1
    in:
      reads_1: fastq_files
      reference: reference
    out: [sam]   # bwa_mem.cwl (defined earlier) exposes a single `sam` output
```
## Multi-Scatter
```yaml
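requirements:
  ScatterFeatureRequirement: {}
  # Only needed when a single step input lists several sources
  MultipleInputFeatureRequirement: {}

steps:
  align:
    run: bwa_mem.cwl
    scatter: [reads_1, reads_2]
    # dotproduct pairs elements by index, so both arrays must have equal length
    scatterMethod: dotproduct
    in:
      reads_1: fastq_1_array
      reads_2: fastq_2_array
      reference: reference
    out: [sam]   # bwa_mem.cwl (defined earlier) exposes a single `sam` output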
```
## Input File (Job)
```yaml
# job.yaml
fastq_1:
  class: File
  path: data/sample1_R1.fq.gz
fastq_2:
  class: File
  path: data/sample1_R2.fq.gz
salmon_index:
  class: Directory
  path: ref/salmon_index
threads: 8
```
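Array inputs, such as the `fastq_files` input of the scatter workflow, are written in a job file as a YAML list of `File` objects. A short sketch; the file name and all paths are hypothetical:

```yaml
# scatter_job.yaml (hypothetical file name and paths)
fastq_files:
  - class: File
    path: data/sample1.fq.gz
  - class: File
    path: data/sample2.fq.gz
reference:
  class: File
  path: ref/genome.fa
```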
## Secondary Files
```yaml
inputs:
  bam:
    type: File
    secondaryFiles:
      - .bai
  reference:
    type: File
    secondaryFiles:
      - pattern: .fai
        required: true
      - pattern: .dict
        required: false
```
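A pattern that begins with `^` removes one extension from the primary file name before appending the suffix. This matters for companion files that replace the extension instead of extending it. A short sketch, assuming a reference named `genome.fa` whose sequence dictionary is stored as `genome.dict` (both names are illustrative):

```yaml
inputs:
  reference:
    type: File
    secondaryFiles:
      - .fai      # genome.fa -> genome.fa.fai
      - ^.dict    # genome.fa -> genome.dict
```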
## Docker and Singularity
```yaml
requirements:
  DockerRequirement:
    dockerPull: quay.io/biocontainers/salmon:1.10.0--h7e5ed60_0

hints:
  SoftwareRequirement:
    packages:
      salmon:
        version: ["1.10.0"]
```
```bash
# Run with Docker (cwltool's default runtime when a DockerRequirement is present)
cwltool workflow.cwl job.yaml
# Run with Singularity
cwltool --singularity workflow.cwl job.yaml
```
## Resource Requirements
```yaml
requirements:
  ResourceRequirement:
    coresMin: 4
    coresMax: 16
    ramMin: 8000       # MiB
    ramMax: 32000      # MiB
    outdirMin: 10000   # MiB of output directory space
    tmpdirMin: 10000   # MiB of temporary directory space
```
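The number of cores reserved for a job is available as `$(runtime.cores)`, so a tool can pass that value to its thread option instead of hard-coding a number. A minimal sketch, assuming a tool whose thread option is spelled `--threads`:

```yaml
requirements:
  ResourceRequirement:
    coresMin: 4
    coresMax: 16

arguments:
  # Pass the core count the runner reserved for this job
  - prefix: --threads
    valueFrom: $(runtime.cores)
```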
## Conditional Steps
```yaml
cwlVersion: v1.2
class: Workflow

requirements:
  InlineJavascriptRequirement: {}

inputs:
  run_qc: boolean
  fastq: File

outputs:
  qc_html:
    type: File?   # null when the fastqc step is skipped
    outputSource: fastqc/html

steps:
  fastqc:
    run: fastqc.cwl
    when: $(inputs.run_qc)
    in:
      run_qc: run_qc
      fastq: fastq
    out: [html]
```
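A skipped step produces `null` outputs. When a later step or a workflow output needs a definite value, CWL v1.2 provides `pickValue` to choose among several sources. The sketch below shows optional trimming that falls back to the raw reads; `trim.cwl`, its `trimmed` output, and the `do_trim` input are hypothetical names used only for illustration.

```yaml
cwlVersion: v1.2
class: Workflow

requirements:
  # Needed because the output below lists more than one source
  MultipleInputFeatureRequirement: {}

inputs:
  do_trim: boolean
  fastq: File

steps:
  trim:
    run: trim.cwl               # hypothetical tool with a `trimmed` output
    when: $(inputs.do_trim)
    in:
      do_trim: do_trim
      fastq: fastq
    out: [trimmed]

outputs:
  reads:
    type: File
    # Trimmed reads if the step ran, otherwise the original input
    outputSource: [trim/trimmed, fastq]
    pickValue: first_non_null
```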
## Subworkflows
```yaml
# main.cwl
requirements:
  SubworkflowFeatureRequirement: {}   # required when a step runs another Workflow

steps:
  qc_workflow:
    run: subworkflows/qc.cwl
    in:
      reads_1: fastq_1
      reads_2: fastq_2
    out: [qc_report, trimmed_1, trimmed_2]
  alignment_workflow:
    run: subworkflows/align.cwl
    in:
      reads_1: qc_workflow/trimmed_1
      reads_2: qc_workflow/trimmed_2
    out: [bam]
```
## File Arrays and Directories
```yaml
inputs:
  bam_files:
    type: File[]
  output_dir:
    type: string
    default: "results"

outputs:
  results:
    type: Directory
    outputBinding:
      glob: $(inputs.output_dir)
```
## JavaScript Expressions
```yaml
requirements:
  InlineJavascriptRequirement: {}

inputs:
  sample_name: string

outputs:
  output_bam:
    type: File
    outputBinding:
      glob: $(inputs.sample_name + ".sorted.bam")

arguments:
  - prefix: -o
    valueFrom: $(inputs.sample_name).sorted.bam
```
## InitialWorkDirRequirement
```yaml
requirements:
  InitialWorkDirRequirement:
    listing:
      - entry: $(inputs.reference)
        writable: false
      - entryname: config.txt
        entry: |
          threads=$(inputs.threads)
          memory=$(inputs.memory)
```
## Complete RNA-seq Tool
```yaml
# salmon_quant.cwl
cwlVersion: v1.2
class: CommandLineTool
baseCommand: [salmon, quant]

requirements:
  DockerRequirement:
    dockerPull: quay.io/biocontainers/salmon:1.10.0--h7e5ed60_0
  ResourceRequirement:
    coresMin: 8
    ramMin: 16000

inputs:
  index:
    type: Directory
    inputBinding:
      prefix: -i
  reads_1:
    type: File
    inputBinding:
      prefix: "-1"
  reads_2:
    type: File
    inputBinding:
      prefix: "-2"
  lib_type:
    type: string
    default: A
    inputBinding:
      prefix: -l
  threads:
    type: int
    default: 8
    inputBinding:
      prefix: --threads
  output_dir:
    type: string
    default: quant_output
    inputBinding:
      prefix: -o

outputs:
  quant_dir:
    type: Directory
    outputBinding:
      glob: $(inputs.output_dir)
```
## Run Commands
```bash
# Validate CWL file
cwltool --validate workflow.cwl
# Run workflow (Docker is used by default when a DockerRequirement is present)
cwltool workflow.cwl job.yaml
# Run with Singularity
cwltool --singularity workflow.cwl job.yaml
# Run with caching
cwltool --cachedir ./cache workflow.cwl job.yaml
# Run on Toil
toil-cwl-runner workflow.cwl job.yaml
```
## Execution Engines
| Engine | Use Case |
|--------|----------|
| cwltool | Reference implementation, local execution |
| Toil | HPC clusters, cloud (AWS, Google, Azure) |
| Arvados | Enterprise workflow management |
| CWL-Airflow | Airflow integration |
## Related Skills
- workflow-management/wdl-workflows - WDL alternative
- workflow-management/snakemake-workflows - Python-based alternative
- workflow-management/nextflow-pipelines - Groovy-based alternative