bio-immunoinformatics-epitope-prediction
$ npx mdskill add GPTomics/bioSkills/bio-immunoinformatics-epitope-prediction
Predicts B-cell and T-cell epitopes for vaccine and antibody design
- Identifies immunogenic regions in antigens for vaccine development
- Uses BepiPred, IEDB API, and mhcflurry for sequence and structure-based prediction
- Analyzes protein sequences to determine likely B-cell and T-cell binding sites
- Returns predicted epitopes as tab-separated results or structured data
SKILL.md
.github/skills/bio-immunoinformatics-epitope-prediction
---
name: bio-immunoinformatics-epitope-prediction
description: Predict B-cell and T-cell epitopes using BepiPred, IEDB tools, and structure-based methods for vaccine and antibody design. Identify immunogenic regions in antigens. Use when designing vaccines, mapping antibody binding sites, or predicting immunogenic peptides.
tool_type: python
primary_tool: BepiPred
---
## Version Compatibility
Reference examples tested with: pandas 2.2+
Before using code patterns, verify installed versions match. If versions differ:
- Python: `pip show <package>` then `help(module.function)` to check signatures
If code throws ImportError, AttributeError, or TypeError, introspect the installed
package and adapt the example to match the actual API rather than retrying.
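The version check described above can be sketched with the standard library alone. The helper names `installed_version` and `show_signature` are hypothetical, and `json.dumps` stands in for whichever function you need to inspect.

```python
import importlib
import inspect
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def show_signature(module_name, function_name):
    """Return the call signature of module_name.function_name as a string."""
    module = importlib.import_module(module_name)
    return str(inspect.signature(getattr(module, function_name)))


# pandas may or may not be installed; None means it is missing
print(installed_version('pandas'))
# A stdlib function is used here purely as a demonstration target
print(show_signature('json', 'dumps'))
```

Comparing the printed signature with the arguments used in an example is usually faster than retrying a failing call.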
# Epitope Prediction
**"Predict B-cell and T-cell epitopes in my protein"** → Identify immunogenic regions in antigens for vaccine design using sequence-based and structure-based prediction tools.
- Python: IEDB API for B-cell epitope prediction (BepiPred)
- Python: `mhcflurry` for T-cell epitope MHC binding prediction
## B-Cell Epitope Prediction
**Goal:** Predict linear B-cell epitopes from protein sequence using IEDB prediction tools.
**Approach:** Submit sequence to IEDB B-cell prediction API with selectable method (BepiPred-2.0 recommended) and parse tab-separated results.
### BepiPred-2.0 (Sequence-Based)
```python
import requests


def predict_bcell_epitopes_iedb(sequence, method='bepipred2', timeout=60):
    '''Predict B-cell epitopes using the IEDB API.

    Methods:
    - bepipred2: BepiPred-2.0 (recommended)
    - bepipred: Original BepiPred
    - emini: Surface accessibility
    - kolaskar-tongaonkar: Antigenicity
    - parker: Hydrophilicity

    The method string is sent to the API unchanged; confirm the exact
    identifiers the service accepts against the IEDB API documentation.

    BepiPred-2.0 is a random forest trained on epitopes annotated from
    antibody-antigen crystal structures.
    Threshold: >0.5 predicted as epitope (default)
    '''
    url = 'http://tools-cluster-interface.iedb.org/tools_api/bcell/'
    params = {
        'method': method,
        'sequence_text': sequence
    }
    response = requests.post(url, data=params, timeout=timeout)
    response.raise_for_status()  # Fail early on HTTP errors

    # Parse response (tab-separated): the first line is the header
    lines = response.text.strip().splitlines()
    header = lines[0].split('\t')
    data = [line.split('\t') for line in lines[1:]]
    return header, data
```
### Parse BepiPred Results
```python
import pandas as pd


def parse_bepipred_results(header, data, threshold=0.5, min_length=5):
    '''Parse BepiPred output and identify epitope regions.

    Output columns:
    - Position: Amino acid position
    - Residue: Amino acid
    - Score: BepiPred score (higher = more likely epitope)

    Epitope threshold:
    - >0.5: Default, balanced sensitivity/specificity
    - >0.6: More stringent, fewer false positives
    - >0.4: More sensitive, more candidates
    '''
    df = pd.DataFrame(data, columns=header)
    df['Score'] = df['Score'].astype(float)
    df['Position'] = df['Position'].astype(int)

    # Flag residues scoring above the threshold
    df['is_epitope'] = df['Score'] > threshold

    def to_region(run):
        # Summarize one run of consecutive epitope rows
        return {
            'start': int(run[0]['Position']),
            'end': int(run[-1]['Position']),
            'sequence': ''.join(r['Residue'] for r in run),
            'avg_score': sum(r['Score'] for r in run) / len(run)
        }

    # Find continuous epitope regions of at least min_length residues
    epitopes = []
    current_epitope = []
    for _, row in df.iterrows():
        if row['is_epitope']:
            current_epitope.append(row)
            continue
        if len(current_epitope) >= min_length:
            epitopes.append(to_region(current_epitope))
        current_epitope = []
    # Flush a run that extends to the last residue of the sequence
    if len(current_epitope) >= min_length:
        epitopes.append(to_region(current_epitope))
    return df, epitopes
```
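The run-grouping step can be checked offline without calling the API. The following is a minimal sketch on synthetic scores (not real BepiPred output); `find_epitope_regions` is a hypothetical helper that uses only the standard library.

```python
from itertools import groupby


def find_epitope_regions(residues, scores, threshold=0.5, min_length=5):
    """Group consecutive above-threshold residues into candidate regions.

    Positions are 1-based. A run that ends at the last residue is kept.
    """
    regions = []
    position = 1
    flags = [score > threshold for score in scores]
    for is_epitope, run in groupby(flags):
        length = len(list(run))
        if is_epitope and length >= min_length:
            start, end = position, position + length - 1
            window = scores[start - 1:end]
            regions.append({
                'start': start,
                'end': end,
                'sequence': residues[start - 1:end],
                'avg_score': sum(window) / length,
            })
        position += length
    return regions


# Synthetic example: two runs above 0.5, the second reaching the C-terminus
residues = 'MKTAYIAKQRQISFVK'
scores = [0.2, 0.3, 0.6, 0.7, 0.6, 0.8, 0.7, 0.4,
          0.3, 0.2, 0.6, 0.6, 0.7, 0.9, 0.8, 0.6]
for region in find_epitope_regions(residues, scores):
    print(region['start'], region['end'], region['sequence'])
```

Raising `threshold` or `min_length` trades candidates for stringency, in line with the threshold guidance in the docstring above.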
## T-Cell Epitope Prediction
**Goal:** Predict T-cell epitopes by MHC-I binding across multiple HLA alleles.
**Approach:** Query IEDB MHC-I API for each allele-sequence combination and aggregate predictions.
```python
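import io

import pandas as pd
import requests


def predict_tcell_epitopes_iedb(sequence, alleles, method='recommended', length=9):
    '''Predict T-cell epitopes (MHC-I binding) using IEDB.

    MHC-I methods (this endpoint):
    - recommended: IEDB's default method selection
    - netmhcpan_ba: NetMHCpan binding affinity
    - netmhcpan_el: NetMHCpan eluted ligand
    MHC-II methods (need the MHC-II endpoint, not queried here):
    - recommended
    - netmhciipan

    Returns a list with one DataFrame of predictions per allele.
    '''
    url = 'http://tools-cluster-interface.iedb.org/tools_api/mhci/'
    results = []
    for allele in alleles:
        params = {
            'method': method,
            'sequence_text': sequence,
            'allele': allele,
            'length': str(length)  # 9 is the most common MHC-I peptide length
        }
        response = requests.post(url, data=params, timeout=120)
        response.raise_for_status()
        # Parse the tab-separated table; column names depend on the method
        table = pd.read_csv(io.StringIO(response.text), sep='\t')
        results.append(table)
    return results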
```
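The aggregation step mentioned in the approach can be sketched offline. The row keys `allele`, `peptide`, and `percentile_rank`, the 1.0 cutoff, and both helper names are assumptions for illustration; check them against the columns the chosen method actually returns.

```python
from collections import defaultdict


def sliding_peptides(sequence, length=9):
    """Enumerate every peptide of the given length with its 1-based start."""
    return [(i + 1, sequence[i:i + length])
            for i in range(len(sequence) - length + 1)]


def aggregate_binders(predictions, max_rank=1.0):
    """Collect, per peptide, the alleles predicted to bind it.

    predictions: iterable of dicts with the assumed keys 'allele',
    'peptide', and 'percentile_rank' (lower rank = stronger binder).
    """
    alleles_by_peptide = defaultdict(set)
    for row in predictions:
        if float(row['percentile_rank']) <= max_rank:
            alleles_by_peptide[row['peptide']].add(row['allele'])
    # Peptides covered by the most alleles come first; ties sort by peptide
    return sorted(alleles_by_peptide.items(),
                  key=lambda item: (-len(item[1]), item[0]))


# Made-up rows, not real predictions
rows = [
    {'allele': 'HLA-A*02:01', 'peptide': 'ACDEFGHIK', 'percentile_rank': 0.4},
    {'allele': 'HLA-B*07:02', 'peptide': 'ACDEFGHIK', 'percentile_rank': 0.9},
    {'allele': 'HLA-A*02:01', 'peptide': 'CDEFGHIKL', 'percentile_rank': 0.2},
    {'allele': 'HLA-B*07:02', 'peptide': 'CDEFGHIKL', 'percentile_rank': 12.0},
]
for peptide, alleles in aggregate_binders(rows):
    print(peptide, sorted(alleles))
```

Ranking peptides by the number of alleles they cover is one simple way to prioritize candidates with broad population coverage.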
## Linear vs Conformational Epitopes
**Goal:** Classify epitopes as linear (continuous) or conformational (discontinuous) and predict structure-based epitopes.
**Approach:** Distinguish by residue continuity in primary sequence; for conformational epitopes, use structure-based tools (DiscoTope, ElliPro) via web servers.
```python
def classify_epitope_type(epitope_info):
    '''Classify epitope as linear or conformational

    Linear (continuous) epitopes:
    - Consecutive amino acids in primary sequence
    - ~10% of B-cell epitopes
    - Easier to predict from sequence

    Conformational (discontinuous) epitopes:
    - Non-consecutive residues brought together by folding
    - ~90% of B-cell epitopes
    - Requires structure for prediction
    '''
    pass


def predict_conformational_epitopes(pdb_file, chain='A'):
    '''Predict conformational B-cell epitopes from structure

    Uses surface accessibility and protrusion index.
    Requires 3D structure (PDB/mmCIF).

    Tools:
    - DiscoTope 2.0 (structure-based)
    - ElliPro (protrusion)
    - SEPPA 3.0
    '''
    # Structure-based prediction requires specialized tools
    # Usually accessed via web servers
    print('For conformational epitopes:')
    print('- DiscoTope: http://tools.iedb.org/discotope/')
    print('- ElliPro: http://tools.iedb.org/ellipro/')
```
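The continuity criterion can be illustrated with a small sketch. It assumes an epitope is given as a collection of 1-based residue positions; `classify_epitope_positions` and its `max_gap` parameter are hypothetical and not part of the skill.

```python
def classify_epitope_positions(positions, max_gap=0):
    """Label a set of 1-based residue positions as linear or conformational.

    The epitope counts as linear when its sorted positions form one unbroken
    run, allowing up to max_gap skipped residues between neighbours.
    """
    ordered = sorted(set(positions))
    if not ordered:
        raise ValueError('positions must not be empty')
    # Number of residues skipped between each pair of neighbouring positions
    gaps = [b - a - 1 for a, b in zip(ordered, ordered[1:])]
    if all(gap <= max_gap for gap in gaps):
        return 'linear'
    return 'conformational'


print(classify_epitope_positions([45, 46, 47, 48, 49]))
print(classify_epitope_positions([12, 13, 57, 58, 101]))
```

A small nonzero `max_gap` tolerates the occasional non-contact residue inside an otherwise continuous stretch; the right value depends on how the epitope residues were defined.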
## Combine Multiple Predictions
**Goal:** Improve epitope prediction reliability by combining multiple methods into a consensus score.
**Approach:** Run each method independently, threshold per method, then count agreements per position and assign confidence levels.
```python
import pandas as pd


def consensus_epitope_prediction(sequence, methods=('bepipred2', 'emini', 'parker'),
                                 thresholds=None):
    '''Combine multiple prediction methods

    Consensus approach improves reliability:
    - Regions predicted by multiple methods more reliable
    - Different methods capture different properties

    Scoring (labels assume three methods):
    - 3/3 methods: High confidence
    - 2/3 methods: Moderate confidence
    - 1/3 methods: Low confidence
    '''
    methods = list(methods)
    # Method-specific thresholds; methods without an entry fall back to 0,
    # a placeholder that should be tuned for each scale
    thresholds = {'bepipred2': 0.5, **(thresholds or {})}

    consensus = None
    for method in methods:
        header, data = predict_bcell_epitopes_iedb(sequence, method)
        df = pd.DataFrame(data, columns=header)
        df['Position'] = df['Position'].astype(int)
        cutoff = thresholds.get(method, 0)
        df[method] = (df['Score'].astype(float) > cutoff).astype(int)
        if consensus is None:
            consensus = df[['Position', 'Residue', method]]
        else:
            # Align on Position: window-based methods may not score every residue
            consensus = consensus.merge(df[['Position', method]], on='Position', how='left')

    # Positions a method did not score count as non-epitope
    consensus[methods] = consensus[methods].fillna(0).astype(int)
    consensus['consensus_score'] = consensus[methods].sum(axis=1)
    consensus['confidence'] = consensus['consensus_score'].map({
        3: 'high', 2: 'moderate', 1: 'low', 0: 'none'
    })
    return consensus
```
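The vote-counting idea can be tried without network access. This sketch assumes each method's calls are given as a set of 1-based positions; `count_votes`, `confidence_label`, and the majority rule for "moderate" are illustrative choices that reproduce the 3/3, 2/3, 1/3 labels when three methods are used.

```python
def count_votes(calls_by_method):
    """Count, per position, how many methods call it an epitope.

    calls_by_method: dict of method name -> set of 1-based positions
    called as epitope by that method.
    """
    votes = {}
    for positions in calls_by_method.values():
        for position in positions:
            votes[position] = votes.get(position, 0) + 1
    return votes


def confidence_label(votes, n_methods):
    """Map a vote count to a label for any number of methods."""
    if not 0 <= votes <= n_methods:
        raise ValueError('votes must lie between 0 and n_methods')
    if votes == 0:
        return 'none'
    if votes == n_methods:
        return 'high'
    # A strict majority is 'moderate'; anything less is 'low'
    return 'moderate' if votes * 2 > n_methods else 'low'


# Made-up calls from three methods, not real predictions
calls = {'method_a': {3, 4, 5}, 'method_b': {4, 5, 6}, 'method_c': {5}}
votes = count_votes(calls)
for position in sorted(votes):
    print(position, votes[position], confidence_label(votes[position], len(calls)))
```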
## Epitope Mapping from Experimental Data
**Goal:** Map epitope regions from overlapping peptide array binding data.
**Approach:** Process signal intensity values from overlapping peptide arrays and identify continuous high-signal regions as epitopes.
```python
def map_epitopes_from_peptide_array(array_results, overlap=11):
    '''Map epitopes from peptide array experiments

    Peptide arrays test binding of overlapping peptides
    covering the entire antigen sequence.

    Args:
        array_results: Dict mapping peptide -> signal intensity
        overlap: Overlap between consecutive peptides

    Returns:
        Epitope map with per-residue scores
    '''
    # Implementation would process experimental binding data
    pass
```
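One way the per-residue scoring could work is to average the signal of every peptide that covers a residue. The sketch below is an assumption-laden illustration, not the skill's implementation: `per_residue_signal` is a hypothetical helper, and it assumes the dict lists equal-length peptides in N- to C-terminal order, each shifted by `length - overlap` residues.

```python
def per_residue_signal(array_results, overlap=11):
    """Average peptide-array signal over the peptides covering each residue.

    Returns a list of scores, index 0 being the first antigen residue.
    """
    peptides = list(array_results)
    length = len(peptides[0])
    if any(len(peptide) != length for peptide in peptides):
        raise ValueError('all peptides must have the same length')
    step = length - overlap
    if step <= 0:
        raise ValueError('overlap must be smaller than the peptide length')

    total_length = step * (len(peptides) - 1) + length
    sums = [0.0] * total_length
    counts = [0] * total_length
    for index, peptide in enumerate(peptides):
        start = index * step  # 0-based offset of this peptide in the antigen
        for offset in range(length):
            sums[start + offset] += array_results[peptide]
            counts[start + offset] += 1
    return [total / count for total, count in zip(sums, counts)]


# Toy array: 4-mers overlapping by 2 residues, made-up intensities
toy_array = {'ACDE': 1.0, 'DEFG': 3.0, 'FGHI': 5.0}
print(per_residue_signal(toy_array, overlap=2))
```

Because the input is keyed by peptide sequence, repeated peptides in the antigen would collide; a list of (peptide, signal) pairs would avoid that. Continuous high-signal stretches in the returned scores can then be thresholded into epitope regions.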
## Related Skills
- immunoinformatics/mhc-binding-prediction - T-cell epitope prediction
- immunoinformatics/immunogenicity-scoring - Epitope ranking
- structural-biology/geometric-analysis - Structure-based epitopes