---
name: uniprot-protein-retrieval
description: Retrieve protein sequences and functional information from UniProt database by protein name, enabling protein analysis and bioinformatics workflows.
license: MIT license
metadata:
  skill-author: PJLab
---
# UniProt Protein Sequence Retrieval
## Usage
### 1. MCP Server Definition
Use the standard MCP client pattern to connect to the Origene-UniProt server. The `OrigeneClient` class in the workflow below implements it.
### 2. Protein Sequence Retrieval Workflow
This workflow retrieves protein sequences and associated information from the UniProt database using protein names or identifiers.
**Workflow Steps:**
1. **Query by Protein Name** - Search UniProt using common protein names
2. **Retrieve Sequence Data** - Get amino acid sequence and metadata
**Implementation:**
```python
import asyncio
import json

from mcp import ClientSession
from mcp.client.streamable_http import streamablehttp_client


class OrigeneClient:
    def __init__(self, server_url: str):
        self.server_url = server_url
        self.session = None

    async def connect(self):
        try:
            self.transport = streamablehttp_client(
                url=self.server_url,
                headers={"SCP-HUB-API-KEY": "<your-api-key>"}
            )
            self.read, self.write, self.get_session_id = await self.transport.__aenter__()
            self.session_ctx = ClientSession(self.read, self.write)
            self.session = await self.session_ctx.__aenter__()
            await self.session.initialize()
            print("✓ Connected to Origene-UniProt")
            return True
        except Exception as e:
            print(f"✗ Connection failed: {e}")
            return False

    async def disconnect(self):
        try:
            if self.session:
                await self.session_ctx.__aexit__(None, None, None)
            if hasattr(self, 'transport'):
                await self.transport.__aexit__(None, None, None)
            print("✓ Disconnected")
        except Exception as e:
            print(f"✗ Disconnect error: {e}")

    def parse_result(self, result):
        # Returns the decoded JSON payload, or str(result) if there is no text content
        try:
            if hasattr(result, 'content') and result.content:
                content = result.content[0]
                if hasattr(content, 'text'):
                    return json.loads(content.text)
            return str(result)
        except Exception as e:
            return {"error": f"Parse error: {e}", "raw": str(result)}


async def main():
    # Initialize client
    client = OrigeneClient("https://scp.intern-ai.org.cn/api/v1/mcp/10/Origene-UniProt")
    if not await client.connect():
        print("Connection failed")
        return

    # Step 1: Retrieve protein sequence by name
    protein_name = "insulin"  # Can be common name, gene symbol, or UniProt ID
    result = await client.session.call_tool(
        "get_protein_sequence_by_name",
        arguments={
            "protein_name": protein_name
        }
    )
    result_data = client.parse_result(result)

    # parse_result falls back to a plain string when the response has no JSON text
    if not isinstance(result_data, dict):
        print(result_data)
        await client.disconnect()
        return

    # Display results
    print(f"\nProtein: {protein_name}")
    print("=" * 80)
    if "sequence" in result_data:
        sequence = result_data["sequence"]
        print(f"Amino Acid Sequence ({len(sequence)} residues):")
        print(sequence)
        # Format sequence in blocks of 60
        print("\nFormatted Sequence:")
        for i in range(0, len(sequence), 60):
            position = i + 1
            block = sequence[i:i+60]
            print(f"{position:6d} {block}")
    if "uniprot_id" in result_data:
        print(f"\nUniProt ID: {result_data['uniprot_id']}")
    if "protein_names" in result_data:
        print(f"Protein Names: {result_data['protein_names']}")
    if "organism" in result_data:
        print(f"Organism: {result_data['organism']}")
    if "function" in result_data:
        print(f"Function: {result_data['function'][:200]}...")

    await client.disconnect()


# Top-level await is not valid in a script, so run the workflow in an event loop
asyncio.run(main())
```
### Extended Example: Multiple Protein Retrieval
```python
# Retrieve multiple proteins
# Assumes `client` is a connected OrigeneClient and that this code runs
# inside an async function (for example, main() in the workflow above).
protein_list = ["p53", "BRCA1", "insulin", "hemoglobin"]
sequences = {}
for protein in protein_list:
    result = await client.session.call_tool(
        "get_protein_sequence_by_name",
        arguments={"protein_name": protein}
    )
    data = client.parse_result(result)
    if "sequence" in data:
        sequences[protein] = {
            "sequence": data["sequence"],
            "length": len(data["sequence"]),
            "uniprot_id": data.get("uniprot_id", "N/A")
        }

# Display summary
print("\nProtein Sequence Summary:")
print(f"{'Protein':<15} {'UniProt ID':<12} {'Length':<10}")
print("-" * 40)
for name, info in sequences.items():
    print(f"{name:<15} {info['uniprot_id']:<12} {info['length']:<10}")
```
### Tool Description
**Origene-UniProt Server:**
- `get_protein_sequence_by_name`: Retrieve protein sequence from UniProt database
  - Args:
    - `protein_name` (str): Protein common name, gene symbol, or UniProt ID
  - Returns:
    - `sequence` (str): Amino acid sequence (one-letter code)
    - `uniprot_id` (str): UniProt accession number
    - `protein_names` (str): Official and alternative protein names
    - `organism` (str): Source organism
    - `function` (str): Protein function description
    - `length` (int): Sequence length in residues
    - `mass` (float): Molecular mass (Da)
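The return fields listed above can be sanity-checked before a result is passed on. The following is a minimal sketch, not part of the Origene-UniProt server: `check_result` and `VALID_RESIDUES` are hypothetical names, and the check assumes the parsed result is a dict with the field names given in this section.

```python
# The 20 standard amino acids plus the ambiguity and rare codes (B, J, O, U, X, Z)
# that can appear in one-letter protein sequences.
VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY" "BJOUXZ")


def check_result(result_data: dict) -> list[str]:
    """Return a list of problems found in a parsed tool result (empty if none)."""
    sequence = result_data.get("sequence")
    if not isinstance(sequence, str) or not sequence:
        return ["missing or empty 'sequence'"]

    problems = []
    # Any character outside the one-letter alphabet suggests a damaged sequence.
    unexpected = sorted(set(sequence.upper()) - VALID_RESIDUES)
    if unexpected:
        problems.append(f"unexpected residue codes: {''.join(unexpected)}")

    # If the server reports a length, it should agree with the sequence itself.
    length = result_data.get("length")
    if length is not None and int(length) != len(sequence):
        problems.append(
            f"'length' is {length} but sequence has {len(sequence)} residues"
        )
    return problems
```

An empty list means the result passed both checks; anything else is worth logging before downstream analysis.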
### Input/Output
**Input:**
- `protein_name`: Protein identifier (flexible format)
  - Examples: "insulin", "P53", "BRCA1", "P01308"
  - Supports: common names, gene symbols, UniProt IDs
**Output:**
- Protein sequence and comprehensive metadata
- Ready for downstream analysis (alignment, structure prediction, etc.)
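Most downstream tools (aligners, structure predictors) take FASTA input, so one way to make the output "ready" is to write it as a FASTA record. This is a sketch under assumptions: `to_fasta` is a hypothetical helper, the header layout is illustrative rather than the official UniProt FASTA header, and the field names follow the Tool Description section.

```python
def to_fasta(result_data: dict, width: int = 60) -> str:
    """Format a parsed tool result as a single FASTA record."""
    uniprot_id = result_data.get("uniprot_id", "unknown")
    organism = result_data.get("organism", "")
    # Header line: accession followed by the organism, if present.
    header = f">{uniprot_id} {organism}".rstrip()

    sequence = result_data["sequence"]
    # Wrap the sequence into fixed-width lines, as FASTA files conventionally do.
    lines = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join([header, *lines]) + "\n"
```

For example, `open("protein.fasta", "w").write(to_fasta(result_data))` would save the record to disk.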
### Supported Query Types
1. **Common Names**: "insulin", "hemoglobin", "actin"
2. **Gene Symbols**: "TP53", "BRCA1", "EGFR"
3. **UniProt IDs**: "P01308", "P04637"
4. **Protein Families**: "kinase", "protease" (returns multiple entries)
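Because the tool accepts several query types through one argument, it can help to know on the client side whether a query is an accession or a free-text name, for instance to warn that a name may match several entries. The sketch below uses the accession pattern published in the UniProt help pages; `looks_like_accession` is a hypothetical helper, and how the server itself distinguishes query types is not documented here.

```python
import re

# Accession number pattern as documented by UniProt (6 or 10 characters).
UNIPROT_ACCESSION = re.compile(
    r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2}"
)


def looks_like_accession(query: str) -> bool:
    """Return True if the query has the shape of a UniProt accession number."""
    # Accessions are upper-case; normalise so that 'p01308' is also recognised.
    return UNIPROT_ACCESSION.fullmatch(query.strip().upper()) is not None


for query in ["P01308", "P04637", "TP53", "insulin", "kinase"]:
    kind = "accession" if looks_like_accession(query) else "name or symbol"
    print(f"{query}: {kind}")
```

A match only means the string is well-formed; it does not confirm that the entry exists in UniProt.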
### Applications
**Use retrieved sequences for:**
- Protein alignment and homology analysis
- Structure prediction (AlphaFold, ESMFold)
- Primer design for cloning
- Antibody epitope mapping
- Conservation analysis
- Mutation impact assessment
- Phylogenetic studies
### Integration with Other Workflows
**Combine with:**
1. **Protein BLAST** → Find homologs
2. **InterProScan** → Identify domains
3. **AlphaFold** → Predict 3D structure
4. **STRING** → Find protein interactions
5. **OpenTargets** → Link to diseases
### Example: Complete Protein Analysis Pipeline
```python
# Assumes uniprot_client, biotools_client and opentargets_client are connected
# clients (same pattern as OrigeneClient) and that this runs inside an async function.

# 1. Retrieve sequence
result = await uniprot_client.session.call_tool(
    "get_protein_sequence_by_name",
    arguments={"protein_name": "BRCA1"}
)
sequence = uniprot_client.parse_result(result)["sequence"]

# 2. Find similar proteins (BLAST)
result = await biotools_client.session.call_tool(
    "blast_search",
    arguments={
        "sequence": sequence,
        "evalue": 1e-10,
        "max_hits": 20
    }
)
homologs = biotools_client.parse_result(result)

# 3. Identify domains (InterProScan)
result = await biotools_client.session.call_tool(
    "interproscan_analyze",
    arguments={
        "sequence": sequence,
        "databases": ["Pfam", "SMART"]
    }
)
domains = biotools_client.parse_result(result)

# 4. Get disease associations (OpenTargets)
result = await opentargets_client.session.call_tool(
    "get_target_associated_diseases",
    arguments={"gene_symbol": "BRCA1"}
)
diseases = opentargets_client.parse_result(result)

print("Complete analysis for BRCA1:")
print(f"- Sequence length: {len(sequence)} amino acids")
print(f"- Homologs found: {len(homologs)}")
print(f"- Functional domains: {len(domains)}")
print(f"- Associated diseases: {len(diseases)}")
```
### Error Handling
**Common issues:**
- **Protein not found**: Check spelling, try alternative names or UniProt ID
- **Multiple matches**: Use more specific identifier (UniProt ID preferred)
- **No sequence available**: Some entries may lack sequence data
- **Network timeout**: Retry with exponential backoff
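The retry advice for network timeouts can be sketched as follows. `call_with_backoff` is a hypothetical helper, not part of the MCP SDK or the Origene-UniProt server. It retries on any exception, which is an assumption: the source does not say which exception a timeout raises, so narrow the `except` clause once that is known.

```python
import asyncio
import random


async def call_with_backoff(make_call, retries: int = 4, base_delay: float = 1.0):
    """Await make_call(), retrying failures with exponentially growing delays.

    make_call is a zero-argument function returning a fresh awaitable, so that
    each attempt issues a new request.
    """
    for attempt in range(retries + 1):
        try:
            return await make_call()
        except Exception:
            if attempt == retries:
                raise  # out of retries: surface the last error to the caller
            # Delay doubles each attempt (1x, 2x, 4x, ... base_delay); the random
            # jitter keeps many clients from retrying at the same instant.
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay / 2)
            await asyncio.sleep(delay)
```

Inside an async function, a tool call would then be wrapped as `await call_with_backoff(lambda: client.session.call_tool("get_protein_sequence_by_name", arguments={"protein_name": "insulin"}))`.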
### Data Quality Notes
- UniProtKB has two sections: Swiss-Prot (manually curated, reviewed) and TrEMBL (computationally annotated, unreviewed)
- Sequence quality: Swiss-Prot entries are highly reliable
- Updates: UniProt is updated regularly; sequences may change
- Isoforms: Multiple isoforms may exist; canonical sequence is returned by default