---
name: molecular-similarity-search
description: Search for similar molecules using Tanimoto similarity with Morgan fingerprints to identify structurally related compounds.
license: MIT license
metadata:
    skill-author: PJLab
---

# Molecular Similarity Search

## Usage

### 1. MCP Client Definition

```python
import asyncio
import json
from mcp import ClientSession
from mcp.client.streamable_http import streamablehttp_client


class DrugSDAClient:
    """MCP client for the DrugSDA-Tool server."""

    def __init__(self, server_url: str, api_key: str):
        self.server_url = server_url
        self.api_key = api_key
        self.session = None

    async def connect(self):
        """Open the streamable HTTP transport and initialize an MCP session."""
        print(f"Server URL: {self.server_url}")
        try:
            # The SCP hub authenticates requests via the SCP-HUB-API-KEY header
            self.transport = streamablehttp_client(
                url=self.server_url,
                headers={"SCP-HUB-API-KEY": self.api_key},
            )
            self.read, self.write, self.get_session_id = await self.transport.__aenter__()

            self.session_ctx = ClientSession(self.read, self.write)
            self.session = await self.session_ctx.__aenter__()

            # Perform the MCP initialization handshake
            await self.session.initialize()

            print("✓ Connected")
            return True

        except Exception as e:
            print(f"✗ Connection failed: {e}")
            return False

    async def disconnect(self):
        """Close the session, then the underlying transport."""
        try:
            if self.session:
                await self.session_ctx.__aexit__(None, None, None)
                self.session = None
            if hasattr(self, "transport"):
                await self.transport.__aexit__(None, None, None)
            print("✓ Disconnected")
        except Exception as e:
            print(f"✗ Disconnect error: {e}")

    def parse_result(self, result):
        """Parse an MCP tool-call result, decoding JSON text content if present."""
        try:
            if hasattr(result, "content") and result.content:
                content = result.content[0]
                if hasattr(content, "text"):
                    return json.loads(content.text)
            return str(result)
        except Exception as e:
            return {"error": f"parse error: {e}", "raw": str(result)}
```
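Because `connect()` and `disconnect()` are called separately, an exception raised during a tool call can skip cleanup. A small wrapper built on `contextlib.asynccontextmanager` guarantees that `disconnect()` runs. The `connected` helper and the offline `RecordingClient` stand-in below are hypothetical and not part of the SCP or MCP libraries; the stand-in only mimics the `connect()`/`disconnect()` shape of `DrugSDAClient` so the sketch runs without a server.

```python
import asyncio
from contextlib import asynccontextmanager


@asynccontextmanager
async def connected(client):
    """Hypothetical helper: connect on entry, always disconnect on exit."""
    if not await client.connect():
        raise ConnectionError("MCP connection failed")
    try:
        yield client
    finally:
        # Runs even when the body raises, so the transport is not leaked
        await client.disconnect()


class RecordingClient:
    """Offline stand-in with the same connect/disconnect shape as DrugSDAClient."""

    def __init__(self):
        self.events = []

    async def connect(self):
        self.events.append("connect")
        return True

    async def disconnect(self):
        self.events.append("disconnect")


async def demo():
    client = RecordingClient()
    try:
        async with connected(client):
            raise RuntimeError("simulated tool-call failure")
    except RuntimeError:
        pass
    return client.events


events = asyncio.run(demo())
print(events)  # ['connect', 'disconnect']
```

With a real client, the same pattern would be `async with connected(DrugSDAClient(url, key)) as client:` followed by `client.session.call_tool(...)` inside the block.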

### 2. Molecular Similarity Search Workflow

This workflow searches for similar molecules using Tanimoto similarity calculated from Morgan fingerprints.

**Workflow Steps:**

1. **Define Target Molecule** - Specify the query SMILES
2. **Define Candidate Molecules** - Provide a list of candidate SMILES
3. **Calculate Similarity** - Compute Tanimoto scores for all candidates
4. **Rank Results** - Sort by similarity score to find most similar molecules
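Steps 3 and 4 can be illustrated offline. This sketch is not the server's implementation: it represents each fingerprint as a set of on-bit indices and computes the Tanimoto coefficient |A ∩ B| / |A ∪ B|, then ranks the candidates. The toy bit sets and the `cand_*` names are invented placeholders, not real Morgan fingerprints, and returning 0.0 for two empty fingerprints is an assumed convention.

```python
def tanimoto(a: set[int], b: set[int]) -> float:
    """Tanimoto coefficient |A & B| / |A | B| for fingerprints given as on-bit sets."""
    union = len(a | b)
    if union == 0:
        return 0.0  # assumed convention for two empty fingerprints
    return len(a & b) / union


def rank_candidates(target_fp, candidate_fps, top_k=3):
    """Score every candidate against the target and return the top_k by score."""
    scored = [
        {"smiles": name, "score": tanimoto(target_fp, fp)}
        for name, fp in candidate_fps.items()
    ]
    return sorted(scored, key=lambda item: item["score"], reverse=True)[:top_k]


# Toy on-bit sets invented for illustration (NOT real Morgan fingerprints)
target_fp = {1, 5, 9, 12}
candidate_fps = {
    "cand_A": {1, 5, 9, 12},  # identical bits: 4 / 4 = 1.0
    "cand_B": {1, 5, 9, 40},  # 3 shared / 5 total = 0.6
    "cand_C": {2, 6, 40},     # nothing shared = 0.0
    "cand_D": {1, 5},         # 2 shared / 4 total = 0.5
}

top = rank_candidates(target_fp, candidate_fps)
for rank, item in enumerate(top, 1):
    print(f"{rank}. {item['smiles']} - Tanimoto score: {item['score']:.4f}")
```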

**Implementation:**

```python
import asyncio


async def main():
    # Initialize the client (DrugSDAClient is defined in section 1)
    client = DrugSDAClient(
        "https://scp.intern-ai.org.cn/api/v1/mcp/2/DrugSDA-Tool",
        "<your-api-key>",
    )

    if not await client.connect():
        print("Connection failed")
        return

    try:
        # Input: target molecule and candidate library
        target = "CCO"  # Ethanol
        candidates = [
            "CCCO",      # Propanol
            "CCCCO",     # Butanol
            "CC(C)O",    # Isopropanol
            "CCC(C)O",   # sec-Butanol
            "C1CC1",     # Cyclopropane
            "CC=O",      # Acetaldehyde
            "CCC(=O)O",  # Propanoic acid
        ]

        # Execute the similarity calculation on the server
        result = await client.session.call_tool(
            "calculate_smiles_similarity",
            arguments={
                "target_smiles": target,
                "candidate_smiles_list": candidates,
            },
        )

        result_data = client.parse_result(result)
        if not isinstance(result_data, dict) or "similarities" not in result_data:
            print(f"Unexpected response: {result_data}")
            return
        similarities = result_data["similarities"]

        # Sort by score and display the 3 most similar molecules
        top3 = sorted(similarities, key=lambda x: x["score"], reverse=True)[:3]

        print(f"Target molecule: {target}\n")
        print("Top 3 most similar molecules:")
        for i, item in enumerate(top3, 1):
            print(f"{i}. {item['smiles']} - Tanimoto score: {item['score']:.4f}")
    finally:
        # Always release the session and transport
        await client.disconnect()


asyncio.run(main())
```

### Tool Descriptions

**DrugSDA-Tool Server:**
- `calculate_smiles_similarity`: Compute the Tanimoto similarity between a target molecule and each candidate using Morgan fingerprints
  - Args:
    - `target_smiles` (str): Query molecule SMILES string
    - `candidate_smiles_list` (list): List of candidate molecule SMILES strings
  - Returns:
    - `similarities` (list): One entry per candidate, each containing:
      - `smiles` (str): Candidate SMILES string
      - `score` (float): Tanimoto similarity (0-1)
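Because `parse_result` may return decoded JSON, a plain string, or an `{"error": ...}` dict, it can help to validate the payload before ranking. A minimal sketch, assuming the return shape documented above; `SimilarityHit` and `parse_similarities` are hypothetical names, and the example payload values are invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SimilarityHit:
    """One candidate result (hypothetical typed wrapper)."""
    smiles: str
    score: float


def parse_similarities(result_data) -> list[SimilarityHit]:
    """Convert parsed tool output into typed hits, rejecting malformed payloads."""
    if not isinstance(result_data, dict) or "similarities" not in result_data:
        raise ValueError(f"unexpected tool response: {result_data!r}")
    hits = []
    for item in result_data["similarities"]:
        score = float(item["score"])
        # Tanimoto coefficients are bounded by 0 and 1
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"Tanimoto score out of range: {item!r}")
        hits.append(SimilarityHit(smiles=str(item["smiles"]), score=score))
    return hits


# Payload shaped like the documented return value (values invented)
payload = {
    "similarities": [
        {"smiles": "CCCO", "score": 0.5},
        {"smiles": "C1CC1", "score": 0.0},
    ]
}
hits = parse_similarities(payload)
print(hits)
```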

### Input/Output

**Input:**
- `target_smiles`: SMILES string of the query molecule
- `candidate_smiles_list`: List of SMILES strings to compare against

**Output:**
- List of similarity results:
  - `smiles`: Candidate molecule SMILES
  - `score`: Tanimoto similarity coefficient (0-1)
    - 1.0 = identical fingerprints (usually, though not always, identical molecules)
    - >0.7 = highly similar
    - 0.5-0.7 = moderately similar
    - <0.5 = dissimilar

### Similarity Interpretation

- **Score > 0.85**: Very high similarity, likely the same scaffold
- **Score 0.7-0.85**: High similarity, often a similar pharmacophore
- **Score 0.5-0.7**: Moderate similarity, related structures
- **Score < 0.5**: Low similarity, different chemical space

These bands are rules of thumb; useful cutoffs vary with the fingerprint type and its parameters.
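To apply these bands programmatically, a helper such as the hypothetical `interpret_score` below maps a score to a label. The ranges above share their endpoints, so how the exact values 0.5, 0.7, and 0.85 are assigned is an assumption made here.

```python
def interpret_score(score: float) -> str:
    """Map a Tanimoto score to the qualitative bands above (hypothetical helper).

    Boundary handling is an assumption: 0.85 counts as "high",
    0.7 as "high", and 0.5 as "moderate".
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"Tanimoto score must be in [0, 1], got {score}")
    if score > 0.85:
        return "very high"
    if score >= 0.7:
        return "high"
    if score >= 0.5:
        return "moderate"
    return "low"


for s in (0.92, 0.85, 0.7, 0.55, 0.3):
    print(f"{s:.2f} -> {interpret_score(s)}")
```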

### Use Cases

- Virtual screening and library filtering
- Scaffold hopping in drug design
- Chemical space exploration
- Lead compound identification
- Analog searching in compound databases
- Structure-activity relationship studies

### Performance Notes

- **Execution time**: <1 second for up to 1000 candidates
- **Fingerprint**: Morgan fingerprint (radius 2, 2048 bits)
- **Algorithm**: Tanimoto coefficient for binary fingerprints
- **Scalability**: Cost grows linearly with the number of candidates, since each candidate is compared once against the query fingerprint
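Tanimoto on binary fingerprints reduces to a bitwise AND, a bitwise OR, and two bit counts, which is one reason one-against-many screening is cheap. The sketch below is an illustration, not the server's code: fingerprints are packed into Python integers, and `heapq.nlargest` keeps only the top k hits in a single pass. The library consists of random sparse 2048-bit vectors (40 bits set each, an arbitrary choice) standing in for real Morgan fingerprints; `top_k` and `random_sparse_fp` are hypothetical names.

```python
import heapq
import random


def popcount(x: int) -> int:
    """Count set bits (int.bit_count() does the same on Python 3.10+)."""
    return bin(x).count("1")


def tanimoto_bits(a: int, b: int) -> float:
    """Tanimoto coefficient of two fingerprints packed into Python ints."""
    union = popcount(a | b)
    return popcount(a & b) / union if union else 0.0


def top_k(query: int, library: dict[str, int], k: int = 3) -> list[tuple[float, str]]:
    """Score every entry once (linear in library size) and keep the k best."""
    return heapq.nlargest(
        k, ((tanimoto_bits(query, fp), name) for name, fp in library.items())
    )


def random_sparse_fp(rng: random.Random, n_bits: int = 2048, n_on: int = 40) -> int:
    """Stand-in fingerprint: n_on distinct random bits set out of n_bits."""
    return sum(1 << bit for bit in rng.sample(range(n_bits), n_on))


rng = random.Random(0)
query = random_sparse_fp(rng)
library = {f"mol_{i}": random_sparse_fp(rng) for i in range(1000)}
library["query_copy"] = query  # an exact copy must score 1.0

best = top_k(query, library)
for score, name in best:
    print(f"{name}: {score:.4f}")
```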
