molecular-property-profiling

$ npx mdskill add InternScience/scp/molecular-property-profiling

Analyzes molecular properties for drug discovery and QSAR modeling

  • Provides detailed molecular profiling for compound evaluation
  • Uses DrugSDAClient and standard cheminformatics tools
  • Calculates descriptors across hydrophobicity, topology, and drug-likeness
  • Returns structured metrics in JSON format for downstream analysis

SKILL.md

.github/skills/molecular-property-profiling
---
name: molecular-property-profiling
description: Comprehensive molecular property analysis covering basic info, hydrophobicity, H-bonding, structural complexity, topology, drug-likeness, charge distribution, and complexity metrics.
license: MIT license
metadata:
    skill-author: PJLab
---

# Molecular Property Profiling Workflow

## Usage

### 1. MCP Server Definition

This workflow reuses the `DrugSDAClient` class defined in previous skills; it is not redefined here and must be available before the code below is run.

### 2. Comprehensive Molecular Property Analysis

This workflow computes molecular descriptors in eight categories and merges them into a single profile per molecule, for use in QSAR modeling, drug discovery, and general molecular analysis.

**Workflow Steps:**

1. **Basic Properties** - Molecular formula, weight, atom counts, bond counts
2. **Hydrophobicity** - LogP, molar refractivity, lipophilicity descriptors
3. **Hydrogen Bonding** - H-bond donors/acceptors, TPSA
4. **Structural Complexity** - Ring counts, aromatic rings, rotatable bonds
5. **Topological Descriptors** - Chi indices, Kappa shape indices
6. **Drug Chemistry** - QED score, Lipinski violations
7. **Charge Properties** - Gasteiger charges, formal charge
8. **Complexity Metrics** - Molecular complexity, asphericity

**Implementation:**

```python
import asyncio
from collections import defaultdict

# DrugSDAClient is not defined in this file; import it from wherever the
# earlier skill defines it (see "MCP Server Definition" above).


def merge_lists_by_smiles(*lists):
    """Merge multiple descriptor lists into one dict per molecule, keyed by SMILES."""
    merged = defaultdict(dict)
    for lst in lists:
        for d in lst:
            smiles = d['smiles']
            merged[smiles].update(d)
    return list(merged.values())


async def main():
    client = DrugSDAClient("https://scp.intern-ai.org.cn/api/v1/mcp/2/DrugSDA-Tool")
    if not await client.connect():
        print("connection failed")
        return

    # Input: list of SMILES strings
    smiles_list = [
        'Nc1nnc(S(=O)(=O)NCCc2ccc(O)cc2)s1',
        'COc1ccc2c(=O)cc(C(=O)N3CCN(c4ccc(F)cc4)CC3)oc2c1',
        'CCCC1CCC(CC(=O)Cl)(C2CCCCC2)CC1'
    ]

    # Step 1: Calculate basic molecular properties
    result = await client.session.call_tool(
        "calculate_mol_basic_info",
        arguments={"smiles_list": smiles_list}
    )
    basic_metrics = client.parse_result(result)['metrics']

    # Step 2: Calculate hydrophobicity descriptors
    result = await client.session.call_tool(
        "calculate_mol_hydrophobicity",
        arguments={"smiles_list": smiles_list}
    )
    hydrophobicity_metrics = client.parse_result(result)['metrics']

    # Step 3: Calculate hydrogen bonding properties
    result = await client.session.call_tool(
        "calculate_mol_hbond",
        arguments={"smiles_list": smiles_list}
    )
    hbond_metrics = client.parse_result(result)['metrics']

    # Step 4: Calculate structural complexity
    result = await client.session.call_tool(
        "calculate_mol_structure_complexity",
        arguments={"smiles_list": smiles_list}
    )
    structure_metrics = client.parse_result(result)['metrics']

    # Step 5: Calculate topological descriptors
    result = await client.session.call_tool(
        "calculate_mol_topology",
        arguments={"smiles_list": smiles_list}
    )
    topology_metrics = client.parse_result(result)['metrics']

    # Step 6: Calculate drug chemistry properties
    result = await client.session.call_tool(
        "calculate_mol_drug_chemistry",
        arguments={"smiles_list": smiles_list}
    )
    chemistry_metrics = client.parse_result(result)['metrics']

    # Step 7: Calculate charge properties
    result = await client.session.call_tool(
        "calculate_mol_charge",
        arguments={"smiles_list": smiles_list}
    )
    charge_metrics = client.parse_result(result)['metrics']

    # Step 8: Calculate complexity metrics
    result = await client.session.call_tool(
        "calculate_mol_complexity",
        arguments={"smiles_list": smiles_list}
    )
    complexity_metrics = client.parse_result(result)['metrics']

    # Merge all descriptors by SMILES
    complete_profiles = merge_lists_by_smiles(
        basic_metrics,
        hydrophobicity_metrics,
        hbond_metrics,
        structure_metrics,
        topology_metrics,
        chemistry_metrics,
        charge_metrics,
        complexity_metrics
    )

    # Display a selection of the merged descriptors
    for profile in complete_profiles:
        print(f"\nSMILES: {profile['smiles']}")
        print(f"Molecular Formula: {profile['molecular_formula']}")
        print(f"Molecular Weight: {profile['molecular_weight']:.2f}")
        print(f"LogP: {profile['logp']:.2f}")
        print(f"QED Score: {profile['qed']:.4f}")
        print(f"H-Bond Donors: {profile['num_h_donors']}")
        print(f"H-Bond Acceptors: {profile['num_h_acceptors']}")
        print(f"TPSA: {profile['tpsa']:.2f}")
        print(f"Lipinski Violations: {profile['lipinski_rule_of_5_violations']}")

    await client.disconnect()


# Top-level `await` and `return` are not valid in a plain script,
# so the workflow runs inside an async entry point.
asyncio.run(main())
```
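
The merge step can be tried offline, without a server connection. The sketch below repeats `merge_lists_by_smiles` and feeds it two small hand-written lists; the descriptor values are placeholders for illustration, not output from the DrugSDA tools.

```python
from collections import defaultdict


def merge_lists_by_smiles(*lists):
    """Merge descriptor dicts that share the same 'smiles' value."""
    merged = defaultdict(dict)
    for lst in lists:
        for d in lst:
            merged[d["smiles"]].update(d)
    return list(merged.values())


# Placeholder tool outputs; note the two lists are in different orders.
basic = [
    {"smiles": "CCO", "num_atoms": 9},
    {"smiles": "c1ccccc1", "num_atoms": 12},
]
hbond = [
    {"smiles": "c1ccccc1", "num_h_donors": 0},
    {"smiles": "CCO", "num_h_donors": 1},
]

profiles = merge_lists_by_smiles(basic, hbond)
for p in profiles:
    print(p)
```

Profiles come back in order of first appearance, and if two tools return the same key the later list wins. The merge relies on every tool returning exactly the same SMILES string for a given molecule; if a tool were to rewrite the SMILES (for example by canonicalizing it), that molecule would be split into separate profiles.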

### Descriptor Categories

#### 1. Basic Properties
- `molecular_formula`: Molecular formula
- `molecular_weight`: Molecular weight (Da)
- `num_heavy_atoms`: Count of non-hydrogen atoms
- `num_atoms`, `num_bonds`: Total atom and bond counts
- `formal_charge`: Overall formal charge

#### 2. Hydrophobicity
- `logp`: Logarithm of the octanol-water partition coefficient (lipophilicity)
- `molar_refractivity`: Molar refractivity
- `fraction_csp3`: Fraction of sp3 carbons (saturation)

#### 3. Hydrogen Bonding
- `num_h_donors`: H-bond donor count
- `num_h_acceptors`: H-bond acceptor count
- `tpsa`: Topological polar surface area (Ų)

#### 4. Structural Complexity
- `num_rings`, `num_aromatic_rings`: Ring counts
- `num_rotatable_bonds`: Count of rotatable (flexible) bonds
- `num_heteroatoms`: Count of atoms other than carbon and hydrogen

#### 5. Topological Descriptors
- `chi0v`-`chi4v`: Chi connectivity indices
- `kappa1`-`kappa3`: Kappa shape indices
- `hall_kier_alpha`: Hall-Kier alpha value

#### 6. Drug Chemistry
- `qed`: Quantitative Estimate of Drug-likeness (0-1)
- `lipinski_rule_of_5_violations`: Lipinski violations (0-4)
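
As a cross-check on the 0-4 range, a violation count can be recomputed locally from a merged profile. This is a minimal sketch using the standard Rule of Five thresholds; the document does not state how the server computes `lipinski_rule_of_5_violations`, so agreement with the returned value is an assumption. The example profile is hypothetical.

```python
def count_lipinski_violations(profile):
    """Count Rule of Five violations (0-4) from descriptor values."""
    checks = [
        profile["molecular_weight"] > 500,  # molecular weight above 500 Da
        profile["logp"] > 5,                # logP above 5
        profile["num_h_donors"] > 5,        # more than 5 H-bond donors
        profile["num_h_acceptors"] > 10,    # more than 10 H-bond acceptors
    ]
    return sum(checks)


# Hypothetical profile, not a real tool result.
example = {
    "molecular_weight": 612.7,
    "logp": 5.8,
    "num_h_donors": 2,
    "num_h_acceptors": 7,
}
print(count_lipinski_violations(example))
```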

#### 7. Charge Properties
- `min/max/avg_gasteiger_charge`: Gasteiger partial charges
- `gasteiger_charge_range`: Charge distribution range

#### 8. Complexity Metrics
- `molecular_complexity`: Bertz complexity index
- `aromatic_proportion`: Fraction of aromatic atoms
- `asphericity`: 3D shape asphericity

### Input/Output

**Input:**
- `smiles_list`: List of SMILES strings

**Output:**
- List of dictionaries, each containing 50+ molecular descriptors for one molecule
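
Because the output is keyed by SMILES, it can be useful to confirm that every input molecule came back with a profile. The helper below is a hypothetical sketch (`split_by_coverage` is not part of the skill) and assumes the tools echo each input SMILES unchanged, which the document does not confirm.

```python
def split_by_coverage(smiles_list, profiles):
    """Return (profiles in input order, input SMILES with no profile)."""
    by_smiles = {p["smiles"]: p for p in profiles}
    found = [by_smiles[s] for s in smiles_list if s in by_smiles]
    missing = [s for s in smiles_list if s not in by_smiles]
    return found, missing


# Placeholder data for illustration only.
inputs = ["CCO", "c1ccccc1", "CC(=O)O"]
returned = [
    {"smiles": "c1ccccc1", "num_rings": 1},
    {"smiles": "CCO", "num_rings": 0},
]

found, missing = split_by_coverage(inputs, returned)
print([p["smiles"] for p in found])
print(missing)
```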

### Applications

- **QSAR Modeling**: Use descriptors as features for predictive models
- **Drug Discovery**: Screen compounds by drug-likeness and physicochemical properties
- **Chemical Space Analysis**: Visualize and cluster molecules by properties
- **Lead Optimization**: Track property changes during optimization
- **Virtual Screening**: Filter libraries by desired property ranges
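
For QSAR modeling, the merged profiles usually need to be turned into a numeric feature matrix. The sketch below does this in plain Python; `build_feature_matrix` is a hypothetical helper, the feature names are taken from the descriptor lists above, and the values are placeholders. Missing descriptors are filled with NaN so that a downstream step can decide how to handle them.

```python
import math


def build_feature_matrix(profiles, feature_names):
    """Return one row of floats per profile, in the given feature order."""
    rows = []
    for p in profiles:
        rows.append([float(p.get(name, math.nan)) for name in feature_names])
    return rows


# Placeholder profiles; the second one lacks a TPSA value on purpose.
profiles = [
    {"smiles": "CCO", "logp": -0.1, "tpsa": 20.2},
    {"smiles": "c1ccccc1", "logp": 1.7},
]
features = ["logp", "tpsa"]

X = build_feature_matrix(profiles, features)
for smiles, row in zip((p["smiles"] for p in profiles), X):
    print(smiles, row)
```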

### Property Filters for Drug-likeness

Typical ranges for oral drug candidates:
- Molecular Weight: 150-500 Da
- LogP: 0-5
- H-Bond Donors: ≤ 5
- H-Bond Acceptors: ≤ 10
- TPSA: 20-140 Ų
- Rotatable Bonds: ≤ 10
- QED Score: > 0.5
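
The ranges above can be applied to merged profiles as a simple filter. This is a minimal sketch: `failed_filters` is a hypothetical helper, the keys are the descriptor names listed earlier, and the two profiles are made-up values rather than tool output.

```python
# (lower bound, upper bound); None means the side is unbounded.
DRUGLIKE_RANGES = {
    "molecular_weight": (150, 500),
    "logp": (0, 5),
    "num_h_donors": (None, 5),
    "num_h_acceptors": (None, 10),
    "tpsa": (20, 140),
    "num_rotatable_bonds": (None, 10),
}
QED_MIN = 0.5  # QED must be strictly greater than this


def failed_filters(profile):
    """Return the names of descriptors that fall outside the typical ranges."""
    failed = []
    for key, (low, high) in DRUGLIKE_RANGES.items():
        value = profile[key]
        if (low is not None and value < low) or (high is not None and value > high):
            failed.append(key)
    if profile["qed"] <= QED_MIN:
        failed.append("qed")
    return failed


# Hypothetical profiles for illustration.
candidate = {"molecular_weight": 320.4, "logp": 2.1, "num_h_donors": 2,
             "num_h_acceptors": 5, "tpsa": 78.3, "num_rotatable_bonds": 4,
             "qed": 0.71}
outlier = dict(candidate, molecular_weight=612.0, qed=0.3)

print(failed_filters(candidate))
print(failed_filters(outlier))
```

Returning the list of failed descriptors, rather than a single pass/fail flag, makes it easier to see why a compound was rejected and to relax individual criteria during lead optimization.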
