molecular-format-conversion

$npx mdskill add InternScience/scp/molecular-format-conversion

Use the same `ChemicalToolsClient` class as defined in the molecular-properties-calculation skill.

SKILL.md

.github/skills/molecular-format-conversionView on GitHub ↗
---
name: molecular-format-conversion
description: Convert between molecular formats including SMILES, InChI, InChIKey, and SELFIES for cheminformatics applications.
license: MIT license
metadata:
    skill-author: PJLab
---

# Molecular Format Conversion

## Usage

### 1. MCP Server Definition

Use the same `ChemicalToolsClient` class as defined in the molecular-properties-calculation skill.

### 2. Molecular Format Conversion Workflow

This workflow converts molecules between different chemical formats (SMILES, InChI, SELFIES) for database searching, machine learning, and cheminformatics applications.

**Workflow Steps:**

1. **SMILES to InChI** - Convert SMILES to International Chemical Identifier
2. **InChI to SMILES** - Convert InChI back to SMILES
3. **SMILES to SELFIES** - Convert to string-based molecular representation

**Implementation:**

```python
## Initialize client
HEADERS = {"SCP-HUB-API-KEY": "<your-api-key>"}

client = ChemicalToolsClient(
    "https://scp.intern-ai.org.cn/api/v1/mcp/31/SciToolAgent-Chem",
    HEADERS
)

if not await client.connect():
    print("connection failed")
    exit()

print("=== Molecular Format Conversion ===\n")

## Test molecules
smiles = "CCO"  # Ethanol
inchi = "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"

## Step 1: SMILES to InChI
print("Step 1: SMILES to InChI")
result = await client.client.call_tool(
    "SMILESToInChI",
    arguments={"smiles": smiles}
)
result_data = client.parse_result(result)
print(f"SMILES: {smiles}")
print(f"{result_data}\n")

## Step 2: InChI to SMILES
print("Step 2: InChI to SMILES")
result = await client.client.call_tool(
    "InChIToSMILES",
    arguments={"inchi": inchi}
)
result_data = client.parse_result(result)
print(f"InChI: {inchi}")
print(f"{result_data}\n")

## Step 3: SMILES to SELFIES
print("Step 3: SMILES to SELFIES")
result = await client.client.call_tool(
    "SMILEStoSELFIES",
    arguments={"smiles": smiles}
)
result_data = client.parse_result(result)
print(f"SMILES: {smiles}")
print(f"{result_data}\n")

## Step 4: SELFIES to SMILES
print("Step 4: SELFIES to SMILES")
selfies = "[C][C][O]"
result = await client.client.call_tool(
    "SELFIEStoSMILES",
    arguments={"selfies": selfies}
)
result_data = client.parse_result(result)
print(f"SELFIES: {selfies}")
print(f"{result_data}\n")

await client.disconnect()
```

### Tool Descriptions

**SciToolAgent-Chem Server:**
- `SMILESToInChI`: Convert SMILES to InChI
  - Args: `smiles` (str)
  - Returns: InChI string

- `InChIToSMILES`: Convert InChI to SMILES
  - Args: `inchi` (str)
  - Returns: SMILES string

- `SMILEStoSELFIES`: Convert SMILES to SELFIES
  - Args: `smiles` (str)
  - Returns: SELFIES string

- `SELFIEStoSMILES`: Convert SELFIES to SMILES
  - Args: `selfies` (str)
  - Returns: SMILES string

- `InChIToInChIKey`: Convert InChI to InChIKey
  - Args: `inchi` (str)
  - Returns: InChIKey (27-character hash)

- `InChIKeyToInChI`: Convert InChIKey to InChI
  - Args: `inchikey` (str)
  - Returns: InChI string

### Input/Output

**Input:**
- **SMILES**: Simplified Molecular Input Line Entry System (e.g., "CCO")
- **InChI**: IUPAC International Chemical Identifier (e.g., "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3")
- **InChIKey**: Hashed InChI (27 characters, e.g., "LFQSCWFLJHTTHZ-UHFFFAOYSA-N")
- **SELFIES**: Self-Referencing Embedded Strings (e.g., "[C][C][O]")

**Output:**
- Converted molecular representation in target format

### Format Comparison

| Format | Canonical | Human-Readable | Database-Friendly | ML-Friendly |
|--------|-----------|----------------|-------------------|-------------|
| SMILES | Partial | High | Medium | Medium |
| InChI | Yes | Low | High | Low |
| InChIKey | Yes | No | Very High | No |
| SELFIES | Yes | Low | Low | Very High |

### Use Cases

- **SMILES**: Standard format for most cheminformatics tools
- **InChI**: Canonical representation for database searching
- **InChIKey**: Fast database lookups and duplicate detection
- **SELFIES**: Machine learning models (guarantees valid molecules)

### Format Details

**SMILES (Simplified Molecular Input Line Entry System)**
- Pros: Human-readable, widely supported
- Cons: Not canonical (multiple SMILES for same molecule)
- Example: "CCO", "c1ccccc1", "CC(=O)O"

**InChI (International Chemical Identifier)**
- Pros: Canonical, includes stereochemistry and isotopes
- Cons: Long, not human-readable
- Example: "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"

**InChIKey**
- Pros: Fixed-length hash, fast comparison
- Cons: Cannot reconstruct molecule from key
- Example: "LFQSCWFLJHTTHZ-UHFFFAOYSA-N"
- Structure: 14 characters (connectivity) + separator + 8 characters (stereo) + separator + version

**SELFIES (Self-Referencing Embedded Strings)**
- Pros: Always generates valid molecules, ideal for ML
- Cons: Less human-readable, newer format
- Example: "[C][C][O]"
- Used in: Generative models, molecular optimization

### Additional Conversion Tools

- `ConvertSmilesToInchi`: Alternative SMILES to InChI converter
- `GenerateMolKeyFromSmiles`: Generate molecular key
- `InChIKeyToMOL`: Convert InChIKey to MOL file
- `IsValidInChIKey`: Validate InChIKey format

### Error Handling

Some conversions may fail for:
- Invalid input formats
- Unsupported chemical features
- Very large or complex molecules

Always check for errors in the returned data.

More from InternScience/scp

SkillDescription
admet_druglikeness_reportADMET & Drug-Likeness Report - Generate comprehensive ADMET and drug-likeness report: molecular properties, H-bond analysis, hydrophobicity, topology, and ADMET prediction. Use this skill for medicinal chemistry tasks involving calculate mol basic info calculate mol hbond calculate mol hydrophobicity calculate mol topology pred molecule admet. Combines 5 tools from 2 SCP server(s).
affinity_maturationAffinity Maturation Pipeline - Affinity maturation: compute binding affinity, predict mutations, compute hydrophilicity, and predict drug-target interaction. Use this skill for antibody engineering tasks involving ComputeAffinityCalculator zero shot sequence prediction ComputeHydrophilicity PredictDrugTargetInteraction. Combines 4 tools from 3 SCP server(s).
alanine_scanning_pipelineAlanine Scanning Mutagenesis Pipeline - Alanine scanning: design scan, compute properties for each mutant, predict interactions, and compare. Use this skill for protein biochemistry tasks involving AlanineScanningDesigner ComputeProtPara PredictDrugTargetInteraction calculate protein sequence properties. Combines 4 tools from 3 SCP server(s).
aliphatic_ring_analysisRing System Analysis - Analyze ring systems: count aliphatic carbocycles, analyze aromaticity, compute topology, and structure complexity. Use this skill for organic chemistry tasks involving GetAliphaticCarbocyclesNum AromaticityAnalyzer calculate mol topology calculate mol structure complexity. Combines 4 tools from 3 SCP server(s).
alphafold_structure_pipelineAlphaFold Structure Analysis Pipeline - AlphaFold pipeline: download predicted structure, predict pockets, extract sequence, and compute properties. Use this skill for computational biology tasks involving download alphafold structure run fpocket extract pdb sequence calculate pdb basic info. Combines 4 tools from 3 SCP server(s).
antibody_drug_developmentAntibody Drug Development - Develop antibody drug: target protein analysis, biotherapeutic lookup, protein properties, and interaction prediction. Use this skill for biologics tasks involving get uniprotkb entry by accession get biotherapeutic by name ComputeProtPara ComputeHydrophilicity. Combines 4 tools from 3 SCP server(s).
antibody_target_analysisAntibody-Target Analysis - Analyze an antibody target: UniProt protein info, InterPro domains, protein properties, and biotherapeutic data from ChEMBL. Use this skill for immunology tasks involving get uniprotkb entry by accession query interpro ComputeProtPara get biotherapeutic by name. Combines 4 tools from 4 SCP server(s).
atc_drug_classificationATC Drug Classification Lookup - Look up drug in ATC classification: ChEMBL ATC class, FDA drug info, PubChem compound, and mechanism of action. Use this skill for pharmacology tasks involving get atc class by level5 get mechanism of action by drug name get compound by name get drug by name. Combines 4 tools from 3 SCP server(s).
atmospheric-science-calculationsCalculate atmospheric parameters including Coriolis parameter, geostrophic wind, heat index, potential temperature, and dewpoint for meteorology and climate science.
binding_site_characterizationBinding Site Characterization - Characterize binding sites: predict pockets with fpocket and P2Rank, get binding site info from ChEMBL, and visualize. Use this skill for structural biology tasks involving run fpocket pred pocket prank get binding site by id visualize protein. Combines 4 tools from 3 SCP server(s).