bio-pdb-geometric-analysis
$
npx mdskill add GPTomics/bioSkills/bio-pdb-geometric-analysisPerform geometric analysis on protein structures using Biopython
- Measure distances, angles, dihedrals, RMSD, and solvent accessible surface area
- Uses Biopython Bio.PDB, numpy, and PDB files for structure data
- Analyzes atomic coordinates to calculate geometric properties and structural similarities
- Returns numerical results and structural insights for further analysis
SKILL.md
.github/skills/bio-pdb-geometric-analysisView on GitHub ↗
---
name: bio-pdb-geometric-analysis
description: Perform geometric calculations on protein structures using Biopython Bio.PDB. Use when measuring distances, angles, and dihedrals, superimposing structures, calculating RMSD, or computing solvent accessible surface area (SASA).
tool_type: python
primary_tool: Bio.PDB
---
## Version Compatibility
Reference examples tested with: BioPython 1.83+, numpy 1.26+
Before using code patterns, verify installed versions match. If versions differ:
- Python: `pip show <package>` then `help(module.function)` to check signatures
If code throws ImportError, AttributeError, or TypeError, introspect the installed
package and adapt the example to match the actual API rather than retrying.
# Geometric Analysis
**"Calculate RMSD between two protein structures"** → Measure atomic distances/angles/dihedrals, superimpose structures, compute RMSD, and find inter-residue contacts.
- Python: `Bio.PDB.Superimposer()` for RMSD, `NeighborSearch` for contacts
Measure distances, angles, and dihedrals. Superimpose structures and calculate RMSD. Find neighbor atoms and contacts.
## Required Imports
```python
from Bio.PDB import PDBParser, NeighborSearch, Superimposer
from Bio.PDB import calc_angle, calc_dihedral
import numpy as np
```
## Distance Between Atoms
```python
from Bio.PDB import PDBParser
parser = PDBParser(QUIET=True)
structure = parser.get_structure('protein', 'protein.pdb')
chain = structure[0]['A']
atom1 = chain[100]['CA']
atom2 = chain[200]['CA']
# Direct subtraction gives distance
distance = atom1 - atom2
print(f'Distance: {distance:.2f} Angstroms')
# Or use numpy
import numpy as np
distance = np.linalg.norm(atom1.coord - atom2.coord)
```
## Distance Matrix
```python
import numpy as np
from Bio.PDB import PDBParser
parser = PDBParser(QUIET=True)
structure = parser.get_structure('protein', 'protein.pdb')
ca_atoms = [r['CA'] for r in structure.get_residues() if r.has_id('CA') and r.id[0] == ' ']
n = len(ca_atoms)
dist_matrix = np.zeros((n, n))
for i in range(n):
for j in range(i + 1, n):
dist = ca_atoms[i] - ca_atoms[j]
dist_matrix[i, j] = dist
dist_matrix[j, i] = dist
print(f'Distance matrix shape: {dist_matrix.shape}')
```
## Angle Between Three Atoms
```python
from Bio.PDB import PDBParser, calc_angle
parser = PDBParser(QUIET=True)
structure = parser.get_structure('protein', 'protein.pdb')
residue = structure[0]['A'][100]
n = residue['N']
ca = residue['CA']
c = residue['C']
# calc_angle takes Vector objects (atom.coord returns array, use atom.get_vector())
angle_rad = calc_angle(n.get_vector(), ca.get_vector(), c.get_vector())
angle_deg = np.degrees(angle_rad)
print(f'N-CA-C angle: {angle_deg:.1f} degrees')
```
## Dihedral Angles
```python
from Bio.PDB import PDBParser, calc_dihedral
import numpy as np
parser = PDBParser(QUIET=True)
structure = parser.get_structure('protein', 'protein.pdb')
chain = structure[0]['A']
# Calculate phi angle (C-N-CA-C)
res_prev = chain[99]
res_curr = chain[100]
phi = calc_dihedral(
res_prev['C'].get_vector(),
res_curr['N'].get_vector(),
res_curr['CA'].get_vector(),
res_curr['C'].get_vector()
)
print(f'Phi: {np.degrees(phi):.1f} degrees')
# Calculate psi angle (N-CA-C-N)
res_next = chain[101]
psi = calc_dihedral(
res_curr['N'].get_vector(),
res_curr['CA'].get_vector(),
res_curr['C'].get_vector(),
res_next['N'].get_vector()
)
print(f'Psi: {np.degrees(psi):.1f} degrees')
```
## Ramachandran Angles for All Residues
```python
from Bio.PDB import PDBParser, PPBuilder, calc_dihedral
import numpy as np
parser = PDBParser(QUIET=True)
structure = parser.get_structure('protein', 'protein.pdb')
ppb = PPBuilder()
phi_psi = []
for pp in ppb.build_peptides(structure):
angles = pp.get_phi_psi_list()
for residue, (phi, psi) in zip(pp, angles):
if phi is not None and psi is not None:
phi_psi.append((residue.resname, np.degrees(phi), np.degrees(psi)))
for name, phi, psi in phi_psi[:10]:
print(f'{name}: phi={phi:.1f}, psi={psi:.1f}')
```
## Finding Neighbor Atoms
```python
from Bio.PDB import PDBParser, NeighborSearch
parser = PDBParser(QUIET=True)
structure = parser.get_structure('protein', 'protein.pdb')
# Build search tree from all atoms
all_atoms = list(structure.get_atoms())
ns = NeighborSearch(all_atoms)
# Find atoms within radius of a point
center = structure[0]['A'][100]['CA'].coord
radius = 5.0 # Angstroms
neighbors = ns.search(center, radius)
print(f'Found {len(neighbors)} atoms within {radius}A')
# Find atoms within radius of another atom
ca_atom = structure[0]['A'][100]['CA']
neighbors = ns.search(ca_atom.coord, 5.0)
```
## Finding Residue Contacts
```python
from Bio.PDB import PDBParser, NeighborSearch
parser = PDBParser(QUIET=True)
structure = parser.get_structure('protein', 'protein.pdb')
all_atoms = list(structure.get_atoms())
ns = NeighborSearch(all_atoms)
# Find all atom pairs within distance
contact_distance = 4.0
contacts = ns.search_all(contact_distance, level='R') # R = residue level
print(f'Found {len(contacts)} residue contacts within {contact_distance}A')
for res1, res2 in contacts[:10]:
print(f' {res1.resname}{res1.id[1]} - {res2.resname}{res2.id[1]}')
```
## Contact Levels
```python
from Bio.PDB import NeighborSearch
# search_all returns pairs at specified level
# Level codes: A=atom, R=residue, C=chain, M=model, S=structure
atom_contacts = ns.search_all(4.0, level='A') # Atom pairs
residue_contacts = ns.search_all(4.0, level='R') # Residue pairs
chain_contacts = ns.search_all(10.0, level='C') # Chain pairs
```
## Superimposing Structures
```python
from Bio.PDB import PDBParser, Superimposer
parser = PDBParser(QUIET=True)
ref_structure = parser.get_structure('ref', 'reference.pdb')
mobile_structure = parser.get_structure('mobile', 'mobile.pdb')
# Get CA atoms from both structures
ref_atoms = [r['CA'] for r in ref_structure.get_residues() if r.has_id('CA') and r.id[0] == ' ']
mobile_atoms = [r['CA'] for r in mobile_structure.get_residues() if r.has_id('CA') and r.id[0] == ' ']
# Ensure same number of atoms
n = min(len(ref_atoms), len(mobile_atoms))
ref_atoms = ref_atoms[:n]
mobile_atoms = mobile_atoms[:n]
# Superimpose
sup = Superimposer()
sup.set_atoms(ref_atoms, mobile_atoms)
print(f'RMSD before: {sup.rms:.2f} Angstroms')
# Apply transformation to all atoms in mobile structure
sup.apply(mobile_structure.get_atoms())
```
## Calculating RMSD
```python
from Bio.PDB import PDBParser, Superimposer
import numpy as np
parser = PDBParser(QUIET=True)
struct1 = parser.get_structure('s1', 'structure1.pdb')
struct2 = parser.get_structure('s2', 'structure2.pdb')
atoms1 = [r['CA'] for r in struct1.get_residues() if r.has_id('CA') and r.id[0] == ' ']
atoms2 = [r['CA'] for r in struct2.get_residues() if r.has_id('CA') and r.id[0] == ' ']
# Using Superimposer (with alignment)
sup = Superimposer()
sup.set_atoms(atoms1, atoms2)
rmsd_aligned = sup.rms
print(f'RMSD (aligned): {rmsd_aligned:.2f} A')
# Raw RMSD (no alignment)
coords1 = np.array([a.coord for a in atoms1])
coords2 = np.array([a.coord for a in atoms2])
rmsd_raw = np.sqrt(np.mean(np.sum((coords1 - coords2) ** 2, axis=1)))
print(f'RMSD (raw): {rmsd_raw:.2f} A')
```
## CEAligner for Dissimilar Structures
```python
from Bio.PDB import PDBParser, CEAligner
parser = PDBParser(QUIET=True)
ref = parser.get_structure('ref', 'reference.pdb')
mobile = parser.get_structure('mobile', 'query.pdb')
# CE alignment works for structures with different sequences
aligner = CEAligner()
aligner.set_reference(ref)
aligner.align(mobile)
print(f'RMSD: {aligner.rms:.2f} A')
# CEAligner automatically modifies mobile structure coordinates
```
## Center of Mass
```python
from Bio.PDB import PDBParser
import numpy as np
parser = PDBParser(QUIET=True)
structure = parser.get_structure('protein', 'protein.pdb')
# Unweighted center (geometric center)
coords = np.array([a.coord for a in structure.get_atoms()])
center = coords.mean(axis=0)
print(f'Geometric center: {center}')
# Mass-weighted center (approximate - uses atom count)
# For accurate mass-weighted, use element masses
atoms = list(structure.get_atoms())
masses = {'C': 12.0, 'N': 14.0, 'O': 16.0, 'S': 32.0, 'H': 1.0}
total_mass = 0
weighted_sum = np.zeros(3)
for atom in atoms:
mass = masses.get(atom.element, 12.0)
weighted_sum += mass * atom.coord
total_mass += mass
center_of_mass = weighted_sum / total_mass
print(f'Center of mass: {center_of_mass}')
```
## Radius of Gyration
```python
import numpy as np
from Bio.PDB import PDBParser
parser = PDBParser(QUIET=True)
structure = parser.get_structure('protein', 'protein.pdb')
coords = np.array([a.coord for a in structure.get_atoms()])
center = coords.mean(axis=0)
# Radius of gyration
rg = np.sqrt(np.mean(np.sum((coords - center) ** 2, axis=1)))
print(f'Radius of gyration: {rg:.2f} A')
```
## Finding Surface Residues
```python
from Bio.PDB import PDBParser, NeighborSearch
parser = PDBParser(QUIET=True)
structure = parser.get_structure('protein', 'protein.pdb')
# Simple approach: residues with few neighbors
all_atoms = list(structure.get_atoms())
ns = NeighborSearch(all_atoms)
surface_residues = []
for residue in structure.get_residues():
if residue.id[0] != ' ':
continue
if not residue.has_id('CA'):
continue
ca = residue['CA']
neighbors = ns.search(ca.coord, 10.0, level='R')
if len(neighbors) < 15: # Threshold for surface
surface_residues.append(residue)
print(f'Surface residues: {len(surface_residues)}')
```
## Vector Operations
```python
from Bio.PDB import PDBParser
from Bio.PDB.vectors import Vector, rotaxis
parser = PDBParser(QUIET=True)
structure = parser.get_structure('protein', 'protein.pdb')
# Get vectors from atoms
atom1 = structure[0]['A'][100]['CA']
atom2 = structure[0]['A'][101]['CA']
v1 = atom1.get_vector()
v2 = atom2.get_vector()
# Vector operations
diff = v2 - v1
print(f'Distance vector: {diff}')
print(f'Length: {diff.norm():.2f}')
# Normalize
unit = diff.normalized()
# Cross product
cross = v1 ** v2
# Dot product
dot = v1 * v2
```
## Chi Angles (Sidechain Dihedrals)
```python
from Bio.PDB import PDBParser, calc_dihedral
import numpy as np
parser = PDBParser(QUIET=True)
structure = parser.get_structure('protein', 'protein.pdb')
# Chi1 angle for a residue (N-CA-CB-CG)
residue = structure[0]['A'][100]
if residue.has_id('CB') and residue.has_id('CG'):
chi1 = calc_dihedral(
residue['N'].get_vector(),
residue['CA'].get_vector(),
residue['CB'].get_vector(),
residue['CG'].get_vector()
)
print(f'Chi1: {np.degrees(chi1):.1f} degrees')
```
## Solvent Accessible Surface Area (SASA)
```python
from Bio.PDB import PDBParser
from Bio.PDB.SASA import ShrakeRupley
parser = PDBParser(QUIET=True)
structure = parser.get_structure('protein', 'protein.pdb')
# Calculate SASA using Shrake-Rupley algorithm
sr = ShrakeRupley()
sr.compute(structure, level='S') # S=structure, M=model, C=chain, R=residue, A=atom
# Access total SASA
print(f'Total SASA: {structure.sasa:.2f} A^2')
# Access per-residue SASA
for residue in structure.get_residues():
if hasattr(residue, 'sasa'):
print(f'{residue.resname}{residue.id[1]}: {residue.sasa:.2f} A^2')
```
## SASA with Custom Parameters
```python
from Bio.PDB import PDBParser
from Bio.PDB.SASA import ShrakeRupley
parser = PDBParser(QUIET=True)
structure = parser.get_structure('protein', 'protein.pdb')
# Higher precision (more points = slower but more accurate)
sr = ShrakeRupley(probe_radius=1.4, n_points=960)
sr.compute(structure, level='R')
# Per-atom SASA
for atom in structure.get_atoms():
print(f'{atom.name}: {atom.sasa:.2f} A^2')
```
## Identifying Buried vs Exposed Residues
```python
from Bio.PDB import PDBParser
from Bio.PDB.SASA import ShrakeRupley
parser = PDBParser(QUIET=True)
structure = parser.get_structure('protein', 'protein.pdb')
sr = ShrakeRupley()
sr.compute(structure, level='R')
buried = []
exposed = []
for residue in structure.get_residues():
if residue.id[0] != ' ':
continue
if hasattr(residue, 'sasa'):
if residue.sasa < 10.0: # Threshold in A^2
buried.append(residue)
else:
exposed.append(residue)
print(f'Buried: {len(buried)}, Exposed: {len(exposed)}')
```
## Related Skills
- structure-io - Parse and write structure files
- structure-navigation - Access chains, residues, atoms
- structure-modification - Transform coordinates, modify structures
- alignment/pairwise-alignment - Sequence alignment for structure comparison
More from GPTomics/bioSkills
- bio-admet-predictionPredicts ADMET properties using ADMETlab 3.0 API or DeepChem models. Estimates bioavailability, CYP inhibition, hERG liability, and 119 toxicity endpoints with uncertainty quantification. Filters for PAINS and other structural alerts. Use when filtering compounds for drug-likeness or prioritizing leads by predicted safety.
- bio-alignment-amplicon-clippingTrim PCR primers from aligned reads in amplicon-panel BAMs using samtools ampliconclip. Use when processing SARS-CoV-2 ARTIC, hereditary cancer panels, ctDNA hot-spot panels, or any amplicon assay where primer-derived bases would falsely confirm reference at primer footprints.
- bio-alignment-filteringFilter alignments by flags, mapping quality, and regions using samtools view and pysam. Use when extracting specific reads, removing low-quality alignments, or subsetting to target regions.
- bio-alignment-indexingCreate and use BAI/CSI indices for BAM/CRAM files using samtools and pysam. Use when enabling random access to alignment files or fetching specific genomic regions.
- bio-alignment-ioRead, write, and convert multiple sequence alignment files using Biopython Bio.AlignIO. Supports Clustal, PHYLIP, Stockholm, FASTA, Nexus, and other alignment formats for phylogenetics and conservation analysis. Use when reading, writing, or converting alignment file formats.
- bio-alignment-msa-parsingParse and analyze multiple sequence alignments using Biopython. Extract sequences, identify conserved regions, analyze gaps, work with annotations, and manipulate alignment data for downstream analysis. Use when parsing or manipulating multiple sequence alignments.
- bio-alignment-msa-statisticsCalculate alignment statistics including sequence identity, conservation scores, substitution matrices, and similarity metrics. Use when comparing alignment quality, measuring sequence divergence, and analyzing evolutionary patterns.
- bio-alignment-multiplePerform multiple sequence alignment using MAFFT, MUSCLE5, ClustalOmega, or T-Coffee. Guides tool and algorithm selection based on dataset size, sequence divergence, and downstream application. Use when aligning three or more homologous sequences for phylogenetics, conservation analysis, or evolutionary studies.
- bio-alignment-pairwisePerform pairwise sequence alignment using Biopython Bio.Align.PairwiseAligner. Use when comparing two sequences, finding optimal alignments, scoring similarity, and identifying local or global matches between DNA, RNA, or protein sequences.
- bio-alignment-sortingSort alignment files by coordinate or read name using samtools and pysam. Use when preparing BAM files for indexing, variant calling, or paired-end analysis.