Skip to content

FridrichMethod/mdpp

Repository files navigation

mdpp

Molecular Dynamics Pre- & Post-Processing

A Python toolkit for MD simulation workflows — trajectory analysis, cheminformatics, publication-ready plots, system preparation, and GROMACS/OpenFE automation.

Documentation Python 3.12+ License: MIT Ruff Type checked: mypy

Table of Contents

Highlights

  • Trajectory analysis — RMSD, RMSF, delta-RMSF, DCCM, SASA, radius of gyration, hydrogen bonds, native contacts, pairwise distances, DSSP secondary structure
  • Dimensionality reduction — PCA (with projection), TICA, backbone torsion featurization, free energy surfaces
  • Conformational clustering — RMSD distance matrix, GROMOS algorithm
  • Pluggable compute backends — pairwise distances and RMSD matrix ship with mdtraj / numba / torch / jax / cupy backends; all functions default to mdtraj for API consistency and correctness (only mdtraj supports PBC), opt in to numba for 50x+ CPU speedup or the GPU backends for very large trajectories
  • Cheminformatics — molecular descriptors, PAINS filters, fingerprints (Morgan/ECFP), Tanimoto similarity, Butina clustering
  • Publication-ready plots — one-liner matplotlib figures with proper axis labels and units
  • 2D/3D visualization — molecule structure drawings (RDKit), interactive 3D views (py3Dmol, nglview)
  • System preparation — PDB fixing (OpenMM), pKa prediction (PROPKA), ligand parameterization (RDKit), trajectory merge/slice/subsample
  • GROMACS automation — MDP templates plus analysis, runtime, and post-processing helpers in scripts/gromacs/
  • OpenFE automation — RBFE workflow scripts with SLURM array jobs and checkpoint resumption
  • Typed & tested — full type annotations, frozen dataclass results, Google docstrings

Installation

git clone https://github.com/FridrichMethod/mdpp.git
cd mdpp
pip install -e ".[dev]"
Conda environment with OpenMM support
conda create -n mdpp python=3.12 -y && conda activate mdpp
conda install -c conda-forge pdbfixer -y
pip install -e ".[openmm,dev]"
Optional GPU backends (cupy / torch / jax)
pip install -e ".[gpu]"

Enables the backend="cupy", backend="torch", and backend="jax" options on compute_rmsd_matrix, compute_distances, and featurize_ca_distances. These libraries are optional -- the numba and mdtraj backends work without any GPU dependencies.

Quick Start

Load, analyze, and plot in 3 lines

from mdpp.core import load_trajectory
from mdpp.analysis import compute_rmsd
from mdpp.plots import plot_rmsd

traj = load_trajectory("md.xtc", topology_path="topol.gro")
result = compute_rmsd(traj, atom_selection="backbone")
ax = plot_rmsd(result)

Multi-panel figures

import matplotlib.pyplot as plt
from mdpp.analysis import compute_rmsd, compute_rmsf, compute_dccm
from mdpp.plots import plot_rmsd, plot_rmsf, plot_dccm, plot_fes

fig, axes = plt.subplots(2, 2, figsize=(12, 10))
plot_rmsd(rmsd_result, ax=axes[0, 0])
plot_rmsf(rmsf_result, ax=axes[0, 1])
plot_dccm(dccm_result, ax=axes[1, 0])
plot_fes(fes_result, ax=axes[1, 1])
fig.tight_layout()

Parse GROMACS output

from mdpp.core import read_xvg, read_edr

df = read_xvg("rmsd.xvg")         # XVG → pandas DataFrame
df = read_edr("ener.edr")          # binary EDR → pandas DataFrame

Prepare a protein

from mdpp.prep import fix_pdb, strip_solvent, run_propka

fix_pdb("raw.pdb", "fixed.pdb", pH=7.4)    # add missing atoms & hydrogens
dry = strip_solvent(traj, keep_ions=True)    # remove water
pka = run_propka("protein.pdb")             # predict titratable residue pKa values

Cheminformatics

from mdpp.chem import MolSupplier, calc_descs, gen_fp, calc_sim, is_pains

for mol in MolSupplier("compounds.sdf"):
    descs = calc_descs(mol)                 # molecular descriptors (MW, LogP, TPSA, ...)
    fp = gen_fp(mol, fp_type="ecfp4")       # ECFP4 fingerprint
    print(f"PAINS: {is_pains(mol)}")        # structural alert filter

Package Structure

mdpp
├── core         Trajectory I/O · XVG/EDR parsers · atom selection · alignment
├── analysis     RMSD · RMSF · delta-RMSF · DCCM · SASA · Rg · H-bonds · contacts · DSSP · PCA · TICA · FES · clustering
├── chem         Descriptors · PAINS filters · fingerprints · similarity · molecule file I/O
├── plots        Time series · heatmaps · FES contours · scatter · contact maps · 2D/3D molecules
├── prep         PDB fixing · pKa prediction · ligand topology · trajectory merge/slice/subsample
├── constants    Physical constants (gas constant, default temperature)
└── scripts      Repository shell helpers for GROMACS, OpenFE, and BrownDye

Scripts

cp scripts/gromacs/mdps/charmm/*.mdp ./sim/
cp scripts/gromacs/mdrun/mdprep.sh ./sim/
cp scripts/gromacs/mdrun/mdrun.sh ./sim/

Shell scripts live in scripts/ and are not installed as part of the Python package.

Category Contents
gromacs/mdps Force-field-specific MDP templates for AMBER and CHARMM workflows
gromacs/analysis RMSD, RMSF, DSSP, H-bonds, energy, Rg, SASA, clustering
gromacs/runtime Job status monitor, restart, extend, export
gromacs/compilation GROMACS build scripts (generic + Sherlock HPC)
gromacs/mdenv Environment setup (Sherlock module loads)
gromacs/data_transfer DTN download scripts (Sherlock)
gromacs/postprocessing Trajectory postprocessing
gromacs/visualization PyMOL movie generation
openfe/quickrun RBFE SLURM submission (quickrun.sh, quickrun.sbatch)
openfe/runtime Status checking (check_status.sh) and periodic monitoring (monitor.sbatch)

Design Philosophy

Every analysis function follows the same pattern:

result = compute_something(traj, *, keyword_args...)   # → frozen dataclass
ax = plot_something(result, *, ax=None)                 # → matplotlib Axes
  • Input: md.Trajectory (from MDTraj) or a feature matrix
  • Output: frozen @dataclass with unit-conversion properties (.time_ns, .rmsd_angstrom, etc.)
  • Plotting: pass the result dataclass directly — labels and units are set automatically

Dependencies

Built on the scientific Python ecosystem:

Library Role
MDTraj Trajectory loading & geometry
MDAnalysis XVG auxiliary reader
panedr GROMACS EDR parsing
scikit-learn PCA, clustering
deeptime TICA
RDKit Cheminformatics & ligand topology
Numba Parallel CPU kernels: RMSD matrix (QCP), pairwise distances, similarity
PyTorch · JAX · CuPy Optional GPU backends (install via [gpu] extra)
PROPKA pKa prediction
BioPython PDB chain extraction
matplotlib Static 2D plotting
py3Dmol Interactive 3D molecule views
nglview Interactive 3D trajectory views

Documentation

Full documentation at mdpp.readthedocs.io.

Build locally:

pip install -e ".[docs]"
mkdocs serve                  # http://127.0.0.1:8000

Contributing

# Lint & format
ruff check src/ tests/ --fix
ruff format src/ tests/

# Type check
mypy src/mdpp/

# Run tests
pytest                               # full suite
pytest -m "not benchmark"            # skip perf benchmarks
pytest -m "not gpu"                  # skip GPU backend tests (CPU-only)
pytest -m "benchmark and not slow"   # fast performance checks

# Full pre-commit suite
pre-commit run --all-files

Custom pytest markers: benchmark (timing), slow (>10s), gpu (cupy/torch/jax).

License

MIT — Zhaoyang Li

About

MD simulation pre- and post-processing

Resources

License

Stars

Watchers

Forks

Packages

 
 
 

Contributors