PapayaDB is a curated computational resource designed to organize structural models, molecular simulations and interaction descriptors for cathepsin–glycosaminoglycan complexes.
Cysteine cathepsins are papain-like proteases involved in protein degradation, extracellular matrix remodeling and regulation of proteolytic activity. Their interactions with glycosaminoglycans (GAGs) can influence localization, stability, substrate recognition and enzymatic function.
As cathepsins and GAGs are both involved in extracellular matrix turnover, inflammation and tissue remodeling, their complexes are relevant to pathological processes such as bone resorption disorders, cancer progression, inflammatory diseases and connective tissue remodeling.
Despite the biological relevance of cathepsin–GAG interactions, the number of experimentally resolved structures of these complexes remains limited. GAGs are periodic, flexible, chemically heterogeneous and highly charged molecules, which makes their binding modes difficult to capture using experimental structural biology alone.
PapayaDB was created to address this gap by providing a systematic in silico dataset of cathepsin–GAG complexes generated with molecular docking and molecular dynamics simulations.
Each PapayaDB record is organized by cathepsin, GAG class, oligosaccharide length and simulation method. Depending on dataset availability, records may include structures, trajectories, plots, contact maps, energy estimates, hydrogen-bond analyses and downloadable files.
Structures of cathepsins, GAG oligosaccharides and predicted complexes.
Binding poses and candidate interaction regions.
Trajectories of cathepsin–GAG complexes showing their behaviour over time.
Quantitative summary in terms of RMSD, contact maps, hydrogen bonds, MM-GBSA and LIE estimates.
Cathepsin name, GAG class, chain length, simulation method, identifiers and available files.
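The record organization described above can be pictured as a small data structure. The sketch below is illustrative only: the class name `ComplexRecord`, its field names, the helper `filter_records` and all example values are assumptions made for this example, not the actual PapayaDB schema or real database entries.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ComplexRecord:
    """Hypothetical record layout; not the actual PapayaDB schema."""
    cathepsin: str        # protein name (placeholder values below)
    gag_class: str        # GAG class, e.g. "Heparin"
    chain_length: int     # degree of polymerization (dp)
    method: str           # "all-atom", "RS-REMD" or "coarse-grained"
    files: tuple = ()     # names of available files, if any


def filter_records(records, **criteria):
    """Return the records whose fields match every keyword criterion."""
    return [
        r for r in records
        if all(getattr(r, key) == value for key, value in criteria.items())
    ]


# Placeholder records for illustration only.
records = [
    ComplexRecord("Cathepsin K", "Heparin", 6, "all-atom", ("complex.pdb",)),
    ComplexRecord("Cathepsin K", "Heparin", 16, "RS-REMD"),
    ComplexRecord("Cathepsin B", "Heparin", 6, "coarse-grained"),
]

hits = filter_records(records, gag_class="Heparin", chain_length=6)
for r in hits:
    print(r.cathepsin, r.method)
```

Keeping the four organizing keys as plain fields means the same filter works for any combination of cathepsin, GAG class, chain length and method.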
PapayaDB combines several simulation strategies because no single representation captures every relevant feature of cathepsin–GAG recognition. All-atom simulations provide detailed local interactions, RS-REMD improves sampling for longer GAG chains, and coarse-grained simulations enable extended exploration of selected systems.
All-atom molecular dynamics: represents every atom of the protein, GAG and solvent environment. In PapayaDB, all-atom simulations are used to describe short oligosaccharide complexes in high structural detail.
RS-REMD (repulsive scaling replica exchange molecular dynamics): used to improve conformational sampling and explore broader structural variability, especially for longer GAG chains.
Coarse-grained simulations: reduce molecular detail to enable longer-timescale simulations and broader exploration of interaction patterns.
The all-atom dataset was generated through a multistage workflow combining electrostatic potential analysis, molecular docking, DBSCAN clustering, representative pose selection and molecular dynamics simulations.
Important note: This workflow refers to the all-atom molecular dynamics dataset.
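The clustering and representative-selection steps of this workflow can be sketched as follows. This is a minimal, self-contained illustration, not the PapayaDB implementation: the `eps` and `min_samples` values, the one-dimensional toy "positions" standing in for a structural distance between poses, the scores, and the rule of ranking clusters by their best-scoring member are all assumptions made for the example.

```python
def dbscan(dist, eps, min_samples):
    """Minimal DBSCAN on a precomputed distance matrix.

    Returns one label per point; -1 marks noise.
    """
    n = len(dist)
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        neighbors = [j for j in range(n) if dist[i][j] <= eps]
        if len(neighbors) < min_samples:
            labels[i] = -1          # provisionally noise
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise point becomes a border point
            if labels[j] is not None:
                continue             # already assigned; do not expand again
            labels[j] = cluster
            j_neighbors = [k for k in range(n) if dist[j][k] <= eps]
            if len(j_neighbors) >= min_samples:
                queue.extend(j_neighbors)  # j is a core point: keep growing
    return labels


def pick_representatives(labels, scores, max_clusters=3):
    """Best-scoring pose of each cluster, for up to `max_clusters` clusters.

    Lower score means more favorable; noise poses are ignored.
    """
    best = {}
    for idx, (label, score) in enumerate(zip(labels, scores)):
        if label == -1:
            continue
        if label not in best or score < scores[best[label]]:
            best[label] = idx
    return sorted(best.values(), key=lambda i: scores[i])[:max_clusters]


# Toy data (hypothetical): 1D positions and docking scores for seven poses.
positions = [0.0, 0.5, 1.0, 10.0, 10.4, 10.9, 50.0]
scores = [-5.0, -7.2, -6.1, -8.0, -6.5, -7.9, -9.5]
dist = [[abs(a - b) for b in positions] for a in positions]

labels = dbscan(dist, eps=1.0, min_samples=3)
representatives = pick_representatives(labels, scores)
print(labels, representatives)
```

In this toy case the isolated seventh pose has the most favorable score but is classified as noise, so it is not chosen as a representative. This illustrates why clustering before selection favors reproducible binding regions over single outlier poses.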
The datasets in PapayaDB were generated using method-specific computational protocols. Parameters are shown to make the records easier to interpret, compare and reuse.
| Docking software: | AutoDock 3.05 |
| Docking grid: | 126 Å × 126 Å × 126 Å, 0.375 Å step |
| Pose selection: | Top 50 docking poses |
| Clustering: | DBSCAN |
| Representative selection: | Up to three representative binding poses selected from the most favorable scoring clusters |
| MD engine: | AMBER 20 |
| Solvent: | TIP3P water model |
| Force field: | ff14SB for protein, GLYCAM06j for GAG |
| Production run: | 50 ns |
| Post-processing: | RMSD, contact maps, hydrogen bonds, MM-GBSA, LIE and per-residue energy decomposition |
| Chain length: | dp16 |
| Coverage: | Each cathepsin–GAG pair |
| Purpose: | Enhanced conformational sampling |
| Protocol details: | Detailed RS-REMD parameters will be provided with the corresponding dataset records. |
| Chain length: | dp6, dp16 |
| GAG class: | Heparin |
| Representation: | SUGRES-compatible heparin representation |
| Purpose: | Extended-timescale exploration |
| Protocol details: | Detailed coarse-grained simulation parameters will be provided with the corresponding dataset records. |
PapayaDB records include simulation-derived descriptors that summarize structural stability, interaction persistence and approximate energetic properties of cathepsin–GAG complexes. These descriptors are intended to support comparative interpretation across cathepsins, GAG classes, chain lengths and simulation methods.
| Descriptor | What it describes | How to interpret it |
|---|---|---|
| Molecular dynamics trajectory | Time-dependent evolution of the molecular system during simulation. | Allows inspection of how the cathepsin–GAG complex behaves over time, including movement, flexibility and stability of the binding mode. |
| Protein RMSD | Root mean square deviation of protein atomic positions relative to a reference structure. | Lower and stable RMSD values suggest that the protein structure remains close to the starting or reference conformation. Larger shifts may indicate conformational rearrangement. |
| GAG RMSD | Root mean square deviation of GAG atomic positions relative to a reference structure. | Helps assess GAG mobility and conformational variability during simulation. Higher values may reflect flexible or changing binding modes. |
| MM-GBSA binding free energy | Approximate binding free-energy estimate calculated from minimized molecular dynamics trajectory structures with implicit solvent treatment. | Useful as a qualitative descriptor of complex stability. Values should be interpreted comparatively, not as direct experimental affinities. |
| Per-residue MM-GBSA energy decomposition | Estimated contribution of individual protein residues to the MM-GBSA binding free energy. | Helps identify residues that contribute favorably or unfavorably to GAG binding and may indicate potential GAG-recognition regions on the cathepsin surface. |
| Linear interaction energy (LIE) | Approximate interaction-energy descriptor based on electrostatic and van der Waals interactions extracted directly from the molecular dynamics trajectory. | Provides an additional energetic summary of protein–GAG interaction strength and trends across related systems. |
| Hydrogen bonds | Polar interactions between donor and acceptor atoms, defined using geometric distance and angle criteria. | Persistent hydrogen bonds may indicate specific stabilizing contacts between GAG functional groups and protein residues. |
| Contact maps | Normalized representation of contacts between GAG units and protein residues during simulation. | Values close to 1 indicate contacts maintained for most or all of the simulation. Values close to 0 indicate rare or absent contacts. |
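
Two of the descriptors in the table, RMSD and normalized contact maps, reduce to short calculations. The sketch below is a simplified illustration under stated assumptions: coordinates are taken as already superimposed on the reference (no fitting step is shown), the per-frame distances between protein residues and GAG units are given as input, and the 4.0 Å contact cutoff and all numbers are hypothetical values chosen for the example, not PapayaDB settings.

```python
import math


def rmsd(coords, ref):
    """RMSD between two equally sized lists of (x, y, z) tuples.

    Assumes the structures are already aligned; no superposition is done.
    """
    if len(coords) != len(ref):
        raise ValueError("coordinate sets must have the same length")
    squared = sum(
        (a - b) ** 2
        for point, ref_point in zip(coords, ref)
        for a, b in zip(point, ref_point)
    )
    return math.sqrt(squared / len(coords))


def contact_map(distance_frames, cutoff):
    """Fraction of frames in which each residue/GAG-unit pair is in contact.

    distance_frames[f][i][j] is the distance between protein residue i and
    GAG unit j in frame f. The result has values between 0 and 1.
    """
    n_frames = len(distance_frames)
    n_res = len(distance_frames[0])
    n_units = len(distance_frames[0][0])
    return [
        [
            sum(frame[i][j] <= cutoff for frame in distance_frames) / n_frames
            for j in range(n_units)
        ]
        for i in range(n_res)
    ]


# Toy input (hypothetical): two atoms shifted by 1 Å along z.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
frame = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
deviation = rmsd(frame, reference)

# Toy input (hypothetical): 4 frames, 2 residues, 1 GAG unit, distances in Å.
distance_frames = [
    [[3.0], [9.0]],
    [[3.5], [8.0]],
    [[4.5], [3.9]],
    [[3.2], [7.5]],
]
cmap = contact_map(distance_frames, cutoff=4.0)
print(deviation, cmap)
```

Here the first residue is within the cutoff in three of four frames and the second in one of four, which corresponds to a persistent contact and an occasional contact, respectively.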
Affinity estimators in PapayaDB, based on MM-GBSA and LIE calculations, are intended for qualitative and comparative interpretation. They should not be treated as direct quantitative measurements of experimental binding affinity.
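For orientation, an MM-GBSA estimate is commonly written in the general form below. This is the standard textbook expression, not a statement of the exact PapayaDB protocol. In particular, whether the entropy term $-TS$ is included is not specified here, and it is often omitted in practice.

```latex
\Delta G_{\mathrm{bind}} \approx
  \langle G_{\mathrm{complex}} \rangle
  - \langle G_{\mathrm{protein}} \rangle
  - \langle G_{\mathrm{GAG}} \rangle,
\qquad
G = E_{\mathrm{MM}} + G_{\mathrm{GB}} + G_{\mathrm{SA}} - T S
```

Here $E_{\mathrm{MM}}$ is the molecular mechanics energy, $G_{\mathrm{GB}}$ the generalized Born polar solvation term, $G_{\mathrm{SA}}$ the nonpolar surface-area term, and angle brackets denote averages over trajectory frames. Each term is an approximation, which is why the resulting values are better suited to comparison across related systems than to reading as absolute affinities.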
PapayaDB integrates multiple simulation datasets to support comparison across protein family members, GAG classes, chain lengths and simulation resolutions.
The database coverage numbers describe the total number of simulations and complexes generated within the PapayaDB project. Not all generated records have been fully entered into the public web interface yet. Record upload and validation are ongoing, and the complete dataset is expected to be available by autumn 2026.
cathepsin complexes with dp2, dp4 and dp6 GAGs.
cathepsin complexes with dp16 GAGs.
cathepsin complexes with dp6 and dp16 heparin.
PapayaDB is organized to support findability, accessibility, interoperability and reuse of cathepsin–GAG simulation data.
Records are organized by cathepsin, GAG class, chain length, simulation method and identifiers.
Available structures, plots, descriptors and downloadable files are grouped at the record level for direct inspection and reuse.
Where available, records reference external identifiers and standards such as PDB, UniProt, ChEBI, GlyTouCan and GlycoCT.
Transparent protocols, metadata and descriptor definitions support comparison and reuse across related systems.
PapayaDB data are provided as a research-oriented computational resource for browsing, comparison and interpretation of cathepsin–glycosaminoglycan interaction models. The database includes computationally generated structural models, simulation-derived descriptors, plots, metadata and downloadable files.
The data may be used for academic, educational and non-commercial research purposes, provided that PapayaDB and the relevant associated publication or release are cited. Formal citation information will be provided with the first public release or associated publication.
Users should note that PapayaDB binding affinity records are based on in silico models. They are intended for qualitative and comparative interpretation across related systems and should not be compared directly with experimental binding affinities.
External identifiers and referenced resources, such as PDB, UniProt, ChEBI, GlyTouCan and GlycoCT, remain subject to their own database terms, licenses and citation requirements.
For reuse beyond academic browsing, citation, teaching or non-commercial research, please contact the PapayaDB team.
A curated list of publications related to cathepsin–GAG interactions, molecular docking, molecular dynamics protocols, force fields and FAIR glycomics resources will be provided here.
PapayaDB is currently under active development. The associated manuscript has not yet been submitted. Manuscript submission is planned for October 2026. Until the manuscript and formal citation details are available, please do not cite PapayaDB as a published resource. For questions regarding citation, collaboration or early-stage use of the database, please contact the PapayaDB team at contact@papayadb.org.