New in Communications Chemistry (Nature, 2026) — Our team co-authored a peer-reviewed study that validates what we’ve always believed at Molecular Forecaster: when you build ML tools with real chemistry knowledge, you get results that actually transfer to the real world.
Better Chemistry In, Better Predictions Out
In drug discovery, pKa isn’t just a number — it’s the gatekeeper for solubility, permeability, metabolic stability, and formulation. Get it wrong by even one unit and your lead compound may never make it past the first ADMET filter. Most ML-based pKa predictors on the market today are trained on molecular fingerprints with little chemical insight baked in. They perform well on molecules that look like the training data, but struggle the moment they encounter a novel scaffold — exactly the scenario that matters most in a real drug discovery campaign.
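To make the "one unit" point concrete, here is a minimal sketch based on the Henderson-Hasselbalch relationship for a monoprotic acid. The compound, its pKa values, and the pH of 7.4 are hypothetical illustration values, not data from the study; the function name is our own.

```python
def ionized_fraction_acid(pka: float, ph: float) -> float:
    """Fraction of a monoprotic acid present as A- at the given pH.

    Follows from Henderson-Hasselbalch: [A-]/[HA] = 10 ** (pH - pKa).
    """
    return 1.0 / (1.0 + 10.0 ** (pka - ph))


PH = 7.4             # assumed physiological pH
true_pka = 7.4       # hypothetical experimental pKa
predicted_pka = 8.4  # hypothetical prediction, one unit too high

true_fraction = ionized_fraction_acid(true_pka, PH)
predicted_fraction = ionized_fraction_acid(predicted_pka, PH)

print(f"Ionized fraction with true pKa:      {true_fraction:.1%}")
print(f"Ionized fraction with predicted pKa: {predicted_fraction:.1%}")
```

Under these assumptions, the one-unit error shifts the ionized-to-neutral ratio by a factor of ten, which is enough to change the estimated ionized fraction from about half of the compound to under a tenth of it.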
This is the gap our collaborators at McGill University helped us close with pKaLearn, a new pKa prediction model. The core idea mirrors the founding philosophy of Molecular Forecaster (MFI): chemistry at our core. Instead of letting an algorithm self-teach from raw data, our team encoded the same fundamental principles that every medicinal chemist relies on (electronegativity, inductive effects, resonance stabilization, and bond polarization) directly into the model’s architecture. The result is a tool that doesn’t just memorize; it understands.
What This Means for Your Pipeline
Head-to-head performance (mean absolute error, MAE, in pKa units; lower is better):
pKaLearn 0.59 · Chemprop 0.62 · MolGpKa 0.68 · Epik 0.79–0.83 · Marvin 0.80–0.86 · AP-DNN 1.80
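For readers less familiar with the metric, the sketch below shows how a mean absolute error is computed from paired experimental and predicted pKa values. The four value pairs are hypothetical toy numbers chosen for illustration; they are not taken from the study or from any of the tools listed above.

```python
from typing import Sequence


def mean_absolute_error(experimental: Sequence[float], predicted: Sequence[float]) -> float:
    """Average of |experimental - predicted| over all compounds."""
    if len(experimental) != len(predicted) or len(experimental) == 0:
        raise ValueError("Inputs must be non-empty and of equal length.")
    total = sum(abs(e - p) for e, p in zip(experimental, predicted))
    return total / len(experimental)


# Hypothetical toy data (pKa units), for illustration only.
experimental_pka = [4.8, 9.3, 7.0, 10.5]
predicted_pka = [5.0, 9.0, 7.5, 10.5]

mae = mean_absolute_error(experimental_pka, predicted_pka)
print(f"MAE = {mae:.2f} pKa units")
```

An MAE of 0.59 therefore means that, averaged over the benchmark compounds, predictions landed roughly 0.6 pKa units from the experimental value.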
Why This Matters
This isn’t just an academic exercise. pKaLearn is a direct reflection of how we build every tool at Molecular Forecaster — by encoding real chemistry into computational workflows, not by throwing data at a black box and hoping for the best. From FITTED’s mechanism-aware covalent docking to IMPACTS’ CYP metabolism predictions to our GNN ADMET models, the throughline is the same: domain expertise is the feature that matters most.
The study also uncovered that a widely used conjugation feature in RDKit — the backbone of many cheminformatics pipelines — is fundamentally flawed for pKa prediction. Fixing that single definition improved accuracy by 6%. These are the kinds of chemistry-level insights that only come from teams who understand both the code and the science.
Publication details
Genzling, J., Luo, Z., Weiser, B. & Moitessier, N. “Development of a pKa predictor (pKaLearn) by leveraging teaching experience to improve machine learning.” Communications Chemistry (2026). DOI: 10.1038/s42004-026-01983-y
Code & data: github.com/MoitessierLab/pKaLearn
Interested in how chemistry-first computational tools can accelerate your pipeline?
Visit us at molecularforecaster.com
