
Digital molecular representations for reaction prediction and optimization


The properties of molecules can be related to measurable outcomes of chemical reactions, for example, reaction yield, selectivity, or rate. The different ways we represent molecules computationally can provide insight into how a particular molecular property influences reaction outcome. This dissertation discusses different ways a molecule can be digitally represented, and the properties that can be measured from these molecular representations. The first two chapters provide context and survey the current field of digital molecular representations and properties. Chapter three focuses on a specific type of molecular property, the steric properties of molecules, which describe the size and shape that molecules occupy in space. Existing steric parameters have proven useful for describing how the size or steric bulk of a molecule can influence reaction outcome. We address a gap in the literature by introducing the concept of steric proximity. We quantify how near to a reactive site the steric bulk of a molecule lies in two novel steric parameter sets, Sterimol2vec and vol2vec. Chapter four focuses on another class of molecular properties, electronic properties. The partial atomic charge is often used in computational and mechanistic studies to justify reactivity at atomic sites, by providing a conceptual medium for the buildup of electronic charge across a molecule, partitioned into its atoms. This study benchmarks different methods for computing partial atomic charge by comparing them with experimentally tabulated Hammett parameters. We find that both the choice of method and the atomic position at which the charge is measured are important in relating the charge to the reactivity of the system. Many computational studies rely on programming and computational workflows for data collection and analysis.
Chapter five summarizes open-source Python tools resulting from this and additional work relating to the collection and analysis of molecular properties. Three programs are summarized. GoodVibes computes and applies corrections to thermochemistry data (entropy, enthalpy, and Gibbs free energy), while automating tasks for computing and visualizing relative thermochemistry potential energy surfaces. DBSTEP computes the novel steric parameters described in chapter three, along with the existing parameters, Sterimol and percent buried volume. The final program discussed is Py-X Struct, a program designed to query molecules and substructures in X-ray crystal structures from the Cambridge Structural Database, measuring geometric information such as bond distances, angles, and dihedrals between user-specified atoms. The final chapter summarizes results and potential future directions for these projects.
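
As an illustration of the geometric measurements mentioned above (bond distances, angles, and dihedrals between chosen atoms), the following is a minimal sketch using only the Python standard library. It is not taken from Py-X Struct and does not reflect that program's interface; the function names and the example coordinates are hypothetical and chosen only for illustration.

```python
import math

# Hypothetical helper functions for illustration only; these are not
# part of Py-X Struct or any other program named in the abstract.

def _sub(a, b):
    """Component-wise difference a - b of two 3D points."""
    return tuple(x - y for x, y in zip(a, b))

def _dot(a, b):
    """Dot product of two 3D vectors."""
    return sum(x * y for x, y in zip(a, b))

def _cross(a, b):
    """Cross product of two 3D vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def _norm(a):
    """Euclidean length of a 3D vector."""
    return math.sqrt(_dot(a, a))

def distance(p1, p2):
    """Distance between two atoms, in the units of the input coordinates."""
    return _norm(_sub(p1, p2))

def angle(p1, p2, p3):
    """Angle p1-p2-p3 in degrees, with p2 as the central atom."""
    v1, v2 = _sub(p1, p2), _sub(p3, p2)
    cos_t = _dot(v1, v2) / (_norm(v1) * _norm(v2))
    # Clamp to [-1, 1] to guard against floating-point round-off.
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))

def dihedral(p1, p2, p3, p4):
    """Signed dihedral angle p1-p2-p3-p4 in degrees, in (-180, 180]."""
    b1, b2, b3 = _sub(p2, p1), _sub(p3, p2), _sub(p4, p3)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    y = _norm(b2) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))

# Made-up coordinates for four atoms (arbitrary length units).
atoms = [(1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (0.0, 1.0, 1.0)]
print(distance(atoms[0], atoms[1]))
print(angle(atoms[0], atoms[1], atoms[2]))
print(dihedral(*atoms))
```

In practice, the atomic coordinates would come from a crystal structure entry rather than being typed by hand, and a crystallographic workflow would also need to account for unit-cell and symmetry information, which this sketch ignores.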


2023 Summer.
Includes bibliographical references.

Rights Access

Embargo expires: 08/28/2024.


data chemistry
molecular properties
computational chemistry
Python programming
molecular featurization

