Data-driven strategies for organic structure-property and structure-reactivity relationships
Date
2024
Abstract
Predicting molecular properties plays a pivotal role in domains ranging from drug discovery to materials science. With the advent of machine learning (ML) techniques in cheminformatics, property prediction for small organic molecules has advanced significantly. This dissertation examines the machine-learning strategies used to predict properties that are crucial for understanding molecular behavior. In Chapter 1, I trace the evolution of data-driven modeling through Quantitative Structure-Property Relationships (QSPR), highlighting advances in using chemical features to build predictive models of molecular properties. In Chapter 2, I turn to the first stage of modeling, data collection for predictive tasks. I illustrate how automation, combined with advances in computational tools, can be used to construct modular workflows for FAIR (Findable, Accessible, Interoperable, and Reusable) chemistry, an approach that aims to improve the usability and reproducibility of scientific data. In Chapter 3, I focus on using computational tools to access high-level data for small organic molecules. I present a novel metric for assessing organic radical stability, built on a comprehensive chemical database of radicals and two straightforward physical organic descriptors, fractional spin and buried volume, computed through systematic computational workflows. In Chapter 4, I explore the development of graph-based models for predicting molecular properties, specifically bond dissociation energy. I also examine in detail two applications relevant to pharmaceutical and atmospheric chemistry, and I demonstrate that using a small number of molecules from the relevant chemical space can notably improve large-scale machine-learning models.
Finally, in Chapter 5, I combine the tools developed in Chapters 3 and 4 to perform goal-directed molecular optimization, identifying novel radicals for aqueous redox flow batteries using graph neural networks (for radical stability, redox potentials, and bond dissociation energy) and reinforcement learning. This de novo molecular optimization strategy identified 32 new radical candidates. By bringing together insights from these studies, this dissertation aims to offer a comprehensive view of how machine-learning strategies are transforming molecular property prediction.
Rights Access
Embargo expires: 12/20/2025.
Subject
computational chemistry
cheminformatics
machine learning