Browsing by Author "Paton, Robert S., advisor"
Item (Open Access): Combining mechanistic and statistical models for predicting reaction outcomes in organic synthesis (Colorado State University. Libraries, 2023)
Gallegos, Liliana Cabrera, author; Paton, Robert S., advisor; McNally, Andrew, committee member; Rappé, Anthony, committee member; Hess, Ann, committee member
Computational modeling and machine learning tools have assisted in the fundamental challenge of predicting the "over-the-arrow" optimal reaction conditions that maximize reaction output (e.g., yield and selectivity). The work presented here explores several challenging synthetic reactions as targets for reaction optimization: (i) precise photocatalytic transformations in chemical biology, (ii) new reactivity using organobismuth(V) reagents, (iii) challenging reversible nucleophilic alcohol addition reactions that are influenced by equilibrium, and (iv) a late-stage key reaction step in a total synthesis project. Overall, this dissertation aims to assist in predicting optimal reaction outcomes by understanding and formulating reaction mechanisms from quantum mechanics and statistical methods, while using open-source automated workflows to improve transparency and reproducibility within data-chemistry fields. Chapter 1 provides the background needed to introduce the computational and statistical models that help address the challenges of the optimization process, along with the limitations of each strategy. First, it gives a brief overview of the computational protocols used to generate and understand reaction mechanisms with quantum mechanical methods. Then, a summary of the data-driven approach introduces the statistical methods and metrics that build relationships to chemical reactivity using computer-readable, mechanistically derived molecular descriptions. Chapter 2 tackles the challenge of studying chemical reactivity in large biological systems (e.g., peptides and proteins) with quantum mechanical methods.
First, the precise photocatalytic functionalization reaction at selenocysteine developed by the Payne lab is simulated using a simplified model substrate, followed by a more realistic model that generates the final energy profile. Based on the resulting computational analysis, the utility of this late-stage functionalization reaction is later demonstrated on large polypeptide chains. Chapters 3 and 4 embark on a journey into new bismuth chemistry developed by the Ball group. The bismuth arylation reaction published in Nature shaped the collaborative work discussed here, which ranges from the computational protocols implemented for selectivity problems to the versatile chemical reactivity originating from bismuth(V) reagents. Without accounting for the previously reported but otherwise unexplored DFT integration grid effects, the free energies computed for the organobismuth reactions explored here would have carried significant errors and incorrectly predicted selectivities. With the optimal computational protocols, new reactivity using organobismuth reagents is investigated in Chapter 3 to propose a reaction mechanism for the selective arylation of 2- and 4-pyridones. Chapter 4 describes the mechanistic investigation of the developed palladium-catalyzed cross-coupling reaction, which achieves challenging C-C couplings under mild reaction conditions with the amino-bridged bismacycle reagent. A statistical modeling approach using the automated workflows discussed in Chapter 7 is applied here to predict an optimal reaction design and capture the origin of the reactivity for various coupling substrates and modified organobismuth(V) reagents in the developed Bisma-Stille cross-coupling reaction. Chapter 5 describes a mechanistic investigation to optimize a challenging key reaction in the total synthesis of the natural product allopupukeanane developed by the Sarpong group.
Reaction success becomes critical in late-stage synthetic plans, as the availability of reactants in a multistep natural product synthesis becomes limiting. The elementary step influencing the reactivity is identified in the palladium-mediated cascade reaction. Then, a data-driven approach is implemented to screen various ligands and collect mechanistically derived molecular DFT features to incorporate into a Bayesian optimization tool developed by the Doyle lab. The automated workflows discussed in Chapter 7 were used to collect the features. This approach successfully identified more suitable and efficient reaction conditions that address the challenges of racemic mixtures, byproduct formation, and catalyst decomposition. The overall synthesis plan to access multiple natural products via the bridged bicycle scaffold highlighted in this chapter is an ongoing project by the Sarpong group. Chapter 6 pivots to data-driven approaches that formulate statistical relationships sampled over small and large datasets. First, the collaborative research in section 6.2 builds a multivariate linear regression model from a small dataset to explain the reaction performance in various solvents of the challenging reversible nucleophilic alcohol addition reaction developed by the Bandar group. The statistical conclusions provide the basis for modeling the solvent effects via DFT methods. Next, in section 6.3, a machine learning model is trained on a large, diverse molecular dataset to predict NMR chemical shifts with high accuracy relative to DFT-derived NMR values at only a fraction of the cost of DFT methods. These are two examples in which a successful prediction is evaluated according to the research goal, whether model accuracy or interpretability. Chapter 7 focuses on facilitating transparency and reproducibility in collecting data and generating meaningful statistical models for the data chemist in low- and high-throughput studies.
The open-source, automated workflows, DISCO and REGGAE, allowed for the execution of projects mentioned in Chapters 4 to 6 at different stages of the research process (e.g., chemical data collection, feature selection, and then statistical modeling).

Item (Open Access): Digital molecular representations for reaction prediction and optimization (Colorado State University. Libraries, 2023)
Luchini, Guilian W., author; Paton, Robert S., advisor; Rappé, Anthony K., committee member; Bandar, Jeffrey S., committee member; Shipman, Patrick D., committee member
The properties of molecules can be related to measurable outcomes from chemical reactions, for example, reaction yield, selectivity, or rate. The unique ways we represent molecules computationally can provide insight into how a particular molecular property influences reaction outcome. This dissertation discusses different ways a molecule can be digitally represented, and the properties that can be measured from these molecular representations. The first two chapters provide context and a landscape of the current field of digital molecular representations and properties. Chapter three focuses on a specific type of molecular property, the steric properties of molecules, which describe the size and shape that molecules occupy in space. Existing steric parameters have proven useful in describing how the size or steric bulk of a molecule can influence reaction outcome. We address a gap in the literature by describing the concept of steric proximity. We quantify how near to a reactive site the steric bulk of a molecule lies in two novel steric parameter sets, Sterimol2vec and vol2vec. Chapter four focuses on another class of molecular properties, electronic properties.
The partial atomic charge is commonly used in computational mechanistic studies to justify reactivity at atomic sites, providing a conceptual medium for the buildup of electronic charge across a molecule, partitioned into its atoms. This study benchmarks and compares different methods for computing partial atomic charge through comparisons with experimentally tabulated Hammett parameters. We find that the choice of method and the atomic position at which the charge is measured are important in relating the charge to the reactivity of the system. Many computational studies rely on programming and computational workflows for data collection and analysis. Chapter five summarizes open-source Python tools resulting from this and additional work relating to the collection and analysis of molecular properties. Three programs are summarized. GoodVibes is used to compute and apply corrections to thermochemistry data (entropy, enthalpy, and Gibbs free energy), while automating tasks for computing and visualizing relative thermochemistry potential energy surfaces. DBSTEP is used for computing the novel steric parameters described in chapter three, along with the existing parameters, Sterimol and percent buried volume. The final program discussed is Py-X Struct, a program designed to query molecules and substructures in X-ray crystal structures from the Cambridge Structural Database, measuring geometric information such as bond distances, angles, and dihedrals between user-specified atoms. The final chapter summarizes results and potential future directions for these projects.