The development of a high-throughput plasmid assembly pipeline
Files
Hernandez_colostate_0053A_19263.pdf (2.71 MB) Access status: Embargo until 2027-01-07
Abstract
Plasmids are a key tool in every biologist's toolbox, used in research, therapeutic development, and industrial biotechnology. With advances in computational design and machine learning, researchers may need to design thousands of plasmids to fulfill a single experimental goal. However, such expansive libraries challenge current assembly methods. There is a need for a scalable, high-throughput process for whole-plasmid construction and verification that can overcome the current limitations of commercial gene synthesis services. Recent progress in molecular technologies and laboratory automation has increased the volume of samples that can be produced, shifting the bottleneck of plasmid assembly from physical operations to data management. This dissertation addresses some of the challenges that must be overcome to integrate complex physical operations into a system that can execute plasmid construction workflows predictably and at scale. Controlling the quality of the process is essential to benchmarking future iterations of the plasmid manufacturing process. A bioinformatics tool was developed to streamline the analysis of sequencing data, and the reproducibility of this quality control process was characterized extensively. Algorithms from computer science and electrical engineering were adapted to DNA to develop self-documenting plasmids that are easier to track during the construction process and afterward, when the plasmids are used to develop and manufacture a biotechnology project. While the development of an industrial-scale plasmid construction infrastructure is beyond the scope of a doctoral project, it has been possible to identify 10 key data management principles that provide a roadmap for the development of such a system. Similarly, the risks associated with developing this infrastructure were analyzed to guide the design of a safe and resilient infrastructure.
A key focus was placed on sample and data management for the constructs produced. Plasmid-centric hybrid sequencing and assembly pipelines combine next-generation and long-read sequencing data, leveraging the strengths of each platform to circumvent the weaknesses of the individual technologies. This hybrid strategy enables accurate plasmid assemblies, even within complex libraries. In parallel, verification of plasmid sequences by embedded digital documentation is explored, using DNA storage techniques to store information directly in a plasmid sample. By directly linking the physical and digital documentation of samples, verification of a construct becomes easier and more reliable for larger libraries. Such digital signatures improve reproducibility across the different construction stages: a sample can directly inform the user of potential mutations within its sequence, even when the sequence is not known to the user. Combined with standardized lab techniques, these methods make it easier and more efficient to produce, track, and verify larger plasmid libraries than before. These advances support a shift towards scalable, automation-friendly plasmid assembly workflows. By integrating robust plasmid-specific sequencing workflows, embedded sequence documentation, and high-throughput strategies, the framework presented here lays the foundation for improved plasmid library construction. This work demonstrates that, as the design space for plasmids expands, proper workflows and strong management infrastructures are essential to ensure the fidelity of data, the integrity of samples, and reproducible science overall.
Rights Access
Embargo expires: 2027-01-07.
Subject
molecular biology
sequencing
systems biology
plasmids
high-throughput
synthetic biology
