Optimizing text analytics and document automation with meta-algorithmic systems engineering

Villanueva, Arturo N., Jr., author; Simske, Steven J., advisor; Hefner, Rick D., committee member; Krishnaswamy, Nikhil, committee member; Miller, Erika, committee member; Roberts, Nicholas, committee member

Files

VillanuevaJr_colostate_0053A_17799.pdf (2.11 MB)

Date

2023

Abstract

Natural language processing (NLP) has seen significant advances in recent years, but challenges remain in making algorithms both efficient and accurate. In this study, we examine three key areas of NLP, explore the potential of meta-algorithmics and functional analysis for improving analytic and machine learning performance, and conclude with directions for future research. The first area is text classification for requirements engineering, where stakeholder requirements must be classified into appropriate categories for further processing. We investigate multiple combinations of algorithms and meta-algorithms to optimize the classification process, confirming the optimality of Naïve Bayes and highlighting a sensitivity to the Global Vectors (GloVe) word embeddings algorithm. The second area is extractive summarization, which offers advantages over abstractive summarization due to its lossless nature. We propose a second-order meta-algorithm that selects appropriate combinations of existing algorithms to generate more effective summaries than any individual algorithm. The third area is document ordering, where we propose techniques for generating an optimal reading order for use in learning, training, and content sequencing. We propose two main methods: one using document similarities and the other using entropy against topics generated through Latent Dirichlet Allocation (LDA).

Subject

extractive summarization

meta-algorithmics

text classification

functional analysis

document ordering

natural language processing

URI

https://hdl.handle.net/10217/236997
https://doi.org/10.25675/3.02644

Collections

2020-
Theses and Dissertations
