Iterative matrix completion and topic modeling using matrix and tensor factorizations
dc.contributor.author | Kassab, Lara, author | |
dc.contributor.author | Adams, Henry, advisor | |
dc.contributor.author | Fosdick, Bailey, committee member | |
dc.contributor.author | Kirby, Michael, committee member | |
dc.contributor.author | Peterson, Chris, committee member | |
dc.date.accessioned | 2022-01-07T11:30:21Z | |
dc.date.available | 2022-01-07T11:30:21Z | |
dc.date.issued | 2021 | |
dc.description.abstract | With the ever-increasing access to data, one of the greatest challenges that remains is how to make sense of this abundance of information. In this dissertation, we propose three techniques that take into account underlying structure in large-scale data to produce better or more interpretable results for machine learning tasks. One challenge in analyzing large-scale datasets is missing values, which are difficult to handle without efficient methods. We propose adjusting an iteratively reweighted least squares algorithm for low-rank matrix completion to take into account sparsity-based structure in the missing entries. We also propose an iterative gradient-projection-based implementation of the algorithm, and present numerical experiments showcasing the performance of the algorithm compared to standard algorithms. Another challenge arises when performing a (semi-)supervised learning task on high-dimensional data. We propose variants of semi-supervised nonnegative matrix factorization models and provide motivation for these models as maximum likelihood estimators. The proposed models simultaneously provide a topic model and a model for classification. We derive training methods using multiplicative updates for each new model, and demonstrate the application of these models to document classification (e.g., the 20 Newsgroups dataset). Lastly, although many datasets can be represented as matrices, datasets also often arise as high-dimensional arrays, known as higher-order tensors. We show that nonnegative CANDECOMP/PARAFAC tensor decomposition successfully detects short-lasting topics in temporal text datasets, including news headlines and COVID-19 related tweets, that other popular methods such as Latent Dirichlet Allocation and Nonnegative Matrix Factorization fail to fully detect. | |
dc.format.medium | born digital | |
dc.format.medium | doctoral dissertations | |
dc.identifier | Kassab_colostate_0053A_16853.pdf | |
dc.identifier.uri | https://hdl.handle.net/10217/234258 | |
dc.language | English | |
dc.language.iso | eng | |
dc.publisher | Colorado State University. Libraries | |
dc.relation.ispartof | 2020- | |
dc.rights | Copyright and other restrictions may apply. User is responsible for compliance with all applicable laws. For information about copyright law, please see https://libguides.colostate.edu/copyright. | |
dc.subject | datasets | |
dc.subject | interpretable results | |
dc.subject | machine learning tasks | |
dc.subject | algorithm | |
dc.subject | matrices | |
dc.subject | higher-order tensors | |
dc.title | Iterative matrix completion and topic modeling using matrix and tensor factorizations | |
dc.type | Text | |
dcterms.rights.dpla | This Item is protected by copyright and/or related rights (https://rightsstatements.org/vocab/InC/1.0/). You are free to use this Item in any way that is permitted by the copyright and related rights legislation that applies to your use. For other uses you need to obtain permission from the rights-holder(s). | |
thesis.degree.discipline | Mathematics | |
thesis.degree.grantor | Colorado State University | |
thesis.degree.level | Doctoral | |
thesis.degree.name | Doctor of Philosophy (Ph.D.) |
Files
Original bundle
- Name: Kassab_colostate_0053A_16853.pdf
- Size: 3.6 MB
- Format: Adobe Portable Document Format