Harnessing spatiotemporal data characteristics to facilitate large-scale analytics over voluminous, high-dimensional observational datasets
dc.contributor.author | Rammer, Daniel P., author | |
dc.contributor.author | Pallickara, Shrideep, advisor | |
dc.contributor.author | Pallickara, Sangmi Lee, advisor | |
dc.contributor.author | Ghosh, Sudipto, committee member | |
dc.contributor.author | Breidt, Jay, committee member | |
dc.date.accessioned | 2022-01-07T11:29:57Z | |
dc.date.available | 2022-01-07T11:29:57Z | |
dc.date.issued | 2021 | |
dc.description.abstract | Spatiotemporal data volumes have increased exponentially alongside a need to extract knowledge from them. We propose a methodology, encompassing a suite of algorithmic and systems innovations, to accomplish spatiotemporal data analysis at scale. Our methodology partitions and distributes data to reconcile the competing pulls of dispersion and load balancing. These dispersion schemes are underpinned by distributed data structures that organize metadata in support of expressive query evaluations and high-throughput data retrievals. Targeted, sequential disk block accesses and data sketching techniques are leveraged for effective retrievals. We facilitate seamless integration into data processing frameworks and analytical engines by ensuring compliance with the Hadoop Distributed File System. A refinement of our methodology supports memory residency and dynamic materialization of data (or subsets thereof) as DataFrames, Datasets, and Resilient Distributed Datasets. These refinements are backed by speculative prefetching schemes that manage speed differentials across the data storage hierarchy. We extend the data-centric view of our methodology to the orchestration of deep learning workloads while preserving accuracy and ensuring faster completion times. Finally, we assess the suitability of our methodology using diverse high-dimensional datasets, myriad model-fitting algorithms (including ensemble methods and deep neural networks), and multiple data processing frameworks such as Hadoop, Spark, TensorFlow, and PyTorch. | |
dc.format.medium | born digital | |
dc.format.medium | doctoral dissertations | |
dc.identifier | Rammer_colostate_0053A_16814.pdf | |
dc.identifier.uri | https://hdl.handle.net/10217/234231 | |
dc.language | English | |
dc.language.iso | eng | |
dc.publisher | Colorado State University. Libraries | |
dc.relation.ispartof | 2020- | |
dc.rights | Copyright and other restrictions may apply. User is responsible for compliance with all applicable laws. For information about copyright law, please see https://libguides.colostate.edu/copyright. | |
dc.subject | distributed analytics | |
dc.subject | spatiotemporal | |
dc.subject | distributed systems | |
dc.subject | big data | |
dc.title | Harnessing spatiotemporal data characteristics to facilitate large-scale analytics over voluminous, high-dimensional observational datasets | |
dc.type | Text | |
dcterms.rights.dpla | This Item is protected by copyright and/or related rights (https://rightsstatements.org/vocab/InC/1.0/). You are free to use this Item in any way that is permitted by the copyright and related rights legislation that applies to your use. For other uses you need to obtain permission from the rights-holder(s). | |
thesis.degree.discipline | Computer Science | |
thesis.degree.grantor | Colorado State University | |
thesis.degree.level | Doctoral | |
thesis.degree.name | Doctor of Philosophy (Ph.D.) |
Files
Original bundle
- Name: Rammer_colostate_0053A_16814.pdf
- Size: 2.23 MB
- Format: Adobe Portable Document Format