
A framework to support distributional similarity analysis over arbitrary spatiotemporal scopes at scale

dc.contributor.author: Hansen, Paige, author
dc.contributor.author: Pallickara, Shrideep, advisor
dc.contributor.author: Lee Pallickara, Sangmi, advisor
dc.contributor.author: Arabi, Mazdak, committee member
dc.date.accessioned: 2026-01-12T11:27:39Z
dc.date.issued: 2025
dc.description.abstract: Our methodology combines statistical, algorithmic, and systems techniques to enable efficient, memory-resident similarity analysis. Rather than relying on a fixed metric, we derive similarity thresholds from the characteristics of each variable, which keeps the measure sensitive to intra-dataset variation. We employ the Jensen–Shannon divergence for its symmetry and boundedness. We also summarize probability density functions as compact 4-tuples that allow navigation through extents according to their degree of similarity. A refinement of this representation further allows differential scaling across dimensions, which extends the scope of analysis. The contributions of this thesis are threefold. First, we compute variable-specific thresholds that adapt similarity scoring to the distributional features of the data. Second, we introduce a novel distance-based measure that prunes the search space without compromising accuracy. Third, we demonstrate the ability to perform distributional analyses, both comprehensive and interactive, across arbitrary spatiotemporal scopes, with near real-time calculation of thresholds and similarity estimates. Our empirical benchmarks on multivariate datasets spanning 50 years of complex, evolving climate phenomena validate these design choices and underscore the suitability of the methodology for large-scale, longitudinal datasets. Our methodology achieves a speedup of three orders of magnitude over Apache Druid, a leading framework for distributional analysis at scale.
dc.format.medium: born digital
dc.format.medium: masters theses
dc.identifier: Hansen_colostate_0053N_19286.pdf
dc.identifier.uri: https://hdl.handle.net/10217/242670
dc.identifier.uri: https://doi.org/10.25675/3.025562
dc.language: English
dc.language.iso: eng
dc.publisher: Colorado State University. Libraries
dc.relation.ispartof: 2020-
dc.rights: Copyright and other restrictions may apply. User is responsible for compliance with all applicable laws. For information about copyright law, please see https://libguides.colostate.edu/copyright.
dc.subject: declarative queries
dc.subject: spatiotemporally evolving phenomena
dc.subject: distributional similarity
dc.subject: big data
dc.title: A framework to support distributional similarity analysis over arbitrary spatiotemporal scopes at scale
dc.type: Text
dc.type: Image
dcterms.rights.dpla: This Item is protected by copyright and/or related rights (https://rightsstatements.org/vocab/InC/1.0/). You are free to use this Item in any way that is permitted by the copyright and related rights legislation that applies to your use. For other uses you need to obtain permission from the rights-holder(s).
thesis.degree.discipline: Computer Science
thesis.degree.grantor: Colorado State University
thesis.degree.level: Masters
thesis.degree.name: Master of Science (M.S.)
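
The abstract selects the Jensen–Shannon divergence (JSD) because it is symmetric and bounded. The following minimal sketch illustrates only those two properties. It is not the thesis's implementation and does not model the 4-tuple summaries, the variable-specific thresholds, or the pruning measure. It assumes each variable's observations are discretized into equal-width histograms over a shared range. The helper names `histogram` and `js_divergence`, and the clamping of out-of-range values, are hypothetical choices made for illustration. With base-2 logarithms, JSD lies in [0, 1]: it is 0 for identical distributions and 1 for distributions with disjoint support.

```python
import math
from typing import List, Sequence


def histogram(values: Sequence[float], lo: float, hi: float, bins: int) -> List[int]:
    """Count values into `bins` equal-width bins over [lo, hi].

    Hypothetical helper: out-of-range values are clamped into the
    first or last bin (an assumption made for illustration only).
    """
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        idx = int((v - lo) / width)
        counts[min(max(idx, 0), bins - 1)] += 1
    return counts


def _normalize(weights: Sequence[float]) -> List[float]:
    # Turn raw counts into probabilities that sum to 1.
    total = sum(weights)
    if total <= 0:
        raise ValueError("a distribution needs positive total mass")
    return [w / total for w in weights]


def _kl(p: Sequence[float], q: Sequence[float]) -> float:
    # Kullback-Leibler divergence in nats; terms with p_i == 0 contribute 0.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p: Sequence[float], q: Sequence[float], base: float = 2.0) -> float:
    """Jensen-Shannon divergence between two discrete distributions.

    Inputs may be raw counts; both are normalized first. With base 2
    the result lies in [0, 1].
    """
    if len(p) != len(q):
        raise ValueError("distributions must share the same bins")
    p, q = _normalize(p), _normalize(q)
    # Mixture M = (P + Q) / 2; m_i > 0 wherever p_i > 0 or q_i > 0,
    # so the KL terms never divide by zero.
    m = [(a + b) / 2 for a, b in zip(p, q)]
    return (0.5 * _kl(p, m) + 0.5 * _kl(q, m)) / math.log(base)


if __name__ == "__main__":
    a = histogram([0.1, 0.2, 0.7, 0.9], 0.0, 1.0, 4)
    b = histogram([0.15, 0.6, 0.65, 0.8], 0.0, 1.0, 4)
    print(js_divergence(a, b) == js_divergence(b, a))  # True (symmetric)
    print(js_divergence([1, 0], [0, 1]))  # 1.0 (disjoint support)
```

Because JSD compares each distribution against their mixture rather than against each other directly, it stays finite even when one histogram has empty bins where the other does not. Plain KL divergence lacks this property.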

Files

Original bundle

Name: Hansen_colostate_0053N_19286.pdf
Size: 264.87 KB
Format: Adobe Portable Document Format