
dc.contributor.advisor: Pallickara, Shrideep
dc.contributor.advisor: Pallickara, Sangmi Lee
dc.contributor.author: Budgaga, Walid
dc.contributor.committeemember: Ben-Hur, Asa
dc.contributor.committeemember: Schumacher, Russ
dc.date.accessioned: 2007-01-03T06:37:15Z
dc.date.available: 2007-01-03T06:37:15Z
dc.date.issued: 2014
dc.description: 2014 Summer.
dc.description: Includes bibliographical references.
dc.description.abstract: In this research work we present an approach encompassing both algorithm and system design to detect anomalies in data streams. Individual observations within these streams are multidimensional, with each dimension corresponding to a feature of interest. We consider time-series geospatial datasets generated by remote and in situ observational devices. Three aspects make this problem particularly challenging: (1) the cumulative volume and rates of data arrivals, (2) anomalies evolve over time, and (3) there are spatio-temporal correlations associated with the data. Therefore, anomaly detections must be accurate and performed in real time. Given the data volumes involved, solutions must minimize user intervention and be amenable to distributed processing to ensure scalability. Our approach achieves accurate, high-throughput classifications in real time. We rely on Expectation Maximization (EM) to build Gaussian Mixture Models (GMMs) that model the densities of the training data. Rather than one all-encompassing model, our approach involves multiple model instances, each of which is responsible for a particular geographical extent and can also adapt as data evolves. We have incorporated these algorithms into our distributed storage platform, Galileo, and profiled their suitability through empirical analysis, which demonstrates high throughput (10,000 observations per second, per node) and low latency on real-world datasets.
dc.format.medium: born digital
dc.format.medium: masters theses
dc.identifier: Budgaga_colostate_0053N_12527.pdf
dc.identifier.uri: http://hdl.handle.net/10217/83908
dc.language: English
dc.publisher: Colorado State University. Libraries
dc.relation.ispartof: 2000-2019 - CSU Theses and Dissertations
dc.rights: Copyright of the original work is retained by the author.
dc.subject: big data
dc.subject: clustering
dc.subject: data streams
dc.subject: distributed system
dc.subject: online anomaly detection
dc.subject: time series analytics
dc.title: Framework for real-time, autonomous anomaly detection over voluminous time-series geospatial data streams, A
dc.type: Text
dcterms.rights.dpla: The copyright and related rights status of this Item has not been evaluated (https://rightsstatements.org/vocab/CNE/1.0/). Please refer to the organization that has made the Item available for more information.
thesis.degree.discipline: Computer Science
thesis.degree.grantor: Colorado State University
thesis.degree.level: Masters
thesis.degree.name: Master of Science (M.S.)
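
The abstract describes fitting Gaussian Mixture Models with Expectation Maximization, keeping one model per geographical extent, and classifying observations by the density the model assigns them. The following is a minimal illustrative sketch of that general idea, not the thesis's implementation and not Galileo code. Everything named here is an assumption made for illustration: the class names `DiagonalGMM` and `RegionalDetector`, the diagonal-covariance simplification, the string region keys, and the rule that flags an observation when its log-density falls below a low quantile of the training scores. The sketch also omits the model adaptation over time that the abstract mentions.

```python
import math
import random


def log_gauss_diag(x, mean, var):
    # Log density of a Gaussian with diagonal covariance (assumed simplification).
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))


def logsumexp(values):
    # Numerically stable log(sum(exp(v))) for a list of log values.
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


class DiagonalGMM:
    """Hypothetical k-component mixture fitted with plain EM."""

    def __init__(self, k, seed=0, min_var=1e-6):
        self.k = k
        self.rng = random.Random(seed)
        self.min_var = min_var  # variance floor to avoid degenerate components

    def fit(self, data, iterations=50):
        n, d = len(data), len(data[0])
        self.weights = [1.0 / self.k] * self.k
        # Initialise means from random training points, variances from the data.
        self.means = [list(p) for p in self.rng.sample(data, self.k)]
        g_mean = [sum(p[j] for p in data) / n for j in range(d)]
        g_var = [max(sum((p[j] - g_mean[j]) ** 2 for p in data) / n, self.min_var)
                 for j in range(d)]
        self.vars = [list(g_var) for _ in range(self.k)]
        for _ in range(iterations):
            # E-step: responsibility of each component for each observation.
            resp = []
            for x in data:
                logs = self._component_logs(x)
                total = logsumexp(logs)
                resp.append([math.exp(l - total) for l in logs])
            # M-step: re-estimate weights, means, and variances.
            for c in range(self.k):
                nc = sum(r[c] for r in resp)
                if nc < 1e-12:
                    continue  # leave an effectively empty component unchanged
                self.weights[c] = nc / n
                self.means[c] = [sum(r[c] * x[j] for r, x in zip(resp, data)) / nc
                                 for j in range(d)]
                self.vars[c] = [
                    max(sum(r[c] * (x[j] - self.means[c][j]) ** 2
                            for r, x in zip(resp, data)) / nc, self.min_var)
                    for j in range(d)]
        return self

    def _component_logs(self, x):
        return [math.log(w) + log_gauss_diag(x, m, v)
                for w, m, v in zip(self.weights, self.means, self.vars)]

    def log_density(self, x):
        return logsumexp(self._component_logs(x))


class RegionalDetector:
    """Hypothetical registry holding one model per geographical extent."""

    def __init__(self, k=2, quantile=0.01):
        self.k = k
        self.quantile = quantile
        self.models = {}

    def train(self, region, data):
        model = DiagonalGMM(self.k).fit(data)
        # Assumed decision rule: threshold at a low quantile of training scores.
        scores = sorted(model.log_density(x) for x in data)
        threshold = scores[int(self.quantile * len(scores))]
        self.models[region] = (model, threshold)

    def is_anomaly(self, region, x):
        model, threshold = self.models[region]
        return model.log_density(x) < threshold
```

In use, one would call `train(region, observations)` once per extent with a list of equal-length numeric tuples, then `is_anomaly(region, observation)` for each arriving observation. Keeping the models in a dictionary keyed by region mirrors the abstract's point that no single model has to cover every location, which also makes it natural to spread the models across nodes.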

