A framework for real-time, autonomous anomaly detection over voluminous time-series geospatial data streams
dc.contributor.author | Budgaga, Walid, author | |
dc.contributor.author | Pallickara, Shrideep, advisor | |
dc.contributor.author | Pallickara, Sangmi Lee, advisor | |
dc.contributor.author | Ben-Hur, Asa, committee member | |
dc.contributor.author | Schumacher, Russ, committee member | |
dc.date.accessioned | 2007-01-03T06:37:15Z | |
dc.date.available | 2007-01-03T06:37:15Z | |
dc.date.issued | 2014 | |
dc.description.abstract | In this research work we present an approach encompassing both algorithm and system design to detect anomalies in data streams. Individual observations within these streams are multidimensional, with each dimension corresponding to a feature of interest. We consider time-series geospatial datasets generated by remote and in situ observational devices. Three aspects make this problem particularly challenging: (1) the cumulative volume and rates of data arrivals, (2) anomalies evolve over time, and (3) there are spatio-temporal correlations associated with the data. Therefore, anomaly detections must be accurate and performed in real time. Given the data volumes involved, solutions must minimize user intervention and be amenable to distributed processing to ensure scalability. Our approach achieves accurate, high-throughput classifications in real time. We rely on Expectation Maximization (EM) to build Gaussian Mixture Models (GMMs) that model the densities of the training data. Rather than one all-encompassing model, our approach involves multiple model instances, each of which is responsible for a particular geographical extent and can also adapt as data evolves. We have incorporated these algorithms into our distributed storage platform, Galileo, and profiled their suitability through empirical analysis, which demonstrates high throughput (10,000 observations per second, per node) and low latency on real-world datasets. | |
dc.format.medium | born digital | |
dc.format.medium | masters theses | |
dc.identifier | Budgaga_colostate_0053N_12527.pdf | |
dc.identifier.uri | http://hdl.handle.net/10217/83908 | |
dc.language | English | |
dc.language.iso | eng | |
dc.publisher | Colorado State University. Libraries | |
dc.relation.ispartof | 2000-2019 | |
dc.rights | Copyright and other restrictions may apply. User is responsible for compliance with all applicable laws. For information about copyright law, please see https://libguides.colostate.edu/copyright. | |
dc.subject | big data | |
dc.subject | clustering | |
dc.subject | data streams | |
dc.subject | distributed system | |
dc.subject | online anomaly detection | |
dc.subject | time series analytics | |
dc.title | A framework for real-time, autonomous anomaly detection over voluminous time-series geospatial data streams | |
dc.type | Text | |
dcterms.rights.dpla | This Item is protected by copyright and/or related rights (https://rightsstatements.org/vocab/InC/1.0/). You are free to use this Item in any way that is permitted by the copyright and related rights legislation that applies to your use. For other uses you need to obtain permission from the rights-holder(s). | |
thesis.degree.discipline | Computer Science | |
thesis.degree.grantor | Colorado State University | |
thesis.degree.level | Masters | |
thesis.degree.name | Master of Science (M.S.) |
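
The abstract describes fitting Gaussian Mixture Models with Expectation Maximization and keeping one model instance per geographical extent. The following is a minimal, illustrative sketch of that idea only, not the thesis implementation or the Galileo API. The diagonal covariances, the 1% log-density threshold, the region key "cell-A", the synthetic two-feature data, and every function name are assumptions made for illustration. The sketch also omits the online adaptation and distributed processing the abstract mentions.

```python
import math
import random

def log_gauss_diag(x, mean, var):
    """Log-density of x under a Gaussian with diagonal covariance."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))

def logsumexp(values):
    """Numerically stable log(sum(exp(v))) for a list of log-values."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))

def fit_gmm(data, k, iters=30, seed=0, min_var=1e-6):
    """Fit a k-component diagonal GMM with plain EM (assumed setup)."""
    rng = random.Random(seed)
    n, dim = len(data), len(data[0])
    means = [list(p) for p in rng.sample(data, k)]  # random points as initial means
    gmean = [sum(p[d] for p in data) / n for d in range(dim)]
    gvar = [max(sum((p[d] - gmean[d]) ** 2 for p in data) / n, min_var)
            for d in range(dim)]
    variances = [list(gvar) for _ in range(k)]      # start from the global variance
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each observation.
        resp = []
        for x in data:
            logs = [math.log(weights[j]) + log_gauss_diag(x, means[j], variances[j])
                    for j in range(k)]
            norm = logsumexp(logs)
            resp.append([math.exp(v - norm) for v in logs])
        # M-step: re-estimate weights, means, and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp) + 1e-12
            weights[j] = nj / n
            means[j] = [sum(r[j] * x[d] for r, x in zip(resp, data)) / nj
                        for d in range(dim)]
            variances[j] = [max(sum(r[j] * (x[d] - means[j][d]) ** 2
                                    for r, x in zip(resp, data)) / nj, min_var)
                            for d in range(dim)]
    return weights, means, variances

def log_density(model, x):
    """Log of the mixture density at x."""
    weights, means, variances = model
    return logsumexp([math.log(w) + log_gauss_diag(x, m, v)
                      for w, m, v in zip(weights, means, variances)])

def train_region(data, k=2, quantile=0.01):
    """Fit one model for a region and derive a low-density threshold."""
    model = fit_gmm(data, k)
    scores = sorted(log_density(model, x) for x in data)
    return model, scores[int(quantile * len(scores))]

def is_anomaly(detectors, region, x):
    """Flag x if its density falls below the region's training threshold."""
    model, threshold = detectors[region]
    return log_density(model, x) < threshold

# Synthetic (hypothetical) training data: two regimes of two features.
rng = random.Random(42)
train = ([(rng.gauss(20, 1), rng.gauss(50, 2)) for _ in range(200)] +
         [(rng.gauss(5, 1), rng.gauss(80, 2)) for _ in range(200)])
detectors = {"cell-A": train_region(train)}  # one model per geographic extent
```

Calling `is_anomaly(detectors, "cell-A", observation)` scores a new observation against the model for its region. Keying the dictionary by region is one simple way to mirror the abstract's "multiple model instances", since each entry can be trained and queried independently.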
Files
Original bundle
- Name: Budgaga_colostate_0053N_12527.pdf
- Size: 928.86 KB
- Format: Adobe Portable Document Format