Spatiotemporal anomaly detection: streaming architecture and algorithms
Date
2020
Authors
Siegel, Barry W., author
Labadie, John, advisor
Chong, Edwin, committee member
Maciejewski, Anthony, committee member
Young, Peter, committee member
Abstract
Anomaly detection is the science of identifying one or more rare or unexplainable samples or events in a dataset or data stream. The field of anomaly detection has been extensively studied by mathematicians, statisticians, economists, engineers, and computer scientists. One open research question is the design of distributed cloud-based architectures and algorithms that can accurately identify anomalies in previously unseen, unlabeled, streaming multivariate spatiotemporal data. With streaming data, time is of the essence, and insights are perishable. Real-world streaming spatiotemporal data originate from many sources, including mobile phones, supervisory control and data acquisition (SCADA) enabled devices, the internet-of-things (IoT), distributed sensor networks, and social media. Baseline experiments are performed on four non-streaming, static multivariate anomaly detection datasets using unsupervised offline traditional machine learning (TML) and unsupervised neural network techniques. Multiple architectures, including autoencoders, generative adversarial networks, convolutional networks, and recurrent networks, are adapted for experimentation. These experiments demonstrate that neural networks achieve higher detection accuracy than TML techniques. The same neural network architectures can be extended to process unlabeled spatiotemporal streaming data using online learning. Space and time relationships are further exploited to provide additional insights and increased anomaly detection accuracy. A novel domain-independent architecture and set of algorithms called the Spatiotemporal Anomaly Detection Environment (STADE) is formulated. STADE is based on a federated learning architecture. STADE streaming algorithms are based on geographically unique, persistently executing neural networks using online stochastic gradient descent (SGD).
STADE is designed to be pluggable, meaning that alternative algorithms may be substituted or combined to form an ensemble. STADE incorporates a Stream Anomaly Detector (SAD) and a Federated Anomaly Detector (FAD). The SAD executes at multiple locations on streaming data, while the FAD executes on a single server and identifies global patterns and relationships among the site anomalies. Each STADE site streams anomaly scores to the centralized FAD server for further spatiotemporal dependency analysis and logging. The FAD is based on recent advances in deep neural network (DNN) based federated learning. A STADE testbed is implemented to facilitate globally distributed experimentation using low-cost, commercial cloud infrastructure provided by Microsoft™. STADE testbed sites are situated in the cloud on six continents: Africa, Asia, Australia, Europe, North America, and South America. Communication occurs over the commercial internet. Three STADE case studies are investigated. The first case study processes commercial air traffic flows, the second processes global earthquake measurements, and the third processes social media (i.e., Twitter™) feeds. These case studies confirm that STADE is a viable architecture for the near real-time identification of anomalies in streaming data originating from geographically dispersed and possibly computationally disadvantaged sites. Moreover, the addition of the FAD provides enhanced anomaly detection capability. Because STADE is domain-independent, these findings can be readily extended to additional application domains and use cases.
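The division of labor between the SAD and the FAD can be illustrated with a minimal sketch. This is not the STADE implementation: the class names, parameters, and the tiny linear autoencoder are hypothetical choices made for illustration, and the central detector here uses a simple co-occurrence rule as a stand-in for the DNN-based federated learning the abstract describes. Each site scores a sample by its reconstruction error and then takes one online SGD step on that same sample; the central server only receives the resulting scores.

```python
import random


class StreamAnomalyDetector:
    """Hypothetical per-site detector: a tiny linear autoencoder trained by online SGD."""

    def __init__(self, n_features, n_hidden=2, lr=0.05, seed=0):
        rng = random.Random(seed)
        self.lr = lr
        # Small random initial weights for the encoder (hidden x features)
        # and the decoder (features x hidden).
        self.enc = [[rng.uniform(-0.1, 0.1) for _ in range(n_features)]
                    for _ in range(n_hidden)]
        self.dec = [[rng.uniform(-0.1, 0.1) for _ in range(n_hidden)]
                    for _ in range(n_features)]

    def score_and_update(self, x):
        """Return the mean squared reconstruction error of x, then take one SGD step."""
        h = [sum(w * xi for w, xi in zip(row, x)) for row in self.enc]
        x_hat = [sum(v * hj for v, hj in zip(row, h)) for row in self.dec]
        err = [xh - xi for xh, xi in zip(x_hat, x)]
        score = sum(e * e for e in err) / len(x)

        # Gradients of 0.5 * ||x_hat - x||^2, computed before any weight changes.
        grad_h = [sum(err[i] * self.dec[i][j] for i in range(len(x)))
                  for j in range(len(h))]
        for i in range(len(x)):
            for j in range(len(h)):
                self.dec[i][j] -= self.lr * err[i] * h[j]
        for j in range(len(h)):
            for k in range(len(x)):
                self.enc[j][k] -= self.lr * grad_h[j] * x[k]
        return score


class FederatedAnomalyDetector:
    """Hypothetical central server: flags time steps where several sites score high."""

    def __init__(self, threshold, min_sites=2):
        self.threshold = threshold
        self.min_sites = min_sites
        self.flags = {}  # time step -> set of sites whose score exceeded the threshold

    def report(self, site, t, score):
        if score > self.threshold:
            self.flags.setdefault(t, set()).add(site)

    def global_anomalies(self):
        return sorted(t for t, sites in self.flags.items()
                      if len(sites) >= self.min_sites)
```

In use, each site would call `score_and_update` on every incoming sample and forward the score to the server with `report(site, t, score)`. Scoring before updating matters: the model has not yet adapted to the sample when the score is computed, so a sudden deviation is visible at the moment it arrives.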
Subject
artificial intelligence
anomaly detection