Theses and Dissertations
Browsing Theses and Dissertations by Subject "action recognition"
Item (Open Access): Looking under the hood: visualizing what LSTMs learn (Colorado State University. Libraries, 2019)
Patil, Dhruva, author; Draper, Bruce, advisor; Beveridge, J. Ross, committee member; Maciejewski, Anthony, committee member

Recurrent Neural Networks (RNNs) such as Long Short-Term Memory (LSTM) networks and Gated Recurrent Units (GRUs) have been successful in many applications involving sequential data. The success of these models lies in the complex feature representations they learn from the training data. One criterion for trusting a model is its validation accuracy. However, this can lead to surprises when the network learns properties of the input data different from what the designer intended or the user assumes. As a result, we lack confidence in even high-performing networks when they are deployed in applications with novel input data, or where the cost of failure is very high. Understanding and visualizing what recurrent networks have learned therefore becomes essential. Visualizations of RNN models are better established in natural language processing than in computer vision. This work presents visualizations of what recurrent networks, particularly LSTMs, learn in the domain of action recognition, where the inputs are sequences of 3D human poses, or skeletons. The goal of the thesis is to understand the properties a network learns with regard to an input action sequence, and how it will generalize to novel inputs. The thesis presents two methods for visualizing concepts learned by RNNs in action recognition, providing independent insight into the workings of the recognition model. The first visualization method shows the sensitivity of joints over time in a video sequence. The second generates synthetic videos that maximize the responses of a class label or hidden unit within a set of known anatomical constraints.
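The joint-sensitivity idea described above can be illustrated with a minimal sketch: perturb each joint coordinate at each frame and measure how much a class score changes. This is a generic finite-difference stand-in, not the thesis's actual method or its trained LSTM; the `score_fn` below is a hypothetical toy scorer used only for illustration.

```python
import numpy as np

def joint_sensitivity(score_fn, seq, eps=1e-3):
    """Finite-difference sensitivity of a class score to each joint.

    seq: (T, J, 3) array of 3D joint positions over T frames.
    score_fn: maps a (T, J, 3) sequence to a scalar class score
              (in the thesis this would be an LSTM's class output).
    Returns a (T, J) sensitivity map.
    """
    base = score_fn(seq)
    sens = np.zeros(seq.shape[:2])
    for t in range(seq.shape[0]):
        for j in range(seq.shape[1]):
            for d in range(3):  # perturb x, y, z separately
                bumped = seq.copy()
                bumped[t, j, d] += eps
                sens[t, j] += abs(score_fn(bumped) - base) / eps
    return sens

# Toy score that depends only on joint 0 (hypothetical stand-in).
rng = np.random.default_rng(0)
seq = rng.normal(size=(4, 3, 3))       # 4 frames, 3 joints
score = lambda s: float((s[:, 0, :] ** 2).sum())
heat = joint_sensitivity(score, seq)   # joint 0 dominates the map
```

A heat map like `heat` is the kind of per-joint, per-frame signal the first visualization method renders over a video sequence.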
These techniques are combined in a visualization tool called SkeletonVis to help developers and users gain insight into models embedded in RNNs for action recognition. We present case studies on NTU-RGBD, a popular data set for action recognition, to reveal properties learned by a trained LSTM network.

Item (Open Access): Scalable learning of actions from unlabeled videos (Colorado State University. Libraries, 2013)
O'Hara, Stephen, author; Draper, Bruce A., advisor; Howe, Adele, committee member; Anderson, Charles, committee member; Peterson, Christopher, committee member

Emerging applications in human-computer interfaces, security, and robotics need to understand human behavior from video data. Much of the research in action recognition evaluates methods on relatively small data sets, under controlled conditions, and with a small set of allowable action labels. Significant challenges arise in adapting existing action recognition models to less structured, larger-scale data sets: recognizing a large vocabulary of actions, scaling to learn from a large corpus of video data, recognizing actions in real time on streaming video, and operating under uncontrolled lighting, varied camera angles, dynamic backgrounds, and multiple actors. This thesis focuses on scalable methods for classifying and clustering actions with minimal human supervision. Unsupervised methods are emphasized in order to learn from massive amounts of unlabeled data, and so that models can be retrained with minimal human intervention when adapting to new settings or applications. Because many applications of action recognition require real-time performance, and training data sets can be large, scalable methods for both learning and detection are beneficial.
The specific contributions from this dissertation include a novel method for Approximate Nearest Neighbor (ANN) indexing of general metric spaces and the application of this structure to a manifold-based action representation. With this structure, nearest-neighbor action recognition is demonstrated to be comparable or superior to existing methods, while also being fast and scalable. Leveraging the same metric space indexing mechanism, a novel clustering method is introduced for discovering action exemplars in data.
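The nearest-neighbor decision rule that the dissertation's ANN index accelerates can be sketched as follows. This is a generic exact 1-NN linear scan under a pluggable metric, not the dissertation's indexing structure; the Euclidean `metric` and the action labels are hypothetical stand-ins for a manifold-based distance over pose sequences.

```python
import numpy as np

def nn_classify(query, gallery, labels, metric):
    """Exact 1-NN under an arbitrary metric (O(n) linear scan).

    An approximate nearest-neighbor index over the same metric space
    would replace this scan; the decision rule stays the same.
    """
    dists = [metric(query, g) for g in gallery]
    return labels[int(np.argmin(dists))]

# Hypothetical metric and gallery: Euclidean distance on fixed-length
# feature vectors standing in for action representations.
metric = lambda a, b: float(np.linalg.norm(a - b))
gallery = [np.zeros(5), np.ones(5) * 3]
labels = ["wave", "kick"]
print(nn_classify(np.ones(5) * 2.9, gallery, labels, metric))  # → kick
```

Because `nn_classify` only requires that `metric` satisfy the metric-space axioms, the same interface accommodates non-vector representations such as points on a manifold.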