Demonstrating that dataset domains are largely linearly separable in the feature space of common CNNs
Date
2020
Authors
Dragan, Matthew R., author
Beveridge, J. Ross, advisor
Ortega, Francisco, committee member
Peterson, Chris, committee member
Abstract
Deep convolutional neural networks (DCNNs) have achieved state-of-the-art performance on a variety of tasks. These high-performing networks require large and diverse training datasets to facilitate generalization when extracting high-level features from low-level data. However, even with the availability of these diverse datasets, DCNNs are not prepared to handle all the data that could be presented to them. One major challenge DCNNs face is the notion of forced choice. For example, a network trained for image classification is configured to choose from a predefined set of labels, with the expectation that any new input image will contain an instance of one of the known objects. Given this expectation, it is generally assumed that the network is trained for a particular domain, where the domain is defined by the set of known object classes as well as the more implicit assumptions that accompany any data collection. For example, some implicit characteristics of the ImageNet dataset domain are that most images are taken outdoors and that the object of interest is roughly in the center of the frame. The domain of the network is thus defined by the training data that is chosen, which leads to two key questions: does a network know the domain it was trained for, and can a network easily distinguish between in-domain and out-of-domain images? This thesis shows that for several widely used public datasets and commonly used neural networks, the answer to both questions is yes. The existence of a simple method for differentiating between in-domain and out-of-domain cases has significant implications for work on domain adaptation, transfer learning, and model generalization.
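As a concrete illustration of the linear-separability claim, the sketch below extracts penultimate-layer features from a pretrained CNN for two image collections and fits a linear classifier to separate them. This is a minimal sketch under stated assumptions, not the thesis's exact experimental setup: the choice of ResNet-50, the pooled penultimate feature layer, and a linear SVM are all illustrative, and `domain_a`/`domain_b` are hypothetical placeholders for any two datasets.

```python
# Hedged sketch: test whether two dataset domains are (largely) linearly
# separable in the feature space of a common pretrained CNN. Network,
# feature layer, and classifier choices here are illustrative assumptions.
import numpy as np
import torch
import torchvision.models as models
import torchvision.transforms as T
from torch.utils.data import DataLoader
from sklearn.svm import LinearSVC
from sklearn.model_selection import train_test_split

# Pretrained CNN with the classification head replaced by an identity,
# so a forward pass yields the penultimate-layer feature vector.
backbone = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
backbone.fc = torch.nn.Identity()
backbone.eval()

# Standard ImageNet preprocessing; datasets passed to extract_features
# below are assumed to apply this as their transform.
preprocess = T.Compose([
    T.Resize(256), T.CenterCrop(224), T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

@torch.no_grad()
def extract_features(dataset, batch_size=64):
    """Run every image through the backbone and stack the feature vectors."""
    loader = DataLoader(dataset, batch_size=batch_size)
    feats = [backbone(x).cpu().numpy() for x, _ in loader]
    return np.concatenate(feats)

# `domain_a` and `domain_b` are placeholders for any two image datasets
# (e.g., torchvision.datasets.ImageFolder(..., transform=preprocess)).
# X_a = extract_features(domain_a)
# X_b = extract_features(domain_b)
# X = np.vstack([X_a, X_b])
# y = np.concatenate([np.zeros(len(X_a)), np.ones(len(X_b))])
# X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
#                                           random_state=0)
# clf = LinearSVC().fit(X_tr, y_tr)
# print(f"held-out domain accuracy: {clf.score(X_te, y_te):.3f}")
```

If the held-out accuracy of the linear classifier is near 1.0, the two domains are, in the sense of the title, largely linearly separable in that feature space; accuracy near 0.5 would indicate the features carry little domain information.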