Utilizing network features to detect erroneous inputs

Neural networks are vulnerable to a wide range of erroneous inputs, such as corrupted, out-of-distribution, misclassified, and adversarial examples. Previous work has proposed a separate solution for each of these faulty data types; in this work, however, I show that all of them can be identified jointly with a single model. Specifically, I train a linear SVM classifier to detect these four types of erroneous data using the hidden-layer and softmax feature vectors of pre-trained neural networks. The results indicate that the activations produced by these faulty inputs are, in general, linearly separable from those of correctly processed examples. I identify erroneous inputs with an area under the ROC curve (AUROC) of 0.973 on CIFAR10, 0.957 on Tiny ImageNet, and 0.941 on ImageNet, and I further validate these findings experimentally across a diverse range of datasets, domains, and pre-trained models.
2020 Fall.
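To make the detection pipeline concrete, here is a minimal, dependency-free sketch of the idea described in the abstract: concatenate a network's hidden activations with its softmax output, fit a linear SVM that separates erroneous (+1) from correctly processed (-1) inputs, and score the detector with AUROC. Several parts are assumptions of this sketch and not necessarily what the study did. The features come from a hypothetical synthetic generator (`make_toy_example`) rather than a pre-trained network. The SVM is fit with plain stochastic subgradient descent on the L2-regularized hinge loss instead of a dedicated solver. The softmax vector is sorted in descending order so the feature does not depend on which class was predicted. All function names are illustrative.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def build_features(hidden, logits):
    """Concatenate hidden activations with the softmax vector.

    Assumption of this sketch: the softmax vector is sorted in descending
    order so the feature is independent of the predicted class index.
    """
    return list(hidden) + sorted(softmax(logits), reverse=True)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def train_linear_svm(X, y, lam=0.01, lr=0.05, epochs=50, seed=0):
    """Soft-margin linear SVM fit by SGD on the L2-regularized hinge loss.

    Labels must be +1 (erroneous) or -1 (correctly processed).
    """
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    b = 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            margin = y[i] * (dot(w, X[i]) + b)
            # Gradient step on the L2 penalty shrinks every weight.
            w = [wj * (1.0 - lr * lam) for wj in w]
            # The hinge-loss subgradient is nonzero only inside the margin.
            if margin < 1.0:
                w = [wj + lr * y[i] * xj for wj, xj in zip(w, X[i])]
                b += lr * y[i]
    return w, b


def auroc(scores, labels):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l != 1]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def make_toy_example(rng, erroneous, hidden_dim=8, num_classes=10):
    """Hypothetical synthetic stand-in for a real network's features."""
    shift = 1.5 if erroneous else 0.0
    hidden = [rng.gauss(shift, 1.0) for _ in range(hidden_dim)]
    logits = [rng.gauss(0.0, 1.0) for _ in range(num_classes)]
    if not erroneous:
        # Correctly processed inputs get a confident (peaked) prediction.
        logits[rng.randrange(num_classes)] += 6.0
    return build_features(hidden, logits)


def make_split(rng, n_per_class):
    X, y = [], []
    for label in (1, -1):
        for _ in range(n_per_class):
            X.append(make_toy_example(rng, erroneous=(label == 1)))
            y.append(label)
    return X, y


def demo(seed=0):
    """Train on one synthetic split, return AUROC on a held-out split."""
    rng = random.Random(seed)
    X_train, y_train = make_split(rng, 100)
    X_test, y_test = make_split(rng, 100)
    w, b = train_linear_svm(X_train, y_train, seed=seed)
    scores = [dot(w, x) + b for x in X_test]
    return auroc(scores, y_test)


if __name__ == "__main__":
    print(f"toy AUROC: {demo():.3f}")
```

With real data, `hidden` would be an activation vector taken from a pre-trained network and `logits` its pre-softmax outputs for the same input, and the label would record whether that input was corrupted, out-of-distribution, adversarial, or misclassified. The AUROC this toy script prints reflects only the synthetic generator and says nothing about the results reported in the abstract.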
