
Something is fishy! - How ambiguous language affects generalization of video action recognition networks

dc.contributor.author: Patil, Dhruva Kishor, author
dc.contributor.author: Beveridge, J. Ross, advisor
dc.contributor.author: Krishnaswamy, Nikhil, advisor
dc.contributor.author: Ortega, Francisco R., committee member
dc.contributor.author: Clegg, Benjamin, committee member
dc.date.accessioned: 2022-05-30T10:22:48Z
dc.date.available: 2022-05-30T10:22:48Z
dc.date.issued: 2022
dc.description.abstract: Modern neural networks designed for video action recognition classify video snippets with high confidence and accuracy. The success of these models lies in the complex feature representations they learn from the training data, but their limitations are rarely traced back to inconsistencies in the quality of that data. Although newer approaches pride themselves on higher evaluation metrics, this dissertation questions whether these networks are merely recognizing the peculiarities of dataset labels. One source of these peculiarities is deviation from standardized data collection and curation protocols that ensure quality labels. Consequently, models trained with only a forced-choice technique may learn data properties that are irrelevant or even undesirable. One remedy is to reinspect the training data and gain insights that inform the design of more effective algorithms. The Something-Something dataset, a popular dataset for video action recognition, exhibits large semantic overlaps, both visual and linguistic, between the labels provided for different video samples. It can be argued that actions in videos admit multiple valid interpretations, and that restricting each video to a single label can limit, or even harm, a network's ability to generalize even to the dataset's own test data. To validate this claim, this dissertation introduces a human-in-the-loop procedure to review the legacy labels and relabel the Something-Something validation data. When the resulting labels are used to reassess the performance of video action recognition networks, significant gains of almost 12% in top-1 and 3% in top-5 accuracy are reported. This hypothesis is further validated by visualizing the layer-wise internals of the networks with Grad-CAM, showing that the models focus on relevant salient regions when predicting an action in a video.
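The reassessment described in the abstract relaxes the forced single-label assumption: a prediction counts as correct if it matches any of the acceptable labels assigned during relabeling. A minimal sketch of that evaluation idea, with illustrative names and toy data (not the dissertation's code), might look like:

```python
# Hypothetical sketch: top-k accuracy when each video may have several
# acceptable labels, as in the relabeled validation set described above.
# Function name, scores, and label sets are illustrative assumptions.

def topk_accuracy(scores, acceptable_labels, k):
    """scores: per-video lists of class scores; acceptable_labels: per-video sets."""
    hits = 0
    for class_scores, valid in zip(scores, acceptable_labels):
        # Indices of the k highest-scoring classes for this video.
        top_k = sorted(range(len(class_scores)),
                       key=lambda c: class_scores[c], reverse=True)[:k]
        # Count a hit if any of the top-k classes is an acceptable label.
        if valid.intersection(top_k):
            hits += 1
    return hits / len(scores)

# Toy example: 3 videos, 4 action classes.
scores = [
    [0.1, 0.7, 0.1, 0.1],   # network predicts class 1
    [0.4, 0.3, 0.2, 0.1],   # network predicts class 0
    [0.2, 0.2, 0.5, 0.1],   # network predicts class 2
]
single = [{0}, {0}, {3}]        # one legacy label per video
multi = [{0, 1}, {0}, {2, 3}]   # multiple acceptable labels per video

print(topk_accuracy(scores, single, 1))  # lower under forced single labels
print(topk_accuracy(scores, multi, 1))   # higher when overlapping labels count
```

The same comparison with k=5 would correspond to the top-5 figures; the gap between the two label sets is the kind of gain the abstract reports.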
dc.format.medium: born digital
dc.format.medium: doctoral dissertations
dc.identifier: Patil_colostate_0053A_17134.pdf
dc.identifier.uri: https://hdl.handle.net/10217/235324
dc.language: English
dc.language.iso: eng
dc.publisher: Colorado State University. Libraries
dc.relation.ispartof: 2020-
dc.rights: Copyright and other restrictions may apply. User is responsible for compliance with all applicable laws. For information about copyright law, please see https://libguides.colostate.edu/copyright.
dc.subject: human-in-the-loop
dc.subject: video action recognition
dc.subject: Grad-CAM
dc.subject: visualization
dc.subject: Something-Something
dc.title: Something is fishy! - How ambiguous language affects generalization of video action recognition networks
dc.type: Text
dcterms.rights.dpla: This Item is protected by copyright and/or related rights (https://rightsstatements.org/vocab/InC/1.0/). You are free to use this Item in any way that is permitted by the copyright and related rights legislation that applies to your use. For other uses you need to obtain permission from the rights-holder(s).
thesis.degree.discipline: Computer Science
thesis.degree.grantor: Colorado State University
thesis.degree.level: Doctoral
thesis.degree.name: Doctor of Philosophy (Ph.D.)

Files

Original bundle
Name: Patil_colostate_0053A_17134.pdf
Size: 4.43 MB
Format: Adobe Portable Document Format