Exploring the effects of multimodal features on a machine learning knowledge tracker

Khebour, Ibrahim, author; Krishnaswamy, Nikhil, advisor; Blanchard, Nathaniel, advisor; Peterson, Christopher, committee member

dc.contributor.author	Khebour, Ibrahim, author
dc.contributor.author	Krishnaswamy, Nikhil, advisor
dc.contributor.author	Blanchard, Nathaniel, advisor
dc.contributor.author	Peterson, Christopher, committee member
dc.date.accessioned	2026-06-08T10:31:27Z
dc.date.issued	2026
dc.description.abstract	Conversations involve multiple channels of information exchange. Spoken language is the most common, but non-verbal cues such as gestures, body pose, and movements also play a role. These channels carry semantic information but are subtler and harder for machines to detect. Recent advances in multimodal Large Language Models (LLMs) show that incorporating additional modalities can improve performance, raising two questions: how much do extra modalities contribute, and what are the limits of continually stacking them? Modeling the flow of conversation remains challenging for AI, particularly in natural, collaborative settings where non-verbal channels are prominent. To address this, TRACE was developed: a multimodal system that monitors shared knowledge in group tasks by tracking utterances, gestures, and actions. The live system runs in real time using speech-only features, while an offline version integrates broader modalities, including problem-solving cues from speech, actions, and gestures. This thesis extends the live system with additional features, some of which require training new models to process visual inputs in real time. Because these components may differ from those in the offline version, I will conduct a comparative analysis of the two systems. Since some loss is expected, the evaluation will highlight cases where the live version underperforms. A comparison with the current live tracker will also measure the impact of the new modalities. The Weights Task Dataset will be used to train, test, and evaluate action and gesture classification. Automating this process reduces the need for manual annotation and links gestures to broader semantic context, offering substantial value for future work.
dc.format.medium	born digital
dc.format.medium	masters theses
dc.identifier	khebour_colostate_0053N_19339.pdf
dc.identifier.uri	https://hdl.handle.net/10217/244745
dc.identifier.uri	https://doi.org/10.25675/3.027105
dc.language	English
dc.language.iso	eng
dc.publisher	Colorado State University. Libraries
dc.relation.ispartof	2020-
dc.rights	Copyright and other restrictions may apply. User is responsible for compliance with all applicable laws. For information about copyright law, please see https://libguides.colostate.edu/copyright.
dc.subject	human-computer interactions
dc.subject	multimodality
dc.subject	human-human interactions
dc.subject	common ground
dc.title	Exploring the effects of multimodal features on a machine learning knowledge tracker
dc.type	Text
dcterms.rights.dpla	This Item is protected by copyright and/or related rights (https://rightsstatements.org/vocab/InC/1.0/). You are free to use this Item in any way that is permitted by the copyright and related rights legislation that applies to your use. For other uses you need to obtain permission from the rights-holder(s).
thesis.degree.discipline	Computer Science
thesis.degree.grantor	Colorado State University
thesis.degree.level	Masters
thesis.degree.name	Master of Science (M.S.)

Files

Original bundle

Name: khebour_colostate_0053N_19339.pdf
Size: 1.75 MB
Format: Adobe Portable Document Format

Collections

2020-
Theses and Dissertations