Embodied multimodal referring expressions generation

dc.contributor.author: Alalyani, Nada H., author
dc.contributor.author: Krishnaswamy, Nikhil, advisor
dc.contributor.author: Ortega, Francisco, committee member
dc.contributor.author: Blanchard, Nathaniel, committee member
dc.contributor.author: Wang, Haonan, committee member
dc.date.accessioned: 2024-09-09T20:52:12Z
dc.date.available: 2024-09-09T20:52:12Z
dc.date.issued: 2024
dc.description.abstract: Using both verbal and non-verbal modalities to generate definite descriptions of objects and locations is a critical human capability in collaborative interactions. Despite advances in AI, embodied interactive virtual agents (IVAs) are not equipped to intelligently mix modalities to communicate their intents as humans do, which hampers naturalistic multimodal interaction. We introduce SCMRE, a situated corpus of multimodal referring expressions (MREs) intended for training generative AI systems to produce MREs for embodied IVAs. Our contributions include: 1) developing an IVA platform that interprets human multimodal instructions and responds with language and gestures; 2) presenting 24 participants with 10 scenes, each involving ten equally-sized blocks randomly placed on a table, interactions that yielded a dataset of 10,408 samples; 3) analyzing SCMRE and showing that pointing significantly reduces the ambiguity of prompts and increases the efficiency with which the IVA executes human prompts; 4) augmenting and synthesizing SCMRE into 22,159 samples, generating more data for model training; 5) fine-tuning LLaMA 2-chat-13B to generate contextually correct and situationally fluent multimodal referring expressions; 6) integrating the fine-tuned model into the IVA to evaluate how well the generative-model-enabled IVA communicates with humans; 7) establishing an evaluation process that applies to both humans and IVAs and combines quantitative and qualitative metrics.
dc.format.medium: born digital
dc.format.medium: doctoral dissertations
dc.identifier: Alalyani_colostate_0053A_18518.pdf
dc.identifier.uri: https://hdl.handle.net/10217/239275
dc.language: English
dc.language.iso: eng
dc.publisher: Colorado State University. Libraries
dc.relation.ispartof: 2020-
dc.rights: Copyright and other restrictions may apply. User is responsible for compliance with all applicable laws. For information about copyright law, please see https://libguides.colostate.edu/copyright.
dc.title: Embodied multimodal referring expressions generation
dc.type: Text
dcterms.rights.dpla: This Item is protected by copyright and/or related rights (https://rightsstatements.org/vocab/InC/1.0/). You are free to use this Item in any way that is permitted by the copyright and related rights legislation that applies to your use. For other uses you need to obtain permission from the rights-holder(s).
thesis.degree.discipline: Computer Science
thesis.degree.grantor: Colorado State University
thesis.degree.level: Doctoral
thesis.degree.name: Doctor of Philosophy (Ph.D.)

Files

Original bundle
Name: Alalyani_colostate_0053A_18518.pdf
Size: 5.32 MB
Format: Adobe Portable Document Format