Embodied multimodal referring expressions generation
dc.contributor.author | Alalyani, Nada H., author | |
dc.contributor.author | Krishnaswamy, Nikhil, advisor | |
dc.contributor.author | Ortega, Francisco, committee member | |
dc.contributor.author | Blanchard, Nathaniel, committee member | |
dc.contributor.author | Wang, Haonan, committee member | |
dc.date.accessioned | 2024-09-09T20:52:12Z | |
dc.date.available | 2024-09-09T20:52:12Z | |
dc.date.issued | 2024 | |
dc.description.abstract | Using both verbal and non-verbal modalities to generate definite descriptions of objects and locations is a critical human capability in collaborative interactions. Despite advances in AI, embodied interactive virtual agents (IVAs) are not equipped to mix modalities intelligently to communicate their intents as humans do, which hampers naturalistic multimodal interaction with IVAs. We introduce SCMRE, a situated corpus of multimodal referring expressions (MREs) intended for training generative AI systems for multimodal IVAs. Our contributions include: 1) developing an IVA platform that interprets human multimodal instructions and responds with language and gestures; 2) presenting 24 participants with 10 scenes, each involving 10 equally sized blocks randomly placed on a table, interactions that generated a dataset of 10,408 samples; 3) analyzing SCMRE, which reveals that the use of pointing significantly reduces the ambiguity of prompts and increases the efficiency with which the IVA executes humans' prompts; 4) augmenting and synthesizing SCMRE to generate more data for model training, resulting in 22,159 samples; 5) fine-tuning LLaMA 2-chat-13B to generate contextually correct and situationally fluent multimodal referring expressions; 6) integrating the fine-tuned model into the IVA to evaluate the success of the generative model-enabled IVA in communicating with humans; 7) establishing an evaluation process that applies to both humans and IVAs and combines quantitative and qualitative metrics. | |
dc.format.medium | born digital | |
dc.format.medium | doctoral dissertations | |
dc.identifier | Alalyani_colostate_0053A_18518.pdf | |
dc.identifier.uri | https://hdl.handle.net/10217/239275 | |
dc.language | English | |
dc.language.iso | eng | |
dc.publisher | Colorado State University. Libraries | |
dc.relation.ispartof | 2020- | |
dc.rights | Copyright and other restrictions may apply. User is responsible for compliance with all applicable laws. For information about copyright law, please see https://libguides.colostate.edu/copyright. | |
dc.title | Embodied multimodal referring expressions generation | |
dc.type | Text | |
dcterms.rights.dpla | This Item is protected by copyright and/or related rights (https://rightsstatements.org/vocab/InC/1.0/). You are free to use this Item in any way that is permitted by the copyright and related rights legislation that applies to your use. For other uses you need to obtain permission from the rights-holder(s). | |
thesis.degree.discipline | Computer Science | |
thesis.degree.grantor | Colorado State University | |
thesis.degree.level | Doctoral | |
thesis.degree.name | Doctor of Philosophy (Ph.D.) |
Files
Original bundle
- Name: Alalyani_colostate_0053A_18518.pdf
- Size: 5.32 MB
- Format: Adobe Portable Document Format