Linear mappings: semantic transfer from transformer models for cognate detection and coreference resolution
dc.contributor.author | Nath, Abhijnan, author | |
dc.contributor.author | Krishnaswamy, Nikhil, advisor | |
dc.contributor.author | Blanchard, Nathaniel, committee member | |
dc.contributor.author | King, Emily J., committee member | |
dc.date.accessioned | 2023-01-21T01:24:11Z | |
dc.date.available | 2023-01-21T01:24:11Z | |
dc.date.issued | 2022 | |
dc.description.abstract | Embeddings, or vector representations of language, and their properties are useful for understanding how Natural Language Processing technology works. The usefulness of embeddings, however, depends on how contextualized or information-rich such embeddings are. In this work, I apply a novel affine (linear) mapping technique, first established in the field of computer vision, to embeddings generated from large Transformer-based language models. In particular, I study its use in two challenging linguistic tasks: cross-lingual cognate detection and cross-document coreference resolution. Cognate detection for two Low-Resource Languages (LRLs), Assamese and Bengali, is framed as a binary classification problem using semantic (embedding-based), articulatory, and phonetic features. Linear maps for this task are extrinsically evaluated on the extent of transfer of semantic information between monolingual as well as multilingual models, including those specialized for low-resourced Indian languages. For cross-document coreference resolution, whole-document contextual representations are generated for event and entity mentions from cross-document language models like CDLM and other BERT variants, and are then linearly mapped to form coreferring clusters based on their cosine similarities. I evaluate my results against gold output using established coreference metrics like BCUB and MUC. My findings reveal that linearly transforming vectors from one model's embedding space to another carries certain semantic information with high fidelity, suggesting the existence of a canonical embedding space and characteristic geometric properties for language models. Interestingly, even for a much more challenging task like coreference resolution, linear maps are able to transfer semantic information between "lighter," less contextual models and "larger" models with near-equivalent performance, or even improved results in some cases. | |
dc.format.medium | born digital | |
dc.format.medium | masters theses | |
dc.identifier | Nath_colostate_0053N_17510.pdf | |
dc.identifier.uri | https://hdl.handle.net/10217/235958 | |
dc.language | English | |
dc.language.iso | eng | |
dc.publisher | Colorado State University. Libraries | |
dc.relation.ispartof | 2020- | |
dc.rights | Copyright and other restrictions may apply. User is responsible for compliance with all applicable laws. For information about copyright law, please see https://libguides.colostate.edu/copyright. | |
dc.subject | coreference resolution | |
dc.subject | low-resource languages | |
dc.subject | transformer | |
dc.subject | language models | |
dc.subject | affine transformation | |
dc.subject | semantics | |
dc.title | Linear mappings: semantic transfer from transformer models for cognate detection and coreference resolution | |
dc.type | Text | |
dc.type | Image | |
dcterms.rights.dpla | This Item is protected by copyright and/or related rights (https://rightsstatements.org/vocab/InC/1.0/). You are free to use this Item in any way that is permitted by the copyright and related rights legislation that applies to your use. For other uses you need to obtain permission from the rights-holder(s). | |
thesis.degree.discipline | Computer Science | |
thesis.degree.grantor | Colorado State University | |
thesis.degree.level | Masters | |
thesis.degree.name | Master of Science (M.S.) |
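The affine-mapping idea summarized in the abstract — learning a linear-plus-bias transform from one model's embedding space to another and judging transfer by cosine similarity — can be sketched in a few lines. This is a minimal illustration on random toy vectors, not the thesis's actual models, data, or dimensions; the least-squares fit and all sizes here are assumptions for demonstration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for embeddings of the same mentions from two models
# (sizes and data are illustrative, not from the thesis).
src = rng.normal(size=(200, 64))   # source-model embeddings, n x d_src
tgt = rng.normal(size=(200, 32))   # target-model embeddings, n x d_tgt

# Learn an affine map src -> tgt by ordinary least squares:
# appending a bias column makes the map affine rather than purely linear.
X = np.hstack([src, np.ones((src.shape[0], 1))])
W, *_ = np.linalg.lstsq(X, tgt, rcond=None)

mapped = X @ W  # source embeddings projected into the target space

def cosine_rows(a, b):
    """Row-wise cosine similarity between two matrices of vectors."""
    return np.sum(a * b, axis=1) / (
        np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1)
    )

# High similarity between mapped and true target vectors would indicate
# that the affine map preserves semantic structure across spaces.
sims = cosine_rows(mapped, tgt)
print(sims.mean())
```

On real embeddings, pairs of aligned vectors (e.g., the same cognate or mention encoded by both models) would replace the random matrices, and the resulting similarities could feed a downstream classifier or clustering step as described in the abstract.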
Files
Original bundle (1 of 1)
- Name: Nath_colostate_0053N_17510.pdf
- Size: 1.44 MB
- Format: Adobe Portable Document Format