Transformer, diffusion, and GAN-based augmentations for contrastive learning of visual representations

Abstract

Generative modeling and self-supervised learning have emerged as two of the most prominent fields of study in machine learning in recent years. Generative models learn detailed visual representations that can be used to generate synthetic data. Modern self-supervised learning methods extract high-level visual information from images without labels and apply it to downstream tasks such as object detection and segmentation. As generative models grow more capable, we want to extract their learned knowledge and apply it to downstream tasks. In this work, we develop Generative Contrastive Learning (GCL), a methodology that uses contrastive learning to extract information from modern generative models. We define GCL's high-level components (an encoder, a feature map augmenter, a decoder, a handcrafted augmenter, and a contrastive learning model) and demonstrate how to apply GCL to the three major types of large generative models: GANs, diffusion models, and image transformers. Because generative models are complex and can produce a near-infinite number of unique images, we develop several methodologies for synthesizing images in a manner that complements the augmentation-based learning used in contrastive learning frameworks. Our work shows that these large generative models can be applied to self-supervised learning in a computationally viable manner, without large clusters of high-performance GPUs. Finally, we demonstrate the clear benefit of leveraging generative models in a contrastive learning setting on standard self-supervised learning benchmarks.
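
The abstract names GCL's components (encoder, feature map augmenter, decoder, handcrafted augmenter, contrastive learning model) without specifying how they connect. The sketch below is one plausible wiring of those pieces, assuming a frozen generative model and a SimCLR-style NT-Xent loss; the module choices, function names, and loss are illustrative assumptions, not the thesis implementation.

    # Minimal sketch of a GCL training step under the assumptions above.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class FeatureMapAugmenter(nn.Module):
        """Hypothetical latent-space augmentation: add small Gaussian noise."""
        def __init__(self, noise_std=0.1):
            super().__init__()
            self.noise_std = noise_std

        def forward(self, z):
            return z + self.noise_std * torch.randn_like(z)

    def handcrafted_augment(x):
        """Hypothetical pixel-space augmentation: random horizontal flip."""
        if torch.rand(1).item() < 0.5:
            x = torch.flip(x, dims=[-1])
        return x

    def nt_xent(h1, h2, temperature=0.5):
        """Standard NT-Xent contrastive loss over two batches of embeddings."""
        h = F.normalize(torch.cat([h1, h2], dim=0), dim=1)
        sim = h @ h.t() / temperature
        n = h1.size(0)
        sim.masked_fill_(torch.eye(2 * n, dtype=torch.bool, device=h.device), float('-inf'))
        targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)]).to(h.device)
        return F.cross_entropy(sim, targets)

    def gcl_step(encoder, feat_augmenter, decoder, contrastive_model, images):
        """One GCL step: encode, perturb the feature map, decode two synthetic
        views, apply handcrafted augmentations, and contrast the views."""
        with torch.no_grad():                  # the generative model stays frozen
            z = encoder(images)                # image -> generator feature map
            v1 = decoder(feat_augmenter(z))    # synthetic view 1
            v2 = decoder(feat_augmenter(z))    # synthetic view 2
        v1, v2 = handcrafted_augment(v1), handcrafted_augment(v2)
        h1, h2 = contrastive_model(v1), contrastive_model(v2)
        return nt_xent(h1, h2)

In this reading, only the contrastive learning model receives gradients, which is consistent with the abstract's claim that GCL remains computationally viable without large GPU clusters.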

Subject

generative models
machine learning
self-supervised learning
learning representations
computer vision
neural networks
