eScholarship
Open Access Publications from the University of California

UC Merced Electronic Theses and Dissertations

Deep Representation Learning for Multimodal Data Retrieval

Creative Commons Attribution (CC BY) 4.0 license
Abstract

This dissertation explores the potential of deep representation learning for semantic matching across multimodal data, an area of growing relevance as digital information becomes increasingly diverse. With a core emphasis on improving the effectiveness of retrieval systems, this research examines two types of deep representations in depth: invariant representations and multimodal representations.

Invariant representations make deep representations robust to variations inherent in the data, such as changes in image content, capture time, and orientation. This dissertation develops invariant representations that remain reliable despite temporal shifts in the data and highlights their utility in a variety of real-world applications. Their applicability is further examined in overhead image geolocalization.
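To make the geolocalization setting concrete, the sketch below treats it as cross-view metric learning: a ground-level query and overhead reference images are embedded into a shared space, and references are ranked by cosine similarity. The two-branch encoders and the in-batch triplet loss are illustrative assumptions, not the dissertation's specific architecture or objective.

```python
# Minimal sketch: overhead image geolocalization as cross-view metric learning.
# Ground-level and overhead images are embedded into a shared space; retrieval
# ranks overhead references by cosine similarity to the ground-level query.
import torch
import torch.nn as nn
import torch.nn.functional as F

def make_encoder(embed_dim: int = 128) -> nn.Module:
    """Small CNN encoder; in practice a pretrained backbone would be used."""
    return nn.Sequential(
        nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
        nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        nn.Linear(64, embed_dim),
    )

class CrossViewModel(nn.Module):
    """Two-branch model: one encoder per view (ground-level vs. overhead)."""
    def __init__(self, embed_dim: int = 128):
        super().__init__()
        self.ground_enc = make_encoder(embed_dim)
        self.aerial_enc = make_encoder(embed_dim)

    def forward(self, ground, aerial):
        g = F.normalize(self.ground_enc(ground), dim=-1)
        a = F.normalize(self.aerial_enc(aerial), dim=-1)
        return g, a

def triplet_loss(g, a, margin: float = 0.3):
    """In-batch triplet loss: the i-th ground image matches the i-th aerial image."""
    sim = g @ a.t()                                       # cosine similarities (B x B)
    pos = sim.diag().unsqueeze(1)                         # similarity of the true pairs
    neg_mask = ~torch.eye(sim.size(0), dtype=torch.bool)  # exclude the positives
    return F.relu(sim - pos + margin)[neg_mask].mean()    # hinge on every negative

# Toy usage: embed a batch of paired views and compute the loss.
model = CrossViewModel()
ground, aerial = torch.randn(4, 3, 64, 64), torch.randn(4, 3, 64, 64)
g, a = model(ground, aerial)
print(triplet_loss(g, a).item())
```

Invariance to temporal shifts would then amount to the embeddings of the same location staying close even when the images are captured at different times; how that invariance is enforced is the subject of the corresponding chapters.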

Concurrently, multimodal representations aim to establish semantic correspondences across different data modalities, including text, images, and tabular data. Multimodal representations enable many applications, such as search engines that interleave images and text, and recommendation systems. We propose a framework for learning composed image-text representations. This approach combines the visual and textual modalities to enrich the search experience, enabling image retrieval guided by textual feedback. Because recommendation systems are complex, optimizing the retrieval model alone does not always improve overall performance. We therefore propose a multi-task learning approach to multimodal representation learning that addresses this challenge and yields more accurate semantic matching.
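To illustrate what composed image-text retrieval can look like, the sketch below fuses a query-image embedding with a text-feedback embedding through a simple gated residual combination and matches the result against candidate image embeddings with an in-batch contrastive loss. The encoders, fusion rule, and loss here are assumptions for illustration, not the specific framework proposed in the dissertation.

```python
# Minimal sketch: composed image-text retrieval.
# A query image plus a textual modification ("the same dress, but in red") is
# composed into one embedding and matched against target-image embeddings.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ComposedRetrieval(nn.Module):
    """Compose a query image with textual feedback and match target images."""
    def __init__(self, img_dim: int = 512, txt_dim: int = 300, embed_dim: int = 256):
        super().__init__()
        self.img_proj = nn.Linear(img_dim, embed_dim)  # image-feature projection
        self.txt_proj = nn.Linear(txt_dim, embed_dim)  # text-feature projection
        self.gate = nn.Sequential(                     # gated residual fusion
            nn.Linear(2 * embed_dim, embed_dim), nn.Sigmoid()
        )

    def compose(self, img_feat, txt_feat):
        """Fuse query-image and feedback-text features into one query embedding."""
        i = self.img_proj(img_feat)
        t = self.txt_proj(txt_feat)
        g = self.gate(torch.cat([i, t], dim=-1))
        return F.normalize(g * i + (1 - g) * t, dim=-1)

    def embed_target(self, img_feat):
        """Embed candidate (target) images into the same space."""
        return F.normalize(self.img_proj(img_feat), dim=-1)

def contrastive_loss(query, target, temperature: float = 0.07):
    """In-batch cross-entropy: the i-th query should retrieve the i-th target."""
    logits = query @ target.t() / temperature
    labels = torch.arange(logits.size(0))
    return F.cross_entropy(logits, labels)

# Toy usage with pre-extracted features (e.g., CNN image features, pooled word vectors).
model = ComposedRetrieval()
img_q = torch.randn(8, 512)   # query-image features
txt_q = torch.randn(8, 300)   # textual-feedback features
img_t = torch.randn(8, 512)   # target-image features
loss = contrastive_loss(model.compose(img_q, txt_q), model.embed_target(img_t))
print(loss.item())
```

The same in-batch setup extends naturally to a multi-task variant, where additional heads (for example, a category-prediction loss) share the projection layers; that is one common way the multi-task idea described above can be realized, though the dissertation's formulation may differ.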

Through this extensive exploration of deep representation learning for retrieval tasks, the dissertation illustrates the substantial potential of learning invariant and multimodal representations. It thereby advances current understanding in this rapidly evolving domain and lays the groundwork for future research.
