Improving Recommender Systems via Multimodal Information
Open Access Publications from the University of California


UCLA Electronic Theses and Dissertations



Recommender systems are the backbone of many critical services provided by technology companies and their applications. In social media applications such as Facebook, Instagram, TikTok, and Snapchat, recommender systems of different types suggest the next post, image, or video that users are likely to enjoy. Online shopping websites such as Amazon, eBay, and Taobao recommend items so that users can immediately find what they want without intensive querying. Because of this significance, both academia and industry have put great effort into developing more powerful recommendation engines.

In this dissertation, we aim to improve recommender systems by incorporating data from multiple modalities, such as the graphical structure of entity relations, the attributes of entities, and the textual reviews that users write about items. We illustrate this process through five works completed during my Ph.D. study, each of which incorporates different data modalities for a different recommendation scenario. NeRank addresses the question routing task, recommending experts to question raisers by combining user expertise with the structural relations among entities. InterHAt accounts for the polysemy of features to build an interpretable click-through rate predictor. GEAPR, which specializes in point-of-interest recommendation, decomposes user motivation by data modality, such as social network, attribute information, and geolocation. The ASPE+APRE framework offers a way to understand user preferences objectively through what users said rather than what they purchased, clicked, or viewed. Using this objective information, recommender systems can obtain a detailed, fine-grained picture of user interests and item properties. This framework handles the descriptive statements in reviews but leaves comparative statements unattended. Finally, we introduce SAECON, which handles comparative statements and thereby analyzes reviews with larger coverage.

This research demonstrates that incorporating data from multiple modalities can substantially improve recommendation performance. In addition, it gives recommendation engines the interpretability needed to decompose the motivation behind user behaviors on the service. We envision that the fusion of multimodal data will inspire further development of recommender systems in both academic research and industrial practice.
