eScholarship
Open Access Publications from the University of California

UC Santa Cruz

UC Santa Cruz Electronic Theses and Dissertations

Islands and Bridges of Language: Bio-Inspired Structural Analysis of Language Embedding Data

Creative Commons 'BY' version 4.0 license
Abstract

In this thesis, I propose a method for applying an agent-based model, the Monte Carlo Physarum Machine (MCPM), to language embedding data. MCPM has previously been applied in astronomy to infer the quasi-fractal structure of the cosmic web. I show that the model offers a distinct lens for understanding, analyzing, and extracting information from language embedding data. I assess the novelty of the algorithm first by identifying the characteristics of the revealed structure through visualization, and then by generating word similarity metrics and comparing them with established similarity metrics. In addition, I propose a visualization tool to help explore the language embedding space in 3D. I argue that both the MCPM method and the visualization tool can assist in examining the structure of language embeddings in the reduced 3D space.
