eScholarship
Open Access Publications from the University of California

UC Santa Barbara

UC Santa Barbara Electronic Theses and Dissertations

Computational Methods for Next-Generation Online Media Ecosystems

Abstract

Human biases have found their way into our digital footprints. Human corpora and human forms of expression mirror biases inherent in societies explicitly or implicitly.

Now, information all around us is tainted with these biases. This has led to severe consequences from a technological perspective. First, social and cultural biases have found their way into technology, and in particular into automated tools that rely on human-generated data, leading to discriminatory systems [1]. Second, while online information ecosystems provide freedom of expression and give voice to individuals, they have also suffered a wave of disorder due to the prevalence of malevolent online misuse, manifested as hate speech and online misinformation, such as fake news. These problems are motivated by bias and present unprecedented challenges because they "cannot be solved in a traditional linear fashion, since the problem definition evolves as new possible solutions are considered and/or implemented" [2]. In this thesis, we investigate the digital representations of these prejudices, including issues of gender equality and hate speech.

In the first part of the thesis, we begin by analyzing stories of women sharing their harassment experiences and show how targets of gender-based violence utilize social media to shift their cognitive states by leveraging storytelling. We then move into studying gender bias representations in Natural Language Processing. We provide a comprehensive review of current methods that attempt to debias corpora and prevent bias amplification in machine learning models. We then show that current Neural Relation Extraction systems exhibit gender bias.

In the next part of the thesis, we focus on online hate speech and its nuances on social media. To design automated hate speech detection systems, we must empirically study existing instances of hate speech. We present the first set of online hate speech studies that investigate hate instigators and hate targets, linguistic properties of directed and generalized hate speech, and online hate communities. As a result of this work, we make publicly available a high-precision dataset of 28K tweets, currently the largest Twitter hate speech dataset available to the research community. Our work includes a one-of-a-kind set of analyses pertaining to hate speech that have impacted the design of hate speech detection systems by improving the F1 score of hate speech detection and classification systems in online social media by an average of 10%.

Our work enables the design of next-generation systems for hate speech detection and for gender bias detection and mitigation. We conclude with an overview of our key findings and a discussion of future research directions inspired by the work in this dissertation.
