Feature Bias in Machine Learning Models: An In-depth Exploration for Software Engineering Tasks
- Ji, Rigesi
- Advisor(s): Ahmed, Iftekhar
Abstract
The increasing adoption of machine learning techniques in software engineering research promises to improve software development practices by automating tasks such as defect prediction, code completion, and bug localization. However, the susceptibility of these models to feature bias can significantly affect their performance and reliability. This dissertation examines machine learning in software engineering, focusing on the effects of feature bias on model performance. Feature bias, characterized by an uneven distribution of features in training datasets, can inadvertently skew the results of machine learning models, leading to inaccurate predictions.
In the first of the three studies, we investigate the presence and implications of feature bias in software engineering tasks. Our findings indicate that feature bias is not merely a theoretical concern but a tangible issue that can significantly hamper the performance of machine learning models. By analyzing both traditional statistical models and advanced deep learning algorithms, we demonstrate the pervasive nature of feature bias. These findings are consequential given the increasing reliance on machine learning models in software engineering: ensuring the performance and reliability of these models is paramount, and understanding the role of feature bias is therefore crucial.
In our second study, rather than viewing feature bias as a mere impediment, we harness its characteristics as an advantage. We examine the mispredictions of machine learning models, using feature bias as a lens to interpret these inaccuracies. This approach allows us to gain deeper insight into the regions where models are most susceptible to errors. Building on these insights, we introduce a novel technique aimed at bolstering model performance, especially in regions that are traditionally vulnerable to mispredictions. This proactive approach not only mitigates the negative effects of feature bias but also leverages it to refine and enhance model accuracy.
In the third segment of our research, we examine feature bias within Transformer-based models. These recent advancements in machine learning have set benchmarks in various software engineering tasks, such as code clone detection, code generation, and code translation. Central to their functionality is the attention mechanism, which allows them to focus on relevant input segments during training and prediction. Despite their impressive performance, we investigate whether an 'attention bias' exists during prediction. Our findings reveal a notable attention bias toward specific source code tokens, potentially affecting model efficacy in software engineering tasks. In response, we devise a strategy to enhance Transformer-based model performance by directing attention to crucial source code tokens, aiming to bolster their reliability in real-world applications.
Our study underscores the importance of recognizing and mitigating both feature bias and attention bias when building machine learning models for software engineering tasks. The methodologies we introduce enhance the efficacy and dependability of these models, making them better suited for deployment in practical software engineering scenarios.