Fine-Tuning BERT for Sentiment Analysis
- Wang, Michelle Lu
- Advisor(s): Wu, Yingnian
Abstract
The introduction of transformer models has vastly improved the performance of machine learning methods on natural language processing (NLP) tasks. Transformer models use a self-attention mechanism that allows the model to weigh the importance of different words in a sequence when making predictions. They also introduce positional encodings, which allow the model to process tokens in parallel rather than sequentially, expanding the amount of data it can learn from.
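To make the weighting described above concrete, the sketch below (an illustrative assumption for exposition, not code from the paper) computes scaled dot-product self-attention for a single sequence with NumPy; the matrix names and dimensions are made up for the example.

```python
# Minimal scaled dot-product self-attention sketch (illustrative only).
import numpy as np

def self_attention(X, W_q, W_k, W_v):
    """X: (seq_len, d_model) embeddings (positional encodings already added).
    W_q, W_k, W_v: (d_model, d_k) learned projection matrices."""
    Q, K, V = X @ W_q, X @ W_k, X @ W_v            # project tokens to queries/keys/values
    scores = Q @ K.T / np.sqrt(K.shape[-1])        # similarity of every token to every other token
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True) # softmax: per-token attention weights
    return weights @ V                             # weighted sum of value vectors

# Example: 4 tokens with 8-dimensional embeddings.
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
out = self_attention(X, *(rng.normal(size=(8, 8)) for _ in range(3)))
print(out.shape)  # (4, 8)
```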
This paper explores fine-tuning the pre-trained transformer model BERT (Bidirectional Encoder Representations from Transformers) for sentiment analysis on e-commerce reviews. Traditional fine-tuning approaches, which update every one of a pre-trained model's hundreds of millions of parameters, can be inefficient and unnecessary. A parameter-efficient fine-tuning approach is proposed to enhance pre-trained BERT's performance in discerning between positive and negative sentiments within diverse user-generated reviews.
The methodology begins by preprocessing the data, including text cleaning and tokenization, to prepare the dataset for training. Fine-tuning and hyperparameter tuning techniques are then applied to tailor BERT to the specific qualities of the dataset. Smaller subsets of the data are fine-tuned first in order to find optimal hyperparameter settings for fine-tuning on the full dataset. Three BERT-based models are explored: BERT$_{BASE}$, RoBERTa$_{BASE}$, and DistilBERT. Each model is fine-tuned and evaluated to find the one that achieves the highest test accuracy. The paper also examines the obstacles of training with a large dataset, proposing solutions and techniques to circumvent them.
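As a rough illustration of this pipeline, the sketch below tokenizes review text and fine-tunes DistilBERT as a binary sentiment classifier with the Hugging Face Transformers library. The file names, column names, and hyperparameter values are assumptions for demonstration, not the paper's actual configuration, and freezing the encoder is shown only as one simple parameter-efficient option, not necessarily the paper's proposed method.

```python
# Hypothetical fine-tuning sketch; file names, columns, and hyperparameters are assumed.
from datasets import load_dataset
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          TrainingArguments, Trainer)

model_name = "distilbert-base-uncased"  # could also be "bert-base-uncased" or "roberta-base"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# One simple parameter-efficient option (an assumption, not necessarily the paper's method):
# freeze the pre-trained encoder and train only the classification head.
# (For BERT the attribute is model.bert, for RoBERTa it is model.roberta.)
for param in model.distilbert.parameters():
    param.requires_grad = False

# Assumed CSV files with "text" (review) and "label" (0 = negative, 1 = positive) columns.
dataset = load_dataset("csv", data_files={"train": "reviews_train.csv",
                                          "test": "reviews_test.csv"})

def tokenize(batch):
    # Truncate/pad reviews to a fixed length so they can be batched.
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=128)

dataset = dataset.map(tokenize, batched=True)

args = TrainingArguments(
    output_dir="bert-sentiment",
    num_train_epochs=3,               # more epochs generally improved accuracy in the findings below
    per_device_train_batch_size=16,
    learning_rate=2e-5,
)

trainer = Trainer(model=model, args=args,
                  train_dataset=dataset["train"],
                  eval_dataset=dataset["test"])
trainer.train()
print(trainer.evaluate())             # reports test-set metrics (loss by default)
```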
The findings of this paper show that which model variant performs best depends greatly on the needs of the dataset. The larger dataset analyzed in this paper requires a faster, lighter model in order to be processed in its entirety. The experiment also explores more robustly optimized and larger models, which yield adequate results on a smaller data subset but are not suitable for bigger datasets. Hyperparameter settings are shown to affect model performance, but without any distinct patterns. The exception is the number of training epochs, which almost always positively influences model accuracy.