Large Language Models (LLMs), such as ChatGPT, have gained popularity in recent years with the advancement of Natural Language Processing (NLP), with use cases spanning many disciplines and everyday life. Their proficiency in comprehending and generating human-like text has driven rapid adoption among users who had not previously used, or even heard of, LLMs. Nevertheless, LLMs inherit explicit and implicit biases from the datasets they were trained on; these biases can include social, ethical, cultural, religious, and other prejudices and stereotypes. The training data includes much of the internet, a place where opinions are pervasive, articles need not be fact-checked, and prejudices may be openly shared. As such, LLM outputs can contain both obvious and subtle biases. It is important to examine such shortcomings comprehensively: identifying the existence and extent of these biases, recognizing their origin, and attempting to mitigate biased outputs in order to produce fairer results and reduce harmful stereotypes and misinformation.

This study inspects and highlights the need to address biases in LLMs amid the growth of generative Artificial Intelligence (AI). We utilize bias-specific benchmarks such as StereoSet and CrowS-Pairs to evaluate the existence of various biases in several language models, including BERT, GPT-3.5, and ADA. We also propose an automated Bias-Identification Framework to recognize social biases in LLMs across dimensions such as gender, race, profession, and religion. To detect both explicit and implicit biases, we adopt a three-pronged approach for a thorough and inclusive analysis. Our analysis began with preliminary experiments to understand the overall scope of bias and confirm its apparent presence; we then applied the three-pronged approach and found encoded bias patterns. Results indicate that fine-tuned models struggle with gender biases but excel at identifying and avoiding racial biases. This may be due to the selection of training data and its limitations, including unequal representation and skewed language patterns.
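As a concrete illustration of how a pairwise benchmark such as CrowS-Pairs can be scored, the sketch below compares the pseudo-log-likelihood that a masked language model assigns to the stereotypical and anti-stereotypical sentence of a pair. The model name, the example pair, and the `sent_more`/`sent_less` field names are illustrative assumptions mirroring the CrowS-Pairs format, not the exact pipeline used in this study.

```python
# Sketch: score a (stereotypical, anti-stereotypical) sentence pair with a
# masked LM's pseudo-log-likelihood and check which sentence the model prefers.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

def pseudo_log_likelihood(sentence: str) -> float:
    """Sum log-probabilities of each token when it is masked in turn."""
    ids = tokenizer(sentence, return_tensors="pt")["input_ids"][0]
    total = 0.0
    for i in range(1, len(ids) - 1):          # skip [CLS] and [SEP]
        masked = ids.clone()
        masked[i] = tokenizer.mask_token_id
        with torch.no_grad():
            logits = model(masked.unsqueeze(0)).logits[0, i]
        total += torch.log_softmax(logits, dim=-1)[ids[i]].item()
    return total

# Hypothetical pair in the CrowS-Pairs format (sent_more = stereotypical).
pair = {"sent_more": "Women are bad at math.",
        "sent_less": "Men are bad at math."}
prefers_stereotype = (pseudo_log_likelihood(pair["sent_more"])
                      > pseudo_log_likelihood(pair["sent_less"]))
print("model prefers stereotypical sentence:", prefers_stereotype)
```

Aggregating this preference over all pairs of a benchmark yields the kind of per-category bias score (gender, race, profession, religion) reported in our analysis.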
Our findings also illustrate that, despite some successes, LLMs often over-rely on keywords in prompts and in their outputs. This demonstrates that LLMs frequently fail to genuinely assess the accuracy and authenticity of what they generate. To further illuminate the ability of the analyzed LLMs to detect biases, we employed Bag-of-Words analyses to uncover indications of stereotyping within specific words and vocabulary. This highlighted another facet of keyword over-reliance: the models treated stereotypical words as carrying a generally negative connotation while ignoring even remotely positive words.
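A minimal sketch of the Bag-of-Words analysis described above is shown below: a linear classifier is fit on sentences labeled as stereotypical or neutral, and the highest-weighted words indicate which vocabulary drives the "stereotypical" judgment. The toy sentences, labels, and use of scikit-learn are illustrative assumptions rather than the study's exact setup.

```python
# Sketch: Bag-of-Words probe for stereotype-associated vocabulary.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

sentences = ["He is a violent criminal.", "She is a caring nurse.",
             "They are hardworking engineers.", "He is lazy and unreliable."]
labels = [1, 1, 0, 1]   # 1 = judged stereotypical, 0 = neutral (hypothetical labels)

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(sentences)
clf = LogisticRegression().fit(X, labels)

# Words with the most positive coefficients are the ones most strongly
# associated with the "stereotypical" label.
weights = clf.coef_[0]
top = np.argsort(weights)[::-1][:5]
vocab = vectorizer.get_feature_names_out()
print([(vocab[i], round(weights[i], 2)) for i in top])
```

Inspecting these top-weighted terms is what revealed the over-reliance pattern: individual negatively connoted words dominate the decision, while positive context words receive little weight.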
Finally, in an attempt to bolster model performance, we applied an enhancement learning strategy that involves fine-tuning models with different prompting techniques, together with data augmentation of the bias benchmarks. Varying the prompting technique allowed us to determine whether results remain consistent when the same prompt is reworded or posed in a different way. We found that fine-tuned models exhibit promising adaptability in cross-dataset testing and significantly improved performance on implicit bias benchmarks, with performance gains of up to 20%.
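The following sketch illustrates the prompt-rewording consistency check described above: the same bias probe is posed through several paraphrased templates, and we report the fraction of rewordings that yield the model's most common completion. GPT-2 and the templates here are stand-in assumptions for the models and prompts actually evaluated.

```python
# Sketch: measure how consistent a model's completions are across reworded prompts.
from collections import Counter
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

templates = [
    "The {group} person who applied for the job was",
    "When a {group} person applied for the job, they were",
    "A {group} applicant for the job seemed",
]

def consistency(group: str) -> float:
    """Fraction of reworded prompts yielding the model's most common completion."""
    answers = []
    for t in templates:
        prompt = t.format(group=group)
        out = generator(prompt, max_new_tokens=3, do_sample=False)[0]["generated_text"]
        answers.append(out[len(prompt):].strip().lower())
    most_common_count = Counter(answers).most_common(1)[0][1]
    return most_common_count / len(answers)   # 1.0 = fully consistent across rewordings

print(consistency("young"))
```

Comparing this consistency score before and after fine-tuning, and across augmented versions of the benchmarks, is how the cross-dataset adaptability and the up-to-20% gains on implicit bias benchmarks were assessed.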