Conference Location: Fairfield, Connecticut
Publication Date: April 19, 2024
Conference Start Date: April 19, 2024
Conference End Date: April 20, 2024
Page Count: 10
DOI: 10.18260/1-2--45767
Permanent URL: https://peer.asee.org/45767
Download Count: 149
DANUSHKA BANDARA received his bachelor's degree in Electrical Engineering from the University of Moratuwa, Sri Lanka, in 2009. He received his master's degree in Computer Engineering and his Ph.D. in Electrical and Computer Engineering from Syracuse University, Syracuse, NY, USA, in 2013 and 2018, respectively. From 2019 to 2020, he worked as a Data Scientist at Corning Incorporated, Corning, NY, USA. Currently, he is an Assistant Professor of Computer Science and Engineering at Fairfield University, Fairfield, CT, USA. His current research interests include applied machine learning, bioinformatics, human-computer interaction, and computational social science.
In this study, we investigate the types of stereotypical bias in Large Language Models (LLMs). We highlight the risks of ignoring bias in LLMs, which range from perpetuating stereotypes to affecting hiring decisions, medical diagnostics, and criminal justice outcomes. To address these issues, we propose an approach to evaluating bias in LLMs using the metrics developed for StereoSet [1]. Our experiments involve evaluating several proprietary and open-source LLMs (GPT-4, Gemini Pro, OpenChat, Llama) for stereotypical bias and examining the attributes that influence bias. We selected 100 prompts from the StereoSet dataset and used them to query the LLMs via their respective APIs. The results were evaluated using the language modeling score, the stereotype score, and the combined iCAT score [1]. In particular, the open-source LLMs showed higher levels of bias in handling stereotypes than the proprietary LLMs (an average stereotype score of 40% for the open-source LLMs versus 47% for the proprietary ones, where 50% is the ideal, unbiased stereotype score). The language modeling score was comparable across models, with the open-source models achieving 94% and the proprietary ones 91%. The combined average iCAT score was 76.6% for the proprietary models and 62.5% for the open-source models. This disparity in stereotypical bias could be due to the regulatory scrutiny and user testing through reinforcement learning from human feedback (RLHF) that the proprietary models are subject to. We present our findings and discuss their implications for mitigating bias in LLMs. Overall, this research contributes to the understanding of bias in LLMs and provides insights into strategies for improving fairness and equity in NLP applications.
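To make the scoring concrete, the sketch below (not the authors' evaluation code; the input format and label names are illustrative assumptions) shows how the StereoSet metrics combine, following the definitions in [1]: the language modeling score (lms) is the share of prompts where the model prefers a meaningful continuation over the unrelated one, the stereotype score (ss) is the share of meaningful preferences that are stereotypical, and the idealized CAT score is iCAT = lms * min(ss, 100 - ss) / 50.

    # Minimal sketch of the StereoSet metrics (illustrative only; assumes each
    # LLM response has already been labeled with the continuation it preferred).
    from typing import List

    def stereoset_scores(labels: List[str]) -> dict:
        """labels: one entry per prompt, each 'stereotype', 'anti-stereotype',
        or 'unrelated', indicating which continuation the LLM preferred."""
        total = len(labels)
        meaningful = [x for x in labels if x in ("stereotype", "anti-stereotype")]

        # Language modeling score: % of prompts where a meaningful
        # continuation is preferred over the unrelated one.
        lms = 100.0 * len(meaningful) / total

        # Stereotype score: % of meaningful preferences that are stereotypical
        # (50% is the ideal, unbiased value).
        ss = 100.0 * sum(x == "stereotype" for x in meaningful) / len(meaningful)

        # Idealized CAT score from Nadeem et al. [1]: a perfectly unbiased
        # model (ss = 50) keeps its full lms; a fully biased one scores 0.
        icat = lms * min(ss, 100.0 - ss) / 50.0
        return {"lms": lms, "ss": ss, "icat": icat}

    # Example with hypothetical labels for a handful of prompts:
    print(stereoset_scores(["stereotype", "anti-stereotype", "stereotype", "unrelated"]))

Under this formula, a model can score well on iCAT only by being both a capable language model (high lms) and close to the unbiased 50% stereotype score, which is why the proprietary models' iCAT advantage reported above follows from their stereotype scores despite slightly lower language modeling scores.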
[1] Nadeem, M., Bethke, A., & Reddy, S. (2020). StereoSet: Measuring stereotypical bias in pretrained language models. arXiv preprint arXiv:2004.09456.
Cao, C., & Bandara, D. (2024, April). Evaluating Stereotypical Biases and Implications for Fairness in Large Language Models. Paper presented at the 2024 ASEE North East Section, Fairfield, Connecticut. 10.18260/1-2--45767