
Case Study: Using Synthetic Datasets to Examine Bias in Machine Learning Algorithms for Resume Screening


Conference: 2025 ASEE Annual Conference & Exposition

Location: Montreal, Quebec, Canada

Publication Date: June 22, 2025

Start Date: June 22, 2025

End Date: August 15, 2025

Conference Session: Engineering Ethics Division (ETHICS) Technical Session - Ethics in ML/AI

Tagged Division: Engineering Ethics Division (ETHICS)

Tagged Topic: Diversity

Page Count: 13

DOI: 10.18260/1-2--56071

Permanent URL: https://peer.asee.org/56071


Paper Authors

Annika Haughey, Duke University

Brian P. Mann, Duke University

Dr. Brian Mann is an endowed Professor of Mechanical Engineering at Duke University. He received his BS degree in 1996 from the University of Missouri prior to accepting a position with McDonnell Douglas Corporation. Three years later, he accepted a pos

Siobhan Oca, Duke University (orcid.org/0000-0002-1370-0036)

Siobhan Rigby Oca is the director of master studies and an assistant professor of the practice in the Thomas Lord Department of Mechanical Engineering and Materials Science at Duke University, NC, USA. She received her B.Sc. from Massachusetts Institute of Technology and Master in Translational Medicine from the Universities of California Berkeley and San Francisco. She completed her Ph.D. in Mechanical Engineering in 2022 from Duke University. Her research interests include applied medical robotics, human robot interaction, and robotics education.


Abstract

The increasing use of artificial intelligence (AI) in recruitment, particularly through resume screening algorithms [1], has raised significant ethical concerns. These systems, designed to automate the hiring process by filtering and ranking candidates, rely heavily on machine learning (ML) techniques and historical data to make decisions. However, they can unintentionally perpetuate biases present in that data, leading to discriminatory outcomes. A well-known example is Amazon’s hiring tool [2], which was found to favor male candidates over female candidates as a result of being trained on biased historical hiring data.

In this case study, we developed a synthetic dataset designed to mimic the one used in the Amazon case in order to explore similar bias issues. The dataset consists of artificial resumes generated to reflect a diverse applicant pool; each resume contains demographic information, previous work history, education, and skills and activities. Using this dataset, we trained a machine learning algorithm to rank candidates based on the resumes of current employees at a fictional company. The algorithm was implemented in a Jupyter notebook, which allowed students to later modify and interact with it. The dataset and the corresponding training code are available on GitHub [3].
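To make the setup concrete, the core idea of "ranking applicants by their resemblance to current employees" can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' notebook: the skill names, weights, and the frequency-based scoring rule are all invented for the example.

```python
import random

random.seed(0)

# Hypothetical resume features; the real dataset also includes demographics,
# work history, education, and activities.
SKILLS = ["python", "cad", "matlab", "leadership", "welding", "statistics"]

def make_resume(is_current_employee):
    # Current employees at the fictional company skew toward certain skills,
    # mimicking the historical hiring patterns baked into the training data.
    weights = [0.8, 0.7, 0.6, 0.3, 0.2, 0.3] if is_current_employee else [0.5] * 6
    return {s for s, w in zip(SKILLS, weights) if random.random() < w}

# "Training": estimate how often each skill appears among current employees.
employees = [make_resume(True) for _ in range(200)]
skill_freq = {s: sum(s in r for r in employees) / len(employees) for s in SKILLS}

def score(resume):
    # Rank applicants by how closely they resemble the existing workforce.
    return sum(skill_freq[s] for s in resume)

applicants = [make_resume(False) for _ in range(10)]
ranked = sorted(applicants, key=score, reverse=True)
```

Because the model rewards resemblance to whoever was hired before, any skew in the employee pool is reproduced in the ranking, which is exactly the failure mode the case study is built around.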

As with Amazon’s tool, the algorithm began to exhibit biased decision-making, favoring certain demographics over others. We took special care to highlight that even when explicit demographic information was excluded from the resumes, the algorithm still learned to discriminate against previously underrepresented groups. This allowed us to draw attention to the ethical implications of deploying such AI tools in the hiring process and, more broadly, to the dangers of using AI in any decision-making capacity that can profoundly affect individuals' lives. The exercise also taught essential problem-solving techniques for addressing these challenges, equipping students with practical tools for developing and evaluating machine learning algorithms in a more ethical manner.
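The mechanism behind "bias without explicit demographics" is proxy correlation: a resume feature correlated with group membership lets the model absorb historical bias even when the group column is withheld. The sketch below demonstrates this with invented numbers (the group labels, proxy probabilities, and hiring rule are all hypothetical, not drawn from the paper's dataset), and closes with a simple demographic-parity style comparison of average scores by group.

```python
import random

random.seed(1)

def make_applicant():
    group = random.choice(["A", "B"])
    # Hypothetical proxy feature correlated with group B membership
    # (e.g., a gendered club or activity named on the resume).
    proxy = random.random() < (0.8 if group == "B" else 0.1)
    skill = random.random()  # genuinely job-relevant signal
    # Biased historical outcome: at equal skill, group B was hired less often.
    hired = skill + (0.3 if group == "A" else 0.0) > 0.6
    return {"group": group, "proxy": proxy, "skill": skill, "hired": hired}

history = [make_applicant() for _ in range(5000)]

def hire_rate(rows):
    return sum(r["hired"] for r in rows) / max(len(rows), 1)

# "Model": hire rate conditioned only on the proxy -- the explicit group
# column is never shown to it.
model = {flag: hire_rate([r for r in history if r["proxy"] == flag])
         for flag in (True, False)}

def predict_score(applicant):
    return model[applicant["proxy"]]

# Fairness check: average predicted score per (withheld) group.
fresh = [make_applicant() for _ in range(5000)]
sel_rate = {
    g: sum(predict_score(a) for a in fresh if a["group"] == g)
       / sum(1 for a in fresh if a["group"] == g)
    for g in ("A", "B")
}
# Group B scores lower on average even though the model never saw "group".
```

Comparing selection rates across groups the model never saw is one of the simplest bias audits students can run and modify in the notebook setting the abstract describes.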

This exercise serves as an interactive framework for students to engage with real-world ethical dilemmas in AI and machine learning. Through this case study, students learned about the importance of ethical oversight in engineering practices, particularly in the development and application of algorithms. They gained hands-on experience in recognizing and addressing bias in AI systems, offering valuable lessons they can carry into professional practice.

We first introduced this case study in a graduate-level course on ethics in automation. However, the case study can be integrated into various other courses, including engineering ethics, machine learning, and data science. It offers an accessible and engaging way to teach both technical and ethical concepts, making it ideal for undergraduate and graduate courses that emphasize the intersection of technology and ethics.

[1] B. Spar and I. Plentenyuk, "Global Recruiting Trends 2018: The 4 Ideas Changing How You Hire," LinkedIn, Jan. 2018. [Online]. Available: https://news.linkedin.com/2018/1/global-recruiting-trends-2018

[2] "Insight: Amazon scraps secret AI recruiting tool that showed bias against women," Reuters, Oct. 2018. [Online]. Available: https://www.reuters.com/article/world/insight-amazon-scraps-secret-ai-recruiting-tool-that-showed-bias-against-women-idUSKCN1MK0AG/.

[3] "EthicsInAI," GitHub repository. [Online]. Available: https://github.com/annikaLindstrom/EthicsInAI

Haughey, A., Mann, B. P., & Oca, S. (2025, June), Case Study: Using Synthetic Datasets to Examine Bias in Machine Learning Algorithms for Resume Screening. Paper presented at 2025 ASEE Annual Conference & Exposition, Montreal, Quebec, Canada. 10.18260/1-2--56071

ASEE holds the copyright on this document. It may be read by the public free of charge. Authors may archive their work on personal websites or in institutional repositories with the following citation: © 2025 American Society for Engineering Education. Other scholars may excerpt or quote from these materials with the same citation. When excerpting or quoting from Conference Proceedings, authors should, in addition to noting the ASEE copyright, list all the original authors and their institutions and name the host city of the conference.