AI Bias Reflects and Reinforces Gender and Age Stereotypes in the Workplace
A groundbreaking study published in Nature on October 8 has uncovered troubling evidence that artificial intelligence systems like ChatGPT perpetuate harmful gender and age biases in professional settings. When researchers asked ChatGPT to generate resumes for candidates with female names such as Allison Baker or Maria Garcia versus male names like Matthew Owens or Joe Alvarez, the AI consistently made the female candidates appear 1.6 years younger, on average, than their male counterparts. More troubling still, in a self-fulfilling cycle, the AI then ranked these artificially younger female applicants as less qualified than the male applicants, demonstrating how AI systems can amplify existing societal prejudices.
The bias the AI revealed doesn’t reflect reality. According to U.S. Census data, male and female employees in the United States are approximately the same age across the workforce. More telling still, the researchers found that ChatGPT’s bias persisted even in industries where women typically skew older than men, such as sales and service. Computer scientist Danaé Metaxa of the University of Pennsylvania, who wasn’t involved in the study, notes that while discrimination against older women in the workforce has long been recognized, it has been difficult to prove quantitatively until now. Metaxa explains that this “gendered ageism” has significant real-world implications: “It’s a notable and harmful thing for women to see themselves portrayed as if their lifespan has a story arc that drops off in their 30s or 40s.”
The research team employed multiple methodologies to demonstrate how skewed information inputs distort AI outputs. They began by having more than 6,000 human coders judge the ages of individuals in nearly 1.4 million online images and videos spanning a range of occupations. The coders consistently rated women as younger than men, and the bias appeared most strongly in prestigious occupations such as doctor and CEO, suggesting that people perceive older men, but not older women, as authoritative figures. To rule out visual confounds such as image filters or cosmetics, the team also analyzed text using nine language models. This analysis revealed that less prestigious job titles like “secretary” and “intern” were linguistically associated with younger women, while prestigious titles such as “chairman” and “director of research” were linked with older men.
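To make the text-analysis step concrete, here is a minimal sketch of one way such title-to-demographic associations can be probed with a single open embedding model. The sentence-transformers package, the all-MiniLM-L6-v2 model, and the anchor phrases are illustrative assumptions; the study’s own analysis aggregated associations across nine language models.

```python
# Minimal sketch of a text-association probe, NOT the study's exact pipeline.
# Assumes the sentence-transformers package and the all-MiniLM-L6-v2 model.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

titles = ["secretary", "intern", "chairman", "director of research"]
# Anchor phrases whose embeddings define a "young female" vs. "old male" axis.
poles = ["a young woman", "an old man"]

title_vecs = model.encode(titles, normalize_embeddings=True)
pole_vecs = model.encode(poles, normalize_embeddings=True)

for title, vec in zip(titles, title_vecs):
    young_f = vec @ pole_vecs[0]  # cosine similarity (vectors are normalized)
    old_m = vec @ pole_vecs[1]
    # Positive scores lean "young female"; negative scores lean "old male".
    print(f"{title:22s} association score: {young_f - old_m:+.3f}")
```

A probe like this only surfaces correlations in one model’s embedding space; the study’s strength came from showing the same pattern across many models.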
The researchers then conducted an experiment with over 450 participants to determine whether online distortions influence people’s real-world beliefs. One group searched for occupation-related images on Google, uploaded them to a database, labeled them by gender, and estimated the ages of those depicted. A control group uploaded random pictures and estimated average ages without visual prompts. The results were clear: participants who viewed pictures of female employees in occupations like mathematics, graphic design, or art teaching estimated the average age in those fields as two years younger than the control group did. Conversely, those who viewed images of male employees estimated the average age as more than half a year older than controls.
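As a rough illustration of how such a treatment-versus-control shift can be quantified, the sketch below compares hypothetical age estimates from the two groups with a standard t-test. The numbers are invented; only the study’s reported effect sizes (about two years for female images, over half a year for male images) are real.

```python
# Illustrative treatment/control comparison with made-up numbers; the study's
# actual estimates come from its 450+ human participants.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Hypothetical age estimates (in years) for one occupation:
control = rng.normal(loc=45.0, scale=8.0, size=100)       # no image prompt
viewed_women = rng.normal(loc=43.0, scale=8.0, size=100)  # saw female employees

shift = viewed_women.mean() - control.mean()
t, p = stats.ttest_ind(viewed_women, control)
print(f"estimated shift: {shift:+.1f} years (t={t:.2f}, p={p:.3f})")
```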
To test ChatGPT directly, the team prompted it to generate resumes for 54 different occupations using 16 female and 16 male names, producing almost 17,300 resumes per gender group. When asked to score these resumes on a scale of 1 to 100, the AI consistently generated younger, less experienced profiles for women and then gave those resumes lower scores. Study co-author Douglas Guilbeault, a computational social scientist at Stanford University, suggests these findings may help explain the persistence of the glass ceiling for women in the workplace: “Organizations that are trying to be diverse hire young women and they don’t promote them.” Notably, the research also showed that these biases harm everyone: ChatGPT scored resumes from young men lower than those from young women.
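For readers who want to probe this behavior themselves, here is a minimal sketch of a generate-then-score audit against a chat model. The openai Python client, the “gpt-4o-mini” model name, and the prompt wording are assumptions for illustration, not the study’s protocol, which ran at far larger scale and with careful controls.

```python
# Minimal sketch of a generate-then-score audit; NOT the study's protocol.
# Assumes the openai Python client and an OPENAI_API_KEY in the environment;
# "gpt-4o-mini" is an assumed model name for illustration.
from openai import OpenAI

client = OpenAI()

def ask(prompt: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

def generate_resume(name: str, occupation: str) -> str:
    # Step 1: have the model invent a resume for a named candidate.
    return ask(f"Write a realistic resume for {name}, "
               f"a candidate applying to work as a {occupation}.")

def score_resume(resume: str, occupation: str) -> str:
    # Step 2: have the model rate its own output on a 1-100 scale.
    return ask(f"On a scale of 1 to 100, how qualified is this candidate "
               f"for a {occupation} role? Reply with only the number.\n\n"
               f"{resume}")

resume = generate_resume("Allison Baker", "software engineer")
print(score_resume(resume, "software engineer"))
```

Running many such pairs across gendered names and occupations, then comparing the ages, experience levels, and scores the model produces, is the basic shape of the audit the paper describes.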
In an accompanying perspective article, sociologist Ana Macanovic of the European University Institute warns that as AI becomes more prevalent in daily life, such biases are likely to intensify unless addressed. Guilbeault points out that companies like Google and OpenAI typically approach bias correction one dimension at a time, tackling racism or sexism individually, but this approach fails to address intersecting biases such as those involving gender and age, or race and class. “Real discrimination comes from the combination of inequalities,” Guilbeault explains, noting that a narrow push to increase representation along one dimension can inadvertently reinforce stereotypes along another, for example, online imagery that depicts predominantly wealthy white people and poor Black people. As AI continues to shape our world, recognizing and addressing these complex, interconnected biases becomes increasingly crucial to building truly fair systems.