
News • Uncovering human prejudice

How AI can help detect and reduce bias in emergency medicine

Human cognitive biases can particularly affect decision-making when speed is of the essence, such as when lives are at stake in a medical emergency.

Researchers tested an advanced generative artificial intelligence (AI) method, trained on patient records covering 480,000 visits to the Bordeaux University Hospital Emergency Department. The findings, presented at the Machine Learning for Health symposium in Vancouver and published in Proceedings of Machine Learning Research, show that the AI reproduces, and can therefore measure, caregiver biases relating to patient gender during triage. These results form a case study of how new generative AI algorithms can be used to identify and understand human cognitive biases.

In emergency care settings that demand rapid decision-making, human cognitive biases, particularly “judgment” biases, can critically affect medical decisions and patient prognosis. These “cognitive shortcuts” occur when people must form opinions or make decisions based on incomplete or over-generalized information. Decision-making can therefore be unconsciously influenced by such biases (related, for example, to sex/gender, age, or ethnicity), leading to under- or overestimation of the severity of a patient’s condition.

So how can we better identify these biases and reduce their impact? One answer may lie in AI, and in particular the generative AI known as large language models (LLMs, such as ChatGPT), which can imitate human decision-making thanks to their mastery of human language. These models are indeed capable of understanding the free text that accounts for a large proportion of the clinical data collected by healthcare staff, particularly in hospital emergency departments.

This research shows how large language models can help detect and anticipate human cognitive biases – in this case with the goal of fairer and more effective management of medical emergencies

Emmanuel Lagarde

A team led by Inserm Research Director Emmanuel Lagarde at the Bordeaux Population Health Research Center (Inserm/University of Bordeaux) investigated the potential of these LLMs to detect and quantify gender bias in a rapid decision-making setting. The context chosen to evaluate the method was the triage of patients in emergency departments. Accurate triage is critical: underestimating an emergency delays treatment and can worsen the prognosis, while overestimating the severity of a patient’s condition can lead to the overuse of resources, which is particularly harmful when many other patients also require attention.

The scientists used an innovative approach in which an AI was trained to triage patients based on the free-text contents of their medical records, thereby reproducing any cognitive biases of the nursing staff who performed this triage. The training data comprised over 480,000 visits to the Emergency Department of Bordeaux University Hospital between January 2013 and December 2021.
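
The published work relied on a generative language model trained on French clinical notes; the exact architecture and data fields are not described here. As a purely illustrative sketch, the same idea, predicting a triage level from free-text emergency notes, could be set up with a standard fine-tuning pipeline along these lines (the model name, the 5-level label scale and the column names are assumptions, not the authors’ choices):

```python
# Illustrative sketch only (not the study's code): fine-tune a pretrained
# French language model to predict a triage level from free-text notes.
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

# Hypothetical training examples; real data would be de-identified records.
records = Dataset.from_dict({
    "text": ["Douleur thoracique irradiant vers le bras gauche ...",
             "Plaie superficielle de la main droite, saignement contrôlé ..."],
    "label": [0, 3],  # assumed 5-level triage scale, 0 = most urgent
})

tokenizer = AutoTokenizer.from_pretrained("camembert-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "camembert-base", num_labels=5)

def tokenize(batch):
    # Convert free text into model inputs, truncating long notes.
    return tokenizer(batch["text"], truncation=True, max_length=256,
                     padding="max_length")

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="triage-model", num_train_epochs=3,
                           per_device_train_batch_size=16),
    train_dataset=records.map(tokenize, batched=True),
)
trainer.train()
```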

Once trained, the model was capable of assigning a triage score (evaluating the severity of the patient’s condition) from the record, as a nurse would. The record was then altered to change the references to the patient’s gender in the clinical text, and the model assigned a new score. The difference between these two scores, one produced from the original record and the other from the altered record, made it possible to estimate the cognitive bias.
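
In essence, the bias estimate comes from a counterfactual comparison: score the note as written, rewrite only the gendered references, score it again, and examine the difference. A minimal sketch of that logic is shown below; the predict_triage function stands in for the trained model, and the toy word list is far cruder than what real clinical text (here, in French) would require:

```python
# Minimal sketch of the counterfactual gender-swap comparison described above
# (not the published code). All names and the swap list are illustrative.
import re

SWAPS = {"she": "he", "her": "his", "woman": "man", "female": "male",
         "he": "she", "his": "her", "man": "woman", "male": "female"}

def swap_gender(text: str) -> str:
    """Replace gendered words with their counterparts (toy version)."""
    pattern = r"\b(" + "|".join(SWAPS) + r")\b"
    def repl(match):
        word = match.group(0)
        swapped = SWAPS[word.lower()]
        # Preserve capitalization of the original word.
        return swapped.capitalize() if word[0].isupper() else swapped
    return re.sub(pattern, repl, text, flags=re.IGNORECASE)

def gender_bias(note: str, predict_triage) -> int:
    """Difference between triage scores of the original and swapped note.

    A non-zero value means the model's assessment changed when only the
    patient's apparent gender changed, i.e. a measurable bias.
    """
    return predict_triage(note) - predict_triage(swap_gender(note))
```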

The results showed the AI to be significantly biased against women: for identical clinical records, the severity of women’s conditions tended to be underestimated compared with men’s (around 5% were classified as “less critical” versus 1.81% as “more critical”), while the severity of men’s conditions tended to be slightly overestimated (3.7% deemed “more critical” versus 2.9% “less critical”). The bias increased with the inexperience of the nursing staff.

“This research shows how large language models can help detect and anticipate human cognitive biases – in this case with the goal of fairer and more effective management of medical emergencies,” explains Lagarde. “The method used shows that, in this context, LLMs are able to identify and reproduce the biases that guide human decision-making from the clinical data collected by nursing staff,” adds Ariel Guerra-Adames, doctoral student and first author of this research.

The team will now go on to evaluate biases related to other patient characteristics (age, ethnic group). Ultimately, it should also be possible to refine the system by introducing non-verbal variables (facial expressions, tone of voice) which, while not necessarily appearing in the written data, could nevertheless be critical in decision-making.


Source: Inserm

06.03.2025
