News • Emergency medicine

Triaging patients: no job for AI (alone)

Doctors and nurses are better at triaging patients in emergency departments than artificial intelligence (AI), according to research presented at the European Emergency Medicine Congress.

Dr Renata Jukneviciene (image source: EUSEM; credit: Dr Renata Jukneviciene)

However, Dr Renata Jukneviciene, a postdoctoral researcher at Vilnius University, Lithuania, who presented the study, said that AI could be useful in conjunction with clinical staff, but should not be deployed as a stand-alone triage tool. “We conducted this study to address the growing issue of overcrowding in the emergency department and the escalating workload of nurses,” said Dr Jukneviciene. “Given the rapid development of AI tools like ChatGPT, we aimed to explore whether AI could support triage decision-making, improve efficiency and reduce the burden on staff in emergency settings.” 

The researchers distributed a paper and digital questionnaire to six emergency medicine doctors and 51 nurses working in the emergency department of Vilnius University Hospital Santaros Klinikos. The participants were asked to triage clinical cases randomly selected from 110 case reports indexed in PubMed. Using the Manchester Triage System, the clinical staff classified each patient into one of five urgency categories, from most to least urgent. The same cases were analysed by ChatGPT (version 3.5). 
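
For readers curious about the mechanics, the sketch below shows one way a case report can be submitted to a large language model for Manchester Triage System (MTS) classification via the OpenAI Python client. The prompt wording and the helper `triage_case` are illustrative assumptions; the study's actual prompt and setup were not published, and `gpt-3.5-turbo` is used here as the API counterpart of ChatGPT 3.5.

```python
# Minimal sketch: asking an LLM for a Manchester Triage System category.
# Assumptions: the `openai` Python package, an OPENAI_API_KEY in the
# environment, and illustrative prompt wording (the study's prompt is unknown).
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

MTS_PROMPT = (
    "You are assisting with emergency department triage. Using the "
    "Manchester Triage System, assign the following case to exactly one "
    "category from 1 (most urgent) to 5 (least urgent). "
    "Answer with the number only.\n\nCase: {case}"
)

def triage_case(case_text: str) -> int:
    """Return the model's MTS category (1-5) for a single case report."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",  # API counterpart of ChatGPT 3.5
        messages=[{"role": "user", "content": MTS_PROMPT.format(case=case_text)}],
        temperature=0,  # minimise run-to-run variation for scoring
    )
    return int(response.choices[0].message.content.strip())
```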

A total of 44 nurses (86.3%) and six doctors (100%) completed the questionnaire. “Overall, AI underperformed compared to both nurses and doctors across most of the metrics we measured,” said Dr Jukneviciene. “For example, AI’s overall accuracy was 50.4%, compared to 65.5% for nurses and 70.6% for doctors. Sensitivity – how well it identified true urgent cases – for AI was also lower at 58.3% compared to nurses, who scored 73.8%, and doctors, who scored 83.0%.” 

Doctors had the highest scores in all the areas and categories of urgency that the researchers analysed. “However, AI did outperform nurses in the first triage category, which covers the most urgent cases; it showed better accuracy and specificity, meaning that it identified the truly life-threatening cases. For accuracy, AI scored 27.3% compared to 9.3% for nurses, and for specificity AI scored 27.8% versus 8.3%.” 
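
To make the quoted figures concrete: overall accuracy is the share of cases placed in the correct category, while sensitivity and specificity for a single triage category are typically computed one-vs-rest. The sketch below shows that convention in Python; the study's exact per-category definitions were not reported, so this is an illustration rather than a reconstruction of its analysis.

```python
# Illustrative metric definitions for multi-category triage (one-vs-rest).
# `assigned` and `reference` are parallel lists of MTS categories (1-5);
# any data passed in here would be hypothetical, not the study's.

def overall_accuracy(assigned: list[int], reference: list[int]) -> float:
    """Share of cases placed in exactly the correct category."""
    return sum(a == r for a, r in zip(assigned, reference)) / len(reference)

def sensitivity_specificity(assigned: list[int], reference: list[int],
                            category: int) -> tuple[float, float]:
    """One-vs-rest sensitivity and specificity for one triage category."""
    tp = sum(a == category and r == category for a, r in zip(assigned, reference))
    fn = sum(a != category and r == category for a, r in zip(assigned, reference))
    tn = sum(a != category and r != category for a, r in zip(assigned, reference))
    fp = sum(a == category and r != category for a, r in zip(assigned, reference))
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0  # true cases flagged
    specificity = tn / (tn + fp) if (tn + fp) else 0.0  # non-cases excluded
    return sensitivity, specificity
```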

The distribution of cases across the five categories of urgency was as follows: 


Triage category   1 (most urgent)   2     3     4     5 (least urgent)
Doctors           9%                21%   29%   23%   18%
Nurses            9%                15%   35%   35%   6%
AI                29%               24%   43%   3%    1%

“These results suggest that while AI generally tends to over-triage, it may be somewhat more cautious in flagging critical cases, which can be both a strength and a drawback,” said Dr Jukneviciene. 
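
Over-triage of the kind described can be quantified simply: count how often a rater assigns a more urgent (numerically lower) category than the reference standard. The sketch below uses made-up ratings, as the study's case-level data are not published.

```python
# Hypothetical illustration of measuring over-triage. In the MTS, a lower
# number means higher urgency, so assigned < reference is an over-triage.

def over_triage_rate(assigned: list[int], reference: list[int]) -> float:
    """Share of cases rated more urgent than the reference standard."""
    return sum(a < r for a, r in zip(assigned, reference)) / len(reference)

# Made-up ratings for five cases:
reference = [3, 2, 4, 5, 3]
ai_rating = [1, 2, 3, 3, 2]
print(f"Over-triage rate: {over_triage_rate(ai_rating, reference):.0%}")  # 80%
```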

Doctors also performed better than AI in cases that required or involved surgery, and in cases treated with medication or other non-invasive therapies. For surgical cases, doctors scored 68.4% for reliability, nurses scored 63% and AI scored 39.5%. For therapeutic cases, doctors scored 65.9% and nurses 44.5%, while AI did better than nurses, scoring 51.9%. 

“While we anticipated that AI might not outperform experienced clinicians and nurses, we were surprised that in some areas AI performed quite well. In fact, in the most urgent triage category, it demonstrated higher accuracy than nurses. This indicates that AI should not replace clinical judgement, but could serve as a decision-support tool in specific clinical contexts and in overwhelmed emergency departments. AI may assist in prioritising the most urgent cases more consistently and in supporting new or less experienced staff. However, excessive triaging could lead to inefficiencies, so careful integration and human oversight are crucial. Hospitals should approach AI implementation with caution and focus on training staff to critically interpret AI suggestions,” concluded Dr Jukneviciene. 

The researchers are planning follow-up studies using newer versions of AI and AI models that are fine-tuned for medical purposes. They want to test them in larger groups of participants, include ECG interpretation, and explore how AI can be integrated into nurse training, specifically for triage and incidents involving mass casualties. 

Limitations of the study include its small number of participants, its single-centre setting, and the fact that the AI analysis took place outside a real-time hospital environment, so it was not possible to assess how it would fit into the daily workflow, interact with patients, assess vital signs or obtain follow-up data. In addition, ChatGPT 3.5 was not trained specifically for medical use. 

Strengths of the study include its use of real clinical cases assessed by a multidisciplinary group of doctors and nurses as well as AI; its accessibility and flexibility, which were increased by distributing the questionnaire both digitally and on paper; its clinical relevance to current healthcare challenges such as overcrowding and staff shortages in the emergency department; and its finding that AI over-triages many patients, assigning them a higher urgency than warranted, which is crucial knowledge for the safe implementation of AI in emergency departments. 

Dr Barbra Backus is chair of the European Society for Emergency Medicine (EUSEM) abstract selection committee. She is an emergency physician in Amsterdam, The Netherlands, and was not involved in the study. She said: “AI has the potential to be a useful tool for many aspects of medical care and it is already proving its worth in areas such as interpreting X-rays. However, it has its limitations, and this study shows very clearly that it cannot replace trained medical staff for triaging patients coming into emergency departments. This does not mean it should not be used, as it could aid in speeding up decision-making. However, it needs to be applied with caution and with oversight from doctors and nurses. I expect AI will improve in the future, but it should be tested at every stage of development.” 


Source: European Society for Emergency Medicine 

01.10.2025

Related articles

News • Uncovering human prejudice

How AI can help detect and reduce bias in emergency medicine

Generative AI can detect and quantify cognitive biases of human caregivers during medical emergency situations. This way, LLMs could help reduce the impact of bias based on gender, age, or ethnicity.

News • Patient-written symptom descriptions

ChatGPT struggles when medical questions are put in layman's terms

Enter symptoms into ChatGPT, receive an accurate diagnosis? Research reveals that LLM AI models are not quite there yet, struggling to identify genetic conditions from patient-written descriptions.

News • Language barriers for health information

Chatbots get less accurate when health queries are not in English

Chatbots like ChatGPT generally deliver serviceable results when asked for healthcare advice. However, new research suggests that their accuracy drops when languages other than English are used.
