News • Reasoning like a human

New LLM prompting strategy boosts AI accuracy in healthcare advice

New study finds that mimicking human intuition helps ChatGPT better identify when patients can safely use self-care.

Co-author Marvin Kopka from the Division of Ergonomics, Department of Psychology & Ergonomics (IPA) at Technische Universität Berlin.

Image credit: Marvin Kopka

The study, published in JMIR Biomedical Engineering from JMIR Publications, suggests a paradigm shift in prompt engineering: moving away from computer-focused instructions toward strategies rooted in applied psychology. 

As millions of users turn to tools like ChatGPT for health advice, a persistent issue remains: AI often defaults to emergency or professional care recommendations, even for minor issues, out of extreme caution. This over-triage can lead to unnecessary healthcare costs and patient anxiety. 

The research team, led by Marvin Kopka and Markus A. Feufel, tested 10 different ChatGPT models (including the newest GPT-4o and GPT-5 series) using prompts inspired by Naturalistic Decision-Making (NDM). Unlike conventional, computer-focused prompting logic, NDM describes how human experts make high-stakes decisions under uncertainty. 

The study utilized two specific psychological frameworks: 

  • Recognition-Primed Decision-Making (RPD): Instructing the AI to match the patient’s symptoms to typical cases and mentally simulate the outcome. 
  • Data-Frame Theory: Tasking the AI to build a mental frame of the situation and constantly question it as new data emerges.
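To make the two frameworks concrete, the strategies above can be sketched as system prompts in Python. Note that the prompt wording below is a hypothetical paraphrase of the RPD and Data-Frame ideas, not the study's actual prompts, and `build_messages` is an illustrative helper, not part of any published code:

```python
# Hypothetical NDM-inspired prompts, paraphrasing the two frameworks
# described in the study (not the authors' exact wording).

RPD_PROMPT = (
    "You are assessing a patient's symptoms. "
    "First, match the description to typical cases you recognize. "
    "Then mentally simulate how the most likely case would unfold "
    "before recommending self-care, professional care, or emergency care."
)

DATA_FRAME_PROMPT = (
    "Build a frame (a working explanation) of the patient's situation. "
    "As each new detail arrives, question whether it still fits the frame; "
    "if it does not, revise the frame before giving a recommendation."
)

def build_messages(symptom_description: str, strategy: str = "rpd") -> list[dict]:
    """Pair an NDM-style system prompt with a patient's free-text description."""
    system = RPD_PROMPT if strategy == "rpd" else DATA_FRAME_PROMPT
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": symptom_description},
    ]

# Example triage query; the resulting message list could be sent to any
# chat-completion endpoint.
messages = build_messages("Mild sore throat for two days, no fever.")
```

The point of the sketch is that the "reasoning blueprint" lives entirely in the system prompt; no model fine-tuning is involved.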

Key Results

  • Significant Accuracy Boost: NDM-inspired prompts increased overall accuracy across all models. The most notable gains were in self-care advice, which jumped from a meager 13.4% with standard prompts to nearly 30% with NDM reasoning. 
  • Activating "Thinking" in Simpler Models: Non-reasoning models (which typically failed to identify self-care cases) began providing accurate, nuanced advice when given a "human reasoning blueprint." 
  • Safety Maintained: While the AI became better at identifying when it was safe to stay home, it maintained its high accuracy in identifying true emergencies. 

I hope that applying human decision-making to LLMs will help us develop AI tools that are also useful in real-world decision-making

Marvin Kopka

“When testing AI, we too often give it perfect information and then see that it performs extremely well,” said author Marvin Kopka. “But many problems in the real world are ill-defined. We have good models for how experts make decisions in such situations, so using them as prompts seemed like an obvious next step. I hope that applying human decision-making to LLMs will help us develop AI tools that are also useful in real-world decision-making.” 

The study suggests that in real-world situations, where medical data is often messy or incomplete, a "reasoning blueprint" based on human cognition can be more effective than standard computational logic. By instructing the AI to simulate outcomes and question its own initial "frames" of a situation, the researchers were able to mitigate the common AI tendency toward over-caution. 

While these findings mark a significant step forward in making LLMs more effective partners in clinical decision-making, the team notes that the approach is currently best suited to controlled environments. Future research will be essential to determine whether these NDM-inspired prompts translate into better decision support for everyday users in non-standardized settings. 

The research was conducted by Marvin Kopka and Markus A. Feufel at the Division of Ergonomics, Department of Psychology & Ergonomics (IPA) at Technische Universität Berlin. Their work focuses on human factors and the safe integration of AI into human decision-making environments. Marvin was recently recognized as one of the five winners of the 2025 JMIR Publications Early Career Researcher Award, an honor that underscores the caliber and impact of the research presented in this study. 


Source: JMIR Publications 

13.05.2026
