Lab-trained pathology AI meets real world: ‘mistakes can happen’

Human pathologists are extensively trained to detect when tissue samples from one patient mistakenly end up on another patient’s microscope slides (a problem known as tissue contamination).

But such contamination can easily confuse artificial intelligence (AI) models, which are often trained in pristine, simulated environments, according to a new Northwestern Medicine study. The study, published in the journal Modern Pathology, is the first to examine how tissue contamination affects machine-learning models.

“We train AIs to tell ‘A’ versus ‘B’ in a very clean, artificial environment, but, in real life, the AI will see a variety of materials that it hasn’t trained on. When it does, mistakes can happen,” said corresponding author Dr. Jeffery Goldstein, director of perinatal pathology and an assistant professor of perinatal pathology and autopsy at Northwestern University Feinberg School of Medicine. “Our findings serve as a reminder that AI that works incredibly well in the lab may fall on its face in the real world. Patients should continue to expect that a human expert is the final decider on diagnoses made on biopsies and other tissue samples. Pathologists fear — and AI companies hope — that the computers are coming for our jobs. Not yet.”

In the new study, scientists trained three AI models to scan microscope slides of placenta tissue to detect blood vessel damage, estimate gestational age and classify macroscopic lesions.

They trained a fourth AI model to detect prostate cancer in tissues collected from needle biopsies. When the models were ready, the scientists exposed each one to small portions of contaminant tissue (e.g., bladder or blood) that were randomly sampled from other slides. Finally, they tested how each model responded.
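
The contamination test described above can be illustrated with a minimal Python sketch. It is not the study's code. It assumes a slide is represented as a list of per-patch scores, that contamination means mixing in a few patches drawn at random from other slides, and that the effect is measured as the share of slides whose prediction changes. The names `contaminate`, `flip_rate` and `toy_max_model` are hypothetical and exist only for this illustration.

```python
import random


def contaminate(patches, contaminant_pool, fraction, rng):
    """Return a shuffled copy of `patches` with contaminant patches mixed in.

    `fraction` is the number of contaminant patches added per original patch;
    at least one contaminant patch is always added (an assumption of this sketch).
    """
    n_added = max(1, round(len(patches) * fraction))
    # Sample with replacement so a small contaminant pool is still usable.
    added = rng.choices(contaminant_pool, k=n_added)
    mixed = list(patches) + added
    rng.shuffle(mixed)
    return mixed


def flip_rate(model, slides, contaminant_pool, fraction, seed=0):
    """Share of slides whose prediction changes once contaminants are added."""
    rng = random.Random(seed)
    flips = 0
    for patches in slides:
        clean_prediction = model(patches)
        dirty_prediction = model(contaminate(patches, contaminant_pool, fraction, rng))
        flips += clean_prediction != dirty_prediction
    return flips / len(slides)


def toy_max_model(patches):
    """Hypothetical stand-in model: flags a slide if any single patch scores above 0.5."""
    return max(patches) > 0.5


if __name__ == "__main__":
    clean_slides = [[0.1, 0.2, 0.3]] * 4   # every patch looks benign
    other_tissue = [0.9]                   # contaminant patches the model never trained on
    print(flip_rate(toy_max_model, clean_slides, other_tissue, fraction=0.05))
```

The toy model lets a single patch decide the outcome, so one stray patch is enough to change its answer. That is a deliberately extreme stand-in for a model that gives contaminant tissue too much weight.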

Each of the four AI models paid too much attention to the tissue contamination, which resulted in errors when diagnosing or detecting vessel damage, gestational age, lesions and prostate cancer, the study found.

Source: Northwestern University

24.01.2024