AI performs similarly to traditional risk prediction models

News • Machine learning

A study published by The BMJ finds that machine learning models have similar performance to traditional statistical models and share similar uncertainty in making risk predictions for individual patients.

The NHS has invested £250m ($323m; €275m) to embed machine learning in healthcare, but researchers say the level of consistency (stability) within and between models should be assessed before they are used to make treatment decisions for individual patients.

Risk prediction models are widely used in clinical practice. They use statistical techniques alongside information about people, such as their age and ethnicity, to identify those at high risk of developing an illness and make decisions about their care.

Previous research has found that a traditional risk prediction model such as QRISK3 performs very well at the population level, but carries considerable uncertainty in its risk predictions for individual patients.

Some studies claim that machine learning models can outperform traditional models, while others argue that they cannot provide explainable reasons behind their predictions, potentially leading to inappropriate actions. What’s more, machine learning models often ignore censoring, which occurs when patients are lost during a study (either by error or by being unreachable). A model that ignores censoring assumes these patients are disease free, leading to biased predictions.
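To make the censoring problem concrete, here is a minimal sketch in Python. It is not the researchers' method: the ten patients are invented toy data, and the Kaplan-Meier estimator is used only as one standard way of accounting for censoring. The function names `naive_risk` and `kaplan_meier_risk` are illustrative.

```python
# Toy cohort (invented for illustration): (years of follow-up, event observed?)
# False means the patient was censored, i.e. follow-up ended without an event.
patients = [
    (2, False), (3, True), (4, False), (5, False), (6, True),
    (7, False), (10, False), (10, False), (10, False), (10, False),
]
times = [t for t, _ in patients]
events = [e for _, e in patients]


def naive_risk(times, events, horizon):
    """Share of patients with an observed event by `horizon`.

    Censored patients are silently counted as disease free.
    """
    hits = sum(1 for t, e in zip(times, events) if e and t <= horizon)
    return hits / len(times)


def kaplan_meier_risk(times, events, horizon):
    """1 minus the Kaplan-Meier survival estimate at `horizon`."""
    survival = 1.0
    event_times = sorted({t for t, e in zip(times, events) if e and t <= horizon})
    for t in event_times:
        # Only patients still under observation at time t count as at risk.
        at_risk = sum(1 for u in times if u >= t)
        observed = sum(1 for u, e in zip(times, events) if e and u == t)
        survival *= 1 - observed / at_risk
    return 1 - survival


naive = naive_risk(times, events, horizon=10)
adjusted = kaplan_meier_risk(times, events, horizon=10)
print(f"naive 10-year risk:         {naive:.3f}")
print(f"Kaplan-Meier 10-year risk:  {adjusted:.3f}")
```

On this toy cohort the naive estimate is lower than the censoring-aware one, because patients who left the study early are treated as if they had been followed for the full ten years without an event.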

To explore these issues further, researchers in the UK, China and the Netherlands set out to assess the consistency of machine learning and statistical techniques in predicting individual level and population level risks of cardiovascular disease and the effects of censoring on risk predictions.

They assessed 19 different prediction techniques (12 machine learning models and seven statistical models) using data from 3.6 million patients registered at 391 general practices in England between 1998 and 2018. Data from general practices, hospital admission and mortality records were used to test each model’s performance against actual events. All 19 models yielded similar population level performance. However, cardiovascular disease risk predictions for the same patients varied substantially between models, especially in patients with higher risks.

For example, a patient whose cardiovascular disease risk was predicted to be 9.5-10.5% by the traditional QRISK3 model was assigned risks of 2.9-9.2% and 2.4-7.2% by other models. Models that ignored censoring (including commonly used machine learning models) substantially underestimated the risk of cardiovascular disease. Of the 223,815 patients with a cardiovascular disease risk above 7.5% with QRISK3 (a model that does consider censoring), 57.8% would be reclassified below 7.5% when using another type of model, explain the researchers.
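The figures above can be worked through in a few lines of Python. This is a rough sketch, not part of the study: the head count is approximate because the 57.8% figure is rounded, and the labels "other model A" and "other model B" are placeholders for the two unnamed models.

```python
THRESHOLD = 7.5  # % risk; the cut-off quoted above

high_risk_with_qrisk3 = 223_815
share_reclassified = 0.578  # rounded figure, so the count below is approximate

approx_reclassified = round(high_risk_with_qrisk3 * share_reclassified)


def side_of_threshold(low, high, threshold=THRESHOLD):
    """Say where a predicted risk range (in %) falls relative to the cut-off."""
    if low > threshold:
        return "above"
    if high < threshold:
        return "below"
    return "straddles"


# Predicted risk ranges for the same example patient, in %.
ranges = {
    "QRISK3": (9.5, 10.5),
    "other model A": (2.9, 9.2),  # placeholder label
    "other model B": (2.4, 7.2),  # placeholder label
}
verdicts = {name: side_of_threshold(*r) for name, r in ranges.items()}

print(f"roughly {approx_reclassified:,} patients reclassified below {THRESHOLD}%")
for name, verdict in verdicts.items():
    print(f"{name}: {verdict} the {THRESHOLD}% cut-off")
```

Roughly 129,000 patients would move below the cut-off, and the same example patient sits above it under QRISK3 but partly or wholly below it under the other two models.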

The researchers acknowledge some limitations in comparing the different models, such as the fact that more predictors could have been considered. However, they point out that their results remained similar after more detailed analyses, suggesting that they withstand scrutiny. “A variety of models predicted risks for the same patients very differently despite similar model performances,” they write. “Consequently, different treatment decisions could be made by arbitrarily selecting another modelling technique.”

As such, they suggest these models “should not be directly applied to the prediction of long term risks without considering censoring” and that the level of consistency within and between models “should be routinely assessed before they are used to inform clinical decision making.”

Source: The BMJ

