Vol 32 ISSUE 03-04/23 October 2023
THE EUROPEAN FORUM FOR THOSE IN THE BUSINESS OF MAKING HEALTHCARE WORK

Not always fair
AI in healthcare: diverse and balanced training datasets of high quality needed

Machine learning and artificial intelligence (AI) are playing an increasingly important role in medicine and healthcare, and not just since ChatGPT. This is especially true in data-intensive specialties such as radiology, pathology or intensive care. The quality of diagnostics and decision-making via AI, however, does not only depend on a sophisticated algorithm but – crucially – on the quality of the training data.

Artificial intelligence is a buzzword. Underneath the buzz, AI consists of algorithms based on certain machine learning methods. One of these methods, which has received a lot of attention in recent years, is the artificial neural network: the layers of nerve cells involved in learning processes in the human brain are algorithmically reproduced (albeit in idealized form). Highly complex learning tasks require many layers of artificial neurons – this is deep learning, another AI term that has become popular.

Learning from training data

A neural network – like other forms of AI – learns how to perform its task, such as making a specific diagnosis, based on training data. Imagine the task is to distinguish malignant from benign findings on chest x-rays. For the system to be able to make that decision, many thousands or millions of x-rays of benign and malignant tissue changes are needed as training data. The algorithm classifies each of these training images and compares the result with the human-made diagnosis. If the algorithm's diagnosis was incorrect, the weights assigned to individual connections between the virtual neurons are reweighted to improve accuracy next time.
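The cycle described above – classify, compare with the human-made label, reweight on error – can be sketched with a single artificial neuron standing in for a full neural network. A minimal sketch; the feature values and data are hypothetical, not from any real x-ray system:

```python
def train_classifier(samples, labels, epochs=100, lr=0.1):
    """Train a minimal linear classifier (one artificial neuron).

    For every training example, the prediction is compared with the
    human-made label; when they differ, the connection weights are
    reweighted so the next prediction is more likely to be correct.
    """
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            prediction = 1 if sum(w * xi for w, xi in zip(weights, x)) + bias > 0 else 0
            error = y - prediction          # 0 if correct, +1 or -1 if wrong
            if error != 0:                  # reweight only on a wrong diagnosis
                weights = [w + lr * error * xi for w, xi in zip(weights, x)]
                bias += lr * error
    return weights, bias


def predict(weights, bias, x):
    """Classify a new case: 1 = 'malignant', 0 = 'benign'."""
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias > 0 else 0


# Hypothetical toy features (e.g. lesion size, border irregularity):
samples = [[0.1, 0.2], [0.2, 0.1], [0.8, 0.9], [0.9, 0.8]]
labels = [0, 0, 1, 1]   # human-made diagnoses: 0 = benign, 1 = malignant
weights, bias = train_classifier(samples, labels)
```

In a real deep-learning system this correction is performed by backpropagation across many layers of neurons, but the principle – adjusting weights after every wrong diagnosis – is the same.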
(Image: © metamorworks – stock.adobe.com)

Once all training data has been processed, a new, smaller data set is used to check the accuracy of the fully trained algorithm. This step is called validation.

Quality of the training data

No doubt: the accuracy of an AI algorithm can only be as good as the quality of its training data. If, for example, the training data contains many x-rays in which a malignant tissue change was mistakenly considered benign by the human expert, or vice versa, the AI will learn from false examples – which will affect its accuracy later on.

Slowly digitalized healthcare system

Obtaining high-quality training data for algorithms, however, often turns out to be difficult, since our healthcare system is only slowly being digitalized. Data that has been carefully and manually validated is rare, yet the algorithm requires a huge amount of training data. When a high volume of digital data can be obtained, it is often from sources of inconsistent quality. This quantity-and-quality problem is further complicated by privacy issues in healthcare.
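The validation step mentioned above amounts to holding back a smaller labelled set the algorithm never saw during training and measuring how often its diagnoses match the human ones. A minimal sketch; the split fraction and the data are illustrative (real pipelines shuffle the data and often stratify the split):

```python
def split_train_validation(samples, labels, holdout_fraction=0.2):
    """Reserve a smaller portion of the labelled data for validation."""
    cut = int(len(samples) * (1 - holdout_fraction))
    return (samples[:cut], labels[:cut]), (samples[cut:], labels[cut:])


def accuracy(predictions, labels):
    """Fraction of cases where the algorithm agrees with the human diagnosis."""
    correct = sum(1 for p, y in zip(predictions, labels) if p == y)
    return correct / len(labels)


# Ten labelled cases: the first eight train the model, the last two validate it.
labels = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]
samples = [[float(y)] for y in labels]   # placeholder feature vectors
(train_x, train_y), (val_x, val_y) = split_train_validation(samples, labels)
```

Only the accuracy measured on the held-out set says anything about how the algorithm will perform on patients it has never seen.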
Garbage in, garbage out

Most IT users know the phrase "garbage in, garbage out": if the input data is garbage, the output will be garbage as well. This also holds true in AI, where the accuracy of a classification algorithm always depends on the quality of the training data. But not all types of low-quality output are immediately recognizable as such. There can be subtle biases in the classifications of an algorithm caused by an unbalanced composition of the training data. The accuracy of diagnostic algorithms, for example, is worse in populations whose data was underrepresented in the original training data.

Well-known examples are algorithms for classifying malignant skin tumors that were trained with data from predominantly fair-skinned (Caucasian) individuals. These algorithms show lower diagnostic accuracy when they are supposed to make a correct diagnosis in a dark-skinned person. The source of the bias is not always obvious: in the US, an AI algorithm was used to estimate which inpatients would need additional care. The training data used the costs patients had previously incurred as a marker for disease severity. As a result, additional care was less likely to be recommended for African-American patients, because these patients had in the past incurred lower costs. This was not due to lower disease severity but to their lower access to healthcare, i.e. a pre-existing systemic disadvantage.

The unbalanced composition of training data is often described with the acronym WEIRD: "white, educated, industrialized, rich and democratic" countries are overrepresented.

Women and the elderly are also disadvantaged

And it is not just ethnic and economic background that creates bias. Women are not fairly represented in AI either. For example, researchers suspect that women are overrepresented in the diagnosis of depression because, among other things, diagnostic algorithms query behaviors that are more common in women – regardless of clinical depression. In contrast, the Institute of Health Informatics at University College London showed that an AI algorithm for diagnosing liver disease had a significantly lower hit rate in women: it was wrong in 44% of women, but only in 23% of men.
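A disparity like the one reported by University College London – wrong in 44% of women but only 23% of men – only becomes visible when error rates are computed per subgroup rather than overall. A minimal audit sketch with made-up records:

```python
def error_rates_by_group(records):
    """Misdiagnosis rate per subgroup.

    `records` holds (group, predicted, actual) triples. A single
    overall accuracy figure can hide the fact that one group is
    misdiagnosed far more often than another.
    """
    totals, errors = {}, {}
    for group, predicted, actual in records:
        totals[group] = totals.get(group, 0) + 1
        if predicted != actual:
            errors[group] = errors.get(group, 0) + 1
    return {group: errors.get(group, 0) / totals[group] for group in totals}


# Made-up records: (group, algorithm's diagnosis, true diagnosis)
records = [
    ("women", 1, 1), ("women", 0, 1), ("women", 1, 0), ("women", 0, 0),
    ("men",   1, 1), ("men",   0, 0), ("men",   1, 0), ("men",   1, 1),
]
rates = error_rates_by_group(records)
```

In this toy data the algorithm is wrong for half of the women but only a quarter of the men, even though its overall accuracy looks acceptable – exactly the kind of subtle bias described above.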
Age as a factor for the development of bias

Another factor that can play a role in the development of bias: age. Facial recognition algorithms, for example, are less accurate in an elderly population. This is particularly disconcerting in view of the fact that robotics is increasingly being used in geriatric care, for example to inform and entertain elderly people and dementia patients. Particularly in this field, substantial research is being conducted to improve machine recognition of emotions based on facial expressions.

In order to address these biases and ensure equal treatment in a digitalized healthcare system, diverse, balanced and high-quality training data sets are imperative. This also requires legal certainty with regard to the use of patient data for research and development. This issue will be addressed in Germany by the Federal Ministry of Health's proposed laws in the context of the digitalization strategy, and in the EU with the European Health Data Space. ■

Report: Dr. Christina Czeschik

CONTENTS
AI 1-3
RADIOLOGY 4-7
WOMEN'S HEALTH 8-9
GENDER MEDICINE 10-11
LABORATORY/PATHOLOGY 12-16

www.healthcare-in-europe.com