Pioneering software to protect patients' privacy
Information in patients' records could benefit biomedical research in terms of understanding diseases and their treatments. The drawback is that those records contain confidential information that could identify patients. If that data has to be removed manually, the task is not only painstaking and therefore expensive, but also not foolproof.
Now a computer programme that can automatically delete confidential data from medical records, yet leave their vital medical information intact, has been developed by researchers at the Massachusetts Institute of Technology (MIT). ‘We’ve developed a free and open-source software package to allow researchers to accurately de-identify text in medical records,’ explained Gari D Clifford, a principal research scientist in the Harvard-MIT Division of Health Sciences and Technology (HST) who led the research* with Principal Investigator Professor Roger G Mark, of HST and MIT’s Department of Electrical Engineering and Computer Science.
To test the new software, the researchers used it on 1,836 nursing notes (containing 296,400 words). Using multiple experts and additional algorithms, they replaced all personal data with ‘fake’ information. They report that the software successfully removed over 94% of the confidential information, while wrongly deleting only 0.2% of the medical content. ‘This is significantly better than one expert working alone, at least as good as two trained medical professionals checking each other’s work and many, many times faster than either,’ they pointed out.
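The two figures quoted above measure different things: the share of confidential material that was removed, and the share of ordinary medical content that was removed by mistake. The minimal Python sketch below shows one way such rates could be calculated. It assumes a per-word comparison against expert labels, which the article does not specify; the function name `removal_rates` and the ten-word toy example are hypothetical and are not taken from the MIT software or its test data.

```python
from typing import Sequence, Tuple


def removal_rates(
    is_confidential: Sequence[bool],
    was_removed: Sequence[bool],
) -> Tuple[float, float]:
    """Return (share of confidential words removed,
               share of non-confidential words wrongly removed).

    Both inputs hold one entry per word, in the same order:
    is_confidential is the expert label, was_removed is what the
    software did. Per-word scoring is an assumption of this sketch.
    """
    if len(is_confidential) != len(was_removed):
        raise ValueError("both sequences must cover the same words")

    confidential_total = sum(1 for c in is_confidential if c)
    medical_total = len(is_confidential) - confidential_total
    if confidential_total == 0 or medical_total == 0:
        raise ValueError("need at least one word of each kind")

    pairs = list(zip(is_confidential, was_removed))
    confidential_removed = sum(1 for c, r in pairs if c and r)
    medical_removed = sum(1 for c, r in pairs if not c and r)

    return (
        confidential_removed / confidential_total,
        medical_removed / medical_total,
    )


# Hypothetical toy note of ten words: four confidential, six medical.
expert_labels = [True, True, False, False, False,
                 False, True, False, False, True]
software_removed = [True, True, False, False, True,
                    False, True, False, False, False]

caught, wrongly_deleted = removal_rates(expert_labels, software_removed)
print(f"confidential words removed: {caught:.0%}")
print(f"medical words wrongly removed: {wrongly_deleted:.1%}")
```

In this toy case three of the four confidential words are caught and one of the six medical words is lost. The reported results correspond to a first figure above 94% and a second of 0.2%.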
The free, open-source package, which bundles the labelled de-identified data with the software, will enable other researchers to improve their own systems and to adapt the software to other types of data with different characteristics.
According to Dr Zohara Cohen, programme director at the National Institute of Biomedical Imaging and Bioengineering, sponsor of the work, the information in patients’ medical records is a ‘largely untapped treasure trove’ that the biomedical research community could use to increase understanding of diseases and their treatments. ‘The automated de-identification software developed under the guidance of Dr Mark is a big step forward in permitting the widespread sharing of patient information without the risk of compromised privacy and confidentiality,’ Dr Cohen said.
* This research was published in the journal BMC Medical Informatics and Decision Making (24/7/08). Other research team members: Ishna Neamatullah; Margaret M. Douglass; Li-wei H. Lehman, an HST research engineer; Andrew Reisner, an HST visiting scientist; Mauricio Villarroel, an HST visiting engineer; William J. Long, a principal research associate in MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL); Professor Peter Szolovits of the Department of Electrical Engineering and Computer Science and HST; and George B. Moody, HST sponsored research staff.
01.09.2008