Pioneering software to protect patients' privacy

Information in patients' records could benefit biomedical research in terms of understanding diseases and their treatments. The drawback is that those records contain confidential information that could identify patients. If that data has to be removed manually, the task is not only painstaking and therefore expensive, but also not foolproof.

Prof Roger G Mark

Now a computer programme that can automatically delete confidential data from medical records, yet leave their vital medical information intact, has been developed by researchers at the Massachusetts Institute of Technology (MIT). ‘We’ve developed a free and open-source software package to allow researchers to accurately de-identify text in medical records,’ explained Gari D Clifford, a principal research scientist in the Harvard-MIT Division of Health Sciences and Technology (HST) who led the research* with Principal Investigator Professor Roger G Mark, of HST and MIT’s Department of Electrical Engineering and Computer Science.
To test the new software, the researchers used it on 1,836 nursing notes (containing 296,400 words). Using multiple experts and additional algorithms, they replaced all personal data with ‘fake’ information. They report that the software successfully deleted over 94% of the confidential information, while wrongly deleting only 0.2% of the medical content. ‘This is significantly better than one expert working alone, at least as good as two trained medical professionals checking each other’s work and many, many times faster than either,’ they pointed out.
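For readers who want to see how two such figures can be derived, the following is a minimal sketch, not the researchers' own evaluation code. It assumes that scoring is done per word (token), that experts have marked which words are confidential, and that the software's output is the set of words it deleted. The function name `deid_scores` and the toy numbers are hypothetical and are not taken from the study.

```python
def deid_scores(gold_phi, removed, n_tokens):
    """Return (recall, over_removal_rate) for one token-labelled document.

    gold_phi  -- set of token indices experts marked as confidential (assumed input)
    removed   -- set of token indices the software deleted
    n_tokens  -- total number of tokens in the document
    """
    non_phi = n_tokens - len(gold_phi)
    # Share of confidential tokens that were actually deleted.
    recall = len(gold_phi & removed) / len(gold_phi) if gold_phi else 1.0
    # Share of non-confidential tokens that were deleted by mistake.
    over_removal = len(removed - gold_phi) / non_phi if non_phi else 0.0
    return recall, over_removal


# Toy, made-up example: 20 tokens, 4 of them confidential.
gold = {2, 3, 10, 15}
removed = {2, 3, 10, 7}  # misses token 15, wrongly deletes token 7
recall, over_removal = deid_scores(gold, removed, 20)
print(f"recall={recall:.2f}, over-removal={over_removal:.4f}")
```

In this sketch the first number corresponds to the share of confidential information removed and the second to the share of other content deleted in error; the study may have counted these differently.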
The free, open-source package, which includes labelled de-identified data together with the software, will enable other researchers to improve their own systems and to adapt the software to other types of data with different characteristics.
According to Dr Zohara Cohen, programme director at the National Institute of Biomedical Imaging and Bioengineering, sponsor of the work, the information in patients’ medical records is a ‘largely untapped treasure trove’ that the biomedical research community could use to increase understanding of diseases and their treatments. ‘The automated de-identification software developed under the guidance of Dr Mark is a big step forward in permitting the widespread sharing of patient information without the risk of compromised privacy and confidentiality,’ Cohen pointed out.
* This research was published in the journal BMC Medical Informatics and Decision Making (24/7/08). Other research team members: Ishna Neamatullah; Margaret M. Douglass; Li-wei H. Lehman, an HST research engineer; Andrew Reisner, an HST visiting scientist; Mauricio Villarroel, an HST visiting engineer; William J. Long, a principal research associate in MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL); Professor Peter Szolovits of the Department of Electrical Engineering and Computer Science and HST; and George B. Moody, HST sponsored research staff.

01.09.2008
