Automatic for the people – and their personal case narratives

Technology / 10 May 2019

UMC data scientist Eva-Lisa Meldau presents the de-identification research at the Data Innovation Summit in Stockholm, March 2019. Photo: Data Innovation @datasweden2019

Deep learning shows promise in removing personal identifiers from case reports, according to new work presented by UMC researchers.


Case narratives can be a valuable source of insight about adverse drug reactions, potentially adding details that are less readily available – or not available at all – from the more structured, standardised data fields of individual case safety reports. But sharing case narratives raises legal and ethical duties to protect the confidentiality of patients’ private details.

Currently, removing personal identifiers from case narratives is a manual task that can be both time-consuming and tedious. However, new work from UMC suggests there may be a better way.

At the Data Innovation Summit 2019, held in Stockholm in March, UMC data scientist Eva-Lisa Meldau discussed research into automatically de-identifying case narratives to protect patient privacy.

The UMC researchers trained neural networks with more than 500 medical records, developing a deep learning algorithm that “read” the narratives in multiple ways to predict identifying data, assisted by standard natural language processing tools and manually constructed rules and dictionary queries.

“We developed the system to be conservative in a way that it only retains words in the text if it is highly confident that they are not personal identifiers,” said Meldau.

The results have been encouraging, suggesting that the algorithm could be trained to perform as well as or possibly better than a human annotator, albeit at the expense of removing more non-personal text.
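The conservative rule Meldau describes can be illustrated with a small Python sketch. This is not UMC's implementation: the function name, the per-token scores, the 0.01 cut-off, and the `[REDACTED]` placeholder are all hypothetical, and the scores stand in for whatever a trained model would output.

```python
from typing import List, Tuple

MASK = "[REDACTED]"  # hypothetical placeholder for removed text


def redact(scored_tokens: List[Tuple[str, float]],
           max_identifier_prob: float = 0.01) -> str:
    """Keep a token only if the (assumed) model score says it is very
    unlikely to be a personal identifier; otherwise mask it."""
    output = []
    for token, p_identifier in scored_tokens:
        if p_identifier <= max_identifier_prob:
            output.append(token)   # highly confident it is safe to keep
        else:
            output.append(MASK)    # any real doubt leads to removal
    return " ".join(output)


# Made-up scores: probability that each token is a personal identifier.
narrative = [
    ("Patient", 0.001), ("Anna", 0.97), ("developed", 0.002),
    ("rash", 0.005), ("in", 0.001), ("Uppsala", 0.40),
    ("after", 0.001), ("dose", 0.02),
]

print(redact(narrative))
```

With these invented scores, the name and the place are masked, but so is the harmless word "dose", because its score falls just above the strict cut-off. A looser cut-off would keep more of the text at the risk of letting an identifier through.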

The algorithm has so far only been developed using medical records from a de-identification challenge. “As we fine-tune the algorithm with annotated original narratives, we expect the performance to improve even further,” said Meldau.

“Automatic detection and removal of personal identifiers in case narratives using deep learning” is available at bit.ly/umc-posters.
