Audiology by Alexandros G. Sfakianakis: Adverse Drug Event Detection from Electronic Health Records Using Hierarchical Recurrent Neural Networks with Dual-Level Embedding

Τετάρτη 16 Ιανουαρίου 2019

Adverse Drug Event Detection from Electronic Health Records Using Hierarchical Recurrent Neural Networks with Dual-Level Embedding

Abstract

Introduction

Adverse drug event (ADE) detection is a vital step towards effective pharmacovigilance and prevention of future incidents caused by potentially harmful ADEs. The electronic health records (EHRs) of patients in hospitals contain valuable information regarding ADEs and hence are an important source for detecting ADE signals. However, EHR texts tend to be noisy. Yet applying off-the-shelf tools for EHR text preprocessing jeopardizes the subsequent ADE detection performance, which depends on a well tokenized text input.

Objective

In this paper, we report our experience with the NLP Challenges for Detecting Medication and Adverse Drug Events from Electronic Health Records (MADE1.0), which aims to promote deep innovations on this subject. In particular, we have developed rule-based sentence and word tokenization techniques to deal with the noise in the EHR text.

Methods

We propose a detection methodology by adapting a three-layered, deep learning architecture of (1) recurrent neural network [bi-directional long short-term memory (Bi-LSTM)] for character-level word representation to encode the morphological features of the medical terminology, (2) Bi-LSTM for capturing the contextual information of each word within a sentence, and (3) conditional random fields for the final label prediction by also considering the surrounding words. We experiment with different word embedding methods commonly used in word-level classification tasks and demonstrate the impact of an integrated usage of both domain-specific and general-purpose pre-trained word embedding for detecting ADEs from EHRs.

Results

Our system was ranked first for the named entity recognition task in the MADE1.0 challenge, with a micro-averaged F1-score of 0.8290 (official score).

Conclusion

Our results indicate that the integration of two widely used sequence labeling techniques that complement each other along with dual-level embedding (character level and word level) to represent words in the input layer results in a deep learning architecture that achieves excellent information extraction accuracy for EHR notes.

http://bit.ly/2FvXxuv

Audiology by Alexandros G. Sfakianakis

Πληροφορίες