Neural Document Embeddings for Intensive Care Patient Mortality Prediction

The steadily growing amount of digitized clinical data, such as health records, scholarly medical literature, systematic reviews of substances and procedures, and descriptions of clinical trials, holds significant potential for exploitation by automatic inference and data mining techniques. Beyond the wide range of clinical research questions, such as drug-to-drug interactions or quantitative population studies of disease properties, there is rich potential for applying data-driven methods in daily clinical practice to key tasks such as decision support and patient mortality prediction. The latter task is especially important when prioritizing the allocation of scarce resources or determining the frequency and intensity of post-discharge care.

We present an automatic mortality prediction scheme based on the unstructured textual content of clinical notes. We propose a convolutional document embedding approach, and our empirical investigation on the MIMIC-III intensive care database shows significant performance gains over previously employed methods such as latent topic distributions or generic doc2vec embeddings. These improvements are especially pronounced for the difficult problem of post-discharge mortality prediction.

This paper has been accepted for presentation at the NIPS 2016 Machine Learning for Health Workshop and has won a best abstract award at the Artificial Intelligence in Medicine Symposium.