AUTHOR=Roussinov Dmitri, Conkie Andrew, Patterson Andrew, Sainsbury Christopher TITLE=Predicting Clinical Events Based on Raw Text: From Bag-of-Words to Attention-Based Transformers JOURNAL=Frontiers in Digital Health VOLUME=Volume 3 - 2021 YEAR=2022 URL=https://www.frontiersin.org/journals/digital-health/articles/10.3389/fdgth.2021.810260 DOI=10.3389/fdgth.2021.810260 ISSN=2673-253X ABSTRACT=Identifying which patients are at higher risk of dying or being readmitted can be resource- and life-saving, making it an important and challenging task for healthcare text analytics. While many successful approaches exist to predict such clinical events from categorical and numerical variables, a large number of health records exist in the form of raw text, such as clinical notes or discharge summaries. However, the text-analytics models applied to the free-form natural language found in those notes lag behind the breakthroughs happening in other domains and remain primarily based on older bag-of-words technologies. As a result, they rarely reach an accuracy level acceptable to clinicians. Despite their success elsewhere, the superiority of deep neural approaches over classical bags of words for this task has not yet been convincingly demonstrated. Even the most recent breakthroughs due to pre-trained attention-based transformers have not yet made their way into the medical domain. Using a publicly available database, we explored several classification models to predict patients' readmission or fatality based on their discharge summaries and established that: 1) The performance of the neural models used in our experiments convincingly exceeds that of the bag-of-words models by several percentage points, as measured by the standard metrics.
2) This achieves the accuracy typically considered acceptable by clinicians for practical use (area under the ROC curve above .75) on the majority of our prediction targets. 3) While the pre-trained attention-based transformer performed only on par with the model that averages word embeddings when applied to full-length discharge summaries, the transformer handles shorter text segments substantially better, at times by a margin of .04 in the area under the ROC curve. 4) We suggest several models to overcome the transformers' major drawback (their input-size limitation) and confirm that this is crucial for them to achieve their top performance. 5) We successfully demonstrated how non-text attributes can be combined with text to gain additional improvements for several prediction targets.