Text Analysis of Electronic Medical Records to Predict Seclusion in Psychiatric Wards: Proof of Concept

Aim: With the introduction of the “Electronic Medical Record” (EMR), a wealth of digital data has become available. This provides a unique opportunity for exploring precedents for seclusion. This study explored the feasibility of text mining analysis in the EMR to eventually help reduce the use of seclusion in psychiatry. Methods: The texts in notes and reports of the EMR during 5 years on an acute and a non-acute psychiatric ward were analyzed using a text mining application. A period of 14 days was selected before seclusion or, for non-secluded patients, before discharge. The resulting concepts were analyzed using chi-square tests to assess which concepts had a significantly higher or lower frequency than expected in the “seclusion” and “non-seclusion” categories. Results: Text mining led to an overview of 1,500 meaningful concepts. In the 14-day period prior to the event, 115 of these concepts had a significantly higher frequency in the seclusion category and 49 in the non-seclusion category. Analysis of the concepts from days 14 to 7 resulted in 54 concepts with a significantly higher frequency in the seclusion category and 14 in the non-seclusion category. Conclusions: The resulting significant concepts are comparable to reasons for seclusion in the literature. These results are a “proof of concept”. Analyzing the text of reports in the EMR therefore seems promising as a contribution to the tools available for the prediction of seclusion. The next step is to build, train and test a model, before text mining can be part of an evidence-based clinical decision making tool.


INTRODUCTION
Reasons for being admitted to a closed psychiatric ward usually involve the combination of psychiatric symptoms and aggressive or impulsive behaviors and/or presenting a risk to others or oneself (1)(2)(3). With structure, socio-therapeutic interventions, and medication, patients usually become less agitated (4,5). In some situations, however, there is no alternative but to use restraining measures (6). In the Netherlands, seclusion is the preferred restraining measure and is used more often than in other countries, with forced medication being used less. The high use of seclusion (in number and duration) has been subject to extensive national political discussion and media coverage (7)(8)(9). Seclusion should be avoided as much as possible, and not only because its therapeutic value is doubtful (10,11). This measure has proven to be a traumatic intervention for both the patient (12,13) and staff (14,15). Various initiatives have taken place to diminish the use of seclusion (16)(17)(18). Over the past years seclusion rates in the Netherlands have declined due to several reduction endeavors, such as the implementation of a High Intensive Care model in acute psychiatric wards (19)(20)(21). However, seclusion rates in the Netherlands remain among the highest compared to other countries. More efforts are needed to reduce the use of seclusion (8,9).
Risk assessment has been shown to be effective in reducing seclusion and is often incorporated in reduction efforts (8,21-23). Reviews show a scarcity of well-designed studies addressing the feasibility and effectiveness of de-escalating interventions. As Gaynes et al. (24) remarked: "The available evidence about relevant strategies is very limited. Only risk assessment decreased subsequent aggression or reduced use of seclusion and restraint (low strength of evidence). Evidence for de-escalating aggressive behavior is even more limited." The present article describes an innovative way of extracting words from the text available in the "Electronic Medical Record" (EMR) of patients admitted to psychiatric admission wards in order to predict seclusion (or assess risk); the focus here is on the prevention of seclusion as this is the most frequently used restraining measure in The Netherlands. The EMR gives access to clinical data that was not readily available before its implementation. It allows large-scale clinical analysis in daily routines in psychiatry; however, the precise extraction of clinically relevant data from the narrative medical and nursing notes and other files can be challenging. An example of strategies used to extract data from texts is the study of Perlis et al. (25), who used "Natural Language Processing" for a chart review by processing text into meaningful concepts based on a set of rules. They were able to give a proper indication of the patients that could be regarded as likely to "become therapy resistant." Cerrito et al. (26) wrote a white paper on the use of data-mining techniques on Electronic Medical Records in the emergency department of a hospital to improve care while lowering costs. They discovered that patients with similar complaints were treated very differently depending on the attending physician, and that those differences can have an impact on both costs and care.
Other examples are: predicting future risk of suicidal behavior using longitudinal historical data in electronic health records (27) or after discharge from general hospitals (28), detecting specific follow-up appointment criteria in hospital discharge records (29), extracting employment information of service members from the Electronic Health Record (30), identifying tapering patterns in switching of different antipsychotics (31) or identifying knowledge gaps in guidelines and exploring physicians' therapeutic decisions with data mining techniques to fill these knowledge gaps (32).
In the current explorative study text mining software is used to allow analysis of large amounts of text in which (patterns of) words are screened on whether or not they are more numerous in patients who are subsequently secluded. This method of analysis provides insight into what is relevant, what is related and what is representative from a large body of unstructured text (33). This technology has been used in several academic studies to perform text analysis in the medical domain (34,35). The intention of this study is purely to explore the use of text mining in daily psychiatric practice to determine if it could be a viable tool in reducing the use of seclusion in the future. If the results are promising the next step would be to link qualitative information from the "Electronic Medical Record" (EMR) to a predictive model of seclusion. After validation, this model could provide the opportunity to develop a screening-algorithm that checks in "real time" if the relevant "trigger" (or "discriminative") words and word-combinations (concepts) linked to seclusion appear in the "Electronic Medical Record" (EMR), thus giving a warning sign that a patient is at risk. This will provide means to de-escalate the behavior at an early stage and in turn reduce the number of seclusions. Such an alerting system should not lead to extra workload for the staff, be safe and have no negative impact on patient care and well-being.
The authors sought to answer the following question in this explorative study: could analyzing text in the files of patients be useful in the quest to reduce the use of seclusion in psychiatric practice? To answer this, the first step was to see if text mining in the Electronic Medical Record (EMR) could lead to the identification of meaningful concepts in the EMR that are numerically the most frequent in the medical files of the patients. The second step was to answer the question if any of these concepts typically relate to either a subsequent seclusion or, for non-seclusion, a subsequent discharge from the ward.
This study was purely explorative in nature to determine if text mining the EMR could result in useful concepts that typically precede seclusion on a psychiatric closed ward. This study is based on data mining: not hypothesis driven but data driven. The authors did not choose to formulate an expected outcome of concepts related to seclusion or non-seclusion. To the authors' knowledge, no studies were available at that time that indicated certain concepts would have a predictive value for seclusion or non-seclusion.

METHODS
Study Design
This was a retrospective cohort study using unstructured data from routine patient reports and notes written by nurses and physicians and stored in the EMR.

Setting
The study took place in a large regional psychiatric hospital in The Netherlands with an urban catchment area of ∼550,000 inhabitants. Data was gathered from an acute psychiatric admission ward which held 52 beds and 6 seclusion rooms (∼1,300 patients admitted per annum on average with a mean length of stay of 16 days) and from a non-acute psychiatric admission ward with 42 beds and 2 seclusion rooms (around 300 patients admitted per annum on average with a mean length of stay of 42 days) (3).

Figure 1. Indexed sentences
The patient made a substantial psychotic impression. During conversation he skipped from one topic to another. The patient was restless and threatening towards the physician and nursing staff. no conversation possible. The patient was frequently present on the ward. still very restless and chaotic during moments of contact. He has bizarre ideas. only wants to drink from the blue cup for example. sometimes looking suspiciously at his surroundings. Gentleman received as necessary medication. made a somewhat tense impression. He was angry when he had to go back to the time out. possibly because he didn't get what he wanted.
Black and italic are words that add to the concept, such as pronouns or adverbs ('he', 'sometimes'). Grey marking is a possible negation.

Participants
All nursing notes and medical reports written about patients admitted during the period August 2008-July 2012, on either the acute or the non-acute admission ward, were extracted from the EMR. This included readmitted patients and both secluded and non-secluded patients. Every note and report of every patient was used, to fully reflect day-to-day psychiatric practice, including possible missing information in the EMR.

Procedure
After approval by the board of directors, a request was made to the department of Internal Business Intelligence to extract all reports and notes from the EMR of the participants described above. These text files were analyzed anonymously by an external company that had developed a text mining program, and were deleted after the study.

Analysis
The goal of the analysis was first to find frequently used concepts in the EMR and second to determine whether any of these concepts relate to either seclusion or non-seclusion of patients. Concepts were identified using text mining software. All the unstructured data in the EMR involving the day-to-day notes by the nursing staff and various psychiatric reports by physicians and other mental health professionals (excluding medication prescription) were analyzed using text mining software¹. The approach of the software is to break texts into sentences, and to parse sentences into concepts and relation patterns, without predefined domain knowledge. The semantic analysis run by the software recognizes key elements such as concepts, relations, non-relevant words, and negations. Relations are commonly verbs, and nouns with adjusting words are concepts (Figure 1). The software itself automatically generates the most frequently used concepts. The frequency of a concept is the number of times a concept appears in a text; note that this is not the same as the frequency of a word, because a concept can consist of multiple words (33). The concepts of secluded patients were analyzed during a maximum of 2 weeks prior to seclusion and were compared to the concepts in reports of non-secluded patients during the last 14 days of their admission. To control for differences in the length of admission and differences between the acute and non-acute ward, a period of 14 days prior to seclusion vs. the last 14 days of admission for the non-secluded patients was selected for this study. The last 14 days of admission were chosen for the non-secluded group because this is the most stable phase for them. These periods were not aligned to the same calendar timeframes. In this strategy there is no "control group" in a strict sense, but only a dichotomy: a patient is either secluded or not.
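The selection of the observation window described above can be illustrated with a minimal Python sketch. It assumes that concepts have already been extracted per note (the extraction itself was done by the text mining software and is not reproduced here). The function name, the data layout, and the choice to exclude the day of the event itself are assumptions made for illustration only; the source does not specify them.

```python
from collections import Counter
from datetime import date


def concept_counts_in_window(notes, event_date, first_day=14, last_day=1):
    """Count concepts in notes written between `first_day` and `last_day`
    days before the event (seclusion or discharge), both bounds inclusive.

    `notes` is an iterable of (note_date, list_of_concepts) pairs.
    Assumption: the event day itself (0 days before) is excluded by default.
    """
    counts = Counter()
    for note_date, concepts in notes:
        days_before = (event_date - note_date).days
        if last_day <= days_before <= first_day:
            counts.update(concepts)
    return counts


# Hypothetical notes for one admission; the concepts are invented examples.
example_notes = [
    (date(2012, 1, 1), ["restless"]),            # 14 days before the event
    (date(2012, 1, 8), ["restless", "angry"]),   # 7 days before
    (date(2012, 1, 12), ["time out"]),           # 3 days before
]
full_period = concept_counts_in_window(example_notes, date(2012, 1, 15))
early_period = concept_counts_in_window(example_notes, date(2012, 1, 15), 14, 7)
```

The same function covers both analyses in this study: the full 14 days and the earlier period from day 14 to day 7.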
Chi-square analyses were used to test if there was a significant difference in the frequency of the concepts for the secluded and non-secluded categories during the 14 days prior to the event. Additionally, concepts from days 14 to 7 prior to either seclusion or discharge were analyzed in the same way. A Bonferroni correction was applied on the p-value to correct for the multiple hypothesis testing; i.e., 1,500 hypotheses, one for each concept, were tested.
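As an illustration of this testing step, the following Python sketch computes a Pearson chi-square statistic for a single concept and applies a Bonferroni correction for 1,500 tests. The layout of the 2 x 2 table (mentions of this concept vs. mentions of all other concepts, in secluded vs. non-secluded files) is an assumption, as the exact table used by the authors is not described; the function names are hypothetical. For one degree of freedom the p-value can be obtained from the complementary error function, so no external library is needed.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("every row and column needs a non-zero total")
    return n * (a * d - b * c) ** 2 / denom


def p_value_df1(chi2):
    """Upper-tail p-value of a chi-square statistic with one degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))


def bonferroni(p, n_tests):
    """Bonferroni-adjusted p-value, capped at 1."""
    return min(1.0, p * n_tests)


def evaluate_concept(count_secl, count_nonsecl, total_secl, total_nonsecl, n_tests=1500):
    """Assumed table: this concept vs. all other concept mentions, by category."""
    chi2 = chi_square_2x2(
        count_secl,
        count_nonsecl,
        total_secl - count_secl,
        total_nonsecl - count_nonsecl,
    )
    return chi2, bonferroni(p_value_df1(chi2), n_tests)


# Hypothetical concept counts (300 and 900); the totals are the numbers of
# concept mentions reported in the Results for secluded and non-secluded files.
chi2, p_adjusted = evaluate_concept(300, 900, 67088, 361499)
```

Under this reading of the correction, a statistic of roughly 17.4 with one degree of freedom still yields an adjusted p-value just below 0.05, which agrees with the smallest significant values reported in the Results.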

Ethical Considerations
Before conducting the study, the authors consulted the Dutch Central Committee on Research Involving Human Subjects (CCMO) about whether approval of this study was needed under the Dutch Medical Research Involving Human Subjects Act (WMO). Because this study did not physically involve patients, did not include interventions, and did not subject patients to procedures or require them to follow rules of behavior, no approval of the ethical committee was sought. The study was approved by the medical director of the institute.

RESULTS
The study included 3,045 admissions for an acute psychiatric ward and a non-acute psychiatric ward from August 2008-July 2012. This accounted for 67,590 notes and reports of which 57,381 belonged to non-secluded patients and 10,209 to secluded patients. The total reports involved 2,816 patients of whom 1,687 (60%) were male and 1,129 (40%) were female. The mean age was 41 years (SD = 13) and 656 (23%) patients were secluded. The major diagnoses in this group were: schizophrenia (N = 967; 32%), mood disorders (N = 767; 25%), and other psychotic disorders (N = 672; 22%; Table 1).
¹ iKnow smart indexing ©, Intersystems.
The results were incorporated in a dashboard that computes graphs and tables when selecting a particular word or sociodemographic variable. Furthermore, the text mining analysis resulted in an overview of 1,500 (most meaningful) generated concepts from the EMR. The frequencies of these concepts were displayed for each of the 14 days prior to seclusion and discharge (non-seclusion). In total 1,500 concepts were mentioned 428,587 times, of which 67,088 were found in files of secluded patients and 361,499 in files of non-secluded patients. The overview of 1,500 concepts consisted of a number of repetitions that were seen as different concepts due to spelling or the use of abbreviations by staff. This was for example the case for the concepts regarding: mania, depression, hallucinations, paranoia, seclusion, and time-out room.
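Spelling variants and abbreviations of this kind could in principle be merged before frequencies are compared. The sketch below is only an illustration of that idea in Python: the variant table is hand-made from the time-out room example given in this article, and the function name is hypothetical. It is not part of the software used in the study.

```python
from collections import Counter

# Hand-made variant table (assumption): maps each spelling to one canonical concept.
VARIANTS = {
    "t.o.": "time-out room",
    "time out room": "time-out room",
    "time-out room": "time-out room",
}


def merge_variants(counts, variants):
    """Sum the frequencies of all spellings that map to the same canonical concept.

    Concepts without an entry in `variants` are kept under their own
    lower-cased name.
    """
    merged = Counter()
    for concept, frequency in counts.items():
        key = concept.strip().lower()
        merged[variants.get(key, key)] += frequency
    return merged


# Hypothetical frequencies for illustration only.
raw_counts = Counter({"t.o.": 3, "Time out room": 2, "time-out room": 5, "angry": 4})
merged_counts = merge_variants(raw_counts, VARIANTS)
```

Merging before testing matters because a concept split over several spellings has a lower frequency per spelling and is therefore less likely to reach significance.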
Chi-square analyses of all concepts and the occurrence of the concept in files of secluded or non-secluded patients in the 14 days prior to the event of seclusion or discharge resulted in 115 concepts relating significantly to seclusion, ranging from the concept seclusion (Dutch abbreviation; χ²(1) = 287.89, p < 0.001) to the concept fell down (χ²(1) = 17.37, p < 0.05; Table 2). For the non-secluded patients significant relationships were found for 49 concepts, ranging from the concept furlough (χ²(1) = 238.34, p < 0.001) to the concept sitting room (χ²(1) = 18.17, p < 0.05; Table 3). Analysis of the concepts from days 14 to 7 involved 1,499 concepts (letter of discharge not yet mentioned in the reports and notes), which were mentioned in total 209,796 times in the EMR: 31,143 times in files of secluded patients and 178,653 times in files of non-secluded patients. Chi-square analyses led to 54 concepts relating significantly to seclusion, ranging from the concept behavior (χ²(1) = 114.18, p < 0.001) to not clear (χ²(1) = 17.39, p < 0.05; Table 4). Compared to the full 14 days leading up to the event of seclusion, the following 68 concepts are not yet significant: mania, 5 o'clock, several times, paranoid impression, defensive, agitation, physicians, bathroom, pounding, angry, trousers, cannabis, chaotic, claiming, colleague, cooperative, doors, directive, restless/boisterous, restless/boisterous presence, forceful, demanding, very suspicious, very restless/boisterous, very psychotic, excuses, substantial, fell down, god, boundaries, ground, hand, hands, custody measure, everyone, closet, complaint, lorazepam, mobile, difficult, motoric, wall, naked, night, eyes, affectless, affectless impression, restlessness, restless, uninhibited, force majeure, pills, psychotic utterances, smoking area, seclusion, seclusion room, cigarette, sleep, tranxene, verbal, nursing staff, confused, question, warning, desperate, water, gone, and fluctuating.
In this week before the event of seclusion, seven additional concepts were significant but were not significant in the full 14 days before seclusion. These are the concepts: ambulant practitioner, short, loud, not clear, schedule, hunch/suspicion, and early shift.
Regarding concepts relating to non-seclusion, 14 concepts were significant during days 14 to 7 prior to discharge. This is 35 fewer significant concepts than in the analysis of the full 14 days (Table 5). Concepts that were not yet significant were the following: depressive state, present, adequate, adequate impression, helpful, happy, contacts, day structure, own way, as usual, no characteristics, no psychotic characteristics, no psychotic utterances, whole night, all night not awake, group, house, sitting room, impression, manic state, madam not awake, tomorrow, ms m.i, unnoticeable, discharge, admission, return, quietly present, slept, somber, sport, suicidal tendencies, woman, weekend, and work.
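The counts reported for the two periods can be cross-checked with simple arithmetic. The short Python sketch below uses only numbers stated in this article (the 14 non-seclusion concepts for days 14 to 7 are taken from the abstract); the variable names are illustrative.

```python
# Seclusion-related concepts
significant_full_14_days = 115
significant_days_14_to_7 = 54
significant_only_in_days_14_to_7 = 7

# Concepts significant in both periods
significant_in_both = significant_days_14_to_7 - significant_only_in_days_14_to_7
# Concepts significant in the full period but not yet in days 14 to 7
not_yet_significant = significant_full_14_days - significant_in_both

# Non-seclusion-related concepts
nonseclusion_full_14_days = 49
nonseclusion_fewer_in_days_14_to_7 = 35
nonseclusion_days_14_to_7 = nonseclusion_full_14_days - nonseclusion_fewer_in_days_14_to_7
```

The results (47 concepts significant in both periods, 68 not yet significant, and 14 non-seclusion concepts in the earlier period) agree with the counts given in the text and the abstract.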

DISCUSSION
The present study explored the usefulness of analyzing text in the files of patients to identify concepts from reports and notes written by nurses and physicians that typically precede the incidence of seclusion. The authors were looking for a "proof of concept." Would it be possible to differentiate or identify concepts that precede seclusion? Text mining led to a list of 1,500 meaningful concepts from the EMR that are numerically the most frequent in files of patients. Of these 1,500 concepts, 115 seem to typically precede seclusion during 14 days. At first glance the majority of these 115 concepts correspond to (intuitive) clinical experience and can be viewed as five groups:
1. Phrases that accompany reasons to use seclusion (i.e., concepts comprising the phrases: threatening, psychotic, restlessness, paranoia, verbal, angry, agitated, affectless, claiming, pounding, mania, chaotic, uninhibited, confusion, and custody measure). These phrases are in line with literature that describes the reasons for using seclusion or restraint in psychiatric inpatient practice. For instance, Keski-Valkama et al. (36) found that agitation/disorientation was the most frequent reason for the use of restraint and seclusion. Knutzen et al. (37) discovered that the restrained group in their study consisted of a large proportion of psychosis related primary diagnoses. Larue et al. (38) describe that the main reasons for seclusion were agitation, disorganization and aggressive behavior. Vollema et al. (39) found that the risk for seclusion increases in the presence of irritable/aggressive behavior, motoric restlessness, and the decrease of the feeling of safety among staff. Bowers et al. (40) mention aggressive behavior as a reason for seclusion. El-Badri and Mellsop (41) found that patients with a primary diagnosis of schizophrenia, mania and substance abuse tended to be secluded more frequently than others, as were patients who threatened violence to staff or property or were actually violent. Husum et al. (42) discovered that patients who are overactive and aggressive, experiencing hallucinations and delusions, executing self-injury or at risk of suicide have a higher risk of being secluded and restrained than patients not showing such behavior. They also found that diagnosis of schizophrenia or other psychosis was linked to seclusion. Tunde (43) wrote that those that were secluded were more likely to be young, involuntarily admitted, had a diagnosis of schizophrenia, were a risk to others, risk to self and at risk of absconding. Noorthoorn et al. (9) reported that higher seclusion rates were associated with psychotic disorders and male gender.
2. Other containment measures used in psychiatric practice (i.e., the concepts including time out and emergency medication). These "alternative" containment measures are for example described by Dack et al. (44). They defined a number of containment measures used in psychiatric practice, such as seclusion, PRN medication, physical restraint, time out, compulsory intramuscular medication.
3. Implementing seclusion (i.e., the concepts: seclusion (three concepts-different spelling or abbreviation), ground, security, alarm, force majeure, and police). These concepts seem to describe the process of secluding a patient.
4. The working environment of nursing staff. For example the concepts: office, medication, colleague, confiscate, and physicians.
5. Non-specific terms, such as cigarette, radio, night, everyone, water, bathroom, and 5 o'clock.
The concepts that show a relationship with non-seclusion also have face validity and seem to describe unobtrusive and calm patients. Striking are the words relating to depression and suicidal behavior. This does not seem to resonate with, for example, one of the findings of Vollema et al. (39) that depression was more common among those who were secluded. Also, the word woman seems to be in line with El-Badri and Mellsop's (41) finding that men were more likely than women to be secluded. It was interesting to look at the significant relationships of the concepts a week before the event of seclusion or discharge. Somewhat less than half of the concepts that were significant in the full 14 days (47 of 115) were already significant during days 14 to 7. Even though a lot of the words are not yet significant, there are still words that describe reasons for seclusion (e.g., agitated, charged, threatening, psychosis) and the use of other containment measures (e.g., time out and emergency medication). This could mean that a seclusion can be predicted a week before it commences, which makes text mining an interesting tool in the quest of reducing the use of seclusion. However, about one third of secluded patients are secluded more than once during an admission and seclusion usually takes place in the first week of admission (41,43). This could be a confounding factor in the concepts found in this study, as some are already describing a seclusion incident. This study took place during nationwide seclusion reduction initiatives that also affected the culture on most admission wards in The Netherlands (10,14) and resulted in a reduction of seclusion rates (10)(11)(12). These changes are not expected to have an impact on the presently found results and conclusions. The reason is that text mining reflects the culture and way of working on a specific ward.
Regarding concepts related to seclusion that describe the reason for using the restraining measure: these are expected to result in similar words, as reasons for using a restraining measure are universal (usually relating to aggression).
There are several limitations to this study. The present study used a particular text mining application. There are several other applications for text mining available on the market, which analyze text in the same way. Perhaps if the present study used different software the results would be different. This, however, is not to be expected.
A limitation is the question of generalizability. This study was conducted on a specific ward in the Netherlands, using Dutch words which may translate differently in other languages. Nevertheless, using text mining in a particular ward always starts with a baseline and training a model in the particular setting. It is quite possible that, depending on cultural or clinical setting and language, other concepts can be identified in the EMR that precede or predict seclusion. However, it does seem plausible that concepts similar to those found here will also emerge on another closed psychiatric ward (with the exception of phrases used in a particular hospital, such as the name of the ward or codes used to describe symptoms), because similar phrases as reasons for seclusion are also described in literature. But it is important to keep in mind that the present results only give an indication that text mining the EMR in this context is feasible. Another limitation is that staff do not report in the same way, such as using abbreviations or another spelling for words. The same word can be noted differently in the EMR. For example time-out room: t.o., time-out room, time out room, time-out room. The software did not seem to treat these as the same entity, which resulted in these concepts having a lower frequency. These concepts will therefore have to be manually identified in the exploration phase and combined as input for a possible future predictive model. However, taking into consideration that this study was conducted several years ago and that the field of data analysis has evolved and is currently thriving, it could be expected that these duplicates of concepts would already be considerably diminished in the first step of analysis with present-day updated and new software. Furthermore, the period of 14 days studied here was not compared for the individual patients in the same time-frames.
There could be confounding factors involved in these different timeframes, such as an incident that has taken place on the ward or the time of the year. Also, it could be that some staff members view certain patients in a biased way and write their reports accordingly. Additionally, each of the 14 days may not comprise a comparable quantity of analyzed reports. It is advised that future analyses control for this by making "buckets" of reports to improve comparison. Also, non-secluded patients as a comparison group could perhaps be selected in the middle of admission and not before discharge. This could possibly lead to fewer discharge-related concepts.
The most important future direction is building and testing a predictive model, for example as described in Barak-Corren et al. (27). In the future perhaps a trained and tested text mining model could lead to "real time" analysis of all day-to-day notes and reports in the Electronic Medical Record. This means that the staff can continue their "routine" way of recording without increasing administrative workload and in the meantime be supported in their judgment and prediction about patients at risk for seclusion. This judgment could be based, for example, on routinely applied structured risk assessment scales (Crisis Monitor) (22). With the use of a specific "User Interface," data derived from the EMR database can be "transformed" into real time risk assessing information, indicating the probability of seclusion. This can, through a predictive algorithm, yield signals per individual patient, for example: green indicating no problem, orange indicating that extra preventative care should be provided for the patient, and red indicating that immediate action is needed. Either at the nursing station or on a handheld device, a warning can be generated per individual patient. The type and sequence of the interventions in phase orange or red can be protocolled both in a general way and tailored to specific patient needs. On the basis of continuous feedback, the validity of the system can be upgraded and adapted. Ultimately it can be fine-tuned to local resources and attitudes, leading to a Clinical Decision Support System. This enhances the safety of patients and staff in general, not only with regard to seclusion. Another aspect is that it may also support inter-staff communication on a continuous basis in an effective and efficient adjuvant way.
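The green, orange and red signals described above could, for instance, be derived from a predicted probability of seclusion. The Python sketch below is purely illustrative: no predictive model exists yet, and the function name and the two threshold values are hypothetical placeholders that would have to be set and validated locally.

```python
def risk_signal(probability, orange_from=0.3, red_from=0.7):
    """Map a predicted probability of seclusion to a traffic-light signal.

    The thresholds are hypothetical and not derived from this study.
    """
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie between 0 and 1")
    if not orange_from <= red_from:
        raise ValueError("orange_from must not exceed red_from")
    if probability >= red_from:
        return "red"      # immediate action needed
    if probability >= orange_from:
        return "orange"   # provide extra preventative care
    return "green"        # no problem


# Hypothetical predicted probabilities for three patients.
signals = [risk_signal(p) for p in (0.05, 0.45, 0.90)]
```

Keeping the thresholds as parameters reflects the idea that such a system would be fine-tuned to local resources and attitudes on the basis of continuous feedback.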
It is clear that this approach can also be used in many other contexts. Currently our institution is looking into the possibilities of text mining to support Assertive Community Teams with this approach to diminish (involuntary) admissions and screen outpatients for suicidal tendencies.
Altogether, these results answer the research question positively: it seems to be feasible to identify certain concepts in the EMR that typically precede a seclusion episode. These preliminary findings may be regarded as a "proof of concept" for using text in the EMR of patients admitted to an (acute) admission ward to help predict subsequent seclusion. Furthermore, these results may help turn implicit (clinical) knowledge into formal knowledge. As mentioned before, this is a purely exploratory study. The study should be repeated, and a model should be built, trained, tested, and further evaluated and validated before text mining can become part of an evidence-based clinical decision making tool. However, the results seem promising in that "real time" text analysis of the EMR may be a clinically feasible and possibly efficient way to identify patients at risk for seclusion in the future, thus offering opportunities for less invasive alternative interventions.

AUTHOR CONTRIBUTIONS
EH initiated the idea to use text mining techniques on reports in the EMR to predict seclusion. EH initiated the collaboration with Intersystems. RdW made it possible to gather data from the specific wards. MH and NM were responsible for delivering the EMR data for text mining analysis. DvH and DW used their company's software to analyze the data. MH wrote the article together with RdW and EH. RvE was responsible for the chi-square analyses.

FUNDING
This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.