Leveraging Text Mining Approach to Identify What People Want to Know About Mental Disorders From Online Inquiry Platforms

Online inquiry platforms, where people can ask questions anonymously, have become an important source of information for those who are concerned about the social stigma and discrimination associated with mental disorders. Examining what people ask about mental disorders could therefore inform the design of educational programs for communities. The present study aimed to examine the contents of queries about mental disorders posted on online inquiry platforms. A total of 4,714 relevant queries were collected from the two major online inquiry platforms in Korea. We computed word frequencies and centralities and performed latent Dirichlet allocation (LDA) topic modeling. Words such as symptom, hospital, and treatment ranked among the most frequently used words, and the word my had the highest centrality. LDA identified four latent topics: (1) understanding general symptoms, (2) the disability grading system and welfare entitlement, (3) stressful life events, and (4) social adaptation with mental disorders. People are interested in practical information about mental disorders, such as social benefits and social adaptation, as well as general information about symptoms and treatments. Our findings suggest that educational programs should cover a broad range of information.


INTRODUCTION
With the rapid expansion of Internet access, individuals have increasingly used online inquiry platforms to seek health information (1). These platforms are a unique information source where users can exchange sensitive health information anonymously yet interpersonally. For mental disorders, the use of online inquiry platforms has become more salient than for physical diseases, because the stigma attached to these disorders discourages people from seeking face-to-face consultation (2). Several studies have found this tendency across different cultures (3)(4)(5).
The use of online inquiry platforms is especially widespread in Korean society, largely because Koreans are often reluctant to disclose sensitive health issues (6). We also presumed that people may favor online inquiry platforms because of features such as anonymity and accessibility (7). Exploring platforms where people frankly voice their concerns and questions is therefore a natural starting point for examining what people want to know about mental disorders. We expect that refining, classifying, and analyzing the questions posted online will contribute to empirical advances in this area.

The Biggest Barrier to Open Communication About Mental Disorders: Social Stigma
Although the number of people suffering from mental disorders has grown over the last few decades, the stigmatization of mental disorders remains a global phenomenon that prevents people from receiving proper medical care and treatment (2). Nationwide research involving Korean adults revealed that only about 6% of those who met the diagnostic criteria for a mental disorder received medical treatment, and social stigma was discussed as a contributor to this low rate (8). A 2011 study led by Seoul National University College of Medicine (9) explicitly found that 18.2% of those who had not received any treatment for their mental disorders were concerned about the social stigma attached to treatment. Where stigmatization is perceived to prevail, Koreans may prefer to go online to discuss their health-relevant issues, collect information, and actively interact with other users (6). Koreans are therefore likely to prefer the Internet over face-to-face support as their main information source for mental health issues, because it allows them to remain anonymous and feel less stigmatized (10,11).

Online Health Information Seeking Behavior
The Internet may play a significant supplementary role in health-related decision making. People go online before and after seeing their doctor to prepare for, supplement, or validate the consultation (12, 13). In Korea, Knowledge iN, operated by Naver (https://kin.naver.com), and TIP, operated by Daum-Kakao, are the two biggest online inquiry platforms. On average, about 55,000 questions are registered on Knowledge iN every day (14), and a total of 372 million answers have accumulated since its launch in 2002. Their success may derive from information that integrates experience-based and expertise-based knowledge, even though such platforms are generally built on laypersons' subjective experiences (15). On these sites, for example, health professionals are encouraged to answer questions in exchange for free advertising of their workplaces (16), which encourages people to disclose sensitive health issues in the expectation of also receiving professional opinions.
Although concerns remain about the quality of information obtained online (7), those who worry about the prevailing stigmatization of mental disorders may be inclined to rely on the information provided by online inquiry platforms. A few studies have found that people seek information online about diagnosis, medication, mental health services, and side effects (7,17,18). Given the anonymous nature of the Internet, people may find posting sensitive questions online less burdensome than asking them in person. Examining the contents of these questions could therefore inform professionals about the needs of their patients and clients and contribute to developing appropriate intervention plans and educational programs.

Text Mining as a New Study Tool
Considering this information-seeking tendency as well as the barriers to open communication, the current study aims to generate new knowledge from information collected on online inquiry platforms. Big data can provide real-world evidence and thereby help establish and develop theories beyond classical approaches such as the survey method (19). Text mining is therefore well suited as the methodology for this research (20,21). Because online inquiry platforms guarantee anonymity and allow anyone to ask questions regardless of time and place, data from these platforms can be expected to reflect what people authentically want to know about mental disorders. Text mining makes it possible to identify patterns, trends, and relationships that would otherwise remain buried in large volumes of real-world yet unstructured data (22,23). Previous studies have adopted text mining approaches to analyze textual data from online health communities (24), blogs (25), and the published research literature (26) in order to identify patterns of information on diverse topics.
In particular, topic modeling is a useful and popular method for identifying latent topics in large amounts of text data (1). It uses an algorithm to find topics in a large, unstructured collection of documents, inferring latent topics by grouping words with similar meanings through the topic distributions of documents and the word distributions of latent topics (27,28). Topic modeling is recognized as a standard methodology owing to its strong performance and convenience, and it overcomes many problems that affect the analysis of individual word frequencies, such as sparsity, synonymy, polysemy, and semantic hierarchical structure. Because the current study aimed to examine the latent topics of queries on online platforms, we adopted latent Dirichlet allocation (LDA). LDA assumes that each document can cover several topics; in other words, each collected document is treated as a stochastic mixture of topics, which reflects the nature of real text. Compared with other topic modeling methods, such as latent semantic analysis (LSA) or probabilistic LSA (pLSA), LDA produces results that are easier to interpret (28). In addition, placing Dirichlet priors with smoothing hyperparameters on the topic and word distributions, so that these distributions are treated as random variables rather than fixed parameters, helps address the problem of overfitting (29).
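To make the "stochastic mixture" assumption concrete, the standard smoothed LDA generative process can be written as below. The notation is ours rather than the study's: K is the number of topics, D the number of documents, N_d the number of words in document d, and α and β the Dirichlet hyperparameters.

```latex
\begin{aligned}
\boldsymbol{\phi}_k &\sim \mathrm{Dirichlet}(\beta), && k = 1,\dots,K\\
\boldsymbol{\theta}_d &\sim \mathrm{Dirichlet}(\alpha), && d = 1,\dots,D\\
z_{d,n} \mid \boldsymbol{\theta}_d &\sim \mathrm{Categorical}(\boldsymbol{\theta}_d), && n = 1,\dots,N_d\\
w_{d,n} \mid z_{d,n}, \boldsymbol{\phi} &\sim \mathrm{Categorical}(\boldsymbol{\phi}_{z_{d,n}})\\[4pt]
p(w_{d,n} = v \mid \boldsymbol{\theta}_d, \boldsymbol{\phi}) &= \sum_{k=1}^{K} \theta_{d,k}\,\phi_{k,v}
\end{aligned}
```

The last line shows that each observed word is drawn from a mixture of topic-specific word distributions, weighted by the document's own topic proportions θ_d.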

Aim of the Current Study
Taking these ideas together, the current study investigated queries about mental disorders posted online using text mining approaches. To this end, we collected questions that included the words mental disorder, mental disease, mental health, and mental illness, and computed word frequencies, centralities, and LDA topic models to extract the keywords of the questions and to explore their networks and latent topics.
We excluded Google because its inquiry platform uses a different format from those of the other two portal sites. Using the application programming interfaces (APIs) provided by Naver and Daum-Kakao, we collected questions about mental disorders in August and November 2019 with the search keywords mental disorder, mental disease, mental health, and mental illness. After duplicate queries were removed, a total of 4,714 queries remained.
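The collection and de-duplication step can be sketched as below. This is an illustration under stated assumptions, not the authors' script: the Naver endpoint and header names reflect the public Search API as we understand it and should be checked against the official documentation, no TIP (Daum-Kakao) request is shown, and `build_kin_request`, `normalize`, and `deduplicate` are hypothetical helpers.

```python
import re
import unicodedata
from urllib.parse import urlencode
from urllib.request import Request

# Assumption: Naver's Search API serves Knowledge iN results from this endpoint
# and authenticates with the two headers below; verify in the official docs.
NAVER_KIN_URL = "https://openapi.naver.com/v1/search/kin.json"


def build_kin_request(query, client_id, client_secret, start=1, display=100):
    """Build (without sending) one page of a Knowledge iN search request."""
    params = urlencode({"query": query, "display": display, "start": start})
    return Request(
        f"{NAVER_KIN_URL}?{params}",
        headers={
            "X-Naver-Client-Id": client_id,
            "X-Naver-Client-Secret": client_secret,
        },
    )


def normalize(text):
    """Hypothetical duplicate-detection key: NFC form, collapsed whitespace."""
    text = unicodedata.normalize("NFC", text)
    return re.sub(r"\s+", " ", text).strip().lower()


def deduplicate(queries):
    """Keep the first occurrence of each query, comparing normalized text."""
    seen, unique = set(), []
    for query in queries:
        key = normalize(query)
        if key not in seen:
            seen.add(key)
            unique.append(query)
    return unique
```

Because the same post can be returned by several of the four search keywords, results from all keyword searches would be pooled before calling `deduplicate`.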

Data Analysis
Preprocessing
For data preprocessing, we first deleted photos, emoticons, characters consisting only of consonants or vowels, hyperlinks, special characters, and formatting tags, and then corrected spelling and word spacing. Morphological analysis, including stemming, lemmatization, and part-of-speech (POS) tagging, was performed using MeCab through the Python KoNLPy package (30). We then conducted text segmentation, which splits the text into sentences and words, and extracted every noun from the text. Stop words (https://www.ranks.nl/stopwords/korean) were deleted, and semantically identical words were homogenized using word stemming. All nouns extracted from the text were analyzed further. The preprocessing procedure is presented in Figure 1. After the text was transformed into mathematical structures, the word frequencies, centralities, and LDA topic models were computed.
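The cleaning and noun-filtering steps can be sketched as below, assuming regular expressions for the deletion rules; spelling and spacing correction is not shown. The noun extractor is passed in as a function, because in the study this role was played by MeCab through KoNLPy; `clean`, `preprocess`, and the `synonyms` mapping are hypothetical names used only for illustration.

```python
import re

# Hyperlinks.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
# Hangul Compatibility Jamo (U+3131-U+318E): stand-alone consonants or vowels
# such as "ㅋㅋ" or "ㅠㅠ" that do not form complete syllables.
JAMO_RE = re.compile(r"[\u3131-\u318E]+")
# Everything except Hangul syllables, Latin letters, digits, and whitespace
# (this also removes emoticons, emoji, and other special characters).
SPECIAL_RE = re.compile(r"[^\uAC00-\uD7A3A-Za-z0-9\s]")


def clean(text):
    """Remove hyperlinks, jamo-only characters, and special characters."""
    for pattern in (URL_RE, JAMO_RE, SPECIAL_RE):
        text = pattern.sub(" ", text)
    return re.sub(r"\s+", " ", text).strip()


def preprocess(text, extract_nouns, stopwords, synonyms):
    """Clean text, extract nouns, drop stop words, then merge synonyms.

    extract_nouns: any callable str -> list[str].
    synonyms: hypothetical mapping from variant forms to one canonical noun.
    """
    nouns = extract_nouns(clean(text))
    nouns = [n for n in nouns if n not in stopwords]
    return [synonyms.get(n, n) for n in nouns]
```

Assuming KoNLPy and mecab-ko are installed, `Mecab().nouns` from `konlpy.tag` could be passed as `extract_nouns`; `str.split` works as a stand-in for quick experiments.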

Centralities
For the centrality analysis, we calculated four centrality indicators, namely degree centrality (DC), betweenness centrality (BC), closeness centrality (CC), and eigenvector centrality (EC), to identify relationships between nodes and to measure the characteristics of each node within the network (31). DC measures the number of links a node has to other nodes (i.e., nouns), while EC measures the extent to which a node is connected to other important nodes (32). CC is the inverse of the sum of a node's distances to all other nodes, and BC counts how often a node lies on the shortest paths between pairs of other nodes (33).
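The four indicators can be sketched in plain Python as below. The study does not state how edges were defined or which software computed the centralities, so this sketch assumes, hypothetically, that two nouns are linked whenever they co-occur in the same query. It uses common normalizations (similar to those in NetworkX), computes betweenness with Brandes' algorithm, and computes eigenvector centrality by power iteration; the authors' tool may have normalized differently.

```python
from collections import deque
from itertools import combinations


def cooccurrence_graph(docs):
    """Undirected graph {noun: set(neighbours)}; nouns linked if in one query."""
    graph = {}
    for nouns in docs:
        unique = sorted(set(nouns))
        for noun in unique:
            graph.setdefault(noun, set())
        for a, b in combinations(unique, 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def _bfs(graph, source):
    """Distances, shortest-path counts, predecessors, and visit order."""
    dist = {source: 0}
    sigma = dict.fromkeys(graph, 0)
    sigma[source] = 1
    preds = {v: [] for v in graph}
    order, queue = [], deque([source])
    while queue:
        v = queue.popleft()
        order.append(v)
        for w in graph[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
            if dist[w] == dist[v] + 1:
                sigma[w] += sigma[v]
                preds[w].append(v)
    return dist, sigma, preds, order


def centralities(graph, max_iter=200, tol=1e-9):
    """Normalized DC, CC, BC, and EC for a graph with at least 3 nodes."""
    n = len(graph)
    dc = {v: len(graph[v]) / (n - 1) for v in graph}
    cc, bc = {}, dict.fromkeys(graph, 0.0)
    for s in graph:
        dist, sigma, preds, order = _bfs(graph, s)
        total = sum(dist.values())
        # Inverse of summed distances, scaled by the number of reachable nodes.
        cc[s] = (len(dist) - 1) / total if total else 0.0
        # Brandes' dependency accumulation, processed farthest nodes first.
        delta = dict.fromkeys(graph, 0.0)
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each unordered pair was counted twice: divide by 2 * (n-1)(n-2)/2.
    bc = {v: b / ((n - 1) * (n - 2)) for v, b in bc.items()}
    # Power iteration on (A + I) avoids oscillation on bipartite graphs.
    ec = dict.fromkeys(graph, 1.0 / n)
    for _ in range(max_iter):
        new = {v: ec[v] + sum(ec[u] for u in graph[v]) for v in graph}
        norm = sum(x * x for x in new.values()) ** 0.5
        new = {v: x / norm for v, x in new.items()}
        converged = sum(abs(new[v] - ec[v]) for v in graph) < n * tol
        ec = new
        if converged:
            break
    return {"DC": dc, "BC": bc, "CC": cc, "EC": ec}
```

On a star-shaped toy network where one hub noun co-occurs with every other noun, the hub scores 1.0 on DC, CC, and BC, while its EC is lower because EC is scaled to unit length across all nodes.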

Latent Dirichlet Allocation Topic Modeling
LDA is a generative probabilistic framework for modeling the topical structure of a document collection, which allows the probability of each word belonging to each topic to be computed (34). LDA estimates the posterior distribution of a Bayesian probability model, which determines the topic proportions of each document and the word composition of each topic. The meaning of a topic can be inferred from its high-probability words. The number of topics was determined by referring to the perplexity score, which measures how well the model predicts data it has not seen, and the coherence score, which measures the semantic similarity among the high-probability words within each topic. Perplexity was calculated from the log-likelihood of the words in unseen documents (a held-out test set). Lower perplexity and higher coherence indicate a better model. Finally, the research group discussed and reviewed the plausibility of the suggested models to confirm the final model (Figure 2).
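The two model-selection criteria can be made concrete with a short sketch. Perplexity is computed in its standard form, the exponential of the negative held-out log-likelihood per word, with the per-document log-likelihoods assumed to come from whichever LDA implementation is used. The study does not name its coherence measure, so UMass coherence is shown here as one common choice (libraries such as gensim's `CoherenceModel` also offer others, e.g. `c_v`); `pick_num_topics` is a hypothetical helper.

```python
import math


def perplexity(doc_log_likelihoods, doc_lengths):
    """exp(-sum_d log p(w_d) / sum_d N_d) over held-out documents."""
    return math.exp(-sum(doc_log_likelihoods) / sum(doc_lengths))


def umass_coherence(top_words, docs):
    """UMass coherence of one topic.

    top_words: the topic's words sorted by descending probability.
    docs: reference corpus as a list of sets of words; every top word is
    assumed to occur in at least one document (otherwise D(w_j) = 0).
    """
    def doc_freq(*words):
        return sum(1 for doc in docs if all(w in doc for w in words))

    score = 0.0
    for i in range(1, len(top_words)):
        for j in range(i):
            pair = doc_freq(top_words[i], top_words[j])
            score += math.log((pair + 1) / doc_freq(top_words[j]))
    return score


def pick_num_topics(mean_coherence_by_k):
    """Return the candidate topic count with the highest mean coherence."""
    return max(mean_coherence_by_k, key=mean_coherence_by_k.get)
```

In practice one would fit LDA for each candidate number of topics, average the coherence of its topics, and weigh the result against perplexity and interpretability, as the study did through group discussion.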

Frequencies
The search keywords, such as mental disorder, mental, and disorder, ranked high, but we omitted them from the list presented in Table 1. The words human, symptom, thought, hospital, and treatment were the five most frequently used words.
Among the types of mental disorders, depression, anxiety, and schizoid showed high frequencies. Moreover, friend, mother, family, and parents were among the forty most frequently used words, indicating that queries concerning one's significant others also appeared frequently.
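A frequency ranking like the one in Table 1 can be sketched with the standard library, assuming the search keywords are supplied in `exclude`. The study does not say whether it counted every occurrence or each noun once per query, so both options are shown; `top_nouns` is a hypothetical helper, not the authors' code.

```python
from collections import Counter


def top_nouns(noun_lists, exclude=frozenset(), k=40, per_document=False):
    """Rank nouns by frequency, skipping excluded terms such as search keywords.

    per_document=False counts every occurrence; True counts each noun at most
    once per query (document frequency).
    """
    counts = Counter()
    for nouns in noun_lists:
        counts.update(set(nouns) if per_document else nouns)
    for term in exclude:
        counts.pop(term, None)
    return counts.most_common(k)
```

Document frequency is less sensitive to a single long query that repeats one word many times, which is one reason to report it alongside raw counts.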

Centrality
Four centrality indices were calculated to examine the network of extracted words. The word my showed the highest DC (0.78), BC (0.77), CC (0.80), and EC (0.36). Apart from my, and symptom (CC = 0.50), no word reached 0.50 on any of the four centrality indicators. These results indicate that the nodes were only loosely connected with each other within the network.

Topic Modeling
We conducted LDA topic modeling to identify the underlying topics by scanning the words and computing their probability distributions within the documents (34). Four topics were selected on the basis of the perplexity score, the coherence score, and group discussion of plausibility. Although the perplexity score decreased as the number of topics increased, the coherence score peaked at four topics. The words and their probabilities in each topic are presented in Table 2. Based on the words belonging to each topic, we named the four topics as follows. The first topic, understanding general symptoms, explained 41% of the documents and included words for types of mental disorders (depression, anxiety, obsession, schizoid, and bipolar disorder), symptoms, and treatments. The second topic, disability grading system and welfare entitlement, explained 30% of the documents and specifically included words concerning social benefits, such as grade, welfare, and pension, together with content relating to symptoms and diagnosis. The third topic, stressful life events, explained 15% of the documents and comprised words such as mother, father, parents, theirs, they, school, teacher, and friends. The fourth topic, social adaptation with mental disorders, accounted for 14% of the documents and consisted of words such as exemption, record, society, government employees, license, and public service worker. The inter-topic distance map of the LDA model is presented in Figure 3.
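The text reports the share of documents each topic "explained" but does not say how the shares were computed. One plausible reading, sketched below under that assumption, assigns each document to its dominant (highest-probability) topic; averaging the document-topic proportions is a common alternative. `doc_topic` is a hypothetical list of per-document topic distributions (θ_d) from a fitted LDA model.

```python
def dominant_topic_shares(doc_topic):
    """Share of documents whose highest-probability topic is k."""
    n_topics = len(doc_topic[0])
    counts = [0] * n_topics
    for theta in doc_topic:
        counts[max(range(n_topics), key=lambda k: theta[k])] += 1
    return [c / len(doc_topic) for c in counts]


def mean_topic_weights(doc_topic):
    """Average topic proportion across documents (an alternative summary)."""
    n_topics = len(doc_topic[0])
    return [sum(theta[k] for theta in doc_topic) / len(doc_topic)
            for k in range(n_topics)]
```

Both summaries sum to 1 across topics, but they can rank topics differently when many documents are evenly mixed, so it is worth stating which one a report uses.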

Principal Results
In the present study, we investigated the contents of queries posted on the two biggest online portal sites in Korea to identify what people authentically want to know about mental disorders. The results revealed that the extracted words were not strongly centralized but instead spanned several underlying topics. Topic modeling suggested four topics: (1) understanding general symptoms, (2) disability grading system and welfare entitlement, (3) stressful life events, and (4) social adaptation with mental disorders. Below, we discuss what each topic portrays and how professionals could use the current findings to develop intervention plans and educational programs.

Comparison With Prior Work
In our review of previous work, several articles aimed to identify and analyze mental health-related symptoms or states (e.g., mental health, anxiety, depression) in online communities using text mining and natural language processing; most of these focused on the identification, detection, extraction, and description of symptom terms (35)(36)(37). Whereas our study focused on general portal queries and sought to observe patterns in the general population's attitudes and thoughts toward mental health-related symptoms or states, those articles examined both general (e.g., Twitter and patient portals) and disease-specific online communities. Among the types of mental disorders, depression, anxiety, and schizoid showed high frequencies, which was somewhat comparable with the actual prevalence of mental disorder diagnoses reported by the Ministry of Health and Welfare (38). According to this report, anxiety disorder had the highest 1-year prevalence (5.7%), followed by alcohol use disorder (3.5%), nicotine use disorder (2.5%), depression (1.5%), and schizophrenia spectrum disorder (0.2%). Although depression appeared most frequently in the queries, its actual prevalence is relatively low. We presume that these queries may reflect the depressive mood and melancholy that people encounter in daily life. In contrast, public interest in schizophrenia was relatively high compared with its reported prevalence. Given that Korean society has been in an uproar over several crimes committed by people with schizophrenia in the last few years, this frequency may reflect increased public interest in the topic (24).
In addition to the types of mental disorders, words such as friends, mother, parents, and family were also frequent, implying that the queries concern both oneself and one's significant others. Because social relationships function reciprocally, a mental disorder is demanding both for the individual and for those closely related to them. Indeed, the odds of having a mental disorder have been reported to be about eight times higher when one's close others suffer from mental disorders (39). Therefore, a person's surrounding environment should be considered when dealing with mental disorders. Psychoeducation to improve literacy about mental disorders and strategies for coping with stress would help both patients and their close others. Notably, the words mother and parents ranked higher than wife, husband, daughter, and son. This may reflect that younger adults are more likely than older adults to post queries and seek help online, possibly because younger adults are more familiar with the Internet (40) or because this age group reports higher stress levels (41).
The overall coefficients across the four types of centrality were modest, except for one word: my. The word my had coefficients higher than 0.70 for DC, BC, and CC, implying that it was connected with other nodes both directly and indirectly, whereas its low EC suggests that few other influential nodes existed in this query network. These results for my may indicate that most queries were not general questions but personal ones concerning oneself, friends, or family. Furthermore, the word symptom showed relatively high frequency and CC compared with the other words, suggesting that information about symptoms and treatments may generally be important.
Topic modeling showed that the queries comprise four latent topics covering a wide range of issues: (1) understanding general symptoms, (2) disability grading system and welfare entitlement, (3) stressful life events, and (4) social adaptation with mental disorders. Although earlier work on online health information-seeking behavior mainly dealt with understanding general symptoms and treatments (7,17,18), the topics focusing on welfare entitlement, stressful life events, and social adaptation together accounted for a large proportion of the collected data (30%, 15%, and 14% of the documents, respectively, or 59% combined).
For instance, people sought information to confirm whether they were entitled to social benefits because of mental disorders (topic 2: disability grading system and welfare entitlement). Words relevant to this topic, such as disability, grade, assessment, test, and pension, specifically concern mental disorders as disabilities and the welfare benefits granted to registered persons with disabilities. Presumably, people sought information about the disability grade assessment, which determines eligibility for welfare programs as well as the extent of the benefits one can receive. However, this disability grading system was abolished in July 2019 under the Fifth Comprehensive Policy Plan for People with Disabilities (2018-2022) (42) and replaced with a comprehensive approach intended to ensure customized welfare services; such detailed information should be further publicized, perhaps through user-friendly guide materials. In addition to the grading system, the words counseling and treatment are also included in this topic, which is consistent with the increased need of people with mental disorders for welfare services (43).
Regarding topic 3, stressful life events, we found that people sought general consultation about daily hassles that left them feeling mentally disturbed. As Yi (6) noted, this may reflect users' expectation of receiving first-person empathy in addition to practical and professional advice. Considering that users of online inquiry platforms were satisfied with answers that combined both cognitive and affective messages (6), practitioners on any platform could optimize how they deliver information about mental disorders. Moreover, more accessible channels where people feel secure disclosing their sensitive health issues should be provided.
Topic 4, social adaptation with mental disorders, comprised queries mainly concerning the disadvantages and barriers that follow mental disorders, which may be related to their stigmatization (8). For example, the words record, government employees, license, driving, and employment can be understood in terms of limitations on personal and social functioning, particularly career-related barriers. Some society-specific questions were also found. The words exemption, military service, and public service worker may concern whether people diagnosed with mental disorders can be exempted from mandatory military service and serve as public service workers instead. The focus on topics relevant to young adults, such as job seeking and military service, may partly reflect the average age of Internet users, but it would still be worthwhile to clarify and provide information on such age- and culture-specific issues.

LIMITATIONS
There are some caveats to consider when interpreting our findings. The current study analyzed texts obtained from online inquiry platforms, where people can post questions anonymously. Therefore, no detailed distinction by sociodemographic characteristics was possible with the current dataset. Although this anonymity was considered advantageous for examining what people authentically want to know about mental disorders, it inevitably hinders planning interventions that target people with specific characteristics. Using a dataset that integrates demographic, psychological, or psychiatric information may enrich further discussion in this regard. Furthermore, because the data were collected only from online platforms, the findings may not fully capture the needs of people who do not use the Internet, who may share certain socioeconomic backgrounds or belong to certain age groups. In addition, our analyses would have been strengthened by validating the data against similar online platforms over the same period, had such data been available.

CONCLUSIONS
This study proposed the use of text mining approaches to analyze queries about mental disorders posted on online inquiry platforms. Frequency and centrality analyses, together with latent Dirichlet allocation (LDA) topic modeling, allowed us to extract the keywords and latent topics of questions about mental health problems. Using real-world data, we found that the queries, posted by people with mental health problems or by those close to them, covered a wide range of topics, such as understanding general symptoms, the disability grading system and welfare entitlement, stressful life events, and social adaptation with mental disorders. Although the symptoms and treatments of mental disorders are still actively discussed, more practical issues relevant to daily life, including social benefits, military service, and employment, also accounted for a large proportion of the collected data. Knowing what people want to know has several theoretical and practical implications for health communication. Future studies should further investigate the mental health issues that concern the general population. Based on the current findings, campaigns and educational programs on mental health should encompass diverse scopes of information to effectively help people with mental health issues and facilitate their social adaptation. Ultimately, investigating mental health-related messages in online queries offers opportunities to make such programs more inclusive.

DATA AVAILABILITY STATEMENT
The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

ETHICS STATEMENT
Ethical review and approval were not required for this study on human participants in accordance with the local legislation and institutional requirements. Written informed consent to participate was not required for this study in accordance with the national legislation and the institutional requirements.

AUTHOR CONTRIBUTIONS
SP conceptualized and designed the study, provided administrative support, assisted in collection and assembly of data, data analysis and result interpretation, and drafted the manuscript. YK-K provided administrative support, assisted in collection and assembly of data, data analysis and result interpretation, and drafted the manuscript. J-aS conceptualized and designed the study, provided administrative support, assisted in the assembly of data, data analysis and result interpretation, and drafted the manuscript. All authors approved the final manuscript as submitted and agree to be accountable for all aspects of the work.