AUTHOR=Yoon Seowon, Jang Jihee, Son Gaeun, Park Soohyun, Hwang Jueun, Choeh Joon Yeon, Choi Kee-Hong TITLE=Predicting neuroticism with open-ended response using natural language processing JOURNAL=Frontiers in Psychiatry VOLUME=Volume 15 - 2024 YEAR=2024 URL=https://www.frontiersin.org/journals/psychiatry/articles/10.3389/fpsyt.2024.1437569 DOI=10.3389/fpsyt.2024.1437569 ISSN=1664-0640 ABSTRACT=With the rapid advancement of natural language processing, predicting personality with this technology has recently attracted considerable research interest. Neuroticism has been identified as one of the core psychological traits that predict psychological distress in various contexts. In this study, verbal responses to a series of open-ended questions, developed based on the five-factor model of personality, were used to predict individual levels of neuroticism. Previous personality prediction studies using pre-existing language data have rarely explored the importance of content. However, identifying appropriate questions for language-based personality assessment (LPA) is particularly important because the questions determine the context of the elicited responses. This study examined the model's accuracy and the influence of item content in predicting neuroticism. A total of 425 Korean adults were recruited through consecutive sampling and provided their consent. Psychological assessment batteries, including measures of the Five-Factor Model traits, were administered, and verbal answers to 18 open-ended questions about participants' personalities were collected through interviews. In total, 30,576 Korean sentences were collected. To develop the prediction models, we employed the pre-trained language model KoBERT. We identified questions whose responses effectively predicted participants' neuroticism.
Prediction models that were theoretically aligned with neuroticism exhibited greater predictive performance.

Computational personality research benefits from the norms of computational science, especially in acquiring the large datasets that machine-learning approaches require. To advance computational personality science and yield meaningful psychological insights, researchers must address these limitations from a psychological perspective (8,9,20). These limitations include, but are not limited to, the validity of personality labels, practicability, the context of language use, content validity, and the use of readily accessible data without hypotheses (9,17). One of the persistent issues in the field of computational science applied to psychological and personality assessments is the difficulty in understanding what