- 1Department of Psychiatry, The Affiliated Brain Hospital, Guangzhou Medical University, Guangzhou, China
- 2Guangdong Engineering Technology Research Center for Translational Medicine of Mental Disorders, Guangzhou Medical University, Guangzhou, China
Background: Formal thought disorder (FTD) is a core symptom of schizophrenia spectrum disorders (SSDs). As a key representational dimension of FTD, speech features have been shown in previous studies to hold potential as diagnostic biomarkers for SSD. However, relevant research remains limited, and such speech features have not yet been applied clinically for SSD diagnosis.
Objective: The aim of this research is to establish a Chinese speech database for multidimensional analysis of speech characteristics, quantify these high-dimensional linguistic features using natural language processing (NLP), and ultimately develop objective biomarkers for diagnosing and assessing the severity of SSD.
Methods: This will be a single-center, prospective, observational study. A total of 300 inpatients or outpatients meeting the DSM-5 diagnostic criteria for SSD are planned to be included. Healthy controls with no history of intellectual disability will subsequently be matched. Each participant will undergo a 1-to-2-hour task-guided interview conducted by a psychiatrist, which includes an app-based assessment using the Positive and Negative Syndrome Scale (PANSS), short passage reading, an animal fluency test, a pseudosentence reading task, a symptom severity rating task, an inner-world expression task, and a picture description task. All the interviews will be audio-recorded. After the interview, clinical rating scales will assess psychiatric symptom severity, social functioning, and thought-language disorders. Follow-up assessments will be conducted 2 weeks after the baseline assessment.
Discussion: By multidimensionally quantifying these speech characteristics and integrating machine learning, this study aims to screen highly discriminative speech feature combinations specific to SSD, thereby providing technical and theoretical support for the precise diagnosis and personalized intervention of SSD. These findings will deepen psychiatrists’ understanding of the linguistic pathological mechanisms underlying SSD and promote the development of diagnostic tools and intervention protocols based on novel biomarkers.
Introduction
Since 1990, the prevalence, incidence, and burden of schizophrenia have continued to increase (1). The global lifetime prevalence is approximately 1% (2), imposing a substantial burden on individuals, families, and society (3). Schizophrenia spectrum disorders (SSDs) are characterized by positive symptoms such as hallucinations, delusions and formal thought disorder (FTD); negative symptoms such as affective flattening, avolition and others; and cognitive dysfunction. Among these, formal thought disorder is among the most frequently occurring symptoms in SSD patients (4). FTD refers to disorganized and incoherent thought processes. Surveys of patients with schizophrenia report that its prevalence ranges from 25% to 75% (5).
FTD is primarily expressed through speech characteristics, manifesting as incoherent and disorganized speech (6, 7). Studies have shown that the nodes of the semantic networks in speech from FTD individuals with schizophrenia are more scattered than those from healthy participants (8). In terms of prosody, flat intonation occurs frequently among individuals with schizophrenia (9). With increasing FTD, the syntax becomes simpler (10). Speech characteristics correlate with both positive and negative symptoms in individuals with SSD. For example, conceptual disorganization (a positive symptom) involves disrupted thought coherence, such as tangential or verbose speech. In contrast, negative symptoms manifest as verbal poverty (e.g., brief, empty responses) and impaired emotional communication, which is characterized by rigid, unnatural conversations and potentially mechanical replies. Previous studies have shown that mathematical models built from acoustic parameters, including loudness, formant bandwidth, and amplitude, in individuals with schizophrenia correlate with PANSS negative symptom scores (11). In fact, FTD exists in the early stages of SSD and may be a biomarker of illness severity in the early stages of psychosis (12). A study using an FTD assessment method based on word connectivity reported 85% accuracy in distinguishing individuals with schizophrenia from healthy controls (13). Furthermore, studies have shown that speech features can serve as quantitative indicators with high accuracy for diagnosing SSD and are highly specific to psychosis (11). These studies indicate that speech features hold considerable potential in the development of digital SSD phenotypes. However, individuals with SSD exhibit significant clinical heterogeneity, as their speech manifestations show not only individual variability but also dynamic changes over time, making consistent analysis challenging. 
Traditional language analysis methods, which struggle to account for such variability, lack objectivity and reproducibility.
With the development of artificial intelligence (AI) and natural language processing (NLP) methods based on machine learning (ML), existing computational tools can extract linguistic features more accurately, including speech content and physical characteristics, and apply them to disease classification and functional evaluations (14). In recent years, studies using ML-based NLP techniques to distinguish psychiatric disorders, including schizophrenia (SZ) and affective disorders, have become increasingly common (15–17). Corcoran et al. validated ML classifiers based on NLP methods using part-of-speech tagging analysis (for measuring syntactic complexity) and latent semantic analysis (for measuring semantic coherence) to predict psychosis onset (18). Nonetheless, current related studies still have issues of insufficient comparability and reproducibility. The application of automated NLP analysis of speech features in diagnosis and treatment across neuropsychiatry remains in its infancy.
In recent years, several exploratory studies on Asian samples have laid an important foundation for cross-linguistic research on SSD (19, 20). Meanwhile, relevant research in the Chinese context still has room for further expansion and refinement: in terms of sample size, the sample sizes of existing studies are mostly concentrated around 40–50 cases, and large-scale, cross-regional datasets have not yet been established; in terms of research methods, previous studies have mostly adopted a single task paradigm (e.g., picture description task), with relatively limited types of linguistic texts analyzed, making it difficult to comprehensively and systematically capture the multidimensional manifestations of speech features in patients with SSD.
Methods
Dataset creation
In this section, we describe the methods for creating large Chinese datasets of adult SSD patients and healthy controls. We plan to make the datasets openly available. After completing the ethical approval modification process, they will be available to researchers with legitimate needs in the field of mental illness language science.
Participants
This study is a multicenter, prospective, observational study. Subject recruitment began in 2025 and remains ongoing. Participants were recruited from psychiatric outpatient clinics or inpatient departments of psychiatric hospitals and general hospitals in South China. The inclusion criteria for patients with SSD were as follows: (i) outpatients or inpatients meeting the Diagnostic and Statistical Manual of Mental Disorders, Fifth Edition (DSM-5) diagnostic criteria (including acute and transient psychotic disorders, schizophrenia, schizoaffective disorders, delusional disorders, etc.); (ii) right-handed, aged 18 to 59, with an educational level ≥ 9 years; (iii) a Positive and Negative Syndrome Scale (PANSS) score ≥ 60; and (iv) fluent in Mandarin. The exclusion criteria included nervous system disease, intellectual disability, hearing loss, alcohol or drug abuse, treatment with electroconvulsive therapy within the past six months, and inability to cooperate with the study because of physical illness or acute mental status. All patients with SSD received antipsychotic medication during the study, and their condition was stable.
Healthy volunteers will be individuals aged ≥ 18 years with no history of psychiatric disorders who volunteer via the research team’s website or recruitment advertisements. Written informed consent for study participation, data storage, and the use of de-identified data for academic research-related public purposes will be obtained from all healthy volunteers; for patients, it will be signed by their legal guardians. Participants may withdraw at any time. All procedures involving human subjects/patients were approved by the Medical Ethics Committee of The Affiliated Brain Hospital, Guangzhou Medical University.
Data collection
In this study, demographic characteristics and language data (voice and text information) were collected. We assigned a study number to each participant and managed the data accordingly. The clinical assessment results were linked to the study number and stored separately from identifiers such as name and address. Voice data, which might contain identifiers, was stored strictly on a password-locked server. Such identifiers were removed from the public dataset.
Demographic characteristics
Sex, age, educational background, diagnosis, disease course, age of onset, treatment medications, and complications will be collected from medical records. We will also collect as much information as possible about contact details, address, outpatient or inpatient status, occupation, employment status, and marital status. If some information is missing, participants will be contacted via phone or email to provide it.
Conversation data
All the subjects underwent speech data collection in a quiet assessment room. Recordings were made using a ZOOM H1N recorder (with a bit depth of 24 bits and a sampling rate of 96 kHz). During the collection process, the distance between the subjects’ heads and the microphone was maintained between 0.5 and 1.0 m. Before the tasks began, assessors instructed the subjects to avoid noise-generating activities as much as possible. For the analysis of linguistic features, the data are first preprocessed, including denoising and standardization, to ensure data quality. The speech data are then converted into text using speech recognition technology, followed by manual correction to guarantee text accuracy. Afterwards, the data are processed using automated computational language analysis methods. The tasks are designed in ascending order of cognitive load: first, the SCI-PANSS is used to establish the clinical background, followed by a series of evaluation tasks, specifically including the following (Figure 1).
Figure 1. SSD speech feature analysis flowchart. SSD, schizophrenia spectrum disorders; PANSS, Positive and Negative Syndrome Scale.
PANSS assessment
All participants will complete the PANSS assessment to quantify their psychiatric symptoms over the past week and establish a basic clinical profile. The assessment will be conducted using a researcher-developed app integrated with NeuroVoice software for structured interviews. Researchers will remain silent throughout, providing only brief pretask instructions; the entire process will be guided and managed by the app. Equipped with an AI voice generation system, the app will automatically ask standardized, neutral questions one by one. Participants will respond in natural language, enabling human–machine voice interaction to avoid subjective researcher interference and ensure a standardized environment. The app will also automatically record response timestamps and voice features (e.g., duration, speech rate, intonation). After the app-based questionnaire is completed, the assessor will ask open-ended follow-up questions based on the participants’ responses to further verify symptoms and ensure the reliability of the scale assessment. Notably, no speech data will be collected during the open-ended questioning phase. The software is currently at Version 1.0, and subsequent iterations are planned to launch a public multilingual version for researchers worldwide.
Short text reading task
Participants will read the classic short text The North Wind and the Sun to assess their basic language abilities, speech fluency, and pronunciation accuracy.
Animal fluency task
Participants will name as many different animals as possible within 1 minute. This task assesses language fluency, thought organization, and lexical fluency.
Pseudosentence reading task
Participants will read aloud a set of pseudosentences generated from publicly available textual materials. These pseudosentences all follow the grammatical structure of “subject + predicate + object,” with each containing 12 Chinese characters, including 3 keywords. For example, the English translation of the Chinese pseudosentence “一个男声可能复习两个课本” is “A male voice may review two textbooks.” The three keywords are the subject, predicate, and object. The pseudosentences are grammatically correct but lack actual semantic content, and the number of Chinese syllables is balanced across sentences. Prior to formal use, all pseudosentences were checked by two researchers to verify their grammatical correctness and absence of practical semantic meaning. This task enables the assessment of participants’ language fluency and expressive ability under conditions free from actual semantic load.
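The length constraint on the pseudosentences lends itself to an automated check before the manual review. The following is a minimal sketch, not the authors' procedure: the function names are hypothetical, and it assumes that "Chinese character" means a code point in the basic CJK Unified Ideographs block, so punctuation and Latin text are ignored.

```python
import re

# Assumption: a "Chinese character" is any code point in the basic
# CJK Unified Ideographs block (U+4E00 to U+9FFF).
HAN_CHAR = re.compile(r"[\u4e00-\u9fff]")


def count_han_characters(sentence: str) -> int:
    """Count Chinese characters, ignoring punctuation, spaces, and Latin text."""
    return len(HAN_CHAR.findall(sentence))


def has_expected_length(sentence: str, expected: int = 12) -> bool:
    """Return True when the sentence has exactly `expected` Chinese characters."""
    return count_han_characters(sentence) == expected


# The example pseudosentence quoted in the text.
example = "一个男声可能复习两个课本"
print(count_han_characters(example), has_expected_length(example))
```

A check of this kind only covers character count; grammatical correctness and the absence of semantic content still require the two-researcher review described above.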
Psychopathological symptoms task
The participants will read text about hallucination and delusion experiences. The text, designed by researchers, is based on common hallucinations and delusions in schizophrenia patients, such as being watched, hearing voices or seeing unreal images.
To create the text scientifically, this study collected self-reported data from 30 schizophrenia patients using semistructured interviews, covering details of their daily hallucinations, delusions and mood swings. All the participants signed written consent forms. Through qualitative analysis, these symptoms were thematically classified, identifying common symptoms such as being watched, controlled or disturbed by voices. The text was developed on the basis of these findings for the reading task.
Expression of inner thoughts
The participants expressed their reflections and feelings about the previous hallucination and delusion text. This assesses their inner responses to hallucinations and delusions, emotional states, and linguistic expressive abilities at both the emotional and the cognitive levels.
Picture description task
The participants will be asked to describe pictures of Chinese faces showing three types of emotional expression: positive, neutral, and negative. This task assesses their emotional cognitive processing and their linguistic ability in emotional expression.
In this study, linguistic information during disease progression and the correlation between disease severity and linguistic information are highly important. Therefore, participants will be followed up 2 weeks after the baseline assessment, and follow-up data collection will follow the same procedures as the baseline assessment.
Clinical assessment
After completing the voice tasks, all patients will undergo a systematic clinical evaluation, including assessments of psychiatric symptoms, social functioning, and thought-language disorders. Senior psychiatrists will use the PANSS to evaluate the psychiatric symptoms of patients with SSD over the past week. This assessment will be based exclusively on the PANSS results from the voice tasks, without reinterview. Social functioning will be measured using the Personal and Social Performance Scale (PSP), where lower scores indicate more severe impairment in social functioning. Thought and language disorders will be assessed using the Thought, Language and Communication Scale (TLC) (21). Each symptom is scored from grade 0 (absent) to grade 4 (severe), with the total score reflecting the overall level of thought and communication disorders.
Assessments will be conducted by psychiatrists who have received standardized training. Prior to the initiation of the assessments, interrater reliability tests will be conducted for all the scale scores (kappa > 0.8) to ensure the reliability and scientific validity of the data. Moreover, the assessors will remain blinded to the voice data and to the results of the voice analysis to reduce subjective bias during the assessment process.
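To make the interrater reliability criterion concrete, the sketch below computes an unweighted Cohen's kappa for two raters. This is an illustration under assumptions: the protocol does not state which kappa variant will be used (a weighted kappa may suit ordinal item scores better), and the ratings shown are invented numbers, not study data.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Proportion of items on which the two raters agree exactly.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance from each rater's label frequencies.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical item scores (0 = absent ... 4 = severe) from two raters.
scores_rater_1 = [0, 1, 2, 2, 3, 4, 1, 0, 2, 3]
scores_rater_2 = [0, 1, 2, 2, 3, 4, 1, 0, 2, 2]
kappa = cohen_kappa(scores_rater_1, scores_rater_2)
print(f"kappa = {kappa:.3f}, meets threshold: {kappa > 0.8}")
```

In this toy example the raters disagree on one of ten items, and the resulting kappa still exceeds the 0.8 threshold named in the protocol.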
Data processing and annotation
The acquired voice data will be converted into text using speech recognition technology, with manual correction performed. The transcripts will also be annotated for pauses, repetitions, word types, word counts, erroneous statements, and so forth. When converting voice data into text, personal identifiers (e.g., names) will be masked. On the basis of the resulting text data, we will employ natural language processing (NLP) techniques for morphological and syntactic analyses. Specifically, we will use an independently developed Chinese word classifier (22, 23) to extract and calculate various variables. These analyses will include the frequency of each part of speech, vocabulary (negative and positive words), syntactic complexity, speech length, frequency of first-person usage, and use of pronouns, among others. We will use NLP techniques to represent text data as n-grams and as word embeddings obtained using pretrained models (e.g., BERT).
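Two of the text features named above, n-grams and first-person usage, can be illustrated with a short sketch. It assumes that the transcript has already been segmented into word tokens (the authors' own Chinese word classifier is not reproduced here), and the first-person word list is an illustrative assumption rather than the study's lexicon.

```python
from collections import Counter

# Assumption: a small illustrative set of first-person forms.
FIRST_PERSON = {"我", "我们", "咱们"}


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def first_person_rate(tokens):
    """Share of tokens that are first-person forms (0.0 for empty input)."""
    if not tokens:
        return 0.0
    return sum(token in FIRST_PERSON for token in tokens) / len(tokens)


# Hypothetical pre-segmented utterance: "I feel someone is watching me."
tokens = ["我", "觉得", "有人", "在", "看", "我"]
bigram_counts = Counter(ngrams(tokens, 2))
print(bigram_counts.most_common(2))
print(round(first_person_rate(tokens), 3))
```

Counts of this kind would typically be normalized by transcript length before group comparison, since speech length itself is one of the features under study.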
In this study, 12 audio metrics will be extracted from the recorded audio data to utilize the physical properties of the audio data in the machine learning (ML) model, including duration (s), mean fundamental frequency (F0) (Hz), standard deviation of F0 (Hz), minimum F0 (Hz), maximum F0 (Hz), harmonics-to-noise ratio (HNR) (dB), jitter (%), shimmer (%), intensity (dB), root mean square (RMS) amplitude (dB), spectral centroid (Hz), and spectral spread (Hz). These audio metrics will be extracted using the Parselmouth package (Praat in Python).
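For readers unfamiliar with the perturbation measures in this list, the sketch below shows how local jitter and local shimmer are commonly defined: the mean absolute difference between consecutive glottal periods (or peak amplitudes), divided by the mean period (or amplitude). It does not call Parselmouth or Praat; it assumes the period and amplitude sequences have already been extracted, and the numbers are invented for illustration.

```python
from statistics import mean


def local_perturbation_percent(values):
    """Mean absolute difference of consecutive values, relative to their mean, in %."""
    if len(values) < 2:
        raise ValueError("at least two values are required")
    diffs = [abs(a - b) for a, b in zip(values, values[1:])]
    return 100.0 * mean(diffs) / mean(values)


# Hypothetical glottal period durations (s) and peak amplitudes (arbitrary units).
periods_s = [0.0100, 0.0102, 0.0098, 0.0101, 0.0099]
amplitudes = [0.50, 0.52, 0.49, 0.51, 0.48]

jitter_percent = local_perturbation_percent(periods_s)    # cycle-to-cycle period variation
shimmer_percent = local_perturbation_percent(amplitudes)  # cycle-to-cycle amplitude variation
mean_f0_hz = mean(1.0 / period for period in periods_s)   # F0 is the inverse of the period

print(f"jitter = {jitter_percent:.2f}%, shimmer = {shimmer_percent:.2f}%, "
      f"mean F0 = {mean_f0_hz:.1f} Hz")
```

In practice the period detection itself is the difficult step, which is why a dedicated tool is used for extraction in the study.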
Language features specific to the characteristics of certain mental illnesses will be selected and statistically compared between different groups (e.g., patients vs. healthy subjects, and subgroups of schizophrenia).
Machine learning
This study will train machine learning models to perform the following tasks: (i) to predict whether subjects have SSD; (ii) to predict disease severity on the basis of scores from scales such as the Positive and Negative Syndrome Scale (PANSS), Personal and Social Performance Scale (PSP), and Thought, Language and Communication Scale (TLC); (iii) to predict changes in a subject’s condition with respect to a previously recorded state if the subject has undergone a prior assessment; and (iv) to predict scores for each item of clinical rating scales, where these items reflect different dimensions of the disease, such as positive symptoms, negative symptoms, and symptoms of disorganization. The model training data will include voice features from voice data, linguistic information from transcribed text data, and demographic data.
Data will be processed, and feature engineering will be performed using natural language processing (NLP) techniques to extract disease features. Machine learning models such as decision trees, support vector machines (SVMs), and deep learning architectures will be employed, and model performance will be evaluated using leave-one-subject-out cross-validation. Feature importance estimation methods will be used to determine the relative importance of each feature, and the prediction results will be assessed using the mean absolute error, coefficient of determination, and correlation coefficient. The extracted features will be validated for their ability to (i) distinguish between normal and disease states, (ii) identify different disease types, and (iii) reflect temporal changes in disease severity.
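Because each participant contributes both baseline and follow-up recordings, leave-one-subject-out cross-validation keeps all recordings of the held-out participant out of training. The sketch below illustrates the splitting logic together with the three regression metrics named above. It is a plain-Python illustration under assumptions: the function names, subject identifiers, and scores are hypothetical, and no specific model or library from the study is implied.

```python
from math import sqrt


def leave_one_subject_out(subject_ids):
    """Yield (held_out_subject, train_indices, test_indices) per unique subject."""
    for subject in sorted(set(subject_ids)):
        test = [i for i, s in enumerate(subject_ids) if s == subject]
        train = [i for i, s in enumerate(subject_ids) if s != subject]
        yield subject, train, test


def mean_absolute_error(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


def pearson_r(x, y):
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical recordings: two visits for S01 and S02, one for S03.
subject_ids = ["S01", "S01", "S02", "S02", "S03"]
folds = list(leave_one_subject_out(subject_ids))
for subject, train, test in folds:
    print(subject, "train:", train, "test:", test)

# Hypothetical true and predicted total scores pooled across folds.
y_true = [60, 70, 80, 90]
y_pred = [62, 68, 83, 87]
print(mean_absolute_error(y_true, y_pred), r_squared(y_true, y_pred))
```

Splitting by recording rather than by subject would let a model see the same speaker in training and testing, which tends to inflate performance estimates.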
Statistical analysis
The demographic analyses will be performed using SPSS 24.0. On the basis of different observation indicators and data types, we will describe count data as frequencies (percentages) and measurement data as the mean ± standard deviation. To compare measurement data, we will adopt the independent samples t test; for count data, we will use the chi-square test. We will apply Pearson correlation analysis to explore the relationships between speech features and psychiatric symptoms in schizophrenia patients, while controlling for demographic variables such as age, sex, and educational level. For all the statistical tests above, a P value < 0.05 will be considered to indicate statistical significance.
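Correlating speech features with symptoms while controlling for demographic variables amounts to a partial correlation. The sketch below shows the first-order case with a single covariate. It is a simplification under assumptions: the study names three covariates (age, sex, educational level), which would require a regression-based or higher-order approach, and the data are toy numbers constructed so that the association disappears once the covariate is removed.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def partial_correlation(x, y, covariate):
    """First-order partial correlation of x and y, controlling for one covariate."""
    r_xy = pearson_r(x, y)
    r_xz = pearson_r(x, covariate)
    r_yz = pearson_r(y, covariate)
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Toy data: both variables share the covariate and otherwise vary independently.
age_centered = [1, 1, -1, -1]
speech_feature = [2, 0, 0, -2]
symptom_score = [2, 0, -2, 0]

raw_r = pearson_r(speech_feature, symptom_score)
adjusted_r = partial_correlation(speech_feature, symptom_score, age_centered)
print(f"raw r = {raw_r:.2f}, partial r = {adjusted_r:.2f}")
```

Here the raw correlation of 0.5 is driven entirely by the shared covariate, so the partial correlation is approximately zero.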
Sample size
The sample size was estimated using G*Power 3.1 software. With a two-tailed test set at α = 0.05, a power of 0.95, and an expected effect size of 0.50, 270 samples were required for each group. Considering potential sample loss due to factors such as refusal to participate or poor cooperation in the actual survey, the total sample size was increased by 10%. After adjustment, the sample size was 270 × (1 + 0.10) = 297. In summary, approximately 300 patients with SSD need to be enrolled in this study.
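The attrition adjustment can be reproduced with a few lines of arithmetic. The sketch below is illustrative only: the function name is hypothetical, it takes the required sample size of 270 from the text as given rather than recomputing the power analysis, and it uses integer arithmetic with rounding up to sidestep floating-point rounding concerns.

```python
def inflate_for_attrition(n_required: int, attrition_percent: int) -> int:
    """Increase a required sample size by a whole-number percentage, rounding up."""
    numerator = n_required * (100 + attrition_percent)
    return -(-numerator // 100)  # ceiling division using integers only


adjusted_n = inflate_for_attrition(270, 10)
print(adjusted_n)  # 270 x (1 + 0.10)
```

With the values stated in the text this gives 297, which the protocol rounds to an enrollment target of approximately 300.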
Discussion
This study aims to establish a speech database and diagnostic model for SSD in the Han population by combining diverse speech data collection methods with NLP and machine learning technologies. As of July 2025, we have collected speech data from >100 individuals diagnosed with SSD whose speech data have been initially integrated into a dataset. This study further aims to create a large open Chinese speech dataset, which will provide innovative and objective biomarkers for the diagnosis and treatment of SSD.
Speech dysfunction in SSD patients is closely linked to thought disorders and represents an important clinical feature (24). Compared with clinical reports, the improved accuracy of speech analysis provides more precise symptom monitoring (25). A large body of research indicates that individuals with SSD exhibit significant and systematic abnormalities in speech characteristics during verbal expression (26–28). However, these results are inconsistent. These studies typically use a single speech feature assessment and task paradigm, limiting the types and depth of speech analysis (20, 28, 29).
Our study will use a multitask comprehensive assessment method to evaluate multidimensional functions, including language fluency, grammatical comprehension, language structure, and emotional expression. Data collection will adopt a stepwise task paradigm (e.g., PANSS assessment→short text reading→picture description), covering these dimensions plus repetition of pathological symptoms. Compared with a single task, this approach is more comprehensive and yields a richer dataset. We will use a self-developed app for standardized assessment of PANSS, ensuring the standardization of data collection and providing effective support for multidimensional speech data analysis. Furthermore, this study introduces a novel pseudosentence reading task. Previous speech research in schizophrenia has largely employed picture-description or semispontaneous conversation paradigms, which are susceptible to interference by patients’ cultural backgrounds and cognitive abilities, thereby obscuring their core linguistic profiles. By eliminating semantic content, pseudosentences reduce the semantic processing load and circumvent language-specific constraints, enabling direct examination of formal linguistic features such as Mandarin tone accuracy, pause distribution, and articulatory motor coordination (30, 31). Our research will contribute to a comprehensive analysis of speech characteristics in patients with schizophrenia and is intended to improve the accuracy of early diagnosis.
The further development of NLP and ML has significantly enhanced the potential value of linguistic biomarkers. These technologies allow for the rapid and objective measurement of language features (32). Current studies still suffer from insufficient comparability and reproducibility. Many of them are based on small samples, with inadequate cross-validation of the classifiers (33). To address these limitations, our research intends to apply ML techniques with higher classification efficiency to multidimensional speech feature data, screen and validate SSD-specific speech marker combinations with high discriminative power from high-dimensional speech features and construct a high-performance auxiliary diagnostic prediction model for SSD. In subsequent work, we plan to conduct cross-modal analyses using machine learning methods by combining speech features, brain function networks, and abnormal neuroelectrophysiological patterns.
With the help of NLP methods, speech analysis can enable objective quantification of speech abnormalities in SSD. However, most previous SSD speech studies have focused on Western populations. Additionally, several factors, such as sociodemographic factors, clinical heterogeneity, and cross-linguistic variation, can interfere with SSD speech features, particularly in terms of coherence measures (34). Linguistic and cultural differences can exert crucial effects on some quantitative speech markers measured in the subjects (35, 36). This may lead to biases in computational evaluations such as semantic coherence. A study on a global corpus revealed that Chinese and German patients with schizophrenia showed lower coherence compared with controls across several measures. In contrast, compared with controls, Danish patients showed a mixed pattern: higher semantic coherence across multiple measures but lower coherence at the lexical and sentential levels (19). Such differences may be related to the linguistic characteristics of Chinese, which relies on “tone to distinguish meaning,” and Danish, which uses “stress to distinguish meaning.” Additionally, different preprocessing options (e.g., transcript length normalization and the omission of fillers/punctuation marks) can affect various coherence measurements (34). Therefore, there is an urgent need to conduct systematic research on the Han population to explore their unique linguistic manifestations and underlying biological mechanisms. Our study will construct a multi-dimensional speech database for the Han population, filling the research gap concerning the “scarcity of Chinese-language data”.
This study has several limitations. First, although an attempt was made to increase the representativeness of the sample, it failed to cover Han populations from every province, autonomous region, and municipality directly under the Central Government in China. Thus, the initial sample still lacks sufficient representativeness, and we plan to conduct data collection in multiple regions across China in subsequent work. Second, our study included only adult patients with SSD, making it impossible to infer the speech and language characteristics of patients with adolescent-onset or late-onset SSD or to analyze the correlation between age and symptom progression. We aim to expand the sample to include children and adolescents under 18 years old in future studies. Additionally, the unique linguistic features of Chinese individuals increase the difficulty in analyzing the speech characteristics of SSD patients. For instance, Chinese communication commonly involves expressions such as “subject omission” and “inversion,” along with distinct regional diversity in “dialects and cultures,” which may complicate text annotation and analysis.
Despite these challenges, this research will enable us to investigate the language of psychiatric disorders statistically and comprehensively. It will provide psychiatrists with a new interpretation of language.
Ethics statement
The studies involving humans were approved by the Institutional Review Board of the Affiliated Brain Hospital of Guangzhou Medical University. The studies were conducted in accordance with the local legislation and institutional requirements. Written informed consent for participation in this study was provided by the participants’ legal guardians/next of kin.
Author contributions
JL: Data curation, Investigation, Validation, Visualization, Writing – original draft, Writing – review & editing. SZ: Investigation, Writing – review & editing. GD: Investigation, Writing – review & editing. MJ: Investigation, Writing – review & editing. XZ: Investigation, Writing – review & editing. XH: Investigation, Writing – review & editing. QK: Conceptualization, Investigation, Resources, Supervision, Writing – review & editing. SS: Conceptualization, Data curation, Funding acquisition, Supervision, Writing – original draft, Writing – review & editing.
Funding
The author(s) declared financial support was received for this work and/or its publication. Prof. Shenglin She was supported by the Planned Science and Technology Projects of Guangzhou (2025A03J3353). Prof. Qijie Kuang was supported by the Guangzhou Municipal Health Commission Science and Technology Project (20241A011052). Funding was also provided by Guangzhou Municipal Key Discipline in Medicine (2025-2027), Guangzhou High-level Clinical Key Specialty, and Guangzhou Research-oriented Hospital; Guangzhou Key Clinical Specialty (Clinical Medical Research Institute). The funders had no role in the study design, data collection, analysis, decision to publish, or manuscript preparation.
Acknowledgments
We would like to express our sincere gratitude to all individuals and organizations who supported and assisted us throughout this research. Without your support, this research would not have been possible.
Conflict of interest
The authors declared that this work was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Generative AI statement
The author(s) declared that generative AI was not used in the creation of this manuscript.
Any alternative text (alt text) provided alongside figures in this article has been generated by Frontiers with the support of artificial intelligence and reasonable efforts have been made to ensure accuracy, including review by the authors wherever possible. If you identify any issues, please contact us.
Publisher’s note
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
References
1. Solmi M, Seitidis G, Mavridis D, Correll CU, Dragioti E, Guimond S, et al. Incidence, prevalence, and global burden of schizophrenia - data, with critical appraisal, from the Global Burden of Disease (GBD) 2019. Mol Psychiatry. (2023) 28:5319–27. doi: 10.1038/s41380-023-02138-4
2. Saha S, Chant D, Welham J, and McGrath J. A systematic review of the prevalence of schizophrenia. PLoS Med. (2005) 2:e141. doi: 10.1371/journal.pmed.0020141
3. Millier A, Schmidt U, Angermeyer MC, Chauhan D, Murthy V, Toumi M, et al. Humanistic burden in schizophrenia: A literature review. J Psychiatr Res. (2014) 54:85–93. doi: 10.1016/j.jpsychires.2014.03.021
4. Cavelti M, Kircher T, Nagels A, Strik W, and Homan P. Is formal thought disorder in schizophrenia related to structural and functional aberrations in the language network? A systematic review of neuroimaging findings. Schizophr Res. (2018) 199:2–16. doi: 10.1016/j.schres.2018.02.051
5. Roche E, Creed L, MacMahon D, Brennan D, and Clarke M. The epidemiology and associated phenomenology of formal thought disorder: A systematic review. Schizophr Bull. (2015) 41:951–62. doi: 10.1093/schbul/sbu129
6. Hart M and Lewine RRJ. Rethinking thought disorder. Schizophr Bull. (2017) 43:514–22. doi: 10.1093/schbul/sbx003
7. Palaniyappan L, Homan P, and Alonso-Sanchez MF. Language network dysfunction and formal thought disorder in schizophrenia. Schizophr Bull. (2023) 49:486–97. doi: 10.1093/schbul/sbac159
8. Nettekoven CR, Diederen K, Giles O, Duncan H, Stenson I, Olah J, et al. Semantic speech networks linked to formal thought disorder in early psychosis. Schizophr Bull. (2023) 49:S142–52. doi: 10.1093/schbul/sbac056
9. She S, Gong B, Li Q, Xia Y, Lu X, Liu Y, et al. Deficits in prosodic speech-in-noise recognition in schizophrenia patients and its association with psychiatric symptoms. BMC Psychiatry. (2024) 24. doi: 10.1186/s12888-024-06065-8
10. Kircher T, Bröhl H, Meier F, and Engelen J. Formal thought disorders: from phenomenology to neurobiology. Lancet Psychiatry. (2018) 5:515–26. doi: 10.1016/S2215-0366(18)30059-2
11. de Boer JN, Voppel AE, Brederoo SG, Schnack HG, Truong KP, Wijnen FNK, et al. Acoustic speech markers for schizophrenia-spectrum disorders: a diagnostic and symptom-recognition tool. Psychol Med. (2021) 53:1302–12. doi: 10.1017/S0033291721002804
12. Oeztuerk OF, Pigoni A, Wenzel J, Haas SS, Popovic D, Ruef A, et al. The clinical relevance of formal thought disorder in the early stages of psychosis: results from the PRONIA study. Eur Arch Psychiatry Clin Neurosci. (2021) 272:403–13. doi: 10.1007/s00406-021-01327-y
13. Voppel AE, de Boer JN, Brederoo SG, Schnack HG, and Sommer IEC. Quantified language connectedness in schizophrenia-spectrum disorders. Psychiatry Res. (2021) 304:114110. doi: 10.1016/j.psychres.2021.114130
14. Cecchi GA and Corcoran CM. Exploring language and cognition in schizophrenia: Insights from computational analysis. Schizophr Res. (2023) 259:1–3. doi: 10.1016/j.schres.2023.07.030
15. Nikzad AH, Cong Y, Berretta S, Hansel K, Cho S, Pradhan S, et al. Who does what to whom? graph representations of action-predication in speech relate to psychopathological dimensions of psychosis. Schizophr (Heidelb). (2022) 8:58. doi: 10.1038/s41537-022-00263-7
16. Irving J, Patel R, Oliver D, Colling C, Pritchard M, Broadbent M, et al. Using natural language processing on electronic health records to enhance detection and prediction of psychosis risk. Schizophr Bull. (2021) 47:405–14. doi: 10.1093/schbul/sbaa126
17. Spencer TJ, Thompson B, Oliver D, Diederen K, Demjaha A, Weinstein S, et al. Lower speech connectedness linked to incidence of psychosis in people at clinical high risk. Schizophr Res. (2021) 228:493–501. doi: 10.1016/j.schres.2020.09.002
18. Corcoran CM, Carrillo F, and Fernández-Slezak D. Prediction of psychosis across protocols and risk cohorts using automated language analysis. World Psychiatry. (2018) 17:67–75. doi: 10.1002/wps.20491
19. Parola A, Lin JM, Simonsen A, Bliksted V, Zhou Y, Wang H, et al. Speech disturbances in schizophrenia: Assessing cross-linguistic generalizability of NLP automated measures of coherence. Schizophr Res. (2023) 259:59–70. doi: 10.1016/j.schres.2022.07.002
20. Zhang H, Parola A, Zhou Y, Wang H, Bliksted V, Fusaroli R, et al. Linguistic markers of psychosis in Mandarin Chinese: Relations to theory of mind. Psychiatry Res. (2023) 325:115253. doi: 10.1016/j.psychres.2023.115253
21. Andreasen NC. Scale for the assessment of thought, language, and communication (TLC). Schizophr Bull. (1986) 12:473–82. doi: 10.1093/schbul/12.3.473
22. MeCab: yet another part-of-speech and morphological analyzer. Available online at: http://taku910.github.io/mecab/ (Accessed July 1).
23. Kurohashi-Chu-Murawaki Lab, D.o.I.S.a.T. Graduate School of Informatics, Kyoto University. Available online at: https://nlp.ist.i.kyoto-u.ac.jp/?JUMAN (Accessed July 1, 2021).
24. Voleti R, Liss JM, and Berisha V. A review of automated speech and language features for assessment of cognitive and thought disorders. IEEE J Sel Top Signal Process. (2020) 14:282–98. doi: 10.1109/JSTSP.2019.2952087
25. DeSouza DD, Tang SX, and Danilewitz M. The burgeoning role of speech and language assessment in schizophrenia spectrum disorders. Psychol Med. (2022) 53:4825–6. doi: 10.1017/S0033291722001325
26. Chan CC, Norel R, Agurto C, Lysaker PH, Myers EJ, Hazlett EA, et al. Emergence of language related to self-experience and agency in autobiographical narratives of individuals with schizophrenia. Schizophr Bull. (2023) 49:444–53. doi: 10.1093/schbul/sbac126
27. Çabuk T, Sevim N, Mutlu E, Yağcıoğlu AEA, Koç A, Toulopoulou T, et al. Natural language processing for defining linguistic features in schizophrenia: A sample from Turkish speakers. Schizophr Res. (2024) 266:183–9. doi: 10.1016/j.schres.2024.02.026
28. Dalal TC, Liang L, Silva AM, Mackinley M, Voppel A, Palaniyappan L, et al. Speech based natural language profile before, during and after the onset of psychosis: A cluster analysis. Acta Psychiatr Scand. (2025) 151:332–47. doi: 10.1111/acps.13685
29. Tahir Y, Yang Z, Chakraborty D, Thalmann N, Thalmann D, Maniam Y, et al. Non-verbal speech cues as objective measures for negative symptoms in patients with schizophrenia. PloS One. (2019) 14:e0214314. doi: 10.1371/journal.pone.0214314
30. Gong B, Li N, Li Q, Yan X, Chen J, Li L, et al. The Mandarin Chinese auditory emotions stimulus database: A validated set of Chinese pseudo-sentences. Behav Res Methods. (2022) 55:1441–59. doi: 10.3758/s13428-022-01868-7
31. Finocchiaro C, Cattaneo L, Lega C, and Miceli G. Thematic reanalysis in the left posterior parietal sulcus: A TMS study. Neurobiol Lang. (2021) 2:416–32. doi: 10.1162/nol_a_00043
32. Corcoran CM, Mittal VA, Bearden CE, Gur RE, Hitczenko K, Bilgrami Z, et al. Language as a biomarker for psychosis: A natural language processing approach. Schizophr Res. (2020) 226:158–66. doi: 10.1016/j.schres.2020.04.032
33. Voppel AE, de Boer JN, Brederoo SG, Schnack HG, and Sommer IEC. Semantic and acoustic markers in schizophrenia-spectrum disorders: A combinatory machine learning approach. Schizophr Bull. (2023) 49:S163–71. doi: 10.1093/schbul/sbac142
34. Cabana A, Valle-Lisboa JC, Elvevag B, and Mizraji E. Detecting order-disorder transitions in discourse: implications for schizophrenia. Schizophr Res. (2011) 131:157–64. doi: 10.1016/j.schres.2011.04.026
35. Palaniyappan L. More than a biomarker: could language be a biosocial marker of psychosis? NPJ Schizophr. (2021) 7:42. doi: 10.1038/s41537-021-00172-1
Keywords: formal thought disorder, machine learning, natural language processing, schizophrenia spectrum disorders, speech
Citation: Liu J, Zhou S, Deng G, Ji M, Zhu X, He X, Kuang Q and She S (2026) Speech analytics across the schizophrenia spectrum disorders: multimodal natural language processing and machine learning modelling in a Chinese-speaking population. Front. Psychiatry 16:1725859. doi: 10.3389/fpsyt.2025.1725859
Received: 15 October 2025; Revised: 05 December 2025; Accepted: 08 December 2025; Published: 06 January 2026.
Edited by:
Animesh Kumar Paul, University of Alberta, Canada
Copyright © 2026 Liu, Zhou, Deng, Ji, Zhu, He, Kuang and She. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Qijie Kuang, kuangqijie@gzhmu.edu.cn; Shenglin She, shenglinshe@gzhmu.edu.cn
†These authors have contributed equally to this work
Jiaqi Liu†