- ¹Department of Special Examination, Shaoxing People’s Hospital, Shaoxing, China
- ²School of Medicine, Shaoxing University, Shaoxing, China
- ³Department of Neurology, Shaoxing People’s Hospital, Shaoxing, China
- ⁴College of Mathematic Medicine, Zhejiang Normal University, Jinhua, China
- ⁵Department of Neurosurgery, Sir Run Run Shaw Hospital, Zhejiang University School of Medicine, Hangzhou, China
- ⁶Department of Rehabilitation, Sir Run Run Shaw Hospital, Zhejiang University School of Medicine, Hangzhou, China
- ⁷Key Laboratory for Biomedical Engineering of Ministry of Education of China, Zhejiang University, Hangzhou, China
Schizophrenia (SCZ) is a severe mental disorder that impairs brain function and daily life, while its early and objective diagnosis remains a major clinical challenge due to the reliance on subjective assessments. This study aims to develop a machine learning-based framework for the auxiliary diagnosis of SCZ using multi-dimensional electroencephalogram (EEG) features and to investigate the underlying neural alterations. Resting-state EEG data were obtained from 45 male patients with pediatric SCZ and 39 age- and gender-matched healthy controls. Three types of EEG features (relative power (RP), fuzzy entropy (FuzEn), and functional connectivity (FC)) were extracted under various time window lengths and fed into four ensemble learning models. A data-driven feature selection approach (Recursive Feature Elimination) was applied to identify the most informative features, yielding the 212 most discriminative features (48 RP, 40 FuzEn, and 124 FC) out of the initial 760. Leveraging the selected features, the Categorical Boosting model achieved the highest classification accuracy of 99.60% at the 4-s window. Further analysis of the discriminative features revealed that the altered EEG characteristics were mainly in the alpha, beta, and gamma bands. In particular, altered FCs exhibited a fronto-increase-parieto-decrease pattern mainly in the right hemisphere, along with spectral-dependent RP alterations and a universally reduced FuzEn in the pediatric SCZ group. In summary, this study not only showcases the potential of advanced ensemble learning algorithms in precisely identifying pediatric SCZ, but also provides new insights into the altered brain functions in pediatric SCZ patients, which may benefit the future development of automatic diagnosis systems.
1 Introduction
Schizophrenia (SCZ) is a severe mental disorder characterized by symptoms such as cognitive impairments, persistent hallucinations and delusions, which significantly affect patients’ daily functioning and quality of life (Hirano and Uhlhaas, 2021). According to the latest research estimates in 2024, SCZ affects approximately 0.32% of the global population (Oprea et al., 2024). There are approximately 24 million SCZ patients worldwide, making it one of the top 25 leading causes of disability (Rahul et al., 2024). SCZ typically onsets in late adolescence or early adulthood and can have a lasting impact throughout the patient’s life (Buckley and Miller, 2015). Currently, antipsychotic medications are the primary treatment for SCZ, with about 70% of patients experiencing symptom improvement through appropriate treatment. However, the long-term outcomes of the disease vary widely among individuals: about 25% of patients achieve good recovery, 50% have moderate disability, and 25% experience significant persistent symptoms throughout their lives (Ranjan et al., 2024). The mortality rate among SCZ patients is two to three times higher than that of the general population, and their average lifespan is ten to twenty years shorter than that of healthy individuals (Laursen, 2011). SCZ not only causes profound suffering to patients but also imposes a heavy burden on families and society. Therefore, early diagnosis and deeper understanding of the neural mechanisms of SCZ, especially for pediatric SCZ, are crucial for improving treatment outcomes, reducing the disease burden, and enhancing patients’ quality of life (Wang et al., 2024; Li et al., 2024).
EEG, as a non-invasive technique for monitoring neural electrical activity, has the advantages of low cost, high temporal resolution, portability, flexible experimental design, and strong real-time feedback ability (Pan et al., 2024), and has extensive application value in neuroscience and brain function research (Pan et al., 2025; Pei et al., 2025). For SCZ, a complex and highly heterogeneous mental disorder, EEG offers unique advantages in revealing the underlying neural mechanisms. Numerous studies have demonstrated significant EEG abnormalities in patients with SCZ (Perrottelli et al., 2021; Perrottelli et al., 2022; Grohn and Eriksson, 2022; Hamilton and Northoff, 2021). For instance, aberrations in EEG power have been associated with specific cognitive impairments, such as deficits in verbal learning and memory function in SCZ patients (Koshiyama et al., 2021; Tanaka-Koshiyama et al., 2020). The elevated or reduced power anomalies exhibited by these patients significantly correlate with cognitive dysfunction, reflecting disturbances and imbalances in brain activity (Hamilton and Northoff, 2021; Iglesias-Tejedor et al., 2022). Moreover, the irregularity and complexity of EEG signals, as assessed by entropy, exhibit abnormal patterns in SCZ patients both at rest and during cognitive tasks (Goshvarpour and Goshvarpour, 2022; Molina et al., 2020). These entropy anomalies not only indicate disrupted neural network synchronization within the brain but also directly relate to patients’ negative symptoms and impairments in verbal memory ability, indicating that entropy disturbances might be a core factor in cognitive decline and could potentially serve as a biomarker for assessing disease progression and treatment efficacy (Molina et al., 2020). Recent findings have revealed that SCZ is associated with functional dysconnectivity between disparate brain regions (Fornito et al., 2012).
Studies have found significantly reduced connectivity between key functional areas such as the prefrontal cortex and temporal region, and this diminished connectivity may be closely linked to the manifestation of symptoms, including cognitive impairments (Naim-Feil et al., 2018). Additionally, EEG-based dynamic studies of brain networks have revealed altered dynamic patterns of brain network connectivity in SCZ patients during cognitive tasks, characterized by increased global efficiency, decreased clustering coefficients, and changes in connection strength within specific brain regions. These changes are particularly evident within specific time windows following cognitive stimulation, further reflecting dynamic imbalances in brain function (Sun et al., 2019; Yan T. et al., 2023). In sum, multi-dimensional EEG measures have emerged as potent tools for unveiling the underlying mechanisms of SCZ and provide profound insights into the pathophysiological basis of SCZ.
In recent years, machine learning techniques have played an increasingly pivotal role in the diagnosis of SCZ, particularly in the analysis of resting-state EEG signals, where they have demonstrated remarkable advantages (Perellón-Alfonso et al., 2023; Yan W. et al., 2023; Lin et al., 2023). Numerous studies have concentrated on extracting multi-dimensional features from EEG to precisely identify patterns of brain electrical activity associated with SCZ. These studies can be broadly categorized into two main directions. First, a significant body of research has focused on applying advanced machine learning algorithms, such as adaptive neuro-fuzzy inference systems and 3D convolutional neural networks, achieving high classification accuracies for SCZ of 99.92% (Najafzadeh et al., 2021) and 97.74% (Shen et al., 2023), respectively. These investigations not only validate the efficacy of machine learning in the interpretation of complex EEG signals but also lay a solid foundation for its application in SCZ diagnosis. Second, other studies have been directed towards extracting EEG features and integrating them with traditional machine learning or deep learning models to further enhance diagnostic accuracy and provide interpretable findings. For instance, a study based on brain functional connectivity analysis, which fused different connectivity measures by combining Partial Directed Coherence and PLI features, attained an accuracy of 95.16% (Zhao et al., 2021). Another study combined three effective connectivity measures (partial directed coherence, direct directed transfer function, and transfer entropy) with convolutional neural networks and transfer learning, elevating the diagnostic accuracy to 96.67% (Bagherzadeh and Shalbaf, 2024). These achievements not only uncover specific alterations in brain functional connectivity and other features in SCZ patients but also provide new insights into optimizing machine learning for EEG feature extraction and disease diagnosis.
While EEG has shown great promise in SCZ research, most existing studies rely on single-dimensional features or focus solely on adult populations, limiting their ability to reveal comprehensive neural patterns. To address these gaps, this study proposed a systematic analytical framework to dissect the brain functional mechanisms in pediatric SCZ patients. The primary contributions of this paper are summarized as follows:
1. Integration of multi-dimensional EEG features: Univariate power spectrum, fuzzy entropy, and multivariate functional connectivity were extracted to capture spectral, nonlinear, and network-level characteristics of brain activity.
2. Machine learning based feature selection and classification: Ensemble learning algorithms were employed to identify the most informative feature subset, enabling accurate differentiation between pediatric SCZ patients and healthy controls.
3. Revealing abnormal brain mechanisms: Group-level analyses were conducted to uncover specific alterations in power, entropy, and functional connection, providing insights into the electrophysiological dysfunctions associated with pediatric SCZ.
2 Materials and methods
2.1 Participants
The data used in this study are publicly available from the Mental Health Research Center (MHRC), Russian Academy of Medical Sciences, and include 45 boys diagnosed with schizophrenic disorders (infant SCZ, schizotypal and schizoaffective disorders corresponding to F20, F21 and F25 according to the ICD-10) and 39 age-matched healthy participants. The age of the patients ranged from 10 years and 8 months to 14 years, while that of the healthy participants ranged from 11 years to 13 years and 9 months. The mean age of both groups was 12 years and 3 months. The diagnoses of the patients were performed and confirmed by specialists of the MHRC. None of the patients were undergoing chemotherapy during the examination period at the MHRC. Further details on the clinical characteristics of the patients can be found in Borisov et al. (2005). The current study, whose objective was data analysis, was approved by the Institutional Review Board of the Shaoxing People’s Hospital.
2.2 EEG data recording and preprocessing
EEG data were recorded from 16 channels while the participants were in an awake and relaxed state with their eyes closed. The electrodes (i.e., F3, F4, F7, F8, C3, C4, Cz, T3, T4, T5, T6, P3, P4, Pz, O1, and O2) were placed in accordance with the international 10–20 system. The reference electrodes were the left and right mastoids, and the sampling frequency was 128 Hz. A previously validated standard EEG preprocessing pipeline was adopted for the raw EEG signals (Dimitrakopoulos et al., 2018). The preprocessing procedure included the following steps: (1) A bandpass filter was applied to restrict the data to the 0.5–45 Hz range, in order to remove low-frequency drifts and high-frequency EMG interference; (2) The EEG signals were re-referenced using an average reference across all electrodes; (3) Fast ICA was used to extract independent components, and artifacts were identified with the help of manual inspection and the ICLabel tool, in order to remove non-neural artifacts such as eye movements, blinks, and muscle activity; (4) The same filter was used to further divide the signals into five standard frequency bands: Delta (0.5–4 Hz), Theta (4–8 Hz), Alpha (8–13 Hz), Beta (13–30 Hz), and Gamma (30–45 Hz); (5) The preprocessed and band-divided signals were segmented into time windows of six different durations, providing a foundation for the extraction of multidimensional EEG features across multiple temporal scales. All preprocessing steps were conducted using customized code in MATLAB 2021b (The MathWorks, Inc., U.S.) and the EEGLAB toolbox (Delorme and Makeig, 2004).
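As an illustration of step (5), the segmentation into non-overlapping windows can be sketched as follows. This is a minimal sketch rather than the authors' MATLAB code: the function name `segment_windows` is hypothetical, the recording is random stand-in data, and a 60-s recording per participant is an assumption (it is what 1,260 / 84 = 15 windows of 4 s per participant would imply).

```python
import numpy as np

FS = 128  # sampling frequency in Hz, as stated in the text


def segment_windows(eeg, win_sec, fs=FS):
    """Split a (channels, samples) array into non-overlapping windows.

    Returns an array of shape (n_windows, channels, win_samples). Trailing
    samples that do not fill a whole window are dropped.
    """
    win = int(round(win_sec * fs))
    n_windows = eeg.shape[1] // win
    trimmed = eeg[:, : n_windows * win]
    # (channels, n_windows, win) -> (n_windows, channels, win)
    return trimmed.reshape(eeg.shape[0], n_windows, win).transpose(1, 0, 2)


# Hypothetical stand-in for one participant: 16 channels, 60 s (assumed length).
rng = np.random.default_rng(0)
recording = rng.standard_normal((16, 60 * FS))

for win_sec in (1, 2, 3, 4, 5, 6):
    print(win_sec, "s ->", segment_windows(recording, win_sec).shape)
```

Under these assumptions, shorter windows yield more but noisier samples per participant, which is the trade-off examined when the window length is varied.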
2.3 Feature extraction
After obtaining the artifact-free preprocessed EEG data, three widely used features that cover linear/nonlinear univariate and multivariate domains were adopted in this work for feature extraction, including relative power spectrum (RP), fuzzy entropy (FuzEn), and phase lag index (PLI). The extracted features were subsequently used as inputs for the following machine learning models.
Relative Power Spectrum (RP): For a given EEG signal x(t) (t = 1, 2, 3, …, N; N is the number of time points of x(t)), its spectrum X(f) can be estimated using the Fast Fourier Transform. The power spectrum $P_x(f)$ is then obtained via $P_x(f) = \frac{1}{N}\left|X(f)\right|^2$. The RP of each EEG band can then be estimated by Equation 1:

$$RP = \frac{\sum_{f=f_l}^{f_h} P_x(f)}{\sum_{f=f_m}^{f_n} P_x(f)} \tag{1}$$

where $f_h$ and $f_l$ are the upper and lower limits of the given rhythm, and $f_m$ and $f_n$ are the frequency bounds of the EEG signal. The RP was estimated within each frequency band for each channel, resulting in 16 × 5 = 80 RP features.
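The relative power computation described above can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the authors' implementation: the function name `relative_power`, the plain FFT periodogram, and the half-open band edges (so that neighbouring bands do not share a frequency bin) are choices made here for illustration. The band limits and the 0.5–45 Hz total range are taken from the preprocessing description.

```python
import numpy as np

BANDS = {"delta": (0.5, 4), "theta": (4, 8), "alpha": (8, 13),
         "beta": (13, 30), "gamma": (30, 45)}


def relative_power(x, fs=128, total=(0.5, 45)):
    """Relative band power of a 1-D signal from its FFT periodogram."""
    freqs = np.fft.rfftfreq(x.size, d=1 / fs)
    power = np.abs(np.fft.rfft(x)) ** 2 / x.size
    in_total = (freqs >= total[0]) & (freqs <= total[1])
    denom = power[in_total].sum()
    rp = {}
    for name, (lo, hi) in BANDS.items():
        # Half-open bins [lo, hi) so adjacent bands do not share a bin;
        # only the last band keeps its upper edge (assumed convention).
        upper = (freqs <= hi) if hi == total[1] else (freqs < hi)
        rp[name] = power[(freqs >= lo) & upper].sum() / denom
    return rp


# Toy example: a pure 10 Hz oscillation in a 4-s window at 128 Hz.
t = np.arange(4 * 128) / 128
print(relative_power(np.sin(2 * np.pi * 10 * t)))
```

Because numerator and denominator share the same scaling, the result does not depend on how the periodogram is normalized; applying the function to each of the 16 channels gives the 16 × 5 RP features per window.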
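Fuzzy Entropy (FuzEn): A given EEG signal x(t) can be reconstructed into a set of m-dimensional vectors $X_t^m = \{x(t), x(t+1), \ldots, x(t+m-1)\} - x_0(t)$ (t = 1, 2, …, N − m + 1), where m is the embedding dimension, $x_0(t) = \frac{1}{m}\sum_{j=0}^{m-1} x(t+j)$ is the mean value, and N is the length of the given signal x(t). Then the distance between two vectors $X_i^m$ and $X_j^m$ can be calculated as Equation 2:

$$d_{ij}^m = \max_{k \in [0,\, m-1]} \left| \left(x(i+k) - x_0(i)\right) - \left(x(j+k) - x_0(j)\right) \right| \tag{2}$$

Then $O^m(r)$ can be estimated by Equation 3:

$$O^m(r) = \frac{1}{N-m} \sum_{i=1}^{N-m} \left[ \frac{1}{N-m-1} \sum_{j=1,\, j \neq i}^{N-m} \exp\left(-\frac{\left(d_{ij}^m\right)^n}{r}\right) \right] \tag{3}$$

where r is the tolerance and n is the exponent that sets the steepness of the exponential similarity function. The FuzEn of the given signal x(t) can be obtained by Equation 4 (Al-sharhan et al., 2001):

$$FuzEn(m, r, N) = \ln O^m(r) - \ln O^{m+1}(r) \tag{4}$$

In the current work, the embedding dimension m is set to 2, and r is determined by k × δ. Here, k is a constant value set to 0.2 (typically ranging between 0.10 and 0.25), and δ is the standard deviation of the EEG signal x(t). Within each frequency band, the FuzEn was estimated for each channel, leading to 16 × 5 = 80 features.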
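To make the FuzEn procedure concrete, the following sketch implements the commonly used formulation with an exponential similarity function exp(−dⁿ/r). It is an illustration under assumptions, not the authors' code: the function name `fuzzy_entropy` is hypothetical, and the exponent n = 2 is an assumed value, since the text specifies only m = 2 and k = 0.2.

```python
import numpy as np


def fuzzy_entropy(x, m=2, k=0.2, n=2):
    """Fuzzy entropy with tolerance r = k * std(x).

    n (exponent of the exponential similarity function) is an ASSUMPTION;
    the text only states the values of m and k.
    """
    x = np.asarray(x, dtype=float)
    r = k * x.std()
    n_vec = x.size - m  # same number of templates for dimensions m and m + 1

    def mean_similarity(dim):
        # Embed the signal, then remove each template's own mean.
        idx = np.arange(n_vec)[:, None] + np.arange(dim)[None, :]
        templates = x[idx]
        templates = templates - templates.mean(axis=1, keepdims=True)
        # Chebyshev (maximum) distance between every pair of templates.
        dist = np.abs(templates[:, None, :] - templates[None, :, :]).max(axis=2)
        sim = np.exp(-(dist ** n) / r)
        np.fill_diagonal(sim, 0.0)  # exclude self-matches (j != i)
        return sim.sum() / (n_vec * (n_vec - 1))

    return np.log(mean_similarity(m)) - np.log(mean_similarity(m + 1))


# Toy comparison: a regular oscillation versus white noise (4 s at 128 Hz).
t = np.arange(4 * 128) / 128
rng = np.random.default_rng(0)
print("sine :", fuzzy_entropy(np.sin(2 * np.pi * 10 * t)))
print("noise:", fuzzy_entropy(rng.standard_normal(t.size)))
```

The regular signal yields the lower value, which matches the reading of reduced FuzEn as a less complex, more regular signal. The pairwise distance matrix makes this sketch quadratic in the window length, which is acceptable for windows of a few hundred samples.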
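Phase Lag Index (PLI): PLI was adopted to estimate the functional connectivity because it minimizes the influence of the well-known volume conduction and common-source effects (Stam et al., 2007). For a given pair of EEG signals $x_k(t)$ and $x_l(t)$, the instantaneous phase is calculated using the Hilbert transform as Equation 5:

$$z_k(t) = x_k(t) + i\,\tilde{x}_k(t) = Z_k(t)\,e^{i\phi_k(t)}, \qquad z_l(t) = x_l(t) + i\,\tilde{x}_l(t) = Z_l(t)\,e^{i\phi_l(t)} \tag{5}$$

where $Z_k$ and $Z_l$ are the instantaneous amplitudes, $\phi_k(t)$ and $\phi_l(t)$ are the instantaneous phases at time t, and $\tilde{x}(t)$ is the Hilbert transform of each time series. Then the PLI between these two signals can be defined as Equation 6:

$$PLI(k, l) = \left| \left\langle \operatorname{sign}\left(\phi_k(t) - \phi_l(t)\right) \right\rangle \right| \tag{6}$$

where $|\cdot|$ means the absolute value, $\langle\cdot\rangle$ denotes the average over time, the phase difference is wrapped to the interval (−π, π], and sign stands for the signum function as Equation 7:

$$\operatorname{sign}(x) = \begin{cases} 1, & x > 0 \\ 0, & x = 0 \\ -1, & x < 0 \end{cases} \tag{7}$$

The range of PLI values is from 0 to 1. A larger PLI value indicates a stronger degree of non-zero-lag phase synchronization between the pair of EEG signals: PLI = 0 indicates no coupling (or coupling with a phase difference centered on 0 or π), while PLI = 1 indicates perfect phase locking at a constant non-zero phase difference. After the functional connections of all pairs of channels were estimated, a 16 × 16 PLI matrix was obtained for each frequency band. Given that PLI(k, l) = PLI(l, k), a total of (16 × 15 / 2) × 5 = 600 PLI features were obtained.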
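The PLI estimation and the reduction of the symmetric matrix to its unique entries can be sketched as below. This is a minimal sketch, not the authors' implementation: the function names are hypothetical, the analytic signal is computed with a plain FFT construction (the same idea as a library Hilbert transform), and sign(sin(Δϕ)) is used as an equivalent way of taking the sign of the wrapped phase difference.

```python
import numpy as np


def analytic_signal(x):
    """FFT-based analytic signal along the last axis (Hilbert transform)."""
    n = x.shape[-1]
    spectrum = np.fft.fft(x, axis=-1)
    gain = np.zeros(n)
    gain[0] = 1.0
    if n % 2 == 0:
        gain[n // 2] = 1.0
        gain[1 : n // 2] = 2.0
    else:
        gain[1 : (n + 1) // 2] = 2.0
    return np.fft.ifft(spectrum * gain, axis=-1)


def pli_matrix(eeg):
    """PLI for every channel pair of a (channels, samples) array."""
    phase = np.angle(analytic_signal(eeg))
    diff = phase[:, None, :] - phase[None, :, :]
    # sign(sin(.)) equals the sign of the phase difference wrapped to (-pi, pi).
    return np.abs(np.sign(np.sin(diff)).mean(axis=-1))


def upper_triangle(mat):
    """Unique off-diagonal entries: channels * (channels - 1) / 2 values."""
    return mat[np.triu_indices(mat.shape[0], k=1)]


# Toy example: 16 channels of random data in one 4-s window at 128 Hz.
rng = np.random.default_rng(0)
features = upper_triangle(pli_matrix(rng.standard_normal((16, 512))))
print(features.size)  # unique channel pairs for one frequency band
```

Applied to each of the five band-limited signals, this gives 120 values per band, i.e., the 600 PLI features per window; together with the 80 RP and 80 FuzEn features this accounts for the 760 features of the full set.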
2.4 Ensemble learning models
Once the EEG features were obtained, four ensemble learning models that are widely used in EEG classification studies were adopted to assess the performance of pediatric SCZ identification: Random Forest (RF), eXtreme Gradient Boosting (XGBoost), Categorical Boosting (CatBoost), and Light Gradient Boosting Machine (LightGBM).
RF is an ensemble learning method that enhances prediction accuracy and stability by combining multiple decision trees. Each tree is trained on a random subset of the data and selects features randomly at each split, which helps reduce the risk of overfitting. Ultimately, RF derives the final prediction by aggregating the predictions of all trees. It is widely used in data science due to its simple implementation, rapid training, and robustness to outliers and noise.
XGBoost is a powerful gradient boosting algorithm, highly regarded for its exceptional predictive performance. It constructs a series of weak learners (typically decision trees) to progressively reduce prediction errors and improve model accuracy. XGBoost incorporates regularization techniques to control model complexity and overfitting, while employing efficient column sampling and parallel processing strategies that significantly enhance training speed and predictive performance. Additionally, it can handle missing values and offers a variety of flexible parameter tuning options, making it excellent in practical applications.
CatBoost is an efficient ensemble learning framework particularly suitable for handling complex categorical feature data. It employs symmetric decision trees and ordered boosting techniques to reduce overfitting and enhance the model’s generalization ability. CatBoost automatically processes categorical features and missing data, simplifying the data preprocessing pipeline, and accelerates model convergence through adaptive learning rate adjustments, thereby demonstrating outstanding performance in various application scenarios.
LightGBM is a fast and efficient gradient boosting framework designed for large-scale datasets. It adopts a histogram-based learning approach and incorporates a series of optimization strategies, such as Gradient-based One-Side Sampling (GOSS) and parallel processing of leaf splitting, to improve training speed and prediction accuracy. LightGBM excels in handling high-dimensional features and big data, making it widely applicable across various domains.
Taking the 4-s time window as an example, the dataset contained a total of 1,260 samples, including 585 HC samples and 675 SCZ patient samples. A 10-fold cross-validation approach was employed: the dataset was randomly divided into ten subsets, nine of which served as the training set and the remaining one as the test set. By iterating this procedure 10 times, so that each subset served once as the test set, we obtained the average classification performance metrics, including accuracy, precision, recall, and F1 score.
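The evaluation protocol can be illustrated with the sketch below. It is not the authors' pipeline: the data are synthetic, a nearest-centroid rule stands in for the ensemble models, the folds are stratified by class (an assumption; the text only states that the split was random), and all function names are hypothetical. Only the sample counts (585 HC, 675 SCZ) are taken from the text.

```python
import numpy as np


def stratified_folds(y, n_splits=10, seed=0):
    """Assign each sample to one of n_splits folds, class by class."""
    rng = np.random.default_rng(seed)
    folds = np.empty(y.size, dtype=int)
    for label in np.unique(y):
        idx = rng.permutation(np.flatnonzero(y == label))
        folds[idx] = np.arange(idx.size) % n_splits
    return folds


def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 with class 1 (SCZ) as positive."""
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / y_true.size, "precision": precision,
            "recall": recall, "f1": f1}


def nearest_centroid_predict(X_train, y_train, X_test):
    """Stand-in classifier (NOT one of the four ensemble models)."""
    centroids = np.stack([X_train[y_train == c].mean(axis=0) for c in (0, 1)])
    dists = ((X_test[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)


def cross_validate(X, y, n_splits=10, seed=0):
    """Average the four metrics over the held-out folds."""
    folds = stratified_folds(y, n_splits, seed)
    scores = []
    for k in range(n_splits):
        test = folds == k
        y_pred = nearest_centroid_predict(X[~test], y[~test], X[test])
        scores.append(binary_metrics(y[test], y_pred))
    return {name: float(np.mean([s[name] for s in scores])) for name in scores[0]}


# Synthetic stand-in data with the sample counts reported for the 4-s window.
rng = np.random.default_rng(0)
y = np.r_[np.zeros(585, dtype=int), np.ones(675, dtype=int)]
X = rng.standard_normal((y.size, 20)) + y[:, None] * 1.5
print(cross_validate(X, y))
```

Swapping `nearest_centroid_predict` for any fitted classifier with the same inputs and outputs leaves the rest of the loop unchanged, which is why the same protocol can be reused across the four models.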
2.5 Feature selection
Recursive feature elimination (RFE) is a model-based feature selection algorithm designed to identify the features most relevant to model prediction performance from a feature set (Yan and Zhang, 2015). It operates through an iterative process that progressively evaluates and eliminates the features contributing least to model performance, thereby achieving feature selection and model optimization. Initially, a baseline model is trained using all features, and importance scores are calculated for each feature. These importance scores are typically derived from model coefficients or from the extent to which features influence prediction outcomes. Subsequently, features are ranked based on their importance, and those with the least contribution to the model are gradually eliminated. After each feature elimination, RFE retrains the model and recalculates the importance scores for the remaining features. This process continues until a predetermined number of features remain or until there is no significant improvement in model performance. This stepwise elimination strategy effectively identifies the most predictive features while reducing the interference of redundant features, thereby enhancing model accuracy and stability. By systematically selecting features, RFE helps to identify the most discriminative features. This approach not only improves computational efficiency but also mitigates the risk of overfitting, thereby improving the model’s generalization ability. Heuristically, RFE provides a purely data-driven approach for quantifying the contribution of each feature to the final prediction outcomes and reveals the role of each feature within the overall model through feature importance ranking, thereby facilitating the interpretation of the etiology of SCZ.
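The elimination loop described above can be sketched generically as follows. This is an illustration under assumptions rather than the procedure used in this study: the importance score here is the absolute least-squares weight on standardized features, which is swapped in for a tree-based model's feature importance, and the function names are hypothetical. A ready-made implementation of the same idea is available as `RFE` in scikit-learn's `sklearn.feature_selection` module.

```python
import numpy as np


def importance_scores(X, y):
    """Stand-in importance: |least-squares weight| on standardized features."""
    Xs = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-12)
    weights, *_ = np.linalg.lstsq(Xs, y - y.mean(), rcond=None)
    return np.abs(weights)


def recursive_feature_elimination(X, y, n_keep, step=1):
    """Drop the `step` least important features and refit until n_keep remain.

    Returns the kept column indices and the elimination order (weakest first).
    """
    remaining = np.arange(X.shape[1])
    eliminated = []
    while remaining.size > n_keep:
        # Importance is recomputed on the surviving features at every round.
        scores = importance_scores(X[:, remaining], y)
        n_drop = min(step, remaining.size - n_keep)
        drop = np.argsort(scores)[:n_drop]
        eliminated.extend(remaining[drop].tolist())
        remaining = np.delete(remaining, drop)
    return remaining, eliminated


# Synthetic example: only columns 0, 5 and 9 carry class information.
rng = np.random.default_rng(0)
X_demo = rng.standard_normal((400, 12))
y_demo = (X_demo[:, 0] + X_demo[:, 5] - X_demo[:, 9] > 0).astype(float)
kept, order = recursive_feature_elimination(X_demo, y_demo, n_keep=3)
print("kept:", sorted(kept.tolist()))
print("elimination order:", order)
```

Recording the elimination order, rather than only the surviving set, is what allows accuracy to be traced as a function of the number of retained features.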
3 Results
3.1 Classification performance
The abovementioned multidimensional EEG features were estimated using a time window approach. To evaluate the influence of the window length on the classification performance, the proposed analysis framework was applied using six different window lengths: 1, 2, 3, 4, 5, and 6 s. Specifically, multidimensional EEG features were estimated within each non-overlapping window and used as input for the ensemble learning models. Figure 1 illustrates the classification performance of the four ensemble learning models under the six window lengths. As shown in Figure 1, longer window lengths led to improved estimation of EEG features, which in turn enhanced classification performance. All models achieved satisfactory results when the window length exceeded 2 s, while further increases in window duration produced only marginal improvements. The CatBoost classifier demonstrated the best classification performance with a 4-s time window (Table 1). Therefore, a window length of 4 s was used for the following analyses.
3.2 Feature selection
Given that the full feature set comprises 760 features, potentially leading to feature redundancy, this study conducted feature selection based on the entire feature set. Considering the superior performance of CatBoost, it was chosen as the base model for RFE. By applying the RFE algorithm to the CatBoost model, the contribution scores were obtained for all features and then sorted in ascending order, thereby determining the feature importance ranking. Subsequently, the least contributive features were iteratively removed based on their importance and the CatBoost model was retrained by adding one feature per cycle until all features were traversed. The accuracy peaked at 99.60% when the number of input features for the model was 212. The variation of the accuracy with the number of features is shown in Figure 2. Therefore, the top 212 features with the highest contribution scores were selected as the optimal feature subset. We then examined the frequency distribution of the optimal feature subset and found a predominance of higher-frequency bands (Delta/Theta/Alpha/Beta/Gamma = 12/31/54/64/51) (Table 2).
3.3 Spatio-spectral distribution of the discriminative features
Once the optimal feature subset was obtained, we examined the spatio-spectral distribution of the discriminative features for each feature type separately. The brain topographic maps of the relative power features are presented in Figure 3. Specifically, an increase of RP in the delta, theta and beta bands and a decrease in the alpha and gamma bands were observed in pediatric SCZ patients. The regions with differences in delta were mainly distributed in frontal, temporal, and occipital areas, and the regions with differences in theta were mainly distributed in central and occipital areas, whereas the regions with differences in beta, alpha, and gamma were spread across the entire brain. Figure 4 shows the topographic maps of FuzEn in both groups. For pediatric SCZ patients, a universal decrease pattern was observed in four frequency bands. No discriminative FuzEn feature was revealed in the Delta band. In terms of the spatial distribution, we found a fronto-central predilection in the Theta, Alpha and Beta bands, while the discriminative features were spread across the brain in the Gamma band. The PLI distribution results, as well as the corresponding proportion of each brain region, are depicted in Figure 5. A predominantly increased PLI pattern was revealed in the Theta, Beta and Gamma bands, linking frontal, central and parietal areas, whereas a decreased PLI pattern was found in the Alpha band, linking frontal, parietal and occipital regions with a rightward predilection.

Figure 3. Topographic maps of relative power (RP) for five EEG rhythms in (a) HC and (b) pediatric SCZ groups. The red dots indicate the channels selected in the optimal feature subset.

Figure 4. Topographic maps of fuzzy entropy (FuzEn) for five frequency bands in (a) HC and (b) pediatric SCZ groups. The red dots indicate the channels selected in the optimal feature subset.

Figure 5. Topological distribution of the PLI features and the spatial distribution pie plot over five frequency bands. Red edges indicate that the PLI values for pediatric SCZ are higher than those for the HC group, while blue edges represent that the PLI values for pediatric SCZ are lower than those for the HC group.
4 Discussion
This study has established an innovative analytical framework, which relies on a multi-dimensional EEG feature set determined by the optimal time window length. By integrating ensemble learning models with feature selection algorithms, the framework aims to extract EEG features that contribute most significantly to the classification accuracy of pediatric SCZ and exhibit the most pronounced differences, thereby providing deeper insights into the brain functional mechanisms of pediatric SCZ patients. The main findings are as follows: (1) Satisfactory classification performance is achieved through incorporating multidimensional EEG features with ensemble learning models, and reaches the best performance using the CatBoost model under 4-s time window (classification accuracy = 99.60%). (2) Based on the analysis of the optimal feature subset corresponding to the highest accuracy, we investigate the spatio-spectral distribution and find pediatric SCZ is characterized as a complex dysconnectivity pattern mainly in the alpha, beta and gamma bands. This dysconnectivity pattern is accompanied by abnormal distributions of relative power and fuzzy entropy features in specific frequency bands. These findings will be discussed in detail below.
4.1 Classification performance of pediatric SCZ
Due to the high temporal resolution of EEG signals, the corresponding time window length for feature extraction would inevitably influence the performance of the machine learning framework. The determination of optimal time window length has long been a popular research topic in recent SCZ classification studies. However, complex findings were reported in determining the optimal time window length. For instance, Shen et al. introduced a deep learning framework to identify SCZ using the same publicly available dataset (Shen et al., 2023). Specifically, EEG dynamic functional connectivity features were extracted with different time-window lengths (i.e., 2, 5, 10, 30 s) before a 3D convolutional neural network. They reported that a monotonic increasing trend of classification performance was obtained with the increase of time window length (from 80.13% in a 2-s window to 97.74% in a 30-s window) (Shen et al., 2023). However, a relatively short time window (1 s with 50% overlapping) was used on the same dataset to compute the effective FC features and achieved a satisfactory classification accuracy of 91.69% (Phang et al., 2020). The short time window (0.1–0.6 s post stimulus onset) was also adopted in a recent study and led to a classification performance of 95.15%. In exploring the influence of EEG time window on the classification performance of pediatric SCZ, we reveal the crucial role of time window length through rigorous experimental design and multi-dimensional EEG feature extraction. We found that the classification performance was saturated when the time window length was higher than 2 s and reached the best performance at 4 s (accuracy = 99.21%). The discrepancies could stem from the following two aspects: feature extraction methods and experimental design (resting-state vs. task design). 
Collectively, these studies underscore the criticality of time window selection in EEG signal processing and highlight the importance of optimizing time window length for improving classification performance.
The satisfactory recognition accuracy achieved in this study with a 4-s time window is attributed to the combined application of the CatBoost ensemble learning algorithm and feature selection algorithm. As an advanced ensemble learning method, the CatBoost algorithm excels in handling high-dimensional data and imbalanced datasets. The findings in this study reaffirm its effectiveness in complex EEG signal analysis. Meanwhile, the introduction of the feature selection algorithm, by eliminating redundant and irrelevant features, retains the most discriminative EEG features, thereby significantly enhancing classifier performance. This finding aligns with other research, which similarly achieved a significant improvement in SCZ patient classification accuracy through Bayesian optimization for selecting the best machine learning model and hyperparameters (Keihani et al., 2022). Notably, this study complements the research by Soria et al. They compared different machine learning systems and found that the ensemble learning algorithm performed well in SCZ classification with an accuracy of 94% (Soria et al., 2023). Therefore, these findings not only provide powerful technical support for the early diagnosis and treatment of SCZ but also offer new research ideas and methodological guidance for future EEG signal analysis in mental disorders, which may benefit the future development of automatic diagnosis systems. Of note, we have also compared the classification performance of the current work with several most recent studies using the same publicly available dataset (Table 3). In comparison with these previous studies, where fine-tuning neural network structure was utilized on single-domain EEG features, a lightweight ensemble learning model was adopted that delivers satisfactory performance (2nd best). 
We believe the rich information embedded in the EEG signals could be extracted from multiple domains that would lead to a comprehensive understanding of the etiology of pediatric SCZ.

Table 3. Comparison of the best performance in the current work and recent studies in SCZ classification using the same dataset.
4.2 Spatio-spectral distribution of the most discriminative features
In order to explore the characteristics of EEG signals in pediatric SCZ patients, this study introduced a data-driven framework through incorporating ensemble learning models and a feature selection approach. Compared to the previous studies with complex deep learning or neural network structures, the framework provides direct correspondence with EEG characteristics with interpretable neurophysiological meanings. Specifically, pediatric SCZ patients exhibit complex alteration patterns across different frequency bands (i.e., an increase in delta, theta, and beta bands, while a decrease in alpha and gamma bands). This finding was in line with previous studies (Iglesias-Tejedor et al., 2022; Light et al., 2006; Zhang et al., 2021). This power change pattern is particularly prominent in the temporal and occipital regions, suggesting that these regions may play crucial roles in the pathological process of SCZ. Given the important role of the temporal region in memory, emotion and auditory processing and the occipital regions in visual processing, these alterations may represent a disrupted brain dynamic oscillation that may lead to the well-known hallucination and delusion symptoms. Moreover, a universal decrease pattern was revealed in fuzzy entropy in theta, alpha, beta, and gamma bands, indicating a less complex and unpredictable nature of EEG signals in patients. This finding was in line with a recent work, where Molina and colleagues reported deficits in spectral entropy modulation in patients with chronic and first-episode SCZ (Molina et al., 2020). In terms of brain network alterations/reorganization, the current work employed advanced methods to conduct in-depth research on functional connectivity across the whole brain in SCZ patients. 
Specifically, the widely used phase lag index (PLI) was adopted to estimate functional connectivity because it attenuates the influence of EEG volume conduction and common sources (Stam et al., 2007) and therefore better reflects intrinsic functional interactions. More than half of the most discriminative features were PLI features (124 out of 212), suggesting that SCZ is related to aberrant connectivity between distinct brain regions rather than to abnormalities within individual regions alone. Moreover, we found a complex and widespread dysconnectivity pattern across five frequency bands in pediatric SCZ patients. These results are consistent with those of previous brain connectome studies (Koshiyama et al., 2020). Contemporary theories describe SCZ as a disorder of brain dysconnectivity or of brain network organization (Sheffield and Barch, 2016; Bassett et al., 2012). Our observations therefore extend previous studies of chronic and/or first-episode SCZ in adults to pediatric SCZ and provide further evidence for the notion of SCZ as a disconnection syndrome. Collectively, through a comprehensive analysis of relative power, fuzzy entropy, and functional connectivity features of EEG signals in SCZ patients, this study revealed functional connectivity reorganization across brain regions and frequency bands, as well as abnormal distributions of relative power and fuzzy entropy in specific frequency bands. These findings not only provide important clues for understanding the neurophysiological basis of pediatric SCZ but also offer potential biomarkers for future diagnosis and treatment.
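To make the volume-conduction argument concrete, the following minimal sketch computes the PLI as defined by Stam et al. (2007), i.e., the absolute value of the mean sign of the instantaneous phase difference between two channels. It is an illustration rather than the pipeline used in this study: the function name `phase_lag_index`, the 128-Hz sampling rate, the 10-Hz oscillation, and the analytically generated phase series are all assumptions made for the example. In practice, instantaneous phases would typically be obtained from band-pass filtered EEG, for instance via the Hilbert transform.

```python
import math


def phase_lag_index(phase_a, phase_b):
    """PLI = |mean(sign(sin(phase_a - phase_b)))| (Stam et al., 2007).

    Both inputs are sequences of instantaneous phases in radians.
    """
    if len(phase_a) != len(phase_b) or len(phase_a) == 0:
        raise ValueError("phase series must be non-empty and of equal length")
    total = 0
    for pa, pb in zip(phase_a, phase_b):
        s = math.sin(pa - pb)
        # sign(): +1 for a positive lag, -1 for a negative lag, 0 for no lag
        total += (s > 0) - (s < 0)
    return abs(total) / len(phase_a)


# Hypothetical settings for illustration only (not taken from the study).
fs = 128       # assumed sampling rate in Hz
freq = 10      # assumed alpha-band oscillation in Hz
n = 4 * fs     # one 4-s window

phase_ref = [2 * math.pi * freq * t / fs for t in range(n)]

# Zero-lag coupling, as produced by a common source or volume conduction,
# contributes nothing to the PLI.
zero_lag = phase_lag_index(phase_ref, phase_ref)

# A consistent non-zero lag (here a quarter of pi) yields the maximum PLI.
phase_lagged = [p - math.pi / 4 for p in phase_ref]
consistent_lag = phase_lag_index(phase_ref, phase_lagged)

print(zero_lag, consistent_lag)  # prints 0.0 1.0
```

Because only the sign of the phase difference enters the estimate, instantaneous (zero-lag) coupling is discarded, which is the property that motivates the use of PLI for scalp EEG connectivity.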
4.3 Future considerations
Some issues should be considered when interpreting our findings. First, the current work used a widely used, publicly available dataset that includes 84 participants (pediatric SCZ/HC = 45/39). The relatively small sample size and the inclusion of only male participants may limit the generalizability and reproducibility of our findings, particularly because gender differences in the brain and neurocognitive function in SCZ have long been recognized (Mendrek and Mancini-Marie, 2016). We chose this dataset to maximize the number of existing classification studies with which our results could be directly compared, without the need to account for clinical and demographic differences between datasets. Nevertheless, further studies with a larger independent sample that includes both genders are recommended to confirm our observations. Second, the patient group in the current work was heterogeneous, including infant SCZ, schizotypal, and schizoaffective disorders. Because these phenotypes may involve divergent neurophysiological mechanisms, this heterogeneity might influence which EEG features contribute to classification. Future research should examine the brain function characteristics of pediatric SCZ subtypes in greater depth to provide more specialized strategies for assisting automatic diagnosis.
5 Conclusion
This study proposes an analytical framework that combines multidimensional EEG features with ensemble learning models and feature selection algorithms to identify the most discriminative EEG features between the pediatric SCZ and HC groups, ultimately revealing brain functional alterations in pediatric SCZ patients. The results indicated that the CatBoost algorithm achieved a 99.21% accuracy in identifying pediatric SCZ patients. Additionally, the 212 most discriminative features were selected from a total of 760 features, constituting a key subset for pediatric SCZ recognition. Further analysis of this optimal feature subset revealed that pediatric SCZ patients exhibited a complex dysconnectivity architecture accompanied by abnormal distributions of relative power and fuzzy entropy in specific frequency bands. The findings of this study not only improved the accuracy of pediatric SCZ identification but also provided potential biomarkers for the automatic diagnosis of pediatric SCZ.
Data availability statement
The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found below: http://brain.bio.msu.ru/eeg_schizophrenia.htm.
Ethics statement
The studies involving humans were approved by the Institutional Review Board of Shaoxing People’s Hospital. The studies were conducted in accordance with the local legislation and institutional requirements. Written informed consent for participation in this study was provided by the participants’ legal guardians/next of kin.
Author contributions
YM: Conceptualization, Methodology, Validation, Writing – original draft. FW: Conceptualization, Validation, Writing – original draft. SW: Validation, Writing – original draft. ZW: Validation, Writing – original draft. GL: Conceptualization, Formal analysis, Methodology, Writing – original draft. XQ: Conceptualization, Funding acquisition, Supervision, Validation, Writing – original draft, Writing – review & editing. YS: Conceptualization, Formal analysis, Funding acquisition, Methodology, Supervision, Writing – original draft, Writing – review & editing.
Funding
The author(s) declare that financial support was received for the research and/or publication of this article. This work was supported in part by the Science and Technology Special Project of the Institute of Wenzhou, Zhejiang University under grants XMGL-KJZX-202203 and XMGL-CX-202401, and in part by the Zhejiang Provincial Natural Science Foundation of China under grants LQ19E050011 and LTGY23H180015.
Conflict of interest
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Generative AI statement
The authors declare that no Gen AI was used in the creation of this manuscript.
Publisher’s note
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
References
Al-sharhan, S., Karray, F., Gueaieb, W., and Basir, O. (2001). Fuzzy entropy: a brief survey. 10th IEEE International Conference on Fuzzy Systems, Melbourne, Australia, 1135–1139.
Aslan, Z., and Akin, M. (2020). Automatic detection of schizophrenia by applying deep learning over spectrogram images of EEG signals. Traitement du Signal 37, 235–244. doi: 10.18280/ts.370209
Bagherzadeh, S., and Shalbaf, A. (2024). EEG-based schizophrenia detection using fusion of effective connectivity maps and convolutional neural networks with transfer learning. Cogn. Neurodyn. 18, 2767–2778. doi: 10.1007/s11571-024-10121-0
Bassett, D. S., Nelson, B. G., Mueller, B. A., Camchong, J., and Lim, K. O. (2012). Altered resting state complexity in schizophrenia. NeuroImage 59, 2196–2207. doi: 10.1016/j.neuroimage.2011.10.002
Borisov, S. V., Kaplan, A., Gorbachevskaia, N. L., and Kozlova, I. A. (2005). Analysis of EEG structural synchrony in adolescents suffering from schizophrenic disorders. Fiziol. Cheloveka 31, 16–23.
Buckley, P. F., and Miller, B. J. (2015). Schizophrenia research: a progress report. Psychiatr. Clin. North Am. 38, 373–377. doi: 10.1016/j.psc.2015.05.001
Delorme, A., and Makeig, S. (2004). EEGLAB: an open source toolbox for analysis of single-trial EEG dynamics including independent component analysis. J. Neurosci. Methods 134, 9–21. doi: 10.1016/j.jneumeth.2003.10.009
Dimitrakopoulos, G. N., Kakkos, I., Dai, Z., Wang, H., Sgarbas, K., Thakor, N., et al. (2018). Functional connectivity analysis of mental fatigue reveals different network topological alterations between driving and vigilance tasks. IEEE Trans. Neural Syst. Rehabil. Eng. 26, 740–749. doi: 10.1109/TNSRE.2018.2791936
Fornito, A., Zalesky, A., Pantelis, C., and Bullmore, E. T. (2012). Schizophrenia, neuroimaging and connectomics. NeuroImage 62, 2296–2314. doi: 10.1016/j.neuroimage.2011.12.090
Goshvarpour, A., and Goshvarpour, A. (2022). Schizophrenia diagnosis by weighting the entropy measures of the selected EEG channels. J. Med. Biol. Eng. 42, 898–908. doi: 10.1007/s40846-022-00762-z
Gröhn, C., Norgren, E., and Eriksson, L. (2022). A systematic review of the neural correlates of multisensory integration in schizophrenia. Schizophr. Res. Cogn. 27:100219. doi: 10.1016/j.scog.2021.100219
Hamilton, A., and Northoff, G. (2021). Abnormal ERPs and brain dynamics mediate basic self disturbance in schizophrenia: a review of EEG and MEG studies. Front. Psychiatry 12:642469. doi: 10.3389/fpsyt.2021.642469
Hirano, Y., and Uhlhaas, P. J. (2021). Current findings and perspectives on aberrant neural oscillations in schizophrenia. Psychiatry Clin. Neurosci. 75, 358–368. doi: 10.1111/pcn.13300
Iglesias-Tejedor, M., Diez, A., Llorca-Bofi, V., Nunez, P., Castano-Diaz, C., Bote, B., et al. (2022). Relation between EEG resting-state power and modulation of P300 task-related activity in theta band in schizophrenia. Prog. Neuro-Psychopharmacol. Biol. Psychiatry 116:110541. doi: 10.1016/j.pnpbp.2022.110541
Keihani, A., Sajadi, S. S., Hasani, M., and Ferrarelli, F. (2022). Bayesian optimization of machine learning classification of resting-state EEG microstates in schizophrenia: a proof-of-concept preliminary study based on secondary analysis. Brain Sci. 12:1497. doi: 10.3390/brainsci12111497
Koshiyama, D., Miyakoshi, M., Tanaka-Koshiyama, K., Joshi, Y. B., Molina, J. L., Sprock, J., et al. (2020). Neurophysiologic characterization of resting state connectivity abnormalities in schizophrenia patients. Front. Psych. 11:608154. doi: 10.3389/fpsyt.2020.608154
Koshiyama, D., Miyakoshi, M., Tanaka-Koshiyama, K., Joshi, Y. B., Sprock, J., Braff, D. L., and Light, G. A. (2021). Abnormal phase discontinuity of alpha- and theta-frequency oscillations in schizophrenia. Schizophr. Res. 231, 73–81. doi: 10.1016/j.schres.2021.03.007
Laursen, T. M. (2011). Life expectancy among persons with schizophrenia or bipolar affective disorder. Schizophr. Res. 131, 101–104. doi: 10.1016/j.schres.2011.06.008
Li, Z., Ren, H., Tian, Y., Zhou, J., Chen, W., Ouyang, G., et al. (2024). Neurofeedback technique for treating male schizophrenia patients with impulsive behavior: a randomized controlled study. Front. Psych. 15:1472671. doi: 10.3389/fpsyt.2024.1472671
Light, G. A., Hsu, J. L., Hsieh, M. H., Meyer-Gomes, K., Sprock, J., Swerdlow, N. R., et al. (2006). Gamma band oscillations reveal neural network cortical coherence dysfunction in schizophrenia patients. Biol. Psychiatry 60, 1231–1240. doi: 10.1016/j.biopsych.2006.03.055
Lin, R., Li, Q., Liu, Z., Zhong, S., Sun, Q., Guo, H., et al. (2023). Abnormalities in electroencephalographic microstates among violent patients with schizophrenia. Front. Psych. 14:1082481. doi: 10.3389/fpsyt.2023.1082481
Mendrek, A., and Mancini-Marie, A. (2016). Sex/gender differences in the brain and cognition in schizophrenia. Neurosci. Biobehav. Rev. 67, 57–78. doi: 10.1016/j.neubiorev.2015.10.013
Molina, V., Lubeiro, A., de Luis Garcia, R., Gomez-Pilar, J., Martin-Santiago, O., Iglesias-Tejedor, M., et al. (2020). Deficits of entropy modulation of the EEG: a biomarker for altered function in schizophrenia and bipolar disorder? J. Psychiatry Neurosci. 45, 322–333. doi: 10.1503/jpn.190032
Naim-Feil, J., Rubinson, M., Freche, D., Grinshpoon, A., Peled, A., Moses, E., et al. (2018). Altered brain network dynamics in schizophrenia: a cognitive electroencephalography study. Biol. Psychiatry Cogn. Neurosci. Neuroimaging 3, 88–98. doi: 10.1016/j.bpsc.2017.03.017
Najafzadeh, H., Esmaeili, M., Farhang, S., Sarbaz, Y., and Rasta, S. H. (2021). Automatic classification of schizophrenia patients using resting-state EEG signals. Phys. Eng. Sci. Med. 44, 855–870. doi: 10.1007/s13246-021-01038-7
Oprea, D. C., Mawas, I., Morosan, C. A., Iacob, V. T., Camanaru, E. M., Cristofor, A. C., et al. (2024). A systematic review of the effects of EEG neurofeedback on patients with schizophrenia. J. Pers. Med. 14:763. doi: 10.3390/jpm14070763
Pan, H., Li, Z., Fu, Y., Qin, X., and Hu, J. (2024). Reconstructing visual stimulus representation from EEG signals based on deep visual representation model. IEEE Trans. Hum. Mach. Syst. 54, 711–722. doi: 10.1109/THMS.2024.3407875
Pan, H., Tong, S., Song, H., and Chu, X. (2025). A miner mental state evaluation scheme with decision level fusion based on multidomain EEG information. IEEE Trans. Hum. Mach. Syst. 55, 289–299. doi: 10.1109/THMS.2025.3538162
Pei, Y., Zhao, S., Xie, L., Luo, Z., Zhou, D., Ma, C., et al. (2025). Identifying stable EEG patterns in manipulation task for negative emotion recognition. IEEE Trans. Affect. Comput., 1–15. doi: 10.1109/TAFFC.2025.3551330
Perellón-Alfonso, R., Oblak, A., Kuclar, M., Skrlj, B., Pileckyte, I., Skodlar, B., et al. (2023). Dense attention network identifies EEG abnormalities during working memory performance of patients with schizophrenia. Front. Psych. 14:1205119. doi: 10.3389/fpsyt.2023.1205119
Perrottelli, A., Giordano, G. M., Brando, F., Giuliani, L., Pezzella, P., Mucci, A., and Galderisi, S. (2022). Unveiling the associations between EEG indices and cognitive deficits in schizophrenia-spectrum disorders: a systematic review. Diagnostics 12:2193. doi: 10.3390/diagnostics12092193
Perrottelli, A., Giordano, G. M., Brando, F., Giuliani, L., and Mucci, A. (2021). EEG-based measures in at-risk mental state and early stages of schizophrenia: a systematic review. Front. Psychiatry 12:653642. doi: 10.3389/fpsyt.2021.653642
Phang, C. R., Noman, F., Hussain, H., Ting, C. M., and Ombao, H. (2020). A multi-domain connectome convolutional neural network for identifying schizophrenia from EEG connectivity patterns. IEEE J. Biomed. Health Inform. 24, 1333–1343. doi: 10.1109/JBHI.2019.2941222
Rahul, J., Sharma, D., Sharma, L. D., Nanda, U., and Sarkar, A. K. (2024). A systematic review of EEG based automated schizophrenia classification through machine learning and deep learning. Front. Hum. Neurosci. 18:1347082. doi: 10.3389/fnhum.2024.1347082
Ranjan, R., Sahana, B. C., and Bhandari, A. K. (2024). Deep learning models for diagnosis of schizophrenia using EEG signals: emerging trends, challenges, and prospects. Arch. Computat. Methods Eng. 31, 2345–2384. doi: 10.1007/s11831-023-10047-6
Sairamya, N. J., Subathra, M. S. P., and Thomas George, S. (2022). Automatic identification of schizophrenia using EEG signals based on discrete wavelet transform and RLNDiP technique with ANN. Expert Syst. Appl. 192:116230. doi: 10.1016/j.eswa.2021.116230
Sheffield, J. M., and Barch, D. M. (2016). Cognition and resting-state functional connectivity in schizophrenia. Neurosci. Biobehav. Rev. 61, 108–120. doi: 10.1016/j.neubiorev.2015.12.007
Shen, M., Wen, P., Song, B., and Li, Y. (2023). Automatic identification of schizophrenia based on EEG signals using dynamic functional connectivity analysis and 3D convolutional neural network. Comput. Biol. Med. 160:107022. doi: 10.1016/j.compbiomed.2023.107022
Soria, C., Arroyo, Y., Torres, A. M., Redondo, M. A., Basar, C., and Mateo, J. (2023). Method for classifying schizophrenia patients based on machine learning. J. Clin. Med. 12:4375. doi: 10.3390/jcm12134375
Stam, C. J., Nolte, G., and Daffertshofer, A. (2007). Phase lag index: assessment of functional connectivity from multi channel EEG and MEG with diminished bias from common sources. Hum. Brain Mapp. 28, 1178–1193. doi: 10.1002/hbm.20346
Sun, Y., Collinson, S. L., Suckling, J., and Sim, K. (2019). Dynamic reorganization of functional connectivity reveals abnormal temporal efficiency in schizophrenia. Schizophr. Bull. 45, 659–669. doi: 10.1093/schbul/sby077
Tanaka-Koshiyama, K., Koshiyama, D., Miyakoshi, M., Joshi, Y. B., Molina, J. L., Sprock, J., et al. (2020). Abnormal spontaneous gamma power is associated with verbal learning and memory dysfunction in schizophrenia. Front. Psych. 11:832. doi: 10.3389/fpsyt.2020.00832
Wang, L., Wang, L., Chen, J., Qiu, C., Liu, T., Wu, Y., et al. (2024). Five-week music therapy improves overall symptoms in schizophrenia by modulating theta and gamma oscillations. Front. Psych. 15:1358726. doi: 10.3389/fpsyt.2024.1358726
Yan, T., Wang, G., Liu, T., Li, G., Wang, C., Funahashi, S., et al. (2023). Effects of microstate dynamic brain network disruption in different stages of schizophrenia. IEEE Trans. Neural Syst. Rehabil. Eng. 31, 2688–2697. doi: 10.1109/TNSRE.2023.3283708
Yan, W., Yu, L., Liu, D., Sui, J., Calhoun, V., and Lin, Z. (2023). Multi-scale convolutional recurrent neural network for psychiatric disorder identification in resting-state EEG. Front. Psych. 14:1202049. doi: 10.3389/fpsyt.2023.1202049
Yan, K., and Zhang, D. (2015). Feature selection and analysis on correlated gas sensor data with recursive feature elimination. Sensors Actuators B Chem. 212, 353–363. doi: 10.1016/j.snb.2015.02.025
Zhang, Y., Geyfman, A., Coffman, B., Gill, K., and Ferrarelli, F. (2021). Distinct alterations in resting-state electroencephalogram during eyes closed and eyes open and between morning and evening are present in first-episode psychosis patients. Schizophr. Res. 228, 36–42. doi: 10.1016/j.schres.2020.12.014
Keywords: pediatric schizophrenia, electroencephalogram, ensemble learning, feature selection, brain function
Citation: Mao Y, Wang F, Wang S, Wang Z, Li G, Qi X and Sun Y (2025) Ensemble learning techniques reveals multidimensional EEG feature alterations in pediatric schizophrenia. Front. Hum. Neurosci. 19:1530291. doi: 10.3389/fnhum.2025.1530291
Edited by:
Chang-Hwan Im, Hanyang University, Republic of Korea
Reviewed by:
Lei Shang, Nanjing University of Aeronautics and Astronautics, China
Javed Khan, University of Science and Technology, Pakistan
Copyright © 2025 Mao, Wang, Wang, Wang, Li, Qi and Sun. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Gang Li, ligang@zjnu.cn; Xuchen Qi, qixuchen@zju.edu.cn