Identifying Clinically and Functionally Distinct Groups Among Healthy Controls and First Episode Psychosis Patients by Clustering on EEG Patterns

Objective: The mismatch negativity (MMN) is considered a promising biomarker that can inform future therapeutic studies. However, MMN varies widely among patients with first episode psychosis (FEP). Moreover, most studies report a single electrode site and focus on case-control group differences. Few have taken advantage of the full wealth of multi-channel EEG signals to examine observable patterns. None, to our knowledge, have used machine learning (ML) approaches to investigate neurophysiologically derived subgroups with distinct cognitive and functional outcome characteristics. In this study, we applied ML to empirically stratify individuals into homogeneous subgroups based on multi-channel MMN data. We then characterized the functional, cognitive, and clinical profiles of these neurobiologically derived subgroups. We also explored the underlying low frequency range responses (delta, theta, alpha) during MMN.
Methods: Clinical, neurocognitive, and functioning data of 33 healthy controls and 20 FEP patients were collected. 90% of the patients had 6-month follow-up data. Neurocognition, social cognition, and functioning were assessed using the MATRICS Consensus Cognitive Battery (MCCB), the Awareness of Social Inference Test, the UCSD Performance-Based Skills Assessment, and the Multnomah Community Ability Scale. Symptom severity was assessed using the PANSS. MMN amplitude and single-trial derived low frequency activity across 24 frontocentral channels were used as the main variables in the k-means clustering analyses.
Results: We found a consistent pattern of two distinctive subgroups, which we labeled the “better functioning” and “poorer functioning” clusters. Each subgroup could be mapped onto either a better or a poorer clinical, cognitive, and functioning profile. We also identified two subgroups of patients: one showed improved MMN and one showed worsening MMN over time. Patients with improved MMN had better follow-up clinical, cognitive, and functioning profiles than those with worsening MMN. Among the low frequency bands, delta frequency appeared to be the most relevant to the observed MMN responses in all individuals. However, higher delta responses were not necessarily associated with a better functioning profile, suggesting that delta frequency alone may not be useful in clinical characterization.
Conclusions: The ML approach could be a robust tool to explore heterogeneity and facilitate the identification of neurobiologically homogeneous subgroups in FEP.


INTRODUCTION
Psychosis is one of the most disabling conditions worldwide (1,2). Early intervention can play a substantial role in improving long-term outcomes, although treatment responsiveness varies widely among first episode psychosis (FEP) patients (3). FEP patients demonstrate a wide range of cognitive and neurophysiological impairments and are considerably heterogeneous in their functional outcome trajectories (4)(5)(6). Progress is hampered by this biological and clinical heterogeneity: effective treatments are unlikely to advance substantially until disease mechanisms are better understood and biologically based objective markers are available to tag the cardinal dysfunction rather than diagnoses or symptoms.
Detecting unexpected stimuli in the environment is a critical function of the auditory system. The mismatch negativity (MMN) is the best-studied electrophysiological marker of deviance detection. It is typically elicited in oddball paradigms and measured as the difference between the event-related potential (ERP) responses to deviant (e.g., duration, pitch) and standard stimuli. The magnitude of the MMN ERP is typically greatest at frontocentral electrode sites and peaks between 100 and 250 ms (7,8). Larger (more negative) responses to rare/deviant stimuli are thought to represent the detection of a regularity violation (9)(10)(11). This predominantly automatic (preattentive) process of detecting a "mismatch" between the deviant stimulus and a sensory memory trace might be a critical transitional step from sensory-based processing to the subsequent engagement of the higher attentional neural networks necessary for cognitive and psychosocial functioning (12)(13)(14)(15)(16).
MMN, particularly the duration-deviant MMN (dMMN), is considered a promising candidate biomarker that can inform future therapeutic studies of FEP (17)(18)(19)(20)(21)(22). Evidence suggests that MMN peak amplitude is a sensitive index of NMDA (15, 23-25) and nicotinic receptor functioning (26). In healthy individuals and chronic schizophrenia (SZ) patients, MMN activity is significantly correlated with distinct domains of cognition (27)(28)(29) and with psychosocial, work, and independent living functioning (17, 30-32). Furthermore, the MMN deficit increases with the progression of the disease. A meta-analysis found that in FEP patients the effect size of the dMMN deficit is small to medium (Cohen's d = 0.47) (20); in chronic schizophrenia patients, the effect size increased systematically as a function of illness duration, indicating that the dMMN deficit reflects, to some extent, disease progression (33,34). Although a dMMN deficit is reported in FEP patients, the small effect size suggests large variability among patients. Given that heterogeneity is a key feature of FEP that manifests at the clinical, neurobiological, and functioning levels and poses a substantial barrier to understanding disease mechanisms, a data-driven approach that stratifies FEP patients into homogeneous subgroups would allow a better understanding of the sources of variability and the underlying biological mechanisms.
Moreover, the relationship between dMMN change over time and the corresponding clinical changes is largely unknown. Longitudinal MMN studies in FEP are rare. Pitch-deviant MMN studies showed that FEP patients with the most impaired MMN amplitudes at baseline showed the most severe disability at follow-up (35) and that MMN was intact at baseline in a majority of FEP patients but worsened at follow-up (36). Identifying subgroups of FEP patients with distinct patterns of dMMN change over time, together with the corresponding clinical, cognitive, or functioning changes, may facilitate early identification of patients at heightened risk for cognitive and functioning decline.
The majority of clinical studies of MMN focus on a single electrode site (Fz) and on case-control group differences. Few have taken advantage of the full wealth of information about dynamic brain processes and observable patterns contained in multi-channel MMN EEG signals. Conventional approaches based on group (case-control) comparisons treat controls and patients as homogeneous populations, which does not adequately address within-group individual differences. Machine learning (ML) data-driven approaches have received renewed interest as a way to address heterogeneity by partitioning individuals into more homogeneous subgroups (37,38). Advances in ML make it possible to extract information from complex and high-dimensional data.
In terms of the EEG frequency responses underlying MMN, previous studies showed that MMN is primarily composed of lower-range evoked oscillations, including delta, theta, and alpha (39)(40)(41)(42). In controls, theta-alpha activity was found to be the most significant contributor to MMN, while in patients with SZ spectrum disorders, delta range activity explained the most variance of the observed MMN abnormalities. MMN reflects activity primarily in the low frequency bands, which is thought to depend primarily upon the interplay between cortical pyramidal neurons and somatostatin-type local circuit GABAergic interneurons (42). However, few studies have examined patients with SZ (41). It is currently unknown to what extent the theta frequency activity plays a role in the generation of MMN in FEP patients. Moreover, it remains unclear to what extent low frequency activity in the delta, theta, and alpha ranges is related to cognitive or daily functioning measures; that is, whether abnormal activity in these bands is correlated with impaired cognitive or functional measures in the way that MMN amplitude is. Finally, the longitudinal change of low frequency range activity in FEP has not been reported in the literature.
In this study, we collected an array of functional, cognitive, clinical, and multi-channel MMN data from a cohort of controls and FEP patients (90% had 6-month follow-up data). We aimed to use an unsupervised ML clustering technique to address several key research questions: i) to stratify individuals into more homogeneous subgroups based on multi-channel MMN activity and examine whether each neurobiologically derived subgroup could be mapped onto a consistent functional, cognitive, and clinical profile; ii) to stratify FEP patients into distinct clusters based on the change of dMMN over a 6-month period and examine whether patients' MMN change in each cluster corresponds to a consistent follow-up functional, cognitive, and clinical profile; iii) to investigate the magnitude of low frequency (delta, theta, alpha) activity in both controls and patients; iv) to stratify individuals into more homogeneous subgroups based on multi-channel low frequency activity and examine whether delta, theta, and alpha range activity is related to cognitive or daily functioning measures.
We first applied a data-driven strategy, k-means clustering, to address heterogeneity in MMN and low frequency measures among patients and controls in a longitudinal study. The ML algorithm empirically classifies individuals based on mathematical calculations over each individual's multimodal MMN or delta, theta, or alpha activity. We then characterized the cognitive, symptom severity, and functioning performance of the empirically derived clusters to address whether the data-driven classification results were clinically meaningful.

Participants
Demographic, MMN, cognitive (MCCB: MATRICS Consensus Cognitive Battery), and clinical data of 33 healthy controls (HC) and 20 FEP patients were collected. Eighteen of the FEP patients also had 6-month follow-up data (Table 1). Patients were identified and recruited for the study within 12 months of their first episode of psychosis. After enrollment, a clinical diagnostic interview and a series of tests, including EEG and neurocognitive tests, were administered within 45 days. Each subject was assessed with the Structured Clinical Interview for DSM-IV (SCID). Patients were clinically stable. Study inclusion criteria were: 1) age between 18 and 45 years; 2) fluency in English; 3) IQ > 70; 4) for patients, FEP diagnosed as SZ, schizoaffective disorder, schizophreniform disorder, psychotic disorder not otherwise specified, or psychotic bipolar disorder. Exclusion criteria consisted of: 1) diagnosed neurological disorder; 2) brain injury including stroke or serious head injury resulting in loss of consciousness; 4) hearing impairments, blindness, or deafness; 5) electroconvulsive therapy within the past 6 months; 6) outside the age range of 18-45 years. HC subjects were recruited from the Partners Research Portal and were subject to the same exclusion criteria plus the following: no current or past history of psychotic or affective disorders, no substance abuse or previous chronic dependence, and no first-degree relative with a history of psychosis or bipolar disorder. Patients with substance abuse or dependence within the past 6 months were excluded. Because a history or lifetime diagnosis of substance abuse or dependence is common among patients, FEP patients with a previous substance abuse history were not excluded. Similarly, because depression is common in the general population (~20-25%), healthy controls with relatives with a history of depression were not excluded. The study was approved by the McLean Hospital Institutional Review Board.
All subjects provided written informed consent after receiving a complete description of the study.  (46,47). The UPSA-B is a performance-based measure designed to evaluate participants' abilities to perform everyday tasks considered necessary for independent functioning in the community. Total scores range from 0 to 100 points; higher scores reflect better performance. Community functioning was evaluated using an abbreviated version of the Multnomah Community Ability Scale (MCAS) (48), an interview-based measure developed for assessment of community outcomes in psychiatric populations. This brief version probes several aspects of community functioning including independence in daily living, instrumental role functioning, and social interest and engagement (49)(50)(51). Clinical assessment was performed using the Positive and Negative Syndrome Scale (PANSS) subscales for Positive, Negative, and General symptoms (52).

EEG Procedures and Data Processing
The electroencephalogram (EEG) was recorded continuously using the BioSemi Active Two system (BioSemi Inc, Amsterdam, Netherlands) at a digitization rate of 512 Hz, with a bandpass of DC-104 Hz, and a Common Mode Sense (CMS) as the reference (PO2 site) using a 64-channel electrode cap. EOG electrodes were placed below and at the outer canthi of the left eye. A duration MMN paradigm was used to elicit MMN. Stimuli consisted of 1,200 trials presented to the subjects through foam insert earphones. 85% of the stimuli were standard [S1] tones (1,000 Hz, 100 ms), and 15% were duration deviant [S2] tones (1,000 Hz, 150 ms) with an inter-stimulus interval of 200 ms. Participants were instructed to watch a silent cartoon/video clip (BBC natural program or Charles Brown) during the stimulus presentation.
Data processing and analysis pipelines are presented in the Figure 1 flow chart. Data processing was performed offline, blind to group membership and using automated procedures, in Brain Vision Analyzer 2 (Brain Products GmbH, Munich, Germany) and MATLAB R2017b (The MathWorks, Massachusetts, USA). Signals were re-referenced to the average of the mastoids and bandpass filtered between 0.01 and 20 Hz. Data were segmented by stimulus marker from −100 to 400 ms for MMN analysis and from −100 to 280 ms for frequency analysis. Segments were baseline corrected using the −100 to 0 ms pre-stimulus interval and eye-blink corrected using established measures (53). Artifact rejection was performed for individual channels, and a given segment was rejected if the voltage gradient exceeded 50 µV/ms, the amplitude exceeded ±100 µV, or the signal was flat (<0.5 µV for >100 ms).
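The three rejection criteria above can be sketched for a single channel as follows. This is a minimal illustration, not the Brain Vision Analyzer implementation: the function name `reject_segment`, its parameters, and the reading of "flat" as a peak-to-peak range below 0.5 µV over any window longer than 100 ms are assumptions, as are the toy signals.

```python
import math

def reject_segment(samples_uv, srate_hz=512.0, max_gradient_uv_per_ms=50.0,
                   max_abs_uv=100.0, flat_range_uv=0.5, flat_ms=100.0):
    """Return True if a single-channel segment (values in µV) should be rejected."""
    ms_per_sample = 1000.0 / srate_hz
    # Criterion 1: voltage gradient between adjacent samples (µV per ms).
    for prev, curr in zip(samples_uv, samples_uv[1:]):
        if abs(curr - prev) / ms_per_sample > max_gradient_uv_per_ms:
            return True
    # Criterion 2: absolute amplitude beyond the allowed range.
    if any(abs(v) > max_abs_uv for v in samples_uv):
        return True
    # Criterion 3 (assumed definition): peak-to-peak range below flat_range_uv
    # in any window whose time span exceeds flat_ms.
    win = math.floor(flat_ms / ms_per_sample) + 2  # smallest such sample count
    for start in range(len(samples_uv) - win + 1):
        window = samples_uv[start:start + win]
        if max(window) - min(window) < flat_range_uv:
            return True
    return False

# Toy 500 ms segments at 512 Hz (256 samples); values are illustrative only.
clean = [10.0 * math.sin(2 * math.pi * 5 * i / 512.0) for i in range(256)]
flat = [0.0] * 256
spiky = list(clean)
spiky[100] = 150.0  # a single large jump violates the gradient criterion
print(reject_segment(clean), reject_segment(flat), reject_segment(spiky))
```

The thresholds are keyword arguments so that other criteria can be tried without editing the function body.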
MMN waveforms were generated by subtracting the standard (S1) from the deviant (S2) waveforms, and the MMN amplitude was calculated as the mean amplitude (µV) in the 120 to 250 ms time window. Event-related low frequency measures were computed from the single-trial segments. After the artifact rejection procedure, single-trial S1 and S2 segments were extracted using the Morlet wavelet transformation (squared wavelet transformation (µV²) for delta: 1-4 Hz; theta: 4-8 Hz; alpha: 7.5-13 Hz). Originally, there were a total of 1,020 S1 and 180 S2 segments. The artifact rejection procedure and an additional step for removing bad intervals left approximately 981 S1 segments and 172 S2 segments for each subject per frequency band.
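As a rough illustration of the MMN amplitude computation described above (deviant minus standard, then the mean over the 120 to 250 ms window), the following sketch uses only the Python standard library. The function names, the coarse 10 ms time grid, and the toy trial values are assumptions made for the example; the wavelet step is not shown.

```python
def average_waveform(trials):
    """Average equal-length single-trial segments sample by sample."""
    return [sum(samples) / len(trials) for samples in zip(*trials)]

def mmn_mean_amplitude(s1_trials, s2_trials, times_ms, window_ms=(120.0, 250.0)):
    """Mean of the deviant-minus-standard difference wave inside window_ms."""
    avg_s1 = average_waveform(s1_trials)   # standard (S1) ERP
    avg_s2 = average_waveform(s2_trials)   # deviant (S2) ERP
    difference_wave = [s2 - s1 for s1, s2 in zip(avg_s1, avg_s2)]
    lo, hi = window_ms
    in_window = [v for t, v in zip(times_ms, difference_wave) if lo <= t <= hi]
    return sum(in_window) / len(in_window)

# Toy data on a coarse 10 ms grid from -100 to 400 ms (illustrative values, µV).
times_ms = list(range(-100, 401, 10))

def toy_trial(depth_uv):
    """Hypothetical trial: a constant deflection inside the MMN window, zero elsewhere."""
    return [depth_uv if 120 <= t <= 250 else 0.0 for t in times_ms]

s1_trials = [toy_trial(0.0), toy_trial(0.0)]
s2_trials = [toy_trial(-1.0), toy_trial(-3.0)]
mmn = mmn_mean_amplitude(s1_trials, s2_trials, times_ms)
print(mmn)
```

A more negative return value corresponds to a larger MMN, matching the sign convention used in the Results.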
For each frequency band, we computed the sum of the average difference (AverageDifference) index to capture the overall differences between the averaged S2 and S1 segments in each of the 24 channels (see equation below). First, we computed the averages of the S2 segments (AvgS2) and the S1 segments (AvgS1) in each channel 'e', where C_S2 is the number of S2 segments and C_S1 is the number of S1 segments (e.g., C_S1 = 981 and C_S2 = 172). AverageDifference is then computed by summing the difference between AvgS2 and AvgS1 over time 't', from stimulus onset (0 ms) to the end of the segment (280 ms).
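The AverageDifference for each channel e is defined as follows:

$$\mathrm{AvgS2}_e(t) = \frac{1}{C_{S2}} \sum_{i=1}^{C_{S2}} S2_{e,i}(t), \qquad \mathrm{AvgS1}_e(t) = \frac{1}{C_{S1}} \sum_{i=1}^{C_{S1}} S1_{e,i}(t)$$

$$\mathrm{AverageDifference}_e = \sum_{t=0\,\mathrm{ms}}^{280\,\mathrm{ms}} \left[ \mathrm{AvgS2}_e(t) - \mathrm{AvgS1}_e(t) \right]$$

where S2_{e,i}(t) and S1_{e,i}(t) denote the squared wavelet value of the i-th S2 or S1 segment in channel e at time t.

Since AverageDifference was derived from squared wavelet transformed values and the summation of differences between AvgS2 and AvgS1, it is susceptible to extreme values, which can significantly affect the clustering results. For instance, clustering a sample with one extreme value into two clusters can cause one cluster to contain only one subject while the other cluster contains the rest. To identify extreme outliers, we applied the Median Absolute Deviation (MAD) procedure to the sum over the 24 channels, considered any observation whose AverageDifference value was more than three median absolute deviations away from the median to be an "outlier," and removed such observations from the clustering analyses. This procedure removed a total of four observations (two controls and two follow-up FEP patients).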
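The AverageDifference index and the MAD screen can be sketched together as follows. This is a minimal sketch under stated assumptions: the function names are hypothetical, the inputs are taken to be squared wavelet values per trial, the difference is summed with its sign, and the MAD is used unscaled (the text does not say whether a consistency constant such as 1.4826 was applied).

```python
import statistics

def average_difference(s1_power, s2_power, times_ms, t_start=0.0, t_end=280.0):
    """Sum over time of AvgS2(t) - AvgS1(t) for one channel."""
    c_s1, c_s2 = len(s1_power), len(s2_power)   # numbers of S1 and S2 segments
    avg_s1 = [sum(col) / c_s1 for col in zip(*s1_power)]
    avg_s2 = [sum(col) / c_s2 for col in zip(*s2_power)]
    return sum(a2 - a1 for t, a1, a2 in zip(times_ms, avg_s1, avg_s2)
               if t_start <= t <= t_end)

def mad_outlier_indices(values, n_mads=3.0):
    """Indices of values more than n_mads (unscaled) MADs from the median."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        return []   # degenerate case: no spread, so nothing is flagged
    return [i for i, v in enumerate(values) if abs(v - med) > n_mads * mad]

# Toy single-channel example (illustrative squared wavelet values).
times_ms = [-100, 0, 100, 200, 280]
s1_power = [[1.0] * 5, [1.0] * 5]
s2_power = [[1.0, 2.0, 3.0, 2.0, 1.0]]
channel_value = average_difference(s1_power, s2_power, times_ms)

# Toy per-subject sums over channels; the last subject is an extreme value.
subject_sums = [10.0, 11.0, 9.0, 10.0, 12.0, 10.0, 50.0]
outliers = mad_outlier_indices(subject_sums)
print(channel_value, outliers)
```

In the toy data the pre-stimulus sample at −100 ms is excluded from the sum, and only the subject with a sum of 50 lies more than three MADs from the median.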

Cluster Analysis
To stratify subjects into more homogeneous subgroups, each individual's MMN amplitudes and AverageDifference values across 24 channels were used to derive cluster assignments with the k-means algorithm. We applied the elbow method to empirically estimate the optimal number of clusters (54,55). The elbow method computes the cost function J (the clustering error that k-means minimizes) for each candidate number of clusters (e.g., from 1 to 10); a steeper drop in the cost function indicates a better fit to the data, and the number of clusters is chosen where the drop levels off. The elbow method indicated that there were two distinct clusters (see Figure 4 and Supplementary Figures S1 and S2).
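The k-means and elbow procedure can be illustrated with a small standard-library sketch. It is not the authors' implementation: the deterministic farthest-first initialization is an assumption chosen to make the example reproducible, the helper names are hypothetical, and the toy points have two features rather than the 24 channel values used in the study.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def farthest_first_centers(points, k):
    """Deterministic initialization (assumed): start at the first point, then
    repeatedly add the point farthest from the centers chosen so far."""
    centers = [list(points[0])]
    while len(centers) < k:
        nxt = max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        centers.append(list(nxt))
    return centers

def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm; returns (cost J, labels, centers)."""
    centers = farthest_first_centers(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centers[j]))
                      for p in points]
        if new_labels == labels:
            break   # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:   # keep the old center if a cluster becomes empty
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    cost = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
    return cost, labels, centers

# Toy subjects with two features each (illustrative amplitudes, µV).
group_a = [[-3 + 0.1 * i, -3 - 0.1 * i] for i in range(5)]   # larger MMN
group_b = [[-1 + 0.1 * i, -1 - 0.1 * i] for i in range(5)]   # smaller MMN
points = group_a + group_b

# Elbow method: cost J for each candidate number of clusters.
costs = {k: kmeans(points, k)[0] for k in range(1, 5)}
_, labels_k2, _ = kmeans(points, 2)
print(costs, labels_k2)
```

In this toy example the cost drops sharply from one to two clusters and only marginally afterwards, which is the pattern the elbow method reads as two clusters.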

Clustering Analyses Using MMN Amplitude Over 24 Channels
Because this study includes controls and a patient cohort with longitudinal follow-up data, four sets of analyses were run, clustering among (i) controls and patients at baseline, (ii) controls and patients at follow-up, (iii) patients at baseline and follow-up, and (iv) patients' MMN changes over the 6-month period (i.e., follow-up MMN amplitude minus baseline MMN amplitude at each electrode channel). Once clusters were determined in each analysis, the resulting cluster assignments were mapped onto individuals' clinical, cognitive, and functioning performances.
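The change score used in analysis (iv) is simply follow-up minus baseline at each channel, as in the short sketch below. The function name and the amplitude values are hypothetical; Fz and FC5 are used only because they are channels named in the text.

```python
def mmn_change(baseline_uv, followup_uv):
    """Per-channel MMN change: follow-up amplitude minus baseline amplitude."""
    return {ch: followup_uv[ch] - baseline_uv[ch] for ch in baseline_uv}

# Illustrative amplitudes in µV for one hypothetical patient.
baseline = {"Fz": -1.0, "FC5": -2.0}
followup = {"Fz": -2.5, "FC5": -1.0}
change = mmn_change(baseline, followup)
print(change)
```

Because MMN is a negative deflection, a negative change (as at Fz here) means a larger MMN at follow-up, and a positive change (as at FC5) means a smaller one. The per-channel changes form the feature vector that is clustered.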

Clustering Analyses Using AverageDifference Over 24 Channels
Three separate clustering analyses were run based on the AverageDifference index, clustering among (i) controls and patients at baseline, (ii) controls and patients at follow-up, and (iii) patients at baseline and patients at follow-up. Once clusters were determined in each run, the resulting cluster assignments were mapped onto individuals' clinical, cognitive, and functioning performances, as described earlier.
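Mapping cluster assignments onto a clinical or cognitive measure amounts to summarizing that measure by cluster and comparing the two groups. The sketch below is one way this could be done: the text does not state which statistical test was used, so Welch's t statistic is an assumption here, and the function names and scores are hypothetical. Obtaining a p-value would additionally require the t distribution from a statistics library.

```python
import math
import statistics

def summarize_by_cluster(scores, labels):
    """Return {cluster label: (n, mean, sample SD)} for one measure."""
    groups = {}
    for score, label in zip(scores, labels):
        groups.setdefault(label, []).append(score)
    return {label: (len(vals), statistics.mean(vals), statistics.stdev(vals))
            for label, vals in groups.items()}

def welch_t(group_a, group_b):
    """Welch's t statistic and its approximate degrees of freedom."""
    va = statistics.variance(group_a) / len(group_a)
    vb = statistics.variance(group_b) / len(group_b)
    t = (statistics.mean(group_a) - statistics.mean(group_b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(group_a) - 1)
                           + vb ** 2 / (len(group_b) - 1))
    return t, df

# Illustrative functioning scores (higher = better) and cluster labels.
scores = [80.0, 85.0, 90.0, 60.0, 65.0, 70.0]
labels = [0, 0, 0, 1, 1, 1]
summary = summarize_by_cluster(scores, labels)
t_stat, dof = welch_t(scores[:3], scores[3:])
print(summary, t_stat, dof)
```

With clusters as small as those in the longitudinal analysis, the group means and their differences are likely to be more informative than the test statistic alone.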

Clustering Analyses Using MMN Amplitude
Clustering results using MMN amplitude over 24 frontocentral channels are shown in Table 2A. In all three sets of clustering analyses (HC & baseline FEP; HC & follow-up FEP; baseline & follow-up FEP), the MMN amplitudes across all 24 electrode sites were larger (i.e., more negative) in Cluster 1 than in Cluster 2. In each channel, differences between Clusters 1 and 2 were significant at p < 0.05.
We labeled the individuals in Cluster 1 "Better functioning" and those in Cluster 2 "Poorer functioning". The demographic, clinical, cognitive, and functioning profiles of these clusters are presented in Table 2B; patient-only clustering results are in Supplementary Table S1. Clustering among controls and baseline patients (Table 2B, left; Supplementary Figure S3) showed that individuals in the "Better functioning" group, 31% of whom were patients, performed better, although not significantly, on all of the neurocognition, social cognition, and functioning measures except the UPSA task. Clustering among controls and follow-up patients (Table 2B, middle; Supplementary Figure S4) showed that individuals in the "Better functioning" group, 30% of whom were patients, performed significantly better on all but two of the neurocognition, social cognition, and functioning measures. Clustering among baseline and follow-up patients (Table 2B, right) showed that patients in the "Better functioning" group performed better on all the neurocognition, social cognition, and functioning measures and had lower symptom severity scores (less symptomatic); group differences were significant on the MCAS independent subscore, MCAS social subscore, and MATRICS social subscore. Of the patients in the "Better functioning" group, 30% had an SZ diagnosis.

Clustering Analysis Using Changes in MMN Amplitude Over 6 Months
Results of patients' MMN change over 6 months across the 24 frontocentral channels are shown in Table 3A. Consistently across all 24 channels, Cluster 1 patients had, on average, larger MMN amplitudes at follow-up than at baseline, resulting in more negative MMN changes. In contrast, patients in Cluster 2 had smaller MMN amplitudes at follow-up than at baseline, resulting in more positive MMN changes. In each channel except the FC5 site, differences between Clusters 1 and 2 were significant at p < 0.05 (Table 3A). These results indicate that patients in Cluster 1 as a group had improved MMN over time, whereas patients in Cluster 2 had worsening MMN responses. The demographic, clinical, cognitive, and functioning profiles of the "better" and "poorer" clusters are presented in Table 3B. Patients in the "Better functioning" cluster performed better on all the neurocognition, social cognition, and functioning measures and had lower symptom severity scores than those in the "Poorer functioning" group. However, differences in the individual cognitive and functioning measures were not statistically significant except for the MCAS social subscore.

The Magnitude of Delta, Theta, Alpha Frequency Activity in Controls and Patients
The overall magnitudes of the AverageDifference indices across the three frequency bands are presented in Figure 5. In both controls and patients, the magnitudes of AverageDifference at delta frequency were significantly higher than those at theta or alpha frequencies across most of the 24 channels (Figure 5). In controls, the differences between delta and theta and between delta and alpha had p = 1.1E-28 and p = 1.8E-22, respectively; in patients at baseline, the corresponding values were p = 1.7E-21 and p = 1.1E-05. Thus, clustering analyses were only performed using the AverageDifference index at delta frequency. We labeled Cluster 1 "higher AverageDifference" and Cluster 2 "lower AverageDifference". The demographic, clinical, cognitive, and functioning profiles of these clusters are presented in Table 4B; patient-only clustering results are in Supplementary Table S2. Clustering among controls and baseline patients (Table 4B, left) showed that 41% of individuals in the "higher AverageDifference" group were patients. Individuals in the "higher AverageDifference" group performed better on the MCAS functioning measures (total score, independent subscore, and social subscore) than those in the "lower AverageDifference" group, but not on neurocognition, social cognition, or UPSA functioning tasks. Among controls and follow-up patients (Table 4B, middle), 29% of individuals in the "higher AverageDifference" cluster were patients. Individuals in the "higher AverageDifference" cluster performed better on the MCAS independent subscore and MATRICS social subscore, but worse on neurocognition, TASIT, and the other functioning measures. Clustering among baseline and follow-up patients (Table 4B, right) showed that patients in the "higher AverageDifference" cluster performed better on the MCAS measures (total score, independent subscore, and social subscore) and had lower symptom severity than those in the "lower AverageDifference" group, but did not perform better on the other measures (UPSA, neurocognition, MATRICS social, and TASIT). None was statistically significant at p < 0.05.

FIGURE 3 | Averaged S1 and S2 responses across 40 channels of all subjects. X axis: time (−100 ms to 280 ms). Y axis: squared wavelet values. In all participants, the frontal and central channels showed greater averaged S1 and S2 responses than the parietal channels.

DISCUSSION
In this study, we used an unsupervised ML k-means algorithm to empirically stratify individuals into more homogeneous subgroups on the basis of multi-channel MMN data in a sample of controls and patients with FEP. We then characterized the functional, cognitive, and clinical profiles of these neurobiologically derived subgroups. First, we found a consistent pattern of two distinctive subgroups across 24 frontocentral channels. Second, the two subgroups derived from MMN amplitude could be mapped onto either better or poorer clinical, cognitive, and functioning profiles. Third, we examined the longitudinal MMN change and identified two subgroups of patients, one that showed improved MMN over time and one that showed worsening MMN over time. Patients with improved MMN also had better follow-up clinical, cognitive, and functioning profiles than those with worsening MMN. Fourth, among the delta, theta, and alpha frequency bands, delta frequency appeared to be the most relevant to the observed S1 and S2 EEG responses in both controls and patients. Finally, although the delta frequency AverageDifference index could also empirically produce two distinctive subgroups, individuals in the higher AverageDifference cluster did not necessarily have better clinical, cognitive, and functioning profiles than those in the lower AverageDifference group. To our knowledge, our study is the first to link neurophysiologically derived subgroups with distinct cognitive and functional outcome characteristics. In addition, we demonstrated that variability in MMN change over time is associated with symptom and functional outcomes.

Heterogeneity in Patients and Controls
Heterogeneity is a major barrier to understanding disease mechanisms and identifying individuals with different recovery trajectories. As Figure 4 shows, the EEG data of our sample indicate large variability not only among patients but also among controls. In addition, when both controls and patients were included in the MMN clustering analyses, 30% of patients were classified in the "better functioning" group. The combination of clustering techniques and multi-channel MMN activity employed in the current study facilitates the identification of neurobiologically homogeneous subgroups. In two prior studies, we used k-means analyses and identified distinct "Bio-classes" (38) among patients and controls and unique functional trajectories (4) among FEP patients that do not respect clinical diagnostic boundaries. Within each class, individuals shared a similar neurobiological profile or functional outcome trajectory that uniquely distinguished the groups. These studies present a diagnosis-free approach to integrating information across biomarkers, yielding neurobiologically distinct subgroups and providing strong evidence for the superiority of neurobiological over clinical classification in differentiating psychotic disorders. The presence of distinct neurobiological profiles among controls and patients brings into question the appropriateness of relying on diagnosis-based patient-control comparisons in MMN research, particularly during the early stage of illness. The relatively small effect size of the dMMN deficit reported in the meta-analysis (20) is consistent with the notion of large variability among FEP patients. The two subgroups identified in each of the three clustering models using 24-channel MMN are highly consistent and concordant in terms of the overall sample characteristics. MMN across the 24 channels was consistently and significantly larger in one subgroup than in the other (Table 2A).
We labeled the larger MMN subgroup "Better functioning" because individuals in this group consistently showed an overall pattern of better cognitive, social cognition, and functioning performance than the "Poorer functioning" individuals, regardless of whether controls were included in the models. That is, although significant differences were not observed in some clinical, cognitive, and functioning variables, the overall "gestalt" pattern was consistent. These results suggest that the ML data-driven approach is a useful strategy to reduce heterogeneity and provide insight into clinically meaningful subgroups of a cohort. In addition, clustering among baseline and follow-up patients (Table 2B, right) showed that patients in the "Better functioning" group performed better on all the neurocognition, social cognition, and functioning measures and had lower symptom severity scores (less symptomatic). Among patients in the "Better functioning" group, 30% had an SZ diagnosis. This result is in line with the literature showing that patients with SZ generally exhibit more impairment and greater symptom severity than patients with affective psychosis. One prior study (35) found that the most impaired baseline MMN amplitudes corresponded to the most severe functional disability at follow-up, consistent with the findings obtained in our study.

Longitudinal MMN Change Is Associated With Better Clinical and Outcome Characteristics
Results of our analyses showed that MMN changes across 24 channels were significantly improved in one subgroup over the other and that patients with improved MMN over the 6-month follow-up period also had better follow-up clinical, cognitive, and functioning profiles than those with worsening MMN (Tables 3A, B).
It is striking that we again observed a consistent overall "gestalt" pattern: patients in the "Better functioning" cluster performed better on all the neurocognition, social cognition, and functioning measures and had lower symptom severity scores than those in the "Poorer functioning" group. Differences in most cognitive and functioning variables were non-significant, likely because of the small sample size in each cluster (Ns = 10 and 8). Nonetheless, the overall pattern is consistent, and among the variables examined, the UPSA, MATRICS social, and PANSS total had the largest mean differences. These results support the notion that the ML data-driven approach is useful for exploring heterogeneity and facilitating the identification of neurobiologically homogeneous subgroups. Koshiyama et al. (56) reported that the MMN of FEP patients does not change significantly over time, while another study (36) observed that MMN deteriorated in patients over time. Taken together, these prior reports also indicate heterogeneity in MMN change over time among FEP patients. Our present findings suggest that combining ML and follow-up clinical characterization can potentially identify individuals at greater risk of poorer functional outcomes at a "critical period" of neuronal and psychosocial plasticity, for whom there is a "window of opportunity" for treatment to achieve disproportionately favorable outcomes. These individuals can be targeted for earlier, more aggressive interventions, both pharmacological and psychosocial/cognitive, to reduce functional deterioration and improve recovery.

Significant Contribution of Delta Frequency During MMN
In this study, we explored the EEG frequency responses underlying MMN by decomposing MMN data into low frequency range activity and calculating frequency-specific "sum difference" (AverageDifference) indices to capture the overall differences between the averaged S2 and S1 segments. We found that the magnitudes of AverageDifference at delta frequency were significantly higher than those at theta or alpha frequencies across all 24 channels in controls and patients (Figure 5), suggesting that delta frequency activity plays an important role in MMN generation. Our results are consistent with Hong et al., 2012, who observed that delta range activity explained the most variance of the observed MMN abnormalities in SZ (41).

Comparison of MMN and AverageDifference of Delta Frequency
As shown in Table 4, two distinct subgroups could also be consistently obtained in all three clustering models using the AverageDifference index of delta frequency: one cluster with "higher AverageDifference" values and the other with "lower AverageDifference" values. While k-means generated a consistent pattern of two clusters using either multimodal MMN or the low frequency measure, individuals in the higher AverageDifference cluster did not necessarily have better clinical, cognitive, and functioning profiles than those in the lower AverageDifference group. There are two major differences between MMN and the AverageDifference of delta frequency. First, MMN waveforms were generated directly by subtracting the average of the S1 waveforms from the average of the S2 waveforms between the time window of 120 to 250 ms. Although the same window was used for the AverageDifference, a squared wavelet transformation was applied to each single-trial response of the original S1 and S2 signals before computing the sum of the difference between the averaged S2 and S1. Second, the AverageDifference index was derived using delta frequency only, which constitutes a subset of the total MMN signal. As a result, the associated clinical and functioning variance was not fully captured by the AverageDifference of delta frequency. Our results suggest that MMN yields a clinically meaningful interpretation, that MMN is superior to the AverageDifference index as a neurobiological measure for identifying clinically distinct subgroups, and that the squared wavelet transformation may not be an optimal method.

Our study has a number of limitations. First, although several interesting and consistent findings emerged, the study sample is relatively small, particularly for the patients' longitudinal data. K-means is an exploratory research tool for discovering new patterns. Although k-means clustering can uncover patterns with a relatively small number of subjects (57,58), supervised learning methods and a larger sample will be needed in future work to establish that the results generalize to a larger population (59). Second, there are no follow-up data for controls, which limits our ability to determine the stability of cognitive, functioning, and EEG measures or the typical changes that occur in all individuals. Third, because our focus was on deriving more homogeneous subgroups using an agnostic approach on the basis of MMN or low frequency activity, we did not investigate the degree of consistency within patients by comparing PANSS-derived subgroups with MMN-derived subgroups.

CONCLUSIONS
In conclusion, the ML data-driven approach is a useful tool in FEP research for addressing heterogeneity and facilitating the identification of clinically meaningful subgroups and of patterns linking MMN to clinical, cognitive, and functioning characteristics.

DATA AVAILABILITY STATEMENT
The datasets generated for this study are available on request to the corresponding author.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by Partners Human Research Committee. The patients/participants provided their written informed consent to participate in this study.

AUTHOR CONTRIBUTIONS
XQ, TH, and M-HH contributed to the conception and design of the study. AH organized the database. XQ, SL, and JT performed the statistical analysis. XQ wrote the first draft of the manuscript. M-HH, SL, and JT wrote sections of the manuscript. All authors contributed to the article and approved the submitted version.