- Faculty of Computing, Harbin Institute of Technology, Harbin, China
Introduction: Resting-state functional magnetic resonance imaging (rs-fMRI) is a widely used non-invasive technique for investigating brain function and identifying potential disease biomarkers. Compared with task-based fMRI, rs-fMRI is easier to acquire because it does not require explicit task paradigms. However, functional connectivity measures derived from rs-fMRI often exhibit poor reliability, which substantially limits their clinical applicability.
Methods: To address this limitation, we propose a novel method termed time-enhanced functional connectivity, which improves reliability by identifying and removing poor-quality time points from rs-fMRI time series. This approach aims to enhance the quality of functional connectivity estimation without extending scan duration or relying on dataset-specific constraints.
Results: Experimental results demonstrate that the proposed method significantly improves performance in downstream machine learning tasks, such as sex classification. In addition, time-enhanced functional connectivity yields higher test–retest reliability and reveals more pronounced statistical differences between groups compared with conventional functional connectivity measures.
Discussion: These findings suggest that selectively removing low-quality time points provides a practical and effective strategy for improving the reliability and sensitivity of functional connectivity measurements in rs-fMRI, thereby enhancing their potential utility in both neuroscience research and clinical applications.
1 Introduction
Functional magnetic resonance imaging (fMRI) has emerged as a widely used non-invasive technology for exploring neurophysiology and identifying biomarkers (Piani et al., 2022). In recent years, there has been an exponential growth in research focusing on resting-state fMRI (rs-fMRI; Buckner et al., 2013). Functional connectivity, which refers to the statistical relationships between the time series of blood-oxygen level dependent (BOLD) signals (Friston, 2011), is a popular method for investigating features of the human brain (Vértes, 2012), making inferences about individual subjects (Gratton et al., 2018), and predicting cognitive behavior (Finn et al., 2015).
Pearson correlation is commonly used to estimate the functional connectivity matrix, and it has demonstrated relatively high accuracy in identifying individual “fingerprints.” However, it is more susceptible to temporal fluctuations in the BOLD signal than frequency-based connectivity estimation methods (Mahadevan, 2021). Additionally, functional connectivity measurements suffer from poor reliability. Studies have shown that the reliability of functional connectivity ranges from poor to moderate (Braun, 2012; Guo et al., 2012; Li et al., 2012), which falls short of clinical standards.
One of the most widely discussed factors affecting the reliability of functional connectivity is excessive head motion, which leads to scan artifacts (Mahadevan, 2021; Van Dijk et al., 2012; Vanderwal et al., 2015; Noble et al., 2019). The reliability of rs-fMRI can be improved by excluding subjects with extreme motion or by regressing out head motion. Furthermore, other factors that may degrade the reliability of functional connectivity include system-related noises (Foerster et al., 2005; Power, 2017), subtle movements during scanning (Power, 2017; Hajnal et al., 1994; Power et al., 2015), and physiological signals such as cardiac and respiratory fluctuations (Evans et al., 2015; Yan, 2009; Chang et al., 2009; Windischberger, 2002; Birn et al., 2006).
Research has shown that functional connectivity with higher test–retest reliability performs better than connectivity with lower reliability in machine learning prediction tasks (Guo et al., 2012; Elliott et al., 2019; Wang, 2017). As a result, many researchers have sought to improve the reliability of functional connectivity. For instance, Elliott et al. (2019) computed functional connectivity by combining both rs-fMRI and task-based fMRI (t-fMRI) data, Wang (2017) removed volumes associated with strong sleepiness, and Ciric et al. (2018), Gorgolewski (2013), and Zuo et al. (2013) attempted to reduce the impact of motion artifacts.
All these studies aim to enhance the reliability of functional connectivity either by maintaining the length of the time series or by incorporating additional time series data. Even in studies that remove time points related to drowsiness (Wang, 2017), a fixed proportion of time points is discarded, followed by a comparison of reliability metrics between relatively drowsy and relatively alert states. However, the methods mentioned above are challenging to directly apply to other datasets. First, few rs-fMRI datasets include additional t-fMRI data for integration, as used in Elliott et al. (2019). Second, most fMRI datasets lack the necessary physiological signals for regression, and it is also difficult to ensure that the proportion of drowsiness during scanning is consistent across subjects. To address these limitations, we propose an approach to improve the calculation of functional connectivity by removing time points based on a fixed criterion. In this method, we compute the functional connectivity matrix for each subject using a personalized time series length, determined by how many time points are removed according to a consistent threshold. We refer to this new functional connectivity matrix as time-enhanced functional connectivity (TeFC).
We tested our hypothesis on a dataset that includes time-point labels, published by Li (2023), which provides detailed annotations of the periods during fMRI scans when self-generated thoughts occurred. Self-generated thought appears to be an unconstrained mental process that lacks much direction from attention or cognitive control (Mildner and Tamir, 2019). This phenomenon commonly occurs during resting-state fMRI scans and can reduce the reliability of the data (Zuo and Xing, 2014). This process may involve activities such as visual mental imagery, inner language, auditory mental imagery, and somatosensory awareness (Delamillieure et al., 2010; Diaz, 2014), each engaging different brain regions. Consequently, self-generated thought could influence the representation of various brain areas across different networks. Therefore, in our research, we categorized the time points associated with self-generated thought as poor-quality time points in rs-fMRI and removed them from the analysis.
In our study, we used three different types of time series to calculate the functional connectivity matrix for each subject. The first type involved time series after removing the noisy time points, and the resulting matrix was labeled as TeFC. The second type included the entire time series, without excluding any points except for basic preprocessing. The third type consisted of the time points that were dropped, and the matrix calculated from this subset was termed thought functional connectivity (tFC). We then assessed the test–retest reliability of these connectivity measures and used each to train machine learning models. Additionally, we conducted statistical analyses based on the different functional connectivity matrices and compared the results. Ultimately, the TeFC outperformed the original functional connectivity in our experiments, validating our hypothesis that removing poor-quality time points based on a fixed criterion enhances the measurement of functional connectivity.
2 Materials and experiments
2.1 Datasets
We conducted our experiments on the fMRI data from the Think-Aloud dataset published by Li (2023), which contains 86 healthy adult participants (41 males and 45 females; mean age = 22.1 ± 2.7 years) from the same center. All participants were free from MRI contraindications, psychiatric or neurological disorders, the use of psychotropic medications, and any history of substance or alcohol abuse. As described in Li (2023), each participant was instructed to speak aloud in the scanner whenever self-generated thoughts occurred, with the start and end times of these events being recorded.
2.2 Preprocessing
The fMRI data was preprocessed using the DPARSF (Data Processing Assistant for Resting-State fMRI) module within the DPABI (Data Processing & Analysis for Brain Imaging) toolbox (Yan et al., 2016). The preprocessing steps and parameters were as follows:
Slice-timing correction (Sladky et al., 2011) to account for differences in acquisition times across slices.
Realignment using a six-parameter linear transformation with a two-pass procedure.
Co-registration with T1-weighted MPRAGE images.
Segmentation using Diffeomorphic Anatomical Registration Through Exponentiated Lie Algebra (DARTEL; Ashburner, 2007).
Normalization of the images to the Montreal Neurological Institute (MNI) space using DARTEL, with the voxel size resampled to 3 × 3 × 3 mm.
Smoothing with a 4 mm full-width at half-maximum (FWHM) Gaussian kernel.
We did not apply global signal regression (GSR) because its use remains controversial (Murphy and Fox, 2017; Liu et al., 2017).
2.3 Functional connectivity computation
We calculated the functional connectivity using the mean time series extracted from two different templates: the Automated Anatomical Labeling (AAL) template (Tzourio-Mazoyer et al., 2002) and the Schaefer-400 template (Schaefer et al., 2018). The AAL template includes 116 regions of interest (ROIs), while the Schaefer-400 template consists of 400 ROIs. In the Schaefer-400 template, each ROI is assigned to a corresponding network within the seven-network parcellation as defined by Yeo (2011). These networks include the default mode network (DMN), visual, somatomotor, dorsal attention, salience/ventral attention, limbic, and control networks. The time series data for each subject are represented as $X \in \mathbb{R}^{N \times T}$, where $N$ is the number of ROIs and $T$ is the length of the time series. In this study, $N$ is either 116 or 400, depending on the template used, and $T = 305$.
For each subject, we split the time series into two segments based on the labels of the time points. The first segment is $X_{\mathrm{rest}} \in \mathbb{R}^{N \times T_{\mathrm{rest}}}$, which excludes self-generated thought periods, while the second segment is $X_{\mathrm{think}} \in \mathbb{R}^{N \times T_{\mathrm{think}}}$, consisting only of self-generated thought time points, where $T_{\mathrm{rest}} + T_{\mathrm{think}} = T$. In this research, the mean $T_{\mathrm{rest}}$ for all subjects is 178.8, while the mean $T_{\mathrm{think}}$ is 126.2. The detailed proportion for each subject can be found in Supplementary Table S1. We then calculate the Pearson correlation coefficient (PCC) for the two segments $X_{\mathrm{rest}}$ and $X_{\mathrm{think}}$ independently, as well as for the entire original time series $X$ for comparison. The Pearson correlation between the time series of the $i$-th region and the $j$-th region is computed as follows:
$$r_{ij} = \frac{\sum_{t}\left(x_i(t) - \bar{x}_i\right)\left(x_j(t) - \bar{x}_j\right)}{\sqrt{\sum_{t}\left(x_i(t) - \bar{x}_i\right)^2}\,\sqrt{\sum_{t}\left(x_j(t) - \bar{x}_j\right)^2}}$$
where $x_i$ represents the time series of the $i$-th region and $\bar{x}_i$ is its temporal mean. We then estimate the fully connected functional connectivity matrices based on $X$, $X_{\mathrm{rest}}$, and $X_{\mathrm{think}}$, respectively. The corresponding results are the original functional connectivity, TeFC, and tFC, respectively.
In addition, we also estimate functional connectivity by combining different proportions of $X_{\mathrm{rest}}$ and $X_{\mathrm{think}}$, and divide $X_{\mathrm{rest}}$ and $X_{\mathrm{think}}$ into four equal segments for subsequent calculations, as elaborated in Sections 2.4 and 2.5.
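The split-and-correlate step described in this section can be illustrated with a minimal NumPy sketch. It is not the authors' code: the function names, the boolean `thought_mask` (true at time points labeled as self-generated thought), and the synthetic data are assumptions made for illustration.

```python
import numpy as np

def pearson_fc(ts):
    """Pearson correlation matrix for a (n_rois, n_timepoints) array."""
    return np.corrcoef(ts)

def split_fc(ts, thought_mask):
    """Return (original FC, TeFC, tFC) for one subject.

    thought_mask is a hypothetical boolean vector of length n_timepoints
    that is True where a self-generated thought was reported.
    """
    thought_mask = np.asarray(thought_mask, dtype=bool)
    rest = ts[:, ~thought_mask]   # time points kept for TeFC
    think = ts[:, thought_mask]   # time points dropped, used for tFC
    return pearson_fc(ts), pearson_fc(rest), pearson_fc(think)

# Synthetic example: 5 ROIs, 305 time points, 126 labeled as thought.
rng = np.random.default_rng(0)
ts = rng.standard_normal((5, 305))
mask = np.zeros(305, dtype=bool)
mask[100:226] = True
fc, tefc, tfc = split_fc(ts, mask)
```

Because the number of retained time points differs per subject, each TeFC matrix in this sketch is estimated from a subject-specific series length, which matches the personalized-length idea in the text.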
2.4 Gender classification
This dataset’s demographic information, as described in Li (2023), includes data on sex, age, and several psychological scales. Since the dataset consists of healthy young adults, significant differences in these variables are difficult to capture. Therefore, we selected gender classification as the machine learning task. A support vector machine (SVM) is trained for classification. SVM is considered a robust approach for classification and could also be tested as a baseline for performance improvement comparison.
To perform gender classification, we trained SVM models using three different functional connectivity matrices: original functional connectivity, TeFC, and tFC, respectively. For each model, the upper triangle of the functional connectivity matrix was flattened into a feature vector, which served as the input to the SVM. The feature count was 6,670 for the AAL template and 79,800 for the Schaefer-400 template. We evaluated model performance using 10-fold cross-validation to reduce the dependence of the estimates on any single train/test split.
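The feature counts quoted above follow from the number of unique off-diagonal entries of a symmetric matrix, N(N − 1)/2. The short sketch below, with hypothetical helper names and random data, shows one way to perform the upper-triangle flattening and check the counts; the SVM training itself is omitted.

```python
import numpy as np

def n_edges(n_rois):
    """Number of unique ROI pairs in a symmetric connectivity matrix."""
    return n_rois * (n_rois - 1) // 2

def fc_to_features(fc):
    """Flatten the strict upper triangle (diagonal excluded) into a vector."""
    rows, cols = np.triu_indices(fc.shape[0], k=1)
    return fc[rows, cols]

# Random 116-ROI example, used only to illustrate the shapes involved.
fc = np.corrcoef(np.random.default_rng(0).standard_normal((116, 305)))
features = fc_to_features(fc)
print(features.shape, n_edges(116), n_edges(400))
```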
Additionally, we trained SVM models based on functional connectivity matrices computed from different proportions of thought-free time points ($X_{\mathrm{rest}}$) and self-generated-thought time points ($X_{\mathrm{think}}$). We constructed these matrices by concatenating varying proportions of the two segments. For instance, a proportion of 0.4 meant selecting 40% of the time points of $X_{\mathrm{rest}}$ and 60% of the time points of $X_{\mathrm{think}}$, concatenating them to form a new time series, and calculating the functional connectivity from that series to train the SVM model. We conducted experiments with proportions ranging from 0 to 1 (0 means the model is trained on $X_{\mathrm{think}}$ only, and 1 means it is trained on $X_{\mathrm{rest}}$ only), in increments of 0.05 (i.e., 0, 0.05, 0.1, ..., 0.95, 1), for both the AAL and Schaefer-400 templates, which yielded 21 results per template.
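The proportion sweep can be sketched as follows. Two points are assumptions rather than details from the text: the proportion `p` is taken to be the share drawn from the thought-free segment, and time points are sampled at random without replacement (the text does not state how each subset is chosen). The function name `mix_time_series` is hypothetical.

```python
import numpy as np

def mix_time_series(rest, think, p, rng):
    """Concatenate a fraction p of `rest` with a fraction (1 - p) of `think`.

    rest, think: (n_rois, n_timepoints) arrays for one subject.
    Sampling at random without replacement is an assumption of this sketch.
    """
    n_rest = int(round(p * rest.shape[1]))
    n_think = int(round((1 - p) * think.shape[1]))
    rest_idx = np.sort(rng.choice(rest.shape[1], n_rest, replace=False))
    think_idx = np.sort(rng.choice(think.shape[1], n_think, replace=False))
    return np.concatenate([rest[:, rest_idx], think[:, think_idx]], axis=1)

proportions = np.linspace(0.0, 1.0, 21)  # 0, 0.05, ..., 1

rng = np.random.default_rng(0)
rest = rng.standard_normal((5, 179))
think = rng.standard_normal((5, 126))
mixed = mix_time_series(rest, think, 0.4, rng)
fc_mixed = np.corrcoef(mixed)  # connectivity for this proportion
```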
2.5 Test–retest reliability
To evaluate the test–retest reliability of both TeFC and tFC, we divided the thought-free time series $X_{\mathrm{rest}}$ and the self-generated-thought time series $X_{\mathrm{think}}$ into four equal segments each, treating each segment as a separate session. Each of these segments was independently used to estimate functional connectivity, yielding 16 functional connectivity matrices in total: 4 for TeFC using the AAL template, 4 for tFC using the AAL template, 4 for TeFC using the Schaefer-400 template, and 4 for tFC using the Schaefer-400 template. To quantify the reliability, we computed the intraclass correlation coefficient (ICC) across these sessions for each connection. Specifically, we applied ICC(3,1) as described in Shrout and Fleiss (1979), since we assume that functional connectivity in rs-fMRI should be a stable feature over time. The computation is as follows:
$$\mathrm{ICC}(3,1) = \frac{MS_b - MS_w}{MS_b + (k - 1)\,MS_w}$$
where $MS_b$ is the between-subject mean square, $MS_w$ is the within-subject mean square, and $k$ is the number of sessions, which in this case is 4. We computed these ICC values using the Pingouin package in Python.
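For readers who want to see the arithmetic behind ICC(3,1), the following is a minimal NumPy sketch of the two-way ANOVA decomposition from Shrout and Fleiss (1979). It is an illustration, not the Pingouin implementation used in the study; the function name is hypothetical. In the Shrout and Fleiss formulation, the second mean square is the residual mean square left after removing subject and session effects.

```python
import numpy as np

def icc_3_1(y):
    """ICC(3,1) for a (n_subjects, k_sessions) array of one connection."""
    n, k = y.shape
    grand = y.mean()
    ss_subjects = k * ((y.mean(axis=1) - grand) ** 2).sum()
    ss_sessions = n * ((y.mean(axis=0) - grand) ** 2).sum()
    ss_total = ((y - grand) ** 2).sum()
    ss_residual = ss_total - ss_subjects - ss_sessions
    ms_between = ss_subjects / (n - 1)
    ms_residual = ss_residual / ((n - 1) * (k - 1))
    return (ms_between - ms_residual) / (ms_between + (k - 1) * ms_residual)

# Worked example: the 6 x 4 ratings table from Shrout and Fleiss (1979).
ratings = np.array([
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
], dtype=float)
icc_value = icc_3_1(ratings)
```

In the connectivity setting, each row would hold one subject's value for a given connection and each column one of the four segments, so the function would be applied once per connection.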
In addition, we calculated the reliability of several graph-theoretical metrics, such as degree centrality and clustering coefficient. Specifically, we constructed undirected graphs from the functional connectivity matrices using thresholds from 0.2 to 0.8 and calculated the degree centrality and clustering coefficient of each region. This analysis was designed to assess how the reliability of functional connectivity varies across nodes and across connection strengths.
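A minimal sketch of the thresholded-graph metrics is given below. It assumes that an edge exists when the raw correlation exceeds the threshold (the text does not say whether raw or absolute values were thresholded) and uses the standard definitions of normalized degree centrality and the local clustering coefficient; all function names are hypothetical.

```python
import numpy as np

def binarize(fc, threshold):
    """Undirected, unweighted adjacency matrix: edge where fc > threshold."""
    adj = (fc > threshold).astype(int)
    np.fill_diagonal(adj, 0)  # no self-loops
    return adj

def degree_centrality(adj):
    """Degree of each node divided by the maximum possible degree."""
    return adj.sum(axis=1) / (adj.shape[0] - 1)

def clustering_coefficient(adj):
    """Fraction of a node's neighbor pairs that are themselves connected."""
    degree = adj.sum(axis=1)
    triangles = np.diag(adj @ adj @ adj) / 2
    possible = degree * (degree - 1) / 2
    out = np.zeros(len(degree))
    np.divide(triangles, possible, out=out, where=possible > 0)
    return out

# Toy matrix: nodes 0, 1, 2 form a triangle and node 3 attaches to node 0.
fc = np.full((4, 4), 0.1)
for i, j in [(0, 1), (0, 2), (1, 2), (0, 3)]:
    fc[i, j] = fc[j, i] = 0.9
adj = binarize(fc, 0.5)
dc = degree_centrality(adj)
cc = clustering_coefficient(adj)
```

Repeating this for each threshold between 0.2 and 0.8 and each session would give the per-node values whose ICC is then compared between states.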
2.6 Statistical analysis
To systematically compare different functional connectivity measures at the level of statistical analysis, we applied multivariate distance matrix regression (MDMR; Shehzad, 2014) to identify the primary brain network differences between males and females. MDMR enables parameter-free quantification of whole-brain network reorganization, facilitating unbiased detection of connectome differences. For our analysis, we filtered the functional connectivity matrices through MDMR to pinpoint key regions exhibiting statistically significant differences (p < 0.05) between sexes. We performed the MDMR analysis separately using the original functional connectivity, TeFC, and tFC with the Schaefer-400 template, to assess and compare the sensitivity of each functional connectivity measure in capturing sex-based differences at the network level.
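The core MDMR statistic can be sketched in a few lines. The code below is a generic pseudo-F computation on a Gower-centered distance matrix with a permutation p-value, written under stated assumptions: Euclidean distances on synthetic data, a design matrix containing an intercept and a binary group column, and hypothetical function names. It is not the pipeline used in the study, which would run such a test once per ROI on distances between subjects' connectivity patterns.

```python
import numpy as np

def mdmr_pseudo_f(dist, design):
    """Pseudo-F statistic for an (n, n) distance matrix and (n, m) design."""
    n, m = design.shape
    centering = np.eye(n) - np.ones((n, n)) / n
    gower = -0.5 * centering @ (dist ** 2) @ centering
    hat = design @ np.linalg.pinv(design.T @ design) @ design.T
    resid = np.eye(n) - hat
    explained = np.trace(hat @ gower @ hat) / (m - 1)
    unexplained = np.trace(resid @ gower @ resid) / (n - m)
    return explained / unexplained

def mdmr_p_value(dist, design, n_perm, rng):
    """Permutation p-value obtained by shuffling the rows of the design."""
    observed = mdmr_pseudo_f(dist, design)
    count = 0
    for _ in range(n_perm):
        perm = rng.permutation(design.shape[0])
        if mdmr_pseudo_f(dist, design[perm]) >= observed:
            count += 1
    return observed, (count + 1) / (n_perm + 1)

# Synthetic example: two groups of 10 "subjects" with shifted patterns.
rng = np.random.default_rng(0)
group = np.repeat([0, 1], 10)
patterns = rng.standard_normal((20, 50)) * 0.1 + 3.0 * group[:, None]
dist = np.linalg.norm(patterns[:, None, :] - patterns[None, :, :], axis=2)
design = np.column_stack([np.ones(20), group])
f_obs, p_val = mdmr_p_value(dist, design, n_perm=199, rng=rng)
```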
We also conducted pairwise t-tests to compare the mean framewise displacement (FD; Jenkinson) between the two states in order to evaluate the potential influence of head motion during the verbal report. In addition, we compared the DVARS values across the two states to assess differences in BOLD signal fluctuations between conditions.
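As an illustration of the DVARS comparison, the sketch below computes DVARS on ROI time series as the root mean square of the frame-to-frame signal change and averages it within each state. Assigning each difference to the later of its two frames, and the function names, are assumptions of this sketch; the text does not specify how frames at state boundaries were handled.

```python
import numpy as np

def dvars(ts):
    """Root-mean-square change across ROIs between consecutive frames.

    ts: (n_rois, n_timepoints) array; returns n_timepoints - 1 values.
    """
    diff = np.diff(ts, axis=1)
    return np.sqrt((diff ** 2).mean(axis=0))

def mean_dvars_by_state(ts, thought_mask):
    """Mean DVARS in the rest and think states for one subject."""
    thought_mask = np.asarray(thought_mask, dtype=bool)
    values = dvars(ts)
    later_frame = thought_mask[1:]  # label each difference by its later frame
    return values[~later_frame].mean(), values[later_frame].mean()

# Toy example: flat signal at rest, alternating signal while "thinking".
ts = np.zeros((2, 6))
ts[:, 3:] = [1.0, -1.0, 1.0]
mask = np.array([False, False, False, True, True, True])
rest_dvars, think_dvars = mean_dvars_by_state(ts, mask)
```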
3 Results
3.1 Prediction performance
Tables 1 and 2 show the sex classification results for the AAL and Schaefer-400 templates, respectively. In Table 1, the SVM model trained on TeFC achieves the best accuracy (0.743), recall (0.698), precision (0.826), and AUC score (0.760), while the model trained on tFC has the lowest accuracy (0.552), recall (0.552), and AUC score (0.549). The models trained on the original functional connectivity and on TeFC together with tFC perform between these two extremes. In Table 2, the model trained on TeFC has the highest accuracy (0.741), whereas the model trained on both TeFC and tFC has the highest recall (0.762), precision (0.755), and AUC score (0.734). The model trained on tFC has the lowest accuracy (0.649), recall (0.651), precision (0.694), and AUC score (0.655), consistent with Table 1, and the model trained on the original functional connectivity again shows intermediate performance.
Table 1. The performance of SVM model trained by different functional connectivity measures based on AAL template (oFC, original functional connectivity).
Table 2. The performance of SVM model trained by different functional connectivity measures based on Schaefer-400 template (oFC, original functional connectivity).
Tables 1 and 2 suggest that the Schaefer-400 template generally outperforms the AAL template for sex classification. Its average performance exceeds that of the model trained on AAL-based functional connectivity even when the worst-performing measure, tFC, is used to train the model. In addition, the model trained on both TeFC and tFC with the Schaefer-400 template may capture more information than the model trained on TeFC alone.
Furthermore, Figure 1 illustrates the performance of gender classification trained using functional connectivity matrices computed with different ratios of thought-free and self-generated-thought time points. The figure demonstrates that SVM performance in the gender classification task improves as more low-quality time-series data are excluded. Strong associations are observed for both the AAL (Spearman’s R = 0.851, p < 0.001) and Schaefer-400 (Spearman’s R = 0.872, p < 0.001) templates, indicating large effect sizes. Taken together, Tables 1 and 2 and Figure 1 indicate that SVM performance remains consistent across templates, with minimal differences observed after most low-quality time points have been excluded.
Figure 1. SVM model performance trained on functional connectivity matrices computed with different proportions of the time series.
3.2 Reliability analysis
Figure 2 presents the test–retest reliability of different functional connectivity computations using Intraclass Correlation Coefficient (ICC) scores. The figure shows that most connectivity values achieve higher ICC scores when calculated with TeFC, regardless of the segmentation template used. Additionally, Tables 3, 4 reinforce this finding, indicating that ICC scores are generally higher for TeFC than for tFC. For the Schaefer-400 template, the highest ICC score for tFC only meets the benchmark for “good” reliability, whereas TeFC achieves this benchmark for 296 connections, with one connection even reaching the “excellent” reliability standard. The average ICC score is also notably higher in TeFC (0.374) compared to tFC (0.234). Similar results were observed in the AAL template. Numerous connections reach the “moderate” reliability benchmark, highlighting the validity of connections between ROIs in resting-state fMRI scans. These results demonstrate that TeFC, which excludes low-quality time points, provides a more reliable and stable measurement of functional connectivity.
Figure 2. The ICC scores for each value in the functional connectivity matrix. Subfigures (A,B) show the ICC scores for TeFC and tFC based on the AAL template, respectively, with subfigure (C) illustrating the difference between them. Similarly, subfigures (D,E) depict the ICC scores for TeFC and tFC based on the Schaefer-400 template, with subfigure (F) showing the difference between them (Vis, visual network; SomMot, somatomotor network; DorsAttn, dorsal attention network; SalVentAttn, salience/ventral attention network; Limbic, limbic network; Cont, control network; Default, default mode network).
We also examined the ICC distributions in the different Yeo subnetworks. Figure 3 shows that TeFC is more stable than tFC across all subnetworks. Additionally, the limbic network demonstrates the highest average ICC score for intra-network connections while exhibiting the lowest for inter-network connections, as shown in Figure 4. This phenomenon is present in both TeFC and tFC, and the reasons will be discussed in the Discussion section.
Figure 3. The mean ICC score of each subnetwork, where (A) shows the mean ICC score of intra-connections within each subnetwork, and (B) shows the mean ICC scores of inter-connections between each subnetwork and other subnetworks. The blue and yellow bars represent TeFC and tFC, respectively (Vis, visual network; SomMot, somatomotor network; DorsAttn, dorsal attention network; SalVentAttn, salience/ventral attention network; Limbic, limbic network; Cont, control network; Default, default mode network).
Figure 4. The mean ICC score of each subnetwork, where (A) shows the mean ICC score of TeFC and (B) shows the mean ICC scores of tFC. The blue bars and yellow bars represent the internal connections of each subnetwork and the external connections with other subnetworks, respectively (Vis, visual network; SomMot, somatomotor network; DorsAttn, dorsal attention network; SalVentAttn, salience/ventral attention network; Limbic, limbic network; Cont, control network; Default, default mode network).
Figure 5 presents a notable phenomenon: when brain networks are computed with lower thresholds, the ICC of graph-theoretical metrics is higher in the non-thinking state; however, when higher thresholds are applied, the ICC of these metrics becomes higher in the thinking state. A consistent trend was observed within each subnetwork, and the corresponding results are provided in Supplementary Figure S1. In addition, we attempted sex classification based on degree centrality and clustering coefficient computed at different thresholds. However, these metrics did not yield stable or accurate gender classification; therefore, we do not present the corresponding results in the manuscript.
Figure 5. The mean ICC score of degree centrality and clustering coefficient based on the atlas of AAL and Schaefer-400. The blue curve and the yellow curve represent the trends of the mean ICC values of degree centrality and clustering coefficient across all nodes in the resting state and the thinking state, respectively, as the edge connection threshold varies. The shaded areas represent the range of the computed standard deviation.
3.3 Data distribution
Figure 6 demonstrates that Multivariate Distance Matrix Regression (MDMR) analysis using TeFC identifies more significant ROIs than when using original functional connectivity. Specifically, MDMR with TeFC detects 7 significant ROIs (1 area of default mode network in right temporal lobe, 1 area of control network in right lateral prefrontal cortex, 1 area of limbic network in right temporal pole, 1 area of somatomotor network in right hemisphere, 1 area of visual network in right hemisphere, 1 area of salience/ventral attention network in left precentral gyrus, and 1 area of somatomotor network in left hemisphere), whereas the original functional connectivity identifies only 3 (1 area of default mode network in right temporal lobe, 1 area of control network in right lateral prefrontal cortex, and 1 area of limbic network in left temporal pole). Furthermore, MDMR analysis using tFC does not reveal any ROIs with significant differences.
Figure 6. The result of MDMR analysis based on TeFC (A), tFC (B), and original functional connectivity (C), respectively.
3.4 Head motion and DVARS
Figure 7 presents the differences in motion- and signal-related quality metrics between the “rest” and “think” states. The mean FD (Jenkinson) is significantly higher in the “think” state compared with the “rest” state (p < 0.001), with group-level averages of 0.124 and 0.091, respectively. In addition, DVARS values are also elevated during the “think” state for both the AAL and Schaefer-400 parcellations (p < 0.05). Using the AAL template, the mean DVARS values are 8.637 for the “rest” state and 9.353 for the “think” state. Using the Schaefer-400 template, the corresponding values increase from 11.401 (“rest”) to 12.141 (“think”). These results indicate that both head motion and BOLD signal fluctuations are greater during the verbal report period than during rest.
Figure 7. Distributional differences in mean FD (Jenkinson) and DVARS between the “rest” state and the “think” state.
4 Discussion
4.1 The explanation of results
The results in Table 1 and Table 2 demonstrate the superiority of our TeFC approach in the machine learning task of gender classification, with Figure 1 indicating improved SVM performance as more low-quality time points are excluded. To ensure that the observed results were not driven by insufficient time points, we excluded subjects whose time length in either state was less than 30 time points, as shown in Supplementary Table S1 (3 subjects excluded for the resting state and 12 for the thinking state). SVM models were then trained using the remaining subjects. The results exhibited the same overall trend, indicating that the findings are not attributable to the reduced number of time points. Detailed results are provided in the Supplementary Tables S2, S3.
Moreover, to establish a baseline for gender classification, we trained SVM models using an equivalent number of rs-fMRI scans from the Human Connectome Project (HCP) dataset (Van Essen, 2013). The HCP dataset is a high-quality resource that includes both resting-state and task-based fMRI scans and is widely used across various types of fMRI studies (Elliott et al., 2019; Bedel, 2023; Cho, 2021). In this analysis, we randomly selected 86 non-overlapping resting-state scans using three different random seeds. For each selection, functional connectivity was computed using both the full time series (1,200 time points) and a truncated series containing the first 305 time points, matching the data length used in our study. Each of the six resulting sets of functional connectivity matrices (AAL template) was used to train SVM models for the gender classification task, employing 10-fold cross-validation. The mean classification accuracies obtained from functional connectivity computed using the full time series and the truncated 305-point series were 0.694 and 0.675, respectively. These values are comparable to the accuracy achieved in our study when using the AAL template for sex classification. Furthermore, SVM models trained using all available rs-fMRI scans in the HCP dataset yielded mean accuracies of 0.930 (full time series) and 0.871 (truncated 305-point series). Detailed results are provided in Supplementary Table S4. These findings indicate that the limited sample size in our study substantially constrains the performance of the sex classification models.
Additionally, Table 3 and Table 4 confirm that the remaining time points exhibit higher reliability than those excluded. These findings contrast with the conclusions in Elliott et al. (2019), which propose concatenating time series from different conditions to enhance the reliability of functional connectivity. This discrepancy can be understood in light of Cho (2021), which found that concatenating fewer, more uniform states tends to yield higher reliability. This suggests that data concatenation within a single, stable scan condition—or among more homogeneous and reliable conditions—may better enhance functional connectivity reliability. In contrast, time points associated with self-generated thought are unlikely to meet the criteria for reliable or homogeneous conditions, as these thoughts are unconstrained and lack specific tasks or assignments (Mildner and Tamir, 2019).
In addition, Figure 5 presents a counterintuitive result. The ICC scores of graph-theoretical metrics computed during the resting state are not consistently higher than those computed during the thinking state. Notably, degree centrality and clustering coefficient exhibit higher ICC scores in the thinking state when the brain network graph is constructed using a threshold greater than 0.5. Our explanation is that, in the gender classification task, the model primarily relies on functional connections with relatively low strength. Therefore, the highly reliable strong connections in the thinking state contribute little to capturing effective gender differences. Moreover, directly using these graph-theoretical metrics for classification yields no meaningful results, indicating that the gender classification task does not depend on these metrics.
4.2 Cognitive load and self-generated thought
According to Li (2023), self-generated thoughts are closely linked to cognitive control, with undemanding environments prompting increased mind-wandering, particularly among individuals with strong cognitive control skills (Kane et al., 2007; Levinson et al., 2012; Smallwood and Schooler, 2015). Consequently, resting-state fMRI scanning, which lacks specific tasks, may lead to extensive mind-wandering and self-generated thoughts. The absence of a structured task guiding participants to focus on consistent content across scans introduces variability in BOLD signal phases, reducing the reliability of resting-state functional connectivity measurements relative to task-based functional connectivity (Greene et al., 2020). While resting-state FC can reveal individual differences that are predictive of task-based performance (Gruskin and Patel, 2022), Elliott et al. (2019) suggest concatenating data from task scans, which maintains phase similarity across individuals. However, it remains uncertain whether the observed benefits result from combining different states or from small, similar segments across scans. Additional studies also highlight the importance of distinguishing between spontaneous brain activities, noting that the BOLD signal time series in resting-state scans are more susceptible to interference from self-generated thoughts in the absence of cognitive engagement, potentially reducing stability and test–retest reliability.
In addition, the self-generated thought segment can be regarded as a task state, in which the only task is “speaking.” As a result, activation within language-related cortical regions is consistently observed (Li, 2023), which may further contribute to a reduction in inter-individual variability (Gratton et al., 2018). Compared with resting-state conditions—where spontaneous thought and unconstrained cognitive processes introduce substantial variability across participants—task states impose structured cognitive demands that synchronize neural activity and attenuate idiosyncratic fluctuations (Cole et al., 2014). This externally driven alignment leads to more homogeneous connectivity patterns, particularly within task-relevant networks, thereby diminishing the extent to which functional connectivity reflects stable, trait-like individual differences. In this sense, task paradigms may enhance the reliability of specific neural circuits but simultaneously constrain the expression of individual variability, whereas resting-state paradigms better capture intrinsic trait-level differences.
However, it is challenging to fully eliminate the impact of self-generated thoughts due to their complexity (Wang et al., 2018), as these thoughts can encompass a range of contents, including images, words, or emotions across multiple dimensions (Gorgolewski et al., 2014). Current technology cannot yet accurately distinguish time points associated with self-generated thoughts; although we attempted to do so, accuracy remained below 70%. As a result, the limitations in accurately identifying and filtering out self-generated thought effects prevent us from directly applying these findings to enhance reliability in other datasets.
4.3 The noise during the verbal report period
There are additional physical factors that may influence the test–retest reliability of functional connectivity. As shown in Figure 7, head motion during the verbal report period is significantly higher than during the rest period. Although Li (2023) excluded subjects whose mean FD (Jenkinson) exceeded 0.2 mm in this dataset, several subjects still exhibited mean FD values above this threshold during the verbal report stage. To ensure that these cases did not affect the primary conclusions of the study, we excluded these subjects and repeated both the sex classification and test–retest reliability analyses, obtaining consistent results. Nonetheless, it remains impossible to entirely rule out the possibility that elevated head motion may reduce test–retest reliability, even when the group-level mean FD remains below 0.2 mm. Furthermore, Li (2023) instructed participants to keep their mouths as still as possible during the verbal report period, which likely mitigated head motion to some degree. However, such instructions cannot eliminate the subtle jaw movements relative to the skull that naturally occur during speech, and this component is difficult to remove through standard preprocessing pipelines. Therefore, increased head motion may also be one of the factors contributing to the reduced test–retest reliability observed in this period.
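The period-specific motion check described above can be sketched in a few lines. This is a minimal illustration, not the pipeline used in this study: it assumes that a per-frame framewise displacement (FD) vector in millimetres has already been computed (for example with the Jenkinson formulation) and that a Boolean mask marking the verbal report frames is available. The function names `mean_fd_by_period` and `flag_high_motion` and the toy values are hypothetical; only the 0.2 mm threshold comes from the text.

```python
import numpy as np

FD_THRESHOLD_MM = 0.2  # mean FD cut-off quoted in the text


def mean_fd_by_period(fd, report_mask):
    """Return mean FD for the whole scan, the verbal report frames, and the rest frames."""
    fd = np.asarray(fd, dtype=float)
    report_mask = np.asarray(report_mask, dtype=bool)
    if fd.shape != report_mask.shape:
        raise ValueError("fd and report_mask must have the same length")
    if not report_mask.any() or report_mask.all():
        raise ValueError("both periods must contain at least one frame")
    return {
        "whole": float(fd.mean()),
        "report": float(fd[report_mask].mean()),
        "rest": float(fd[~report_mask].mean()),
    }


def flag_high_motion(fd, report_mask, threshold=FD_THRESHOLD_MM):
    """Return True if the mean FD of either period exceeds the threshold."""
    stats = mean_fd_by_period(fd, report_mask)
    return stats["report"] > threshold or stats["rest"] > threshold


# Toy subject (hypothetical values): quiet at rest, more motion while speaking.
fd = np.array([0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.30, 0.30])
report_mask = np.array([False] * 6 + [True] * 2)
stats = mean_fd_by_period(fd, report_mask)
flagged = flag_high_motion(fd, report_mask)
```

In this toy case the whole-scan mean FD stays below 0.2 mm while the report-period mean does not, which is the situation described above for several subjects.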
In addition, fluctuations in carbon dioxide (CO₂) may also affect the measurement of BOLD signals. Because the BOLD response reflects a combination of neuronal and vascular contributions (Golestani and Chen, 2020), the relative proportion of these components cannot be precisely determined. Prior work has shown that dynamic CO₂ fluctuations constitute one of the strongest modulators of rs-fMRI signals in gray matter (Golestani et al., 2015). As CO₂ levels typically increase during speech-related behaviors, elevated CO₂ production during the verbal report period represents another potential factor that could reduce the test–retest reliability of functional connectivity.
4.4 How to improve the test–retest reliability of rs-fMRI
Studies such as Li et al. (2022) and Raffaelli et al. (2021) have shown that these types of data exhibit similar characteristics even in the absence of overt speech, suggesting that alternative methods may enhance the reliability of resting-state scans. One potential approach is to establish a robust criterion to evaluate the reliability of each time point or interval, allowing us to exclude lower-quality segments and thereby improve functional connectivity calculations. Moreover, we can investigate a deep learning model capable of automatically segmenting the time series into two parts and selecting the more reliable segment for functional connectivity analysis. Other studies have also explored similar techniques; for instance, Jie (2020) proposed using weighted time series to calculate functional connectivity. However, fixed weighting does not account for the variability of real-world conditions, underscoring the need for a personalized segmentation approach to enhance functional connectivity reliability across different individuals.
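To make the idea of excluding lower-quality segments concrete, the sketch below computes Pearson functional connectivity from the retained time points only, and also shows a generic weighted Pearson correlation in which each time point receives its own weight. It is an illustration under stated assumptions, not the implementation used in this study or in Jie (2020): the array layout (time points by regions), the Boolean `keep` mask, the weights, and the function names `masked_fc` and `weighted_fc` are all hypothetical.

```python
import numpy as np


def masked_fc(ts, keep):
    """Pearson FC over retained time points.

    ts   : array of shape (T, R), one column per region (assumed layout)
    keep : Boolean array of shape (T,), True for time points to retain
    """
    ts = np.asarray(ts, dtype=float)
    keep = np.asarray(keep, dtype=bool)
    if keep.shape[0] != ts.shape[0]:
        raise ValueError("keep must have one entry per time point")
    if keep.sum() < 3:
        raise ValueError("too few retained time points for a correlation")
    return np.corrcoef(ts[keep], rowvar=False)


def weighted_fc(ts, weights):
    """Weighted Pearson FC; each time point contributes in proportion to its weight."""
    ts = np.asarray(ts, dtype=float)
    w = np.asarray(weights, dtype=float)
    if w.shape[0] != ts.shape[0] or np.any(w < 0) or w.sum() <= 0:
        raise ValueError("weights must be non-negative, one per time point")
    w = w / w.sum()
    centered = ts - w @ ts                           # subtract weighted mean per region
    cov = (centered * w[:, None]).T @ centered       # weighted covariance, shape (R, R)
    sd = np.sqrt(np.diag(cov))
    return cov / np.outer(sd, sd)


# Synthetic example: 50 time points, 4 regions, every fifth frame discarded.
rng = np.random.default_rng(0)
ts = rng.standard_normal((50, 4))
keep = np.arange(50) % 5 != 0
fc_masked = masked_fc(ts, keep)
fc_weighted = weighted_fc(ts, keep.astype(float))    # binary weights
```

With binary weights the weighted estimate coincides with the masked one, so hard exclusion can be viewed as a special case of weighting; a personalized scheme would differ only in how `keep` or `weights` is chosen for each individual.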
Additionally, efforts should be made to reduce the proportion of self-generated thoughts and head motion during rs-fMRI scanning. On the one hand, self-generated thoughts can be categorized into intentional and unintentional mind-wandering (Seli et al., 2015; Smallwood and Schooler, 2015), with intentional mind-wandering occurring more often during easy tasks (Martínez-Pérez et al., 2021; Seli et al., 2016). Consequently, the resting-state condition may encourage substantial intentional mind-wandering (Li, 2023). To address this, participants could be instructed, prior to the resting-state scan, to avoid engaging in self-generated thoughts, potentially minimizing their occurrence and improving data reliability. On the other hand, we have already observed that the removed time points exhibit more pronounced head motion, suggesting that head motion may contribute to the reduction in test–retest reliability. Although we cannot determine the exact proportion of influence attributable to each factor, we should still make every effort to minimize this source of interference.
4.5 Limitations and future work
In this study, we evaluated the most widely used functional connectivity measure, Pearson correlation. Future research could expand this work by testing other functional connectivity estimation methods, such as Spearman correlation and partial correlation. A key limitation of our study was the availability of suitable datasets: our dataset included only healthy young adults, and other datasets lack time-point annotations. In addition, all data were acquired on 3 Tesla GE MR750 scanners; no other scanner brands were used, which also limits the generalizability of the findings to some extent. The scarcity of suitable data restricted us to simple classification tasks on a small-scale dataset, and the overall accuracy of sex classification still does not reach the level typically achieved when training on large-scale datasets. To minimize the impact of speaking on brain activity, Li (2023) also designed a control condition without verbalization and obtained consistent brain pattern results. In our study, however, the verbal report still introduces some noise into the BOLD signal, and we could not use the data acquired without verbal report because they carry no labels indicating which time points contain thought, which is a further limitation of this study. Datasets with time-point labels are exceedingly rare. Consequently, we were unable to extend our analysis to pediatric, geriatric, or clinical populations, which often exhibit greater fluctuations during rs-fMRI scans and tend to show lower reliability (Song et al., 2012; Noble et al., 2021; O'Shaughnessy et al., 2008). Additionally, other types of noise, such as fatigue (Evans et al., 2015; Yan, 2009) or cardiac fluctuations (Chang et al., 2009; Shmueli, 2007), could also be detected and addressed to improve data quality. This research relied on existing labels for low-quality time points, without implementing a specific paradigm for exclusion.
To address this, we are developing a deep learning method that can automatically exclude low-quality time points, which we aim to complete in future work.
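The alternative estimators mentioned at the start of this section can be compared on the same retained time series. The sketch below is a NumPy-only illustration and was not part of the analyses reported here. It assumes a time-points-by-regions array with more time points than regions (so that the covariance matrix is invertible) and continuous signals without tied values (the Spearman ranks are plain ordinal ranks). The function names are hypothetical.

```python
import numpy as np


def pearson_fc(ts):
    """Pearson correlation between all pairs of regions (columns of ts)."""
    return np.corrcoef(np.asarray(ts, dtype=float), rowvar=False)


def spearman_fc(ts):
    """Spearman correlation: Pearson correlation of per-region ranks (assumes no ties)."""
    ts = np.asarray(ts, dtype=float)
    ranks = np.argsort(np.argsort(ts, axis=0), axis=0).astype(float)
    return np.corrcoef(ranks, rowvar=False)


def partial_fc(ts):
    """Partial correlation between each pair of regions, controlling for all others.

    Computed from the inverse covariance (precision) matrix P as
    -P[i, j] / sqrt(P[i, i] * P[j, j]); requires T > R and a non-singular covariance.
    """
    ts = np.asarray(ts, dtype=float)
    precision = np.linalg.inv(np.cov(ts, rowvar=False))
    d = np.sqrt(np.diag(precision))
    pc = -precision / np.outer(d, d)
    np.fill_diagonal(pc, 1.0)
    return pc


# Synthetic example: 60 time points, 5 regions.
rng = np.random.default_rng(1)
ts = rng.standard_normal((60, 5))
fc_pearson = pearson_fc(ts)
fc_spearman = spearman_fc(ts)
fc_partial = partial_fc(ts)
```

Each function can be applied to the time series after low-quality time points have been removed, so the exclusion step and the choice of estimator remain independent. With many regions and short scans, the plain matrix inverse used for partial correlation becomes unstable, and a regularized estimator would be needed instead.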
5 Conclusion
In conclusion, we introduce the concept of TeFC and demonstrate that it is possible to calculate functional connectivity by systematically excluding low-quality time points. We validated the resulting improvements in reliability and in machine learning performance: TeFC showed superior performance in sex classification and higher test–retest reliability. Future research in rs-fMRI could explore additional criteria for excluding time points, further refining the methodology for analyzing functional connectivity.
Data availability statement
Publicly available datasets were analyzed in this study. This data can be found at: https://rfmri.org/content/rmp-think-aloud-fmri-dataset.
Author contributions
ZC: Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Validation, Visualization, Writing – original draft. HL: Funding acquisition, Project administration, Supervision, Writing – review & editing.
Funding
The author(s) declared that financial support was received for this work and/or its publication. This work was supported by the National Natural Science Foundation of China (grant no. 32441112).
Conflict of interest
The author(s) declared that this work was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Generative AI statement
The author(s) declared that Generative AI was used in the creation of this manuscript. We used ChatGPT to refine the manuscript and check the grammar. The first draft and the citations were completed without the help of ChatGPT; it took part only in the refinement of the manuscript.
Any alternative text (alt text) provided alongside figures in this article has been generated by Frontiers with the support of artificial intelligence and reasonable efforts have been made to ensure accuracy, including review by the authors wherever possible. If you identify any issues, please contact us.
Publisher’s note
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
Supplementary material
The Supplementary material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fnins.2025.1730402/full#supplementary-material
References
Ashburner, J. (2007). A fast diffeomorphic image registration algorithm. NeuroImage 38, 95–113. doi: 10.1016/j.neuroimage.2007.07.007
Bedel, H. A. (2023). BolT: fused window transformers for fMRI time series analysis. Med. Image Anal. 88:102841. doi: 10.1016/j.media.2023.102841
Birn, R. M., Diamond, J. B., Smith, M. A., and Bandettini, P. A. (2006). Separating respiratory-variation-related fluctuations from neuronal-activity-related fluctuations in fMRI. NeuroImage 31, 1536–1548. doi: 10.1016/j.neuroimage.2006.02.048
Braun, U. (2012). Test–retest reliability of resting-state connectivity network characteristics using fMRI and graph theoretical measures. NeuroImage 59, 1404–1412. doi: 10.1016/j.neuroimage.2011.08.044
Buckner, R. L., Krienen, F. M., and Yeo, B. T. (2013). Opportunities and limitations of intrinsic functional connectivity MRI. Nat. Neurosci. 16, 832–837. doi: 10.1038/nn.3423
Chang, C., Cunningham, J. P., and Glover, G. H. (2009). Influence of heart rate on the BOLD signal: the cardiac response function. NeuroImage 44, 857–869. doi: 10.1016/j.neuroimage.2008.09.029
Cho, J. W. (2021). Impact of concatenating fMRI data on reliability for functional connectomics. NeuroImage 226:117549. doi: 10.1016/j.neuroimage.2020.117549
Ciric, R., Rosen, A. F. G., Erus, G., Cieslak, M., Adebimpe, A., Cook, P. A., et al. (2018). Mitigating head motion artifact in functional connectivity MRI. Nat. Protoc. 13, 2801–2826. doi: 10.1038/s41596-018-0065-y
Cole, M. W., Bassett, D. S., Power, J. D., Braver, T. S., and Petersen, S. E. (2014). Intrinsic and task-evoked network architectures of the human brain. Neuron 83, 238–251. doi: 10.1016/j.neuron.2014.05.014
Delamillieure, P., Doucet, G., Mazoyer, B., Turbelin, M. R., Delcroix, N., Mellet, E., et al. (2010). The resting state questionnaire: an introspective questionnaire for evaluation of inner experience during the conscious resting state. Brain Res. Bull. 81, 565–573. doi: 10.1016/j.brainresbull.2009.11.014
Diaz, B. A. (2014). The ARSQ 2.0 reveals age and personality effects on mind-wandering experiences. Front. Psychol. 5:271. doi: 10.3389/fpsyg.2014.00271
Elliott, M. L., Knodt, A. R., Cooke, M., Kim, M. J., Melzer, T. R., Keenan, R., et al. (2019). General functional connectivity: shared features of resting-state and task fMRI drive reliable and heritable individual differences in functional brain networks. NeuroImage 189, 516–532. doi: 10.1016/j.neuroimage.2019.01.068
Evans, J. W., Kundu, P., Horovitz, S. G., and Bandettini, P. A. (2015). Separating slow BOLD from non-BOLD baseline drifts using multi-echo fMRI. NeuroImage 105, 189–197. doi: 10.1016/j.neuroimage.2014.10.051
Finn, E. S., Shen, X., Scheinost, D., Rosenberg, M. D., Huang, J., Chun, M. M., et al. (2015). Functional connectome fingerprinting: identifying individuals using patterns of brain connectivity. Nat. Neurosci. 18, 1664–1671. doi: 10.1038/nn.4135
Foerster, B. U., Tomasi, D., and Caparelli, E. C. (2005). Magnetic field shift due to mechanical vibration in functional magnetic resonance imaging. Magn. Reson. Med. 54, 1261–1267. doi: 10.1002/mrm.20695
Friston, K. J. (2011). Functional and effective connectivity: a review. Brain Connect. 1, 13–36. doi: 10.1089/brain.2011.0008
Golestani, A. M., Chang, C., Kwinta, J. B., Khatamian, Y. B., and Jean Chen, J. (2015). Mapping the end-tidal CO2 response function in the resting-state BOLD fMRI signal: spatial specificity, test–retest reliability and effect of fMRI sampling rate. NeuroImage 104, 266–277. doi: 10.1016/j.neuroimage.2014.10.031
Golestani, A. M., and Chen, J. J. (2020). Controlling for the effect of arterial-CO2 fluctuations in resting-state fMRI: comparing end-tidal CO2 clamping and retroactive CO2 correction. NeuroImage 216:116874. doi: 10.1016/j.neuroimage.2020.116874
Gorgolewski, K. J. (2013). Single subject fMRI test–retest reliability metrics and confounding factors. NeuroImage 69, 231–243. doi: 10.1016/j.neuroimage.2012.10.085
Gorgolewski, K. J., Lurie, D., Urchs, S., Kipping, J. A., Craddock, R. C., Milham, M. P., et al. (2014). A correspondence between individual differences in the brain's intrinsic functional architecture and the content and form of self-generated thoughts. PLoS One 9:e97176. doi: 10.1371/journal.pone.0097176
Gratton, C., Laumann, T. O., Nielsen, A. N., Greene, D. J., Gordon, E. M., Gilmore, A. W., et al. (2018). Functional brain networks are dominated by stable group and individual factors, not cognitive or daily variation. Neuron 98, 439–452.e5. doi: 10.1016/j.neuron.2018.03.035
Greene, A. S., Gao, S., Noble, S., Scheinost, D., and Constable, R. T. (2020). How tasks change whole-brain functional organization to reveal brain-phenotype relationships. Cell Rep. 32:108066. doi: 10.1016/j.celrep.2020.108066
Gruskin, D. C., and Patel, G. H. (2022). Brain connectivity at rest predicts individual differences in normative activity during movie watching. NeuroImage 253:119100. doi: 10.1016/j.neuroimage.2022.119100
Guo, C. C., Kurth, F., Zhou, J., Mayer, E. A., Eickhoff, S. B., Kramer, J. H., et al. (2012). One-year test–retest reliability of intrinsic connectivity network fMRI in older adults. NeuroImage 61, 1471–1483. doi: 10.1016/j.neuroimage.2012.03.027
Hajnal, J. V., Myers, R., Oatridge, A., Schwieso, J. E., Young, I. R., and Bydder, G. M. (1994). Artifacts due to stimulus correlated motion in functional imaging of the brain. Magn. Reson. Med. 31, 283–291.
Jie, B. (2020). Designing weighted correlation kernels in convolutional neural networks for functional connectivity based brain disease diagnosis. Med. Image Anal. 63:101709. doi: 10.1016/j.media.2020.101709
Kane, M. J., Brown, L. H., McVay, J., Silvia, P. J., Myin-Germeys, I., and Kwapil, T. R. (2007). For whom the mind wanders, and when: an experience-sampling study of working memory and executive control in daily life. Psychol. Sci. 18, 614–621. doi: 10.1111/j.1467-9280.2007.01948.x
Levinson, D. B., Smallwood, J., and Davidson, R. J. (2012). The persistence of thought: evidence for a role of working memory in the maintenance of task-unrelated thinking. Psychol. Sci. 23, 375–380. doi: 10.1177/0956797611431465
Li, H.-X., Lu, B., Chen, X., Li, X.-Y., Castellanos, F. X., Yan, C.-G., et al. (2022). Exploring self-generated thoughts in a resting state with natural language processing. Behav. Res. Methods 54, 1725–1743. doi: 10.3758/s13428-021-01710-6
Li, H.-X. (2023). Neural representations of self-generated thought during think-aloud fMRI. NeuroImage 265:119775. doi: 10.1016/j.neuroimage.2022.119775
Li, Z., Kadivar, A., Pluta, J., Dunlop, J., and Wang, Z. (2012). Test–retest stability analysis of resting brain activity revealed by blood oxygen level-dependent functional MRI. J. Magn. Reson. Imaging 36, 344–354. doi: 10.1002/jmri.23670
Liu, T. T., Nalci, A., and Falahpour, M. (2017). The global signal in fMRI: nuisance or information? NeuroImage 150, 213–229. doi: 10.1016/j.neuroimage.2017.02.036
Mahadevan, A. S. (2021). Evaluating the sensitivity of functional connectivity measures to motion artifact in resting-state fMRI data. NeuroImage 241:118408. doi: 10.1016/j.neuroimage.2021.118408
Martínez-Pérez, V., Baños, D., Andreu, A., Tortajada, M., Palmero, L. B., Campoy, G., et al. (2021). Propensity to intentional and unintentional mind-wandering differs in arousal and executive vigilance tasks. PLoS One 16:e0258734. doi: 10.1371/journal.pone.0258734
Mildner, J. N., and Tamir, D. I. (2019). Spontaneous thought as an unconstrained memory process. Trends Neurosci. 42, 763–777. doi: 10.1016/j.tins.2019.09.001
Murphy, K., and Fox, M. D. (2017). Towards a consensus regarding global signal regression for resting state functional connectivity MRI. NeuroImage 154, 169–173. doi: 10.1016/j.neuroimage.2016.11.052
Noble, S., Scheinost, D., and Constable, R. T. (2019). A decade of test-retest reliability of functional connectivity: a systematic review and meta-analysis. NeuroImage 203:116157. doi: 10.1016/j.neuroimage.2019.116157
Noble, S., Scheinost, D., and Constable, R. T. (2021). A guide to the measurement and interpretation of fMRI test-retest reliability. Curr. Opin. Behav. Sci. 40, 27–32. doi: 10.1016/j.cobeha.2020.12.012
O'Shaughnessy, E. S., Berl, M. M., Moore, E. N., and Gaillard, W. D. (2008). Pediatric functional magnetic resonance imaging (fMRI): issues and applications. J. Child Neurol. 23, 791–801. doi: 10.1177/0883073807313047
Piani, M. C., Maggioni, E., Delvecchio, G., and Brambilla, P. (2022). Sustained attention alterations in major depressive disorder: a review of fMRI studies employing go/no-go and CPT tasks. J. Affect. Disord. 303, 98–113. doi: 10.1016/j.jad.2022.02.003
Power, J. D. (2017). Sources and implications of whole-brain fMRI signals in humans. NeuroImage 146, 609–625. doi: 10.1016/j.neuroimage.2016.09.038
Power, J. D., Schlaggar, B. L., and Petersen, S. E. (2015). Recent progress and outstanding issues in motion correction in resting state fMRI. NeuroImage 105, 536–551. doi: 10.1016/j.neuroimage.2014.10.044
Raffaelli, Q., Mills, C., de Stefano, N. A., Mehl, M. R., Chambers, K., Fitzgerald, S. A., et al. (2021). The think aloud paradigm reveals differences in the content, dynamics and conceptual scope of resting state thought in trait brooding. Sci. Rep. 11:19362. doi: 10.1038/s41598-021-98138-x
Schaefer, A., Kong, R., Gordon, E. M., Laumann, T. O., Zuo, X. N., Holmes, A. J., et al. (2018). Local-global parcellation of the human cerebral cortex from intrinsic functional connectivity MRI. Cereb. Cortex 28, 3095–3114. doi: 10.1093/cercor/bhx179
Seli, P., Carriere, J. S., and Smilek, D. (2015). Not all mind wandering is created equal: dissociating deliberate from spontaneous mind wandering. Psychol. Res. 79, 750–758. doi: 10.1007/s00426-014-0617-x
Seli, P., Risko, E. F., and Smilek, D. (2016). On the necessity of distinguishing between unintentional and intentional mind wandering. Psychol. Sci. 27, 685–691. doi: 10.1177/0956797616634068
Shehzad, Z. (2014). A multivariate distance-based analytic framework for connectome-wide association studies. NeuroImage 93, 74–94. doi: 10.1016/j.neuroimage.2014.02.024
Shmueli, K. (2007). Low-frequency fluctuations in the cardiac rate as a source of variance in the resting-state fMRI BOLD signal. NeuroImage 38, 306–320. doi: 10.1016/j.neuroimage.2007.07.037
Shrout, P. E., and Fleiss, J. L. (1979). Intraclass correlations: uses in assessing rater reliability. Psychol. Bull. 86, 420–428.
Sladky, R., Friston, K. J., Tröstl, J., Cunnington, R., Moser, E., and Windischberger, C. (2011). Slice-timing effects and their correction in functional MRI. NeuroImage 58, 588–594. doi: 10.1016/j.neuroimage.2011.06.078
Smallwood, J., and Schooler, J. W. (2015). The science of mind wandering: empirically navigating the stream of consciousness. Annu. Rev. Psychol. 66, 487–518. doi: 10.1146/annurev-psych-010814-015331
Song, J., Desphande, A. S., Meier, T. B., Tudorascu, D. L., Vergun, S., Nair, V. A., et al. (2012). Age-related differences in test-retest reliability in resting-state brain functional connectivity. PLoS One 7:e49847. doi: 10.1371/journal.pone.0049847
Tzourio-Mazoyer, N., Landeau, B., Papathanassiou, D., Crivello, F., Etard, O., Delcroix, N., et al. (2002). Automated anatomical labeling of activations in SPM using a macroscopic anatomical parcellation of the MNI MRI single-subject brain. NeuroImage 15, 273–289. doi: 10.1006/nimg.2001.0978
Van Dijk, K. R., Sabuncu, M. R., and Buckner, R. L. (2012). The influence of head motion on intrinsic functional connectivity MRI. NeuroImage 59, 431–438. doi: 10.1016/j.neuroimage.2011.07.044
Van Essen, D. C. (2013). The WU-Minn human connectome project: an overview. NeuroImage 80, 62–79. doi: 10.1016/j.neuroimage.2013.05.041
Vanderwal, T., Kelly, C., Eilbott, J., Mayes, L. C., and Castellanos, F. X. (2015). Inscapes: A movie paradigm to improve compliance in functional magnetic resonance imaging. NeuroImage 122, 222–232. doi: 10.1016/j.neuroimage.2015.07.069
Vértes, P. E. (2012). Simple models of human brain functional networks. Proc. Natl. Acad. Sci. U. S. A. 109, 5868–5873. doi: 10.1073/pnas.1111738109
Wang, J. (2017). Improving the test-retest reliability of resting state fMRI by removing the impact of sleep. Front. Neurosci. 11:249. doi: 10.3389/fnins.2017.00249
Wang, H.-T., Poerio, G., Murphy, C., Bzdok, D., Jefferies, E., and Smallwood, J. (2018). Dimensions of experience: exploring the heterogeneity of the wandering mind. Psychol. Sci. 29, 56–71. doi: 10.1177/0956797617728727
Windischberger, C. (2002). On the origin of respiratory artifacts in BOLD-EPI of the human brain. Magn. Reson. Imaging 20, 575–582. doi: 10.1016/S0730-725X(02)00563-5
Yan, L. (2009). Physiological origin of low-frequency drift in blood oxygen level dependent (BOLD) functional magnetic resonance imaging (fMRI). Magn. Reson. Med. 61, 819–827. doi: 10.1002/mrm.21902
Yan, C.-G., Wang, X. D., Zuo, X. N., and Zang, Y. F. (2016). DPABI: data processing & analysis for (resting-state) brain imaging. Neuroinformatics 14, 339–351. doi: 10.1007/s12021-016-9299-4
Yeo, B. T. (2011). The organization of the human cerebral cortex estimated by intrinsic functional connectivity. J. Neurophysiol. 106, 1125–1165. doi: 10.1152/jn.00338.2011
Zuo, X.-N., and Xing, X.-X. (2014). Test-retest reliabilities of resting-state FMRI measurements in human brain functional connectomics: a systems neuroscience perspective. Neurosci. Biobehav. Rev. 45, 100–118. doi: 10.1016/j.neubiorev.2014.05.009
Keywords: functional connectivity, machine learning, rs-fMRI, spontaneous thought, test–retest reliability
Citation: Chang Z and Li H (2026) Excluding spontaneous thought periods enhances functional connectivity test–retest reliability and machine learning performance in fMRI. Front. Neurosci. 19:1730402. doi: 10.3389/fnins.2025.1730402
Edited by:
Zhengwang Wu, University of North Carolina at Chapel Hill, United States
Reviewed by:
Abhishek Appaji, BMS College of Engineering, India
Rahul Biswas, University of California, San Francisco, United States
Copyright © 2026 Chang and Li. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Haifeng Li, lihaifeng@hit.edu.cn