AUTHOR=Sendi Mohammad S. E., Salat David H., Miller Robyn L., Calhoun Vince D. TITLE=Two-step clustering-based pipeline for big dynamic functional network connectivity data JOURNAL=Frontiers in Neuroscience VOLUME=Volume 16 - 2022 YEAR=2022 URL=https://www.frontiersin.org/journals/neuroscience/articles/10.3389/fnins.2022.895637 DOI=10.3389/fnins.2022.895637 ISSN=1662-453X ABSTRACT=Background: In a conventional dynamic functional network connectivity (dFNC) pipeline, a clustering stage summarizes the connectivity patterns that are transiently but reliably realized over the course of a scanning session. However, identifying the right number of clusters (or states) with a conventional clustering criterion requires running the algorithm repeatedly over a large range of cluster numbers, which is time-consuming and computationally demanding, and the cost becomes prohibitive as datasets grow. Here we developed a new dFNC pipeline based on a two-step clustering approach to analyze large dFNC data without access to extensive computational resources. Method: The proposed dFNC pipeline implements two-step clustering. In the first step, we draw a random subsample of the dFNC data and identify several sets of states at different model orders. In the second step, we aggregate the dFNC states estimated from all iterations of the first step and identify the optimum number of clusters using the elbow criterion. We then estimate a final set of states by running a second k-means clustering on this reduced dataset of aggregated dFNC states. To validate the reproducibility of the results of the new pipeline, we analyzed four dFNC datasets from the Human Connectome Project (HCP). Results: We found that the conventional and proposed dFNC pipelines generate similar brain dFNC states across four sessions, with more than 99% similarity.
We found that the conventional dFNC pipeline takes 275 minutes to evaluate the clustering order and find the final dFNC states, while the proposed dFNC pipeline takes only 11 minutes. In other words, the new pipeline is 25 times faster than the conventional method at finding the optimum number of clusters and the final dFNC states. We also found that the new method yields better clustering quality than the conventional approach (p<0.001). These results replicate across four datasets from HCP. Conclusion: We developed a new analytic pipeline that facilitates the analysis of large dFNC datasets without requiring extensive computational resources and validated the reproducibility of the results across multiple datasets.
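
The two-step procedure described in the Method section can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`kmeans`, `elbow`, `two_step_states`), the parameter values, the synthetic data, and the specific elbow rule (the point lying farthest below the chord that joins the endpoints of the within-cluster sum-of-squares curve) are all assumptions made for illustration, since the abstract specifies only "the elbow criteria". A plain Lloyd's k-means written with the standard library stands in for whatever k-means implementation the pipeline actually uses.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, rng, n_iter=50):
    """Plain Lloyd's k-means. Returns (centroids, within-cluster sum of squares)."""
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(n_iter):
        # Assignment step: group each point with its nearest centroid.
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            groups[nearest].append(p)
        # Update step: mean of each group; an empty group keeps its old centroid.
        updated = [
            [sum(col) / len(g) for col in zip(*g)] if g else centroids[j]
            for j, g in enumerate(groups)
        ]
        if updated == centroids:
            break
        centroids = updated
    wcss = sum(min(sq_dist(p, c) for c in centroids) for p in points)
    return centroids, wcss


def elbow(ks, wcss):
    """Assumed elbow rule: the k whose WCSS lies farthest below the endpoint chord."""
    x0, y0, x1, y1 = ks[0], wcss[0], ks[-1], wcss[-1]

    def gap(k, w):
        chord = y0 + (y1 - y0) * (k - x0) / (x1 - x0)
        return chord - w

    return max(zip(ks, wcss), key=lambda kw: gap(*kw))[0]


def two_step_states(windows, orders, n_draws, subsample_size, seed=0):
    """Step 1: cluster random subsamples at several model orders and pool the states.
    Step 2: pick the model order on the pooled states and cluster them again."""
    rng = random.Random(seed)
    orders = list(orders)
    pooled = []
    for _ in range(n_draws):
        subset = rng.sample(windows, subsample_size)
        for k in orders:
            centroids, _ = kmeans(subset, k, rng)
            pooled.extend(centroids)
    # The pooled states form the reduced dataset used for the second clustering.
    fits = [kmeans(pooled, k, rng) for k in orders]
    best_k = elbow(orders, [w for _, w in fits])
    final_states = fits[orders.index(best_k)][0]
    return best_k, final_states, pooled


# Synthetic stand-in for windowed dFNC features: 300 windows, 4 features each.
data_rng = random.Random(1)
centers = [(0, 0, 0, 0), (5, 5, 0, 0), (0, 5, 5, 0)]
windows = [
    [c + data_rng.gauss(0, 0.3) for c in data_rng.choice(centers)]
    for _ in range(300)
]
best_k, states, pooled = two_step_states(
    windows, orders=range(2, 6), n_draws=4, subsample_size=60
)
print("selected model order:", best_k, "| pooled states:", len(pooled))
```

The point of the design is that the expensive model-order search runs on the pooled states (here 4 draws × 14 centroids = 56 vectors) rather than on every window, which is where a speed-up of the kind reported above would come from under these assumptions.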