
ORIGINAL RESEARCH article

Front. Neurosci., 12 January 2026

Sec. Brain Imaging Methods

Volume 19 - 2025 | https://doi.org/10.3389/fnins.2025.1732013

This article is part of the Research Topic: Exploring Neuropsychiatric Disorders Through Multimodal MRI: Network Analysis, Biomarker Discovery, and Clinical Insights.

LLM-based feature selection and counterfactual explanations applied to functional connectivity analysis in schizophrenia


Xinyan Yuan1, Tiantian Chen1, Yanyan He1, Lingling Gu1, Ying Sun1 and Shaolong Wei2*
  • 1School of Artificial Intelligence, Jiangsu Vocational College of Business, Nantong, China
  • 2School of Artificial Intelligence and Computer Science, Nantong University, Nantong, China

Introduction: Schizophrenia (SZ) is a complex psychiatric disorder whose neural mechanisms are still unclear. Functional connectivity (FC) provides a unique perspective for understanding its pathology, but its high-dimensional nature poses significant challenges for feature selection and model interpretation. Traditional feature selection methods, while predictive, lack the integration of prior neuroscience knowledge, resulting in limited clinical relevance.

Methods: To address this, we propose an innovative framework that combines feature selection guided by a large language model (LLM) with counterfactual explanation. The framework leverages brain disease knowledge encoded by the LLM to guide dimensionality reduction of high-dimensional FC, ensuring that selected features are both statistically significant and biologically plausible. Counterfactual explanations are then used to generate causal intervention examples, which the LLM translates into intuitive natural-language explanations, providing understandable and actionable clinical insights for individual patients and physicians.

Results: We validate our approach on five real-world SZ datasets and demonstrate that it not only improves model classification performance but also provides new insights into SZ analysis.

Discussion: The LLM-based FC analysis method proposed in this study demonstrates good feature selection and interpretability on multiple SZ datasets. Its main advantage is its ability to effectively screen key FC features for brain regions. However, this method has some limitations, such as being difficult to directly apply clinically due to data heterogeneity, being unable to accurately locate individual FC abnormalities, and the hyperparameters for counterfactual generation not yet being optimized.

1 Introduction

Schizophrenia (SZ) is a severe mental illness characterized by clinical manifestations including hallucinations, delusions, and disorganized thinking, leading to significant impairment in patients' social functioning (McCutcheon et al., 2020, 2023). Despite its widespread global impact, the underlying neuropathological mechanisms remain largely unknown, posing significant challenges for objective diagnosis and effective intervention (Insel, 2010; Fišar, 2023). Current research suggests that abnormal connectivity between functional brain regions is a key characteristic of SZ (Zhang et al., 2021). Functional connectivity (FC), which reflects the coordinated activity between different brain regions, provides a unique perspective for understanding the pathological mechanisms of SZ (Chyzhyk et al., 2015; Zhu et al., 2024). With the advancement of neuroimaging technology, functional magnetic resonance imaging (fMRI) has become a standard method for studying brain activity and connectivity patterns, making FC an effective tool for investigating SZ (Shen et al., 2010; Lynall et al., 2010).

However, analyzing FC data presents significant methodological challenges. FC is typically represented as a high-dimensional matrix whose number of elements depends on the number of brain regions mapped by the brain atlas (Li et al., 2021; Mhiri and Rekik, 2020). This results in feature dimensions reaching thousands or even tens of thousands for each subject (Ting et al., 2018). First, redundant information in the data can lead to model overfitting, resulting in poor generalization of diagnosis and prediction (Tian and Zalesky, 2021). Second, high-dimensional data makes it difficult for traditional statistical methods to effectively extract important features related to SZ (Naheed et al., 2020). Therefore, reducing dimensionality while maintaining data validity and information content has become a key issue in current FC data analysis.

Although traditional feature selection methods (such as LASSO, MRMR, and XGBoost) have demonstrated certain predictive capabilities in high-dimensional data (Wang et al., 2022), they generally rely on purely data-driven statistical indicators and lack explicit modeling of the biomedical knowledge behind the features, resulting in the selected features being difficult to interpret or disconnected from clinical intervention (Hua et al., 2009; des Touches et al., 2023). In recent years, with breakthroughs in natural language understanding and knowledge reasoning using large language models (LLMs), researchers have begun exploring the use of LLMs as a source of prior knowledge in feature selection (Bal-Ghaoui and Sabri, 2025; Mutian et al., 2025; Choi et al., 2022). By incorporating the rich knowledge of brain diseases from LLMs, the feature selection process becomes more relevant to real-world medical contexts, enabling the identification of features that are highly relevant to clinical diagnosis (Oh and Lee, 2025). These features are also easier to interpret and integrate with clinical interventions.

To address these challenges, this study proposes an innovative framework that combines LLM-guided feature selection with counterfactual explanation techniques, achieving a closed loop between knowledge-guided feature selection and causal-guided interpretation in FC analysis of SZ. A schematic diagram of our proposed method is shown in Figure 1. Specifically, we first preprocess the rs-fMRI data and extract the upper-triangular elements to construct the FC feature matrix. Next, we develop a robust feature selection framework based on the LLM: by transforming the LLM's embedded prior knowledge of brain disease into connection-specific penalty weights, we effectively narrow the search space and prioritize FC features with potential intervention value. Finally, we employ a counterfactual explanation model to generate multiple sets of counterfactual examples for schizophrenia patients, in which patients' abnormal FC features are minimally adjusted toward values typical of healthy individuals. These examples are then fed into the LLM to generate intuitive, easily understandable final explanations. We validate our method on five real-world SZ datasets, demonstrating that it not only improves model interpretability but also provides new insights into SZ analysis.

Figure 1
Flowchart of a three-step process in a machine learning pipeline. Step a: Data pre-processing involves brain parcellation, fMRI, generating time series of ROIs, leading to a functional connectivity matrix and labels. Step b: Feature selection using LLM-Lasso involves FC feature names, LLM processing prompts, applying penalties, and using Lasso for transformation with cross-validation to choose variables. Step c: Explaining counterfactual examples using LLM shows an explanation generated from prompts, highlighting changes in FC features through counterfactual examples and SVM categorization.

Figure 1. Illustration of our proposed schizophrenia analysis method, including (a) data pre-processing, (b) feature selection using LLM-Lasso, and (c) explaining counterfactual examples using LLM.

2 Related work

Counterfactual explanations, as an explainable artificial intelligence method, have garnered widespread attention in academia and industry in recent years (Wang et al., 2021). Their core idea is to generate a hypothetical input sample that alters the model's prediction, thereby revealing the key rationale behind the model's decision (Verma et al., 2024). Unlike interpretable methods based on feature importance or model structure, counterfactual explanations are closer to human intuition and can present causal explanations to end users in the form of if-then explanations.

Wachter et al. (2017) were the first to systematically propose counterfactual explanations. They framed the problem as an optimization problem: achieving the desired change in model output with the smallest possible input perturbation. This approach emphasizes the proximity, effectiveness, and comprehensibility of explanations, laying the foundation for subsequent research. Subsequently, researchers have expanded on counterfactual explanations from various perspectives. For example, Mothilal et al. (2020) proposed a method for generating multiple and diverse counterfactual explanations to avoid the potential bias of a single explanation. Ustun et al. (2019) focused on feasibility when generating counterfactuals, ensuring that the proposed input changes are actually feasible for users in the real world. Furthermore, Poyiadzi et al. (2020) introduced causal constraints to enhance the causal plausibility of counterfactual explanations.

In recent years, counterfactual explanations have been widely applied in various fields, including financial risk assessment (Cheng et al., 2020), medical diagnosis (Richens et al., 2020), image classification (Khorram and Fuxin, 2022), and recommender systems (Kaffes et al., 2021). In the image domain, researchers use generative models (such as GANs) to generate visually plausible counterfactual images (Melistas et al., 2024). In the text domain, researchers construct explanations by perturbing keywords or sentence structure (Yang et al., 2020). In this paper, we make the counterfactual claim that if the abnormal FC between brain regions in patients with SZ were adjusted to normal ranges, their condition might be closer to that of healthy individuals. This kind of reasoning is very useful in medicine, helping doctors evaluate the potential effects of different treatment options, especially for brain diseases.

3 Materials and methods

3.1 Schizophrenia dataset

This study uses five public schizophrenia datasets containing 773 subjects in total: the Center for Biomedical Research Excellence (COBRE) dataset and the Huaxi, Nottingham, Taiwan, and Xiangya datasets. All subjects met the following criteria: (i) no comorbid Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition (DSM-IV) diagnosis; (ii) no history of drug abuse; and (iii) no clinically significant head trauma. Table 1 summarizes the demographic and clinical characteristics of the subjects in these datasets.


Table 1. Demographic and clinical information of subjects in five datasets.

3.2 Data pre-processing

Rs-fMRI data are collected using three different types of scanners: a 3-T Siemens Tim Trio scanner with an eight- or 12-channel head coil (COBRE, Taiwan, and Xiangya), a 3-T General Electric MRI scanner (Huaxi), and a 3-T Philips Achieva MRI scanner (Nottingham). The rs-fMRI data are preprocessed using SPM8 and the Data Processing Assistant for Resting-State fMRI (DPARSF) according to standard procedures. The following steps are performed: (i) removal of the first 10 volumes, (ii) slice-timing correction, (iii) head-motion correction, (iv) regression of nuisance covariates, (v) normalization to standard space, (vi) voxel-wise band-pass filtering, (vii) normalization of anatomical images to MNI template space, and (viii) smoothing with a 4 mm full width at half maximum (FWHM) Gaussian kernel. After preprocessing, the nodes of the brain network are defined according to the Automated Anatomical Labeling (AAL) template, and the pairwise similarities between node time series are computed as the connecting edges of the brain network.

Next, let A_i^F ∈ ℝ^{N×N} be the connectivity matrix of the functional brain network, where N is the number of brain regions, i = 1, 2, ..., n, and n is the number of subjects. We take the upper-triangular elements of A_i^F as features and represent them as a vector x_i ∈ ℝ^{1×p} with p = N(N−1)/2, and let y_i be the label of the i-th subject. The FC feature matrix of all subjects can therefore be represented as X = [x_1, ..., x_i, ..., x_n] ∈ ℝ^{n×p}, with corresponding labels Y = [y_1, ..., y_i, ..., y_n] ∈ ℝ^n. It is worth noting that in this paper we divide the brain network into 90 regions of interest (ROIs), i.e., N = 90, so each subject is described by a 1 × 4005 vector reflecting the functional connectivity strength pattern between the 90 brain regions.
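The vectorization step above can be sketched in a few lines of NumPy; the time series below are synthetic stand-ins for real rs-fMRI data, with Pearson correlation as the edge measure:

```python
import numpy as np

def fc_to_feature_vector(A):
    """Vectorize an N x N functional-connectivity matrix by taking its
    strictly upper-triangular elements (the matrix is symmetric)."""
    iu = np.triu_indices(A.shape[0], k=1)   # indices above the diagonal
    return A[iu]                            # length N*(N-1)/2

# With the AAL atlas used here, N = 90, so each subject yields 4005 features.
rng = np.random.default_rng(0)
ts = rng.standard_normal((170, 90))         # toy time series: 170 volumes, 90 ROIs
A = np.corrcoef(ts.T)                       # pairwise correlation between ROIs
x = fc_to_feature_vector(A)
print(x.shape)                              # (4005,)
```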

3.3 Feature selection using LLM-Lasso

After the above steps, we obtain the FC feature matrix X for all subjects. However, this matrix contains a large amount of redundant information, and the number of functional connections between brain regions far exceeds the number of samples, presenting a typical high-dimensional small sample problem. To address this issue and identify the most critical FC features for SZ diagnosis, we employ a penalized Lasso feature selection method. Furthermore, we incorporate the penalty factor derived from the LLM into the Lasso penalty term. This approach not only improves the accuracy of feature selection but also effectively reduces the influence of redundant features, thereby effectively screening key features.

3.3.1 The LLM-Lasso

For the input FC feature X ∈ ℝn×p and label Y ∈ ℝn, the traditional Lasso method achieves feature selection by introducing an ℓ1-norm penalty term in the objective function of minimizing the residual sum of squares (Fonti and Belitser, 2017). The objective function of Lasso regression can be expressed as:

β̂ = argmin_β { (1/2) Σ_{i=1}^n (y_i − β_0 − x_iβ)^2 + λ Σ_{j=1}^p |β_j| }    (1)

However, the ℓ1 penalty term λ Σ_{j=1}^p |β_j| in the above equation imposes the same sparsity constraint on all features, implicitly assuming that all features are equally important in the absence of prior information. This assumption may not hold when processing high-dimensional brain FC data, as it completely ignores the known biological significance of connections between different brain regions and the differences in their potential associations with schizophrenia, which can lead to feature selection results that deviate from the existing understanding of pathological mechanisms.

In order to incorporate domain knowledge into the feature selection process, we can enhance the Lasso method by assigning a penalty factor generated by LLM to each coefficient in the ℓ1 penalty. The LLM-Lasso objective function we constructed can be expressed as:

β̂ = argmin_β { (1/2) Σ_{i=1}^n (y_i − β_0 − x_iβ)^2 + λ Σ_{j=1}^p ω_j |β_j| }    (2)

where ω_j > 0 is the penalty factor for the j-th feature, generated from the LLM's domain knowledge. Our goal is to extend the uniform penalty term λ Σ_{j=1}^p |β_j| of traditional Lasso to the weighted form λ Σ_{j=1}^p ω_j |β_j|, so that features with stronger neuroscience evidence receive a smaller penalty and are thereby prioritized during feature selection.
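A weighted ℓ1 penalty of this form can be solved with a standard Lasso solver by dividing each column by its weight and mapping the coefficients back, since λ Σ ω_j|β_j| with x_jβ_j equals λ Σ|β̃_j| with (x_j/ω_j)β̃_j for β̃_j = ω_jβ_j. A minimal scikit-learn sketch (the data and penalty values below are hypothetical, not the paper's):

```python
import numpy as np
from sklearn.linear_model import Lasso

def weighted_lasso(X, y, weights, lam=0.05):
    """Feature-weighted Lasso: penalize beta_j by lam * w_j * |beta_j|.
    Implemented as a plain Lasso on columns rescaled by 1/w_j, with the
    coefficients mapped back afterwards."""
    w = np.asarray(weights, dtype=float)
    model = Lasso(alpha=lam).fit(X / w, y)   # column j divided by w_j
    return model.coef_ / w                   # undo the rescaling

# Hypothetical penalty factors: small w_j (strong prior evidence) -> weak penalty.
rng = np.random.default_rng(1)
X = rng.standard_normal((60, 20))
y = 2.0 * X[:, 0] + 0.1 * rng.standard_normal(60)
w = np.full(20, 1.0)
w[0] = 0.2                                   # prior says feature 0 matters
beta = weighted_lasso(X, y, w)
print(np.flatnonzero(np.abs(beta) > 1e-6))   # indices of selected features
```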

3.3.2 LLM penalty factor optimization based on cross-validation

In LLM-Lasso, we use the LLM to generate a penalty factor vector S = [s_1, s_2, ..., s_p] ∈ ℝ^p to guide feature selection in Lasso regression. However, the penalty factors generated by the LLM can be inaccurate or reflect so-called hallucinations, i.e., false or unreliable information. To ensure that the model's reliance on LLM output is data-driven and reliable, we introduce a cross-validation procedure to select the optimal penalty factor transformation.

We first define a finite family of transformation functions U that transforms the original penalty factor S generated by LLM into the optimal penalty W* finally used for Lasso:

W* = u*(S),  u* ∈ U    (3)

where W* ∈ ℝ^p and u* is the optimal transformation function selected from U via k-fold cross-validation. Next, we divide the dataset into k subsets. For the i-th fold, let the training set be D_tra^(i) and the penalty factors be given by the transformed W = u(S), which is used to compute the optimal Lasso coefficients β*_{i,u(S)}:

β*_{i,u(S)} = argmin_β { (1/2) Σ_{(x,y)∈D_tra^(i)} (y − β_0 − xβ)^2 + λ Σ_{j=1}^p u(S)_j |β_j| }    (4)

Then, the validation data X_val^(i) and Y_val^(i) are used to evaluate the cross-validation loss function L, and we select the transformation function that minimizes the total validation loss:

u* = argmin_{u∈U} Σ_{i=1}^k L(X_val^(i) β*_{i,u(S)}, Y_val^(i))    (5)

We define U as a range of transformations with penalties representing varying degrees of dependence on LLM generation. In this paper, inspired by Zhang et al. (2025), we order the penalty factors by feature importance, with more important features receiving smaller penalties and less important features receiving larger penalties. Therefore, we adopt the family of inverse importance transformation functions:

U = { u : u(S)_j = s_j^φ, φ ∈ {0, 1, ..., φ_max} }    (6)

where s_j is the penalty factor generated by the LLM for the j-th feature, and φ controls the degree of reliance on the LLM-generated factors. If φ = 0, the transformed penalty is identical to the original Lasso penalty, while larger φ increases the dependence on the LLM-generated penalty factors.

By cross-validating, we ensure that the model does not degrade due to reliance on unreliable LLM-generated penalties. In the worst case, since U contains the transformation with φ = 0, which sets all penalty factors to 1 and is equivalent to the original Lasso, LLM-Lasso never performs worse than the original Lasso in terms of cross-validation loss.
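The cross-validated choice of φ can be sketched as follows; for brevity λ is held fixed and a squared-error validation loss is used, and the data and LLM scores are synthetic (the paper tunes all hyperparameters by five-fold cross-validation):

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.model_selection import KFold

def select_phi(X, y, S, lam=0.05, phi_grid=(0, 1, 2, 3), k=5):
    """Pick the exponent phi in u(S)_j = s_j**phi by k-fold CV.
    phi = 0 recovers plain Lasso, so the chosen transform can never do
    worse than Lasso in cross-validation loss."""
    best_phi, best_loss = None, np.inf
    for phi in phi_grid:
        w = S ** phi                          # transformed penalty factors
        fold_losses = []
        for tr, va in KFold(k, shuffle=True, random_state=0).split(X):
            # weighted Lasso via the 1/w column-rescaling trick
            m = Lasso(alpha=lam).fit(X[tr] / w, y[tr])
            beta = m.coef_ / w
            pred = X[va] @ beta + m.intercept_
            fold_losses.append(np.mean((y[va] - pred) ** 2))
        total = np.sum(fold_losses)
        if total < best_loss:
            best_phi, best_loss = phi, total
    return best_phi

rng = np.random.default_rng(2)
X = rng.standard_normal((80, 10))
y = X[:, 0] + 0.1 * rng.standard_normal(80)
S = np.linspace(0.1, 1.0, 10)                 # hypothetical LLM penalty scores
phi = select_phi(X, y, S)
print(phi)
```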

3.3.3 Empirically calibrated prompting for optimizing LLM

Prompting is crucial for guiding the LLM to understand specific prediction tasks and generate high-quality penalty factors. In this work, we employ an empirically calibrated prompting method, combined with existing LLM domain knowledge on functional connectivity abnormalities in schizophrenia, to guide the LLM in generating penalty factors for FC features. For such a large-scale SZ dataset, we use a zero-shot prompting approach and set the LLM's temperature parameter T to 0 in all experiments, applying a greedy decoding strategy to ensure deterministic and reproducible output.

For all classification tasks, our prompt template consists of a user component and a system component, defined as follows:

P=prompt(Quser(A(ϕ,c)),Hsystem)    (7)

where P is the complete prompt ultimately fed to the LLM, and Quser is the user query. A is the task description, taking as input a feature set ϕ and a category c. Hsystem is the system history or dialogue buffer, which maintains the dialogue context. Figure 2 provides an example of the general structure of component A, which typically consists of a background description of the dataset, the assigned task, and formatting instructions.
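Component A might be assembled programmatically along the following lines; the wording and helper below are illustrative, not the authors' exact prompt:

```python
def build_task_description(feature_names, category, n_samples, n_features):
    """Assemble an illustrative component A of Equation 7: a background
    description of the dataset, the assigned task, and formatting rules."""
    background = (f"We study resting-state fMRI functional connectivity in "
                  f"schizophrenia: {n_samples} samples, {n_features} features.")
    task = (f"Assign each feature a penalty score between 0.1 and 1 for "
            f"classifying the '{category}' class; lower scores mean higher "
            f"expected predictive power, justified by neuroscience evidence.")
    formatting = "Output one line per feature as '[Feature name]: VALUE(float)'."
    features = "Features: " + ", ".join(feature_names)
    return "\n".join([background, task, formatting, features])

print(build_task_description(["PreCG.L-ORBsup.R"], "schizophrenia", 120, 4005))
```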

Figure 2
Background section describes an fMRI study using functional connectivity data from a schizophrenia sample with 120 samples and 4005 features. The goal is to build a Lasso model to classify samples. Task section explains assigning penalty scores between 0.1 and 1 to each feature based on its importance. Lower scores indicate higher predictive power. Importance should be based on evidence. Formatting section specifies sorting penalty factors in the order of features and outputting them as: [Feature name]: VALUE(float). Example given is [PrecG.R, ORBsup.L]: 0.3.

Figure 2. An example is used to describe A.

3.4 Explaining counterfactual examples using LLM

To enhance the interpretability of our method, we further introduce a counterfactual explanation model (Mothilal et al., 2020). In this paper, we formulate the counterfactual statement: if the abnormal FC between brain regions in SZ patients were adjusted to normal ranges, their condition would be closer to that of healthy individuals. To help patients understand these counterfactual examples more intuitively, we introduce the LLM to generate natural-language explanations, thereby transforming complex feature changes into easily understandable action recommendations (Fredes and Vitria, 2024).

Before introducing the counterfactual framework, we first represent the FC matrix after feature selection as X′ ∈ ℝ^{n×q}, where q ≪ p. We then train an appropriate machine learning model f(·) to predict SZ. In our experiments, we use a support vector machine (SVM) as the classification model due to its strong adaptability to small-sample datasets (Xue et al., 2009).

3.4.1 Diversity counterfactual explanation

The inputs of the counterfactual explanation model are a trained SVM model f(·) and the feature vector m_i ∈ ℝ^{1×q} of the i-th subject. Our goal is to generate a set of counterfactual examples {x_i^1, x_i^2, ..., x_i^L} for subject i such that the decision outcome for each x_i^l ∈ ℝ^{1×q} differs from the prediction for the original feature vector m_i.

The counterfactual explanation model consists of three parts: loss function loss(·), distance function dist(·), and diversity metric diversity(·). Specifically, the first part pushes counterfactual xil toward different predictions, the second part makes counterfactual examples closer to the original input, and the third part is used to increase the diversity of counterfactual explanations. In the first part, we use a hinge loss function that helps generate counterfactuals with less variation by reducing the preference for extreme values. The hinge loss is expressed as follows:

loss_hinge = max(0, 1 − z · logit(f(x)))    (8)

where z = 1 when Ŷ = 1 and z = −1 when Ŷ = 0, and logit(f(x)) is the unscaled output of the SVM model. Note that in our experiments, 1 corresponds to normal subjects and 0 to patients, so when converting patients into normal subjects, Ŷ is set to 1. loss(·) represents the difference between the counterfactual example x and the target label Ŷ, and ensures that the generated counterfactual example is consistent with the expected class. For the distance function in the second part, we follow the proposal of Wachter et al. (2017) and divide the distance of each feature by the median absolute deviation (MAD) of that feature's values in the training set:

dist(x, m) = (1/q) Σ_{α=1}^q |x_α − m_α| / MAD_α    (9)

where MAD_α is the median absolute deviation of the α-th feature, q is the number of selected features, x is the counterfactual example, and m is the original feature vector. The distance dist(x, m) prevents the counterfactual example from deviating too much from the original example, thereby ensuring operability. For the third part, we use a determinantal point process to measure the diversity of counterfactual examples, computed as the determinant of a kernel matrix K:

diversity=det(K)    (10)

where K_{u,v} = 1/(1 + dist(x_u, x_v)), and x_u and x_v are two counterfactual examples. In the experiments, to avoid ill-conditioned determinants, we add small random perturbations to the diagonal elements before computing the determinant. The purpose of diversity(·) is to ensure that the generated counterfactual examples are diverse in the feature space rather than a set of highly similar results. The diversity constraint does not affect the feasibility of individual counterfactual examples; rather, by controlling the differences between them, it yields multiple feasible explanatory paths.

To ensure the validity of the generated counterfactual examples, our optimization process ensures that each counterfactual example meets the following conditions: (i) Valid prediction change (Equation 8): Each counterfactual example's prediction for the target class Ŷ differs from the original input, ensuring the desired classification change is achieved. (ii) Similarity to the original example (Equation 9): Through distance loss dist(·), each counterfactual example maintains sufficient similarity to the original example, avoiding excessive feature perturbation and ensuring that the generated counterfactual examples are actionable and consistent with reality. (iii) Diversity (Equation 10): By introducing a diversity metric diversity(·), we ensure that the generated counterfactual samples are differentiated in the feature space, providing multiple different interpretation paths, rather than just multiple approximate modifications of the same input. Finally, we can obtain counterfactual examples by optimizing the following loss:

X(mi)=γ1Ll=1Ldist(xil,mi)-γ2diversity(xi1,xi2,...,xiL)+argminxi1,xi2,...,xiL1Ll=1Llosshinge(f(xil),Ŷ)    (11)

where X(m_i) is the optimization objective for the counterfactual examples, and γ_1 and γ_2 are hyperparameters balancing the three parts of the loss function. The formula reveals the minimum change to the input data required to achieve the idealized result: by adjusting the FC values between abnormal brain regions of SZ patients, their state may be brought closer to normal.
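The three loss components (hinge loss, MAD-scaled distance, and determinantal diversity) can be sketched as plain functions; feature dimensions, jitter size, and test values below are illustrative, and the optimizer itself is omitted:

```python
import numpy as np

def mad_distance(x, m, mad):
    """Feature-wise L1 distance scaled by each feature's MAD."""
    return np.mean(np.abs(x - m) / mad)

def hinge(logit, z):
    """Hinge loss; z = +1 when the target class is 'healthy', -1 otherwise."""
    return max(0.0, 1.0 - z * logit)

def diversity(cfs, mad):
    """det(K) with K_uv = 1/(1 + dist(x_u, x_v)); a small diagonal jitter
    keeps the determinant numerically stable."""
    L = len(cfs)
    K = np.empty((L, L))
    for u in range(L):
        for v in range(L):
            K[u, v] = 1.0 / (1.0 + mad_distance(cfs[u], cfs[v], mad))
    K += 1e-6 * np.eye(L)
    return np.linalg.det(K)

def cf_objective(cfs, m, logits, z, mad, g1=0.5, g2=1.0):
    """Objective value for a candidate set of counterfactuals: mean hinge
    loss plus g1 * mean distance minus g2 * diversity."""
    return (np.mean([hinge(l, z) for l in logits])
            + g1 * np.mean([mad_distance(x, m, mad) for x in cfs])
            - g2 * diversity(cfs, mad))
```

Identical counterfactuals drive det(K) toward zero, so maximizing the diversity term pushes the set apart in feature space.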

3.4.2 Using LLM to explain counterfactuals in natural language

After generating counterfactual examples, our goal is to infer the primary causes from them. To achieve this, we provide the LLM with a set of counterfactual examples (i.e., the FC features to be adjusted) along with the original FC features. We ask it to generate a final explanation in simplest terms, highlighting the steps the patient can take to transition to the healthy category. Specifically, when multiple steps are presented, the LLM ranks these steps based on their potential effectiveness and feasibility, providing the optimal solution. Figure 3 shows an example of the prompts used and the output generated by the LLM. This intuitive explanation approach can provide patients and doctors with the guidance they need to treat the disease.

Figure 3
An infographic depicting an ML system's negative result predicting schizophrenia treatment outcomes. It shows three brain diagrams labeled Original, Counterfactual 1, and Counterfactual 2, highlighting changes in functional connectivity. The prompt discusses altering connectivity to improve patient outcomes. The LLM answer explains two counterfactual examples: the first decreases SMG.L-ACG.L from 0.6 to 0.3, and the second increases MFG.L-HIP.R while decreasing SFGdor.L-AMYG.R. It concludes that Counterfactual 1 is easier to implement, recommending gradual strategies starting with it.

Figure 3. Examples of using LLM to explain counterfactual.

4 Experiments and results

4.1 Experimental setting

In this work, we use a support vector machine (SVM) classifier to perform the classification task on the five SZ datasets. We evaluate the performance of different methods based on diagnostic accuracy (ACC = (TP + TN)/(TP + TN + FP + FN)), sensitivity (SEN = TP/(TP + FN)), and specificity (SPE = TN/(TN + FP)), where TP, TN, FP, and FN denote true positives, true negatives, false positives, and false negatives, respectively. To ensure fairness, all compared feature selection methods use SVM classifiers. The LLMs we use include GPT-5, GPT-4.1 (Achiam et al., 2023), DeepSeek-V3.2 (Liu et al., 2025), Kimi-k2 (Team et al., 2025), and Gemini 2.5 Pro (Comanici et al., 2025), all accessed through OpenAI-compatible APIs. Given GPT-5's superior performance, we use it in subsequent experiments; for an ablation study of these LLMs, please refer to Section 4.7. For each sample, we set the number of counterfactual examples L to 5, with hyperparameters γ_1 = 0.5 and γ_2 = 1. Notably, we use a five-fold cross-validation strategy in all experiments and for hyperparameter selection.
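The three evaluation metrics follow directly from the confusion-matrix counts; a minimal sketch (label convention: positive = 1):

```python
def diagnostic_metrics(y_true, y_pred):
    """Accuracy, sensitivity, and specificity from binary labels (positive = 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    acc = (tp + tn) / (tp + tn + fp + fn)
    sen = tp / (tp + fn) if tp + fn else 0.0   # recall on positives
    spe = tn / (tn + fp) if tn + fp else 0.0   # recall on negatives
    return acc, sen, spe

print(diagnostic_metrics([1, 1, 0, 0], [1, 0, 0, 1]))  # (0.5, 0.5, 0.5)
```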

4.2 Statistical analysis of FC features

In this set of experiments, we perform statistical analysis on the FC remaining after LLM-Lasso feature selection to demonstrate the effectiveness of our method. For intuitiveness, we show in Figure 4 the FC features retained by our method after feature selection on five datasets. As shown in Figure 4, through comprehensive analysis of the five datasets, we find significant FC abnormalities between SZ patients and NC, particularly in key brain regions such as the default mode network (DMN), sensorimotor network, and limbic system. FC abnormalities in these brain regions are closely associated with multiple core symptoms of SZ. For example, DMN regions (such as the PCUN and ANG) are associated with functions such as self-referential processing and mind wandering, and their abnormal connectivity is believed to underlie the neural basis of self-disorder and cognitive decline in SZ (Whitfield-Gabrieli et al., 2009; Tang et al., 2025). Abnormal connectivity between the INS and STG may reflect dysfunction in the salience network. As a core node in this network, dysfunction in the INS may lead to patients' misattribution of external stimuli, thereby triggering symptoms such as hallucinations and delusions (Palaniyappan and Liddle, 2012). Furthermore, abnormal FC between the AMYG and CAU has been detected in multiple datasets and is highly correlated with negative symptoms and motivational deficits in SZ (Arnedo et al., 2015). Abnormal connectivity between the PreCG and OLF suggests sensorimotor integration disorders, consistent with early anosmia and motor planning deficits in SZ (Li et al., 2018). Overall, these results demonstrate that our method can effectively extract stable and biologically meaningful FC features, helping to improve the accuracy and interpretability of SZ classification.

Figure 4
Brain network diagrams paired with bar charts representing data from five studies: COBRE, Huaxi, Nottingham, Taiwan, and Xiangya. Each section includes a brain map with annotated regions and corresponding bar charts showing statistical comparisons between SZ (schizophrenia) and NC (normal control) groups. Significant differences are marked with asterisks, indicating varying degrees of significance. The charts showcase changes in connectivity patterns across different brain regions and studies.

Figure 4. Functional connectivity (FC) retained after feature selection by our method in five datasets and statistical analysis. Among them, * indicates 0.01 < p < 0.05, ** indicates 0.001 < p < 0.01, *** indicates 0.0001 < p < 0.001, and **** indicates p < 0.0001.

4.3 Comparison methods

We compare our proposed method with eight feature selection methods, including two state-of-the-art LLM-based methods (LLM-Select and LLM4FS), four traditional data-driven methods (Lasso, HSIC-Lasso, XGBoost, and Random feature selection), and two deep learning methods (CCNN and DeepFS), to comprehensively evaluate its performance in feature selection tasks.

The details are as follows: (i) LLM-Select (Jeong et al., 2024): A pure text-driven feature selection method that prompts the LLM to output feature importance scores. (ii) LLM4FS (Li and Xiu, 2025): A hybrid feature selection method that allows the LLM to directly call traditional algorithms to analyze sample data and output feature scores. (iii) Lasso (Tibshirani, 1996): A standard Lasso regression model without the LLM. (iv) HSIC-Lasso (Yamada et al., 2014): A kernel-based nonlinear feature selection method that uses the Hilbert-Schmidt independence criterion (HSIC) to measure the relevance of features to the target. (v) Xgboost (Chen and Guestrin, 2016): An embedded feature selection method that selects high-contribution features by training an XGBoost model and extracting feature importances. (vi) Random feature selection (RFS): A baseline method that randomly extracts a subset of features. (vii) CCNN (Meszlényi et al., 2017): Connectome Convolutional Neural Network, used for feature selection in FC. (viii) DeepFS (Li et al., 2023): Deep Feature Screening, uses deep learning to extract low-dimensional representations and perform feature selection.

For all of the above methods, hyperparameters and the LLM (LLM-Select: GPT-4.1, LLM4FS: GPT-4.5) used are set according to the values recommended in the original papers.
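To make the Lasso-style baselines above concrete, the following is a minimal, self-contained sketch of feature selection via coordinate-descent Lasso on synthetic FC-like feature vectors. It is a didactic stand-in, not the paper's implementation or parameter settings; the data, regularization strength, and informative-feature indices are all illustrative:

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=200):
    """Plain Lasso (0.5*||y - Xw||^2 + lam*||w||_1) solved by
    coordinate descent with soft-thresholding. Didactic only."""
    n, d = X.shape
    w = np.zeros(d)
    col_sq = (X ** 2).sum(axis=0)
    for _ in range(n_iter):
        for j in range(d):
            # residual with feature j's contribution removed
            r = y - X @ w + X[:, j] * w[j]
            rho = X[:, j] @ r
            w[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
    return w

rng = np.random.default_rng(0)
X = rng.standard_normal((60, 20))          # 60 subjects, 20 FC features
true_w = np.zeros(20)
true_w[[2, 7]] = [1.5, -2.0]               # only two informative connections
y = X @ true_w + 0.05 * rng.standard_normal(60)

w = lasso_cd(X, y, lam=5.0)
selected = np.flatnonzero(np.abs(w) > 1e-6)
print(selected)
```

Features with nonzero coefficients are "selected"; the LLM-guided variant in this paper additionally constrains this step with prior knowledge.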

4.4 Classification performance

In this set of experiments, we compare our proposed method with the eight methods above and report the results in Table 2. Our method shows excellent stability and consistency on all five datasets: it achieves ACC of 91.67% (COBRE), 85.48% (Huaxi), 90.48% (Nottingham), 88.46% (Taiwan), and 86.21% (Xiangya), whereas most competing methods remain below 85%. Our method also performs strongly in both SEN and SPE, achieving 90.91% SPE on the COBRE dataset and 92.00% SEN on the Huaxi dataset, demonstrating its robustness in distinguishing positive from negative samples. LLM4FS also performs well in SPE, achieving 90.91% and 81.25% on the COBRE and Huaxi datasets, respectively, exceeding the other methods and demonstrating a strong ability to identify negative samples. CCNN outperforms only the traditional methods, not the two LLM-based methods, whereas DeepFS surpasses LLM-Select in ACC on the COBRE, Taiwan, and Xiangya datasets; on the Taiwan dataset in particular, its SPE of 87.62% exceeds all other methods. We also note that the LLM-based feature selection methods (LLM-Select and LLM4FS) perform well overall, clearly outperforming several traditional methods. These results indicate that LLM-based feature selection is not only theoretically appealing but can also provide more reliable support for the diagnosis of complex diseases such as SZ in practical applications.
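The reported metrics follow the standard confusion-matrix definitions, with SZ as the positive class; a minimal sketch with illustrative labels (not data from the study):

```python
def acc_sen_spe(y_true, y_pred):
    """Accuracy, sensitivity (true-positive rate), specificity (true-negative rate)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    acc = (tp + tn) / len(y_true)
    sen = tp / (tp + fn) if tp + fn else 0.0
    spe = tn / (tn + fp) if tn + fp else 0.0
    return acc, sen, spe

# SZ = 1, NC = 0 (toy labels)
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
print(acc_sen_spe(y_true, y_pred))  # (0.75, 0.75, 0.75)
```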

Table 2

Table 2. Classification performance comparison with existing methods.

4.5 Counterfactual explanation

In this set of experiments, we demonstrate how the counterfactual explanation model generates a set of intuitive and diverse counterfactual (CF) examples for patients. We produce counterfactual explanations by minimally adjusting a patient's abnormal FC values, specifically shifting the FC values between regions so that the patient's state becomes more similar to that of a healthy individual. As shown in Figure 5, we generate three different sets of CF examples for SZ patients and present them as brain maps. We then feed these three sets of CF examples into the LLM, which generates the final intuitive explanations for the patients.

Figure 5
Brain scan analysis diagrams in two columns. The left column shows brain regions with labeled connections (COBRE, Huaxi, Nottingham, Taiwan, Xiangya) in three configurations (CF1, CF2, CF3). The right column provides explanations for each configuration, detailing how specific brain regions relate to emotion regulation, cognition, and perception. Yellow highlights denote key areas of connectivity reduction or enhancement.

Figure 5. Counterfactual (CF) examples generated for randomly selected SZ patients from the five datasets, along with the final interpretations generated by the LLM. Only three counterfactuals are shown here; the FC features remaining after feature selection are given in Figure 4. Red indicates increases in FC values between the corresponding regions, while blue indicates decreases. The numerical change between each pair of regions is highlighted. Note that, due to space constraints, the final interpretations generated by the LLM are abbreviated to the key details.

Figure 5 shows that the patient's condition can be brought close to that of a healthy individual by only slightly adjusting the FC values between the corresponding regions. Specifically, in the COBRE dataset, CF1 increases the FC value between CAU.L and AMYG.R from –0.359 to 0.431 and decreases the FC value between PCL.L and IOG.L from 0.282 to –0.681. CF2 and CF3 behave similarly to CF1, with no more than two connections adjusted; indeed, across all five datasets at most two connections are adjusted. For datasets retaining only four features (such as Xiangya and Taiwan), most CF examples require only a single connection change. Furthermore, the magnitude of each FC change after counterfactual explanation stays below 1, indicating that these adjustments have a localized and controllable impact on the patient's brain FC and support a stable state transition. In summary, our method not only helps clinicians identify key FC abnormalities but also provides powerful support for clinical diagnosis.
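To make the idea of a minimal FC adjustment concrete: for a linear decision function, the smallest L2 perturbation that crosses the decision boundary has a closed form. The sketch below is purely illustrative; the weights, bias, margin, and function name are our assumptions, not the paper's counterfactual model:

```python
import numpy as np

def minimal_counterfactual(x, w, b, margin=0.1):
    """Smallest L2 perturbation of x pushing the linear score w.x + b
    past -margin (i.e., toward the 'healthy' side). Illustrative only."""
    score = w @ x + b
    delta = -(score + margin) * w / (w @ w)   # closed-form projection onto the boundary
    return x + delta

w = np.array([0.8, -0.5, 0.3])        # toy weights over three FC features
b = 0.0
x = np.array([0.431, -0.359, 0.282])  # toy FC values
x_cf = minimal_counterfactual(x, w, b)
print(np.round(x_cf, 3), float(w @ x_cf + b))
```

After the adjustment, the score sits exactly at −margin, mirroring the paper's observation that small, targeted FC changes suffice to move a sample across the decision boundary.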

4.6 Cross-dataset validation

To further validate the generalization ability of our proposed method across datasets, we design this set of experiments. Specifically, using the five datasets COBRE, Huaxi, Nottingham, Taiwan, and Xiangya, we train on any four of them and test on the remaining one. This is repeated five times, with each dataset serving as the test set in turn, to ensure the robustness and reliability of the evaluation results.
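This protocol is a standard leave-one-dataset-out loop; a schematic sketch in which `train` and `evaluate` are placeholders standing in for the paper's pipeline (the returned accuracy is a placeholder value, not a result):

```python
datasets = ["COBRE", "Huaxi", "Nottingham", "Taiwan", "Xiangya"]

def train(train_sets):
    # Placeholder: fit the model on the pooled training datasets.
    return {"trained_on": tuple(train_sets)}

def evaluate(model, test_set):
    # Placeholder metric; the real pipeline would score the held-out dataset.
    return 0.83

results = {}
for held_out in datasets:
    train_sets = [d for d in datasets if d != held_out]   # train on the other four
    model = train(train_sets)
    results[f"No-{held_out}"] = evaluate(model, held_out)  # "No-X": X is test-only

print(results)
```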

The experimental results are shown in Figure 6. Our method still performs well in cross-dataset testing. Although performance drops compared with training and testing on a single dataset, overall ACC remains above 83%, demonstrating strong cross-dataset generalization. In the No-Xiangya experiment, the single-dataset SPE is 86.36% versus 86.11% in the cross-dataset setting, a very small difference, indicating that our model adapts well to different datasets. The figure also reports the other evaluation metrics for cross-dataset testing, most of which decrease by no more than 5%, further validating the robustness and stability of our method. In summary, our proposed method not only achieves good performance on a single dataset but also generalizes effectively across datasets.

Figure 6
Bar chart displaying accuracy (ACC), sensitivity (SEN), and specificity (SPE) across different datasets: No-COBRE, No-Huaxi, No-Nottingham, No-Taiwan, and No-Xiangya. ACC ranges from 86.21 to 92.31, SEN from 80.64 to 86.40, and SPE from 82.05 to 88.69. Data shows variation in metrics with each dataset absence.

Figure 6. Cross-dataset performance of our method. "No-" indicates that the dataset is used for testing but not for training; the corresponding value is shown in red. Blue values represent the performance when the dataset is used alone, i.e., for both training and testing, and are consistent with the results in Table 2.

4.7 Ablation experiments

4.7.1 The impact of different LLMs on the results

In this set of experiments, we aim to evaluate the impact of the choice of core LLM component on the model's final performance. We compare five mainstream LLMs on the same five datasets: GPT-5, GPT-4.1, DeepSeek-V3.2, Kimi-k2, and Gemini 2.5 Pro. The results are shown in Figure 7.

Figure 7
Bar chart comparing the accuracy percentages of five models: Kimi, DeepSeek, GPT-4.1, Gemini, and GPT-5, across five datasets: COBRE, Huaxi, Nottingham, Taiwan, and Xiangya. Bars are shown with slight variations in accuracy, with all models achieving around 80% or higher. Each model is represented by a distinct color.

Figure 7. Accuracy of GPT-5, GPT-4.1, DeepSeek-V3.2, Kimi-k2, and Gemini 2.5 Pro on five datasets.

As shown in Figure 7, the choice of LLM significantly impacts overall performance, validating the importance of this core component. Across all five datasets, GPT-5 demonstrates the most stable and superior performance, achieving the highest ACC on every task; in particular, on the Huaxi and Nottingham datasets it improves by approximately 1.61% and 1.59% over the next-best model, Gemini. DeepSeek performs close to GPT-4.1 on the medium-sized datasets (Nottingham and Taiwan) but lags by 3.23% on the largest sample, the Huaxi dataset, suggesting that its capacity or knowledge density still trails the GPT series. Kimi ranks last on all five datasets, with the largest decline on the Nottingham early-SZ task, presumably because of a low proportion of psychiatric text in its pre-training corpus and thus insufficient prior knowledge. We believe this may be related to the composition of each model's pre-training data: the GPT and Gemini series extensively incorporate academic and professional sources (such as arXiv and PubMed) into their training, while DeepSeek and Kimi still rely primarily on general web pages, with relatively limited coverage of specialized text. In summary, while all models possess strong language understanding capabilities, GPT-5, with its stronger contextual modeling and consistency, is the optimal core component for this study.

4.7.2 Comparison of LLM-Lasso and Lasso in FC selection

To evaluate the effectiveness of the LLM-Lasso method against traditional Lasso, we design an ablation experiment under identical parameter settings, focusing on differences in the number and overlap of the FC features selected by the two methods. The results are shown in Figure 8. Lasso selects a significantly larger number of FCs than LLM-Lasso, indicating that the traditional method tends to select more features and thus potentially redundant information; excessive features often cause overfitting, harming generalization ability and diagnostic accuracy. In contrast, LLM-Lasso, by incorporating the biological knowledge provided by the LLM, selects features effectively, retaining key information related to SZ and avoiding interference from redundant features. Further analysis reveals a certain degree of overlap between the FCs selected by LLM-Lasso and those selected by Lasso, which is expected given that LLM-Lasso builds on Lasso. Notably, on the Taiwan dataset, LLM-Lasso identifies an FC (AMYG.R-CAU.L) that traditional Lasso fails to capture; abnormal FC between AMYG.R and CAU.L has been shown to be highly correlated with SZ negative symptoms and motivational deficits (Arnedo et al., 2015). This suggests that traditional Lasso may not adequately weigh the biological importance of this connection, while LLM-Lasso, by incorporating domain knowledge, achieves greater biological interpretability in feature selection. Overall, LLM-Lasso effectively reduces the influence of redundant information and highlights the FCs that are truly clinically significant.
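The overlap analysis above amounts to set operations on the two selected edge sets; a small sketch in which the region pairs are illustrative examples (only the AMYG.R-CAU.L edge is taken from the text, and the sets do not reproduce the paper's actual selections):

```python
def edge(a, b):
    """Undirected FC edge: the order of the two regions does not matter."""
    return frozenset((a, b))

# Illustrative selections, not the paper's full lists.
lasso_sel = {edge("PCL.L", "IOG.L"), edge("SFG.L", "MTG.R"),
             edge("INS.R", "ACG.L"), edge("PCUN.L", "ANG.R")}
llm_lasso_sel = {edge("PCL.L", "IOG.L"), edge("SFG.L", "MTG.R"),
                 edge("AMYG.R", "CAU.L")}   # edge missed by plain Lasso

overlap = lasso_sel & llm_lasso_sel          # selected by both methods
only_llm = llm_lasso_sel - lasso_sel         # unique to LLM-Lasso
jaccard = len(overlap) / len(lasso_sel | llm_lasso_sel)
print(len(overlap), sorted(map(sorted, only_llm)), round(jaccard, 3))
```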

Figure 8
Illustration showing five brain models labeled COBRE, Nottingham, Huaxi, Xiangya, and Taiwan. Each model features interconnected nodes with blue and red lines, representing connectivity patterns in neuroscience.

Figure 8. Differences between our proposed LLM-Lasso and traditional Lasso in the number and overlap of selected FC features. Red connections indicate FC features selected by both methods, while blue connections indicate FC features selected by the Lasso method only. The FC features selected by LLM-Lasso are shown in Figure 4.

5 Discussion

This study proposes an FC analysis method that combines LLM-based feature selection with counterfactual explanations, demonstrating good feature selection and interpretability on multiple SZ datasets. However, it is important to clarify that the method is currently better suited as an exploratory brain imaging research tool than as a clinically applicable diagnostic test. Although we introduce neuroscience knowledge constraints through the LLM, the model still relies on limited samples and heterogeneous scan data, and the counterfactual recommendations have not yet been empirically linked to specific clinical interventions. These factors limit its direct application in personalized diagnosis. In future work, we will validate the stability of the method on larger datasets, gradually moving it from a research tool toward clinical decision support.

Further analysis shows that, at the brain network system level, we observe stable consistency across the datasets, with the selected FC features significantly concentrated in key systems such as the default mode network, sensorimotor network, and limbic system. At the level of specific connectivity edges, however, the datasets exhibit a certain degree of variability, which may be due to differences in scanning equipment or patient groups across hospitals. This suggests that while the method can reliably identify which brain network systems may be affected, it is not yet sufficient to pinpoint the specific FC abnormality in each patient. Future research should incorporate more refined clinical stratification and multimodal data to elucidate the biological significance and clinical value of this edge-level variability.

In addition, we introduce counterfactual explanations for SZ analysis, but the method has limitations. The hyperparameter L determines the number of counterfactual examples generated; we fixed its value for our experiments without a systematic sensitivity analysis or optimization. We recognize that the choice of L may affect counterfactual generation: a small L may yield insufficient diversity among counterfactual samples, while a large L may produce excessive diversity, reducing practical operability. In future research, we plan to improve the counterfactual generation method, particularly how to select and optimize L more effectively. By adjusting L dynamically, we expect to generate more diverse and accurate counterfactual examples, thereby improving the model's explanatory power in complex decision-making scenarios.

6 Conclusion

This study proposes an innovative framework that combines LLM-guided feature selection with counterfactual explanation, providing a new method for FC analysis in SZ. By incorporating prior knowledge from LLM into the feature selection process, FC features closely related to clinical diagnosis are prioritized, thereby improving the accuracy and interpretability of feature selection. At the same time, counterfactual explanation enables the generation of actionable recommendations that can assist clinical intervention, further enhancing the practicality and understandability of the model. The method achieves positive experimental results on five real-world SZ datasets. In combination with more brain disease data and clinical cases in the future, the method is expected to provide important support for the early diagnosis and personalized treatment of SZ.

Data availability statement

The original contributions presented in the study are included in the article/supplementary material, further inquiries can be directed to the corresponding author.

Author contributions

XY: Investigation, Methodology, Writing – original draft. TC: Data curation, Writing – original draft. YH: Resources, Software, Writing – original draft. LG: Validation, Writing – original draft. YS: Methodology, Writing – original draft. SW: Investigation, Writing – review & editing.

Funding

The author(s) declared that financial support was not received for this work and/or its publication.

Acknowledgments

We sincerely appreciate the researchers and institutions that provided the publicly available datasets used in this study, including COBRE, Huaxi, Nottingham, Taiwan, and Xiangya. These datasets have greatly contributed to the advancement of schizophrenia research. Additionally, we acknowledge the efforts of all participants and staff involved in data collection and preprocessing.

Conflict of interest

The author(s) declared that this work was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Generative AI statement

The author(s) declared that generative AI was used in the creation of this manuscript. The author(s) confirm that they did not use any generative AI techniques in the writing and revision of the paper. All text in this paper is the independent work of the author(s). However, during the experimental phase of the research described in this paper, we used large language models (LLMs) as experimental subjects or tools to achieve specific research goals. The author(s) are responsible for the entire content of the paper.

Any alternative text (alt text) provided alongside figures in this article has been generated by Frontiers with the support of artificial intelligence and reasonable efforts have been made to ensure accuracy, including review by the authors wherever possible. If you identify any issues, please contact us.

Publisher's note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

References

Achiam, J., Adler, S., Agarwal, S., Ahmad, L., Akkaya, I., Aleman, F. L., et al. (2023). GPT-4 technical report. arXiv preprint arXiv:2303.08774.

Arnedo, J., Svrakic, D. M., Del Val, C., Romero-Zaliz, R., Hernández-Cuervo, H., and the Molecular Genetics of Schizophrenia Consortium (2015). Uncovering the hidden risk architecture of the schizophrenias: confirmation in three independent genome-wide association studies. Am. J. Psychiatry 172, 139–153. doi: 10.1176/appi.ajp.2014.14040435

Bal-Ghaoui, M., and Sabri, F. (2025). LLM-FS-Agent: a deliberative role-based large language model architecture for transparent feature selection. arXiv preprint arXiv:2510.05935.

Chen, T., and Guestrin, C. (2016). "XGBoost: a scalable tree boosting system," in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 785–794. doi: 10.1145/2939672.2939785

Cheng, F., Ming, Y., and Qu, H. (2020). DECE: decision explorer with counterfactual explanations for machine learning models. IEEE Trans. Vis. Comput. Graph. 27, 1438–1447. doi: 10.1109/TVCG.2020.3030342

Choi, K., Cundy, C., Srivastava, S., and Ermon, S. (2022). LMPriors: pre-trained language models as task-specific priors. arXiv preprint arXiv:2210.12530.

Chyzhyk, D., Savio, A., and Graña, M. (2015). Computer aided diagnosis of schizophrenia on resting state fMRI data by ensembles of ELM. Neural Netw. 68, 23–33. doi: 10.1016/j.neunet.2015.04.002

Comanici, G., Bieber, E., Schaekermann, M., Pasupat, I., Sachdeva, N., Dhillon, I., et al. (2025). Gemini 2.5: pushing the frontier with advanced reasoning, multimodality, long context, and next generation agentic capabilities. arXiv preprint arXiv:2507.06261.

des Touches, T., Munda, M., Cornet, T., Gerkens, P., and Hellepute, T. (2023). Feature selection with prior knowledge improves interpretability of chemometrics models. Chemom. Intell. Lab. Syst. 240:104905. doi: 10.1016/j.chemolab.2023.104905

Fišar, Z. (2023). Biological hypotheses, risk factors, and biomarkers of schizophrenia. Prog. Neuropsychopharmacol. Biol. Psychiatry 120:110626. doi: 10.1016/j.pnpbp.2022.110626

Fonti, V., and Belitser, E. (2017). Feature selection using lasso. VU Amsterdam Research Paper in Business Analytics, 1–25.

Fredes, A., and Vitria, J. (2024). Using LLMs for explaining sets of counterfactual examples to final users. arXiv preprint arXiv:2408.15133.

Hua, J., Tembe, W. D., and Dougherty, E. R. (2009). Performance of feature-selection methods in the classification of high-dimension data. Pattern Recognit. 42, 409–424. doi: 10.1016/j.patcog.2008.08.001

Insel, T. R. (2010). Rethinking schizophrenia. Nature 468, 187–193. doi: 10.1038/nature09552

Jeong, D. P., Lipton, Z. C., and Ravikumar, P. (2024). LLM-Select: feature selection with large language models. arXiv preprint arXiv:2407.02694.

Kaffes, V., Sacharidis, D., and Giannopoulos, G. (2021). "Model-agnostic counterfactual explanations of recommendations," in Proceedings of the 29th ACM Conference on User Modeling, Adaptation and Personalization, 280–285. doi: 10.1145/3450613.3456846

Khorram, S., and Fuxin, L. (2022). "Cycle-consistent counterfactuals by latent transformations," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 10203–10212. doi: 10.1109/CVPR52688.2022.00996

Li, J., and Xiu, X. (2025). LLM4FS: leveraging large language models for feature selection and how to improve it. arXiv preprint arXiv:2503.24157.

Li, K., Wang, F., Yang, L., and Liu, R. (2023). Deep feature screening: feature selection for ultra high-dimensional data via deep neural networks. Neurocomputing 538:126186. doi: 10.1016/j.neucom.2023.03.047

Li, X., Zhou, Y., Dvornek, N., Zhang, M., Gao, S., Zhuang, J., et al. (2021). BrainGNN: interpretable brain graph neural network for fMRI analysis. Med. Image Anal. 74:102233. doi: 10.1016/j.media.2021.102233

Li, Y., Li, W.-X., Zou, Y.-M., Yang, Z.-Y., Xie, D.-J., Yang, Y., et al. (2018). Revisiting the persistent negative symptoms proxy score using the clinical assessment interview for negative symptoms. Schizophr. Res. 202, 248–253. doi: 10.1016/j.schres.2018.07.005

Liu, A., Mei, A., Lin, B., Xue, B., Wang, B., Xu, B., et al. (2025). DeepSeek-V3.2: pushing the frontier of open large language models. arXiv preprint arXiv:2512.02556.

Lynall, M.-E., Bassett, D. S., Kerwin, R., McKenna, P. J., Kitzbichler, M., Muller, U., et al. (2010). Functional connectivity and brain networks in schizophrenia. J. Neurosci. 30, 9477–9487. doi: 10.1523/JNEUROSCI.0333-10.2010

McCutcheon, R. A., Keefe, R. S., and McGuire, P. K. (2023). Cognitive impairment in schizophrenia: aetiology, pathophysiology, and treatment. Mol. Psychiatry 28, 1902–1918. doi: 10.1038/s41380-023-01949-9

McCutcheon, R. A., Marques, T. R., and Howes, O. D. (2020). Schizophrenia—an overview. JAMA Psychiat. 77, 201–210. doi: 10.1001/jamapsychiatry.2019.3360

Melistas, T., Spyrou, N., Gkouti, N., Sanchez, P., Vlontzos, A., Panagakis, Y., et al. (2024). "Benchmarking counterfactual image generation," in Advances in Neural Information Processing Systems, 133207–133230. doi: 10.52202/079017-4233

Meszlényi, R. J., Buza, K., and Vidnyánszky, Z. (2017). Resting state fMRI functional connectivity-based classification using a convolutional neural network architecture. Front. Neuroinform. 11:61. doi: 10.3389/fninf.2017.00061

Mhiri, I., and Rekik, I. (2020). Joint functional brain network atlas estimation and feature selection for neurological disorder diagnosis with application to autism. Med. Image Anal. 60:101596. doi: 10.1016/j.media.2019.101596

Mothilal, R. K., Sharma, A., and Tan, C. (2020). "Explaining machine learning classifiers through diverse counterfactual explanations," in Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency, 607–617. doi: 10.1145/3351095.3372850

Mutian, O., Thomas, J. J., Tianzhou, Y., and Fiore, U. (2025). LLM-guided semantic feature selection for interpretable financial market forecasting in low-resource financial markets. Franklin Open 12:100359. doi: 10.1016/j.fraope.2025.100359

Naheed, N., Shaheen, M., Khan, S. A., Alawairdhi, M., and Khan, M. A. (2020). Importance of features selection, attributes selection, challenges and future directions for medical imaging data: a review. Comput. Model. Eng. Sci. 125, 314–344. doi: 10.32604/cmes.2020.011380

Oh, J.-S., and Lee, J.-Y. (2025). Latent self-consistency for reliable majority-set selection in short- and long-answer reasoning. arXiv preprint arXiv:2508.18395.

Palaniyappan, L., and Liddle, P. F. (2012). Does the salience network play a cardinal role in psychosis? An emerging hypothesis of insular dysfunction. J. Psychiatry Neurosci. 37, 17–27. doi: 10.1503/jpn.100176

Poyiadzi, R., Sokol, K., Santos-Rodriguez, R., De Bie, T., and Flach, P. (2020). "FACE: feasible and actionable counterfactual explanations," in Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society, 344–350. doi: 10.1145/3375627.3375850

Richens, J. G., Lee, C. M., and Johri, S. (2020). Improving the accuracy of medical diagnosis with causal machine learning. Nat. Commun. 11:3923. doi: 10.1038/s41467-020-17419-7

Shen, H., Wang, L., Liu, Y., and Hu, D. (2010). Discriminative analysis of resting-state functional connectivity patterns of schizophrenia using low dimensional embedding of fMRI. Neuroimage 49, 3110–3121. doi: 10.1016/j.neuroimage.2009.11.011

Tang, B., Yao, L., Strawn, J. R., Zhang, W., and Lui, S. (2025). Neurostructural, neurofunctional, and clinical features of chronic, untreated schizophrenia: a narrative review. Schizophr. Bull. 51, 366–378. doi: 10.1093/schbul/sbae152

Team, K., Bai, Y., Bao, Y., Chen, G., Chen, J., Chen, N., et al. (2025). Kimi K2: open agentic intelligence. arXiv preprint arXiv:2507.20534.

Tian, Y., and Zalesky, A. (2021). Machine learning prediction of cognition from functional connectivity: are feature weights reliable? Neuroimage 245:118648. doi: 10.1016/j.neuroimage.2021.118648

Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. J. R. Stat. Soc. Series B 58, 267–288. doi: 10.1111/j.2517-6161.1996.tb02080.x

Ting, C.-M., Ombao, H., Salleh, S.-H., and Abd Latif, A. Z. (2018). Multi-scale factor analysis of high-dimensional functional connectivity in brain networks. IEEE Trans. Netw. Sci. Eng. 7, 449–465. doi: 10.1109/TNSE.2018.2869862

Ustun, B., Spangher, A., and Liu, Y. (2019). "Actionable recourse in linear classification," in Proceedings of the Conference on Fairness, Accountability, and Transparency, 10–19. doi: 10.1145/3287560.3287566

Verma, S., Boonsanong, V., Hoang, M., Hines, K., Dickerson, J., and Shah, C. (2024). Counterfactual explanations and algorithmic recourses for machine learning: a review. ACM Comput. Surv. 56, 1–42. doi: 10.1145/3677119

Wachter, S., Mittelstadt, B., and Russell, C. (2017). Counterfactual explanations without opening the black box: automated decisions and the GDPR. Harv. JL Tech. 31:841. doi: 10.2139/ssrn.3063289

Wang, C., Li, X.-H., Han, H., Wang, S., Wang, L., Cao, C. C., et al. (2021). "Counterfactual explanations in explainable AI: a tutorial," in Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining, 4080–4081. doi: 10.1145/3447548.3470797

Wang, J., Chen, J., Zhou, R., Gao, Y., and Li, J. (2022). Machine learning-based multiparametric MRI radiomics for predicting poor responders after neoadjuvant chemoradiotherapy in rectal cancer patients. BMC Cancer 22:420. doi: 10.1186/s12885-022-09518-z

Whitfield-Gabrieli, S., Thermenos, H. W., Milanovic, S., Tsuang, M. T., Faraone, S. V., McCarley, R. W., et al. (2009). Hyperactivity and hyperconnectivity of the default network in schizophrenia and in first-degree relatives of persons with schizophrenia. Proc. Natl. Acad. Sci. U.S.A. 106, 1279–1284. doi: 10.1073/pnas.0809141106

Xue, H., Yang, Q., and Chen, S. (2009). "SVM: support vector machines," in The Top Ten Algorithms in Data Mining (Chapman and Hall/CRC), 51–74. doi: 10.1201/9781420089653-10

Yamada, M., Jitkrittum, W., Sigal, L., Xing, E. P., and Sugiyama, M. (2014). High-dimensional feature selection by feature-wise kernelized lasso. Neural Comput. 26, 185–207. doi: 10.1162/NECO_a_00537

Yang, L., Kenny, E. M., Ng, T. L. J., Yang, Y., Smyth, B., and Dong, R. (2020). Generating plausible counterfactual explanations for deep transformers in financial text classification. arXiv preprint arXiv:2010.12512.

Zhang, E., Goto, R., Sagan, N., Mutter, J., Phillips, N., Alizadeh, A., et al. (2025). LLM-Lasso: a robust framework for domain-informed feature selection and regularization. arXiv preprint arXiv:2502.10648.

Zhang, X., Braun, U., Harneit, A., Zang, Z., Geiger, L. S., Betzel, R. F., et al. (2021). Generative network models of altered structural brain connectivity in schizophrenia. Neuroimage 225:117510. doi: 10.1016/j.neuroimage.2020.117510

Zhu, C., Tan, Y., Yang, S., Miao, J., Zhu, J., Huang, H., et al. (2024). Temporal dynamic synchronous functional brain network for schizophrenia classification and lateralization analysis. IEEE Trans. Med. Imag. 43, 4307–4318. doi: 10.1109/TMI.2024.3419041

Keywords: counterfactual explanation, feature selection, functional connectivity, large language model, schizophrenia

Citation: Yuan X, Chen T, He Y, Gu L, Sun Y and Wei S (2026) LLM-based feature selection and counterfactual explanations applied to functional connectivity analysis in schizophrenia. Front. Neurosci. 19:1732013. doi: 10.3389/fnins.2025.1732013

Received: 25 October 2025; Revised: 10 December 2025;
Accepted: 17 December 2025; Published: 12 January 2026.

Edited by:

Wei Wang, Capital Medical University, China

Reviewed by:

Nikolaos Smyrnis, National and Kapodistrian University of Athens, Greece
Eunsong Kang, Kangwon National University, Republic of Korea

Copyright © 2026 Yuan, Chen, He, Gu, Sun and Wei. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Shaolong Wei, d2Vpc2hhb2xvbmczN0BnbWFpbC5jb20=

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.