CGSDA: inferring snoRNA-disease associations via ChebNetII and GatedGCN

Zou, Yongfu; Lu, Yusong; Lu, Shanghui; Wei, Zhanliang; Li, Le; Liao, Shuilin; Zeng, Ting; Zhang, Yi; Miao, Rui

doi:10.3389/fgene.2025.1684484

ORIGINAL RESEARCH article

Front. Genet., 16 October 2025

Sec. Computational Genomics

Volume 16 - 2025 | https://doi.org/10.3389/fgene.2025.1684484

This article is part of the Research TopicGenomic Medicine, Therapeutics, and Health: Thai Symposium & Workshop in BangkokView all articles

CGSDA: inferring snoRNA-disease associations via ChebNetII and GatedGCN

Yongfu Zou¹

Yusong Lu¹*

Shanghui Lu¹

Zhanliang Wei²

Le Li³

Shuilin Liao⁴

Ting Zeng⁴

Yi Zhang⁵

Rui Miao³

¹School of Mathematics and Physics, Hechi University, Hechi, China
²Department of Neurosurgery, The Second Nanning People’s Hospital, Nanning, China
³Basic Teaching Department, Zhuhai Campus of Zunyi Medical University, Zhuhai, China
⁴Faculty of Innovation Enginee, Macau University of Science and Technology, Taipa, China
⁵School of Tourism and Culture, Nanning Normal University, Nanning, China

Introduction: Recent biomedical studies have highlighted the pivotal role of non-coding RNAs (ncRNAs) in gene regulatory networks, where they influence gene expression, cellular function, and the onset and progression of various diseases. Among these, small nucleolar RNAs (snoRNAs), a prominent class of small ncRNAs, have attracted considerable research attention over the past two decades. Initially recognized for their involvement in rRNA processing and modification, snoRNAs are now understood to contribute to broader biological processes, including the regulation of disease mechanisms, maintenance of cellular homeostasis, and development of targeted therapeutic strategies. With ongoing advancements, snoRNAs are increasingly regarded as promising candidates for novel therapeutic agents in cancer, neurodegenerative disorders, endocrine conditions, and cardiovascular diseases. Consequently, there is a growing demand for efficient, cost-effective, and environment-independent approaches to study snoRNAs, which has driven the adoption of computational methodologies in this domain.

Methods: In this work, we propose a novel predictive framework, CGSDA, which integrates a ChebNetII convolutional network with a gated graph sequence neural network to identify potential snoRNA–disease associations. The model begins by constructing a snoRNA–disease association network, embedding residual mechanisms into both modules to effectively capture the representations of snoRNAs and diseases. These representations are then fused and dimensionally reduced, after which the refined embeddings are fed into a predictor to generate association predictions.

Results: Experimental evaluation demonstrates that CGSDA consistently outperforms baseline models in predictive accuracy. Ablation experiments were conducted to assess the contribution of each module, confirming that all components substantially enhance overall performance and validating the robustness of the proposed method. Furthermore, case studies on lung cancer and breast cancer showed that 10 out of the top 15 and 12 out of the top 15 predicted snoRNA-disease associations were validated by existing literature, respectively, confirming the model’s effectiveness in identifying potential novel snoRNA-disease associations.

Discussion: The implementation of CGSDA, along with relevant datasets, is publicly available at: https://github.com/cuntjx/CGSDA. This public release enables the research community to further validate and apply the framework, supporting advancements in computational identification of snoRNA–disease associations and facilitating progress in snoRNA-based therapeutic development, and ultimately benefiting human health.

1 Introduction

SnoRNAs are a class of ncRNAs predominantly located in the nucleolus of eukaryotic cells, where they play critical roles in RNA modification. They are generally categorized into two major groups: H/ACA box snoRNAs and C/D box snoRNAs. H/ACA box snoRNAs are primarily responsible for guiding pseudouridylation, whereas C/D box snoRNAs direct site-specific methylation. Accumulating evidence indicates that methylation and pseudouridylation, both mediated by snoRNAs, are essential processing steps in the maturation of precursor rRNA into functional rRNA. Notably, a single snoRNA molecule typically serves as a guide for no more than two individual RNA modification sites. Beyond these canonical functions, some snoRNAs also exhibit non-traditional roles; for example (Ender et al., 2008; Babiarz et al., 2008; Taft et al., 2009), and HBII-180C (Ono et al., 2010) can function as microRNAs (miRNAs).

1.1 SnoRNA involmente in disease

Emerging evidence demonstrates that snoRNAs are differentially expressed and participate in key biological processes such as apoptosis, proliferation, and differentiation. Owing to their multidimensional regulatory functions, elucidating the molecular mechanisms through which snoRNAs influence disease development has become a major frontier in biomedical research over the past 2 decades. For instance, Tang et al. (2019) investigated the role of snoRNAs in non-small cell lung cancer (NSCLC) using both in vitro and in vivo loss-of-function analyses. Their study revealed that SNORA71A functions as an oncogene in NSCLC and contributes to disease progression. Similarly, Liu et al. (2022) examined the role of SNORD1C in colorectal cancer through multiple experimental approaches, demonstrating that this snoRNA is involved in several tumor-related processes and plays a critical role in cancer progression. Xu et al. (2014) employed quantitative RT-PCR to assess snoRNA expression in tissue samples from over one hundred hepatocellular carcinoma (HCC) patients, further validating their findings using cell-based experiments and a xenograft nude mouse model. Their results showed that upregulation of SNORD113-1 inhibited HCC growth, suggesting its potential utility as both a diagnostic biomarker and therapeutic target. Beyond these individual studies, a growing body of evidence underscores the involvement of snoRNAs in diverse human diseases. In 2022, Huang et al. (2022) published a systematic review summarizing the biological functions and mechanistic roles of snoRNAs in tumor pathophysiology. Subsequently, Chabronova et al. (2024) provided a comprehensive review focusing on snoRNAs in cardiovascular development, physiology, and heart-related disorders. Their work not only consolidated current findings but also highlighted the potential clinical applications of snoRNAs in cardiovascular medicine. In the same year, Shen et al. (2024) conducted another systematic review that emphasized the diagnostic and therapeutic significance of snoRNAs across multiple disease contexts.

In addition to studies on human diseases, several investigations have explored the association between snoRNAs and animal pathologies (Omonijo et al., 2023; Regårdh et al., 2022; Anderson et al., 2022). Such cross-species comparative studies not only confirm the broad involvement of snoRNAs in disease processes across species but also help elucidate the conserved functional mechanisms of snoRNAs in disease development. Therefore, exploring the association between snoRNAs and diseases can help reveal the complex mechanisms of disease occurrence and development and, ultimately, benefit clinical applications.

1.2 Computational approaches for snoRNA–disease association prediction

As a methodological system characterized by high efficiency in processing large-scale data, strong scalability, and cross-domain adaptability, computational approaches have become deeply embedded in multidisciplinary and interdisciplinary research fields such as bioinformatics, drug discovery, and systems biology. These methods, supported by the rapid improvement in the cost–performance ratio of computational resources and the iterative advancement of deep learning architectures, now represent a central paradigm for the integration and analysis of multimodal data. In the context of snoRNA–disease association prediction, computational methods have been extensively applied, offering advantages of low cost, high timeliness, and broad generalizability when used as guidance tools. To the best of our knowledge, the first computational framework developed for this purpose was the iSnoDi-LSGT model, proposed by Zhang and Liu (2022). This model integrates dual constraints with topological node embeddings to predict potential snoRNA–disease associations. Building upon this foundation, several computational frameworks have since emerged. For example, Sun et al. (2022) introduced the PSnoD model, which draws inspiration from matrix completion techniques. Their framework constructs three networks and incorporates bounded nuclear norm regularization into a matrix completion strategy to enhance prediction Acc. Momanyi et al. (2024) proposed SAGESDA, a graph neural network–based approach that fuses multiple networks to build snoRNA–disease heterogeneous networks. They then applied the GraphSAGE algorithm to learn node representations, followed by a dot product classifier for association inference.

Zhang et al. further advanced the field by developing GCLSDA, a method based on a lightweight graph convolutional network (GCN). This approach first extracts node representations of snoRNAs and diseases via a light GCN (LightGCN), and then applies a contrastive learning mechanism to mitigate the challenges of sparse correlation matrices and node embedding over-smoothing, thereby improving model performance (Zhang et al., 2023). Similar GCN-based or variant methods include GCNSDA (Liu et al., 2021), GCASDA (Liu et al., 2024) and IGCNSDA (Hu et al., 2024). Beyond GCNs, Muna et al. (2025) proposed GBDTSVM, a hybrid framework combining two classical machine learning algorithms: gradient-boosted decision trees (GBDT) and support vector machines (SVM). In this method, GBDT is used to extract node features, which are subsequently passed to the SVM classifier for snoRNA–disease prediction. More recently, La Rosa et al. proposed GL4SDA, a novel framework that integrates multiple modalities. This model derives snoRNA representations from their secondary structures, leverages large language models (LLMs) to generate disease features, and finally employs a graph neural network with an attention mechanism to predict snoRNA–disease associations. As shown in Table 1, we have provided a brief summary of the state-of-the-art methods.

Table 1

Table 1. The brief summary of the state-of-the-art methods.

Although the past 2 decades have yielded substantial evidence supporting snoRNA–disease associations, the systematic collection of relevant data has lagged behind. This limitation has hindered the effective application of computational methods in this field, as these approaches typically require large-scale datasets to achieve optimal performance. Consequently, the relatively limited availability of high-quality data has been one of the key factors restricting broader computational applications in snoRNA–disease association prediction. Furthermore, there are relatively few high-performance models in the field of snoRNA-disease association prediction, and most of them adopt a single graph neural network for feature extraction or association prediction, which may bring the following adverse impacts on model performance. First, the over-smoothing of graph neural networks can degrade model performance. Second, relying solely on one type of neural network for feature extraction may exert an adverse impact on model performance due to the learning bias of the neural network. To address this gap and to further advance the development and optimization of predictive models, we propose a novel framework, CGSDA, designed to identify potential snoRNA–disease associations. CGSDA integrates two GNN modules: the ChebNetII convolutional network (ChebNetII) and gated graph convolutional network (Gated). The framework operates in three main stages. First, a snoRNA–disease association network is constructed, with a residual mechanism embedded into both modules to alleviate over-smoothing during representation learning. Next, the node embeddings learned by the two graph neural network modules are fused and subjected to dimensionality reduction, aiming to mitigate the adverse impact of the learning bias of a single neural network on model performance. Finally, the reduced-dimensional representations are passed into an inner product decoder to generate predictions of potential snoRNA–disease associations. Comparative experiments demonstrate that CGSDA consistently outperforms baseline models in prediction Acc. Furthermore, ablation studies reveal that each component of the framework contributes significantly to overall performance, thereby confirming the effectiveness and robustness of the proposed model.

2 Materials and methods

2.1 Dataset

In this study, two datasets (i.e., dataset SDAD and dataset MNDR) were employed. Specifically, Dataset SDAD was curated by La Rosa et al. (2025) and served as a subset of the RNADisease v4.0 database. SDAD encompasses 60 diseases, 384 snoRNAs, and 911 experimentally validated snoRNA-disease associations. Dataset MNDR, by contrast, was collected by Sun et al. (2022), and it contains 27 diseases, 220 snoRNAs, and 459 experimentally validated snoRNA-disease associations. Detailed information about the SDAD dataset is available at https://github.com/BCB4PM/GL4SDA. Dataset MNDR can be downloaded from https://github.com/linDing-groups/PSnoD or https://github.com/mariamuna04/gbdtsvm. We place the basic information of the two datasets in Table 2. In this work, we represent the snoRNA–disease association network as a graph, where snoRNAs and diseases serve as the nodes. We use $N_{S}$ and $N_{D}$ to denote the number of snoRNAs and diseases, respectively, and thus we have $N_{S} = 384$ and $N_{D} = 60$ in SDAD dataset. Here, we use $S$ to denote the set consisting of snoRNAs, denoted as $S = {s_{1}, \dots, s_{N_{s}}}$ . Similarly, we use $D$ to denote the set consisting of diseases, denoted as $D = {d_{1}, \dots, d_{N_{D}}}$ . The complete set of nodes is then given by $V = {S, D}$ . We define the adjacency (association) matrix of the snoRNA–disease association network as $A$ , where each element $i$ th row and $j$ th column is specified according to Equation 1:

A_{i j} = \{\begin{cases} 1, & \begin{aligned} if snoRNA s_{i} (1 \leq i \leq N_{S}) is associated \\ with disease d_{j} (1 \leq j \leq N_{D}) \end{aligned} \\ 0, & unconfirmed or unknown \end{cases} (1)

Table 2

Table 2. The basic information about SDAD and MNDR dataset.

During the training phase, all experimentally validated snoRNA–disease associations are treated as positive samples, while the remaining unobserved pairs are regarded as negative samples. To construct the training set, a subset of the positive samples is randomly removed from the original snoRNA–disease association matrix $A$ . The resulting matrix, denoted as $A_{train}$ , is then used for model training.

2.2 Initial features of snoRNA

There are numerous methods for extracting features from RNA molecules, and the choice of method can significantly influence downstream tasks. Among them, the k-mer algorithm is one of the most widely used approaches due to its efficiency and broad applicability. It is implemented in several feature extraction tools such as Jellyfish, KMC, and Kraken. The k-mer algorithm has two key capabilities: (1) it counts k-mer occurrence frequencies to capture sequence composition patterns, repetitive regions, and mutation hotspots; and (2) it can be adapted to different applications by varying the value of k. For instance, setting $k = 6$ allows for the characterization of six-nucleotide conserved regions within RNA sequences, which can help identify potential miRNA binding sites. Recent studies have systematically evaluated the impact of different k values on the performance of various downstream tasks (e.g., Vinje et al. (2015); Chor et al. (2009); Tahara et al. (2023)). Despite its advantages, the k-mer algorithm faces challenges. Extremely long or short sequences may compromise feature quality, and conventional k-mer approaches typically focus only on linear sequence information, overlooking secondary structural features. However, accumulating evidence demonstrates that RNA secondary structure plays a critical role in determining function. For example, studies have confirmed that snoRNA activity is strongly influenced by its structural conformation (Ganot et al., 1997; Kiss-László et al., 1996). This highlights the necessity of employing feature extraction algorithms that incorporate structural information.

One such method is nRC, a tool specifically developed by Fiannaca et al. (2017) for non-coding RNA (ncRNA) feature extraction, differs from other methods that overlook ncRNA structural features in that it can integrate the secondary structure information of ncRNAs into feature representation, thereby improving feature quality. Previous studies (e.g., La Rosa et al. (2025)) have verified that nRC outperforms baseline methods in ncRNA-related prediction tasks, confirming its suitability for snoRNA feature extraction. Therefore, we directly used the features extracted by nRC as the initial features of snoRNAs in the CGSDA model.

2.3 Initial features of disease

There are multiple approaches to representing disease features. One of the earliest methods was one-hot encoding based on disease categories. However, while effective for simple classification, this representation has shown limited performance in more complex tasks. Another widely adopted approach is disease semantic similarity, introduced by Xuan et al. (2013), Schriml et al. (2012), which leverages the MeSH database (https://www.ncbi.nlm.nih.gov/) to compute ontology-based similarity between diseases (Lu et al., 2022; Ouyang et al., 2024). This method has since been applied extensively across several studies. Importantly, many diseases are described in detail through textual resources, including clinical manifestations, pathogenic mechanisms, diagnostic criteria, classification schemes, and progression patterns. These textual descriptions capture rich and evolving knowledge about diseases, yet traditional feature extraction methods often overlook this valuable source of information.

Large Language Models (LLMs) can effectively understand the contextual information of text and generate corresponding numerical features as needed. The bge-icl-en tool, developed by Li et al. (2024) from the Beijing Academy of Artificial Intelligence (BAAI), is capable of converting English textual information into word embeddings. This tool adopts an open-source model and can be downloaded and deployed by users via https://github.com/FlagOpen/FlagEmbedding. We chose to generate the initial features of diseases by inputting the textual summaries of diseases from the MalaCards database (Rappaport et al., 2013) into the bge-icl-en tool, based on the two key considerations.

First, the MalaCards database (https://www.malacards.org/) integrates dozens of data sources and provides detailed textual descriptions for more than 16,000 human diseases. The textual information about the disease in MalaCards, including the status and classification of the disease, pathogenesis, characteristics and impacts of the disease, etiology and risk factors, preventive interventions, disease management and prognosis, contains rich semantic details, which are crucial for distinguishing disease characteristics. Unlike traditional methods (e.g., one-hot encoding, ontology-based semantic similarity), LLMs (such as bge-icl-en) can capture contextualized semantic information, avoiding the limitations of disease feature representation.

Second, the bge-icl-en tool has demonstrated excellent performance in multiple text embedding benchmark tests (Li et al., 2024) and supports open-source deployment, which not only ensures the accuracy of disease feature extraction but also guarantees the reproducibility of experimental results.

Specifically, we used the “Summary” field in the disease entries of MalaCards as the input to bge-icl-en, and the generated embedding vectors were used as the initial features of diseases.

2.4 CGSDA

Spectral convolutional networks represent an important class of graph neural networks with broad applications across diverse tasks. Among the most representative models are ChebNet (Defferrard et al., 2016) and GCN (Kipf and Welling, 2016), both of which perform spectral graph convolutions using Chebyshev polynomials. Notably, GCN can be viewed as a simplified variant of ChebNet, as it relies only on the first two Chebyshev polynomials. Despite this simplification, GCN often outperforms ChebNet in practice. To address this limitation, the ChebNetII model was introduced, which enhances Chebyshev polynomial approximation through Chebyshev interpolation, thereby mitigating the Runge phenomenon and improving model performance. Given its demonstrated superiority over baseline models, we adopted ChebNetII in this study to extract feature representations of snoRNAs and diseases.

Li et al. (2015) integrated gated loop units, optimization techniques, and graph neural networks to propose GGNNs, which are more effective than traditional sequence models in extracting node embeddings from graphs. As is well known, deep learning models exhibit “preference learning” toward certain samples. To enhance both performance and robustness, we combine the ChebNetII model with the Gated model, introducing the CGSDA framework for predicting potential snoRNA–disease associations. The overall structure of the CGSDA model is illustrated in Figures 1, 3. At a high level, the core process of the model can be divided into three main steps.

Figure 1

Diagram illustrating a neural network model with layers. It features two branches showing one-layer multi-layer perceptrons (MLP) applied to snoRNA and disease data, respectively. These are merged through concatenation into the ChebNetII Convolutional Network. The process progresses through the first and ith layers, with concatenations and graph representations, ending in outputs labeled $F^{(\text{MLP})}$, $F_{\text{Cheb}}^{(1)}$, and $F_{\text{Cheb}}^{(l)}$.

Figure 1. The overview of CGSDA framework. Embedding extracted by ChebNetII.

Step 1. The initial features of snoRNAs and diseases are first passed through a single-layer multilayer perceptron (MLP) for dimensionality reduction. The resulting downscaled features are then concatenated and input into the multilayer ChebNetII network. To mitigate the negative effects of the “oversmoothing” problem commonly observed in graph neural networks, we introduce a residual mechanism that concatenates the downscaled representations with the inputs of each ChebNetII layer. Finally, the output of the terminal ChebNetII layer is fed into another single-layer MLP for further refinement and dimensionality reduction of the node embeddings, yielding the final output $F_{Cheb}^{(L)}$ of the ChebNetII module. The detailed process is illustrated in Figure 1.

Step 2. In this step, the ChebNetII module is replaced with the Gated module, while following a similar process to extract snoRNA and disease features. Specifically, the initial features of snoRNAs and diseases are first reduced in dimension using a single-layer MLP. To address the oversmoothing issue in GNNs and its adverse effect on model performance, the reduced features are then concatenated into the multilayer GGNN, enhanced with a residual mechanism. After processing, the Gated module produces the final output $F_{Gated}^{(M)}$ . The detailed workflow is illustrated in Figure 2.

Figure 2

Diagram illustrating a neural network architecture for processing snoRNA and disease data. The process begins with one-layer MLPs applied separately to snoRNA and disease inputs, represented as grids, generating outputs $F^0_s$ and $F^0_d$. These are concatenated and further processed by a series of gated graph sequence neural networks, depicted with interconnected blue and yellow nodes. The output transitions through multiple layers, each resulting in progressively transformed matrices $F^{\text{MLP}}$ to $F^{\text{Gated}}$. Each stage involves a concatenation followed by gated graph transformations.

Figure 2. The overview of CGSDA framework. Embedding extracted by gated graph sequence neural networks.

Step 3. In the final step, the embeddings $F_{Cheb}^{(L)}$ and $F_{Gated}^{(M)}$ , obtained from Steps 1 and 2, are concatenated to form the final node embedding $F_{Final}$ . An inner product is then computed between $F_{Final}$ and its transpose $F_{Final}^{T}$ to predict potential snoRNA–disease associations. This process is formalized in Equations 14, 15, and the detailed workflow is illustrated in Figure 3.

Figure 3

Flowchart illustrating a process for predicting snoRNA-disease associations. Two matrices, $ F_{Cheb}^{(L)} $ and $ F_{Gated}^{(M)} $, are concatenated to form $ F_{Final} $. An inner product of $ F_{Final} $ and its transpose is taken, followed by a nonlinear activation function, resulting in the predicted association matrix.

Figure 3. The overview of CGSDA framework. Embedding fusions and predicting potential snoRNA-disease associations.

2.4.1 Embedding extracted by ChebNetII

To accommodate both the ChebNetII and Gated modules, we define the association matrix and the initial feature representations using Equations 2, 3. Specifically, $F_{s}^{(0)}$ denotes the initial features of snoRNAs extracted with the nRC tool, while $F_{d}^{(0)}$ represents the initial features of diseases generated by the bge-icl-en tool. Furthermore, $M L P$ refers to a single-layer MLP network.

Y = [\begin{matrix} 0 & A_{train} \\ A_{train}^{T} & 0 \end{matrix}], (2)

F^{(M L P)} = [\begin{matrix} F_{s}^{(M L P)} \\ F_{d}^{(M L P)} \end{matrix}], F_{s}^{(M L P)} = M L P (F_{s}^{(0)}), F_{d}^{(M L P)} = M L P (F_{d}^{(0)}) . (3)

For the ChebNetII module, the process of learning node embeddings can be formally expressed as Equations 4:

F_{Cheb}^{(m + 1)} = \frac{2}{K + 1} \sum_{k = 0}^{K} \sum_{j = 0}^{K} γ_{j} T_{k} (x_{j}) T_{k} (\hat{L}) F_{Cheb}^{(m)}, m = 0, \dots, M . (4)

In this equation, $M$ denotes the number of network layers in the ChebNetII module, and $F_{Cheb}^{(0)} = F^{(M L P)}$ . The parameter $K$ represents the order of the Chebyshev polynomial, where $T_{k} (x_{j}) = c o s (k \cdot a r c c o s (x_{j}))$ defines the Chebyshev polynomial and $x_{j} = c o s (\frac{(j + 0.5) π}{K + 1})$ . The polynomial order $K$ is a critical parameter linking the theoretical foundation of Chebyshev polynomials to their practical applications. Both theory and empirical studies have shown that lower-order polynomials are effective for approximating smooth and gradually varying functions, whereas higher-order polynomials are required to capture more complex functional relationships. However, although increasing the order improves the model’s capacity to fit complex functions and enhances approximation Acc., it also leads to higher computational costs and potential overfitting. Therefore, selecting an appropriate order requires balancing Acc., efficiency, and stability according to the specific task. Notably, ChebNetII becomes approximately equivalent to GCN when $K = 1$ . Furthermore, $\hat{L}$ represents the scaled Laplacian matrix of the adjacency matrix $Y$ and $γ_{j}$ are the learnable parameters for $j = 0,1, \dots, K$ .

2.4.2 Embedding extracted by GatedGCN

Similar to the ChebNetII module, the GatedGCN module begins by feeding the initial features of snoRNAs and diseases into a single-layer MLP for learning and dimensionality reduction. The resulting feature representations are then concatenated and reintroduced into the GatedGCN network to generate node embeddings. The learning process of node representations within the convolutional layers of the GatedGCN network can be formally described by Equations 5–7:

h_{v}^{(0)} = F_{v}^{(M L P)} ‖ 0, v \in V . (5)

α_{v}^{(l + 1)} = \sum_{u \in N (v)} e_{u, v} \cdot Θ^{(l)} \cdot h_{v}^{(l)}, l = 0, \dots, L . (6)

h_{v}^{(l + 1)} = G R U (α_{v}^{(l + 1)}, h_{v}^{(l)}) . (7)

Where $L$ denotes the sequence length of the convolutional layer in the GatedGCN module. $h_{v}^{(l)}, l = 0, \dots, L$ represents the embedding vector of node $v$ at the $l$ -th convolutional layer, while $F_{v}^{(M L P)}$ is the representation of the node’s initial features after passing through a single-layer MLP. The symbol $‖$ denotes a concatenation operation. $N (v)$ refers to the set of nodes adjacent to node $v$ , and $e_{u, v}$ indicates the edge weight from source node $u$ to target node $v$ , with a default value of 1. $Θ^{(l)}$ denotes the learnable parameter matrix, while the gated recurrent unit (GRU) regulates the update of node embeddings. The operations of the GRU can be expressed as Equations 8–11.

β_{v}^{(l + 1)} = σ (W_{α β}^{(l + 1)} α_{v}^{(l + 1)} + b_{α β}^{(l + 1)} + W_{β}^{(l + 1)} h_{v}^{(l)} + b_{β}^{(l + 1)}), l = 0, \dots, L, (8)

ζ_{v}^{(l + 1)} = σ (W_{α ζ}^{(l + 1)} α_{v}^{(l + 1)} + b_{α ζ}^{(l + 1)} + W_{ζ}^{(l + 1)} h_{v}^{(l)} + b_{ζ}^{(l + 1)}), (9)

η_{v}^{(l + 1)} = t a n h (W_{α η}^{(l + 1)} α_{v}^{(l + 1)} + b_{α η}^{(l + 1)} + β_{v}^{(l + 1)} ⊙ (W_{η}^{(l + 1)} h_{v}^{(l)} + b_{η}^{(l + 1)})), (10)

h_{v}^{(l + 1)} = (1 - ζ_{v}^{(l + 1)}) ⊙ η_{v}^{(l + 1)} + ζ_{v}^{(l + 1)} ⊙ h_{v}^{(l)} . (11)

Where $σ$ is the sigmoid function, $W_{α β}$ , $W_{β}$ , $W_{α ζ}$ , $W_{ζ}$ , $W_{α η}$ , $W_{η}$ , $b_{α β}$ , $b_{β}$ , $b_{α ζ}$ , $b_{ζ}$ , $b_{α η}$ , $b_{η}$ are learnable parameters, and $⊙$ is the Hadamard product.

2.4.3 Residual mechanism and feature fusion

To mitigate the negative impact of “oversmoothing” on GNN performance, we incorporate a residual mechanism into the CGSDA model. As illustrated in Figure 1, the output features of the single-layer MLP are directly connected to the inputs of each layer in both the ChebNetII and GatedGCN modules. This operation can be formally expressed by Equations 12, 13.

F_{Cheb}^{(l + 1)} = C h e b N e t I I (Y, F_{Cheb}^{(l)} ‖ F^{(M L P)}), (12)

F_{Gated}^{(l + 1)} = G a t e d G C N (Y, F_{Gated}^{(l)} ‖ F^{(M L P)}) . (13)

Deep learning models often exhibit a “preferential learning” bias toward certain samples, which can negatively affect overall performance. To address this issue and enhance both the performance and robustness of the model, we fuse the embeddings obtained from the ChebNetII and GatedGCN modules, as defined in Equation 14.

F_{FINAL} = F_{Cheb}^{(L)} ‖ F_{Gated}^{(M)} . (14)

Where $F_{Cheb}^{(L)}$ and $F_{Gated}^{(M)}$ denote the outputs of the last layer of the ChebNetII and GatedGCN networks, respectively.

2.4.4 Inner product decoder

After obtaining the final fused embeddings via Equation 14, we employed an inner product decoder to compute the predicted score for potential snoRNA-disease associations–this score quantifies the likelihood of a functional association between a given snoRNA and disease, serving as the core output of the CGSDA model for both training and inference. The calculation of the predicted score can be formalized in Equation 15:

\hat{Y} = σ (F_{FINAL} \times {(F_{FINAL})}^{T}) . (15)

where $\hat{Y}$ denotes the predicted association scores between snoRNAs and diseases. The operation $F_{FINAL} \times {(F_{FINAL})}^{T}$ computes the inner product between each pair of node embeddings–a higher inner product value indicates a stronger association likelihood. $σ (\cdot)$ denotes the sigmoid activation function, defined as $σ (x) = \frac{1}{1 + e^{- x}}$ . Its role is to normalize the inner product results to the range $(0,1)$ , converting the raw inner product values into probability-like predicted scores: a score close to one indicates a high likelihood of a snoRNA-disease association, while a score close to 0 indicates a low likelihood.

During model optimization, we used the predicted scores $\hat{Y}$ and the ground-truth association matrix $A_{train}$ to compute the binary cross-entropy loss. This loss quantifies the discrepancy between the predicted scores and the true association labels (1 for confirmed associations, 0 for unconfirmed ones), guiding the model to adjust parameters (e.g., weights in ChebNetII and GatedGCN modules) to minimize prediction errors.

2.5 Evaluation measures

Given that our work involves binary prediction in the field of snoRNA-disease association prediction and considers evaluating model performance across multiple dimensions, we employed seven widely used performance metrics. These are AUC, AUPR, F1 score (F1), Acc., Recall (Rec.), Specificity (Spe.), and Precision (Pre.). These values range between [0,1], with values closer to one indicating superior model performance. Among them, AUC quantifies a model’s ability to distinguish positive samples from negative samples; a value closer to one signifies better discrimination capability. AUPR serves as a complementary metric to AUC, typically reflecting the model’s balance between “prediction reliability” and “association capture completeness”. A value closer to one indicates superior overall performance in both aspects. The F1 score is the harmonic mean of precision and recall, aiming to balance these two metrics. A high F1 score indicates the model avoids both excessive false positives and false negatives. Precision is the proportion of correctly classified samples out of all samples, reflecting the overall prediction accuracy of the model. Recall measures the model’s ability to capture all true positive associations; high recall ensures the model does not overlook potential snoRNA-disease associations. Specificity quantifies the model’s ability to correctly identify non-associations. Complementary to recall, it ensures the model does not mislabel unconfirmed pairs as associations, thereby reducing unnecessary experimental validation burdens. Precision measures the proportion of predicted positive associations that are actually true. High precision indicates that the model’s top-ranked predictions possess high reliability.

3 Results

3.1 Parameters tuning and performance evaluation

CGSDA was implemented using PyTorch and PyG, and all experiments were conducted on an NVIDIA GeForce GTX 4060 GPU. To evaluate its predictive performance, we applied 10-fold cross-validation (10CV). Consistent with other deep learning–based association prediction methods, all experimentally verified associations were treated as positive samples, while unverified associations were regarded as negative samples during training. To mitigate the adverse effects of sample imbalance on model performance, we ensured that the number of positive and negative samples remained equal throughout the training phase.

The results of the 10-fold Cross-Validations (10CV) are detailed in Table 3. As presented in Table 3, the proposed computational method achieved an average AUC of 98.22 $%$ , AUPR of 97.19 $%$ , F1 score of 95.68 $%$ , accuracy of 95.66 $%$ , recall of 96.10 $%$ , specificity of 95.22 $%$ , and precision of 95.32 $%$ with standard deviations of 0.70 $%$ , 1.55 $%$ , 0.76 $%$ , 0.78 $%$ , 1.56 $%$ , 2.13 $%$ , and 2.03 $%$ respectively. On the MNDR dataset, the method yielded an average AUC of 98.31 $%$ , AUPR of 97.21 $%$ , F1 score of 95.58 $%$ , accuracy of 95.54 $%$ , recall of 96.63 $%$ , specificity of 94.44 $%$ , and precision of 94.60 $%$ , with respective standard deviations of 1.04 $%$ , 2.67 $%$ , 1.06 $%$ , 1.08 $%$ , 1.65 $%$ , 1.96 $%$ , and 1.79 $%$ .

Table 3

Table 3. Ten-fold cross-validation results performed by CGSDA based on SDAD and MNDR.

3.2 Comparative performance with other latest methods based on 10CV

To further evaluate the performance of CGSDA, we conducted comparative experiments. In these experiments, we selected comparative models based on three criteria: (1) methodological diversity (covering matrix processing methods, GNNs, traditional machine learning, and large language model-integrated approaches) to validate the advantages of CGSDA across different paradigms; (2) recency (all published between 2021 and 2025) to ensure alignment with current research progress; and (3) relevance (explicitly designed for snoRNA-disease association prediction rather than general non-coding RNA-disease tasks). Based on these criteria, we identified nine baseline models, with a brief description of these models provided in Table 1. This comparison was conducted using 10-fold cross-validation on both the SDAD and MNDR datasets. The performance metrics for each model are presented in Table 4. It can be observed that our CGSDA model achieved the highest performance metrics on both datasets. Specifically, on the SDAD dataset, the AUC and AUPR values reached 98.22 $%$ and 97.19 $%$ , respectively, outperforming the second-best model GBDTSVM by 2.16 $%$ . On the MNDR dataset, the AUC and AUPR values were 98.31 $%$ and 97.21 $%$ , respectively, also surpassing the second-best model GBDTSVM. This fully demonstrates the superior predictive capability of CGSDA compared to the nine baseline models. We attribute this advantage to three main factors: (1) By integrating two GNN architectures, CGSDA effectively alleviates the adverse effects of “preferential learning” biases in deep learning models. (2) The incorporation of a residual mechanism mitigates the oversmoothing problem in GNNs, preserving critical feature information and enhancing performance. (3) Adjusting the $K$ -value in the ChebNetII module enables the model to capture complex node-level information more comprehensively.

Table 4

Table 4. The comparison results of CGSDA model and other state-of-the-art models based on ten-fold cross-validation.

3.3 Parameters tuning

The parameters of our model can be categorized into four groups: (1) ChebNetII module parameters, including the number of network layers and the order of Chebyshev polynomials. (2) GatedGCN module parameters, such as the number of network layers and the sequence length of GatedGraphConv. (3) Output dimensionality of the network layers. (4) Training parameters, including learning rate, weight decay, dropout rate, and the number of training epochs. In this section, we will conduct systematic parameter tuning based on the SDAD dataset to examine the impact of the aforementioned four parameter categories on model performance and determine the optimal parameter settings for our framework.

3.3.1 Optimizable parameters in the ChebNetII module

$•$ The number of network layers for ChebNetII and GatedGCN modules. Increasing the number of network layers in the ChebNetII and GatedGCN modules can enhance model complexity, thereby enabling the extraction of higher-order node features. However, an excessive number of layers may aggravate the “oversmoothing” problem and significantly increase the computational burden. To comprehensively evaluate the impact of network depth on overall model performance, we conducted joint tuning experiments on the number of layers in both modules. Specifically, the number of network layers for ChebNetII and GatedGCN was selected from the range ${1,2, \dots, 5}$ , denoted by $M_{C}$ and $M_{G}$ , respectively. As illustrated in Figure 4A, the model achieved optimal performance when the number of layers in the ChebNetII and GatedGCN modules was set to 2 and 3, respectively. Therefore, we adopted $M_{C} = 2$ and $M_{G} = 3$ for subsequent parameter tuning experiments.

$•$ The order of Chebyshev polynomials $K$ . Increasing the order of Chebyshev polynomials enables the model to capture more complex functions when learning node embeddings. However, excessively large orders can reduce computational efficiency and increase the risk of overfitting. In this study, the $K$ values of the two-layer ChebNetII network were selected from the range 1–6. We denote the polynomial orders of the first and second layers in the ChebNetII module as $K_{1}$ and $K_{2}$ , respectively. As shown in Figure 4B, the model achieved the highest AUC when $K_{1} = 3$ and $K_{2} = 5$ .

Figure 4

Two heatmaps, labeled (A) and (B), show different parameter combinations. (A) uses parameters $ M_C $ and $ M_G $ and (B) uses $ K_2 $ and $ K_1 $. Color scales range from light yellow (low values) to dark red (high values), representing AUC scores. Specific values are displayed in each cell.

Figure 4. (A) The number of network layers in the ChebNetII and GatedGCN modules, where M_C and M_G represent the number of network layers in the ChebNetII module and GatedGCN module, respectively. (B) The order of Chebyshev polynomials, where K_1 and K_2 represent the order of the 1st layer and 2nd layer of the ChebNetII network, respectively.

3.3.2 Optimizable parameters in the GatedGCN module

$•$ The number of layers for GatedGCN module. The tuning of network layers for the GatedGCN module was carried out jointly with that of the ChebNetII module, as described in Section 3.3.1).

$•$ The sequence length of the convolutional layer $L$ . We let $L \in {1,2, \dots, 5}$ . As can be seen in Figure 5A, the model performance improves as the value of $L$ decreases and reaches an optimization at $L = 1$ , after which it starts to decline. Therefore, in the CGSDA model we make $L = 1$ .

Figure 5

Line graphs showing AUC performance. Chart A plots L against AUC, showing a decreasing trend from approximately 0.985 at L=1 to 0.965 at L=5. Chart B plots dimensions against AUC, showing fluctuations between 0.9775 and 0.9825 across dimensions 16 to 512.

Figure 5. (A) The sequence length of the convolutional layer in the GatedGCN module, denoted by L. (B) The output dimensions of two modules.

3.3.3 Optimization of network output dimensions and model training parameters

$•$ The output dimensions of two modules. To ensure proper concatenation of the embeddings generated by the ChebNetII and GatedGCN modules, we set their output dimensions to be equal. We evaluated model performance across six experiments with output dimensions ${16,32, \dots, 512}$ . As shown in Figure 5B, the model achieved its highest AUC when the output dimension was 64.

$•$ Learning rate and weight decay. We applied the same learning rate and weight decay across all network layers of CGSDA. The learning rate was selected from { $0.0001, 0.0005, \dots, 0.1$ } and the weight decay was chosen from ${0.00005, 0.0001, \dots, 0.05}$ . As shown in Figure 6A, the model achieved its highest AUC when the learning rate and weight decay were set to 0.0005 and 0.0001, respectively.

$•$ Dropout and training epochs. To ensure consistency during model training, we applied the same dropout rate across all network layers of CGSDA. The relationship between dropout and AUC is illustrated in Figure 6B. As shown, variations in dropout values caused fluctuations in model performance, with the highest AUC achieved at a dropout rate of 0.01. For training epochs, we conducted 10 experiments with epochs set to $100,200, \dots, 1000$ . From Figure 6B, we observe that AUC increased monotonically with training epochs up to 600, after which it declined. Therefore, we set the training epoch of CGSDA to 600.

Figure 6

Two line graphs labeled (A) and (B). Graph (A) plots AUC against learning rate and weight decay, showing decreasing AUC with both variables. Graph (B) plots AUC against dropout and training epochs, showing fluctuating AUC with dropout and stable AUC with training epochs. Both graphs indicate relationships between hyperparameters and model performance.

Figure 6. Impact of hyperparameters on model performance. (A) Relationship between learning rate, weight decay, and AUC. (B) Relationship between training epochs, dropout rate, and AUC.

3.4 Ablation tests

As noted earlier, our model primarily consists of two modules, ChebNetII and GatedGCN, with the residual mechanism incorporated into both. To evaluate the contribution of each component to overall performance, we designed three sets of ablation experiments based on the SDAD dataset. In the first group, CGSDA-C, the ChebNetII module was removed; in the second group, CGSDA-G, the GatedGCN module was removed; and in the third group, CGSDA-RES, the residual mechanism was removed.

We conducted ablation experiments using 10CV, and the results are summarized in Table 5. The findings show that removing individual components of the CGSDA model leads to a decline in overall performance. Specifically: (1) In comparison with CGSDA-C and CGSDA-G, the complete CGSDA model demonstrates that combining the two GNN modules effectively alleviates the negative impact of “preferential learning” on sample performance in deep learning models. (2) The inclusion of the residual mechanism significantly mitigates the oversmoothing problem in both modules, thereby enhancing overall model performance.

Table 5

Table 5. Ablation tests.

4 Case study

To validate the predictive ability of our computational framework for novel snoRNA–disease associations in real-world applications and to assess its validity and reliability, we designed two case studies. The first evaluates the model’s predictive power for previously unseen diseases, while the second examines its ability to predict novel snoRNA–disease associations. The second case study follows a similar design to GL4SDA (La Rosa et al., 2025), GBDTSVM (Muna et al., 2025), and GCASDA (Liu et al., 2024), which predict new associations while retaining all existing snoRNA–disease pairs. In this work, we selected “lung cancer (DOID:1324)” and “breast cancer (DOID:1612)” for the first and second case studies, respectively. In the first case study, all snoRNAs associated with the target disease were removed before training, forcing the model to treat the disease as a new entity and predict its associations with all snoRNAs. In the second case study, only the unknown snoRNA–disease associations were treated as the test set. For both case studies, we ranked the prediction scores generated by the CGSDA model and selected the top 15 associations with the highest scores. These candidate associations were then validated through a literature search in the PubMed Biology database using the corresponding snoRNA–disease pairs as keywords. If an association was confirmed, it was annotated with the PMID of the supporting publication; otherwise, it was labeled as Unconfirmed. It should be noted that associations confirmed in the literature can only be used as evidence if they have been validated by experiments.

Lung cancer is a highly malignant disease characterized by poor treatment outcomes and high mortality worldwide. The prognosis depends heavily on the stage at diagnosis: patients with early-stage lung cancer have relatively high 5-year survival rates, whereas those diagnosed at advanced stages face a 5-year survival rate close to $0 %$ . However, because early-stage lung cancer is typically asymptomatic, timely detection is challenging. Current screening methods rely primarily on medical imaging, which may result in missed diagnoses due to both limited visibility in early disease and dependence on the physician’s expertise. Consequently, identifying novel biomarkers for the early detection of lung cancer is of significant clinical importance. SnoRNAs have recently emerged as promising candidates in this regard. As shown in Table 6, 10 out of the top 15 snoRNAs predicted by the CGSDA model to be associated with lung cancer were validated by evidence from the literature.

Table 6

Table 6. The top 15 predicted snoRNAs associated with lung cancer and breast cancer.

Breast cancer is a common malignant tumor among women, though it also affects approximately 1 $%$ of men (Jameson et al., 2020). While the etiology of breast cancer is not yet fully understood, it is well established that factors such as long-term improper diet, radiation exposure, heredity, and genetic mutations significantly increase risk. Epidemiological studies indicate that breast cancer remains the leading cause of cancer-related deaths among women in many countries, with lower mortality in economically developed regions likely due to early diagnosis and comprehensive treatment strategies (Sancho-Garnier and Colonna, 2019). In-depth research on breast cancer is therefore critical for elucidating its pathogenesis and risk factors, ultimately supporting early screening, effective treatment, and improved prognosis. As shown in Table 6, 12 of the top 15 snoRNAs predicted by our model to be associated with breast cancer have been validated in the literature.

5 Discussion and conclusion

snoRNAs, an important class of small ncRNAs, have attracted significant research attention over the past 2 decades. Studies of their functions have expanded from their initial roles in rRNA processing and modification to broader areas including disease mechanism regulation, cellular homeostasis, and targeted therapies. With ongoing research, snoRNAs are increasingly recognized as potential sources of novel therapeutic agents for cancer, neurodegenerative disorders, endocrine diseases, and cardiovascular conditions. Accordingly, there is an urgent need for efficient, cost-effective, and environmentally independent methods to study snoRNAs, which has driven the widespread adoption of computational approaches in this field. In this study, we developed CGSDA, a model that integrates the ChebNetII Convolutional Network (ChebII) module with the Gated GCN (Gated) module. The model first constructs a snoRNA–disease association network and incorporates a residual mechanism within both modules to learn the representations of snoRNAs and diseases. These node embeddings are subsequently fused and downscaled before being input into a predictor to infer potential snoRNA–disease associations. Experimental results demonstrate that CGSDA achieves superior predictive performance compared to baseline models. Furthermore, ablation experiments reveal that each component of the model makes a significant contribution to its overall performance, fully validating the effectiveness of the proposed framework.

In the comparative experiments, the CGSDA model exhibited excellent performance across multiple metrics with small standard deviations, demonstrating significant advantages over baseline models and confirming the good robustness of CGSDA. In the two types of case studies conducted, 10 and 12 out of the top 15 associations predicted by our model were validated by existing literature, respectively. This showcases the effectiveness of CGSDA in identifying biologically meaningful snoRNA-disease associations, indicating its great potential for guiding downstream experimental research.

Although CGSDA exhibits outstanding performance and practicality, it also has the following limitations: (1) The model is suitable for binary association prediction, and its predictive performance for multi-type associations requires further evaluation. (2) Although the nRC tool can integrate the secondary structure information of snoRNAs, it does not cover other key biological features of snoRNAs (such as expression levels, subcellular localization, and interactions with other RNAs), which may lead to the omission of important clues affecting association prediction. (3) Additional wet experiments are required to validate the predicted results. Based on this, in future work, we will expand the application of the CGSDA model in the field of multi-type association prediction and collaborate with medical schools to conduct wet experimental validation of the predicted results.

We attribute the superior performance of CGSDA to three main factors. First, by integrating the ChebNetII and GatedGCN modules, the model effectively mitigates the negative impact of “preference learning” inherent in deep learning models, as confirmed by our ablation experiments. Second, the incorporation of a residual mechanism helps reduce the “oversmoothing” phenomenon, further enhancing model performance. Third, adjusting the $K$ value in the ChebNetII module allows the framework to capture complex node information more comprehensively, contributing to its predictive Acc.

Data availability statement

All the data and corresponding codes are available at https://github.com/cuntjx/CGSDA.

Author contributions

YZ: Data curation, Formal Analysis, Investigation, Methodology, Validation, Visualization, Writing – original draft, Writing – review and editing. YL: Data curation, Formal Analysis, Funding acquisition, Investigation, Methodology, Writing – original draft, Writing – review and editing. SL: Data curation, Methodology, Visualization, Writing – original draft, Writing – review and editing. ZW: Visualization, Writing – review and editing. LL: Writing – review and editing. SL: Writing – review and editing. TZ: Writing – review and editing. YZ: Writing – review and editing. RM: Writing – review and editing.

Funding

The author(s) declare that financial support was received for the research and/or publication of this article. This work was supported by the National Social Science Fund Project (17XJY004) and the Hechi College Research Project (2023XJPT007).

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Generative AI statement

The author(s) declare that no Generative AI was used in the creation of this manuscript.

Any alternative text (alt text) provided alongside figures in this article has been generated by Frontiers with the support of artificial intelligence and reasonable efforts have been made to ensure accuracy, including review by the authors wherever possible. If you identify any issues, please contact us.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

References

Anderson, J. R., Jacobsen, S., Walters, M., Bundgaard, L., Diendorfer, A., Hackl, M., et al. (2022). Small non-coding rna landscape of extracellular vesicles from a post-traumatic model of equine osteoarthritis. Front. Veterinary Sci. 9, 901269. doi:10.3389/fvets.2022.901269

PubMed Abstract | CrossRef Full Text | Google Scholar

Babiarz, J. E., Ruby, J. G., Wang, Y., Bartel, D. P., and Blelloch, R. (2008). Mouse es cells express endogenous shrnas, sirnas, and other microprocessor-independent, dicer-dependent small rnas. Genes and Dev. 22, 2773–2785. doi:10.1101/gad.1705308

PubMed Abstract | CrossRef Full Text | Google Scholar

Chabronova, A., Holmes, T. L., Hoang, D. M., Denning, C., James, V., Smith, J. G., et al. (2024). Snornas in cardiovascular development, function, and disease. Trends Mol. Med. 30, 562–578. doi:10.1016/j.molmed.2024.03.004

PubMed Abstract | CrossRef Full Text | Google Scholar

Chor, B., Horn, D., Goldman, N., Levy, Y., and Massingham, T. (2009). Genomic dna k-mer spectra: models and modalities. Genome Biol. 10, R108–R110. doi:10.1186/gb-2009-10-10-r108

PubMed Abstract | CrossRef Full Text | Google Scholar

Defferrard, M., Bresson, X., and Vandergheynst, P. (2016). Convolutional neural networks on graphs with fast localized spectral filtering. Adv. neural Inf. Process. Syst. 29.

Google Scholar

Ender, C., Krek, A., Friedländer, M. R., Beitzinger, M., Weinmann, L., Chen, W., et al. (2008). A human snorna with microrna-like functions. Mol. Cell 32, 519–528. doi:10.1016/j.molcel.2008.10.017

PubMed Abstract | CrossRef Full Text | Google Scholar

Fiannaca, A., La Rosa, M., La Paglia, L., Rizzo, R., and Urso, A. (2017). Nrc: non-coding rna classifier based on structural features. BioData Min. 10, 27–18. doi:10.1186/s13040-017-0148-2

PubMed Abstract | CrossRef Full Text | Google Scholar

Ganot, P., Bortolin, M.-L., and Kiss, T. (1997). Site-specific pseudouridine formation in preribosomal rna is guided by small nucleolar rnas. Cell 89, 799–809. doi:10.1016/s0092-8674(00)80263-9

PubMed Abstract | CrossRef Full Text | Google Scholar

Hu, X., Zhang, P., Liu, D., Zhang, J., Zhang, Y., Dong, Y., et al. (2024). Igcnsda: unraveling disease-associated snornas with an interpretable graph convolutional network. Briefings Bioinforma. 25, bbae179. doi:10.1093/bib/bbae179

PubMed Abstract | CrossRef Full Text | Google Scholar

Huang, Z.-h., Du, Y.-p., Wen, J.-t., Lu, B.-f., and Zhao, Y. (2022). Snornas: functions and mechanisms in biological processes, and roles in tumor pathophysiology. Cell Death Discov. 8, 259. doi:10.1038/s41420-022-01056-8

PubMed Abstract | CrossRef Full Text | Google Scholar

Jameson, J. L., Fauci, A. S., Kasper, D. L., Hauser, S. L., Longo, D. L., and Loscalzo, J. (2020). Medicina Interna de Harrison-2 Volumes-20. Brasil: McGraw Hill.

Google Scholar

Kipf, T. N., and Welling, M. (2016). Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907.

Google Scholar

Kiss-László, Z., Henry, Y., Bachellerie, J.-P., Caizergues-Ferrer, M., and Kiss, T. (1996). Site-specific ribose methylation of preribosomal rna: a novel function for small nucleolar rnas. Cell 85, 1077–1088. doi:10.1016/s0092-8674(00)81308-2

PubMed Abstract | CrossRef Full Text | Google Scholar

La Rosa, M., Fiannaca, A., Mendolia, I., La Paglia, L., and Urso, A. (2025). Gl4sda: predicting snorna-disease associations using gnns and llm embeddings. Comput. Struct. Biotechnol. J. 27, 1023–1033. doi:10.1016/j.csbj.2025.03.014

PubMed Abstract | CrossRef Full Text | Google Scholar

Li, Y., Tarlow, D., Brockschmidt, M., and Zemel, R. (2015). Gated graph sequence neural networks. arXiv preprint arXiv:1511.05493.

Google Scholar

Li, C., Qin, M., Xiao, S., Chen, J., Luo, K., Shao, Y., et al. (2024). Making text embedders few-shot learners. arXiv preprint arXiv:2409.15700.

Google Scholar

Liu, D., Luo, Y., Zheng, J., Xu, H., Zhang, J., and Deng, L. (2021). “Gcnsda: predicting snorna-disease associations via graph convolutional network,” in 2021 IEEE international conference on bioinformatics and biomedicine (BIBM) (IEEE), 183–188.

CrossRef Full Text | Google Scholar

Liu, Y., Zhao, C., Wang, G., Chen, J., Ju, S., Huang, J., et al. (2022). Snord1c maintains stemness and 5-fu resistance by activation of wnt signaling pathway in colorectal cancer. Cell Death Discov. 8, 200. doi:10.1038/s41420-022-00996-5

PubMed Abstract | CrossRef Full Text | Google Scholar

Liu, S., Zhu, W., Wang, P., Yu, S., and Wu, F. (2024). Identifying associations between small nucleolar rnas and diseases via graph convolutional network and attention mechanism. IEEE J. Biomed. Health Inf. 28, 7647–7658. doi:10.1109/JBHI.2024.3424848

PubMed Abstract | CrossRef Full Text | Google Scholar

Lu, S., Liang, Y., Li, L., Liao, S., and Ouyang, D. (2022). Inferring human mirna–disease associations via multiple kernel fusion on gcnii. Front. Genet. 13, 980497. doi:10.3389/fgene.2022.980497

PubMed Abstract | CrossRef Full Text | Google Scholar

Momanyi, B. M., Zhou, Y.-W., Grace-Mercure, B. K., Temesgen, S. A., Basharat, A., Ning, L., et al. (2024). Sagesda: multi-Graphsage networks for predicting snorna-disease associations. Curr. Res. Struct. Biol. 7, 100122. doi:10.1016/j.crstbi.2023.100122

PubMed Abstract | CrossRef Full Text | Google Scholar

Muna, U. M., Hafiz, F., Biswas, S., and Azim, R. (2025). Gbdtsvm: combined support vector machine and gradient boosting decision tree framework for efficient snorna-disease association prediction. Comput. Biol. Med. 192, 110219. doi:10.1016/j.compbiomed.2025.110219

PubMed Abstract | CrossRef Full Text | Google Scholar

Omonijo, F. A., Gagné, D., Laterrière, M., Ster, C., Zhao, X., and Ibeagha-Awemu, E. M. (2023). Psvi-7 snorna dysregulated expression and potential roles in bovine sub-clinical mastitis due to Staphylococcus chromogenes. J. Animal Sci. 101, 409–410. doi:10.1093/jas/skad281.486

CrossRef Full Text | Google Scholar

Ono, M., Yamada, K., Avolio, F., Scott, M. S., van Koningsbruggen, S., Barton, G. J., et al. (2010). Analysis of human small nucleolar rnas (Snorna) and the development of snorna modulator of gene expression vectors. Mol. Biol. Cell 21, 1569–1584. doi:10.1091/mbc.e10-01-0078

PubMed Abstract | CrossRef Full Text | Google Scholar

Ouyang, D., Liang, Y., Wang, J., Li, L., Ai, N., Feng, J., et al. (2024). Hgclamir: hypergraph contrastive learning with attention mechanism and integrated multi-view representation for predicting mirna-disease associations. PLOS Comput. Biol. 20, e1011927. doi:10.1371/journal.pcbi.1011927

PubMed Abstract | CrossRef Full Text | Google Scholar

Rappaport, N., Nativ, N., Stelzer, G., Twik, M., Guan-Golan, Y., Iny Stein, T., et al. (2013). Malacards: an integrated compendium for diseases and their annotation. Database 2013, bat018. doi:10.1093/database/bat018

PubMed Abstract | CrossRef Full Text | Google Scholar

Regårdh, S., Jacobsen, S., Bundgaard, L., Walters, M., Anderson, J., Castanheira, C., et al. (2022). Synovial fluid snornas metanalysis in equine osteoarthritis. Osteoarthr. Cartil. 30, S351. doi:10.1016/j.joca.2022.02.471

CrossRef Full Text | Google Scholar

Sancho-Garnier, H., and Colonna, M. (2019). Breast cancer epidemiology. Presse Medicale 48, 1076–1084. doi:10.1016/j.lpm.2019.09.022

PubMed Abstract | CrossRef Full Text | Google Scholar

Schriml, L. M., Arze, C., Nadendla, S., Chang, Y.-W. W., Mazaitis, M., Felix, V., et al. (2012). Disease ontology: a backbone for disease semantic integration. Nucleic Acids Res. 40, D940–D946. doi:10.1093/nar/gkr972

PubMed Abstract | CrossRef Full Text | Google Scholar

Shen, L.-P., Zhang, W.-C., Deng, J.-R., Qi, Z.-H., Lin, Z.-W., and Wang, Z.-D. (2024). Advances in the mechanism of small nucleolar rna and its role in dna damage response. Mil. Med. Res. 11, 53. doi:10.1186/s40779-024-00553-4

PubMed Abstract | CrossRef Full Text | Google Scholar

Sun, Z., Huang, Q., Yang, Y., Li, S., Lv, H., Zhang, Y., et al. (2022). Psnod: identifying potential snorna-disease associations based on bounded nuclear norm regularization. Briefings Bioinforma. 23, bbac240. doi:10.1093/bib/bbac240

PubMed Abstract | CrossRef Full Text | Google Scholar

Taft, R. J., Glazov, E. A., Lassmann, T., Hayashizaki, Y., Carninci, P., and Mattick, J. S. (2009). Small rnas derived from snornas. Rna 15, 1233–1240. doi:10.1261/rna.1528909

PubMed Abstract | CrossRef Full Text | Google Scholar

Tahara, S., Tsuchiya, T., Matsumoto, H., and Ozaki, H. (2023). Transcription factor-binding k-mer analysis clarifies the cell type dependency of binding specificities and cis-regulatory snps in humans. BMC Genomics 24, 597. doi:10.1186/s12864-023-09692-9

PubMed Abstract | CrossRef Full Text | Google Scholar

Tang, G., Zeng, Z., Sun, W., Li, S., You, C., Tang, F., et al. (2019). Small nucleolar rna 71a promotes lung cancer cell proliferation, migration and invasion via mapk/erk pathway. J. Cancer 10, 2261–2275. doi:10.7150/jca.31077

PubMed Abstract | CrossRef Full Text | Google Scholar

Vinje, H., Liland, K. H., Almøy, T., and Snipen, L. (2015). Comparing k-mer based methods for improved classification of 16s sequences. BMC Bioinforma. 16, 205–213. doi:10.1186/s12859-015-0647-4

PubMed Abstract | CrossRef Full Text | Google Scholar

Xu, G., Yang, F., Ding, C.-L., Zhao, L.-J., Ren, H., Zhao, P., et al. (2014). Small nucleolar rna 113–1 suppresses tumorigenesis in hepatocellular carcinoma. Mol. Cancer 13, 216–14. doi:10.1186/1476-4598-13-216

PubMed Abstract | CrossRef Full Text | Google Scholar

Xuan, P., Han, K., Guo, M., Guo, Y., Li, J., Ding, J., et al. (2013). Prediction of micrornas associated with human diseases based on weighted k Most similar neighbors. PloS one 8, e70204. doi:10.1371/journal.pone.0070204

PubMed Abstract | CrossRef Full Text | Google Scholar

Zhang, W., and Liu, B. (2022). isnodi-lsgt: identifying snorna-disease associations based on local similarity constraints and global topological constraints. Rna 28, 1558–1567. doi:10.1261/rna.079325.122

PubMed Abstract | CrossRef Full Text | Google Scholar

Zhang, L., Chen, M., Hu, X., and Deng, L. (2023). Graph convolutional network and contrastive learning small nucleolar rna (Snorna) disease associations (gclsda): predicting snorna–disease associations via graph convolutional network and contrastive learning. Int. J. Mol. Sci. 24, 14429. doi:10.3390/ijms241914429

PubMed Abstract | CrossRef Full Text | Google Scholar

Keywords: disease-snoRNA association, GNNS, ChebNetII convolutional network, gated graph convolutional network, multiplefeatures fusion, residual mechanism

Citation: Zou Y, Lu Y, Lu S, Wei Z, Li L, Liao S, Zeng T, Zhang Y and Miao R (2025) CGSDA: inferring snoRNA-disease associations via ChebNetII and GatedGCN. Front. Genet. 16:1684484. doi: 10.3389/fgene.2025.1684484

Received: 13 August 2025; Accepted: 06 October 2025;
Published: 16 October 2025.

Edited by:

Massimo La Rosa, National Research Council (CNR), Italy

Reviewed by:

Antonino Fiannaca, National Research Council (CNR), Italy
Ilaria Cosentini, Consiglio Nazionale delle Ricerche (CNR), Italy

Copyright © 2025 Zou, Lu, Lu, Wei, Li, Liao, Zeng, Zhang and Miao. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Yusong Lu, bHV5dXNvbmdreUAxNjMuY29t

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.