- School of Informatics, Hunan University of Chinese Medicine, Changsha, China
MicroRNAs (miRNAs) play a crucial role in regulating gene expression, and their subcellular localization is essential for understanding their biological functions. However, accurately predicting miRNA subcellular localization remains a challenging task due to their short sequences, complex structures, and diverse functions. To improve prediction accuracy, this study proposes a novel model based on a graph transformer and a multi-head attention mechanism. The model integrates multi-source features, including the miRNA sequence similarity network, miRNA functional similarity network, miRNA–mRNA association network, miRNA–drug association network, and miRNA–disease association network. Specifically, we first apply the node2vec algorithm to extract features from these biological networks. Then, we use a graph transformer to capture relationships between nodes within the networks, enabling a better understanding of miRNA functions across different biological contexts. Next, a multi-head attention mechanism combines miRNA features from multiple networks, allowing the model to capture deeper feature relationships and enhance prediction performance. Performance evaluation shows that the proposed method achieves significant improvements over current approaches on open-access datasets, with an AUC (area under the receiver operating characteristic curve) of 0.9108 and an AUPR (area under the precision-recall curve) of 0.8102. It not only improves prediction accuracy but also exhibits strong generalization and stability.
1 Introduction
MicroRNAs (miRNAs) are a class of small non-coding RNAs widely distributed in eukaryotic cells, typically around 22 nucleotides in length. They mainly regulate gene expression through the post-transcriptional processes Hombach and Kretz (2016); Holley and Topkara (2011). In organisms, miRNAs bind to specific target sites on mRNAs, causing their subsequent degradation or translational inhibition, thereby modulating key fundamental physiological processes like cell proliferation, differentiation, apoptosis, and immune system activation Bartel (2009). Recent studies have shown that miRNAs have indispensable functions in a variety of human diseases, including cancer, neurodegenerative disorders, and cardiovascular diseases Li et al. (2023); Dugger and Dickson (2017). They also show great potential in drug response prediction, resistance mechanisms, and therapeutic target discovery Miska (2007); Small and Olson (2011). In this context, studying the subcellular localization of miRNAs is of great significance for understanding their regulatory networks and functional mechanisms Kabekkodu et al. (2018); Catalanotto et al. (2016); Gurtan and Sharp (2013). Different subcellular localizations often suggest that miRNAs are involved in distinct biological processes. Accurate localization prediction not only facilitates the understanding of functional diversification of miRNAs but also provides theoretical support for early disease diagnosis and targeted therapy Jie et al. (2021). Although conventional experimental methods, such as fluorescence in situ hybridization and subcellular fractionation combined with high-throughput sequencing, can directly determine miRNA distributions, these techniques are often complex, expensive, and lack scalability for large-scale samples Thomson and Dinger (2016). Therefore, developing computational methods to efficiently predict miRNA subcellular localization has become a focal point in bioinformatics research. 
Currently, researchers have developed various machine learning models based on sequence information to explore potential miRNA localization patterns. For example, Huang et al. (2007) proposed a prediction framework combining k-mer frequency patterns with a Support Vector Machine (SVM) classifier, showing the feasibility of using sequence information for localization recognition. However, due to the short length and complex structure of miRNAs, as well as their heterogeneity across different tissues or disease states, models relying solely on sequence-level features often fail to capture the complete biological semantics, resulting in limited accuracy and generalization Li et al. (2014). To address this, some studies have incorporated biological network information, such as the miRNA-mRNA interaction network Hsu et al. (2011), the miRNA-disease association network Jiang et al. (2010), and the miRNA-drug association network Chen H. et al. (2019), to improve prediction accuracy. For instance, Xie et al. constructed a miRNA-target gene interaction network using Graph Convolutional Networks (GCNs) and applied deep learning to predict miRNA functions within cells Guan et al. (2022). Li et al. integrated miRNA, disease, and drug information through a heterogeneous network and used graph embedding techniques for feature learning Sun et al. (2020). Over the past few years, deep learning innovations have significantly advanced bioinformatics research. Convolutional Neural Networks (CNNs) have been used to retrieve sequence features of miRNAs—such as in the DeepMirTar model, which utilized CNNs to improve target gene prediction accuracy Wen et al. (2018). Recurrent Neural Networks (RNNs) have also been employed to capture sequential dependencies, as seen in MirLocNet, which uses Long Short-Term Memory (LSTM) networks to computationally infer miRNA subcellular localization Chen Q. et al. (2019). Graph Neural Networks (GNNs) are widely applied in modeling biological networks. 
For example, Gao et al. (2022) proposed a novel approach that combines Graph Attention Networks (GATs) with biological network information to improve miRNA function prediction Zhao et al. (2022).
In the field of miRNA subcellular localization, researchers have developed various models to enhance prediction accuracy and biological interpretability. MiRLoc Xu et al. (2022), for instance, inferred miRNA spatial distribution by leveraging known mRNA localization and their interaction with miRNAs, reflecting their role in post-transcriptional regulation. MirLocPredictor Asim et al. (2020) incorporated CNNs and positional encoding of k-mers to enhance sequence representation for multi-label localization tasks. DAmiRLocGNet Bai et al. (2023a) integrated Graph Convolutional Networks and autoencoders to jointly model miRNA sequence features, disease associations, and disease semantic networks, learning high-level representations from complex graph structures. Models developed for related tasks also provide useful references. For example, Wang X.-F. et al. (2024) proposed a multi-channel graph neural network framework that integrates multimodal similarity information with hypergraph contrastive learning, effectively identifying novel cancer biomarkers. Wang X.- et al. (2024) designed a directed graph neural network-based multi-view learning model capable of systematically extracting regulatory feature signals from multiple biological layers, enhancing the model’s representational power. Additionally, Wang et al. (2022) developed KGDCMI, a method that integrates multi-source biological information with deep learning techniques to accurately predict interactions between circRNA and miRNA. In comparison, PMiSLocMF Chen et al. (2024) fused heterogeneous data such as miRNA-mRNA, miRNA-drug, and miRNA-disease networks using a graph attention mechanism, achieving robust performance even in scenarios with sparse data or incomplete labels. Despite these improvements, existing architectures still face challenges such as inadequate information integration and underutilization of multi-head feature relationships.
Effectively integrating multi-source information and building more expressive feature representations to improve miRNA subcellular localization prediction remains an urgent and critical problem.
To overcome these limitations, this study proposes a novel miRNA subcellular localization prediction model named GTMALoc, based on graph transformer and multi-head attention mechanisms. This approach incorporates miRNA sequence information and the roles of miRNAs across different biological networks to improve prediction performance. Specifically, we first use node2vec to extract miRNA features from multiple biological networks, including miRNA sequence similarity, miRNA-mRNA associations, miRNA-disease associations, and miRNA-drug associations. Then, a graph transformer framework is applied to infer latent node correlations, offering better insight into miRNA functionality in different contexts. A multi-head attention mechanism is subsequently employed to integrate miRNA features across networks, capturing deeper, multi-head relational patterns and enhancing predictive performance. The evaluations show that our model outperforms mainstream methods in terms of accuracy, generalization, and stability on public datasets, demonstrating its effectiveness and feasibility in the miRNA subcellular localization task. The key contributions of this study are:
(1) We propose a new miRNA subcellular localization prediction model that leverages graph transformer and multi-head attention mechanisms to integrate multi-source biological network information.
(2) Complex relationships within biological networks are modeled using node2vec and graph transformer to improve high-dimensional representations of miRNA features.
(3) A multi-head attention mechanism is employed to fuse heterogeneous network information, thereby strengthening inter-feature relationships and improving the prediction accuracy and generalization ability of the model.
2 Materials and methods
2.1 Datasets
The dataset used in this study is sourced from version 2.0 of the RNALocate database Cui et al. (2022), which systematically compiles a large number of experimentally supported RNA subcellular localization records. From this database, we select a subset containing 1,041 miRNAs to construct and evaluate our model. To ensure biological consistency, all selected miRNAs are included in the miRNA functional similarity network established in the MiRLoc Xu et al. (2022) study, facilitating the exploration of potential functional associations. In terms of localization annotation, these miRNAs are assigned to seven subcellular compartments: cytoplasm, nucleus, nucleolus, mitochondrion, exosome, microvesicle, and extracellular vesicle. The number of miRNAs annotated to each compartment is as follows: exosome, 870; microvesicle, 825; nucleus, 499; cytoplasm, 308; mitochondrion, 259; extracellular vesicle, 102; and nucleolus, 67. Because a single miRNA can be annotated to several compartments, these counts sum to more than 1,041. This categorization not only covers the major cellular structures where miRNAs may reside but also reflects their diverse roles in intracellular and intercellular communication, providing a rich and challenging dataset for multi-label classification tasks.
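As a quick check on the multi-label nature of the dataset, the per-compartment counts can be summed and compared with the 1,041 miRNAs. The sketch below uses only the numbers reported in this section; the variable names are illustrative and are not taken from the GTMALoc code.

```python
# Per-compartment miRNA counts as reported in Section 2.1.
label_counts = {
    "exosome": 870,
    "microvesicle": 825,
    "nucleus": 499,
    "cytoplasm": 308,
    "mitochondrion": 259,
    "extracellular vesicle": 102,
    "nucleolus": 67,
}
n_mirnas = 1041

# Total number of (miRNA, compartment) annotations across all labels.
total_annotations = sum(label_counts.values())

# Label cardinality: mean number of compartments per miRNA.
label_cardinality = total_annotations / n_mirnas

# Fraction of miRNAs that are positive for each compartment.
positive_rate = {name: count / n_mirnas for name, count in label_counts.items()}

print(total_annotations, round(label_cardinality, 2))
for name, rate in positive_rate.items():
    print(f"{name}: {rate:.3f}")
```

The 2,930 annotations over 1,041 miRNAs correspond to roughly 2.8 labels per miRNA, while the nucleolus label is positive for only about 6% of miRNAs, which is the kind of imbalance that AUPR is sensitive to.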
2.2 Methods
In this study, we develop a multi-source feature fusion model, GTMALoc, for miRNA subcellular localization prediction, aiming to comprehensively capture miRNA characteristics in various biological contexts. This process is illustrated in Figure 1. First, we extract structural features from several biological networks, including the miRNA sequence similarity network, miRNA–mRNA regulatory network, miRNA–disease association network, and miRNA–drug interaction network. To preserve both local and global structural information within each network, we apply the node2vec algorithm to perform embedding learning on these heterogeneous graphs, thereby obtaining a representation vector for each miRNA under different semantic relations—reflecting its functional characteristics in diverse biological environments. Next, we utilize the graph transformer model to process the graph embedding features. Leveraging its built-in structural awareness and self-attention mechanism, the model captures complex and variable dependencies among nodes, enabling a deeper understanding of miRNA behavior and influence across different networks. To achieve effective multi-source information fusion, we further introduce a multi-head attention mechanism to align and integrate miRNA representations from various networks. This allows the model to automatically uncover important cross-network interactions and latent high-level semantic relationships. The fusion strategy not only enhances the model’s sensitivity to critical features but also significantly improves overall prediction accuracy and generalization performance.
3 miRNA networks
3.1 miRNA sequence similarity network and miRNA functional similarity network
All miRNA sequence data are obtained from the authoritative database miRBase (version 22) Kozomara et al. (2019), which provides experimentally validated miRNA sequences from humans and other species, and is widely used in miRNA research. To construct the miRNA sequence similarity network, we employ the Smith–Waterman algorithm Smith and Waterman, (1981), a classical local sequence alignment technique that precisely evaluates the similarity between two miRNA sequences in terms of base composition and order. Specifically, the algorithm uses dynamic programming to find optimal local alignments based on base matches, mismatches, and gap penalties, thereby computing a similarity score for each miRNA pair (Equation 1).
where
let
The challenge of similarity underestimation arising from disease set sparsity is resolved through linear combination with miRNA GIP kernel similarity, generating robust functional similarity estimates (Equation 5).
where
The resulting adjacency matrix
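The local alignment scoring described at the start of this subsection can be sketched as follows. This is a minimal illustration of Smith–Waterman with a linear gap penalty; the scoring values (match = 2, mismatch = -1, gap = -1) and the normalization by self-alignment scores are assumptions made for the example, not the settings used in the paper, and the sequences are toy strings.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Best local alignment score between a and b (linear gap penalty)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            # Extend the diagonal with a match or mismatch.
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: scores are floored at zero.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def normalized_similarity(a, b, **scoring):
    """Scale the raw score to [0, 1] (assumed normalization, see lead-in)."""
    raw = smith_waterman_score(a, b, **scoring)
    self_a = smith_waterman_score(a, a, **scoring)
    self_b = smith_waterman_score(b, b, **scoring)
    denom = (self_a * self_b) ** 0.5
    return raw / denom if denom else 0.0


# Toy RNA strings, for illustration only.
seq_x = "UGAGGUAGUAGG"
seq_y = "UGAGCUAGUUGG"
print(smith_waterman_score(seq_x, seq_y), normalized_similarity(seq_x, seq_y))
```

Computing this score for every miRNA pair yields a symmetric similarity matrix that can serve as the weighted adjacency matrix of a sequence similarity network.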
3.2 miRNA-mRNA association network
The miRNA–mRNA regulatory network in this study is primarily based on data from the authoritative miRTarBase (2020 version) Huang et al. (2020), supplemented by a curated dataset of validated interactions compiled by Xu et al. (2022). miRTarBase is known for its high-quality data, integrating miRNA–target gene interactions supported by both low- and high-throughput experimental evidence, such as reporter gene assays, qRT-PCR, and Western blot. The constructed network contains 8,254 high-confidence regulatory relationships between 1,041 non-coding miRNAs and 2,836 protein-coding genes.
3.3 miRNA-drug association network
The miRNA–drug association network is based on data from ncDR, a drug resistance research database Dai et al. (2017), which collects experimentally verified and predicted interactions between non-coding RNAs and drugs. The data are standardized as follows: First, the 1,041 miRNAs involved in previous studies are matched based on miRBase nomenclature. Second, only interactions with clearly annotated drug resistance evidence (including preclinical or cell line experiments) are retained. This results in 3,305 high-confidence miRNA–drug interactions involving 130 commonly used clinical drugs, such as cisplatin and gefitinib.
3.4 miRNA-disease association network
To construct the required network, this study references the dataset from HMDD v3.2 Bai et al. (2023b), a widely-used human microRNA disease database. After curation and filtering, 15,547 miRNA–disease association pairs are obtained, covering 1,041 miRNAs and 640 human diseases.
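Each of the three association networks above can be represented as a binary miRNA-by-entity matrix. The sketch below shows one way to build such a matrix from a list of association pairs and uses the counts reported in Sections 3.2 to 3.4 to estimate how sparse each network is. The function names and the toy identifiers are hypothetical, and the density calculation assumes every reported pair is unique.

```python
def build_association_matrix(pairs, mirnas, entities):
    """Binary matrix with rows = miRNAs and columns = mRNAs, drugs, or diseases."""
    row = {m: i for i, m in enumerate(mirnas)}
    col = {e: j for j, e in enumerate(entities)}
    matrix = [[0] * len(entities) for _ in mirnas]
    for mirna, entity in pairs:
        # Pairs that mention unknown identifiers are skipped.
        if mirna in row and entity in col:
            matrix[row[mirna]][col[entity]] = 1
    return matrix


def density(n_pairs, n_rows, n_cols):
    """Fraction of possible miRNA-entity pairs that are observed."""
    return n_pairs / (n_rows * n_cols)


# Toy example with made-up identifiers.
toy = build_association_matrix(
    [("miR-a", "gene-1"), ("miR-b", "gene-2"), ("miR-x", "gene-1")],
    mirnas=["miR-a", "miR-b"],
    entities=["gene-1", "gene-2", "gene-3"],
)
print(toy)

# Densities implied by the counts reported in the text.
densities = {
    "miRNA-mRNA": density(8254, 1041, 2836),
    "miRNA-drug": density(3305, 1041, 130),
    "miRNA-disease": density(15547, 1041, 640),
}
for name, value in densities.items():
    print(f"{name}: {value:.4f}")
```

All three networks are sparse under this estimate (about 0.3% to 2.4% of possible pairs), which is one motivation for learning dense node embeddings instead of using the raw association rows directly.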
4 Node2vec algorithm
Network modeling has emerged as a pivotal paradigm in biomedical research due to its intuitive representation of complex relationships, particularly in systematic miRNA analysis involving multimodal correlations. This study integrates four critical biological networks: the miRNA sequence similarity network (quantifying functional conservation), the miRNA–disease association network (revealing pathological regulation), the miRNA–drug interaction network (reflecting therapeutic targeting), and the miRNA–mRNA regulatory network (decoding genetic circuitry). To effectively capture topological features from these non-Euclidean spatial data, we employ the node2vec algorithm Grover and Leskovec (2016), a graph embedding approach based on adaptive random walk strategies. By tuning search parameters—the return parameter p controlling local neighborhood sampling and the in-out parameter q governing global structural exploration—this approach generates semantically preserved node sequences, subsequently vectorized through Skip-Gram modeling. Notably, we implement dimension-specific embedding strategies tailored to distinct network characteristics: 64-dimensional representations in sequence similarity networks to resolve fine-grained patterns of conserved functional motifs, versus 128-dimensional high-capacity embeddings in the three heterogeneous association networks to capture complex multi-hop interactions. This hierarchical embedding mechanism simultaneously reduces feature redundancy while preserving network-specific information, establishing an interpretable mathematical foundation for subsequent multi-view feature fusion.
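The role of the return parameter p and the in-out parameter q can be illustrated with a small second-order random walk. This is a simplified sketch of the node2vec sampling rule on an unweighted graph, not the implementation used in this study; the toy graph, the parameter values, and the function name are assumptions for illustration. In a full pipeline, many such walks per node would be passed to a Skip-Gram model to obtain the 64- or 128-dimensional embeddings.

```python
import random


def node2vec_walk(adj, start, length, p=1.0, q=1.0, rng=None):
    """One biased random walk; adj maps each node to a set of neighbors."""
    if rng is None:
        rng = random.Random(0)
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        neighbors = sorted(adj[cur])
        if not neighbors:
            break
        if len(walk) == 1:
            # First step has no previous node, so sample uniformly.
            walk.append(rng.choice(neighbors))
            continue
        prev = walk[-2]
        weights = []
        for nxt in neighbors:
            if nxt == prev:
                weights.append(1.0 / p)   # return to the previous node
            elif nxt in adj[prev]:
                weights.append(1.0)       # stay close to the previous node
            else:
                weights.append(1.0 / q)   # move outward
        walk.append(rng.choices(neighbors, weights=weights, k=1)[0])
    return walk


# Toy undirected graph given as an adjacency dictionary.
adj = {0: {1, 2}, 1: {0, 2, 3}, 2: {0, 1, 3}, 3: {1, 2}}
walk = node2vec_walk(adj, start=0, length=10, p=0.5, q=2.0, rng=random.Random(42))
print(walk)
```

A small p makes the walk revisit recent nodes and sample the local neighborhood, whereas a small q pushes the walk away from the previous node and toward more distant parts of the graph.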
5 Graph transformer
This study proposes a structure-aware graph neural network, the graph transformer, to learn high-quality node embeddings from graph structures. Unlike traditional GNNs, which struggle with sparse or heterogeneous structures, the graph transformer incorporates multi-head attention and a structure reconstruction loss, enabling better modeling of local and global graph information. First, the input miRNA functional similarity matrix and association matrix are fused at the feature level. After generating the node feature matrix
Here,
where
To enhance the model’s ability to express structural information, the graph transformer also introduces a structural reconstruction loss as a training objective. In the unsupervised setting, the model scores the node pairs of the real edges in the input graph and defines the structural reconstruction similarity as (Equation 10):
where
where
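To make the idea of a structural reconstruction objective concrete, the sketch below scores node pairs by the sigmoid of the inner product of their embeddings and penalizes low scores on real edges and high scores on sampled non-edges. This is one common formulation, used in graph autoencoders, and is shown only as an assumption; it is not necessarily the form of the equations in this section, and the use of negative (non-edge) pairs is our addition.

```python
import numpy as np


def reconstruction_loss(Z, edges, neg_edges):
    """Binary cross-entropy over real edges and sampled non-edges.

    Z is an (n_nodes, dim) embedding matrix; edges and neg_edges are
    lists of (i, j) index pairs.
    """
    def edge_scores(pairs):
        i, j = np.array(pairs).T
        logits = np.sum(Z[i] * Z[j], axis=1)      # inner-product similarity
        return 1.0 / (1.0 + np.exp(-logits))      # sigmoid

    eps = 1e-12  # avoids log(0)
    pos = edge_scores(edges)
    neg = edge_scores(neg_edges)
    return float(-(np.log(pos + eps).mean() + np.log(1.0 - neg + eps).mean()))


# Toy embeddings: nodes 0 and 1 point one way, nodes 2 and 3 the other.
Z = np.array([[2.0, 0.0], [2.0, 0.0], [-2.0, 0.0], [-2.0, 0.0]])
good = reconstruction_loss(Z, edges=[(0, 1), (2, 3)], neg_edges=[(0, 2), (1, 3)])
bad = reconstruction_loss(Z, edges=[(0, 2), (1, 3)], neg_edges=[(0, 1), (2, 3)])
print(good, bad)
```

The loss is small when connected nodes have similar embeddings and large when the embeddings contradict the graph structure, which is the behavior a structure-aware training objective is meant to encourage.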
6 Multi-head attention mechanism
To capture cross-modal dependencies and interactions among various biological features, a Multi-Head Attention (MHA) module is introduced as the core feature interaction component in the fusion model. Based on the transformer encoder, MHA computes attention across multiple subspaces in parallel to enhance local and global correlation modeling. Four input feature types (miRNA sequence, drug, mRNA, and disease features) are first projected to a common 128-dimensional space using fully connected layers with L2 regularization. These are concatenated and reshaped into a 2D sequence format before being passed into the MHA module. The multi-head attention mechanism is calculated as follows (Equations 12, 13):
The outputs of all heads are concatenated and linearly transformed as follows (Equations 14, 15):
To further improve the representation capability, the multi-head attention output is passed through a two-layer Feed-Forward Network (FFN) as follows (Equations 16, 17):
The final output is the fused multimodal semantic embedding, which serves as the input to the subsequent self-supervised learning projection head and the multi-label classification head. It retains the information of the original modality features while integrating the high-order correlations among them.
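The fusion step described in this section can be sketched with standard scaled dot-product multi-head attention followed by a two-layer feed-forward network. The sketch treats the four 128-dimensional modality embeddings of each miRNA as a sequence of four tokens. The random weights, the ReLU activation, the hidden width of 256, and the omission of residual connections and normalization are assumptions for illustration; the actual GTMALoc layers may differ.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def multi_head_attention(X, Wq, Wk, Wv, Wo, num_heads):
    """X has shape (n_samples, n_tokens, d_model)."""
    n, t, d = X.shape
    d_head = d // num_heads

    def split(M):
        # (n, t, d) -> (n, heads, t, d_head)
        return M.reshape(n, t, num_heads, d_head).transpose(0, 2, 1, 3)

    Q, K, V = split(X @ Wq), split(X @ Wk), split(X @ Wv)
    weights = softmax(Q @ K.transpose(0, 1, 3, 2) / np.sqrt(d_head))
    heads = weights @ V                                    # (n, heads, t, d_head)
    concat = heads.transpose(0, 2, 1, 3).reshape(n, t, d)  # concatenate heads
    return concat @ Wo, weights


def feed_forward(H, W1, b1, W2, b2):
    """Two-layer FFN with ReLU (activation is an assumption)."""
    return np.maximum(H @ W1 + b1, 0.0) @ W2 + b2


rng = np.random.default_rng(0)
n, d, num_heads, hidden = 5, 128, 4, 256

# Stand-ins for the projected sequence, drug, mRNA, and disease features.
seq, drug, mrna, disease = (rng.normal(size=(n, d)) for _ in range(4))
X = np.stack([seq, drug, mrna, disease], axis=1)           # (n, 4, 128)

Wq, Wk, Wv, Wo = (rng.normal(scale=d ** -0.5, size=(d, d)) for _ in range(4))
attended, weights = multi_head_attention(X, Wq, Wk, Wv, Wo, num_heads)

W1, b1 = rng.normal(scale=d ** -0.5, size=(d, hidden)), np.zeros(hidden)
W2, b2 = rng.normal(scale=hidden ** -0.5, size=(hidden, d)), np.zeros(d)
fused = feed_forward(attended, W1, b1, W2, b2).reshape(n, -1)  # (n, 4 * 128)
print(attended.shape, weights.shape, fused.shape)
```

Each head produces a 4 x 4 attention map per miRNA, so the weights indicate how strongly one modality attends to another when the fused representation is formed.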
7 Prediction of miRNA subcellular localization
During forward propagation, the fused high-dimensional features are processed through the MHA and FFN modules. Finally, the classification head maps the features to predicted subcellular localization probabilities (Equation 18):
where
8 Results
8.1 10-Fold cross-validation
In our experiments, we employ 10-fold cross-validation to comprehensively assess the generalization ability of the model. The dataset is randomly shuffled and evenly divided into 10 subsets; in each rotation, one subset is held out for testing and the remaining nine are used for training, so that every sample contributes to both training and evaluation. After training, the model generates a prediction score for each class on the test set; these scores are mapped to the [0,1] range using the Sigmoid activation function and binarized with a threshold of 0.5. For each fold, we calculate the Area Under the ROC Curve (AUC) and the Area Under the Precision-Recall Curve (AUPR) as evaluation metrics, and record the results for each class. As shown in Figures 2, 3, our model achieves an average AUC of 0.9108 and an average AUPR of 0.8102 on the multi-label subcellular localization task, demonstrating the model’s effectiveness and robustness in capturing multimodal features and their high-order interactions.
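The evaluation protocol can be sketched in plain Python as follows. The code splits sample indices into 10 shuffled folds, computes a per-label AUC from continuous scores with the rank-based (Mann–Whitney) definition, and binarizes probabilities at 0.5. The function names, the random seed, and the choice to treat a probability exactly equal to 0.5 as positive are assumptions for illustration.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train, test) index lists for shuffled k-fold cross-validation."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]


def roc_auc(y_true, y_score):
    """Probability that a positive outranks a negative; ties count as 0.5."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def binarize(probs, threshold=0.5):
    """Turn per-label probabilities into 0/1 predictions."""
    return [int(p >= threshold) for p in probs]


splits = list(k_fold_indices(1041, k=10))
print([len(test) for _, test in splits])
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
print(binarize([0.2, 0.5, 0.9]))
```

In a multi-label setting, the AUC would be computed separately for each of the seven compartments within a fold and then averaged. AUC and AUPR use the continuous scores, while the 0.5 threshold matters only for metrics that need hard labels.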
8.2 Comparative experiments
To comprehensively evaluate the performance of the GTMALoc model, we use both 5-fold and 10-fold cross-validation strategies and systematically compare it with four existing methods (MiRLoc, MirLocPredictor, DAmiRLocGNet, and PMiSLocMF). The evaluation metrics include AUC and AUPR to thoroughly assess the model’s effectiveness.
As shown in Table 1, GTMALoc achieves an average AUC score of 0.9094 under 5-fold cross-validation, outperforming the other methods across most subcellular localization categories. It performs particularly well in structurally complex or sparsely connected categories such as cytoplasm (0.9240), extracellular vesicle (0.9115), and microvesicle (0.9113), demonstrating its strength in integrating high-dimensional heterogeneous information and modeling complex relationships. Table 2 presents the comparison based on AUPR, which primarily reflects the model’s robustness in class-imbalanced scenarios. GTMALoc also achieves the highest average AUPR of 0.8044, showing excellent performance in critical functional regions such as exosome (0.9900), nucleus (0.9248), and microvesicle (0.9900). Although the score slightly decreases in the nucleolus (0.5142), where signals are sparse, the overall performance remains superior.

Table 1. AUC Performance Comparison of miRNA Subcellular Localization Models Based on 5-Fold Cross-Validation.

Table 2. AUPR Performance Comparison of miRNA Subcellular Localization Models Based on 5-Fold Cross-Validation.
Furthermore, as indicated in Table 3, GTMALoc’s average AUC increases to 0.9108 under 10-fold cross-validation, further confirming the model’s stability and generalization across different data splits. Its outstanding performance in categories such as nucleolus, cytoplasm, and exosome highlights its ability to accurately identify miRNAs localized in these regions. This advantage is largely attributed to the multi-head attention mechanism, which effectively captures complex sequence patterns and graph-structured information. As shown in Table 4, although GTMALoc’s AUPR scores for some sub-tasks are slightly lower than those of PMiSLocMF, the overall average AUPR reaches 0.8102, demonstrating strong resilience to data imbalance. We observe significant performance differences across localization categories: exosome and microvesicle achieve near-perfect AUPR scores, indicating successful recognition of key regional features, while performance in nucleolus and extracellular vesicle is relatively lower, likely due to insufficient positive samples and data sparsity. This suggests that future work should focus on improving data quality or adopting sample augmentation strategies to enhance performance in low-signal categories. Overall, the experimental results demonstrate that GTMALoc consistently exhibits strong predictive power and generalization ability under various cross-validation strategies, confirming its feasibility and practicality as a reliable tool for miRNA subcellular localization prediction.

Table 3. AUC Performance Comparison of miRNA Subcellular Localization Models Based on 10-Fold Cross-Validation.

Table 4. AUPR Performance Comparison of miRNA Subcellular Localization Models Based on 10-Fold Cross-Validation.
8.3 Ablation study
To validate the contribution of each submodule within the overall architecture, we conduct detailed ablation studies by sequentially removing key components of the model and observing the resulting AUC performance on the multi-label subcellular localization task. As shown in Figure 4, we design five ablation settings: removing the miRNA-disease association network, removing the miRNA-drug interaction network, removing the miRNA-mRNA regulatory network, removing the graph transformer module, and removing the multi-head attention mechanism. All other modules remain unchanged across experiments to ensure a consistent model structure and fair evaluation. Each module contributes positively to the model’s overall performance, especially the graph transformer and multi-head attention modules, which play crucial roles in capturing high-order cross-modal interactions and both local and global structural features.
8.4 Parameter study
To investigate the effect of the number of attention heads on model performance, we systematically evaluate different configurations (2, 4, and 8 heads) on the validation set, as shown in Figure 5. A grid search strategy fixes other hyperparameters while varying the number of attention heads to observe sensitivity in the AUC metric. Results show that the model achieves peak performance (AUC = 0.9108) when the number of heads is set to 4, outperforming the 2-head and 8-head configurations by 0.0005 and 0.0006, respectively. This can be attributed to two main factors: (1) a moderate number of heads helps capture complementary interaction patterns in parallel subspaces, enhancing the model’s ability to fuse features across networks; (2) exceeding the optimal number of heads leads to redundancy in attention weights and an increased risk of local overfitting. Therefore, we adopt the 4-head configuration in the final architecture, balancing computational efficiency and predictive accuracy.
9 Case studies
To further demonstrate the practical utility of GTMALoc in predicting miRNA subcellular localization, we conduct case studies across seven subcellular categories: cytoplasm, exosomes, nucleolus, nucleus, extracellular vesicles, microvesicles, and mitochondrion. For each compartment, we select the top five miRNAs with the highest predicted probabilities generated by GTMALoc. We then manually verify these predictions against experimental evidence reported in the scientific literature. In total, 35 miRNA–localization associations are examined. As shown in Table 5, 30 of these are supported by published studies, while only five lack current experimental validation.
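The selection step of the case study can be sketched as a simple ranking of predicted probabilities within each compartment. The function name, the miRNA identifiers, and the probabilities below are invented for illustration and are not GTMALoc outputs.

```python
def top_k_per_compartment(probabilities, k=5):
    """Return the k highest-scoring miRNAs for each compartment.

    probabilities maps compartment -> {miRNA name: predicted probability}.
    """
    return {
        compartment: sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:k]
        for compartment, scores in probabilities.items()
    }


# Made-up scores for two compartments.
toy_probabilities = {
    "cytoplasm": {"miR-A": 0.90, "miR-B": 0.40, "miR-C": 0.70},
    "nucleus": {"miR-A": 0.20, "miR-B": 0.85, "miR-C": 0.60},
}
top = top_k_per_compartment(toy_probabilities, k=2)
for compartment, ranked in top.items():
    print(compartment, ranked)
```

With seven compartments and k = 5, this procedure yields the 35 miRNA–localization pairs that are then checked against the literature.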
Taking miR-122 and miR-21 as representative examples, we analyze the alignment between GTMALoc’s predictions and reported biological findings. miR-122 is a liver-specific miRNA that is highly enriched in hepatocytes, and its cytoplasmic localization is well supported by experimental evidence Zhang et al. (2021). Previous studies indicate that miR-122 plays a crucial role in liver homeostasis by regulating lipid metabolism, cholesterol biosynthesis, and HCV replication through mRNA binding Ren et al. (2008). GTMALoc assigns a high confidence score of 0.98 for its cytoplasmic localization and captures its interactions with liver metabolism-related mRNA nodes, highlighting the model’s capacity to extract biologically meaningful features from the molecular network. In contrast, miR-21 is known for its multi-localization behavior and is highly expressed in various cancer types. It has been shown to be secreted via exosomes, contributing to immune modulation and tumor microenvironment remodeling Krishnamurthy et al. (2018), and also localizes in the nucleolus, where it may influence non-coding RNA processing Beckett et al. (2015). GTMALoc successfully predicts both localizations with high confidence and focuses attention on miR-21’s connections to tumor-associated signaling pathways, consistent with its known roles in cell proliferation, anti-apoptosis, and inflammatory response. These case studies suggest that GTMALoc not only achieves accurate subcellular localization predictions but also provides biologically interpretable outputs, particularly for multi-localized miRNAs, offering valuable insights for downstream functional analysis and subcellular mechanism exploration.
10 Conclusion
In this study, we propose a computational model, GTMALoc, for predicting miRNA subcellular localization. GTMALoc combines graph transformers with a multi-head attention mechanism to fuse heterogeneous biological information from multiple sources. Specifically, the model effectively integrates miRNA sequence features, interaction network structures, and functional properties. Through graph-based modeling and dynamic attention weighting, GTMALoc learns more discriminative high-dimensional feature representations, significantly improving the accuracy of localization prediction. We conduct a comprehensive evaluation of GTMALoc on public datasets. The results show that GTMALoc outperforms existing methods on multiple performance metrics, especially in handling sparse graph structures and high-dimensional feature spaces. Ablation studies confirm the key contributions of each feature modality and attention component to the model’s overall performance. Additionally, through representative case studies, we validate the biological interpretability of GTMALoc. The predicted subcellular localizations are not only consistent with known miRNA functions reported in the literature but also reveal potential regulatory modes that have not been fully explored. Given that miRNA localization is closely related to its regulatory roles in various cellular contexts, accurate localization prediction provides valuable insights into miRNA-mediated mechanisms under physiological and pathological conditions.
Although GTMALoc performs well in various experiments, it still faces some limitations. We integrate multiple heterogeneous features, such as sequence data, functional similarity, and molecular interaction networks; however, biological data often contain noise and incompleteness. For example, the functional annotations of many miRNAs remain incomplete, and some interaction networks may contain missing data or experimental biases, which can adversely affect the accuracy of feature learning. Additionally, differences among data sources in species, experimental conditions, or time points introduce biases and impair the model’s generalizability. In future work, we plan to further refine the model architecture to improve its interpretability, adaptability, and applicability in real-world biomedical research scenarios.
Data availability statement
Publicly available datasets were analyzed in this study. The data are openly available from RNALocate v2.0 at http://www.rnalocate.org/ or http://www.rna-society.org/rnalocate/. The code and data that support the findings of this study are available at https://github.com/27167199/GTMALoc.
Author contributions
XH: Conceptualization, Methodology, Writing – original draft. JJ: Writing – original draft, Software, Methodology. LS: Writing – original draft. CY: Supervision, Project administration, Writing – review and editing.
Funding
The author(s) declare that financial support was received for the research and/or publication of this article. This work was supported by the National Natural Science Foundation of China (Grant Nos. 62473149 and 61962050), the Natural Science Foundation of Hunan Province, China (Grant No. 2022JJ30428), and the Excellent Youth Funding Program of the Hunan Provincial Education Department (Grant No. 22B0372).
Conflict of interest
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Generative AI statement
The author(s) declare that no Generative AI was used in the creation of this manuscript.
Publisher’s note
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
References
Asim, M. N., Malik, M. I., Zehe, C., Trygg, J., Dengel, A., and Ahmed, S. (2020). Mirlocpredictor: a convnet-based multi-label microrna subcellular localization predictor by incorporating k-mer positional information. Genes 11, 1475. doi:10.3390/genes11121475
Bai, T., Yan, K., and Liu, B. (2023a). Damirlocgnet: mirna subcellular localization prediction by combining mirna–disease associations and graph convolutional networks. Briefings Bioinforma. 24, bbad212. doi:10.1093/bib/bbad212
Bai, T., Yan, K., and Liu, B. (2023b). Damirlocgnet: mirna subcellular localization prediction by combining mirna–disease associations and graph convolutional networks. Briefings Bioinforma. 24, bbad212. doi:10.1093/bib/bbad212
Bartel, D. P. (2009). MicroRNAs: target recognition and regulatory functions. Cell 136, 215–233. doi:10.1016/j.cell.2009.01.002
Beckett, E. L., Martin, C., Choi, J. H., King, K., Niblett, S., Boyd, L., et al. (2015). Folate status, folate-related genes and serum mir-21 expression: implications for mir-21 as a biomarker. BBA Clin. 4, 45–51. doi:10.1016/j.bbacli.2015.06.006
Catalanotto, C., Cogoni, C., and Zardo, G. (2016). Microrna in control of gene expression: an overview of nuclear functions. Int. J. Mol. Sci. 17, 1712. doi:10.3390/ijms17101712
Chen, H., Zhang, Z., and Feng, D. (2019a). Prediction and interpretation of mirna-disease associations based on mirna target genes using canonical correlation analysis. BMC Bioinforma. 20, 404–408. doi:10.1186/s12859-019-2998-8
Chen, L., Gu, J., and Zhou, B. (2024). Pmislocmf: predicting mirna subcellular localizations by incorporating multi-source features of mirnas. Briefings Bioinforma. 25, bbae386. doi:10.1093/bib/bbae386
Chen, Q., Zhe, Z., Lan, W., Zhang, R., Wang, Z., Luo, C., et al. (2019b). Identifying mirna-disease association based on integrating mirna topological similarity and functional similarity. Quant. Biol. 7, 202–209. doi:10.1007/s40484-019-0176-7
Cui, T., Dou, Y., Tan, P., Ni, Z., Liu, T., Wang, D., et al. (2022). RNALocate v2.0: an updated resource for rna subcellular localization with increased coverage and annotation. Nucleic acids Res. 50, D333–D339. doi:10.1093/nar/gkab825
Dai, E., Yang, F., Wang, J., Zhou, X., Song, Q., An, W., et al. (2017). ncdr: a comprehensive resource of non-coding rnas involved in drug resistance. Bioinformatics 33, 4010–4011. doi:10.1093/bioinformatics/btx523
Dugger, B. N., and Dickson, D. W. (2017). Pathology of neurodegenerative diseases. Cold Spring Harb. Perspect. Biol. 9, a028035. doi:10.1101/cshperspect.a028035
Grover, A., and Leskovec, J. (2016). “node2vec: scalable feature learning for networks,” in Proceedings of the 22nd ACM SIGKDD international conference on Knowledge discovery and data mining, 855–864.
Guan, Y.-J., Yu, C.-Q., Qiao, Y., Li, L.-P., You, Z.-H., Ren, Z.-H., et al. (2022). Mfidma: a multiple information integration model for the prediction of drug–mirna associations. Biology 12, 41. doi:10.3390/biology12010041
Gurtan, A. M., and Sharp, P. A. (2013). The role of mirnas in regulating gene expression networks. J. Mol. Biol. 425, 3582–3600. doi:10.1016/j.jmb.2013.03.007
Holley, C. L., and Topkara, V. K. (2011). An introduction to small non-coding rnas: mirna and snorna. Cardiovasc. drugs Ther. 25, 151–159. doi:10.1007/s10557-011-6290-z
Hombach, S., and Kretz, M. (2016). “Non-coding rnas: classification, biology and functioning,” in Non-coding RNAs in colorectal cancer, 3–17.
Hsu, S.-D., Lin, F.-M., Wu, W.-Y., Liang, C., Huang, W.-C., Chan, W.-L., et al. (2011). mirtarbase: a database curates experimentally validated microrna–target interactions. Nucleic acids Res. 39, D163–D169. doi:10.1093/nar/gkq1107
Huang, H.-Y., Lin, Y.-C.-D., Li, J., Huang, K.-Y., Shrestha, S., Hong, H.-C., et al. (2020). Mirtarbase 2020: updates to the experimentally validated microrna–target interaction database. Nucleic acids Res. 48, D148–D154. doi:10.1093/nar/gkz896
Huang, T.-H., Fan, B., Rothschild, M. F., Hu, Z.-L., Li, K., and Zhao, S.-H. (2007). Mirfinder: an improved approach and software implementation for genome-wide fast microrna precursor scans. BMC Bioinforma. 8, 341–10. doi:10.1186/1471-2105-8-341
Jiang, Q., Hao, Y., Wang, G., Juan, L., Zhang, T., Teng, M., et al. (2010). Prioritization of disease micrornas through a human phenome-micrornaome network. BMC Syst. Biol. 4, S2–S9. doi:10.1186/1752-0509-4-S1-S2
Jie, M., Feng, T., Huang, W., Zhang, M., Feng, Y., Jiang, H., et al. (2021). Subcellular localization of mirnas and implications in cellular homeostasis. Genes 12, 856. doi:10.3390/genes12060856
Kabekkodu, S. P., Shukla, V., Varghese, V. K., D’Souza, J., Chakrabarty, S., and Satyamoorthy, K. (2018). Clustered mirnas and their role in biological functions and diseases. Biol. Rev. 93, 1955–1986. doi:10.1111/brv.12428
Kozomara, A., Birgaoanu, M., and Griffiths-Jones, S. (2019). mirbase: from microrna sequences to function. Nucleic acids Res. 47, D155–D162. doi:10.1093/nar/gky1141
Krishnamurthy, S., Pavani, C., Kurup, P., Palanisamy, S., Jagadeesh, A., Sekar, K., et al. (2018). Cystinuria in a 13-month-old girl with absence of mutations in the slc3a1 and slc7a9 genes. Indian J. Nephrol. 28, 84–85. doi:10.4103/ijn.IJN_20_17
Li, S., Lei, Z., and Sun, T. (2023). The role of micrornas in neurodegenerative diseases: a review. Cell Biol. Toxicol. 39, 53–83. doi:10.1007/s10565-022-09761-x
Li, Y., Qiu, C., Tu, J., Geng, B., Yang, J., Jiang, T., et al. (2014). HMDD v2.0: a database for experimentally supported human microrna and disease associations. Nucleic acids Res. 42, D1070–D1074. doi:10.1093/nar/gkt1023
Miska, E. (2007). “Microrna expression profiles classify human cancers,” in Cytometry part B-clinical cytometry (HOBOKEN, NJ 07030 USA: WILEY-LISS DIV JOHN WILEY and SONS INC), 72, 126.
Ren, X., Vincenz, C., and Kerppola, T. K. (2008). Changes in the distributions and dynamics of polycomb repressive complexes during embryonic stem cell differentiation. Mol. Cell. Biol. 28, 2884–2895. doi:10.1128/MCB.00949-07
Small, E. M., and Olson, E. N. (2011). Pervasive roles of micrornas in cardiovascular biology. Nature 469, 336–342. doi:10.1038/nature09783
Smith, T. F., and Waterman, M. S. (1981). Identification of common molecular subsequences. J. Mol. Biol. 147, 195–197. doi:10.1016/0022-2836(81)90087-5
Sun, J., Zhang, J., Li, Q., Yi, X., Liang, Y., and Zheng, Y. (2020). Predicting citywide crowd flows in irregular regions using multi-view graph convolutional networks. IEEE Trans. Knowl. Data Eng. 34, 2348–2359. doi:10.1109/tkde.2020.3008774
Thomson, D. W., and Dinger, M. E. (2016). Endogenous microrna sponges: evidence and controversy. Nat. Rev. Genet. 17, 272–283. doi:10.1038/nrg.2016.20
Wang, D., Wang, J., Lu, M., Song, F., and Cui, Q. (2010). Inferring the human microrna functional similarity and functional network based on microrna-associated diseases. Bioinformatics 26, 1644–1650. doi:10.1093/bioinformatics/btq241
Wang, X.-F., Huang, L., Wang, Y., Guan, R.-C., You, Z.-H., Sheng, N., et al. (2024a). Multi-view learning framework for predicting unknown types of cancer markers via directed graph neural networks fitting regulatory networks. Briefings Bioinforma. 25, bbae546. doi:10.1093/bib/bbae546
Wang, X.-F., Huang, L., Wang, Y., Guan, R.-C., You, Z.-H., Sheng, N., et al. (2024b). A multichannel graph neural network based on multisimilarity modality hypergraph contrastive learning for predicting unknown types of cancer biomarkers. Briefings Bioinforma. 25, bbae575. doi:10.1093/bib/bbae575
Wang, X.-F., Yu, C.-Q., Li, L.-P., You, Z.-H., Huang, W.-Z., Li, Y.-C., et al. (2022). Kgdcmi: a new approach for predicting circrna–mirna interactions from multi-source information extraction and deep learning. Front. Genet. 13, 958096. doi:10.3389/fgene.2022.958096
Wen, M., Cong, P., Zhang, Z., Lu, H., and Li, T. (2018). Deepmirtar: a deep-learning approach for predicting human mirna targets. Bioinformatics 34, 3781–3787. doi:10.1093/bioinformatics/bty424
Xu, M., Chen, Y., Xu, Z., Zhang, L., Jiang, H., and Pian, C. (2022). Mirloc: predicting mirna subcellular localization by incorporating mirna–mrna interactions and mrna subcellular localization. Briefings Bioinforma. 23, bbac044. doi:10.1093/bib/bbac044
Zhang, D., Ran, J., Li, J., Yu, C., Cui, Z., Amevor, F. K., et al. (2021). mir-21-5p regulates the proliferation and differentiation of skeletal muscle satellite cells by targeting klf3 in chicken. Genes 12, 814. doi:10.3390/genes12060814
Keywords: miRNA, subcellular localization, graph transformer, multi-head attention mechanism, multi-source features
Citation: Huang X, Jiang J, Shi L and Yan C (2025) GTMALoc: prediction of miRNA subcellular localization based on graph transformer and multi-head attention mechanism. Front. Genet. 16:1623008. doi: 10.3389/fgene.2025.1623008
Received: 05 May 2025; Accepted: 10 June 2025;
Published: 19 June 2025.
Edited by:
Federica Calore, The Ohio State University, United States
Copyright © 2025 Huang, Jiang, Shi and Yan. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Cheng Yan, yancheng01@hnucm.edu.cn