ORIGINAL RESEARCH article

Front. Genet.

Sec. Statistical Genetics and Methodology

Volume 16 - 2025 | doi: 10.3389/fgene.2025.1596049

This article is part of the Research Topic: Expanding Insights Into Structure, Function, and Disorder of Genome by the Power of Artificial Intelligence in Bioinformatics

Transcriptomic analysis and machine learning modeling identifies novel biomarkers and genetic characteristics of hypertrophic cardiomyopathy

Provisionally accepted
  • Department of Cardiovascular Medicine, People's Hospital of Linquan County, Linquan, China

The final, formatted version of the article will be published soon.

Objective: This study aimed to leverage bioinformatics approaches to identify novel biomarkers and characterize the molecular mechanisms underlying hypertrophic cardiomyopathy (HCM).

Methods: Two RNA-sequencing datasets (GSE230585 and GSE249925) were obtained from the Gene Expression Omnibus (GEO) repository. Computational analysis was performed to compare transcriptomic profiles between normal cardiac tissues from healthy donors and myocardial tissues from HCM patients. Functional annotation of differentially expressed genes (DEGs) was performed using Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analyses. Immune cell infiltration patterns were quantified via single-sample gene set enrichment analysis (ssGSEA). A predictive model for HCM was developed through systematic evaluation of 113 combinations of 12 machine-learning algorithms, employing 10-fold cross-validation on training datasets and external validation using an independent cohort (GSE180313).

Results: A total of 271 DEGs were identified, primarily enriched in multiple biological pathways. Immune infiltration analysis revealed distinct patterns of immune cell composition. Based on the top differentially expressed genes, a robust 12-gene diagnostic signature (COMP, SFRP4, RASD1, IL1RL1, S100A8, S100A9, ESM1, CA3, MYL1, VGLL2, MCEMP1, and MT1A) was constructed, demonstrating superior performance in both training and testing cohorts.

Conclusions: This study utilized bioinformatics approaches to analyze RNA-sequencing datasets, identifying DEGs and distinct immune infiltration patterns in HCM. These findings enabled the construction of a 12-gene diagnostic signature with robust predictive performance, thereby advancing our understanding of HCM's molecular biomarkers and pathogenic mechanisms.
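The model-selection step in the Methods (scoring combinations of algorithms by 10-fold cross-validation) can be illustrated with a minimal sketch. It is not the authors' pipeline: the abstract does not name the 12 algorithms, so the two gene selectors and two classifiers below are hypothetical stand-ins, the expression matrix is synthetic, and every function name is an assumption made for illustration. The point it shows is that gene selection is repeated inside each training fold, so held-out samples never influence which genes are chosen.

```python
import random
from itertools import product
from statistics import mean


def make_toy_data(n_per_class=30, n_genes=20, n_informative=4, seed=0):
    """Synthetic expression matrix (rows = samples, columns = genes)."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):  # 0 = control, 1 = case (toy labels)
        for _ in range(n_per_class):
            row = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
            if label == 1:
                for g in range(n_informative):
                    row[g] += 2.0  # shift the first few genes in cases
            X.append(row)
            y.append(label)
    return X, y


def class_means(X, y, label):
    rows = [x for x, lab in zip(X, y) if lab == label]
    return [mean(col) for col in zip(*rows)]


def select_all(X, y):
    """Stand-in selector: keep every gene."""
    return list(range(len(X[0])))


def select_top4(X, y):
    """Stand-in selector: 4 genes with the largest class-mean difference."""
    m0, m1 = class_means(X, y, 0), class_means(X, y, 1)
    ranked = sorted(range(len(m0)), key=lambda g: abs(m1[g] - m0[g]), reverse=True)
    return ranked[:4]


def sq_dist(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b))


def fit_centroid(X, y):
    """Stand-in classifier: assign to the nearer class centroid."""
    c0, c1 = class_means(X, y, 0), class_means(X, y, 1)
    return lambda x: 0 if sq_dist(x, c0) <= sq_dist(x, c1) else 1


def fit_1nn(X, y):
    """Stand-in classifier: label of the single nearest training sample."""
    return lambda x: min(zip(X, y), key=lambda pair: sq_dist(x, pair[0]))[1]


def stratified_folds(y, k, seed=0):
    """Split sample indices into k folds, balancing classes across folds."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for label in sorted(set(y)):
        idx = [i for i, lab in enumerate(y) if lab == label]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    return folds


def cv_accuracy(X, y, selector, fitter, k=10):
    """Mean held-out accuracy over k folds for one selector/classifier pair."""
    scores = []
    for test_idx in stratified_folds(y, k):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        X_tr, y_tr = [X[i] for i in train_idx], [y[i] for i in train_idx]
        genes = selector(X_tr, y_tr)  # selection sees training folds only
        predict = fitter([[x[g] for g in genes] for x in X_tr], y_tr)
        hits = sum(predict([X[i][g] for g in genes]) == y[i] for i in test_idx)
        scores.append(hits / len(test_idx))
    return mean(scores)


def evaluate_combinations(X, y):
    """Score every selector x classifier combination by 10-fold CV."""
    selectors = {"all_genes": select_all, "top4_mean_diff": select_top4}
    fitters = {"nearest_centroid": fit_centroid, "one_nn": fit_1nn}
    return {
        (s, f): cv_accuracy(X, y, selectors[s], fitters[f])
        for s, f in product(selectors, fitters)
    }


if __name__ == "__main__":
    X, y = make_toy_data()
    ranked = sorted(evaluate_combinations(X, y).items(), key=lambda kv: -kv[1])
    for combo, acc in ranked:
        print(combo, round(acc, 3))
```

In a real analysis the best-scoring combination would then be refit on the full training data and assessed once on the independent cohort, which this sketch leaves out.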

Keywords: Hypertrophic Cardiomyopathy, Gene Expression, RNA sequencing, Gene Expression Omnibus, DNA Repair, biomarker, machine learning

Received: 19 Mar 2025; Accepted: 19 May 2025.

Copyright: © 2025 Li. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence: Chunrui Li, Department of Cardiovascular Medicine, People's Hospital of Linquan County, Linquan, China

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.