Artificial Intelligence for Extracting Phenotypic Features and Disease Subtyping Applied to Single-Cell Sequencing Data

40.7K
views
37
authors
8
articles
Cover image for research topic "Artificial Intelligence for Extracting Phenotypic Features and Disease Subtyping Applied to Single-Cell Sequencing Data"
Editors
5
Impact
Loading...
7,180 views
14 citations
Original Research
07 February 2022

The major interest domains of single-cell RNA sequential analysis are identification of existing and novel types of cells, depiction of cells, cell fate prediction, classification of several types of tumor, and investigation of heterogeneity in different cells. Single-cell clustering plays an important role to solve the aforementioned questions of interest. Cluster identification in high dimensional single-cell sequencing data faces some challenges due to its nature. Dimensionality reduction models can solve the problem. Here, we introduce a potential cluster specified frequent biomarkers discovery framework using dimensionality reduction and hierarchical agglomerative clustering Louvain for single-cell RNA sequencing data analysis. First, we pre-filtered the features with fewer number of cells and the cells with fewer number of features. Then we created a Seurat object to store data and analysis together and used quality control metrics to discard low quality or dying cells. Afterwards we applied global-scaling normalization method “LogNormalize” for data normalization. Next, we computed cell-to-cell highly variable features from our dataset. Then, we applied a linear transformation and linear dimensionality reduction technique, Principal Component Analysis (PCA) to project high dimensional data to an optimal low-dimensional space. After identifying fifty “significant”principal components (PCs) based on strong enrichment of low p-value features, we implemented a graph-based clustering algorithm Louvain for the cell clustering of 10 top significant PCs. We applied our model to a single-cell RNA sequential dataset for a rare intestinal cell type in mice (NCBI accession ID:GSE62270, 23,630 features and 1872 samples (cells)). We obtained 10 cell clusters with a maximum modularity of 0.885 1. After detecting the cell clusters, we found 3871 cluster-specific biomarkers using an expression feature extraction statistical tool for single-cell sequencing data, Model-based Analysis of Single-cell Transcriptomics (MAST) with a   log 2FC threshold of 0.25 and a minimum feature detection of 25%. From these cluster-specific biomarkers, we found 1892 most frequent markers, i.e., overlapping biomarkers. We performed degree hub gene network analysis using Cytoscape and reported the five highest degree genes (Rps4x, Rps18, Rpl13a, Rps12 and Rpl18a). Subsequently, we performed KEGG pathway and Gene Ontology enrichment analysis of cluster markers using David 6.8 software tool. In summary, our proposed framework that integrated dimensionality reduction and agglomerative hierarchical clustering provides a robust approach to efficiently discover cluster-specific frequent biomarkers, i.e., overlapping biomarkers from single-cell RNA sequencing data.

7,871 views
32 citations
Original Research
17 May 2021
Breast Cancer Case Identification Based on Deep Learning and Bioinformatics Analysis
Dongfang Jia
5 more and 
Xiaoyi Lv
Workflow of this study.

Mastering the molecular mechanism of breast cancer (BC) can provide an in-depth understanding of BC pathology. This study explored existing technologies for diagnosing BC, such as mammography, ultrasound, magnetic resonance imaging (MRI), computed tomography (CT), and positron emission tomography (PET) and summarized the disadvantages of the existing cancer diagnosis. The purpose of this article is to use gene expression profiles of The Cancer Genome Atlas (TCGA) and Gene Expression Omnibus (GEO) to classify BC samples and normal samples. The method proposed in this article triumphs over some of the shortcomings of traditional diagnostic methods and can conduct BC diagnosis more rapidly with high sensitivity and have no radiation. This study first selected the genes most relevant to cancer through weighted gene co-expression network analysis (WGCNA) and differential expression analysis (DEA). Then it used the protein–protein interaction (PPI) network to screen 23 hub genes. Finally, it used the support vector machine (SVM), decision tree (DT), Bayesian network (BN), artificial neural network (ANN), convolutional neural network CNN-LeNet and CNN-AlexNet to process the expression levels of 23 hub genes. For gene expression profiles, the ANN model has the best performance in the classification of cancer samples. The ten-time average accuracy is 97.36% (±0.34%), the F1 value is 0.8535 (±0.0260), the sensitivity is 98.32% (±0.32%), the specificity is 89.59% (±3.53%) and the AUC is 0.99. In summary, this method effectively classifies cancer samples and normal samples and provides reasonable new ideas for the early diagnosis of cancer in the future.

6,724 views
17 citations
Open for submission
Frontiers Logo

Frontiers in Genetics

From Graphs to Genes: Harnessing the Power of Machine Learning and Bioengineering
Edited by Hiep Thi Nguyen, Larisa M Haupt, Paul Dunn, Bao Chi Bui
Deadline
01 April 2025
Submit a paper
Recommended Research Topics
122.4K
views
174
authors
24
articles
49.9K
views
98
authors
15
articles
Frontiers Logo

Frontiers in Genetics

Novel Machine Learning Algorithms for Conventional Omics Data and their Application
Edited by Suyan Tian, Cong Liu, Jinpeng Liu, Wenzheng Bao
10.8K
views
16
authors
5
articles
Frontiers Logo

Frontiers in Genetics

Adoption of Artificial Intelligence in Human and Clinical Genomics, volume II
Edited by Li Zhang, GUANGMING ZHANG, Piyush Shukla
29K
views
72
authors
14
articles
Frontiers Logo

Frontiers in Genetics

Machine Learning Faciliates Biomarker Discovery for Complex Diseases
Edited by Suyan Tian, Changjing Zhuge, Cong Liu, Jinpeng Liu, Wenzheng Bao
12.5K
views
27
authors
4
articles