Advanced Interpretable Machine Learning Methods for Clinical NGS Big Data of Complex Hereditary Diseases

147.3K views

174 authors

25 articles

Editorial

23 October 2020

Editorial: Advanced Interpretable Machine Learning Methods for Clinical NGS Big Data of Complex Hereditary Diseases

Yudong Cai, Tao Huang and Peilin Jia

3,516 views

2 citations

Editors

3

Yudong Cai

School of Life Sciences, Shanghai University

Tao Huang

Shanghai Institute of Nutrition and Health, Chinese Academy of Sciences (CAS)

Peilin Jia

Beijing Institute of Genomics

About

Next-generation sequencing (NGS) has revolutionized biomedical research by enabling genome-wide screening for genetic defects. NGS-based tests have many applications, including non-invasive prenatal testing (NIPT), early detection of diseases, targeted therapy of various cancers, and the study of rare disease etiology. As the volume of genomic data grows, identifying genetic patterns with traditional sampling-based statistical methods becomes increasingly challenging. Advanced machine learning and artificial intelligence (AI) methods, such as deep learning, can therefore be very beneficial. As an end-to-end method, a deep neural network can extract complex feature patterns automatically and build predictive models with little manual feature engineering.

Another change brought about by big data is the comeback of instance-based, or data-driven, methods. Unlike model-based or principle-driven methods, instance-based learning, such as K-nearest neighbors, is easy to use and easy to interpret, and it achieves high accuracy when the sample size is large enough to guarantee its performance and the system is too complex for principle-driven models. With clinical NGS big data, the genetic causes of various hereditary diseases can be revealed and the shared genetic relationships between diseases can be investigated. Some very different diseases may share similar genetic causes and should be treated with similar approaches, while some similar diseases may have different genetic causes and should be treated accordingly.
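To make the point about instance-based learning concrete, here is a minimal K-nearest-neighbors sketch in plain Python. The genotype vectors, the labels `disease_A` and `disease_B`, and the function name `knn_predict` are hypothetical and chosen only for illustration; Euclidean distance is one possible choice among many. The prediction comes with the neighbors that produced it, which is what makes the method easy to interpret.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training samples.

    Returns the predicted label and the neighbors as (distance, index, label)
    tuples, so the prediction can be explained by pointing at concrete cases.
    """
    ranked = sorted(
        (math.dist(x, query), i, y)
        for i, (x, y) in enumerate(zip(train_X, train_y))
    )
    neighbors = ranked[:k]
    votes = Counter(label for _, _, label in neighbors)
    return votes.most_common(1)[0][0], neighbors


# Hypothetical toy data: alternate-allele counts (0, 1, 2) at four variants.
train_X = [
    [2, 1, 0, 0],
    [2, 2, 0, 1],
    [1, 2, 0, 0],
    [0, 0, 2, 1],
    [0, 1, 2, 2],
    [0, 0, 1, 2],
]
train_y = ["disease_A"] * 3 + ["disease_B"] * 3

label, neighbors = knn_predict(train_X, train_y, [2, 1, 0, 1], k=3)
print("predicted:", label)
for distance, index, neighbor_label in neighbors:
    print(f"  neighbor {index} ({neighbor_label}), distance {distance:.2f}")
```

With only six samples this is a toy; as the text notes, the approach depends on a sample size large enough to guarantee its performance.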

An interpretable model with simple rules is what we need most to transform information extracted from big data into knowledge we can master and apply in medical practice. A black-box AI algorithm cannot reassure a worried patient. An interpretable model is therefore not only good for genetic counseling but also essential for validating and forming knowledge. It can also be used to check the accuracy of models and to avoid misleading conclusions caused by bias in big data. Last but not least, in clinical practice the analysis methods for NGS panel data are quite different from those for WGS/WES data, which are widely used in the research community. Take copy number variation (CNV) as an example: with paired WGS or WES data, it is easy to see a CNV peak but difficult to determine the CNV region, i.e., its start and end positions. Clinical panel data pose a further problem: because only several genes are sequenced in tumor tissue, there are no other regions for comparison.

To infer CNV status from panel data, one needs to determine a baseline and apply more complex methods, relying only on the sequencing data within a small region in tumor tissue. Most scientists have not faced such challenges and are not aware of such problems. For clinical panels, most NGS analysis methods and tools need to be re-invented. This Research Topic focuses on the challenges of clinical big data analysis in complex genetic diseases and introduces the latest interpretable machine learning algorithms. Potential areas include, but are not limited to:

· Application of novel interpretable classification algorithms in clinical medicine
· Multi-omics big data integration analysis for genetic diseases
· Disease gene identification based on network analysis
· eQTL associations between SNPs and genes
· Optimization theory based on targeted therapy for cancer
· Development of new NGS based tests for genetic diseases
· Heterogeneous network construction of disease, genes, proteins, and drugs
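
As an illustration of the panel CNV problem raised above, the sketch below shows one common baseline strategy, not a method prescribed by this Research Topic: build a per-target baseline from a set of normal samples, compute log2 ratios of tumor depth to that baseline, and re-center on the median. All target names, depths, thresholds, and function names are hypothetical. Note the built-in assumption that most targets are copy-neutral, which is exactly what becomes fragile when a panel covers only a few genes.

```python
import math
from statistics import median


def normalize(depths):
    """Scale per-target depths by the sample's median depth."""
    m = median(depths)
    return [d / m for d in depths]


def log2_ratios(tumor_depths, normal_panel):
    """Median-centered log2(tumor / baseline) for each target.

    The baseline is the per-target median of normalized normal samples.
    Centering assumes that most targets in the panel are copy-neutral.
    """
    tumor = normalize(tumor_depths)
    normals = [normalize(sample) for sample in normal_panel]
    baseline = [median(column) for column in zip(*normals)]
    raw = [math.log2(t / b) for t, b in zip(tumor, baseline)]
    center = median(raw)
    return [r - center for r in raw]


def call_cnv(ratios, gain=0.4, loss=-0.6):
    """Label each target using fixed, illustrative log2-ratio thresholds."""
    return [
        "gain" if r >= gain else "loss" if r <= loss else "neutral"
        for r in ratios
    ]


# Hypothetical mean read depths per target region.
targets = ["GENE1_ex1", "GENE1_ex2", "GENE2_ex1", "GENE3_ex1", "GENE4_ex1"]
normal_panel = [
    [100, 200, 150, 120, 180],
    [110, 210, 160, 118, 190],
    [95, 190, 140, 125, 170],
]
tumor_depths = [105, 205, 320, 60, 185]

ratios = log2_ratios(tumor_depths, normal_panel)
calls = call_cnv(ratios)
for name, ratio, call in zip(targets, ratios, calls):
    print(f"{name}: log2 ratio {ratio:+.2f} -> {call}")
```

Per-target calls like these say nothing about where a CNV starts or ends; resolving boundaries would need a segmentation step on top, which is omitted here.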

Impact

147,342 Total views

107,838 Article views

36,450 Article downloads

3,054 Topic views

Published In

Frontiers in Genetics

Computational Genomics
