About this Research Topic
Next-generation sequencing (NGS) has revolutionized biomedical research by enabling genome-wide screening for genetic defects. NGS-based tests have many applications, including Non-Invasive Prenatal Testing (NIPT), early detection of disease, targeted therapy for various cancers, and elucidating the etiology of rare diseases. As genomic data accumulate, identifying genetic patterns with traditional sampling-based statistical methods will become increasingly challenging. Advanced machine learning and Artificial Intelligence (AI) methods, such as deep learning, can therefore be highly beneficial. As end-to-end methods, deep neural networks can automatically extract complex feature patterns and build predictive models with little manual feature engineering.
Another change brought about by big data is the comeback of instance-based, or data-driven, methods. Unlike model-based learning or principle-driven methods, instance-based learning such as k-nearest neighbors (KNN) is easy to use and easy to interpret, and it can achieve high accuracy when the sample size is large enough to support its performance and the system is too complex for principle-driven models to be built. With clinical NGS big data, the genetic causes of various hereditary diseases can be revealed and the genetic relationships shared between diseases can be investigated. Some very different diseases may share similar genetic causes and should be treated with similar approaches, whereas some similar diseases may have different genetic causes and should be treated accordingly.
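To make the contrast with model-based learning concrete, the following is a minimal k-nearest neighbors sketch in Python. It assumes numeric feature vectors compared by Euclidean distance; the function name `knn_predict`, the two-feature toy data, and the labels `subtype_A`/`subtype_B` are hypothetical and stand in for real clinical features and diagnoses. The interpretability argument is reflected in the return value: besides the predicted label, it returns the indices of the stored cases that decided the vote.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training samples.

    Returns (label, neighbor_indices) so the prediction can be explained by
    pointing at the concrete, similar cases it was based on.
    """
    if not 1 <= k <= len(train_X):
        raise ValueError("k must be between 1 and the number of training samples")
    # "Training" is just storing instances: compute the Euclidean distance
    # from the query to every stored sample at prediction time.
    distances = sorted((math.dist(x, query), i) for i, x in enumerate(train_X))
    neighbor_idx = [i for _, i in distances[:k]]
    votes = Counter(train_y[i] for i in neighbor_idx)
    # Ties in the vote go to the label seen first among the sorted neighbors.
    label, _ = votes.most_common(1)[0]
    return label, neighbor_idx


# Hypothetical toy data: two numeric features per sample, two made-up labels.
train_X = [(0.1, 0.2), (0.2, 0.1), (0.9, 0.8), (0.8, 0.9), (0.85, 0.75)]
train_y = ["subtype_A", "subtype_A", "subtype_B", "subtype_B", "subtype_B"]

label, neighbors = knn_predict(train_X, train_y, (0.15, 0.15), k=3)
print(label, sorted(neighbors))
```

Because no model is fitted, every prediction scans the full set of stored instances. This is why the approach relies on a large sample size for accuracy and also why its cost grows with the data, a trade-off that indexing structures such as k-d trees are commonly used to reduce.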
Interpretable models with simple rules are what we need most to transform information extracted from big data into knowledge that we can master and apply in medical practice. A black-box AI algorithm cannot reassure a worried patient. Interpretable models are therefore not only useful for genetic counseling but also essential for validating and building knowledge. They can also be used to check the accuracy of models and to avoid misleading conclusions caused by biases in big data.
Last but not least, in clinical practice the analysis methods for NGS panel data are quite different from those for whole-genome or whole-exome sequencing (WGS/WES) data, which are widely used in the research community. For instance, when copy number variations (CNVs) are detected from paired WGS or WES data, a CNV peak is easy to see, but the CNV region, i.e., its start and end positions, is difficult to determine. Furthermore, clinical panels sequence only a few genes in tumor tissue, so there are no other regions available for comparison. One must therefore determine a baseline and use more complex methods to infer CNV status based solely on sequencing data from a small region of tumor tissue. Most scientists have not faced such challenges and are unaware of these problems, and for clinical panels most NGS analysis methods and tools need to be reinvented.
This Research Topic will focus on the challenges of clinical big data analysis in complex genetic diseases by introducing the latest interpretable machine learning algorithms. Potential areas include, but are not limited to:
· Application of novel interpretable classification algorithms in clinical medicine
· Multi-omics big data integration analysis for genetic diseases
· Disease gene identification based on network analysis
· eQTL associations between SNPs and genes
· Optimization theory applied to targeted cancer therapy
· Development of new NGS based tests for genetic diseases
· Construction of heterogeneous networks of diseases, genes, proteins, and drugs
Keywords: Next-generation sequencing, Non-invasive Prenatal Testing, Artificial Intelligence, Whole Exome Sequencing, Whole Genome Sequencing
Important Note: All contributions to this Research Topic must be within the scope of the section and journal to which they are submitted, as defined in their mission statements. Frontiers reserves the right to guide an out-of-scope manuscript to a more suitable section or journal at any stage of peer review.