Technology Report ARTICLE
DigitalDLSorter: Deep-Learning on scRNA-Seq to deconvolute gene expression data
- Spanish National Centre for Cardiovascular Research, Spain
The development of single-cell transcriptome sequencing has allowed researchers to investigate the role of individual cell types in a plethora of disease scenarios. It also extends to the whole transcriptome what was previously possible only for a few tens of antibodies in cell population analysis. Importantly, it helps resolve the long-standing question of whether the changes observed in a particular bulk experiment are a consequence of changes in cell type proportions or of aberrant behavior of a particular cell type. However, single-cell experiments are still complex to perform and expensive to sequence, so bulk RNA-Seq experiments remain more common.
scRNA-Seq data are proving highly relevant for the characterization of the immune cell repertoire in different diseases, ranging from cancer to atherosclerosis. In particular, as scRNA-Seq becomes more widely used, new types of immune cell populations emerge, and their role in the genesis and evolution of disease opens new avenues for personalized immunotherapies. Immunotherapies have proven successful in a variety of tumors, such as breast, colon and melanoma.
From a statistical perspective, single-cell data are particularly interesting because of their high dimensionality, which overcomes the limitations of the “skinny matrix” that traditional bulk RNA-Seq experiments yield. With the technological advances that enable sequencing hundreds of thousands of cells, scRNA-Seq data have become especially suitable for the application of Machine Learning algorithms such as Deep Learning (DL).
We present here a DL-based method to enumerate and quantify the immune infiltration in colorectal and breast cancer bulk RNA-Seq samples starting from scRNA-Seq data. Our method makes use of a Deep Neural Network (DNN) model that allows quantification not only of lymphocytes as a general population but also of specific CD8+, CD4Tmem, CD4Th and CD4Tregs subpopulations, as well as B-cells and stromal content. Moreover, the signatures are built from scRNA-Seq data from the tumor itself, preserving the specific characteristics of the tumor microenvironment, as opposed to other approaches in which cells were isolated from blood. Our method was applied to synthetic bulk RNA-Seq samples and to samples from the TCGA project, yielding very accurate results in terms of quantification and survival prediction.
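As an illustration of how synthetic bulk RNA-Seq samples with known composition could be assembled from scRNA-Seq profiles, the following is a minimal sketch and not the DigitalDLSorter implementation. The function name `simulate_pseudobulk`, its parameters, and the multinomial sampling scheme are assumptions made for this example; the abstract does not describe how the synthetic samples were actually generated.

```python
import numpy as np


def simulate_pseudobulk(sc_counts, cell_types, proportions, n_cells=100, seed=0):
    """Sum randomly drawn single-cell profiles into one synthetic bulk sample.

    Hypothetical helper for illustration only.

    sc_counts   : (cells x genes) array of expression counts
    cell_types  : sequence of length `cells` with one label per cell
    proportions : dict mapping label -> target fraction (must sum to 1)
    Returns the summed (genes,) profile and the realized fraction per label.
    """
    rng = np.random.default_rng(seed)
    sc_counts = np.asarray(sc_counts, dtype=float)
    cell_types = np.asarray(cell_types)

    labels = sorted(proportions)
    fractions = np.array([proportions[label] for label in labels], dtype=float)
    if not np.isclose(fractions.sum(), 1.0):
        raise ValueError("proportions must sum to 1")

    # Decide how many cells of each type enter the mixture.
    n_per_type = rng.multinomial(n_cells, fractions)

    bulk = np.zeros(sc_counts.shape[1], dtype=float)
    for label, n in zip(labels, n_per_type):
        if n == 0:
            continue
        pool = np.flatnonzero(cell_types == label)
        if pool.size == 0:
            raise ValueError(f"no cells labelled {label!r}")
        # Sample with replacement so small pools can still fill the quota.
        chosen = rng.choice(pool, size=n, replace=True)
        bulk += sc_counts[chosen].sum(axis=0)

    realized = {label: n / n_cells for label, n in zip(labels, n_per_type)}
    return bulk, realized
```

Pairs of simulated profiles and their realized fractions could then serve as labelled training examples for a network that predicts cell type proportions from bulk expression, which is the general idea the paragraph above describes.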
Keywords: Machine learning, Deconvolution algorithm, Cancer, Immunology, Single-cell, Transcriptomics, NGS
Received: 07 Mar 2019;
Accepted: 13 Sep 2019.
Copyright: © 2019 Torroja and Sanchez-Cabo. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
Dr. Carlos Torroja, Spanish National Centre for Cardiovascular Research, Madrid, Spain, email@example.com
Dr. Fatima Sanchez-Cabo, Spanish National Centre for Cardiovascular Research, Madrid, Spain, firstname.lastname@example.org