<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="review-article">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Genet.</journal-id>
<journal-title>Frontiers in Genetics</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Genet.</abbrev-journal-title>
<issn pub-type="epub">1664-8021</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.3389/fgene.2019.00214</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Genetics</subject>
<subj-group>
<subject>Review</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>Recent Advances of Deep Learning in Bioinformatics and Computational Biology</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author" corresp="yes">
<name><surname>Tang</surname> <given-names>Binhua</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
<xref ref-type="aff" rid="aff2"><sup>2</sup></xref>
<xref ref-type="corresp" rid="c001"><sup>&#x0002A;</sup></xref>
<xref ref-type="author-notes" rid="fn002"><sup>&#x02020;</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/306207/overview"/>
</contrib>
<contrib contrib-type="author">
<name><surname>Pan</surname> <given-names>Zixiang</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
<xref ref-type="author-notes" rid="fn002"><sup>&#x02020;</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/604172/overview"/>
</contrib>
<contrib contrib-type="author">
<name><surname>Yin</surname> <given-names>Kang</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/659001/overview"/>
</contrib>
<contrib contrib-type="author">
<name><surname>Khateeb</surname> <given-names>Asif</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/604388/overview"/>
</contrib>
</contrib-group>
<aff id="aff1"><sup>1</sup><institution>Epigenetics &#x00026; Function Group, Hohai University</institution>, <addr-line>Nanjing</addr-line>, <country>China</country></aff>
<aff id="aff2"><sup>2</sup><institution>School of Public Health, Shanghai Jiao Tong University</institution>, <addr-line>Shanghai</addr-line>, <country>China</country></aff>
<author-notes>
<fn fn-type="edited-by"><p>Edited by: Juan Caballero, Universidad Aut&#x000F3;noma de Quer&#x000E9;taro, Mexico</p></fn>
<fn fn-type="edited-by"><p>Reviewed by: Wenhai Zhang, Hengyang Normal University, China; Zhuliang Yu, South China University of Technology, China</p></fn>
<corresp id="c001">&#x0002A;Correspondence: Binhua Tang <email>bh.tang&#x00040;hhu.edu.cn</email></corresp>
<fn fn-type="other" id="fn001"><p>This article was submitted to Bioinformatics and Computational Biology, a section of the journal Frontiers in Genetics</p></fn>
<fn fn-type="other" id="fn002"><p>&#x02020;These authors have contributed equally to this work</p></fn></author-notes>
<pub-date pub-type="epub">
<day>26</day>
<month>03</month>
<year>2019</year>
</pub-date>
<pub-date pub-type="collection">
<year>2019</year>
</pub-date>
<volume>10</volume>
<elocation-id>214</elocation-id>
<history>
<date date-type="received">
<day>20</day>
<month>08</month>
<year>2018</year>
</date>
<date date-type="accepted">
<day>27</day>
<month>02</month>
<year>2019</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#x000A9; 2019 Tang, Pan, Yin and Khateeb.</copyright-statement>
<copyright-year>2019</copyright-year>
<copyright-holder>Tang, Pan, Yin and Khateeb</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/"><p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.</p></license>
</permissions>
<abstract><p>Extracting inherent valuable knowledge from omics big data remains a daunting problem in bioinformatics and computational biology. Deep learning, an emerging branch of machine learning, has exhibited unprecedented performance in numerous applications across academia and industry. We highlight the differences and similarities among widely utilized models in deep learning studies by discussing their basic structures and reviewing their diverse applications and limitations. We anticipate this work can serve as a meaningful perspective for the further development of deep learning theory, algorithms, and applications in bioinformatics and computational biology.</p></abstract>
<kwd-group>
<kwd>computational biology</kwd>
<kwd>bioinformatics</kwd>
<kwd>application</kwd>
<kwd>algorithm</kwd>
<kwd>deep learning</kwd>
</kwd-group>
<counts>
<fig-count count="9"/>
<table-count count="0"/>
<equation-count count="10"/>
<ref-count count="49"/>
<page-count count="10"/>
<word-count count="5885"/>
</counts>
</article-meta>
</front>
<body>
<sec sec-type="intro" id="s1">
<title>Introduction</title>
<p>Deep learning is the emerging generation of artificial intelligence techniques, specifically within machine learning. The earliest artificial intelligence was first implemented on hardware systems in the 1950s. A newer concept with more systematic theorems, named machine learning, appeared in the 1960s. Its newly evolved branch, deep learning, was first brought up around the 2000s and soon led to rapid applications in different fields, owing to its unprecedented prediction performance on big data (Hinton and Salakhutdinov, <xref ref-type="bibr" rid="B17">2006</xref>; LeCun et al., <xref ref-type="bibr" rid="B24">2015</xref>; Nussinov, <xref ref-type="bibr" rid="B33">2015</xref>).</p>
<p>The basic concepts and models in deep learning derive from the artificial neural network, which mimics the human brain&#x00027;s activity patterns to make algorithms intelligent and save tedious human labor (Mnih et al., <xref ref-type="bibr" rid="B32">2015</xref>; Schmidhuber, <xref ref-type="bibr" rid="B41">2015</xref>; Mamoshina et al., <xref ref-type="bibr" rid="B29">2016</xref>). Although deep learning has only recently emerged as a subfield of machine learning, it already has immense applications spanning machine vision, voice and signal processing, sequence and text prediction, and computational biology, altogether shaping the productive AI fields (Bengio and LeCun, <xref ref-type="bibr" rid="B7">2007</xref>; Alipanahi et al., <xref ref-type="bibr" rid="B2">2015</xref>; Libbrecht and Noble, <xref ref-type="bibr" rid="B28">2015</xref>; Zhang et al., <xref ref-type="bibr" rid="B49">2016</xref>; Esteva et al., <xref ref-type="bibr" rid="B12">2017</xref>; Ching et al., <xref ref-type="bibr" rid="B9">2018</xref>). Deep learning has several implementation models, such as the artificial neural network, deep structured learning, and hierarchical learning, which commonly apply a class of structured networks to infer the quantitative relationships between responses and causes within a dataset (Ditzler et al., <xref ref-type="bibr" rid="B10">2015</xref>; Liang et al., <xref ref-type="bibr" rid="B27">2015</xref>; Xu J. et al., <xref ref-type="bibr" rid="B45">2016</xref>; Giorgi and Bader, <xref ref-type="bibr" rid="B14">2018</xref>).</p>
<p>The subsequent sections summarize the essential concepts and recent applications of deep learning, and highlight its key achievements and future directions, especially from the perspectives of bioinformatics and computational biology.</p>
</sec>
<sec id="s2">
<title>Essential Concepts in Deep Neural Network</title>
<sec>
<title>Basic Structure of Neural Network</title>
<p>A neural network is a class of information-processing modules frequently utilized in machine learning. Within a multi-layer context, the basic building units, namely neurons, are connected to each other between adjacent layers via internal links, while neurons belonging to the same layer have no connections, as depicted in <xref ref-type="fig" rid="F1">Figure 1</xref>.</p>
<fig id="F1" position="float">
<label>Figure 1</label>
<caption><p>The network structure of a deep learning model. Here we select a network structure with two hidden layers as an illustration, where <italic>X</italic> nodes constitute the input layer, <italic>H</italic>s for the hidden layers, <italic>Y</italic> for the output layer, and <italic>f</italic> (&#x000B7;) denotes an activation function.</p></caption>
<graphic xlink:href="fgene-10-00214-g0001.tif"/>
</fig>
<p>In <xref ref-type="fig" rid="F1">Figure 1</xref>, each hidden layer processes its inputs via a connection function denoted as below,</p>
<disp-formula id="E1"><label>(1)</label><mml:math id="M1"><mml:mrow><mml:mtable><mml:mtr><mml:mtd><mml:mrow><mml:msub><mml:mi>h</mml:mi><mml:mrow><mml:mi>W</mml:mi><mml:mo>,</mml:mo><mml:mi>b</mml:mi></mml:mrow></mml:msub><mml:mo stretchy='false'>(</mml:mo><mml:mi>X</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mo>=</mml:mo><mml:mi>f</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:msup><mml:mi>W</mml:mi><mml:mi>T</mml:mi></mml:msup><mml:mi>X</mml:mi><mml:mo>+</mml:mo><mml:mi>b</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:mrow></mml:math></disp-formula>
<p>where <italic>W</italic> refers to the weights and <italic>b</italic> to the bias. When the input-layer neurons are active, each input is multiplied by its respective weight, and the weighted sum plus a bias is fed into the adjacent hidden layer. Although this input-output formalization repeats similarly across hidden layers, there is usually no direct connection between neurons within the same layer. The activation function quantifies the connection between neighboring neurons across two (hidden) layers.</p>
<p>Specifically, the input of the activation function is the combination <italic>W</italic><sup><italic>T</italic></sup><italic>X</italic>&#x0002B;<italic>b</italic> denoted in Equation (1), and the function output is then fed into the next neuron as a new input. Following this connection formula, input features are propagated to the next layer, where they can be further extracted and refined. The performance of feature extraction depends significantly on the choice of activation function.</p>
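<p>As a minimal sketch (not part of the original article; the layer sizes and the choice of tanh as <italic>f</italic>(&#x000B7;) are illustrative assumptions), the connection in Equation (1) can be written in NumPy:</p>

```python
import numpy as np

def dense_layer(X, W, b, f=np.tanh):
    """Compute h_{W,b}(X) = f(W^T X + b), as in Equation (1)."""
    return f(W.T @ X + b)

rng = np.random.default_rng(0)
X = rng.normal(size=3)          # 3 input neurons
W = rng.normal(size=(3, 2))     # weights: 3 inputs -> 2 hidden neurons
b = np.zeros(2)                 # one bias per hidden neuron
h = dense_layer(X, W, b)        # activations of the 2 hidden neurons
```

<p>Stacking such layers, with each layer&#x00027;s output serving as the next layer&#x00027;s input, yields the multi-layer structure of Figure 1.</p>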
<p>Before training the network, the raw input dataset is usually separated into two or three groups, namely a training set and a test set, and sometimes a validation set for examining the performance of previously trained models, as depicted in <xref ref-type="fig" rid="F2">Figure 2</xref>. In practice, the original dataset is partitioned stochastically to avoid potential sampling bias, while the proportion of each set can be determined manually.</p>
<fig id="F2" position="float">
<label>Figure 2</label>
<caption><p>The general analysis procedure commonly adopted in deep learning, which covers training data preparation, model construction, hyperparameter fine-tuning (in training loop), prediction and performance evaluation. Basically, it still follows the requisite schema in machine learning.</p></caption>
<graphic xlink:href="fgene-10-00214-g0002.tif"/>
</fig>
</sec>
<sec>
<title>Learning by Training, Validation, and Testing</title>
<p>Normally, training a neural network refers to the process in which the network self-tunes its parameters or weights to meet prespecified performance criteria, so that the trained model can be further used for regression or classification. As depicted in <xref ref-type="fig" rid="F2">Figure 2</xref>, a complete dataset collected beforehand from a specific experiment is generally split into training, testing, and possibly validation sets, followed by the conventional tasks of model training, validation, and performance comparison.</p>
<p>During training with initial batches of data samples, model parameters and their characteristics can be tuned by various learning paradigms, including appropriate activation and rectification functions. The trained network should then be further tested, and possibly validated, with other batches of samples to acquire high robustness and satisfactory predictability; these processes are often referred to as model testing and validation.</p>
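<p>A minimal sketch of the stochastic split described above (not from the original article; the 70/15/15 proportions and sample count are illustrative assumptions):</p>

```python
import numpy as np

def split_dataset(n_samples, train=0.7, val=0.15, seed=0):
    """Stochastically partition sample indices into train/validation/test sets."""
    idx = np.random.default_rng(seed).permutation(n_samples)  # shuffle to avoid bias
    n_train = int(train * n_samples)
    n_val = int(val * n_samples)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

train_idx, val_idx, test_idx = split_dataset(100)
```

<p>The three index sets are disjoint and together cover the whole dataset, matching the schema in Figure 2.</p>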
<p>Usually, the three procedures above are faithfully implemented in conventional machine learning studies; even in its quickly evolving subfield, deep learning, a similar paradigm is generally observed (LeCun et al., <xref ref-type="bibr" rid="B24">2015</xref>; Schmidhuber, <xref ref-type="bibr" rid="B41">2015</xref>).</p>
</sec>
<sec>
<title>Activation and Loss Function</title>
<p>After training is completed, the neural network can perform regression or classification tasks on testing data, though there usually exists a difference between the predicted outputs and the actual values. This difference should be minimized to achieve optimal model performance.</p>
<p>Within a given layer, error reduction requires scaling outputs back into a preset range before passing them on to the next layer of neurons. Activation is herein defined to control neurons&#x00027; outputs in &#x0201C;active&#x0201D; or &#x0201C;inactive&#x0201D; status, using such non-linear functions as the rectified linear unit (ReLU), tanh, and logistic (sigmoid or soft step) functions (LeCun et al., <xref ref-type="bibr" rid="B24">2015</xref>).</p>
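<p>The three activation functions named above can be sketched directly in NumPy (the sample inputs are illustrative):</p>

```python
import numpy as np

def relu(x):
    """Rectified linear unit: passes positive values, zeroes out the rest."""
    return np.maximum(0.0, x)

def sigmoid(x):
    """Logistic (sigmoid) function: squashes any input into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

# tanh, which squashes into (-1, 1), is available directly as np.tanh
x = np.array([-2.0, 0.0, 2.0])
```

<p>Each function keeps neuron outputs within a preset range, which is exactly the scaling role described above.</p>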
<p>Besides, a loss function herein measures the total difference between the predicted and actual values, which is reduced by fine-tuning during the backpropagation process. It also acts as a stopping threshold for parameter optimization, by means of iteratively evaluating the trained models.</p>
<p>With an activation function in each neuron throughout the diverse layers, a training procedure keeps searching the whole hyperparameter space until the stopping threshold is reached, comparing candidates to detect an optimal parameter combination that minimizes the preset loss function.</p>
</sec>
</sec>
<sec id="s3">
<title>Typical Algorithms and Applications</title>
<p>With substantial progress in advanced computation and Graphics Processing Unit (GPU) technologies, systematic interrogation of massive data to understand its inherent mechanisms has become possible, especially through deep learning approaches. Hereinafter, we illustrate several frequently utilized models from the deep learning literature, covering both recent computational theories and diverse applications.</p>
<sec>
<title>Recurrent Neural Network</title>
<p>The Recurrent Neural Network (RNN) is a deep learning model different from traditional neural networks: the former integrates previously learned states through recurrent connections, trained via backpropagation through time, while a traditional neural network usually outputs predictions based only on the status of the current layer.</p>
<p>Compared with traditional network models, an RNN may have only one hidden layer, but it unfolds horizontally through time, so that each step can utilize most of the previous results, namely &#x0201C;using memory&#x0201D;.</p>
<p>As depicted in <xref ref-type="fig" rid="F3">Figure 3</xref>, the hidden layer neuron <italic>H</italic><sub><italic>n</italic></sub> is defined by Equation (2),</p>
<disp-formula id="E2"><label>(2)</label><mml:math id="M2"><mml:mrow><mml:mtable><mml:mtr><mml:mtd><mml:mrow><mml:msub><mml:mi>H</mml:mi><mml:mi>n</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:msub><mml:mi>&#x003C3;</mml:mi><mml:mn>1</mml:mn></mml:msub><mml:mo stretchy='false'>(</mml:mo><mml:msubsup><mml:mi>W</mml:mi><mml:mrow><mml:mn>1</mml:mn><mml:mo>,</mml:mo><mml:mi>n</mml:mi></mml:mrow><mml:mi>T</mml:mi></mml:msubsup><mml:msub><mml:mi>H</mml:mi><mml:mrow><mml:mi>n</mml:mi><mml:mo>&#x02212;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>+</mml:mo><mml:msubsup><mml:mi>W</mml:mi><mml:mrow><mml:mn>2</mml:mn><mml:mo>,</mml:mo><mml:mi>n</mml:mi></mml:mrow><mml:mi>T</mml:mi></mml:msubsup><mml:msub><mml:mi>X</mml:mi><mml:mi>n</mml:mi></mml:msub><mml:mo>+</mml:mo><mml:msub><mml:mi>b</mml:mi><mml:mrow><mml:mn>1</mml:mn><mml:mo>,</mml:mo><mml:mi>n</mml:mi></mml:mrow></mml:msub><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:mrow></mml:math></disp-formula>
<p>where <italic>W</italic><sub>1, n</sub> and <italic>W</italic><sub>2, n</sub> represent weight matrices, <italic>b</italic><sub>1, n</sub> is a bias matrix, and &#x003C3;(&#x000B7;) (usually <italic>tanh</italic>(&#x000B7;)) denotes an activation function. Each step then generates a partial output from the current hidden-layer neuron with a weight matrix <italic>W</italic><sub>3, n</sub> and bias <italic>b</italic><sub>2, n</sub>, defined by Equation (3),</p>
<disp-formula id="E3"><label>(3)</label><mml:math id="M3"><mml:mrow><mml:mtable><mml:mtr><mml:mtd><mml:mrow><mml:msub><mml:mover accent='true'><mml:mi>Y</mml:mi><mml:mo>&#x0005E;</mml:mo></mml:mover><mml:mi>n</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:msub><mml:mi>&#x003C3;</mml:mi><mml:mn>2</mml:mn></mml:msub><mml:mo stretchy='false'>(</mml:mo><mml:msub><mml:mi>W</mml:mi><mml:mrow><mml:mn>3</mml:mn><mml:mo>,</mml:mo><mml:mi>n</mml:mi></mml:mrow></mml:msub><mml:msub><mml:mi>H</mml:mi><mml:mi>n</mml:mi></mml:msub><mml:mo>+</mml:mo><mml:msub><mml:mi>b</mml:mi><mml:mrow><mml:mn>2</mml:mn><mml:mo>,</mml:mo><mml:mi>n</mml:mi></mml:mrow></mml:msub><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:mrow></mml:math></disp-formula>
<p>And the total loss <italic>L</italic><sub><italic>total</italic></sub> will be the sum of the loss functions from each hidden layer, defined as below,</p>
<disp-formula id="E4"><label>(4)</label><mml:math id="M4"><mml:mrow><mml:mtable><mml:mtr><mml:mtd><mml:mrow><mml:msub><mml:mi>L</mml:mi><mml:mrow><mml:mi>t</mml:mi><mml:mi>o</mml:mi><mml:mi>t</mml:mi><mml:mi>a</mml:mi><mml:mi>l</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mstyle displaystyle='true'><mml:msubsup><mml:mo>&#x02211;</mml:mo><mml:mrow><mml:mi>n</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mi>N</mml:mi></mml:msubsup><mml:mrow><mml:msub><mml:mi>L</mml:mi><mml:mi>n</mml:mi></mml:msub></mml:mrow></mml:mstyle><mml:mo>=</mml:mo><mml:mstyle displaystyle='true'><mml:msubsup><mml:mo>&#x02211;</mml:mo><mml:mrow><mml:mi>n</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mi>N</mml:mi></mml:msubsup><mml:mrow><mml:mi>L</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mover accent='true'><mml:mi>Y</mml:mi><mml:mo>&#x0005E;</mml:mo></mml:mover><mml:mo>,</mml:mo><mml:mi>Y</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:mstyle></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:mrow></mml:math></disp-formula>
<p>Thus, fine-tuning of RNN backpropagation is based on three weights, <italic>W</italic><sub>1, n</sub>, <italic>W</italic><sub>2, n</sub>, and <italic>W</italic><sub>3, n</sub>. Since the multi-parameter setting in weights adds to the optimization burden, RNN usually performs worse than the Convolutional Neural Network (CNN) in terms of fine-tuning. But it is frequently ensembled with CNN in diverse applications, such as dimension reduction and image and video processing (Hinton and Salakhutdinov, <xref ref-type="bibr" rid="B17">2006</xref>; Hu and Lu, <xref ref-type="bibr" rid="B18">2018</xref>). Angermueller et al. proposed an ensembled RNN-CNN architecture, DeepCpG, for single-cell DNA methylation data, to better predict missing CpG status in genome-wide analysis; moreover, the model&#x00027;s interpretable parameters shed light on the connection between sequence composition and methylation variability (Angermueller et al., <xref ref-type="bibr" rid="B3">2017</xref>). Section Convolutional Neural Network will specifically discuss CNN and its typical applications.</p>
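<p>A minimal forward-pass sketch of Equations (2)&#x02013;(4) (not from the original article; the dimensions, the identity output activation &#x003C3;<sub>2</sub>, and the squared-error loss are illustrative assumptions):</p>

```python
import numpy as np

def rnn_forward(X_seq, Y_seq, W1, W2, W3, b1, b2):
    """Unrolled RNN forward pass:
    H_n = tanh(W1^T H_{n-1} + W2^T X_n + b1)  -- Equation (2)
    Yhat_n = W3 H_n + b2                       -- Equation (3), identity sigma_2
    L_total = sum of per-step losses           -- Equation (4)
    """
    H = np.zeros(W1.shape[0])                    # initial hidden state H_0
    total_loss = 0.0
    for X_n, Y_n in zip(X_seq, Y_seq):
        H = np.tanh(W1.T @ H + W2.T @ X_n + b1)  # carry memory forward
        Y_hat = W3 @ H + b2                      # partial output at step n
        total_loss += np.sum((Y_hat - Y_n) ** 2) # one L_n term
    return total_loss

rng = np.random.default_rng(1)
d_in, d_h, d_out, T = 4, 3, 2, 5
X_seq = rng.normal(size=(T, d_in))
Y_seq = rng.normal(size=(T, d_out))
loss = rnn_forward(X_seq, Y_seq,
                   rng.normal(size=(d_h, d_h)),    # W1: hidden -> hidden
                   rng.normal(size=(d_in, d_h)),   # W2: input -> hidden
                   rng.normal(size=(d_out, d_h)),  # W3: hidden -> output
                   np.zeros(d_h), np.zeros(d_out))
```

<p>Backpropagation through time would then differentiate this total loss with respect to the three weights, which is the fine-tuning burden noted above.</p>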
<fig id="F3" position="float">
<label>Figure 3</label>
<caption><p>Illustrative structure diagram of Recurrent Neural Network, where <italic>X, Y</italic>, and <italic>W</italic> are defined the same as above; <italic>L</italic><sub><italic>i</italic></sub> denotes the loss function between the actual <italic>Y</italic><sub><italic>i</italic></sub> and predicted &#x00176;<sub><italic>i</italic></sub> (<italic>i</italic> &#x02208; <italic>N</italic>).</p></caption>
<graphic xlink:href="fgene-10-00214-g0003.tif"/>
</fig>
<p>Moreover, RNN outperforms such conventional models as logistic regression and SVM, and it can be implemented in various environments, accelerated by GPUs (Li et al., <xref ref-type="bibr" rid="B26">2017</xref>). Due to its structural characteristics, RNN is well-suited to long, sequential data, such as DNA arrays and genomic sequences (Pan et al., <xref ref-type="bibr" rid="B35">2008</xref>; Ray et al., <xref ref-type="bibr" rid="B40">2009</xref>; Jolma et al., <xref ref-type="bibr" rid="B21">2013</xref>; Lee and Young, <xref ref-type="bibr" rid="B25">2013</xref>; Alipanahi et al., <xref ref-type="bibr" rid="B2">2015</xref>; Xu T. et al., <xref ref-type="bibr" rid="B46">2016</xref>).</p>
<p>But a vanilla RNN cannot interact with hidden neurons far from the current one. To construct an efficient framework for recalling deep memory, many improved algorithms have been proposed, such as BRNN for protein secondary structure prediction (Baldi et al., <xref ref-type="bibr" rid="B6">1999</xref>) and MD-RNN for analyzing electron microscopy and MRIs of breast cancer samples (Kim et al., <xref ref-type="bibr" rid="B22">2018</xref>).</p>
<p>LSTM (Long Short-Term Memory) and GRU (Gated Recurrent Unit) are two recently improved derivatives of RNN that address long-term dependence issues. GRU shares a similar structure with LSTM, which has several gates for modeling its memory center. The current memory output is jointly influenced by the current input feature, the context (namely the past influence), and the inner action toward the input, as shown in <xref ref-type="fig" rid="F4">Figure 4</xref>.</p>
<fig id="F4" position="float">
<label>Figure 4</label>
<caption><p>The LSTM network structure and its general information flow chart, where <italic>X, Y</italic>, and <italic>W</italic> are defined the same as above.</p></caption>
<graphic xlink:href="fgene-10-00214-g0004.tif"/>
</fig>
<p>In <xref ref-type="fig" rid="F4">Figure 4</xref>, the yellow track refers to an input gate transferring the total past features, and is accessible for any new feature to be added. The green track is a mixture of an input gate and the former hidden-layer neurons; it decides what to omit, namely resetting the activation function close to 0, and what to update into the yellow track. The blue track is the output gate integrating the inner influence from the yellow track; it decides the output of the current hidden neurons and what is passed on to the next hidden neuron.</p>
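<p>A minimal single-step LSTM sketch of the gating just described (not from the original article; the stacked weight layout and dimensions are illustrative assumptions):</p>

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM step: the forget/input/output gates decide what past memory to
    keep, what new information to add, and what to expose as the output."""
    z = W @ np.concatenate([x, h_prev]) + b      # all four gate pre-activations
    d = h_prev.size
    f, i, o = (sigmoid(z[k * d:(k + 1) * d]) for k in range(3))  # gates in (0, 1)
    g = np.tanh(z[3 * d:])                       # candidate memory content
    c = f * c_prev + i * g                       # keep old memory, add new
    h = o * np.tanh(c)                           # gated output to the next step
    return h, c

d_in, d_h = 3, 2
rng = np.random.default_rng(2)
W = rng.normal(size=(4 * d_h, d_in + d_h))
h, c = lstm_step(rng.normal(size=d_in), np.zeros(d_h), np.zeros(d_h),
                 W, np.zeros(4 * d_h))
```

<p>The forget gate <italic>f</italic> plays the role of the green track (what to omit), the input gate <italic>i</italic> of the yellow track (what to add to memory), and the output gate <italic>o</italic> of the blue track.</p>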
<p>Recently, an attention-based architecture, DeepDiff, has utilized a hierarchy of LSTM modules to characterize how various histone modifications cooperate simultaneously, and it can effectively predict cell-type-specific gene expression (Sekhon et al., <xref ref-type="bibr" rid="B42">2018</xref>).</p>
</sec>
<sec>
<title>Convolutional Neural Network</title>
<p>Convolutional neural networks (CNN or ConvNet) are suitable for processing information in the form of multiple arrays (LeCun et al., <xref ref-type="bibr" rid="B24">2015</xref>; Esteva et al., <xref ref-type="bibr" rid="B12">2017</xref>; Hu and Lu, <xref ref-type="bibr" rid="B18">2018</xref>). The general design principle of CNN is to reduce the number of parameters without compromising learning capacity (LeCun et al., <xref ref-type="bibr" rid="B24">2015</xref>; Krizhevsky et al., <xref ref-type="bibr" rid="B23">2017</xref>). Each convolution kernel&#x00027;s parameters in CNN are trained by the backpropagation algorithm.</p>
<p>Especially in image-related applications, CNN can cope with pixel scanning and processing, thus it greatly accelerates the implementation of optimized algorithms into practice (Esteva et al., <xref ref-type="bibr" rid="B12">2017</xref>; Quang et al., <xref ref-type="bibr" rid="B38">2018</xref>). Structurally, CNN consists of linear convolution operation, followed by nonlinear activators, pooling layers, and deep neural network classifier, depicted in <xref ref-type="fig" rid="F5">Figure 5</xref>.</p>
<fig id="F5" position="float">
<label>Figure 5</label>
<caption><p>The basic architecture and analysis procedure of a CNN model, which illustrates a classification procedure for an apple on a tree.</p></caption>
<graphic xlink:href="fgene-10-00214-g0005.tif"/>
</fig>
<p>In <xref ref-type="fig" rid="F5">Figure 5</xref>, several filters are applied to convolve an input image, and its output is subsampled as a new input into the next layer; and convolution and subsampling processes are repeated till high level features, namely shapes, can be extracted. The more layers a CNN model has, the higher-level features it will extract.</p>
<p>In feature learning, the convolution operation scans a 2D image with a given pattern and calculates the matching degree at each step; pooling then identifies the pattern&#x00027;s presence in the scanned region (Angermueller et al., <xref ref-type="bibr" rid="B4">2016</xref>). An activation function defines a neuron&#x00027;s output based on a set of given inputs: the weighted sum of the inputs is passed through the activation function for non-linear transformation. In the simplest case, an activation function returns a binary output, 0 or 1; when a neuron&#x00027;s accumulation exceeds a preset threshold, the neuron is activated and passes its information to the next layers; otherwise, the neuron is deactivated. Sigmoid, tanh, ReLU, leaky ReLU, and softmax are the commonly used activation functions (LeCun et al., <xref ref-type="bibr" rid="B24">2015</xref>; Schmidhuber, <xref ref-type="bibr" rid="B41">2015</xref>).</p>
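<p>The convolution-then-pooling step can be sketched concretely (not from the original article; the 4 &#x000D7; 4 image, all-ones 3 &#x000D7; 3 kernel, and 2 &#x000D7; 2 max pooling are illustrative assumptions):</p>

```python
import numpy as np

def conv2d(image, kernel):
    """Slide a kernel over a 2D image, recording the matching degree at each step."""
    kh, kw = kernel.shape
    H, W = image.shape
    out = np.empty((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool(fmap, size=2):
    """Keep only the strongest response in each size x size region."""
    H, W = fmap.shape
    return fmap[:H - H % size, :W - W % size] \
        .reshape(H // size, size, W // size, size).max(axis=(1, 3))

image = np.arange(16.0).reshape(4, 4)
feat = conv2d(image, np.ones((3, 3)))   # 4x4 image, 3x3 kernel -> 2x2 feature map
pooled = max_pool(feat)                  # 2x2 pooling -> 1x1 summary
```

<p>Repeating this pair of operations over stacked layers is what progressively extracts higher-level features, as noted for Figure 5.</p>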
<p>After the pooling layers, the feature maps are flattened into a single column vector. The vectorized and concatenated pixel information is fed into dense layers, known as fully connected layers, for further classification. The fully-connected layer renders the final decision, where CNN returns the probability that an object in the image belongs to a specific type.</p>
<p>Following the fully-connected layer is a loss layer, which adjusts the weights across the network. A loss function is used to measure the model performance and the inconsistency between the actual and predicted values; model performance increases as the loss function decreases. For an output vector <italic>y</italic><sub><italic>i</italic></sub> and an input <italic>x</italic> &#x0003D; (<italic>x</italic><sub>1</sub>, <italic>x</italic><sub>2</sub>, &#x02026;, <italic>x</italic><sub><italic>n</italic></sub>), the mapping loss function <italic>L</italic>(&#x000B7;) between <italic>x</italic> and <italic>y</italic> is defined as,</p>
<disp-formula id="E5"><label>(5)</label><mml:math id="M5"><mml:mrow><mml:mtable><mml:mtr><mml:mtd><mml:mrow><mml:mi>L</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:msub><mml:mi>y</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mover accent='true'><mml:mi>y</mml:mi><mml:mo>&#x0005E;</mml:mo></mml:mover><mml:mi>i</mml:mi></mml:msub><mml:mo stretchy='false'>)</mml:mo><mml:mo>=</mml:mo><mml:mfrac><mml:mn>1</mml:mn><mml:mi>n</mml:mi></mml:mfrac><mml:mstyle displaystyle='true'><mml:munderover><mml:mo>&#x02211;</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn><mml:mo>,</mml:mo><mml:mi>j</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>n</mml:mi><mml:mo>,</mml:mo><mml:mi>k</mml:mi></mml:mrow></mml:munderover><mml:mrow><mml:mi>&#x003C6;</mml:mi><mml:mo stretchy='false'>[</mml:mo><mml:msub><mml:mi>y</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>,</mml:mo><mml:mi>f</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>&#x003C3;</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>&#x003C9;</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>b</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo stretchy='false'>)</mml:mo><mml:mo stretchy='false'>]</mml:mo></mml:mrow></mml:mstyle></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:mrow></mml:math></disp-formula>
<p>where &#x003C6; denotes an empirical risk for each output, &#x00177;<sub><italic>i</italic></sub> for the <italic>i</italic>-th prediction, <italic>n</italic> the total number of training samples, <italic>k</italic> the count of the weights &#x003C9;<sub><italic>ij</italic></sub> and <italic>b</italic><sub><italic>i</italic></sub> the bias for the activation function &#x003C3;<sub><italic>i</italic></sub>.</p>
<p>Recently, CNN has been adopted rapidly in biomedical imaging studies for its outstanding performance in computer vision and concurrent computation with GPUs (Ravi et al., <xref ref-type="bibr" rid="B39">2017</xref>). The convolution-pooling structure can effectively learn imaging features from CT scans and MRIs for head trauma and stroke diagnosis and for brain EPV (enlarged perivascular space) detection (Chilamkurthy et al., <xref ref-type="bibr" rid="B8">2018</xref>; Dubost et al., <xref ref-type="bibr" rid="B11">2019</xref>).</p>
<p>In recent computational biology, a discriminative CNN framework, DeepChrome, has been proposed to predict gene expression through feature extraction from histone modifications. The deep learning model outperforms traditional Random Forests and SVM on 56 cell types from the REMC database (Singh et al., <xref ref-type="bibr" rid="B43">2016</xref>).</p>
<p>Furthermore, CNN can be combined with other deep learning models, such as RNN, to predict imaging content, where CNN encodes an image and RNN generates the corresponding image description (Angermueller et al., <xref ref-type="bibr" rid="B4">2016</xref>). To date, quite a few variants of CNN have also been proposed for diverse classification applications, such as AlexNet with GPU support and DQN in reinforcement learning (Mnih et al., <xref ref-type="bibr" rid="B32">2015</xref>).</p>
</sec>
<sec>
<title>Autoencoder</title>
<p>The autoencoder is another typical artificial neural network, trained in an unsupervised manner and designed to precisely extract coding or representation features through data-driven learning (Min et al., <xref ref-type="bibr" rid="B30">2017</xref>; Zeng et al., <xref ref-type="bibr" rid="B48">2017</xref>; Yang et al., <xref ref-type="bibr" rid="B47">2018</xref>). For high-dimensional data, it is time-consuming and often infeasible to load all raw data into a network; thus dimension reduction or compression is a necessity when preprocessing raw data.</p>
<p>The autoencoder compresses and encodes information from the input layer into a short code, then, after specific processing, decodes it into an output closely matching the original input. <xref ref-type="fig" rid="F6">Figure 6</xref> illustrates its basic model structure and processing steps.</p>
<fig id="F6" position="float">
<label>Figure 6</label>
<caption><p>The illustrative diagram of an autoencoder model. <bold>(A)</bold> Basic processing structure of autoencoder, corresponding to the input, hidden, and output layers; <bold>(B)</bold> Processing steps in encoding; <bold>(C)</bold> Processing steps in decoding.</p></caption>
<graphic xlink:href="fgene-10-00214-g0006.tif"/>
</fig>
<p>Convolution and pooling are the two major steps in the encoder, depicted in <xref ref-type="fig" rid="F6">Figure 6B</xref>; the decoder performs the two opposite steps, namely unpooling and deconvolution, in <xref ref-type="fig" rid="F6">Figure 6C</xref>. Both convolution and pooling compress the data while preserving the most representative features, in two different ways. Convolution continuously scans the data with a rectangular window, for example of size 3 &#x000D7; 3; after each scan, the window moves to the next position, pixel by pixel, replacing the oldest elements with new ones, and the convolution operation is applied again. After the whole scanning-and-convolution pass, pooling is utilized to further compress the remaining redundancy.</p>
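<p>The scanning-and-pooling procedure described above can be sketched in a few lines of NumPy; this is an illustrative toy example, with a 6 &#x000D7; 6 input and a 3 &#x000D7; 3 averaging kernel chosen arbitrarily rather than taken from any cited study.</p>

```python
import numpy as np

def conv2d(image, kernel, stride=1):
    """Scan the kernel window over the image (valid padding, one channel),
    taking the sum of element-wise products at each position."""
    kh, kw = kernel.shape
    oh = (image.shape[0] - kh) // stride + 1
    ow = (image.shape[1] - kw) // stride + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            patch = image[i * stride:i * stride + kh, j * stride:j * stride + kw]
            out[i, j] = np.sum(patch * kernel)
    return out

def max_pool(fmap, size=2):
    """Compress the feature map by keeping only the maximum of each
    non-overlapping size x size block."""
    h, w = fmap.shape[0] // size, fmap.shape[1] // size
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = fmap[i * size:(i + 1) * size, j * size:(j + 1) * size].max()
    return out

image = np.arange(36, dtype=float).reshape(6, 6)  # toy 6 x 6 "image"
kernel = np.ones((3, 3)) / 9.0                    # 3 x 3 averaging window
feature_map = conv2d(image, kernel)               # 6 x 6 -> 4 x 4
pooled = max_pool(feature_map)                    # 4 x 4 -> 2 x 2
```

<p>Here convolution with the 3 &#x000D7; 3 window turns the 6 &#x000D7; 6 input into a 4 &#x000D7; 4 feature map, and 2 &#x000D7; 2 max pooling further compresses it to 2 &#x000D7; 2.</p>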
<p>Although similar to traditional PCA in dimension reduction to some extent, the autoencoder is more robust and effective in extracting data features because of the non-linear transformations in its hidden layers. Given an input <italic>x</italic>, the model extracts its main features and generates <inline-formula><mml:math id="M6"><mml:mover accent="true"><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mo>^</mml:mo></mml:mover><mml:mo>=</mml:mo><mml:mi>W</mml:mi><mml:mi>x</mml:mi><mml:mo>+</mml:mo><mml:mi>b</mml:mi></mml:math></inline-formula>, where <italic>W</italic> and <italic>b</italic> denote the weight matrix and bias vector, respectively. Commonly, the output cannot fit the input precisely, and the discrepancy can be measured with a loss function in mean squared error (MSE), defined in Equation (6),</p>
<disp-formula id="E6"><label>(6)</label><mml:math id="M7"><mml:mrow><mml:mtable><mml:mtr><mml:mtd><mml:mrow><mml:mi>L</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mi>W</mml:mi><mml:mo>,</mml:mo><mml:mi>b</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mo>=</mml:mo><mml:mfrac><mml:mn>1</mml:mn><mml:mi>m</mml:mi></mml:mfrac><mml:mstyle displaystyle='true'><mml:munderover><mml:mo>&#x02211;</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mi>m</mml:mi></mml:munderover><mml:mrow><mml:msup><mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mover accent='true'><mml:mi>x</mml:mi><mml:mo>&#x0005E;</mml:mo></mml:mover><mml:mo>&#x02212;</mml:mo><mml:mi>x</mml:mi></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mn>2</mml:mn></mml:msup></mml:mrow></mml:mstyle></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:mrow></mml:math></disp-formula>
<p>Thus, the learning process is to minimize the loss <italic>L</italic> after iterative optimization.</p>
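<p>As a minimal sketch of this learning process (not taken from any cited study), the following toy linear autoencoder minimizes the MSE loss of Equation (6) by plain gradient descent; the data, layer sizes, and learning rate are all illustrative assumptions.</p>

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8))                # m = 100 samples with 8 features
W_enc = rng.normal(scale=0.1, size=(8, 3))   # encoder weights: compress 8 -> 3
W_dec = rng.normal(scale=0.1, size=(3, 8))   # decoder weights: reconstruct 3 -> 8

lr, losses = 0.01, []
for _ in range(500):
    H = X @ W_enc                            # short code in the hidden layer
    X_hat = H @ W_dec                        # reconstruction of the input
    err = X_hat - X
    losses.append((err ** 2).mean())         # the MSE loss of Equation (6)
    # gradients of the MSE with respect to both weight matrices
    grad_dec = H.T @ err / len(X)
    grad_enc = X.T @ (err @ W_dec.T) / len(X)
    W_dec -= lr * grad_dec                   # gradient-descent updates
    W_enc -= lr * grad_enc
```

<p>Iterating the updates drives the reconstruction error down, which is exactly the minimization of <italic>L</italic> described above.</p>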
<p>Recently, the sparse autoencoder (SAE) has been frequently discussed for its admirable performance in dimension reduction and in denoising corrupted data. The loss function of the SAE is defined in Equation (7),</p>
<disp-formula id="E7"><label>(7)</label><mml:math id="M8"><mml:mrow><mml:mtable><mml:mtr><mml:mtd><mml:mrow><mml:msub><mml:mi>L</mml:mi><mml:mrow><mml:mi>S</mml:mi><mml:mi>A</mml:mi><mml:mi>E</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mi>L</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mi>W</mml:mi><mml:mo>,</mml:mo><mml:mi>b</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mo>+</mml:mo><mml:mi>&#x003B2;</mml:mi><mml:mstyle displaystyle='true'><mml:munder><mml:mo>&#x02211;</mml:mo><mml:mi>k</mml:mi></mml:munder><mml:mrow><mml:mi>K</mml:mi><mml:mi>L</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mi>&#x003C1;</mml:mi><mml:mo>&#x02225;</mml:mo><mml:mover accent='true'><mml:mrow><mml:msub><mml:mi>&#x003C1;</mml:mi><mml:mi>k</mml:mi></mml:msub></mml:mrow><mml:mo stretchy='true'>&#x0005E;</mml:mo></mml:mover><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:mstyle></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:mrow></mml:math></disp-formula>
<p>where KL refers to the KL-divergence defined in Equation (8), &#x003C1; denotes the target activation level of neurons, usually set to 0.05 with sigmoid activation, indicating that most neurons are inactive, &#x003C1;<sub><italic>k</italic></sub> denotes the average activation level of neuron <italic>k</italic>, and &#x003B2; is the regularization coefficient.</p>
<disp-formula id="E8"><label>(8)</label><mml:math id="M9"><mml:mrow><mml:mi>K</mml:mi><mml:mi>L</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mi>&#x003C1;</mml:mi><mml:mo>&#x02225;</mml:mo><mml:mo>&#x000A0;</mml:mo><mml:mover accent='true'><mml:mrow><mml:msub><mml:mi>&#x003C1;</mml:mi><mml:mi>k</mml:mi></mml:msub></mml:mrow><mml:mo stretchy='true'>&#x0005E;</mml:mo></mml:mover><mml:mtext>)</mml:mtext><mml:mo>=</mml:mo><mml:mi>&#x003C1;</mml:mi><mml:mi>l</mml:mi><mml:mi>o</mml:mi><mml:mi>g</mml:mi><mml:mfrac><mml:mi>&#x003C1;</mml:mi><mml:mrow><mml:mover accent='true'><mml:mrow><mml:msub><mml:mi>&#x003C1;</mml:mi><mml:mi>k</mml:mi></mml:msub></mml:mrow><mml:mo stretchy='true'>&#x0005E;</mml:mo></mml:mover></mml:mrow></mml:mfrac><mml:mo>+</mml:mo><mml:mo stretchy='false'>(</mml:mo><mml:mn>1</mml:mn><mml:mo>&#x02212;</mml:mo><mml:mi>&#x003C1;</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mi>l</mml:mi><mml:mi>o</mml:mi><mml:mi>g</mml:mi><mml:mfrac><mml:mrow><mml:mn>1</mml:mn><mml:mo>&#x02212;</mml:mo><mml:mi>&#x003C1;</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn><mml:mo>&#x02212;</mml:mo><mml:mover accent='true'><mml:mrow><mml:msub><mml:mi>&#x003C1;</mml:mi><mml:mi>k</mml:mi></mml:msub></mml:mrow><mml:mo stretchy='true'>&#x0005E;</mml:mo></mml:mover></mml:mrow></mml:mfrac></mml:mrow></mml:math></disp-formula>
<p>where <inline-formula><mml:math id="M10"><mml:mover accent="false"><mml:mrow><mml:msub><mml:mrow><mml:mi>&#x003C1;</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:math></inline-formula> represents the average activation level of neuron <italic>k</italic> over the <italic>m</italic> samples, and <italic>x</italic><sup>(<italic>i</italic>)</sup> is the <italic>i</italic>-th sample in Equation (9).</p>
<disp-formula id="E9"><label>(9)</label><mml:math id="M11"><mml:mrow><mml:mtable><mml:mtr><mml:mtd><mml:mrow><mml:mover accent='true'><mml:mrow><mml:msub><mml:mi>&#x003C1;</mml:mi><mml:mi>k</mml:mi></mml:msub></mml:mrow><mml:mo stretchy='true'>&#x0005E;</mml:mo></mml:mover><mml:mo>=</mml:mo><mml:mfrac><mml:mn>1</mml:mn><mml:mi>m</mml:mi></mml:mfrac><mml:mstyle displaystyle='true'><mml:munder><mml:mo>&#x02211;</mml:mo><mml:mi>i</mml:mi></mml:munder><mml:mrow><mml:mrow><mml:mo>[</mml:mo><mml:mrow><mml:msub><mml:mi>a</mml:mi><mml:mi>k</mml:mi></mml:msub><mml:mo stretchy='false'>(</mml:mo><mml:msup><mml:mi>x</mml:mi><mml:mrow><mml:mo stretchy='false'>(</mml:mo><mml:mi>i</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:msup><mml:mo stretchy='false'>)</mml:mo></mml:mrow><mml:mo>]</mml:mo></mml:mrow></mml:mrow></mml:mstyle></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:mrow></mml:math></disp-formula>
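<p>The sparsity penalty of Equations (7)&#x02013;(9) can be computed numerically as follows; the activation matrix, &#x003C1; = 0.05, and &#x003B2; are illustrative values rather than settings from a particular study.</p>

```python
import numpy as np

def kl_divergence(rho, rho_hat):
    """Element-wise KL(rho || rho_hat) of Equation (8) for one or more neurons."""
    return (rho * np.log(rho / rho_hat)
            + (1 - rho) * np.log((1 - rho) / (1 - rho_hat)))

rng = np.random.default_rng(1)
A = rng.uniform(0.01, 0.99, size=(100, 16))  # sigmoid activations: 100 samples x 16 neurons
rho = 0.05                                   # target activation level
beta = 3.0                                   # regularization coefficient

rho_hat = A.mean(axis=0)                     # Equation (9): average activation per neuron
penalty = beta * kl_divergence(rho, rho_hat).sum()  # sparsity term added in Equation (7)
```

<p>The penalty vanishes only when every neuron's average activation equals &#x003C1;, so minimizing it keeps most neurons inactive.</p>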
<p>For high-dimensional data, multiple autoencoders can be stacked to form a deep autoencoder (Hinton and Salakhutdinov, <xref ref-type="bibr" rid="B17">2006</xref>). This architecture may suffer from vanishing gradients because of its gradient-based backpropagation learning; current remedies include adopting ReLU activations and dropout (Szegedy et al., <xref ref-type="bibr" rid="B44">2015</xref>; Krizhevsky et al., <xref ref-type="bibr" rid="B23">2017</xref>). During configuration and pretraining, the model weights can be acquired by greedy layer-wise training, after which the network can be fine-tuned with the backpropagation algorithm.</p>
<p>Many variants of the autoencoder have been proposed recently, such as the sparse autoencoder (SAE) and the denoising autoencoder (DAE). Typically, the stacked sparse autoencoder (SSAE) was proposed to analyze high-resolution histopathological images in breast cancer (Xu J. et al., <xref ref-type="bibr" rid="B45">2016</xref>). Using an SAE with three iterations, Heffernan et al. reported successful prediction of protein secondary structure, local backbone angles, and solvent accessible surface area (Heffernan et al., <xref ref-type="bibr" rid="B15">2015</xref>). Miotto et al. introduced a stack of DAEs to derive features from large-scale electronic health records (EHRs) via an unsupervised representation approach (Miotto et al., <xref ref-type="bibr" rid="B31">2016</xref>). Ithapu et al. proposed a randomized denoising autoencoder marker (rDAm) to predict future cognitive and neural decline in Alzheimer's disease, with performance surpassing existing methods (Ithapu et al., <xref ref-type="bibr" rid="B20">2015</xref>).</p>
</sec>
<sec>
<title>Deep Belief Network</title>
<p>As a generative graphical model, the Deep Belief Network (DBN) is composed of multiple Restricted Boltzmann Machines (RBMs) or autoencoders stacked on top of each other, where each hidden layer of one subnetwork serves as the visible layer of the next (Hinton et al., <xref ref-type="bibr" rid="B16">2006</xref>). The main network structures of RBM and DBN are depicted in <xref ref-type="fig" rid="F7">Figure 7</xref>, which manifests the construction relations between the two network models.</p>
<fig id="F7" position="float">
<label>Figure 7</label>
<caption><p>Illustrative network structures of RBM and DBN. <bold>(A)</bold> The structure of an RBM. <bold>(B)</bold> The hidden layer of a trained RBM serves as the visible layer of another RBM. <bold>(C)</bold> The structure of a DBN, formed by stacking several RBMs on top of each other.</p></caption>
<graphic xlink:href="fgene-10-00214-g0007.tif"/>
</fig>
<p>A DBN is trained layer by layer in an unsupervised greedy approach to initialize the network weights separately; it can then utilize the wake-sleep or backpropagation algorithm during fine-tuning. With traditional backpropagation used in fine-tuning, a DBN may encounter several problems: (1) it requires labeled data for training; (2) learning is slow; and (3) inappropriate parameter initialization tends to lead to local optima.</p>
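<p>The greedy layer-wise procedure can be sketched with a minimal binary RBM trained by one-step contrastive divergence (CD-1); biases and many practical details are omitted, and all sizes and hyperparameters are illustrative assumptions.</p>

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(2)

def train_rbm(V, n_hidden, lr=0.1, epochs=50):
    """One-step contrastive divergence (CD-1) for a binary RBM;
    biases are omitted for brevity."""
    W = rng.normal(scale=0.01, size=(V.shape[1], n_hidden))
    for _ in range(epochs):
        # positive phase: sample hidden units from the data
        ph = sigmoid(V @ W)
        h = (rng.random(ph.shape) < ph).astype(float)
        # negative phase: reconstruct the visible units, then the hidden ones
        pv = sigmoid(h @ W.T)
        ph2 = sigmoid(pv @ W)
        W += lr * (V.T @ ph - pv.T @ ph2) / len(V)
    return W

V = (rng.random((200, 12)) < 0.5).astype(float)  # toy binary visible data
W1 = train_rbm(V, n_hidden=6)                    # train the first RBM ...
H1 = sigmoid(V @ W1)                             # ... whose hidden activations become
W2 = train_rbm(H1, n_hidden=3)                   # the visible layer of the next RBM
```

<p>Repeating this hand-off of hidden activations yields the stacked structure of <xref ref-type="fig" rid="F7">Figure 7C</xref>, after which the whole network can be fine-tuned.</p>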
<p>Among recent applications, Plis et al. classified schizophrenia patients from brain MRIs with a DBN (Plis et al., <xref ref-type="bibr" rid="B37">2014</xref>); in drug design based on high-throughput screening, a DBN was exploited to perform quantitative structure-activity relationship (QSAR) studies, and the results showed that optimizing the parameter initialization greatly improves the capability of DNNs to provide high-quality model predictions (Ghasemi et al., <xref ref-type="bibr" rid="B13">2018</xref>). A DBN was also used to study the combination of resting-state fMRI (rs-fMRI), gray matter, and white matter data by exploiting latent and abstract high-level features (Akhavan Aghdam et al., <xref ref-type="bibr" rid="B1">2018</xref>). Meanwhile, DBN and CNN were compared to show that deep learning yields better discriminative results and holds promise for medical image diagnosis (Hua et al., <xref ref-type="bibr" rid="B19">2015</xref>).</p>
</sec>
<sec>
<title>Transfer Learning in Deep Learning</title>
<p>Besides the above deep learning models, transfer learning is frequently utilized in specific cases without sufficient labeling information (Pan and Yang, <xref ref-type="bibr" rid="B36">2010</xref>). Although conceptually it does not belong to deep learning, transfer learning has attracted growing attention in the deep learning field owing to its ability to transfer high-level semantic representations between deep neural networks (O&#x00027;Shea et al., <xref ref-type="bibr" rid="B34">2013</xref>; Anthimopoulos et al., <xref ref-type="bibr" rid="B5">2016</xref>).</p>
<p>In quite a few deep learning studies, transfer learning enables a previously trained model to transfer its optimized parameters to a new model, thereby transmitting knowledge and avoiding repetitive training from scratch, as depicted in <xref ref-type="fig" rid="F8">Figure 8</xref>.</p>
<fig id="F8" position="float">
<label>Figure 8</label>
<caption><p>The schematic illustration of transfer learning. Given source domain and its learning task, together with target domain and respective task, transfer learning aims to improve the learning of the target prediction function, with the knowledge in source domain and its task.</p></caption>
<graphic xlink:href="fgene-10-00214-g0008.tif"/>
</fig>
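<p>The parameter transfer depicted in <xref ref-type="fig" rid="F8">Figure 8</xref> can be sketched schematically as follows; the layer names and shapes are hypothetical, and the &#x0201C;pretrained&#x0201D; weights are random stand-ins for a model actually optimized on a source task.</p>

```python
import numpy as np

rng = np.random.default_rng(3)

# Stand-in for a source model whose weights were optimized on the source task.
source_weights = {
    "layer1": rng.normal(size=(64, 32)),
    "layer2": rng.normal(size=(32, 16)),
    "head":   rng.normal(size=(16, 10)),   # source task: 10 classes
}

# Target model: reuse the feature-extracting layers, re-initialize the task head.
target_weights = {name: w.copy() for name, w in source_weights.items() if name != "head"}
target_weights["head"] = rng.normal(scale=0.01, size=(16, 3))  # new task: 3 classes

# Freeze the transferred layers; only the new head is fine-tuned on the target task.
frozen = {"layer1", "layer2"}
trainable = [name for name in target_weights if name not in frozen]
```

<p>Only the small task head is trained on the target data, which is why transfer learning reduces repetitive training when the source and target domains are statistically related.</p>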
<p>Normally, the source and target domains have a certain statistical relationship or similarity that directly affects the transferability. A domain contains the original dataset, for example an image matrix, and a task refers to a certain process, such as classification or pattern recognition. The mission of transfer learning includes transferring not only parameters such as weights, but also a condensed small-size matrix distilled from the original data domain, a process called knowledge distillation.</p>
<p>Knowledge distillation usually uses both a &#x0201C;hard target&#x0201D; and a &#x0201C;soft target&#x0201D; to train the model and obtain lower information entropy. The softmax function below is usually utilized to soften the sparse data and excavate its inherent features,</p>
<disp-formula id="E10"><label>(10)</label><mml:math id="M12"><mml:mrow><mml:mi>f</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:msub><mml:mi>&#x003B1;</mml:mi><mml:mi>k</mml:mi></mml:msub><mml:mo stretchy='false'>)</mml:mo><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:msup><mml:mi>e</mml:mi><mml:mrow><mml:mfrac><mml:mrow><mml:msub><mml:mi>&#x003B1;</mml:mi><mml:mi>k</mml:mi></mml:msub></mml:mrow><mml:mi>T</mml:mi></mml:mfrac></mml:mrow></mml:msup></mml:mrow><mml:mrow><mml:mstyle displaystyle='true'><mml:msub><mml:mo>&#x02211;</mml:mo><mml:mi>k</mml:mi></mml:msub><mml:mrow><mml:msup><mml:mi>e</mml:mi><mml:mrow><mml:mfrac><mml:mrow><mml:msub><mml:mi>&#x003B1;</mml:mi><mml:mi>k</mml:mi></mml:msub></mml:mrow><mml:mi>T</mml:mi></mml:mfrac></mml:mrow></mml:msup></mml:mrow></mml:mstyle></mml:mrow></mml:mfrac></mml:mrow></mml:math></disp-formula>
<p>where the logit &#x003B1;<sub><italic>k</italic></sub> is the input, <italic>f</italic> (&#x000B7;) softens the target data and offers smaller gradient variance, and <italic>k</italic> denotes the <italic>k</italic>-th segmented data slice. The parameter <italic>T</italic> is called the temperature: the larger <italic>T</italic> is, the softer the target becomes.</p>
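<p>A minimal implementation of the temperature-scaled softmax of Equation (10) is sketched below; the logits are arbitrary illustrative values.</p>

```python
import numpy as np

def softened_softmax(logits, T=1.0):
    """Temperature-scaled softmax of Equation (10): divide the logits by T
    before normalizing; larger T yields a softer, higher-entropy target."""
    z = np.asarray(logits, dtype=float) / T
    z -= z.max()                      # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

logits = np.array([8.0, 2.0, 1.0])
hard = softened_softmax(logits, T=1.0)  # nearly one-hot "hard" target
soft = softened_softmax(logits, T=4.0)  # softened target with a lower peak probability
```

<p>Raising <italic>T</italic> flattens the distribution, so the small probabilities on non-dominant classes, which carry the distilled knowledge, become large enough to train against.</p>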
<p>Furthermore, transfer learning is categorized into instance-based, feature-based, parameter-based, and relation-based derivatives, depicted in <xref ref-type="fig" rid="F9">Figure 9</xref>. Currently transfer learning is frequently discussed in the deep learning field for its broad applicability and performance. Combined with a CNN, transfer learning attains greater prediction performance on interstitial lung disease CT scans (Anthimopoulos et al., <xref ref-type="bibr" rid="B5">2016</xref>). It was also used as a bridge between a multi-layer LSTM and a conditional random field (CRF), and the results showed that the LSTM-CRF approach outperformed the baseline methods on the target datasets (Giorgi and Bader, <xref ref-type="bibr" rid="B14">2018</xref>).</p>
<fig id="F9" position="float">
<label>Figure 9</label>
<caption><p>Transfer learning has several derivatives categorized by the labeling information and difference between the target and source.</p></caption>
<graphic xlink:href="fgene-10-00214-g0009.tif"/>
</fig>
</sec>
</sec>
<sec sec-type="conclusions" id="s4">
<title>Conclusions</title>
<p>In this work, we have comprehensively summarized the basic but essential concepts and methods of deep learning, together with its recent applications in diverse biomedical studies. Through reviewing typical deep learning models such as RNN, CNN, autoencoder, and DBN, we highlight that the specific application scenario or context, such as data features and model applicability, is the prominent factor in designing a suitable deep learning approach to extract knowledge from data; thus, deciphering and characterizing data features remains a non-trivial part of the deep-learning workflow. In recent deep learning studies, the many derivatives of the classic network models, including those depicted above, manifest that model selection affects the effectiveness of a deep learning application.</p>
<p>Secondly, regarding its limitations and directions for further improvement, we should revisit the nature of the method: deep learning is essentially a continuous manifold transformation among diverse vector spaces, but there exist quite a few tasks that cannot be converted into a deep learning model, or into a learnable form, because of the complex geometric transforms involved. Moreover, deep learning is generally a big-data-driven technique, which makes it distinct from conventional statistical learning or Bayesian approaches. Thus, integrating or embedding deep learning with other conventional algorithms is a new direction for tackling such complicated tasks.</p>
<p>Thirdly, innovation in computational algorithms and hardware matters. As an inference technique driven by big data, deep learning demands high-performance parallel computation facilities; together with further algorithmic breakthroughs and the fast accumulation of diverse perceptual data, it is achieving pervasive success in many fields and applications. Particularly in bioinformatics and computational biology, a typically data-oriented field, remarkable changes have taken place in research methods.</p>
<p>Finally, with the unprecedented innovations and successes achieved by deep learning in diverse subfields, some have even argued that deep learning could bring about another wave of change like the internet. In the long term, the deep learning technique is shaping the future of our lives and societies to its full extent. But deep learning should not be misinterpreted or overestimated, either in academia or in the AI industry; it still has many technical problems to solve owing to its nature. In all, we anticipate that this review will provide a meaningful perspective to help researchers gain comprehensive knowledge and make further progress in this ever-faster developing field.</p>
</sec>
<sec id="s5">
<title>Author Contributions</title>
<p>BT conceived the study. ZP, KY, AK, and BT drafted the application sections and revised and approved the final manuscript.</p>
<sec>
<title>Conflict of Interest Statement</title>
<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
</sec>
</sec>
</body>
<back>
<ref-list>
<title>References</title>
<ref id="B1">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Akhavan Aghdam</surname> <given-names>M.</given-names></name> <name><surname>Sharifi</surname> <given-names>A.</given-names></name> <name><surname>Pedram</surname> <given-names>M. M.</given-names></name></person-group> (<year>2018</year>). <article-title>Combination of rs-fMRI and sMRI data to discriminate autism spectrum disorders in young children using deep belief network</article-title>. <source>J. Digit. Imaging</source>. <volume>31</volume>, <fpage>895</fpage>&#x02013;<lpage>903</lpage>. <pub-id pub-id-type="doi">10.1007/s10278-018-0093-8</pub-id><pub-id pub-id-type="pmid">29736781</pub-id></citation></ref>
<ref id="B2">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Alipanahi</surname> <given-names>B.</given-names></name> <name><surname>Delong</surname> <given-names>A.</given-names></name> <name><surname>Weirauch</surname> <given-names>M. T.</given-names></name> <name><surname>Frey</surname> <given-names>B. J.</given-names></name></person-group> (<year>2015</year>). <article-title>Predicting the sequence specificities of DNA- and RNA-binding proteins by deep learning</article-title>. <source>Nat. Biotechnol.</source> <volume>33</volume>:<fpage>831</fpage>&#x02013;<lpage>838</lpage>. <pub-id pub-id-type="doi">10.1038/nbt.3300</pub-id><pub-id pub-id-type="pmid">26213851</pub-id></citation></ref>
<ref id="B3">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Angermueller</surname> <given-names>C.</given-names></name> <name><surname>Lee</surname> <given-names>H. J.</given-names></name> <name><surname>Reik</surname> <given-names>W.</given-names></name> <name><surname>Stegle</surname> <given-names>O.</given-names></name></person-group> (<year>2017</year>). <article-title>DeepCpG: accurate prediction of single-cell DNA methylation states using deep learning</article-title>. <source>Genome Biol.</source> <volume>18</volume>:<fpage>67</fpage>. <pub-id pub-id-type="doi">10.1186/s13059-017-1189-z</pub-id></citation></ref>
<ref id="B4">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Angermueller</surname> <given-names>C.</given-names></name> <name><surname>P&#x000E4;rnamaa</surname> <given-names>T.</given-names></name> <name><surname>Parts</surname> <given-names>L.</given-names></name> <name><surname>Stegle</surname> <given-names>O.</given-names></name></person-group> (<year>2016</year>). <article-title>Deep learning for computational biology</article-title>. <source>Mol. Syst. Biol.</source> <volume>12</volume>:<fpage>878</fpage>. <pub-id pub-id-type="doi">10.15252/msb.20156651</pub-id><pub-id pub-id-type="pmid">27474269</pub-id></citation></ref>
<ref id="B5">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Anthimopoulos</surname> <given-names>M.</given-names></name> <name><surname>Christodoulidis</surname> <given-names>S.</given-names></name> <name><surname>Ebner</surname> <given-names>L.</given-names></name> <name><surname>Christe</surname> <given-names>A.</given-names></name> <name><surname>Mougiakakou</surname> <given-names>S.</given-names></name></person-group> (<year>2016</year>). <article-title>Lung pattern classification for interstitial lung diseases using a deep convolutional neural network</article-title>. <source>IEEE Trans. Med. Imag.</source> <volume>35</volume>, <fpage>1207</fpage>&#x02013;<lpage>1216</lpage>. <pub-id pub-id-type="doi">10.1109/TMI.2016.2535865</pub-id><pub-id pub-id-type="pmid">26955021</pub-id></citation></ref>
<ref id="B6">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Baldi</surname> <given-names>P.</given-names></name> <name><surname>Brunak</surname> <given-names>S.</given-names></name> <name><surname>Frasconi</surname> <given-names>P.</given-names></name> <name><surname>Soda</surname> <given-names>G.</given-names></name> <name><surname>Pollastri</surname> <given-names>G.</given-names></name></person-group> (<year>1999</year>). <article-title>Exploiting the past and the future in protein secondary structure prediction</article-title>. <source>Bioinformatics</source> <volume>15</volume>:<fpage>937</fpage>. <pub-id pub-id-type="doi">10.1093/bioinformatics/15.11.937</pub-id><pub-id pub-id-type="pmid">10743560</pub-id></citation></ref>
<ref id="B7">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Bengio</surname> <given-names>Y.</given-names></name> <name><surname>LeCun</surname> <given-names>Y.</given-names></name></person-group> (<year>2007</year>). <article-title>Scaling learning algorithms toward AI</article-title>, in <source>Large-Scale Kernel Machines</source>, eds <person-group person-group-type="editor"><name><surname>Bottou</surname> <given-names>L.</given-names></name> <name><surname>Chapelle</surname> <given-names>O.</given-names></name> <name><surname>DeCoste</surname> <given-names>D.</given-names></name> <name><surname>Weston</surname> <given-names>J.</given-names></name></person-group> (<publisher-loc>Cambridge, MA</publisher-loc>: <publisher-name>The MIT Press</publisher-name>).</citation></ref>
<ref id="B8">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Chilamkurthy</surname> <given-names>S.</given-names></name> <name><surname>Ghosh</surname> <given-names>R.</given-names></name> <name><surname>Tanamala</surname> <given-names>S.</given-names></name> <name><surname>Biviji</surname> <given-names>M.</given-names></name> <name><surname>Campeau</surname> <given-names>N. G.</given-names></name> <name><surname>Venugopal</surname> <given-names>V. K.</given-names></name> <etal/></person-group>. (<year>2018</year>). <article-title>Deep learning algorithms for detection of critical findings in head CT scans: a retrospective study</article-title>. <source>Lancet</source> <volume>392</volume>, <fpage>2388</fpage>&#x02013;<lpage>2396</lpage>. <pub-id pub-id-type="doi">10.1016/S0140-6736(18)31645-3</pub-id><pub-id pub-id-type="pmid">30318264</pub-id></citation></ref>
<ref id="B9">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ching</surname> <given-names>T.</given-names></name> <name><surname>Himmelstein</surname> <given-names>D. S.</given-names></name> <name><surname>Beaulieu-Jones</surname> <given-names>B. K.</given-names></name> <name><surname>Kalinin</surname> <given-names>A. A.</given-names></name> <name><surname>Do</surname> <given-names>B. T.</given-names></name> <name><surname>Way</surname> <given-names>G. P.</given-names></name> <etal/></person-group>. (<year>2018</year>). <article-title>Opportunities and obstacles for deep learning in biology and medicine</article-title>. <source>J. R. Soc. Interface</source> <volume>15</volume>:<fpage>20170387</fpage>. <pub-id pub-id-type="doi">10.1098/rsif.2017.0387</pub-id><pub-id pub-id-type="pmid">29618526</pub-id></citation></ref>
<ref id="B10">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ditzler</surname> <given-names>G.</given-names></name> <name><surname>Polikar</surname> <given-names>R.</given-names></name> <name><surname>Member</surname> <given-names>S.</given-names></name> <name><surname>Rosen</surname> <given-names>G.</given-names></name> <name><surname>Member</surname> <given-names>S.</given-names></name></person-group> (<year>2015</year>). <article-title>Multi-layer and recursive neural networks for metagenomic classification</article-title>. <source>IEEE. Trans. Nanobiosci.</source> <volume>14</volume>:<fpage>608</fpage>. <pub-id pub-id-type="doi">10.1109/TNB.2015.2461219</pub-id><pub-id pub-id-type="pmid">26316190</pub-id></citation></ref>
<ref id="B11">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Dubost</surname> <given-names>F.</given-names></name> <name><surname>Adams</surname> <given-names>H.</given-names></name> <name><surname>Bortsova</surname> <given-names>G.</given-names></name> <name><surname>Ikram</surname> <given-names>M. A.</given-names></name> <name><surname>Niessen</surname> <given-names>W.</given-names></name> <name><surname>Vernooij</surname> <given-names>M.</given-names></name> <etal/></person-group>. (<year>2019</year>). <article-title>3D regression neural network for the quantification of enlarged perivascular spaces in brain MRI</article-title>. <source>Med. Image Anal.</source> <volume>51</volume>, <fpage>89</fpage>&#x02013;<lpage>100</lpage>. <pub-id pub-id-type="doi">10.1016/j.media.2018.10.008</pub-id><pub-id pub-id-type="pmid">30390514</pub-id></citation></ref>
<ref id="B12">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Esteva</surname> <given-names>A.</given-names></name> <name><surname>Kuprel</surname> <given-names>B.</given-names></name> <name><surname>Novoa</surname> <given-names>R. A.</given-names></name> <name><surname>Ko</surname> <given-names>J.</given-names></name> <name><surname>Swetter</surname> <given-names>S. M.</given-names></name> <name><surname>Blau</surname> <given-names>H. M.</given-names></name> <name><surname>Thrun</surname> <given-names>S.</given-names></name></person-group> (<year>2017</year>). <article-title>Dermatologist-level classification of skin cancer with deep neural networks</article-title>. <source>Nature</source> <volume>542</volume>:<fpage>115</fpage>&#x02013;<lpage>118</lpage>. <pub-id pub-id-type="doi">10.1038/nature21056</pub-id><pub-id pub-id-type="pmid">28117445</pub-id></citation></ref>
<ref id="B13">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ghasemi</surname> <given-names>F.</given-names></name> <name><surname>Mehridehnavi</surname> <given-names>A.</given-names></name> <name><surname>Fassihi</surname> <given-names>A.</given-names></name> <name><surname>P&#x000E9;rez-S&#x000E1;nchez</surname> <given-names>H.</given-names></name></person-group> (<year>2018</year>). <article-title>Deep neural network in QSAR studies using deep belief network</article-title>. <source>Appl. Soft Comput.</source> <volume>62</volume>, <fpage>251</fpage>&#x02013;<lpage>258</lpage>. <pub-id pub-id-type="doi">10.1016/j.asoc.2017.09.040</pub-id></citation></ref>
<ref id="B14">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Giorgi</surname> <given-names>J. M.</given-names></name> <name><surname>Bader</surname> <given-names>G. D.</given-names></name></person-group> (<year>2018</year>). <article-title>Transfer learning for biomedical named entity recognition with neural networks</article-title>. <source>Bioinformatics</source> <volume>34</volume>, <fpage>4087</fpage>&#x02013;<lpage>4094</lpage>. <pub-id pub-id-type="doi">10.1093/bioinformatics/bty449</pub-id><pub-id pub-id-type="pmid">29868832</pub-id></citation></ref>
<ref id="B15">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Heffernan</surname> <given-names>R.</given-names></name> <name><surname>Paliwal</surname> <given-names>K.</given-names></name> <name><surname>Lyons</surname> <given-names>J.</given-names></name> <name><surname>Dehzangi</surname> <given-names>A.</given-names></name> <name><surname>Sharma</surname> <given-names>A.</given-names></name> <name><surname>Wang</surname> <given-names>J.</given-names></name> <etal/></person-group>. (<year>2015</year>). <article-title>Improving prediction of secondary structure, local backbone angles, and solvent accessible surface area of proteins by iterative deep learning</article-title>. <source>Sci. Rep.</source> <volume>5</volume>:<fpage>11476</fpage>. <pub-id pub-id-type="doi">10.1038/srep11476</pub-id><pub-id pub-id-type="pmid">26098304</pub-id></citation></ref>
<ref id="B16">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hinton</surname> <given-names>G. E.</given-names></name> <name><surname>Osindero</surname> <given-names>S.</given-names></name> <name><surname>Teh</surname> <given-names>Y. W.</given-names></name></person-group> (<year>2006</year>). <article-title>A fast learning algorithm for deep belief nets</article-title>. <source>Neural. Comput.</source> <volume>18</volume>, <fpage>1527</fpage>&#x02013;<lpage>1554</lpage>. <pub-id pub-id-type="doi">10.1162/neco.2006.18.7.1527</pub-id><pub-id pub-id-type="pmid">16764513</pub-id></citation></ref>
<ref id="B17">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hinton</surname> <given-names>G. E.</given-names></name> <name><surname>Salakhutdinov</surname> <given-names>R. R.</given-names></name></person-group> (<year>2006</year>). <article-title>Reducing the dimensionality of data with neural networks</article-title>. <source>Science</source> <volume>313</volume>, <fpage>504</fpage>&#x02013;<lpage>507</lpage>. <pub-id pub-id-type="doi">10.1126/science.1127647</pub-id><pub-id pub-id-type="pmid">16873662</pub-id></citation></ref>
<ref id="B18">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hu</surname> <given-names>Y.</given-names></name> <name><surname>Lu</surname> <given-names>X.</given-names></name></person-group> (<year>2018</year>). <article-title>Learning spatial-temporal features for video copy detection by the combination of CNN and RNN</article-title>. <source>J. Vis. Commun. Image Rep.</source> <volume>55</volume>, <fpage>21</fpage>&#x02013;<lpage>29</lpage>. <pub-id pub-id-type="doi">10.1016/j.jvcir.2018.05.013</pub-id></citation></ref>
<ref id="B19">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hua</surname> <given-names>K. L.</given-names></name> <name><surname>Hsu</surname> <given-names>C. H.</given-names></name> <name><surname>Hidayati</surname> <given-names>S. C.</given-names></name> <name><surname>Cheng</surname> <given-names>W. H.</given-names></name> <name><surname>Chen</surname> <given-names>Y. J.</given-names></name></person-group> (<year>2015</year>). <article-title>Computer-aided classification of lung nodules on computed tomography images via deep learning technique</article-title>. <source>Oncotargets Ther.</source> <volume>8</volume>, <fpage>2015</fpage>&#x02013;<lpage>2022</lpage>. <pub-id pub-id-type="doi">10.2147/OTT.S80733</pub-id><pub-id pub-id-type="pmid">26346558</pub-id></citation></ref>
<ref id="B20">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ithapu</surname> <given-names>V. K.</given-names></name> <name><surname>Singh</surname> <given-names>V.</given-names></name> <name><surname>Okonkwo</surname> <given-names>O. C.</given-names></name> <name><surname>Chappell</surname> <given-names>R. J.</given-names></name> <name><surname>Dowling</surname> <given-names>N. M.</given-names></name> <name><surname>Johnson</surname> <given-names>S. C.</given-names></name></person-group> (<year>2015</year>). <article-title>Imaging-based enrichment criteria using deep learning algorithms for efficient clinical trials in mild cognitive impairment</article-title>. <source>Alzheimer&#x00027;s Dement.</source> <volume>11</volume>, <fpage>1489</fpage>&#x02013;<lpage>1499</lpage>. <pub-id pub-id-type="doi">10.1016/j.jalz.2015.01.010</pub-id><pub-id pub-id-type="pmid">26093156</pub-id></citation></ref>
<ref id="B21">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Jolma</surname> <given-names>A.</given-names></name> <name><surname>Yan</surname> <given-names>J.</given-names></name> <name><surname>Whitington</surname> <given-names>T.</given-names></name> <name><surname>Toivonen</surname> <given-names>J.</given-names></name> <name><surname>Nitta</surname> <given-names>K. R.</given-names></name> <name><surname>Rastas</surname> <given-names>P.</given-names></name> <etal/></person-group>. (<year>2013</year>). <article-title>DNA-binding specificities of human transcription factors</article-title>. <source>Cell</source> <volume>152</volume>, <fpage>327</fpage>&#x02013;<lpage>339</lpage>. <pub-id pub-id-type="doi">10.1016/j.cell.2012.12.009</pub-id><pub-id pub-id-type="pmid">23332764</pub-id></citation></ref>
<ref id="B22">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kim</surname> <given-names>Y.</given-names></name> <name><surname>Sim</surname> <given-names>S. H.</given-names></name> <name><surname>Park</surname> <given-names>B.</given-names></name> <name><surname>Lee</surname> <given-names>K. S.</given-names></name> <name><surname>Chae</surname> <given-names>I. H.</given-names></name> <name><surname>Park</surname> <given-names>I. H.</given-names></name> <etal/></person-group>. (<year>2018</year>). <article-title>MRI assessment of residual breast cancer after neoadjuvant chemotherapy: relevance to tumor subtypes and MRI interpretation threshold</article-title>. <source>Clin. Breast Cancer</source> <volume>18</volume>, <fpage>459</fpage>&#x02013;<lpage>467.e1</lpage>. <pub-id pub-id-type="doi">10.1016/j.clbc.2018.05.009</pub-id><pub-id pub-id-type="pmid">29954674</pub-id></citation></ref>
<ref id="B23">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Krizhevsky</surname> <given-names>A.</given-names></name> <name><surname>Sutskever</surname> <given-names>I.</given-names></name> <name><surname>Hinton</surname> <given-names>G. E.</given-names></name></person-group> (<year>2017</year>). <article-title>ImageNet classification with deep convolutional neural networks</article-title>. <source>Commun. ACM</source> <volume>60</volume>, <fpage>84</fpage>&#x02013;<lpage>90</lpage>. <pub-id pub-id-type="doi">10.1145/3065386</pub-id></citation></ref>
<ref id="B24">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>LeCun</surname> <given-names>Y.</given-names></name> <name><surname>Bengio</surname> <given-names>Y.</given-names></name> <name><surname>Hinton</surname> <given-names>G.</given-names></name></person-group> (<year>2015</year>). <article-title>Deep learning</article-title>. <source>Nature</source> <volume>521</volume>, <fpage>436</fpage>&#x02013;<lpage>444</lpage>. <pub-id pub-id-type="doi">10.1038/nature14539</pub-id><pub-id pub-id-type="pmid">26017442</pub-id></citation></ref>
<ref id="B25">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lee</surname> <given-names>T. I.</given-names></name> <name><surname>Young</surname> <given-names>R. A.</given-names></name></person-group> (<year>2013</year>). <article-title>Transcriptional regulation and its misregulation in disease</article-title>. <source>Cell</source> <volume>152</volume>, <fpage>1237</fpage>&#x02013;<lpage>1251</lpage>. <pub-id pub-id-type="doi">10.1016/j.cell.2013.02.014</pub-id><pub-id pub-id-type="pmid">23498934</pub-id></citation></ref>
<ref id="B26">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Li</surname> <given-names>A.</given-names></name> <name><surname>Serban</surname> <given-names>R.</given-names></name> <name><surname>Negrut</surname> <given-names>D.</given-names></name></person-group> (<year>2017</year>). <article-title>Analysis of a splitting approach for the parallel solution of linear systems on GPU cards</article-title>. <source>SIAM J. Sci. Comput.</source> <volume>39</volume>, <fpage>C215</fpage>&#x02013;<lpage>C237</lpage>. <pub-id pub-id-type="doi">10.1137/15M1039523</pub-id></citation></ref>
<ref id="B27">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Liang</surname> <given-names>M.</given-names></name> <name><surname>Li</surname> <given-names>Z.</given-names></name> <name><surname>Chen</surname> <given-names>T.</given-names></name> <name><surname>Zeng</surname> <given-names>J.</given-names></name></person-group> (<year>2015</year>). <article-title>Integrative data analysis of multi-platform cancer data with a multimodal deep learning approach</article-title>. <source>IEEE/ACM Trans. Comput. Biol. Bioinf.</source> <volume>12</volume>, <fpage>928</fpage>&#x02013;<lpage>937</lpage>. <pub-id pub-id-type="doi">10.1109/TCBB.2014.2377729</pub-id><pub-id pub-id-type="pmid">26357333</pub-id></citation></ref>
<ref id="B28">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Libbrecht</surname> <given-names>M. W.</given-names></name> <name><surname>Noble</surname> <given-names>W. S.</given-names></name></person-group> (<year>2015</year>). <article-title>Machine learning applications in genetics and genomics</article-title>. <source>Nat. Rev. Genet.</source> <volume>16</volume>, <fpage>321</fpage>&#x02013;<lpage>332</lpage>. <pub-id pub-id-type="doi">10.1038/nrg3920</pub-id><pub-id pub-id-type="pmid">25948244</pub-id></citation></ref>
<ref id="B29">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Mamoshina</surname> <given-names>P.</given-names></name> <name><surname>Vieira</surname> <given-names>A.</given-names></name> <name><surname>Putin</surname> <given-names>E.</given-names></name> <name><surname>Zhavoronkov</surname> <given-names>A.</given-names></name></person-group> (<year>2016</year>). <article-title>Applications of deep learning in biomedicine</article-title>. <source>Mol. Pharmaceut.</source> <volume>13</volume>, <fpage>1445</fpage>&#x02013;<lpage>1454</lpage>. <pub-id pub-id-type="doi">10.1021/acs.molpharmaceut.5b00982</pub-id><pub-id pub-id-type="pmid">27007977</pub-id></citation></ref>
<ref id="B30">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Min</surname> <given-names>S.</given-names></name> <name><surname>Lee</surname> <given-names>B.</given-names></name> <name><surname>Yoon</surname> <given-names>S.</given-names></name></person-group> (<year>2017</year>). <article-title>Deep learning in bioinformatics</article-title>. <source>Brief Bioinform.</source> <volume>18</volume>, <fpage>851</fpage>&#x02013;<lpage>869</lpage>. <pub-id pub-id-type="doi">10.1093/bib/bbw068</pub-id><pub-id pub-id-type="pmid">27473064</pub-id></citation></ref>
<ref id="B31">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Miotto</surname> <given-names>R.</given-names></name> <name><surname>Li</surname> <given-names>L.</given-names></name> <name><surname>Kidd</surname> <given-names>B. A.</given-names></name> <name><surname>Dudley</surname> <given-names>J. T.</given-names></name></person-group> (<year>2016</year>). <article-title>Deep patient: an unsupervised representation to predict the future of patients from the electronic health records</article-title>. <source>Sci. Rep.</source> <volume>6</volume>:<fpage>26094</fpage>. <pub-id pub-id-type="doi">10.1038/srep26094</pub-id><pub-id pub-id-type="pmid">27185194</pub-id></citation></ref>
<ref id="B32">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Mnih</surname> <given-names>V.</given-names></name> <name><surname>Kavukcuoglu</surname> <given-names>K.</given-names></name> <name><surname>Silver</surname> <given-names>D.</given-names></name> <name><surname>Rusu</surname> <given-names>A. A.</given-names></name> <name><surname>Veness</surname> <given-names>J.</given-names></name> <name><surname>Bellemare</surname> <given-names>M. G.</given-names></name> <etal/></person-group>. (<year>2015</year>). <article-title>Human-level control through deep reinforcement learning</article-title>. <source>Nature</source> <volume>518</volume>, <fpage>529</fpage>&#x02013;<lpage>533</lpage>. <pub-id pub-id-type="doi">10.1038/nature14236</pub-id><pub-id pub-id-type="pmid">25719670</pub-id></citation></ref>
<ref id="B33">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Nussinov</surname> <given-names>R.</given-names></name></person-group> (<year>2015</year>). <article-title>Advancements and challenges in computational biology</article-title>. <source>PLoS Comput. Biol.</source> <volume>11</volume>:<fpage>e1004053</fpage>. <pub-id pub-id-type="doi">10.1371/journal.pcbi.1004053</pub-id><pub-id pub-id-type="pmid">25569585</pub-id></citation></ref>
<ref id="B34">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>O&#x00027;Shea</surname> <given-names>J. P.</given-names></name> <name><surname>Chou</surname> <given-names>M. F.</given-names></name> <name><surname>Quader</surname> <given-names>S. A.</given-names></name> <name><surname>Ryan</surname> <given-names>J. K.</given-names></name> <name><surname>Church</surname> <given-names>G. M.</given-names></name> <name><surname>Schwartz</surname> <given-names>D.</given-names></name></person-group> (<year>2013</year>). <article-title>pLogo: a probabilistic approach to visualizing sequence motifs</article-title>. <source>Nat. Methods</source> <volume>10</volume>, <fpage>1211</fpage>&#x02013;<lpage>1212</lpage>. <pub-id pub-id-type="doi">10.1038/nmeth.2646</pub-id><pub-id pub-id-type="pmid">24097270</pub-id></citation></ref>
<ref id="B35">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Pan</surname> <given-names>Q.</given-names></name> <name><surname>Shai</surname> <given-names>O.</given-names></name> <name><surname>Lee</surname> <given-names>L. J.</given-names></name> <name><surname>Frey</surname> <given-names>B. J.</given-names></name> <name><surname>Blencowe</surname> <given-names>B. J.</given-names></name></person-group> (<year>2008</year>). <article-title>Deep surveying of alternative splicing complexity in the human transcriptome by high-throughput sequencing</article-title>. <source>Nat. Genet.</source> <volume>40</volume>, <fpage>1413</fpage>&#x02013;<lpage>1415</lpage>. <pub-id pub-id-type="doi">10.1038/ng.259</pub-id><pub-id pub-id-type="pmid">18978789</pub-id></citation></ref>
<ref id="B36">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Pan</surname> <given-names>S. J.</given-names></name> <name><surname>Yang</surname> <given-names>Q.</given-names></name></person-group> (<year>2010</year>). <article-title>A survey on transfer learning</article-title>. <source>IEEE Trans. Knowl. Data Eng.</source> <volume>22</volume>, <fpage>1345</fpage>&#x02013;<lpage>1359</lpage>. <pub-id pub-id-type="doi">10.1109/TKDE.2009.191</pub-id></citation></ref>
<ref id="B37">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Plis</surname> <given-names>S. M.</given-names></name> <name><surname>Hjelm</surname> <given-names>D. R.</given-names></name> <name><surname>Salakhutdinov</surname> <given-names>R.</given-names></name> <name><surname>Allen</surname> <given-names>E. A.</given-names></name> <name><surname>Bockholt</surname> <given-names>H. J.</given-names></name> <name><surname>Long</surname> <given-names>J. D.</given-names></name> <etal/></person-group>. (<year>2014</year>). <article-title>Deep learning for neuroimaging: a validation study</article-title>. <source>Front. Neurosci.</source> <volume>8</volume>:<fpage>229</fpage>. <pub-id pub-id-type="doi">10.3389/fnins.2014.00229</pub-id><pub-id pub-id-type="pmid">25191215</pub-id></citation></ref>
<ref id="B38">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Quang</surname> <given-names>D.</given-names></name> <name><surname>Guan</surname> <given-names>Y.</given-names></name> <name><surname>Parker</surname> <given-names>S. C. J.</given-names></name></person-group> (<year>2018</year>). <article-title>YAMDA: thousandfold speedup of EM-based motif discovery using deep learning libraries and GPU</article-title>. <source>Bioinformatics</source> <volume>34</volume>, <fpage>3578</fpage>&#x02013;<lpage>3580</lpage>. <pub-id pub-id-type="doi">10.1093/bioinformatics/bty396</pub-id><pub-id pub-id-type="pmid">29790915</pub-id></citation></ref>
<ref id="B39">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ravi</surname> <given-names>D.</given-names></name> <name><surname>Wong</surname> <given-names>C.</given-names></name> <name><surname>Deligianni</surname> <given-names>F.</given-names></name> <name><surname>Berthelot</surname> <given-names>M.</given-names></name> <name><surname>Andreu-Perez</surname> <given-names>J.</given-names></name> <name><surname>Lo</surname> <given-names>B.</given-names></name> <etal/></person-group>. (<year>2017</year>). <article-title>Deep learning for health informatics</article-title>. <source>IEEE J. Biomed. Health Inform.</source> <volume>21</volume>, <fpage>4</fpage>&#x02013;<lpage>21</lpage>. <pub-id pub-id-type="doi">10.1109/JBHI.2016.2636665</pub-id><pub-id pub-id-type="pmid">28055930</pub-id></citation></ref>
<ref id="B40">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ray</surname> <given-names>D.</given-names></name> <name><surname>Kazan</surname> <given-names>H.</given-names></name> <name><surname>Chan</surname> <given-names>E. T.</given-names></name> <name><surname>Pe&#x000F1;a</surname> <given-names>L. C.</given-names></name> <name><surname>Chaudhry</surname> <given-names>S.</given-names></name> <name><surname>Talukder</surname> <given-names>S.</given-names></name> <etal/></person-group>. (<year>2009</year>). <article-title>Rapid and systematic analysis of the RNA recognition specificities of RNA-binding proteins</article-title>. <source>Nat. Biotechnol.</source> <volume>27</volume>, <fpage>667</fpage>&#x02013;<lpage>670</lpage>. <pub-id pub-id-type="doi">10.1038/nbt.1550</pub-id><pub-id pub-id-type="pmid">19561594</pub-id></citation></ref>
<ref id="B41">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Schmidhuber</surname> <given-names>J.</given-names></name></person-group> (<year>2015</year>). <article-title>Deep learning in neural networks: an overview</article-title>. <source>Neural Netw.</source> <volume>61</volume>, <fpage>85</fpage>&#x02013;<lpage>117</lpage>. <pub-id pub-id-type="doi">10.1016/j.neunet.2014.09.003</pub-id></citation></ref>
<ref id="B42">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Sekhon</surname> <given-names>A.</given-names></name> <name><surname>Singh</surname> <given-names>R.</given-names></name> <name><surname>Qi</surname> <given-names>Y.</given-names></name></person-group> (<year>2018</year>). <article-title>DeepDiff: DEEP-learning for predicting DIFFerential gene expression from histone modifications</article-title>. <source>Bioinformatics</source> <volume>34</volume>, <fpage>i891</fpage>&#x02013;<lpage>i900</lpage>. <pub-id pub-id-type="doi">10.1093/bioinformatics/bty612</pub-id><pub-id pub-id-type="pmid">30423076</pub-id></citation></ref>
<ref id="B43">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Singh</surname> <given-names>R.</given-names></name> <name><surname>Lanchantin</surname> <given-names>J.</given-names></name> <name><surname>Robins</surname> <given-names>G.</given-names></name> <name><surname>Qi</surname> <given-names>Y.</given-names></name></person-group> (<year>2016</year>). <article-title>DeepChrome: deep-learning for predicting gene expression from histone modifications</article-title>. <source>Bioinformatics</source> <volume>32</volume>, <fpage>i639</fpage>&#x02013;<lpage>i648</lpage>. <pub-id pub-id-type="doi">10.1093/bioinformatics/btw427</pub-id><pub-id pub-id-type="pmid">27587684</pub-id></citation></ref>
<ref id="B44">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Szegedy</surname> <given-names>C.</given-names></name> <name><surname>Liu</surname> <given-names>W.</given-names></name> <name><surname>Jia</surname> <given-names>Y.</given-names></name> <name><surname>Sermanet</surname> <given-names>P.</given-names></name> <name><surname>Reed</surname> <given-names>S.</given-names></name> <name><surname>Anguelov</surname> <given-names>D.</given-names></name> <etal/></person-group>. (<year>2015</year>). <article-title>Going deeper with convolutions</article-title>, in <source>IEEE Conference on Computer Vision and Pattern Recognition (CVPR)</source>, <fpage>1</fpage>&#x02013;<lpage>9</lpage>.</citation></ref>
<ref id="B45">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Xu</surname> <given-names>J.</given-names></name> <name><surname>Xiang</surname> <given-names>L.</given-names></name> <name><surname>Liu</surname> <given-names>Q.</given-names></name> <name><surname>Gilmore</surname> <given-names>H.</given-names></name> <name><surname>Wu</surname> <given-names>J.</given-names></name> <name><surname>Tang</surname> <given-names>J.</given-names></name> <name><surname>Madabhushi</surname> <given-names>A.</given-names></name></person-group> (<year>2016</year>). <article-title>Stacked sparse autoencoder (SSAE) for nuclei detection on breast cancer histopathology images</article-title>. <source>IEEE Trans. Med. Imaging</source> <volume>35</volume>, <fpage>119</fpage>&#x02013;<lpage>130</lpage>. <pub-id pub-id-type="doi">10.1109/TMI.2015.2458702</pub-id><pub-id pub-id-type="pmid">26208307</pub-id></citation></ref>
<ref id="B46">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Xu</surname> <given-names>T.</given-names></name> <name><surname>Zhang</surname> <given-names>H.</given-names></name> <name><surname>Huang</surname> <given-names>X.</given-names></name> <name><surname>Zhang</surname> <given-names>S.</given-names></name> <name><surname>Metaxas</surname> <given-names>D. N.</given-names></name></person-group> (<year>2016</year>). <article-title>Multimodal deep learning for cervical dysplasia diagnosis</article-title>, in <source>International Conference on Medical Image Computing and Computer-Assisted Intervention</source> (<publisher-loc>Athens</publisher-loc>), <fpage>115</fpage>&#x02013;<lpage>123</lpage>.</citation></ref>
<ref id="B47">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Yang</surname> <given-names>W.</given-names></name> <name><surname>Liu</surname> <given-names>Q.</given-names></name> <name><surname>Wang</surname> <given-names>S.</given-names></name> <name><surname>Cui</surname> <given-names>Z.</given-names></name> <name><surname>Chen</surname> <given-names>X.</given-names></name> <name><surname>Chen</surname> <given-names>L.</given-names></name> <name><surname>Zhang</surname> <given-names>N.</given-names></name></person-group> (<year>2018</year>). <article-title>Down image recognition based on deep convolutional neural network</article-title>. <source>Inform. Process. Agric.</source> <volume>5</volume>, <fpage>246</fpage>&#x02013;<lpage>252</lpage>. <pub-id pub-id-type="doi">10.1016/j.inpa.2018.01.004</pub-id></citation></ref>
<ref id="B48">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zeng</surname> <given-names>K.</given-names></name> <name><surname>Yu</surname> <given-names>J.</given-names></name> <name><surname>Wang</surname> <given-names>R.</given-names></name> <name><surname>Li</surname> <given-names>C.</given-names></name> <name><surname>Tao</surname> <given-names>D.</given-names></name></person-group> (<year>2017</year>). <article-title>Coupled deep autoencoder for single image super-resolution</article-title>. <source>IEEE Trans. Cybernet.</source> <volume>47</volume>, <fpage>27</fpage>&#x02013;<lpage>37</lpage>. <pub-id pub-id-type="doi">10.1109/TCYB.2015.2501373</pub-id><pub-id pub-id-type="pmid">26625442</pub-id></citation></ref>
<ref id="B49">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zhang</surname> <given-names>S.</given-names></name> <name><surname>Zhou</surname> <given-names>J.</given-names></name> <name><surname>Hu</surname> <given-names>H.</given-names></name> <name><surname>Gong</surname> <given-names>H.</given-names></name> <name><surname>Chen</surname> <given-names>L.</given-names></name> <name><surname>Cheng</surname> <given-names>C.</given-names></name> <name><surname>Zeng</surname> <given-names>J.</given-names></name></person-group> (<year>2016</year>). <article-title>A deep learning framework for modeling structural features of RNA-binding protein targets</article-title>. <source>Nucleic Acids Res.</source> <volume>44</volume>:<fpage>e32</fpage>. <pub-id pub-id-type="doi">10.1093/nar/gkv1025</pub-id><pub-id pub-id-type="pmid">26467480</pub-id></citation></ref>
</ref-list>
<fn-group>
<fn fn-type="financial-disclosure"><p><bold>Funding.</bold> This work was supported by the Natural Science Foundation of Jiangsu, China (BE2016655 and BK20161196), and the Fundamental Research Funds for China Central Universities (2019B22414). This work made use of the resources supported by the NSFC-Guangdong Mutual Funds for Super Computing Program (2nd Phase), and the Open Cloud Consortium sponsored project resource, supported in part by grants from Gordon and Betty Moore Foundation and the National Science Foundation (USA) and major contributions from OCC members.</p>
</fn>
</fn-group>
</back>
</article>