<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Archiving and Interchange DTD v2.3 20070202//EN" "archivearticle.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" article-type="methods-article" dtd-version="2.3">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Genet.</journal-id>
<journal-title>Frontiers in Genetics</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Genet.</abbrev-journal-title>
<issn pub-type="epub">1664-8021</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.3389/fgene.2019.01236</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Genetics</subject>
<subj-group>
<subject>Methods</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>IBI: Identification of Biomarker Genes in Individual Tumor Samples</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author" corresp="yes">
<name>
<surname>Li</surname>
<given-names>Jie</given-names>
</name>
<xref ref-type="author-notes" rid="fn001">
<sup>*</sup>
</xref>
<uri xlink:href="https://loop.frontiersin.org/people/771180"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Wang</surname>
<given-names>Dong</given-names>
</name>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Wang</surname>
<given-names>Yadong</given-names>
</name>
</contrib>
</contrib-group>
<aff id="aff1"><institution>School of Computer Science and Technology, Harbin Institute of Technology</institution>, <addr-line>Harbin</addr-line>, <country>China</country></aff>
<author-notes>
<fn fn-type="edited-by">
<p>Edited by: Quan Zou, University of Electronic Science and Technology of China, China</p>
</fn>
<fn fn-type="edited-by">
<p>Reviewed by: Weijia Zhang, Icahn School of Medicine at Mount Sinai, United States; Jingyang Gao, Beijing University of Chemical Technology, China</p>
</fn>
<fn fn-type="corresp" id="fn001">
<p>*Correspondence: Jie Li, <email xlink:href="mailto:jieli@hit.edu.cn">jieli@hit.edu.cn</email>
</p>
</fn>
<fn fn-type="other" id="fn002">
<p>This article was submitted to Bioinformatics and Computational Biology, a section of the journal Frontiers in Genetics</p>
</fn>
</author-notes>
<pub-date pub-type="epub">
<day>26</day>
<month>11</month>
<year>2019</year>
</pub-date>
<pub-date pub-type="collection">
<year>2019</year>
</pub-date>
<volume>10</volume>
<elocation-id>1236</elocation-id>
<history>
<date date-type="received">
<day>17</day>
<month>10</month>
<year>2019</year>
</date>
<date date-type="accepted">
<day>07</day>
<month>11</month>
<year>2019</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#xa9; 2019 Li, Wang and Wang</copyright-statement>
<copyright-year>2019</copyright-year>
<copyright-holder>Li, Wang and Wang</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/">
<p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.</p>
</license>
</permissions>
<abstract>
<p>Individual patient biomarkers have an important role in personalized treatment. Although various high-throughput sequencing technologies are widely used in biological experiments, these are usually conducted only once or a few times for each patient, which makes it a challenging problem to identify biomarkers in individual patients. At present, there is a lack of effective methods to identify biomarkers in individual sample data. Here, we propose a novel method, IBI, to identify biomarkers in individual tumor samples. Experimental results from several tumor data sets showed that the proposed method could effectively find biomarker genes for individual patients, including common biomarkers related to the mechanisms of the development of cancer, which can be used to predict survival and drug response in patients. In summary, these results demonstrate that the proposed method offers a new perspective for analyzing individual samples.</p>
</abstract>
<kwd-group>
<kwd>biomarker</kwd>
<kwd>individual sample</kwd>
<kwd>tumor</kwd>
<kwd>regression model</kwd>
<kwd>gene expression data</kwd>
</kwd-group>
<contract-num rid="cn001">Grant No.2016YFC0901905</contract-num>
<contract-num rid="cn002">Grant No. F2016016</contract-num>
<contract-sponsor id="cn001">National Key Research and Development Program of China Stem Cell and Translational Research<named-content content-type="fundref-id">10.13039/501100013290</named-content>
</contract-sponsor>
<contract-sponsor id="cn002">Natural Science Foundation of Heilongjiang Province<named-content content-type="fundref-id">10.13039/501100005046</named-content>
</contract-sponsor>
<counts>
<fig-count count="13"/>
<table-count count="0"/>
<equation-count count="0"/>
<ref-count count="35"/>
<page-count count="11"/>
<word-count count="4400"/>
</counts>
</article-meta>
</front>
<body>
<sec id="s1" sec-type="intro">
<title>Introduction</title>
<p>Biomarker discovery is critical for cancer diagnostics, prognosis, and monitoring of therapy in clinical trials. With the development of high-throughput technologies such as next-generation sequencing, massive quantities of cancer genomic data are being generated in the healthcare field, which offers an opportunity to identify high-quality cancer biomarkers for use in personalized medicine. Therefore, various computational methods have been proposed to identify cancer biomarkers. At present, the most commonly used methods are statistical tests, such as the t-test, the Kolmogorov-Smirnov (KS) test, and Wilcoxon&#x2019;s rank sum test (<xref ref-type="bibr" rid="B16">Li et al., 2007</xref>; <xref ref-type="bibr" rid="B6">Demb&#xe9;l&#xe9; and Kastner, 2014</xref>; <xref ref-type="bibr" rid="B21">Love et al., 2014</xref>; <xref ref-type="bibr" rid="B22">Moore et al., 2016</xref>; <xref ref-type="bibr" rid="B33">Wang et al., 2018</xref>), which identify differentially expressed genes (DEGs) between two types of samples and select the genes with the lowest p-values as potential biomarkers. However, such tests often ignore information about the relationships between genes (<xref ref-type="bibr" rid="B15">Lewis-Wambi et al., 2008</xref>). Machine learning algorithms and statistical models are also widely used to identify cancer biomarkers. For example, the 70-gene biomarkers (<xref ref-type="bibr" rid="B32">Van&#x2019;t Veer et al., 2002</xref>), wound-response gene biomarkers (<xref ref-type="bibr" rid="B4">Chang et al., 2005</xref>), and several of our gene biomarkers (<xref ref-type="bibr" rid="B17">Li et al., 2008</xref>; <xref ref-type="bibr" rid="B18">Li et al., 2010</xref>; <xref ref-type="bibr" rid="B35">Zhang et al., 2017</xref>) were all identified using machine learning algorithms. 
The 21-gene biomarkers (<xref ref-type="bibr" rid="B31">Van&#x2019;t Veer and Bernards, 2008</xref>) and immunotherapy response biomarkers (<xref ref-type="bibr" rid="B24">Ock et al., 2017</xref>; <xref ref-type="bibr" rid="B12">Jiang et al., 2018</xref>) are based on statistical models.</p>
<p>However, the above methods can only identify biomarkers by comparing two groups of samples, not within an individual sample. As cancer is a complex and heterogeneous disease, patients differ in pathogenesis and require different treatments. Thus, there is a need for biomarkers that reflect the status of individual patients. Currently, high-throughput biological experiments are usually conducted only once or a few times for a single patient, which makes it challenging to analyze single samples and, in particular, to identify biomarkers in individual patients. Some algorithms have been developed to analyze single samples. <xref ref-type="bibr" rid="B26">Rezwan et al. (2015)</xref> used the Crawford-Howell t-test to analyze methylation data of single samples and identified hypomethylation at different sites. However, this method can only detect differences in a single molecular element among different samples and may ignore the relationships between different molecular elements in the same sample. <xref ref-type="bibr" rid="B20">Liu et al. (2017)</xref> proposed the sDNB (single-sample dynamic network biomarkers) method to detect early-warning signals or critical states in individual patients using gene expression data. sDNB detects changes in the expression levels of a pair of genes relative to reference samples and considers the local information of a gene in the network. <xref ref-type="bibr" rid="B7">Drier et al. (2013)</xref> proposed an algorithm to analyze single tumor samples using pathway-level information instead of gene-level information. This approach detected pathways that were significantly associated with the survival of glioblastoma and colorectal cancer patients. However, genes in the same pathway have similar functions, so models built on such redundant features (biomarkers) are usually more complex.</p>
<p>Here, we propose a novel method, IBI (identification of biomarker genes in individual tumor samples), to identify biomarker genes in individual tumor samples using gene expression data. An overview of the IBI method is given in <xref ref-type="fig" rid="f1">
<bold>Figure 1</bold>
</xref>. First, DEGs between tumor and normal samples are identified. Then, regression models are constructed using the selected DEGs, and the residuals of each gene in different samples are analyzed using kernel density estimation (KDE). Finally, we assess the degree of change of each gene according to the credibility interval (CI) of its residuals to decide which genes are biomarkers of the individual sample.</p>
<fig id="f1" position="float">
<label>Figure 1</label>
<caption>
<p>Overview of the IBI method.</p>
</caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fgene-10-01236-g001.tif"/>
</fig>
</sec>
<sec id="s2" sec-type="materials|methods">
<title>Materials and Methods</title>
<sec id="s2_1">
<title>Data Collection and Preprocessing</title>
<p>The proposed method was used to analyze three gene expression data sets: TCGA-BRCA (<xref ref-type="bibr" rid="B29">Tomczak et al., 2015</xref>), GSE63557 (<xref ref-type="bibr" rid="B14">Lesterhuis et al., 2015</xref>), and GSE35640 (<xref ref-type="bibr" rid="B30">Ulloa-Montoya et al., 2013</xref>). TCGA-BRCA consists of 1,090 breast cancer samples and 113 normal tissue samples. GSE63557 contains AB1-HA tumor data from mice during immunotherapy, with 10 anti-CTLA-4 immunotherapeutic response samples and 10 non-response samples, and GSE35640 consists of advanced melanoma data with 22 MAGE&#x2212;A3 immunotherapeutic response samples and 34 non-response samples. The first data set contains RNA-seq data, which were preprocessed using DESeq2 (<xref ref-type="bibr" rid="B21">Love et al., 2014</xref>), and the latter two data sets were standardized using z-scores.</p>
</sec>
<sec id="s2_2">
<title>Identification of Differentially Expressed Genes</title>
<p>Assume that we have gene expression data comprising a set of genes measured in two types of samples, with each sample labeled either &#x201c;+&#x201d; or &#x201c;&#x2212;&#x201d;; <italic>n</italic>
<sub>1</sub> and <italic>n</italic>
<sub>2</sub> are the number of samples with label &#x201c;+&#x201d; and &#x201c;&#x2212;&#x201d;, respectively (<italic>n</italic> = <italic>n</italic>
<sub>1</sub>+ <italic>n</italic>
<sub>2</sub>). <italic>y</italic>
<italic>
<sub>ji</sub>
</italic> is the expression value of the <italic>j</italic>th gene of the <italic>i</italic>th sample with label &#x201c;+&#x201d;, and <italic>x</italic>
<italic>
<sub>ji</sub>
</italic> is the expression value of the <italic>j</italic>th gene of the <italic>i</italic>th sample with label &#x201c;&#x2212;&#x201d;. A total of <italic>q</italic> DEGs are obtained using DESeq2 (<xref ref-type="bibr" rid="B21">Love et al., 2014</xref>) or GEO2R (<xref ref-type="bibr" rid="B27">Smyth, 2004</xref>).</p>
</sec>
<sec id="s2_3">
<title>Average Sample</title>
<p>Let the average samples with labels &#x201c;+&#x201d; and &#x201c;&#x2212;&#x201d; be <inline-formula>
<mml:math display="inline" id="im1">
<mml:mrow><mml:msup><mml:mi>u</mml:mi><mml:mo>+</mml:mo></mml:msup><mml:mo>=</mml:mo><mml:mrow><mml:mo>[</mml:mo><mml:mrow><mml:msubsup><mml:mi>u</mml:mi><mml:mn>1</mml:mn><mml:mo>+</mml:mo></mml:msubsup><mml:mo>,</mml:mo><mml:msubsup><mml:mi>u</mml:mi><mml:mn>2</mml:mn><mml:mo>+</mml:mo></mml:msubsup><mml:mo>,</mml:mo><mml:mo>&#x2026;</mml:mo><mml:mo>,</mml:mo><mml:msubsup><mml:mi>u</mml:mi><mml:mi>q</mml:mi><mml:mo>+</mml:mo></mml:msubsup></mml:mrow><mml:mo>]</mml:mo></mml:mrow><mml:mtext>&#xa0;and&#xa0;</mml:mtext><mml:msup><mml:mi>u</mml:mi><mml:mo>&#x2212;</mml:mo></mml:msup><mml:mo>=</mml:mo><mml:mrow><mml:mo>[</mml:mo><mml:mrow><mml:msubsup><mml:mi>u</mml:mi><mml:mn>1</mml:mn><mml:mo>&#x2212;</mml:mo></mml:msubsup><mml:mo>,</mml:mo><mml:msubsup><mml:mi>u</mml:mi><mml:mn>2</mml:mn><mml:mo>&#x2212;</mml:mo></mml:msubsup><mml:mo>,</mml:mo><mml:mo>&#x2026;</mml:mo><mml:mo>,</mml:mo><mml:msubsup><mml:mi>u</mml:mi><mml:mi>q</mml:mi><mml:mo>&#x2212;</mml:mo></mml:msubsup></mml:mrow><mml:mo>]</mml:mo></mml:mrow></mml:mrow></mml:math>
</inline-formula>, respectively, with entries defined by Equations (1) and (2):</p>
<disp-formula>
<label>(1)</label>
<mml:math display="block" id="M1">
<mml:mtable columnalign="left"><mml:mtr><mml:mtd><mml:mtable><mml:mtr><mml:mtd><mml:mrow><mml:msubsup><mml:mi>u</mml:mi><mml:mi>j</mml:mi><mml:mo>+</mml:mo></mml:msubsup><mml:mo>=</mml:mo><mml:mfrac><mml:mn>1</mml:mn><mml:mrow><mml:msub><mml:mi>n</mml:mi><mml:mn>1</mml:mn></mml:msub></mml:mrow></mml:mfrac><mml:mstyle displaystyle="true"><mml:msubsup><mml:mo>&#x2211;</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:msub><mml:mi>n</mml:mi><mml:mn>1</mml:mn></mml:msub></mml:mrow></mml:msubsup><mml:mrow><mml:msub><mml:mi>y</mml:mi><mml:mrow><mml:mi>j</mml:mi><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo></mml:mrow></mml:mstyle></mml:mrow></mml:mtd><mml:mtd><mml:mrow><mml:mi>q</mml:mi><mml:mo>&#x2265;</mml:mo><mml:mi>j</mml:mi><mml:mo>&#x2265;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mrow/></mml:mtd></mml:mtr></mml:mtable></mml:math>
</disp-formula>
<disp-formula>
<label>(2)</label>
<mml:math display="block" id="M2">
<mml:mrow><mml:mtext>&#xa0;</mml:mtext><mml:mtable><mml:mtr><mml:mtd><mml:mrow><mml:msubsup><mml:mi>u</mml:mi><mml:mi>j</mml:mi><mml:mo>&#x2212;</mml:mo></mml:msubsup><mml:mo>=</mml:mo><mml:mfrac><mml:mn>1</mml:mn><mml:mrow><mml:msub><mml:mi>n</mml:mi><mml:mn>2</mml:mn></mml:msub></mml:mrow></mml:mfrac><mml:mstyle displaystyle="true"><mml:msubsup><mml:mo>&#x2211;</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:msub><mml:mi>n</mml:mi><mml:mn>2</mml:mn></mml:msub></mml:mrow></mml:msubsup><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mi>j</mml:mi><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:mstyle><mml:mo>,</mml:mo></mml:mrow></mml:mtd><mml:mtd><mml:mrow><mml:mi>q</mml:mi><mml:mo>&#x2265;</mml:mo><mml:mi>j</mml:mi><mml:mo>&#x2265;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:mrow></mml:math>
</disp-formula>
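<p>As an illustration of Equations (1) and (2), the following minimal Python sketch computes an average sample as the per-gene mean over all samples that share a label. The function name <monospace>average_sample</monospace> and the toy expression values are hypothetical and are not part of the original IBI implementation.</p>
<preformat><![CDATA[
```python
def average_sample(samples):
    """Per-gene mean over samples (Equations 1 and 2).

    `samples` is a list of samples with the same label; each sample is a
    list holding the expression values of the same q DEGs.
    """
    n = len(samples)
    q = len(samples[0])
    return [sum(sample[j] for sample in samples) / n for j in range(q)]


# Hypothetical toy data: two "+" samples measured on three DEGs.
plus_samples = [[2.0, 4.0, 6.0],
                [4.0, 6.0, 8.0]]
u_plus = average_sample(plus_samples)
print(u_plus)  # [3.0, 5.0, 7.0]
```
]]></preformat>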
</sec>
<sec id="s2_4">
<title>Regression Model Based on Average and Single Samples</title>
<p>Let
<inline-formula>
<mml:math display="inline" id="im2">
<mml:mrow><mml:mtext>&#xa0;</mml:mtext><mml:msubsup><mml:mi>y</mml:mi><mml:mrow><mml:mi>j</mml:mi><mml:mi>i</mml:mi></mml:mrow><mml:mtext>'</mml:mtext></mml:msubsup></mml:mrow></mml:math>
</inline-formula> be the expression value of the <italic>j</italic>th DEG of the <italic>i</italic>th sample with label &#x201c;+&#x201d; and 
<inline-formula>
<mml:math display="inline" id="im3">
<mml:mrow><mml:mtext>&#xa0;</mml:mtext><mml:msubsup><mml:mi>x</mml:mi><mml:mrow><mml:mi>j</mml:mi><mml:mi>i</mml:mi></mml:mrow><mml:mtext>'</mml:mtext></mml:msubsup></mml:mrow></mml:math>
</inline-formula> the expression value of the <italic>j</italic>th DEG of the <italic>i</italic>th sample with label &#x201c;&#x2212;.&#x201d; For the <italic>i</italic>th sample with label &#x201c;+,&#x201d; 
<inline-formula>
<mml:math display="inline" id="im4">
<mml:mrow><mml:msubsup><mml:mi>S</mml:mi><mml:mi>i</mml:mi><mml:mo>+</mml:mo></mml:msubsup><mml:mo>=</mml:mo><mml:mrow><mml:mo>[</mml:mo><mml:mrow><mml:msub><mml:msup><mml:mi>y</mml:mi><mml:mo>&#x2032;</mml:mo></mml:msup><mml:mrow><mml:mn>1</mml:mn><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:mtext>&#x2009;</mml:mtext><mml:msub><mml:msup><mml:mi>y</mml:mi><mml:mo>&#x2032;</mml:mo></mml:msup><mml:mrow><mml:mn>2</mml:mn><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:mtext>&#x2009;</mml:mtext><mml:mo>&#x2026;</mml:mo><mml:mo>,</mml:mo><mml:mtext>&#x2009;</mml:mtext><mml:msub><mml:msup><mml:mi>y</mml:mi><mml:mo>&#x2032;</mml:mo></mml:msup><mml:mrow><mml:mi>q</mml:mi><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo>]</mml:mo></mml:mrow></mml:mrow></mml:math>
</inline-formula>, 
<inline-formula>
<mml:math display="inline" id="im5">
<mml:mrow><mml:mtext>&#xa0;</mml:mtext><mml:msub><mml:msup><mml:mi>y</mml:mi><mml:mo>&#x2032;</mml:mo></mml:msup><mml:mrow><mml:mi>j</mml:mi><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math>
</inline-formula>
can be predicted using the following regression model according to 
<inline-formula>
<mml:math display="inline" id="im6">
<mml:mrow><mml:msubsup><mml:mi>u</mml:mi><mml:mi>j</mml:mi><mml:mo>+</mml:mo></mml:msubsup></mml:mrow></mml:math>
</inline-formula>:</p>
<disp-formula>
<label>(3)</label>
<mml:math display="block" id="M3">
<mml:mrow><mml:mtable><mml:mtr><mml:mtd><mml:mrow><mml:mo>&#xa0;</mml:mo><mml:mover accent="true"><mml:mrow><mml:msub><mml:msup><mml:mi>y</mml:mi><mml:mo>&#x2032;</mml:mo></mml:msup><mml:mrow><mml:mi>j</mml:mi><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="true">^</mml:mo></mml:mover><mml:mo>=</mml:mo><mml:msubsup><mml:mi>&#x3b2;</mml:mi><mml:mn>0</mml:mn><mml:mo>+</mml:mo></mml:msubsup><mml:mo>+</mml:mo><mml:msubsup><mml:mi>&#x3b2;</mml:mi><mml:mn>1</mml:mn><mml:mo>+</mml:mo></mml:msubsup><mml:msubsup><mml:mi>u</mml:mi><mml:mi>j</mml:mi><mml:mo>+</mml:mo></mml:msubsup><mml:mo>&#xa0;</mml:mo><mml:mo>,</mml:mo></mml:mrow></mml:mtd><mml:mtd><mml:mrow><mml:mi>q</mml:mi><mml:mo>&#x2265;</mml:mo><mml:mi>j</mml:mi><mml:mo>&#x2265;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:mtd></mml:mtr></mml:mtable><mml:mo>&#xa0;</mml:mo></mml:mrow></mml:math>
</disp-formula>
<p>where 
<inline-formula>
<mml:math display="inline" id="im7">
<mml:mrow><mml:msubsup><mml:mi>&#x3b2;</mml:mi><mml:mn>0</mml:mn><mml:mo>+</mml:mo></mml:msubsup></mml:mrow></mml:math>
</inline-formula>
 and 
<inline-formula>
<mml:math display="inline" id="im8">
<mml:mrow><mml:msubsup><mml:mi>&#x3b2;</mml:mi><mml:mn>1</mml:mn><mml:mo>+</mml:mo></mml:msubsup></mml:mrow></mml:math>
</inline-formula>
 are the regression coefficients estimated according to a set of data 
<inline-formula>
<mml:math display="inline" id="im9">
<mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msub><mml:mi>y</mml:mi><mml:mrow><mml:mn>1</mml:mn><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msubsup><mml:mi>u</mml:mi><mml:mn>1</mml:mn><mml:mo>+</mml:mo></mml:msubsup></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:math>
</inline-formula>, 
<inline-formula>
<mml:math display="inline" id="im10">
<mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msub><mml:mi>y</mml:mi><mml:mrow><mml:mn>2</mml:mn><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msubsup><mml:mi>u</mml:mi><mml:mn>2</mml:mn><mml:mo>+</mml:mo></mml:msubsup></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:math>
</inline-formula>
, &#x2026;, 
<inline-formula>
<mml:math display="inline" id="im11">
<mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msub><mml:mi>y</mml:mi><mml:mrow><mml:mi>q</mml:mi><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msubsup><mml:mi>u</mml:mi><mml:mi>q</mml:mi><mml:mo>+</mml:mo></mml:msubsup></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:math>
</inline-formula>
, using the least squares method.</p>
<p>Similarly, for the <italic>i</italic>th sample with label &#x201c;&#x2212;&#x201d; 
<inline-formula>
<mml:math display="inline" id="im12">
<mml:mrow><mml:mo>,</mml:mo><mml:mtext>&#xa0;</mml:mtext><mml:msubsup><mml:mi>S</mml:mi><mml:mi>i</mml:mi><mml:mo>&#x2212;</mml:mo></mml:msubsup><mml:mo>=</mml:mo><mml:mrow><mml:mo>[</mml:mo><mml:mrow><mml:msubsup><mml:mi>x</mml:mi><mml:mrow><mml:mn>1</mml:mn><mml:mi>i</mml:mi></mml:mrow><mml:mtext>'</mml:mtext></mml:msubsup><mml:mo>,</mml:mo><mml:msubsup><mml:mi>x</mml:mi><mml:mrow><mml:mn>2</mml:mn><mml:mi>i</mml:mi></mml:mrow><mml:mtext>'</mml:mtext></mml:msubsup><mml:mo>&#x2026;</mml:mo><mml:msubsup><mml:mi>x</mml:mi><mml:mrow><mml:mi>q</mml:mi><mml:mi>i</mml:mi></mml:mrow><mml:mtext>'</mml:mtext></mml:msubsup></mml:mrow><mml:mo>]</mml:mo></mml:mrow><mml:mtext>&#xa0;</mml:mtext><mml:mo>,</mml:mo></mml:mrow></mml:math>
</inline-formula>
<inline-formula>
<mml:math display="inline" id="im13">
<mml:mrow><mml:msubsup><mml:mi>x</mml:mi><mml:mrow><mml:mi>j</mml:mi><mml:mi>i</mml:mi></mml:mrow><mml:mtext>'</mml:mtext></mml:msubsup><mml:mtext>&#xa0;</mml:mtext></mml:mrow></mml:math>
</inline-formula>
 can be predicted using the following regression model according to 
<inline-formula>
<mml:math display="inline" id="im14">
<mml:mrow><mml:msubsup><mml:mi>u</mml:mi><mml:mi>j</mml:mi><mml:mo>&#x2212;</mml:mo></mml:msubsup></mml:mrow></mml:math>
</inline-formula>:</p>
<disp-formula>
<label>(4)</label>
<mml:math display="block" id="M4">
<mml:mrow><mml:mtable><mml:mtr><mml:mtd><mml:mrow><mml:mover accent="true"><mml:mrow><mml:msubsup><mml:mi>x</mml:mi><mml:mrow><mml:mi>j</mml:mi><mml:mi>i</mml:mi></mml:mrow><mml:mo>'</mml:mo></mml:msubsup></mml:mrow><mml:mo stretchy="true">^</mml:mo></mml:mover><mml:mo>=</mml:mo><mml:msubsup><mml:mi>&#x3b2;</mml:mi><mml:mn>0</mml:mn><mml:mo>&#x2212;</mml:mo></mml:msubsup><mml:mo>+</mml:mo><mml:msubsup><mml:mi>&#x3b2;</mml:mi><mml:mn>1</mml:mn><mml:mo>&#x2212;</mml:mo></mml:msubsup><mml:msubsup><mml:mi>u</mml:mi><mml:mi>j</mml:mi><mml:mo>&#x2212;</mml:mo></mml:msubsup><mml:mo>&#xa0;</mml:mo><mml:mo>,</mml:mo></mml:mrow></mml:mtd><mml:mtd><mml:mrow><mml:mi>q</mml:mi><mml:mo>&#x2265;</mml:mo><mml:mi>j</mml:mi><mml:mo>&#x2265;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:mtd></mml:mtr></mml:mtable><mml:mo>&#xa0;</mml:mo></mml:mrow></mml:math>
</disp-formula>
<p>where 
<inline-formula>
<mml:math display="inline" id="im15">
<mml:mrow><mml:msubsup><mml:mi>&#x3b2;</mml:mi><mml:mn>0</mml:mn><mml:mo>&#x2212;</mml:mo></mml:msubsup></mml:mrow></mml:math>
</inline-formula> and 
<inline-formula>
<mml:math display="inline" id="im16">
<mml:mrow><mml:msubsup><mml:mi>&#x3b2;</mml:mi><mml:mn>1</mml:mn><mml:mo>&#x2212;</mml:mo></mml:msubsup></mml:mrow></mml:math>
</inline-formula>
 are the regression coefficients estimated according to a set of data 
<inline-formula>
<mml:math display="inline" id="im17">
<mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mn>1</mml:mn><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msubsup><mml:mi>u</mml:mi><mml:mn>1</mml:mn><mml:mo>&#x2212;</mml:mo></mml:msubsup></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:math>
</inline-formula>, 
<inline-formula>
<mml:math display="inline" id="im18">
<mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mn>2</mml:mn><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msubsup><mml:mi>u</mml:mi><mml:mn>2</mml:mn><mml:mo>&#x2212;</mml:mo></mml:msubsup></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:math>
</inline-formula>
, &#x2026;, 
<inline-formula>
<mml:math display="inline" id="im19">
<mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mi>q</mml:mi><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msubsup><mml:mi>u</mml:mi><mml:mi>q</mml:mi><mml:mo>&#x2212;</mml:mo></mml:msubsup></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:math>
</inline-formula> using the least squares method.</p>
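<p>The per-sample fit in Equations (3) and (4) is a simple linear regression with one predictor, so the least squares coefficients have a closed form. The sketch below illustrates this under that reading; the function name <monospace>fit_line</monospace> and the toy values are hypothetical, and the code is not taken from the original IBI implementation.</p>
<preformat><![CDATA[
```python
def fit_line(u, s):
    """Ordinary least squares fit of s_j = b0 + b1 * u_j over the q DEGs.

    `u` is the average sample and `s` is one individual sample; both are
    lists of length q. Returns the pair (b0, b1).
    """
    q = len(u)
    mean_u = sum(u) / q
    mean_s = sum(s) / q
    sxx = sum((uj - mean_u) ** 2 for uj in u)
    sxy = sum((uj - mean_u) * (sj - mean_s) for uj, sj in zip(u, s))
    b1 = sxy / sxx            # slope
    b0 = mean_s - b1 * mean_u  # intercept
    return b0, b1


# Hypothetical toy data: the single sample equals 1 + 2 * u exactly.
u_plus = [1.0, 2.0, 3.0, 4.0]
sample = [3.0, 5.0, 7.0, 9.0]
b0, b1 = fit_line(u_plus, sample)
print(b0, b1)  # 1.0 2.0
```
]]></preformat>
<p>One regression is fitted per sample, with the <italic>q</italic> DEGs acting as the observations, so the coefficients differ from sample to sample.</p>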
</sec>
<sec id="s2_5">
<title>Algorithm for Identifying Biomarker Genes of a Single Sample</title>
<p>Among the <italic>q</italic> DEGs, the expression values of some genes in a single sample may differ markedly from their average values, i.e., the observed values of these genes lie far from the regression line. These genes are called biomarker genes of the single sample. The degree of difference is quantified by the residual, i.e., the difference between the observed value and the predicted value.</p>
<p>For the <italic>i</italic>th sample with label &#x201c;+,&#x201d; the residual value of its <italic>j</italic>th DEG is:</p>
<disp-formula>
<label>(5)</label>
<mml:math display="block" id="M5">
<mml:mrow><mml:msubsup><mml:mi>e</mml:mi><mml:mrow><mml:mi>j</mml:mi><mml:mi>i</mml:mi></mml:mrow><mml:mo>+</mml:mo></mml:msubsup><mml:mo>=</mml:mo><mml:msub><mml:msup><mml:mi>y</mml:mi><mml:mo>&#x2032;</mml:mo></mml:msup><mml:mrow><mml:mi>j</mml:mi><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>&#x2212;</mml:mo><mml:mo>&#xa0;</mml:mo><mml:mover accent="true"><mml:mrow><mml:msub><mml:msup><mml:mi>y</mml:mi><mml:mo>&#x2032;</mml:mo></mml:msup><mml:mrow><mml:mi>j</mml:mi><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="true">^</mml:mo></mml:mover><mml:mo>&#xa0;</mml:mo><mml:mo>,</mml:mo><mml:mo>&#xa0;</mml:mo><mml:mo>&#xa0;</mml:mo><mml:mo>&#xa0;</mml:mo><mml:mo>&#xa0;</mml:mo><mml:mo>&#xa0;</mml:mo><mml:mo>&#xa0;</mml:mo><mml:mo>&#xa0;</mml:mo><mml:mi>q</mml:mi><mml:mo>&#x2265;</mml:mo><mml:mi>j</mml:mi><mml:mo>&#x2265;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:math>
</disp-formula>
<p>Similarly, for the <italic>i</italic>th sample with label &#x201c;&#x2212;&#x201d;, the residual value of its <italic>j</italic>th DEG is:</p>
<disp-formula>
<label>(6)</label>
<mml:math display="block" id="M6">
<mml:mrow><mml:msubsup><mml:mi>e</mml:mi><mml:mrow><mml:mi>j</mml:mi><mml:mi>i</mml:mi></mml:mrow><mml:mo>&#x2212;</mml:mo></mml:msubsup><mml:mo>=</mml:mo><mml:msubsup><mml:mi>x</mml:mi><mml:mrow><mml:mi>j</mml:mi><mml:mi>i</mml:mi></mml:mrow><mml:mo>'</mml:mo></mml:msubsup><mml:mo>&#x2212;</mml:mo><mml:mo>&#xa0;</mml:mo><mml:mover accent="true"><mml:mrow><mml:msubsup><mml:mi>x</mml:mi><mml:mrow><mml:mi>j</mml:mi><mml:mi>i</mml:mi></mml:mrow><mml:mo>'</mml:mo></mml:msubsup></mml:mrow><mml:mo stretchy="true">^</mml:mo></mml:mover><mml:mo>&#xa0;</mml:mo><mml:mo>,</mml:mo><mml:mo>&#xa0;</mml:mo><mml:mo>&#xa0;</mml:mo><mml:mo>&#xa0;</mml:mo><mml:mo>&#xa0;</mml:mo><mml:mo>&#xa0;</mml:mo><mml:mo>&#xa0;</mml:mo><mml:mo>&#xa0;</mml:mo><mml:mi>q</mml:mi><mml:mo>&#x2265;</mml:mo><mml:mi>j</mml:mi><mml:mo>&#x2265;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:math>
</disp-formula>
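<p>Equations (5) and (6) can be sketched in a few lines of Python. The example below assumes the regression coefficients have already been estimated; the function name <monospace>residuals</monospace>, the coefficient values, and the toy data are hypothetical.</p>
<preformat><![CDATA[
```python
def residuals(u, s, b0, b1):
    """Residual e_j = observed s_j minus predicted b0 + b1 * u_j, per DEG."""
    return [sj - (b0 + b1 * uj) for uj, sj in zip(u, s)]


# Hypothetical toy data: the third gene deviates from the line 1 + 2 * u.
u_plus = [1.0, 2.0, 3.0, 4.0]
sample = [3.0, 5.0, 10.0, 9.0]
e_plus = residuals(u_plus, sample, b0=1.0, b1=2.0)
print(e_plus)  # [0.0, 0.0, 3.0, 0.0]
```
]]></preformat>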
<p>To obtain the biomarker genes of the <italic>i</italic>th sample with label &#x201c;+&#x201d;, KDE is used to estimate the probability density function
<inline-formula>
<mml:math display="inline" id="im20">
<mml:mrow><mml:mo>&#xa0;</mml:mo><mml:mover accent="true"><mml:mrow><mml:msubsup><mml:mi>f</mml:mi><mml:mi>i</mml:mi><mml:mrow/></mml:msubsup></mml:mrow><mml:mo stretchy="true">^</mml:mo></mml:mover><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msub><mml:mi>e</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:math>
</inline-formula> of residual values:
<inline-formula>
<mml:math display="inline" id="im21">
<mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msubsup><mml:mi>e</mml:mi><mml:mrow><mml:mn>1</mml:mn><mml:mi>i</mml:mi></mml:mrow><mml:mo>+</mml:mo></mml:msubsup><mml:mo>,</mml:mo><mml:msubsup><mml:mi>e</mml:mi><mml:mrow><mml:mn>2</mml:mn><mml:mi>i</mml:mi></mml:mrow><mml:mo>+</mml:mo></mml:msubsup><mml:mo>,</mml:mo><mml:mtext>&#x2009;</mml:mtext><mml:mo>...</mml:mo><mml:mo>,</mml:mo><mml:mtext>&#x2009;</mml:mtext><mml:msubsup><mml:mi>e</mml:mi><mml:mrow><mml:mi>q</mml:mi><mml:mi>i</mml:mi></mml:mrow><mml:mo>+</mml:mo></mml:msubsup></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:math>
</inline-formula>. Its kernel density estimator with Gaussian kernel K is as follows:</p>
<disp-formula>
<label>(7)</label>
<mml:math display="block" id="M7">
<mml:mrow><mml:mover accent="true"><mml:mrow><mml:msubsup><mml:mi>f</mml:mi><mml:mi>i</mml:mi><mml:mrow/></mml:msubsup></mml:mrow><mml:mo stretchy="true">^</mml:mo></mml:mover><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msub><mml:mi>e</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mfrac><mml:mn>1</mml:mn><mml:mrow><mml:mi>q</mml:mi><mml:mi>h</mml:mi></mml:mrow></mml:mfrac><mml:mstyle displaystyle="true"><mml:msubsup><mml:mo>&#x2211;</mml:mo><mml:mrow><mml:mi>j</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mi>q</mml:mi></mml:msubsup><mml:mi>K</mml:mi></mml:mstyle><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mfrac><mml:mrow><mml:msub><mml:mi>e</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>&#x2212;</mml:mo><mml:msubsup><mml:mi>e</mml:mi><mml:mrow><mml:mi>j</mml:mi><mml:mi>i</mml:mi></mml:mrow><mml:mo>+</mml:mo></mml:msubsup></mml:mrow><mml:mi>h</mml:mi></mml:mfrac></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:math>
</disp-formula>
<disp-formula>
<label>(8)</label>
<mml:math display="block" id="M8">
<mml:mrow><mml:mi>K</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mi>x</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mfrac><mml:mn>1</mml:mn><mml:mrow><mml:msqrt><mml:mrow><mml:mn>2</mml:mn><mml:mi>&#x3c0;</mml:mi></mml:mrow></mml:msqrt></mml:mrow></mml:mfrac><mml:msup><mml:mi>e</mml:mi><mml:mrow><mml:mo>&#x2212;</mml:mo><mml:mfrac><mml:mn>1</mml:mn><mml:mn>2</mml:mn></mml:mfrac><mml:msup><mml:mi>x</mml:mi><mml:mn>2</mml:mn></mml:msup></mml:mrow></mml:msup></mml:mrow></mml:math>
</disp-formula>
<p>where h is a smoothing parameter called the bandwidth (h &gt; 0). Let &#x3a6; be the cumulative distribution function of the kernel density estimator; then, the CI at confidence level &#x3b1; is</p>
<disp-formula>
<label>(9)</label>
<mml:math display="block" id="M9">
<mml:mrow><mml:mi>C</mml:mi><mml:msub><mml:mi>I</mml:mi><mml:mi>&#x3b1;</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mn>0</mml:mn><mml:mo>,</mml:mo><mml:mfrac><mml:mi>&#x3b1;</mml:mi><mml:mn>2</mml:mn></mml:mfrac></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>&#x222a;</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mn>1</mml:mn><mml:mo>&#x2212;</mml:mo><mml:mfrac><mml:mi>&#x3b1;</mml:mi><mml:mn>2</mml:mn></mml:mfrac><mml:mo>,</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:math>
</disp-formula>
<p>The <italic>j</italic>th gene is considered a biomarker gene of the <italic>i</italic>th sample with label &#x201c;+&#x201d; (<italic>n</italic>
<sub>1</sub> &#x2265; <italic>i</italic>&#x2265; 1) if 
<inline-formula>
<mml:math display="inline" id="im22">
<mml:mrow><mml:mi>&#x3a6;</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msubsup><mml:mi>e</mml:mi><mml:mrow><mml:mi>j</mml:mi><mml:mi>i</mml:mi></mml:mrow><mml:mo>+</mml:mo></mml:msubsup></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>&#x2208;</mml:mo><mml:mi>C</mml:mi><mml:msub><mml:mi>I</mml:mi><mml:mi>&#x3b1;</mml:mi></mml:msub></mml:mrow></mml:math>
</inline-formula>
. Similarly, we can obtain the biomarker genes of the <italic>i</italic>th sample with label &#x201c;&#x2212;&#x201d; (<italic>n</italic>
<sub>2</sub> &#x2265; <italic>i</italic> &#x2265; 1).</p>
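<p>The selection rule in Equations (7) to (9) can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that a gene is selected when the kernel-estimated cumulative probability of its residual falls in the lower or upper &#x3b1;/2 tail, and it assumes a normal-reference bandwidth because the text only requires <italic>h</italic> &gt; 0. The function names are hypothetical.</p>
<preformat><![CDATA[
```python
import math


def kde_cdf(x, residuals, h):
    """CDF at x of a Gaussian kernel density estimate built from `residuals`.

    Integrating each Gaussian kernel of Equation (8) gives one standard
    normal CDF term per residual; the terms are averaged over all genes.
    """
    total = 0.0
    for r in residuals:
        total += 0.5 * (1.0 + math.erf((x - r) / (h * math.sqrt(2.0))))
    return total / len(residuals)


def normal_reference_bandwidth(residuals):
    """Bandwidth 1.06 * sd * n^(-1/5). ASSUMPTION: the paper's h is not stated here."""
    n = len(residuals)
    mean = sum(residuals) / n
    sd = math.sqrt(sum((r - mean) ** 2 for r in residuals) / (n - 1))
    return 1.06 * sd * n ** (-0.2)


def select_biomarkers(residuals, alpha=0.05, h=None):
    """Indices of genes whose CDF value lies in either alpha/2 tail (assumed reading)."""
    if h is None:
        h = normal_reference_bandwidth(residuals)
    picked = []
    for j, r in enumerate(residuals):
        p = kde_cdf(r, residuals, h)
        if p < alpha / 2 or p > 1 - alpha / 2:
            picked.append(j)
    return picked


if __name__ == "__main__":
    # Toy residuals: 98 unremarkable genes plus one strongly up and one strongly down.
    toy = [-1 + 2 * k / 97 for k in range(98)] + [5.0, -5.0]
    print(select_biomarkers(toy, alpha=0.05))
```
]]></preformat>
<p>In this toy example only the two extreme residuals fall in the tails. With real data the number of selected genes depends on both &#x3b1; and the bandwidth.</p>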
</sec>
</sec>
<sec id="s3" sec-type="results">
<title>Results</title>
<sec id="s3_1">
<title>Performance Evaluation</title>
<p>Directly evaluating the performance of the proposed method is difficult. We therefore used three complementary approaches to assess it.</p>
<list list-type="order">
<list-item>
<p>Statistical test: The biomarker genes of each sample should be specific; that is, their expression values in that sample should differ significantly from their values in other samples. We tested such differences as follows. First, the biomarker genes of sample <italic>S</italic><italic><sub>i</sub></italic> are selected and their expression values are extracted from all samples. Then, for each biomarker gene, the expression values across samples are sorted, and the resulting ranks are used to construct a rank matrix. The <italic>i</italic>th row vector, <italic>R</italic><italic><sub>i</sub></italic>, of the matrix contains the ranks of the biomarker genes in <italic>S</italic><italic><sub>i</sub></italic>. Finally, the Kolmogorov-Smirnov test is performed to determine whether there is a significant difference between <italic>R</italic><italic><sub>i</sub></italic> and <italic>R</italic><italic><sub>j</sub></italic> (<italic>j</italic> &#x2260; <italic>i</italic>).</p>
</list-item>
<list-item>
<p>Survival analysis: The biomarker genes of each tumor sample should reflect its characteristics, namely, it should be possible to use biomarker genes to classify tumor samples into high- and low-risk groups and predict the survival risk of tumor patients.</p>
</list-item>
<list-item>
<p>Validation <italic>via</italic> biological evidence: The biomarker genes of each tumor sample should reflect the pathogenesis of cancer, that is, they should have been reported to be associated with tumor development in the published literature.</p>
</list-item>
</list>
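<p>The statistical test in the first item can be sketched as follows. This is a minimal, standard-library-only illustration, not the authors' code. It assumes that ranks are assigned per gene across samples (1 = lowest expression) and that rows of the rank matrix are compared with a two-sample Kolmogorov-Smirnov test using the usual asymptotic p-value. The function names are hypothetical.</p>
<preformat><![CDATA[
```python
import math


def rank_matrix(expr):
    """expr[s][g] is the expression of biomarker gene g in sample s.

    Returns a matrix of the same shape holding, for each gene, the rank of
    every sample (1 = lowest expression). Ties keep their input order.
    """
    n_samples, n_genes = len(expr), len(expr[0])
    ranks = [[0] * n_genes for _ in range(n_samples)]
    for g in range(n_genes):
        order = sorted(range(n_samples), key=lambda s: expr[s][g])
        for rank, s in enumerate(order, start=1):
            ranks[s][g] = rank
    return ranks


def ks_two_sample(a, b):
    """Two-sample KS statistic D and an asymptotic two-sided p-value."""
    a, b = sorted(a), sorted(b)
    n, m = len(a), len(b)
    i = j = 0
    d = 0.0
    while i < n and j < m:
        x = min(a[i], b[j])
        while i < n and a[i] <= x:
            i += 1
        while j < m and b[j] <= x:
            j += 1
        d = max(d, abs(i / n - j / m))  # gap between the two empirical CDFs
    en = math.sqrt(n * m / (n + m))
    lam = (en + 0.12 + 0.11 / en) * d
    if lam < 0.2:  # the Kolmogorov series is numerically 1 in this range
        return d, 1.0
    series = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
                 for k in range(1, 101))
    return d, min(max(2.0 * series, 0.0), 1.0)


if __name__ == "__main__":
    ranks = rank_matrix([[5.0, 1.0], [3.0, 2.0], [4.0, 3.0]])
    print(ranks)
    print(ks_two_sample(list(range(1, 51)), list(range(51, 101))))
```
]]></preformat>
<p>In practice a library routine such as <italic>ks_2samp</italic> in SciPy would normally replace the hand-written test. The version above only makes the computation explicit.</p>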
</sec>
<sec id="s3_2">
<title>Experimental Results for TCGA-BRCA</title>
<p>The experiments on TCGA-BRCA were performed as follows. First, 6,120 DEGs between the two groups of samples were identified using DESeq2 (<xref ref-type="bibr" rid="B21">Love et al., 2014</xref>) at a 95% confidence level with an absolute log fold change &gt; 1. Next, the average tumor and average normal samples based on the 6,120 DEGs were obtained using Equations (1) and (2). Then, 1,090 (113) regression models were constructed, each based on the average tumor (normal) sample and one of the 1,090 tumor (113 normal) samples; an example is shown in <xref ref-type="fig" rid="f2">
<bold>Figure 2</bold>
</xref>. The residuals of the genes of each sample were calculated using Equations (5) and (6); <xref ref-type="fig" rid="f3">
<bold>Figure 3</bold>
</xref> shows residual values of biomarker genes from two samples. Finally, biomarker genes for each sample were identified using Equations (7), (8), and (9). The distribution of the number of biomarker genes in the 1,090 (113) tumor (normal) samples is shown in <xref ref-type="fig" rid="f4">
<bold>Figure 4</bold>
</xref>.</p>
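<p>The averaging and regression steps described above can be pictured with the following Python sketch. It is only an illustration under stated assumptions: the average sample is taken as the gene-wise mean, the regression is assumed to be an ordinary least-squares line with the average sample as predictor, and raw (unstandardized) residuals are returned. Equations (1) to (6) remain the authoritative definitions, and the function names are hypothetical.</p>
<preformat><![CDATA[
```python
def average_sample(samples):
    """Gene-wise mean across samples; samples[s][g] is gene g in sample s."""
    n = len(samples)
    return [sum(column) / n for column in zip(*samples)]


def ols_residuals(x, y):
    """Residuals y - (a + b * x) of a least-squares line fitted to (x, y).

    ASSUMPTION: x is the average sample and y the individual sample; the
    paper's own regression form may differ.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


if __name__ == "__main__":
    group = [[1.0, 2.0, 3.0, 4.0], [1.2, 2.1, 2.9, 6.0], [0.8, 1.9, 3.1, 3.8]]
    avg = average_sample(group)
    # Large positive or negative residuals flag genes that deviate from the group.
    print(ols_residuals(avg, group[1]))
```
]]></preformat>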
<fig id="f2" position="float">
<label>Figure 2</label>
<caption>
<p>Regression model based on tumor sample TCGA-Z7-A8R6-01A-11R-A41B-07 and average tumor sample. The points in the upper-left (lower-right) partition are two biomarker genes with the highest (lowest) expression levels.</p>
</caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fgene-10-01236-g002.tif"/>
</fig>
<fig id="f3" position="float">
<label>Figure 3</label>
<caption>
<p>Residuals of genes of a single sample. <bold>(A)</bold> Breast tumor sample TCGA-A2-A0D2-01A-21R-A034-07; <bold>(B)</bold> normal tissue sample TCGA-A7-A0D9-11A-53R-A089-07. The green points denote the two biomarker genes with the highest/lowest expression levels in the two samples.</p>
</caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fgene-10-01236-g003.tif"/>
</fig>
<fig id="f4" position="float">
<label>Figure 4</label>
<caption>
<p>Distribution of the number of biomarker genes in <bold>(A)</bold> 1,090 breast tumor samples and <bold>(B)</bold> 113 normal tissue samples.</p>
</caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fgene-10-01236-g004.tif"/>
</fig>
<p>As shown clearly in <xref ref-type="fig" rid="f2">
<bold>Figures 2</bold>
</xref> and <xref ref-type="fig" rid="f3">
<bold>3</bold>
</xref>, genes were distributed in two main areas. Genes in the upper-left of the plots have higher expression levels in the single tumor/normal sample, whereas genes in the lower-right have lower expression levels. In <xref ref-type="fig" rid="f2">
<bold>Figure 2</bold>
</xref>, several points lie far from the regression line. These points represent biomarker genes of the single sample. <xref ref-type="fig" rid="f3">
<bold>Figure 3</bold>
</xref> shows more clearly which genes deviated most in expression. For example, the residuals of <italic>CLEC3A</italic> and <italic>CCNO</italic> were 4.92 and 3.83, respectively, significantly higher than those of other genes, whereas the residuals of <italic>HIST3H2A</italic> and <italic>TNNT1</italic> were &#x2212;3.33 and &#x2212;2.95, respectively, significantly lower than those of other genes.</p>
<p>It can also be seen from <xref ref-type="fig" rid="f4">
<bold>Figure 4</bold>
</xref> that the number of biomarker genes varied among samples. Some tumor samples had more than 315 biomarker genes, while others had about 290. The mean numbers of biomarker genes in the tumor samples and normal samples were 304.9 and 305, respectively. The identity of the biomarker genes also differed between samples. Across the 1,090 tumor samples and 113 normal samples, the biomarker genes occurred with different frequencies (a biomarker gene has a higher frequency if it is found in more samples). The top 15 biomarker genes with significantly different frequencies in tumor and normal samples are listed in <xref ref-type="supplementary-material" rid="SM1">
<bold>Supplementary Table 1</bold>
</xref>. These genes were common biomarkers of most tumor samples, and they had a higher frequency in tumor samples than in normal samples. Therefore, these genes were likely to be related to the development of breast cancer. To test this hypothesis, we searched the literature using public databases and found that 14 of the 15 genes were indeed related to the development of breast cancer. The top gene was <italic>S100A7</italic>, which has been found to be expressed in several tissues including breast adenocarcinomas and squamous carcinomas of the head and neck, the cervix, and the lung (<xref ref-type="bibr" rid="B8">Emberley et al., 2004</xref>); <italic>S100A7</italic> is also related to survival of breast cancer patients (<xref ref-type="bibr" rid="B9">Emberley, 2003</xref>). <italic>CLEC3A</italic> had the highest frequency in tumor samples; its overexpression promotes tumor progression and poor prognosis in breast invasive ductal cancer (IDC) and is related to higher lymph node metastasis and poorer overall survival (OS) in breast IDC (<xref ref-type="bibr" rid="B23">Ni et al., 2018</xref>). <italic>PRAME</italic> has a tumor-promoting role in triple-negative breast cancer, increasing cancer cell motility through epithelial-to-mesenchymal transition (EMT) gene reprogramming. Therefore, <italic>PRAME</italic> could serve as a prognostic biomarker and/or therapeutic target in triple-negative breast cancer (<xref ref-type="bibr" rid="B1">Al-Khadairi et al., 2019</xref>). <xref ref-type="bibr" rid="B13">Kammerer et al. (2016)</xref> suggested that patients with estrogen receptor-positive breast cancer might be stratified into high- and low-risk groups based on the <italic>KCNJ3</italic> levels in the tumor. <italic>CST1</italic> was found to be generally upregulated in breast cancer at both the mRNA and the protein level.
Furthermore, OS and disease-free survival in the low <italic>CST1</italic> expression subgroup were significantly superior to those in the high <italic>CST1</italic> expression subgroup, indicating that <italic>CST1</italic> could be a prognostic indicator and a potential therapeutic target for breast cancer (<xref ref-type="bibr" rid="B5">Dai et al., 2017</xref>). <xref ref-type="bibr" rid="B34">Xuan et al. (2015)</xref> reported that higher expression of <italic>MMP1</italic> in breast cancer might play a crucial role in promoting breast cancer metastasis. <xref ref-type="bibr" rid="B25">Powell et al. (2018)</xref> demonstrated that <italic>CEACAM5</italic> was a clinically relevant driver of breast cancer metastasis. <italic>NKAIN1</italic> is associated with OS in breast cancer (<xref ref-type="bibr" rid="B28">Su et al., 2019</xref>). <italic>DSCAM-AS1</italic> promotes tumor growth in breast cancer by reducing miR-204-5p and upregulating <italic>RRM2</italic> (<xref ref-type="bibr" rid="B19">Liang et al., 2019</xref>). Overexpression of <italic>CEACAM6</italic> promotes migration and invasion of estrogen-deprived breast cancer cells (<xref ref-type="bibr" rid="B15">Lewis-Wambi et al., 2008</xref>). <xref ref-type="bibr" rid="B2">Bhakta et al. (2018)</xref> suggested that an anti-GFRA1-vcMMAE antibody-drug conjugate (ADC) might provide a targeted therapeutic opportunity for luminal A breast cancer patients. <italic>BMPR1B</italic> is related to proliferation of breast cancer cells (<xref ref-type="bibr" rid="B3">Bokobza et al., 2009</xref>). <xref ref-type="bibr" rid="B11">Jia et al. (2016)</xref> identified <italic>COL11A1</italic> as a highly specific biomarker of activated cancer-associated fibroblasts (CAFs), which could promote breast cancer and inhibit pancreatic cancer. In summary, 14 of the top 15 biomarker genes have been reported to be associated with breast cancer.
Therefore, these results demonstrate that the proposed method can effectively identify biomarkers related to cancer.</p>
<p>Statistical tests were performed to evaluate whether the expression levels of the biomarker genes of a sample differed significantly from those of other samples. Because the biomarker gene set of each sample yielded one p-value per comparison with another sample, <italic>n</italic>(<italic>n</italic>&#x2212;1) p-values were obtained, where <italic>n</italic> is the number of samples: 1,090 &#xd7; 1,089 = 1,187,010 p-values for the 1,090 tumor samples and 113 &#xd7; 112 = 12,656 p-values for the 113 normal samples. Of these, 1,186,999 (&gt;99.99%) and 12,626 (99.76%) were less than 0.05 for the tumor samples and normal samples, respectively. These results indicate that there were significant differences between the expression levels of the identified biomarker genes of a sample and those of other samples; that is, the proposed method can effectively identify the biomarker genes of a single sample.</p>
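<p>The counts reported in this paragraph can be checked with a few lines of Python. The helper name is hypothetical; the only inputs are the sample numbers and the counts of significant p-values stated in the text.</p>
<preformat><![CDATA[
```python
def pairwise_summary(n_samples, n_significant):
    """Total number of ordered sample pairs, n(n-1), and the percentage significant."""
    total = n_samples * (n_samples - 1)
    return total, 100.0 * n_significant / total


if __name__ == "__main__":
    for label, n, hits in (("tumor", 1090, 1186999), ("normal", 113, 12626)):
        total, pct = pairwise_summary(n, hits)
        print(f"{label}: {hits} of {total} p-values < 0.05 ({pct:.3f}%)")
```
]]></preformat>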
<p>The frequencies of biomarker genes in tumor and normal samples were different. Here, we mainly analyzed biomarker genes whose frequency was higher in tumor samples than in normal samples, to explore which genes might have important roles in survival prediction and development of breast cancer. We selected 305 biomarker genes with higher frequency in tumor samples, and clustered the tumor samples into two groups using the multiple survival screening (MSS) algorithm (<xref ref-type="bibr" rid="B18">Li et al., 2010</xref>). Survival was significantly different between the two groups (p-value = 0.0089) (<xref ref-type="fig" rid="f5">
<bold>Figure 5</bold>
</xref>). This indicates that these biomarker genes are important features of breast cancer and can be used to stratify tumor patients into high- and low-risk groups. Two samples with negative follow-up times were removed, so 1,088 samples were included in the survival analysis.</p>
<fig id="f5" position="float">
<label>Figure 5</label>
<caption>
<p>Kaplan-Meier survival curves based on 305 tumor biomarker genes. In the high-risk group (red line), there are 329 tumor samples. In the low-risk group (blue line), there are 759 tumor samples.</p>
</caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fgene-10-01236-g005.tif"/>
</fig>
</sec>
<sec id="s3_3">
<title>Experimental Results for Immunotherapeutic Response Samples</title>
<p>The proposed method was also used to analyze mouse AB1-HA tumor data (GSE63557). A total of 8,042 DEGs between the two groups of samples were identified using GEO2R (<xref ref-type="bibr" rid="B27">Smyth, 2004</xref>) at a 95% confidence level. Regression models of 10 anti-CTLA-4 immunotherapeutic response samples and 10 non-response samples were constructed; one of these is shown in <xref ref-type="fig" rid="f6">
<bold>Figure 6</bold>
</xref>. <xref ref-type="fig" rid="f7">
<bold>Figure 7</bold>
</xref> shows residual values of biomarker genes from two samples. The number of biomarker genes of 10 response samples and 10 non-response samples is shown in <xref ref-type="fig" rid="f8">
<bold>Figure 8</bold>
</xref>. In <xref ref-type="fig" rid="f6">
<bold>Figures 6</bold>
</xref> and <xref ref-type="fig" rid="f7">
<bold>7</bold>
</xref>, there are several genes that are far from the regression lines. For example, the residuals of <italic>Krt6b</italic> and <italic>Stfa3</italic> were 2.07 and 2.26, respectively, significantly higher than those of other genes; the residuals of <italic>Chil3</italic> and <italic>Igkv2-109</italic> were &#x2212;1.82 and &#x2212;2.10, respectively, significantly lower than those of other genes.</p>
<fig id="f6" position="float">
<label>Figure 6</label>
<caption>
<p>Regression model based on response sample GSM1552230 and the average response sample.</p>
</caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fgene-10-01236-g006.tif"/>
</fig>
<fig id="f7" position="float">
<label>Figure 7</label>
<caption>
<p>Residuals of biomarker genes. <bold>(A)</bold> GSM1552230, <bold>(B)</bold> GSM1552221.</p>
</caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fgene-10-01236-g007.tif"/>
</fig>
<fig id="f8" position="float">
<label>Figure 8</label>
<caption>
<p>Number of biomarker genes in <bold>(A)</bold> response samples and <bold>(B)</bold> non-response samples.</p>
</caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fgene-10-01236-g008.tif"/>
</fig>
<p>The number of biomarker genes of different samples is shown in <xref ref-type="fig" rid="f8">
<bold>Figure 8</bold>
</xref>, illustrating the variation between samples. The biomarker genes from different samples were also different. For 10 response samples and 10 non-response samples, the top 15 genes with the most significant differences in frequency are shown in <xref ref-type="supplementary-material" rid="SM1">
<bold>Supplementary Table 2</bold>
</xref>. Four of these genes, <italic>Gzme</italic>, <italic>Cd38</italic>, <italic>Cd3d</italic>, and <italic>Chil3</italic>, appeared in the important cancer modules identified by <xref ref-type="bibr" rid="B14">Lesterhuis et al. (2015)</xref>. However, the top gene, <italic>Jchain</italic>, had not been identified as a member of these important cancer modules; notably, <italic>Jchain</italic> was the most important of the anti-CTLA-4 immunotherapeutic response biomarker genes in our study, with frequencies in response and non-response samples of 80% and 0%, respectively. This suggests that <italic>Jchain</italic> is related to immunotherapeutic response. GeneCards (<uri xlink:href="https://www.genecards.org/">https://www.genecards.org/</uri>) indeed confirms that <italic>Jchain</italic> has an important role in the immune response. Moreover, <italic>Iglj1</italic>, <italic>Cd38</italic>, and <italic>Cd3d</italic> are also immune response related. This demonstrates that the IBI method can detect important genes contributing to the immunotherapeutic response mechanism.</p>
<p>According to the statistical tests, 100% of p-values were less than 0.05 in both response and non-response samples. The rank matrix of each response sample is shown in <xref ref-type="fig" rid="f9">
<bold>Figure 9A</bold>
</xref>. These results indicate that there are significant differences between the identified response biomarker genes of a sample and those of other samples; that is, the proposed method can also effectively identify biomarker genes of individual samples even when fewer samples are used. We next sought to analyze biomarker genes whose frequency was higher in response samples than in non-response samples, and to estimate their ability to predict survival in AB1-HA tumor samples. However, no follow-up information was available for the AB1-HA mice. The 392 selected biomarker genes with higher frequency were therefore tested against a human mesothelioma data set (TCGA-MESO, <uri xlink:href="https://portal.gdc.cancer.gov">https://portal.gdc.cancer.gov</uri>). Notably, these biomarker genes could still effectively stratify the patients into high- and low-risk groups (<xref ref-type="fig" rid="f9">
<bold>Figure 9B</bold>
</xref>) with a p-value of 1.57&#xd7;10<sup>-5</sup>. These results further support the validity of the proposed method.</p>
<fig id="f9" position="float">
<label>Figure 9</label>
<caption>
<p>
<bold>(A)</bold> Rank matrix of each response sample. <bold>(B)</bold> Kaplan-Meier survival curves for human mesothelioma tumor samples based on biomarker genes from mouse AB1-HA tumor samples; p-value=1.57&#xd7;10<sup>-5</sup>. High-risk group includes 44 samples; low-risk group consists of 40 samples.</p>
</caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fgene-10-01236-g009.tif"/>
</fig>
</sec>
<sec id="s3_4">
<title>Experimental Results for Advanced Melanoma Data</title>
<p>The proposed method was used to analyze advanced melanoma data (GSE35640). A total of 1,420 DEGs were identified in 22 MAGE-A3 immunotherapeutic response and 34 non-response samples using GEO2R (<xref ref-type="bibr" rid="B27">Smyth, 2004</xref>) at a 95% confidence level. Regression models of the 22 MAGE-A3 immunotherapeutic response and 34 non-response samples were constructed; one of these is shown in <xref ref-type="fig" rid="f10">
<bold>Figure 10</bold>
</xref>. <xref ref-type="fig" rid="f11">
<bold>Figure 11</bold>
</xref> shows residual values of biomarker genes from two samples. The number of biomarker genes of 22 response samples and 34 non-response samples is shown in <xref ref-type="fig" rid="f12">
<bold>Figure 12</bold>
</xref>.</p>
<fig id="f10" position="float">
<label>Figure 10</label>
<caption>
<p>Regression model based on response sample GSM872356 and the average response sample from GSE35640 gene expression data.</p>
</caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fgene-10-01236-g010.tif"/>
</fig>
<fig id="f11" position="float">
<label>Figure 11</label>
<caption>
<p>Residuals of biomarker genes. <bold>(A)</bold> GSM872356, <bold>(B)</bold> GSM872328.</p>
</caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fgene-10-01236-g011.tif"/>
</fig>
<fig id="f12" position="float">
<label>Figure 12</label>
<caption>
<p>Number of biomarker genes in <bold>(A)</bold> response samples and <bold>(B)</bold> non-response samples.</p>
</caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fgene-10-01236-g012.tif"/>
</fig>
<p>As shown in <xref ref-type="fig" rid="f12">
<bold>Figure 12</bold>
</xref>, the number of biomarker genes differed only slightly between samples. The mean number of biomarker genes in response samples was 70. The top 15 genes with the most significant differences in frequency between the 22 response samples and 34 non-response samples are shown in <xref ref-type="supplementary-material" rid="SM1">
<bold>Supplementary Table 3</bold>
</xref>. We hypothesized that these genes were likely to be mainly immune or tumor related. To test this hypothesis, we searched GeneCards for these genes and found that some of them play important roles in the development of immune-related cells. For example, <italic>MS4A1</italic> is associated with the development of B-cells into plasma cells; <italic>CD37</italic> may play a part in T-cell&#x2013;B-cell interactions; <italic>CD5L</italic> participates in obesity-associated autoimmunity; <italic>MMP8</italic>, <italic>IRF5</italic>, and <italic>RHOF</italic> are related to innate immune pathways; <italic>MMP9</italic> has a role in tumor-associated tissue remodeling; and <italic>TRAM1L1</italic> is related to the well-known cancer-related NF-&#x3ba;B pathway. This demonstrates that the IBI method can detect important genes contributing to drug response mechanisms and help to elucidate immunotherapeutic response mechanisms. In the statistical tests, 96.96% and 95.72% of p-values were less than 0.05 in the response and non-response samples, respectively. These results also indicate that the biomarker genes of a sample show significant differences compared with those of other samples; that is, the proposed method can also effectively identify MAGE-A3 immunotherapeutic response biomarker genes in individual advanced melanoma samples even with fewer samples.</p>
<p>We next sought to analyze biomarker genes whose frequency was higher in response samples than in non-response samples, and to estimate their ability to predict survival in advanced melanoma. However, GSE35640 contains no follow-up information, so we used skin cutaneous melanoma gene expression data (TCGA-SKCM) for the survival analysis. The 70 selected biomarker genes were tested against TCGA-SKCM, showing that these biomarker genes could effectively stratify skin cutaneous melanoma patients into high- and low-risk groups (<xref ref-type="fig" rid="f13">
<bold>Figure 13</bold>
</xref>), with a p-value of 0.016. These results indicate that the proposed method performs well. In their original paper, <xref ref-type="bibr" rid="B30">Ulloa-Montoya et al. (2013)</xref> identified a signature of 84 genes associated with response to MAGE-A3 immunotherapy in metastatic melanoma and non-small-cell lung cancer; 61 of these 84 genes were also chosen as biomarker genes by our proposed method (e.g., <italic>CD86</italic>, <italic>CCL5</italic>, and <italic>IRF1</italic>). These genes were mainly immune related and were involved in interferon gamma pathways and specific chemokines. Experimental results showed that pretreatment MAGE-A3 immunotherapy in metastatic melanoma influenced the tumor&#x2019;s immune microenvironment and the patient&#x2019;s clinical response. The proposed method could be used to identify these biomarker genes and predict the influence of MAGE-A3 immunotherapy on survival in metastatic melanoma (<xref ref-type="fig" rid="f13">
<bold>Figure 13</bold>
</xref>).</p>
<fig id="f13" position="float">
<label>Figure 13</label>
<caption>
<p>Kaplan-Meier survival curves for TCGA-SKCM based on biomarker genes from GSE35640; p-value = 0.016. There were 281 and 166 samples in the high-risk and low-risk groups, respectively.</p>
</caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fgene-10-01236-g013.tif"/>
</fig>
</sec>
<sec id="s3_5">
<title>Experimental Results for the Simulated Data</title>
<p>To further test the performance of the proposed method, we performed a supplemental experiment on simulated gene expression data. First, simulated gene expression data with 10 samples and 1,000 genes were generated using the <italic>simulateGEdata</italic> function in the <italic>RUVcorr</italic> package (<xref ref-type="bibr" rid="B10">Freytag et al., 2015</xref>). Then, the 1,000 genes were divided into 10 groups, and the expression values of the <italic>i</italic>th group of genes in the <italic>i</italic>th sample were increased or decreased by a perturbation value. The perturbation value ranged from 0 to the mean value of the corresponding gene across the 10 samples. Thus, the <italic>i</italic>th group of genes can be considered the biomarker genes of the <italic>i</italic>th sample. Finally, the proposed method was applied to the simulated data to determine whether it could recover these markers. We repeated the above steps ten times, and the experimental results showed that the proposed method can effectively identify the biomarker genes of the 10 samples. Of the biomarker genes identified by the proposed method, 99% were the predefined biomarkers when the perturbation value is twice (see <xref ref-type="supplementary-material" rid="SM1">
<bold>Supplementary Figure 1</bold>
</xref>).</p>
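<p>The perturbation scheme described above can be sketched in Python as follows. This is an illustration only: the baseline data here are plain Gaussian noise around a random per-gene mean, used as a stand-in for the <italic>simulateGEdata</italic> output, and the function names, noise level, and seed are assumptions made for the example. The precision helper reports the share of detected genes that are planted markers.</p>
<preformat><![CDATA[
```python
import random


def simulate(n_samples=10, n_genes=1000, seed=0):
    """Return (data, truth): data[s][g] expression values, truth[s] planted marker genes."""
    rng = random.Random(seed)
    # Stand-in baseline (ASSUMPTION): Gaussian noise around a per-gene mean,
    # not the RUVcorr simulation used in the paper.
    base = [rng.uniform(5.0, 10.0) for _ in range(n_genes)]
    data = [[base[g] + rng.gauss(0.0, 0.3) for g in range(n_genes)]
            for _ in range(n_samples)]
    group_size = n_genes // n_samples
    truth = []
    for i in range(n_samples):
        genes = range(i * group_size, (i + 1) * group_size)
        for g in genes:
            mean_g = sum(data[s][g] for s in range(n_samples)) / n_samples
            shift = rng.uniform(0.0, mean_g)  # perturbation between 0 and the gene mean
            data[i][g] += shift if rng.random() < 0.5 else -shift
        truth.append(set(genes))
    return data, truth


def precision(predicted, planted):
    """Fraction of predicted marker genes that are planted markers."""
    predicted = set(predicted)
    if not predicted:
        return 0.0
    return len(predicted & planted) / len(predicted)


if __name__ == "__main__":
    data, truth = simulate()
    print(len(data), len(data[0]), sorted(truth[0])[:5])
```
]]></preformat>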
</sec>
</sec>
<sec id="s4" sec-type="discussion">
<title>Discussion</title>
<p>Precision medicine is an active area of cancer research. The key to cancer precision medicine is to find biomarker genes with high performance, and various approaches to identify such genes have been developed. However, identification of biomarker genes for individual tumor samples remains a challenging problem, and effective approaches for identifying biomarkers in individual patients are lacking. Here, we developed a novel approach to address this issue. Experimental results based on several different data sets show that the proposed method can effectively identify biomarker genes of individual human tumor samples, not only from several hundred samples but also from a few samples without clinical information, and even from mouse samples.</p>
</sec>
<sec id="s5">
<title>Data Availability Statement</title>
<p>Publicly available datasets were analyzed in this study: TCGA-BRCA data (found at The Cancer Genome Atlas), GSE63557 (found at Gene Expression Omnibus) and GSE35640 (found at Gene Expression Omnibus).</p>
</sec>
<sec id="s6">
<title>Author Contributions</title>
<p>JL and DW designed and implemented the algorithm. JL and DW analyzed the results and wrote the manuscript, and YW made suggestions. All authors read and approved the final manuscript.</p>
</sec>
<sec id="s7" sec-type="funding-information">
<title>Funding</title>
<p>This work was partially supported by the National Key Research and Development Program of China (Grant No. 2016YFC0901905) and the Natural Science Foundation of Heilongjiang Province (Grant No. F2016016).</p>
</sec>
<sec id="s8">
<title>Conflict of Interest</title>
<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
</sec>
</body>
<back>
<ack>
<title>Acknowledgments</title>
<p>The authors acknowledge the contributions of colleagues in the group.</p>
</ack>
<sec sec-type="supplementary-material" id="s9">
<title>Supplementary Material</title>
<p>The Supplementary Material for this article can be found online at: <ext-link ext-link-type="uri" xlink:href="https://www.frontiersin.org/articles/10.3389/fgene.2019.01236/full#supplementary-material">https://www.frontiersin.org/articles/10.3389/fgene.2019.01236/full#supplementary-material</ext-link>
</p>
<supplementary-material xlink:href="DataSheet_1.docx" id="SM1" mimetype="application/vnd.openxmlformats-officedocument.wordprocessingml.document"/>
</sec>
<ref-list>
<title>References</title>
<ref id="B1">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Al-Khadairi</surname> <given-names>G.</given-names>
</name>
<name>
<surname>Naik</surname> <given-names>A.</given-names>
</name>
<name>
<surname>Thomas</surname> <given-names>R.</given-names>
</name>
<name>
<surname>Al-Sulaiti</surname> <given-names>B.</given-names>
</name>
<name>
<surname>Rizly</surname> <given-names>S.</given-names>
</name>
<name>
<surname>Decock</surname> <given-names>J.</given-names>
</name>
</person-group> (<year>2019</year>). <article-title>PRAME promotes epithelial-to-mesenchymal transition in triple negative breast cancer</article-title>. <source>J. Transl. Med.</source> <volume>17</volume>, <fpage>9</fpage>. doi: <pub-id pub-id-type="doi">10.1186/s12967-018-1757-3</pub-id>
</citation>
</ref>
<ref id="B2">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Bhakta</surname> <given-names>S.</given-names>
</name>
<name>
<surname>Crocker</surname> <given-names>L. M.</given-names>
</name>
<name>
<surname>Chen</surname> <given-names>Y.</given-names>
</name>
<name>
<surname>Hazen</surname> <given-names>M.</given-names>
</name>
<name>
<surname>Schutten</surname> <given-names>M. M.</given-names>
</name>
<name>
<surname>Li</surname> <given-names>D.</given-names>
</name>
<etal/>
</person-group>. (<year>2018</year>). <article-title>An anti-GDNF family receptor alpha 1 (GFRA1) antibody&#x2013;drug conjugate for the treatment of hormone receptor-positive breast cancer</article-title>. <source>Mol. Cancer Ther.</source> <volume>17</volume>, <fpage>638</fpage>&#x2013;<lpage>649</lpage>. doi: <pub-id pub-id-type="doi">10.1158/1535-7163.MCT-17-0813</pub-id>
</citation>
</ref>
<ref id="B3">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Bokobza</surname> <given-names>S. M.</given-names>
</name>
<name>
<surname>Ye</surname> <given-names>L.</given-names>
</name>
<name>
<surname>Kynaston</surname> <given-names>H. E.</given-names>
</name>
<name>
<surname>Mansel</surname> <given-names>R. E.</given-names>
</name>
<name>
<surname>Jiang</surname> <given-names>W. G.</given-names>
</name>
</person-group> (<year>2009</year>). <article-title>Reduced expression of BMPR-IB correlates with poor prognosis and increased proliferation of breast cancer cells</article-title>. <source>Cancer Genomics Proteomics</source> <volume>6</volume>, <fpage>101</fpage>.</citation>
</ref>
<ref id="B4">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Chang</surname> <given-names>H. Y.</given-names>
</name>
<name>
<surname>Nuyten</surname> <given-names>D. S.</given-names>
</name>
<name>
<surname>Sneddon</surname> <given-names>J. B.</given-names>
</name>
<name>
<surname>Hastie</surname> <given-names>T.</given-names>
</name>
<name>
<surname>Tibshirani</surname> <given-names>R.</given-names>
</name>
<name>
<surname>S&#xf8;rlie</surname> <given-names>T.</given-names>
</name>
<etal/>
</person-group>. (<year>2005</year>). <article-title>Robustness, scalability, and integration of a wound-response gene expression signature in predicting breast cancer survival</article-title>. <source>Proc. Natl. Acad. Sci.</source> <volume>102</volume>, <fpage>3738</fpage>&#x2013;<lpage>3743</lpage>. doi: <pub-id pub-id-type="doi">10.1073/pnas.0409462102</pub-id>
</citation>
</ref>
<ref id="B5">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Dai</surname> <given-names>D.-N.</given-names>
</name>
<name>
<surname>Li</surname> <given-names>Y.</given-names>
</name>
<name>
<surname>Chen</surname> <given-names>B.</given-names>
</name>
<name>
<surname>Du</surname> <given-names>Y.</given-names>
</name>
<name>
<surname>Li</surname> <given-names>S.-B.</given-names>
</name>
<name>
<surname>Lu</surname> <given-names>S.-X.</given-names>
</name>
<etal/>
</person-group>. (<year>2017</year>). <article-title>Elevated expression of CST1 promotes breast cancer progression and predicts a poor prognosis</article-title>. <source>J. Mol. Med.</source> <volume>95</volume>, <fpage>873</fpage>&#x2013;<lpage>886</lpage>. doi: <pub-id pub-id-type="doi">10.1007/s00109-017-1537-1</pub-id>
</citation>
</ref>
<ref id="B6">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Demb&#xe9;l&#xe9;</surname> <given-names>D.</given-names>
</name>
<name>
<surname>Kastner</surname> <given-names>P.</given-names>
</name>
</person-group> (<year>2014</year>). <article-title>Fold change rank ordering statistics: a new method for detecting differentially expressed genes</article-title>. <source>BMC Bioinf.</source> <volume>15</volume>, <fpage>14</fpage>. doi: <pub-id pub-id-type="doi">10.1186/1471-2105-15-14</pub-id>
</citation>
</ref>
<ref id="B7">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Drier</surname> <given-names>Y.</given-names>
</name>
<name>
<surname>Sheffer</surname> <given-names>M.</given-names>
</name>
<name>
<surname>Domany</surname> <given-names>E.</given-names>
</name>
</person-group> (<year>2013</year>). <article-title>Pathway-based personalized analysis of cancer</article-title>. <source>Proc. Natl. Acad. Sci.</source> <volume>110</volume>, <fpage>6388</fpage>&#x2013;<lpage>6393</lpage>. doi: <pub-id pub-id-type="doi">10.1073/pnas.1219651110</pub-id>
</citation>
</ref>
<ref id="B8">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Emberley</surname> <given-names>E. D.</given-names>
</name>
<name>
<surname>Murphy</surname> <given-names>L. C.</given-names>
</name>
<name>
<surname>Watson</surname> <given-names>P. H.</given-names>
</name>
</person-group> (<year>2004</year>). <article-title>S100A7 and the progression of breast cancer</article-title>. <source>Breast Cancer Res.</source> <volume>6</volume>, <fpage>153</fpage>&#x2013;<lpage>159</lpage>. doi: <pub-id pub-id-type="doi">10.1186/bcr816</pub-id>
</citation>
</ref>
<ref id="B9">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Emberley</surname> <given-names>E. D.</given-names>
</name>
</person-group> (<year>2003</year>). <article-title>Psoriasin (S100A7) expression is associated with poor outcome in estrogen receptor-negative invasive breast cancer</article-title>. <source>Clin. Cancer Res.</source> <volume>9</volume>, <fpage>2627</fpage>&#x2013;<lpage>2631</lpage>.</citation>
</ref>
<ref id="B10">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Freytag</surname> <given-names>S.</given-names>
</name>
<name>
<surname>Gagnon-Bartsch</surname> <given-names>J.</given-names>
</name>
<name>
<surname>Speed</surname> <given-names>T. P.</given-names>
</name>
<name>
<surname>Bahlo</surname> <given-names>M.</given-names>
</name>
</person-group> (<year>2015</year>). <article-title>Systematic noise degrades gene co-expression signals but can be corrected</article-title>. <source>BMC Bioinf.</source> <volume>16</volume> (<issue>1</issue>), <fpage>309</fpage>. doi: <pub-id pub-id-type="doi">10.1186/s12859-015-0745-3</pub-id>
</citation>
</ref>
<ref id="B11">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Jia</surname> <given-names>D.</given-names>
</name>
<name>
<surname>Liu</surname> <given-names>Z.</given-names>
</name>
<name>
<surname>Deng</surname> <given-names>N.</given-names>
</name>
<name>
<surname>Tan</surname> <given-names>T. Z.</given-names>
</name>
<name>
<surname>Huang</surname> <given-names>R. Y.-J.</given-names>
</name>
<name>
<surname>Taylor-Harding</surname> <given-names>B.</given-names>
</name>
<etal/>
</person-group>. (<year>2016</year>). <article-title>A COL11A1-correlated pan-cancer gene signature of activated fibroblasts for the prioritization of therapeutic targets</article-title>. <source>Cancer Lett.</source> <volume>382</volume>, <fpage>203</fpage>&#x2013;<lpage>214</lpage>. doi: <pub-id pub-id-type="doi">10.1016/j.canlet.2016.09.001</pub-id>
</citation>
</ref>
<ref id="B12">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Jiang</surname> <given-names>P.</given-names>
</name>
<name>
<surname>Gu</surname> <given-names>S.</given-names>
</name>
<name>
<surname>Pan</surname> <given-names>D.</given-names>
</name>
<name>
<surname>Fu</surname> <given-names>J.</given-names>
</name>
<name>
<surname>Sahu</surname> <given-names>A.</given-names>
</name>
<name>
<surname>Hu</surname> <given-names>X.</given-names>
</name>
<etal/>
</person-group>. (<year>2018</year>). <article-title>Signatures of T cell dysfunction and exclusion predict cancer immunotherapy response</article-title>. <source>Nat. Med.</source> <volume>24</volume>, <fpage>10</fpage>. doi: <pub-id pub-id-type="doi">10.1038/s41591-018-0136-1</pub-id>
</citation>
</ref>
<ref id="B13">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kammerer</surname> <given-names>S.</given-names>
</name>
<name>
<surname>Sokolowski</surname> <given-names>A.</given-names>
</name>
<name>
<surname>Hackl</surname> <given-names>H.</given-names>
</name>
<name>
<surname>Platzer</surname> <given-names>D.</given-names>
</name>
<name>
<surname>Jahn</surname> <given-names>S. W.</given-names>
</name>
<name>
<surname>El-Heliebi</surname> <given-names>A.</given-names>
</name>
<etal/>
</person-group>. (<year>2016</year>). <article-title>KCNJ3 is a new independent prognostic marker for estrogen receptor positive breast cancer patients</article-title>. <source>Oncotarget</source> <volume>7</volume>, <fpage>84705</fpage>. doi: <pub-id pub-id-type="doi">10.18632/oncotarget.13224</pub-id>
</citation>
</ref>
<ref id="B14">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Lesterhuis</surname> <given-names>W. J.</given-names>
</name>
<name>
<surname>Rinaldi</surname> <given-names>C.</given-names>
</name>
<name>
<surname>Jones</surname> <given-names>A.</given-names>
</name>
<name>
<surname>Rozali</surname> <given-names>E. N.</given-names>
</name>
<name>
<surname>Dick</surname> <given-names>I. M.</given-names>
</name>
<name>
<surname>Khong</surname> <given-names>A.</given-names>
</name>
<etal/>
</person-group>. (<year>2015</year>). <article-title>Network analysis of immunotherapy-induced regressing tumours identifies novel synergistic drug combinations</article-title>. <source>Sci. Rep.</source> <volume>5</volume>, <fpage>12298</fpage>. doi: <pub-id pub-id-type="doi">10.1038/srep12298</pub-id>
</citation>
</ref>
<ref id="B15">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Lewis-Wambi</surname> <given-names>J. S.</given-names>
</name>
<name>
<surname>Cunliffe</surname> <given-names>H. E.</given-names>
</name>
<name>
<surname>Kim</surname> <given-names>H. R.</given-names>
</name>
<name>
<surname>Willis</surname> <given-names>A. L.</given-names>
</name>
<name>
<surname>Jordan</surname> <given-names>V. C.</given-names>
</name>
</person-group> (<year>2008</year>). <article-title>Overexpression of CEACAM6 promotes migration and invasion of oestrogen-deprived breast cancer cells</article-title>. <source>Eur. J. Cancer</source> <volume>44</volume>, <fpage>1770</fpage>&#x2013;<lpage>1779</lpage>. doi: <pub-id pub-id-type="doi">10.1016/j.ejca.2008.05.016</pub-id>
</citation>
</ref>
<ref id="B16">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Li</surname> <given-names>J.</given-names>
</name>
<name>
<surname>Tang</surname> <given-names>X.</given-names>
</name>
<name>
<surname>Zhao</surname> <given-names>W.</given-names>
</name>
<name>
<surname>Huang</surname> <given-names>J.</given-names>
</name>
</person-group> (<year>2007</year>). <article-title>A new framework for identifying differentially expressed genes</article-title>. <source>Pattern Recognit.</source> <volume>40</volume>, <fpage>3249</fpage>&#x2013;<lpage>3262</lpage>. doi: <pub-id pub-id-type="doi">10.1016/j.patcog.2007.01.032</pub-id>
</citation>
</ref>
<ref id="B17">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Li</surname> <given-names>J.</given-names>
</name>
<name>
<surname>Tang</surname> <given-names>X.</given-names>
</name>
<name>
<surname>Liu</surname> <given-names>J.</given-names>
</name>
<name>
<surname>Huang</surname> <given-names>J.</given-names>
</name>
<name>
<surname>Wang</surname> <given-names>Y.</given-names>
</name>
</person-group> (<year>2008</year>). <article-title>A novel approach to feature extraction from classification models based on information gene pairs</article-title>. <source>Pattern Recognit.</source> <volume>41</volume>, <fpage>1975</fpage>&#x2013;<lpage>1984</lpage>. doi: <pub-id pub-id-type="doi">10.1016/j.patcog.2007.11.019</pub-id>
</citation>
</ref>
<ref id="B18">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Li</surname> <given-names>J.</given-names>
</name>
<name>
<surname>Lenferink</surname> <given-names>A. E. G.</given-names>
</name>
<name>
<surname>Deng</surname> <given-names>Y.</given-names>
</name>
<name>
<surname>Collins</surname> <given-names>C.</given-names>
</name>
<name>
<surname>Cui</surname> <given-names>Q.</given-names>
</name>
<name>
<surname>Purisima</surname> <given-names>E. O.</given-names>
</name>
<etal/>
</person-group>. (<year>2010</year>). <article-title>Identification of high-quality cancer prognostic markers and metastasis network modules</article-title>. <source>Nat. Commun.</source> <volume>1</volume>, <fpage>34</fpage>. doi: <pub-id pub-id-type="doi">10.1038/ncomms1033</pub-id>
</citation>
</ref>
<ref id="B19">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Liang</surname> <given-names>W.-H.</given-names>
</name>
<name>
<surname>Li</surname> <given-names>N.</given-names>
</name>
<name>
<surname>Yuan</surname> <given-names>Z.-Q.</given-names>
</name>
<name>
<surname>Qian</surname> <given-names>X.-L.</given-names>
</name>
<name>
<surname>Wang</surname> <given-names>Z.-H.</given-names>
</name>
</person-group> (<year>2019</year>). <article-title>DSCAM-AS1 promotes tumor growth of breast cancer by reducing miR-204-5p and up-regulating RRM2</article-title>. <source>Mol. Carcinog.</source> <volume>58</volume>, <fpage>461</fpage>&#x2013;<lpage>473</lpage>. doi: <pub-id pub-id-type="doi">10.1002/mc.22941</pub-id>
</citation>
</ref>
<ref id="B20">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Liu</surname> <given-names>X.</given-names>
</name>
<name>
<surname>Chang</surname> <given-names>X.</given-names>
</name>
<name>
<surname>Liu</surname> <given-names>R.</given-names>
</name>
<name>
<surname>Yu</surname> <given-names>X.</given-names>
</name>
<name>
<surname>Chen</surname> <given-names>L.</given-names>
</name>
<name>
<surname>Aihara</surname> <given-names>K.</given-names>
</name>
</person-group> (<year>2017</year>). <article-title>Quantifying critical states of complex diseases using single-sample dynamic network biomarkers</article-title>. <source>PLoS Comput. Biol.</source> <volume>13</volume>, <elocation-id>e1005633</elocation-id>. doi: <pub-id pub-id-type="doi">10.1371/journal.pcbi.1005633</pub-id>
</citation>
</ref>
<ref id="B21">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Love</surname> <given-names>M. I.</given-names>
</name>
<name>
<surname>Huber</surname> <given-names>W.</given-names>
</name>
<name>
<surname>Anders</surname> <given-names>S.</given-names>
</name>
</person-group> (<year>2014</year>). <article-title>Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2</article-title>. <source>Genome Biol.</source> <volume>15</volume>, <fpage>550</fpage>. doi: <pub-id pub-id-type="doi">10.1186/s13059-014-0550-8</pub-id>
</citation>
</ref>
<ref id="B22">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Moore</surname> <given-names>S. G.</given-names>
</name>
<name>
<surname>Pryce</surname> <given-names>J. E.</given-names>
</name>
<name>
<surname>Hayes</surname> <given-names>B. J.</given-names>
</name>
<name>
<surname>Chamberlain</surname> <given-names>A. J.</given-names>
</name>
<name>
<surname>Kemper</surname> <given-names>K. E.</given-names>
</name>
<name>
<surname>Berry</surname> <given-names>D. P.</given-names>
</name>
<etal/>
</person-group>. (<year>2016</year>). <article-title>Differentially expressed genes in endometrium and corpus luteum of Holstein cows selected for high and low fertility are enriched for sequence variants associated with fertility</article-title>. <source>Biol. Reprod.</source> <volume>94</volume>, <fpage>11</fpage>&#x2013;<lpage>19</lpage>. doi: <pub-id pub-id-type="doi">10.1095/biolreprod.115.132951</pub-id>
</citation>
</ref>
<ref id="B23">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ni</surname> <given-names>J.</given-names>
</name>
<name>
<surname>Yun</surname> <given-names>P.</given-names>
</name>
<name>
<surname>Fu-Lan</surname> <given-names>Y.</given-names>
</name>
<name>
<surname>Xun</surname> <given-names>X.</given-names>
</name>
<name>
<surname>Xing-Wei</surname> <given-names>H.</given-names>
</name>
<name>
<surname>Chun</surname> <given-names>H.</given-names>
</name>
</person-group> (<year>2018</year>). <article-title>Overexpression of CLEC3A promotes tumor progression and poor prognosis in breast invasive ductal cancer</article-title>. <source>OncoTargets Ther.</source> <volume>11</volume>, <fpage>3303</fpage>&#x2013;<lpage>3312</lpage>. doi: <pub-id pub-id-type="doi">10.2147/OTT.S161311</pub-id>
</citation>
</ref>
<ref id="B24">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ock</surname> <given-names>C.-Y.</given-names>
</name>
<name>
<surname>Hwang</surname> <given-names>J.-E.</given-names>
</name>
<name>
<surname>Keam</surname> <given-names>B.</given-names>
</name>
<name>
<surname>Kim</surname> <given-names>S.-B.</given-names>
</name>
<name>
<surname>Shim</surname> <given-names>J.-J.</given-names>
</name>
<name>
<surname>Jang</surname> <given-names>H.-J.</given-names>
</name>
<etal/>
</person-group>. (<year>2017</year>). <article-title>Genomic landscape associated with potential response to anti-CTLA-4 treatment in cancers</article-title>. <source>Nat. Commun.</source> <volume>8</volume>, <fpage>1050</fpage>. doi: <pub-id pub-id-type="doi">10.1038/s41467-017-01018-0</pub-id>
</citation>
</ref>
<ref id="B25">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Powell</surname> <given-names>E.</given-names>
</name>
<name>
<surname>Shao</surname> <given-names>J.</given-names>
</name>
<name>
<surname>Picon</surname> <given-names>H. M.</given-names>
</name>
<name>
<surname>Bristow</surname> <given-names>C.</given-names>
</name>
<name>
<surname>Ge</surname> <given-names>Z.</given-names>
</name>
<name>
<surname>Peoples</surname> <given-names>M.</given-names>
</name>
<etal/>
</person-group>. (<year>2018</year>). <article-title>A functional genomic screen in vivo identifies CEACAM5 as a clinically relevant driver of breast cancer metastasis</article-title>. <source>NPJ Breast Cancer</source> <volume>4</volume>, <fpage>9</fpage>. doi: <pub-id pub-id-type="doi">10.1038/s41523-018-0062-x</pub-id>
</citation>
</ref>
<ref id="B26">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Rezwan</surname> <given-names>F. I.</given-names>
</name>
<name>
<surname>Docherty</surname> <given-names>L. E.</given-names>
</name>
<name>
<surname>Poole</surname> <given-names>R. L.</given-names>
</name>
<name>
<surname>Lockett</surname> <given-names>G. A.</given-names>
</name>
<name>
<surname>Arshad</surname> <given-names>S. H.</given-names>
</name>
<name>
<surname>Holloway</surname> <given-names>J. W.</given-names>
</name>
<etal/>
</person-group>. (<year>2015</year>). <article-title>A statistical method for single sample analysis of HumanMethylation450 array data: genome-wide methylation analysis of patients with imprinting disorders</article-title>. <source>Clin. Epigenet.</source> <volume>7</volume>, <fpage>48</fpage>. doi: <pub-id pub-id-type="doi">10.1186/s13148-015-0081-5</pub-id>
</citation>
</ref>
<ref id="B27">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Smyth</surname> <given-names>G. K.</given-names>
</name>
</person-group> (<year>2004</year>). <article-title>Linear models and empirical bayes methods for assessing differential expression in microarray experiments</article-title>. <source>Stat. Appl. Genet. Mol. Biol.</source> <volume>3</volume>, <fpage>3</fpage>. doi: <pub-id pub-id-type="doi">10.2202/1544-6115.1027</pub-id>
</citation>
</ref>
<ref id="B28">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Su</surname> <given-names>J.</given-names>
</name>
<name>
<surname>Miao</surname> <given-names>L.-F.</given-names>
</name>
<name>
<surname>Ye</surname> <given-names>X.-H.</given-names>
</name>
<name>
<surname>Cui</surname> <given-names>M.-S.</given-names>
</name>
<name>
<surname>He</surname> <given-names>X.-F.</given-names>
</name>
</person-group> (<year>2019</year>). <article-title>Development of prognostic signature and nomogram for patients with breast cancer</article-title>. <source>Medicine</source> <volume>98</volume>, <fpage>11</fpage>. doi: <pub-id pub-id-type="doi">10.1097/MD.0000000000014617</pub-id>
</citation>
</ref>
<ref id="B29">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Tomczak</surname> <given-names>K.</given-names>
</name>
<name>
<surname>Czerwi&#x144;ska</surname> <given-names>P.</given-names>
</name>
<name>
<surname>Wiznerowicz</surname> <given-names>M.</given-names>
</name>
</person-group> (<year>2015</year>). <article-title>The Cancer Genome Atlas (TCGA): an immeasurable source of knowledge</article-title>. <source>Contemp. Oncol.</source> <volume>19</volume>, <fpage>A68</fpage>. doi: <pub-id pub-id-type="doi">10.5114/wo.2014.47136</pub-id>
</citation>
</ref>
<ref id="B30">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ulloa-Montoya</surname> <given-names>F.</given-names>
</name>
<name>
<surname>Louahed</surname> <given-names>J.</given-names>
</name>
<name>
<surname>Dizier</surname> <given-names>B.</given-names>
</name>
<name>
<surname>Gruselle</surname> <given-names>O.</given-names>
</name>
<name>
<surname>Brichard</surname> <given-names>V. G.</given-names>
</name>
</person-group> (<year>2013</year>). <article-title>Predictive gene signature in MAGE-A3 antigen-specific cancer immunotherapy</article-title>. <source>J. Clin. Oncol.</source> <volume>31</volume>, <fpage>2388</fpage>. doi: <pub-id pub-id-type="doi">10.1200/JCO.2012.44.3762</pub-id>
</citation>
</ref>
<ref id="B31">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Van&#x2019;t Veer</surname> <given-names>L. J.</given-names>
</name>
<name>
<surname>Bernards</surname> <given-names>R.</given-names>
</name>
</person-group> (<year>2008</year>). <article-title>Enabling personalized cancer medicine through analysis of gene-expression patterns</article-title>. <source>Nature</source> <volume>452</volume>, <fpage>564</fpage>. doi: <pub-id pub-id-type="doi">10.1038/nature06915</pub-id>
</citation>
</ref>
<ref id="B32">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Van&#x2019;t Veer</surname> <given-names>L. J.</given-names>
</name>
<name>
<surname>Dai</surname> <given-names>H.</given-names>
</name>
<name>
<surname>Van De Vijver</surname> <given-names>M. J.</given-names>
</name>
<name>
<surname>He</surname> <given-names>Y. D.</given-names>
</name>
<name>
<surname>Hart</surname> <given-names>A. A.</given-names>
</name>
<name>
<surname>Mao</surname> <given-names>M.</given-names>
</name>
<etal/>
</person-group>. (<year>2002</year>). <article-title>Gene expression profiling predicts clinical outcome of breast cancer</article-title>. <source>Nature</source> <volume>415</volume>, <fpage>530</fpage>. doi: <pub-id pub-id-type="doi">10.1038/415530a</pub-id>
</citation>
</ref>
<ref id="B33">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wang</surname> <given-names>D.</given-names>
</name>
<name>
<surname>Li</surname> <given-names>J.-R.</given-names>
</name>
<name>
<surname>Zhang</surname> <given-names>Y.-H.</given-names>
</name>
<name>
<surname>Chen</surname> <given-names>L.</given-names>
</name>
<name>
<surname>Huang</surname> <given-names>T.</given-names>
</name>
<name>
<surname>Cai</surname> <given-names>Y.-D.</given-names>
</name>
</person-group> (<year>2018</year>). <article-title>Identification of differentially expressed genes between original breast cancer and xenograft using machine learning algorithms</article-title>. <source>Genes</source> <volume>9</volume>, <fpage>155</fpage>. doi: <pub-id pub-id-type="doi">10.3390/genes9030155</pub-id>
</citation>
</ref>
<ref id="B34">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Xuan</surname> <given-names>J.</given-names>
</name>
<name>
<surname>Zhang</surname> <given-names>Y.</given-names>
</name>
<name>
<surname>Zhang</surname> <given-names>X.</given-names>
</name>
<name>
<surname>Hu</surname> <given-names>F.</given-names>
</name>
</person-group> (<year>2015</year>). <article-title>Matrix metalloproteinase-1 expression in breast cancer and cancer-adjacent tissues by immunohistochemical staining</article-title>. <source>Biomed. Rep.</source> <volume>3</volume>, <fpage>395</fpage>&#x2013;<lpage>397</lpage>. doi: <pub-id pub-id-type="doi">10.3892/br.2015.420</pub-id>
</citation>
</ref>
<ref id="B35">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Zhang</surname> <given-names>Q.</given-names>
</name>
<name>
<surname>Li</surname> <given-names>J.</given-names>
</name>
<name>
<surname>Wang</surname> <given-names>D.</given-names>
</name>
<name>
<surname>Wang</surname> <given-names>Y.</given-names>
</name>
</person-group> (<year>2017</year>). <article-title>Finding disagreement pathway signatures and constructing an ensemble model for cancer classification</article-title>. <source>Sci. Rep.</source> <volume>7</volume>, <fpage>10044</fpage>. doi: <pub-id pub-id-type="doi">10.1038/s41598-017-10258-5</pub-id>
</citation>
</ref>
</ref-list>
</back>
</article>