<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Archiving and Interchange DTD v2.3 20070202//EN" "archivearticle.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="methods-article">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Genet.</journal-id>
<journal-title>Frontiers in Genetics</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Genet.</abbrev-journal-title>
<issn pub-type="epub">1664-8021</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.3389/fgene.2020.538492</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Genetics</subject>
<subj-group>
<subject>Methods</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>MGMIN: A Normalization Method for Correcting Probe Design Bias in Illumina Infinium HumanMethylation450 BeadChips</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name><surname>Wang</surname> <given-names>Zhenxing</given-names></name>
<uri xlink:href="http://loop.frontiersin.org/people/875318/overview"/>
</contrib>
<contrib contrib-type="author">
<name><surname>Liu</surname> <given-names>Yongzhuang</given-names></name>
</contrib>
<contrib contrib-type="author" corresp="yes">
<name><surname>Wang</surname> <given-names>Yadong</given-names></name>
<xref ref-type="corresp" rid="c001"><sup>&#x0002A;</sup></xref>
</contrib>
</contrib-group>
<aff><institution>School of Computer Science and Technology, Harbin Institute of Technology</institution>, <addr-line>Harbin</addr-line>, <country>China</country></aff>
<author-notes>
<fn fn-type="edited-by"><p>Edited by: Zhongyu Wei, Fudan University, China</p></fn>
<fn fn-type="edited-by"><p>Reviewed by: Teng Zhixia, Northeast Forestry University, China; Zhen Tian, Zhengzhou University, China</p></fn>
<corresp id="c001">&#x0002A;Correspondence: Yadong Wang <email>ydwang&#x00040;hit.edu.cn</email></corresp>
<fn fn-type="other" id="fn001"><p>This article was submitted to Computational Genomics, a section of the journal Frontiers in Genetics</p></fn></author-notes>
<pub-date pub-type="epub">
<day>27</day>
<month>10</month>
<year>2020</year>
</pub-date>
<pub-date pub-type="collection">
<year>2020</year>
</pub-date>
<volume>11</volume>
<elocation-id>538492</elocation-id>
<history>
<date date-type="received">
<day>27</day>
<month>02</month>
<year>2020</year>
</date>
<date date-type="accepted">
<day>21</day>
<month>09</month>
<year>2020</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#x000A9; 2020 Wang, Liu and Wang.</copyright-statement>
<copyright-year>2020</copyright-year>
<copyright-holder>Wang, Liu and Wang</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/"><p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.</p></license></permissions>
<abstract><p>The Illumina Infinium HumanMethylation450 BeadChips have been widely used in epigenome-wide association studies (EWAS). However, the two probe designs on the array (type I and type II) differ in the distribution and dynamic range of their measurements, which may bias downstream analyses. Here, we propose a method, MGMIN (<italic>M</italic>-values Gaussian-MIxture Normalization), to correct the probe design bias based on <italic>M</italic>-values of DNA methylation. Our strategy includes fitting Gaussian mixture distributions to type I and type II probes separately, a transformation of <italic>M</italic>-values into quantiles and finally a dilation transformation based on <italic>M</italic>-values of DNA methylation to maintain the continuity of the data. Our method is validated on several public datasets with respect to reducing probe design bias, reducing technical variation and improving the ability to detect biological differential methylation signals. The results show that MGMIN achieves performance competitive with BMIQ, a well-known normalization method that operates on &#x003B2;-values of DNA methylation.</p></abstract>
<kwd-group>
<kwd>DNA methylation</kwd>
<kwd>design bias</kwd>
<kwd>normalization</kwd>
<kwd>M-value</kwd>
<kwd>Gaussian mixture model</kwd>
<kwd>Illumina Infinium 450K</kwd>
</kwd-group>
<counts>
<fig-count count="6"/>
<table-count count="1"/>
<equation-count count="12"/>
<ref-count count="14"/>
<page-count count="8"/>
<word-count count="3741"/>
</counts>
</article-meta>
</front>
<body>
<sec sec-type="intro" id="s1">
<title>1. Introduction</title>
<p>DNA methylation, a well-known epigenetic mark, plays an essential role in biological processes and complex genetic diseases like cancer and diabetes (Irizarry et al., <xref ref-type="bibr" rid="B5">2009</xref>; Paul et al., <xref ref-type="bibr" rid="B10">2016</xref>). The Illumina Infinium HumanMethylation450 (450K) BeadChip (Bibikova et al., <xref ref-type="bibr" rid="B1">2011</xref>) provides measurements of the level of methylation at over 480K CpG sites and has been widely used in epigenome-wide association studies (EWAS) and large-scale projects, such as The Cancer Genome Atlas (TCGA). The probes in the Infinium 450K BeadChip come in two different designs, type I (<italic>n</italic> = 135,501) and type II (<italic>n</italic> = 350,076), in order to increase the genomic coverage of the assay. However, the methylation values (&#x003B2;-values or <italic>M</italic>-values) derived from the two types of designs exhibit different distributions. In particular, type I probes have a larger measurement range than type II probes (Dedeurwaerder et al., <xref ref-type="bibr" rid="B2">2011</xref>). The differences between the two types of probe designs may affect downstream analyses.</p>
<p>Several approaches have been published to correct the probe design bias. A peak-based correction (PBC) method normalizes type II probes to render them comparable with type I probes (Dedeurwaerder et al., <xref ref-type="bibr" rid="B2">2011</xref>). However, PBC performs poorly when the density distribution of methylation values does not show well-defined peaks. SQN (Touleimat and Tost, <xref ref-type="bibr" rid="B13">2012</xref>) and SWAN (Maksimovic et al., <xref ref-type="bibr" rid="B7">2012</xref>) select subsets of probes of similar biological category to adjust for the probe design bias. Beta MIxture Quantile dilation (BMIQ) is a model-based normalization approach that corrects the &#x003B2;-values of type II probes according to the beta distribution of the &#x003B2;-values of type I probes, and it appears to outperform PBC, SQN, and SWAN (Teschendorff et al., <xref ref-type="bibr" rid="B12">2012</xref>).</p>
<p>In this work, we propose a method to correct the probe design bias based on the Gaussian Mixture Model (GMM) of the <italic>M</italic>-values of DNA methylation, which is called <italic>M</italic>-value Gaussian-MIxture Normalization (MGMIN). The method includes three steps: (i) fit Gaussian-mixture distributions to type I and type II probes separately, (ii) utilize a transformation of <italic>M</italic>-values into quantiles, (iii) perform a dilation transformation based on <italic>M</italic>-values to maintain the continuity of the data. We evaluate MGMIN using several independent datasets in terms of reducing the replicate technical variance and correcting the type II bias. By comparison with BMIQ, the results show that MGMIN improves the overall performance of normalization.</p></sec>
<sec sec-type="materials and methods" id="s2">
<title>2. Materials and Methods</title>
<sec>
<title>2.1. Measuring DNA Methylation With <italic>M</italic>-values</title>
<p>The &#x003B2;-value of DNA methylation for each probe is defined as the ratio of the methylated intensity (M) to the overall intensity (the sum of the methylated and unmethylated intensities, M &#x0002B; U):
<disp-formula id="E1"><mml:math id="M1"><mml:mi>&#x003B2;</mml:mi><mml:mo>-</mml:mo><mml:mi>v</mml:mi><mml:mi>a</mml:mi><mml:mi>l</mml:mi><mml:mi>u</mml:mi><mml:mi>e</mml:mi><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mi>M</mml:mi></mml:mrow><mml:mrow><mml:mi>M</mml:mi><mml:mo>&#x0002B;</mml:mo><mml:mi>U</mml:mi><mml:mo>&#x0002B;</mml:mo><mml:mi>&#x003B1;</mml:mi></mml:mrow></mml:mfrac></mml:math></disp-formula>
where &#x003B1; is a constant offset (by default, &#x003B1; = 100) to regularize the &#x003B2;-value when the overall intensity is low. The &#x003B2;-value lies between 0 and 1 and is naturally modeled by a Beta distribution. A &#x003B2;-value of 0 indicates that the CpG site of the measured sample is fully unmethylated and a value of 1 indicates that the CpG site is completely methylated.</p>
<p>The <italic>M</italic>-value is calculated as the log2 ratio of the methylated intensity (M) to the unmethylated intensity (U):
<disp-formula id="E2"><mml:math id="M2"><mml:mi>M</mml:mi><mml:mo>-</mml:mo><mml:mi>v</mml:mi><mml:mi>a</mml:mi><mml:mi>l</mml:mi><mml:mi>u</mml:mi><mml:mi>e</mml:mi><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mo class="qopname">log</mml:mo></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mfrac><mml:mrow><mml:mi>M</mml:mi><mml:mo>&#x0002B;</mml:mo><mml:mi>&#x003B1;</mml:mi></mml:mrow><mml:mrow><mml:mi>U</mml:mi><mml:mo>&#x0002B;</mml:mo><mml:mi>&#x003B1;</mml:mi></mml:mrow></mml:mfrac></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:math></disp-formula>
where &#x003B1; here is also an offset (by default, &#x003B1; = 1) that dampens the large changes caused by small intensity estimation errors. An <italic>M</italic>-value close to zero indicates that the measured CpG site is approximately half-methylated. A positive <italic>M</italic>-value suggests that more copies of the measured CpG site are methylated than unmethylated and a negative <italic>M</italic>-value means more copies of the CpG site are unmethylated. The <italic>M</italic>-value has been widely used in two-color expression microarray analysis (Du et al., <xref ref-type="bibr" rid="B4">2010</xref>).</p>
<p>Because more than 95% of CpG sites have intensities above 1,000 in Illumina methylation data, the offset &#x003B1; in the &#x003B2;-value and the <italic>M</italic>-value has a negligible effect on the observed results. Ignoring &#x003B1;, the relationship between the &#x003B2;-value and the <italic>M</italic>-value is therefore:
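<p>The two definitions above can be illustrated with a short Python sketch. The function names <monospace>beta_value</monospace> and <monospace>m_value</monospace> and the intensity values are assumptions made for this example only; the default offsets follow the text (&#x003B1; = 100 for the &#x003B2;-value and &#x003B1; = 1 for the <italic>M</italic>-value).</p>
<preformat>
```python
import math


def beta_value(meth, unmeth, alpha=100.0):
    """Beta-value: methylated intensity over total intensity plus the offset alpha."""
    return meth / (meth + unmeth + alpha)


def m_value(meth, unmeth, alpha=1.0):
    """M-value: log2 ratio of the offset methylated and unmethylated intensities."""
    return math.log2((meth + alpha) / (unmeth + alpha))


# Hypothetical (methylated, unmethylated) intensities for three probes.
probes = [(200.0, 5000.0), (3000.0, 3000.0), (6000.0, 300.0)]

for meth, unmeth in probes:
    # Print both measures side by side for each probe.
    print(round(beta_value(meth, unmeth), 3), round(m_value(meth, unmeth), 3))
```
</preformat>
<p>For the probe with equal methylated and unmethylated intensities, the <italic>M</italic>-value is exactly zero, whereas the &#x003B2;-value falls slightly below 0.5 because the offset enters only the denominator.</p>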
<disp-formula id="E3"><mml:math id="M3"><mml:mi>&#x003B2;</mml:mi><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:msup><mml:mrow><mml:mn>2</mml:mn></mml:mrow><mml:mrow><mml:mi>M</mml:mi></mml:mrow></mml:msup></mml:mrow><mml:mrow><mml:msup><mml:mrow><mml:mn>2</mml:mn></mml:mrow><mml:mrow><mml:mi>M</mml:mi></mml:mrow></mml:msup><mml:mo>&#x0002B;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:mfrac><mml:mo>;</mml:mo><mml:mi>M</mml:mi><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mo class="qopname">log</mml:mo></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mfrac><mml:mrow><mml:mi>&#x003B2;</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn><mml:mo>-</mml:mo><mml:mi>&#x003B2;</mml:mi></mml:mrow></mml:mfrac></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:math></disp-formula>
According to the conclusions of Du et al. (<xref ref-type="bibr" rid="B4">2010</xref>), the <italic>M</italic>-value is more statistically valid for analyses that model the distribution of methylation values because it is approximately <italic>homoscedastic</italic>. We therefore choose to adjust the <italic>M</italic>-values of type II probes to match the distribution of type I probes in order to correct the probe design bias.</p></sec>
<sec>
<title>2.2. MGMIN: <italic>M</italic>-value Gaussian-MIxture Normalization</title>
<p>The Gaussian mixture model (GMM) has been widely applied as a clustering method in analyzing gene-expression microarray data (Yeung et al., <xref ref-type="bibr" rid="B14">2001</xref>; Pan et al., <xref ref-type="bibr" rid="B9">2002</xref>) and has been used to detect differential gene expression (McLachlan et al., <xref ref-type="bibr" rid="B8">2006</xref>). In this paper, we apply a GMM to distinguish the different methylation states of CpG sites for further correction. The <italic>M</italic>-values of a single 450K microarray can be viewed as a finite Gaussian mixture over several methylation states (hypomethylated-U, hemimethylated-H, hypermethylated-F). The probability density function of the <italic>M</italic>-value for a single CpG site (<italic>M</italic><sub><italic>i</italic></sub>) is defined as:
<disp-formula id="E4"><label>(1)</label><mml:math id="M4"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:mi>p</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>M</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>;</mml:mo><mml:mi>&#x003B8;</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mstyle displaystyle="true"><mml:munderover accentunder="false" accent="false"><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>k</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>K</mml:mi></mml:mrow></mml:munderover></mml:mstyle><mml:msub><mml:mrow><mml:mi>&#x003C0;</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mi>N</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>M</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>|</mml:mo><mml:msub><mml:mrow><mml:mi>&#x003BC;</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msubsup><mml:mrow><mml:mi>&#x003C3;</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msubsup></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula></p>
<p>where <italic>p</italic>(<italic>M</italic><sub><italic>i</italic></sub>; &#x003B8;) represents the model density for <italic>M</italic><sub><italic>i</italic></sub> with unknown parameter vector &#x003B8;, <italic>K</italic> is the number of different methylation states (components), <inline-formula><mml:math id="M5"><mml:mi>N</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>M</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>|</mml:mo><mml:msub><mml:mrow><mml:mi>&#x003BC;</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msubsup><mml:mrow><mml:mi>&#x003C3;</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msubsup></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:math></inline-formula> is the probability density function of the <italic>k</italic>th Gaussian component, and the &#x003C0;<sub><italic>k</italic></sub> are the mixing proportions, which satisfy the constraints <inline-formula><mml:math id="M6"><mml:munderover accentunder="false" accent="false"><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>k</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>K</mml:mi></mml:mrow></mml:munderover><mml:msub><mml:mrow><mml:mi>&#x003C0;</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:math></inline-formula> and 0 &#x02264; &#x003C0;<sub><italic>k</italic></sub> &#x02264; 1. The parameter vector &#x003B8; consists of the mixing proportions &#x003C0;<sub><italic>k</italic></sub>, the means &#x003BC;<sub><italic>k</italic></sub> and the standard deviations &#x003C3;<sub><italic>k</italic></sub>, which can be estimated with the expectation-maximization (EM) algorithm.</p>
<p>Next, we describe MGMIN in detail. First, the <italic>M</italic>-values of type I and type II probes are modeled by GMMs separately. Let <inline-formula><mml:math id="M7"><mml:msubsup><mml:mrow><mml:mi>&#x003BC;</mml:mi></mml:mrow><mml:mrow><mml:mi>T</mml:mi></mml:mrow><mml:mrow><mml:mi>S</mml:mi></mml:mrow></mml:msubsup></mml:math></inline-formula> and <inline-formula><mml:math id="M8"><mml:msubsup><mml:mrow><mml:mi>&#x003C3;</mml:mi></mml:mrow><mml:mrow><mml:mi>T</mml:mi></mml:mrow><mml:mrow><mml:mi>S</mml:mi></mml:mrow></mml:msubsup></mml:math></inline-formula> denote the mean and standard deviation of the component for methylation state <italic>S</italic> fitted to probes of type <italic>T</italic>, where <italic>S</italic> &#x02208; (<italic>U, H, F</italic>) and <italic>T</italic> &#x02208; (<italic>I, II</italic>). <italic>K</italic><sub><italic>I</italic></sub> and <italic>K</italic><sub><italic>II</italic></sub> are the numbers of components for type I and type II probes, both of which are set to 3 by default.</p>
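<p>As an illustration of how the parameters in Equation (1) might be estimated, the following Python sketch runs EM for a one-dimensional Gaussian mixture on simulated <italic>M</italic>-values. The function names (<monospace>normal_pdf</monospace>, <monospace>fit_gmm</monospace>), the simulated data and the fixed iteration count are assumptions made for this example and are not taken from the MGMIN implementation; the starting values (&#x02212;4, 0, 4) for the means and (1, 1, 1) for the standard deviations are the defaults reported in this section.</p>
<preformat>
```python
import math
import random


def normal_pdf(x, mu, sigma):
    """Density of a univariate Gaussian with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def fit_gmm(values, mus, sigmas, n_iter=50):
    """Fit a K-component 1-D Gaussian mixture by EM from initial means and SDs."""
    k = len(mus)
    mus, sigmas = list(mus), list(sigmas)
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: responsibility of each component for each observed value.
        resp = []
        for x in values:
            dens = [weights[j] * normal_pdf(x, mus[j], sigmas[j]) for j in range(k)]
            total = sum(dens) or 1e-300  # guard against numerical underflow
            resp.append([d / total for d in dens])
        # M-step: re-estimate mixing proportions, means and standard deviations.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj == 0.0:
                continue  # leave an empty component unchanged
            weights[j] = nj / len(values)
            mus[j] = sum(r[j] * x for r, x in zip(resp, values)) / nj
            var = sum(r[j] * (x - mus[j]) ** 2 for r, x in zip(resp, values)) / nj
            sigmas[j] = max(math.sqrt(var), 1e-3)  # floor keeps the density finite
    return weights, mus, sigmas


# Simulated M-values for the U, H and F states (illustrative only, not real data).
random.seed(0)
values = (
    [random.gauss(-4.0, 0.8) for _ in range(300)]
    + [random.gauss(0.0, 1.0) for _ in range(100)]
    + [random.gauss(4.0, 0.8) for _ in range(300)]
)

weights, mus, sigmas = fit_gmm(values, (-4.0, 0.0, 4.0), (1.0, 1.0, 1.0))
print(weights, mus, sigmas)
```
</preformat>
<p>In this sketch each probe could then be assigned to the component with the largest responsibility, which corresponds to a maximum probability criterion.</p>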
<p>Second, each probe is assigned to hypomethylated (<italic>U</italic><sub><italic>T</italic></sub>), hemimethylated (<italic>H</italic><sub><italic>T</italic></sub>), or hypermethylated (<italic>F</italic><sub><italic>T</italic></sub>) states by using the maximum probability criterion. Let <inline-formula><mml:math id="M9"><mml:msubsup><mml:mrow><mml:mi>U</mml:mi></mml:mrow><mml:mrow><mml:mi>T</mml:mi></mml:mrow><mml:mrow><mml:mi>L</mml:mi></mml:mrow></mml:msubsup></mml:math></inline-formula> (<inline-formula><mml:math id="M10"><mml:msubsup><mml:mrow><mml:mi>U</mml:mi></mml:mrow><mml:mrow><mml:mi>T</mml:mi></mml:mrow><mml:mrow><mml:mi>R</mml:mi></mml:mrow></mml:msubsup></mml:math></inline-formula>) denote the <italic>U</italic><sub><italic>T</italic></sub> probes with <italic>M</italic>-values smaller (larger) than <inline-formula><mml:math id="M11"><mml:msubsup><mml:mrow><mml:mi>&#x003BC;</mml:mi></mml:mrow><mml:mrow><mml:mi>T</mml:mi></mml:mrow><mml:mrow><mml:mi>U</mml:mi></mml:mrow></mml:msubsup></mml:math></inline-formula>, and let <inline-formula><mml:math id="M12"><mml:msubsup><mml:mrow><mml:mi>F</mml:mi></mml:mrow><mml:mrow><mml:mi>T</mml:mi></mml:mrow><mml:mrow><mml:mi>L</mml:mi></mml:mrow></mml:msubsup></mml:math></inline-formula> (<inline-formula><mml:math id="M13"><mml:msubsup><mml:mrow><mml:mi>F</mml:mi></mml:mrow><mml:mrow><mml:mi>T</mml:mi></mml:mrow><mml:mrow><mml:mi>R</mml:mi></mml:mrow></mml:msubsup></mml:math></inline-formula>) represent the <italic>F</italic><sub><italic>T</italic></sub> probes with <italic>M</italic>-values smaller (larger) than <inline-formula><mml:math id="M14"><mml:msubsup><mml:mrow><mml:mi>&#x003BC;</mml:mi></mml:mrow><mml:mrow><mml:mi>T</mml:mi></mml:mrow><mml:mrow><mml:mi>F</mml:mi></mml:mrow></mml:msubsup></mml:math></inline-formula> where <italic>T</italic> &#x02208; (<italic>I, II</italic>). 
Then, we calculate the probabilities of <inline-formula><mml:math id="M15"><mml:msubsup><mml:mrow><mml:mi>U</mml:mi></mml:mrow><mml:mrow><mml:mi>I</mml:mi><mml:mi>I</mml:mi></mml:mrow><mml:mrow><mml:mi>L</mml:mi></mml:mrow></mml:msubsup></mml:math></inline-formula> probes, i.e.,
<disp-formula id="E5"><label>(2)</label><mml:math id="M16"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:mi>p</mml:mi><mml:mo>=</mml:mo><mml:mi>P</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>M</mml:mi></mml:mrow><mml:mrow><mml:msubsup><mml:mrow><mml:mi>U</mml:mi></mml:mrow><mml:mrow><mml:mi>I</mml:mi><mml:mi>I</mml:mi></mml:mrow><mml:mrow><mml:mi>L</mml:mi></mml:mrow></mml:msubsup></mml:mrow></mml:msub><mml:mo>|</mml:mo><mml:msubsup><mml:mrow><mml:mi>&#x003BC;</mml:mi></mml:mrow><mml:mrow><mml:mi>I</mml:mi><mml:mi>I</mml:mi></mml:mrow><mml:mrow><mml:mi>U</mml:mi></mml:mrow></mml:msubsup><mml:mo>,</mml:mo><mml:msup><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msubsup><mml:mrow><mml:mi>&#x003C3;</mml:mi></mml:mrow><mml:mrow><mml:mi>I</mml:mi><mml:mi>I</mml:mi></mml:mrow><mml:mrow><mml:mi>U</mml:mi></mml:mrow></mml:msubsup></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
where P represents the cumulative distribution function of the Gaussian component. These probabilities are transformed back to quantiles (<italic>M</italic>-value) by using the parameters <inline-formula><mml:math id="M17"><mml:msubsup><mml:mrow><mml:mi>&#x003BC;</mml:mi></mml:mrow><mml:mrow><mml:mi>I</mml:mi></mml:mrow><mml:mrow><mml:mi>U</mml:mi></mml:mrow></mml:msubsup></mml:math></inline-formula> and <inline-formula><mml:math id="M18"><mml:msubsup><mml:mrow><mml:mi>&#x003C3;</mml:mi></mml:mrow><mml:mrow><mml:mi>I</mml:mi></mml:mrow><mml:mrow><mml:mi>U</mml:mi></mml:mrow></mml:msubsup></mml:math></inline-formula> of type I probes, i.e.,
<disp-formula id="E6"><label>(3)</label><mml:math id="M19"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:mi>q</mml:mi><mml:mo>=</mml:mo><mml:msup><mml:mrow><mml:mi>P</mml:mi></mml:mrow><mml:mrow><mml:mo>-</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msup><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>p</mml:mi><mml:mo>|</mml:mo><mml:msubsup><mml:mrow><mml:mi>&#x003BC;</mml:mi></mml:mrow><mml:mrow><mml:mi>I</mml:mi></mml:mrow><mml:mrow><mml:mi>U</mml:mi></mml:mrow></mml:msubsup><mml:mo>,</mml:mo><mml:msup><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msubsup><mml:mrow><mml:mi>&#x003C3;</mml:mi></mml:mrow><mml:mrow><mml:mi>I</mml:mi></mml:mrow><mml:mrow><mml:mi>U</mml:mi></mml:mrow></mml:msubsup></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
where <italic>P</italic><sup>&#x02212;1</sup> returns the value of the inverse cumulative distribution function given the probability <italic>p</italic>, and <italic>q</italic> is the normalized <italic>M</italic>-value for <inline-formula><mml:math id="M20"><mml:msubsup><mml:mrow><mml:mi>U</mml:mi></mml:mrow><mml:mrow><mml:mi>I</mml:mi><mml:mi>I</mml:mi></mml:mrow><mml:mrow><mml:mi>L</mml:mi></mml:mrow></mml:msubsup></mml:math></inline-formula>. An analogous operation is performed on the <inline-formula><mml:math id="M21"><mml:msubsup><mml:mrow><mml:mi>F</mml:mi></mml:mrow><mml:mrow><mml:mi>I</mml:mi><mml:mi>I</mml:mi></mml:mrow><mml:mrow><mml:mi>R</mml:mi></mml:mrow></mml:msubsup></mml:math></inline-formula> probes.</p>
<p>Then, we merge the <inline-formula><mml:math id="M22"><mml:msubsup><mml:mrow><mml:mi>U</mml:mi></mml:mrow><mml:mrow><mml:mi>I</mml:mi><mml:mi>I</mml:mi></mml:mrow><mml:mrow><mml:mi>R</mml:mi></mml:mrow></mml:msubsup></mml:math></inline-formula>, <italic>H</italic><sub><italic>II</italic></sub>, and <inline-formula><mml:math id="M23"><mml:msubsup><mml:mrow><mml:mi>F</mml:mi></mml:mrow><mml:mrow><mml:mi>I</mml:mi><mml:mi>I</mml:mi></mml:mrow><mml:mrow><mml:mi>L</mml:mi></mml:mrow></mml:msubsup></mml:math></inline-formula> probes into one set <italic>G</italic>, on which a conformal (shift &#x0002B; dilation) transformation is performed. We define <italic>minG</italic> &#x0003D; min{<italic>M</italic><sub><italic>G</italic></sub>}, <italic>maxG</italic> &#x0003D; max{<italic>M</italic><sub><italic>G</italic></sub>} and <inline-formula><mml:math id="M24"><mml:msubsup><mml:mrow><mml:mo>&#x00394;</mml:mo></mml:mrow><mml:mrow><mml:mi>G</mml:mi></mml:mrow><mml:mrow><mml:mi>M</mml:mi></mml:mrow></mml:msubsup><mml:mo>=</mml:mo><mml:mi>m</mml:mi><mml:mi>a</mml:mi><mml:mi>x</mml:mi><mml:mi>G</mml:mi><mml:mo>-</mml:mo><mml:mi>m</mml:mi><mml:mi>i</mml:mi><mml:mi>n</mml:mi><mml:mi>G</mml:mi></mml:math></inline-formula>.
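<p>Equations (2) and (3) can be sketched with the <monospace>NormalDist</monospace> class from the Python standard library. The helper name <monospace>map_tail</monospace>, the component parameters and the probe values are hypothetical and chosen only for illustration.</p>
<preformat>
```python
from statistics import NormalDist


def map_tail(m_values, mu_src, sigma_src, mu_dst, sigma_dst):
    """Map values through the source Gaussian CDF, then the target inverse CDF."""
    src = NormalDist(mu_src, sigma_src)
    dst = NormalDist(mu_dst, sigma_dst)
    # inv_cdf requires a probability strictly between 0 and 1, so values that
    # lie extremely far out in the tail would need separate handling.
    return [dst.inv_cdf(src.cdf(m)) for m in m_values]


# Hypothetical fitted parameters of the hypomethylated (U) components.
mu_u_ii, sigma_u_ii = -3.0, 0.9  # type II
mu_u_i, sigma_u_i = -4.0, 1.1    # type I

# U_II^L: type II probes assigned to U with M-values below the type II U mean.
m_u_ii_left = [-5.2, -4.1, -3.4, -3.05]

q = map_tail(m_u_ii_left, mu_u_ii, sigma_u_ii, mu_u_i, sigma_u_i)
print(q)
```
</preformat>
<p>Because both components are Gaussian, this composition of a cumulative distribution function and an inverse cumulative distribution function is equivalent to the linear map <italic>q</italic> = &#x003BC;<sub><italic>I</italic></sub> &#x0002B; &#x003C3;<sub><italic>I</italic></sub>(<italic>M</italic> &#x02212; &#x003BC;<sub><italic>II</italic></sub>)/&#x003C3;<sub><italic>II</italic></sub>, which provides a convenient check on the output.</p>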
Similarly, the minimum value of <inline-formula><mml:math id="M25"><mml:msubsup><mml:mrow><mml:mi>F</mml:mi></mml:mrow><mml:mrow><mml:mi>I</mml:mi><mml:mi>I</mml:mi></mml:mrow><mml:mrow><mml:mi>R</mml:mi></mml:mrow></mml:msubsup></mml:math></inline-formula> and the maximum value of <inline-formula><mml:math id="M26"><mml:msubsup><mml:mrow><mml:mi>U</mml:mi></mml:mrow><mml:mrow><mml:mi>I</mml:mi><mml:mi>I</mml:mi></mml:mrow><mml:mrow><mml:mi>L</mml:mi></mml:mrow></mml:msubsup></mml:math></inline-formula> are also identified, i.e., <inline-formula><mml:math id="M27"><mml:mi>m</mml:mi><mml:mi>i</mml:mi><mml:mi>n</mml:mi><mml:mi>F</mml:mi><mml:mo>=</mml:mo><mml:mo class="qopname">min</mml:mo><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:msubsup><mml:mrow><mml:mi>F</mml:mi></mml:mrow><mml:mrow><mml:mi>I</mml:mi><mml:mi>I</mml:mi></mml:mrow><mml:mrow><mml:mi>R</mml:mi></mml:mrow></mml:msubsup></mml:mrow><mml:mo>}</mml:mo></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M28"><mml:mi>m</mml:mi><mml:mi>a</mml:mi><mml:mi>x</mml:mi><mml:mi>U</mml:mi><mml:mo>=</mml:mo><mml:mo class="qopname">max</mml:mo><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:msubsup><mml:mrow><mml:mi>U</mml:mi></mml:mrow><mml:mrow><mml:mi>I</mml:mi><mml:mi>I</mml:mi></mml:mrow><mml:mrow><mml:mi>L</mml:mi></mml:mrow></mml:msubsup></mml:mrow><mml:mo>}</mml:mo></mml:mrow></mml:math></inline-formula>. Two distance values can be calculated as
<disp-formula id="E7"><mml:math id="M29"><mml:msub><mml:mrow><mml:mo>&#x00394;</mml:mo></mml:mrow><mml:mrow><mml:mi>U</mml:mi><mml:mi>G</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mi>m</mml:mi><mml:mi>i</mml:mi><mml:mi>n</mml:mi><mml:mi>G</mml:mi><mml:mo>-</mml:mo><mml:mi>m</mml:mi><mml:mi>a</mml:mi><mml:mi>x</mml:mi><mml:mi>U</mml:mi></mml:math></disp-formula>
<disp-formula id="E8"><mml:math id="M30"><mml:msub><mml:mrow><mml:mo>&#x00394;</mml:mo></mml:mrow><mml:mrow><mml:mi>G</mml:mi><mml:mi>F</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mi>m</mml:mi><mml:mi>i</mml:mi><mml:mi>n</mml:mi><mml:mi>F</mml:mi><mml:mo>-</mml:mo><mml:mi>m</mml:mi><mml:mi>a</mml:mi><mml:mi>x</mml:mi><mml:mi>G</mml:mi></mml:math></disp-formula>
The new normalized maximum and minimum values of the probes in <italic>G</italic> are required to satisfy
<disp-formula id="E9"><mml:math id="M31"><mml:msup><mml:mrow><mml:mi>m</mml:mi><mml:mi>a</mml:mi><mml:mi>x</mml:mi><mml:mi>G</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x02032;</mml:mi></mml:mrow></mml:msup><mml:mo>=</mml:mo><mml:mo class="qopname">min</mml:mo><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:msup><mml:mrow><mml:mstyle displaystyle="true"><mml:msubsup><mml:mrow><mml:mi>F</mml:mi></mml:mrow><mml:mrow><mml:mi>I</mml:mi><mml:mi>I</mml:mi></mml:mrow><mml:mrow><mml:mi>R</mml:mi></mml:mrow></mml:msubsup></mml:mstyle></mml:mrow><mml:mrow><mml:mi>&#x02032;</mml:mi></mml:mrow></mml:msup></mml:mrow><mml:mo>}</mml:mo></mml:mrow><mml:mo>-</mml:mo><mml:msub><mml:mrow><mml:mo>&#x00394;</mml:mo></mml:mrow><mml:mrow><mml:mi>G</mml:mi><mml:mi>F</mml:mi></mml:mrow></mml:msub></mml:math></disp-formula>
<disp-formula id="E10"><mml:math id="M32"><mml:msup><mml:mrow><mml:mi>m</mml:mi><mml:mi>i</mml:mi><mml:mi>n</mml:mi><mml:mi>G</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x02032;</mml:mi></mml:mrow></mml:msup><mml:mo>=</mml:mo><mml:mo class="qopname">max</mml:mo><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:msup><mml:mrow><mml:mstyle displaystyle="true"><mml:msubsup><mml:mrow><mml:mi>U</mml:mi></mml:mrow><mml:mrow><mml:mi>I</mml:mi><mml:mi>I</mml:mi></mml:mrow><mml:mrow><mml:mi>L</mml:mi></mml:mrow></mml:msubsup></mml:mstyle></mml:mrow><mml:mrow><mml:mi>&#x02032;</mml:mi></mml:mrow></mml:msup></mml:mrow><mml:mo>}</mml:mo></mml:mrow><mml:mo>&#x0002B;</mml:mo><mml:msub><mml:mrow><mml:mo>&#x00394;</mml:mo></mml:mrow><mml:mrow><mml:mi>U</mml:mi><mml:mi>G</mml:mi></mml:mrow></mml:msub></mml:math></disp-formula>
where <inline-formula><mml:math id="M33"><mml:msup><mml:mrow><mml:msubsup><mml:mrow><mml:mi>F</mml:mi></mml:mrow><mml:mrow><mml:mi>I</mml:mi><mml:mi>I</mml:mi></mml:mrow><mml:mrow><mml:mi>R</mml:mi></mml:mrow></mml:msubsup></mml:mrow><mml:mrow><mml:mi>&#x02032;</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula> and <inline-formula><mml:math id="M34"><mml:msup><mml:mrow><mml:msubsup><mml:mrow><mml:mi>U</mml:mi></mml:mrow><mml:mrow><mml:mi>I</mml:mi><mml:mi>I</mml:mi></mml:mrow><mml:mrow><mml:mi>L</mml:mi></mml:mrow></mml:msubsup></mml:mrow><mml:mrow><mml:mi>&#x02032;</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula> are new normalized values for <inline-formula><mml:math id="M35"><mml:msubsup><mml:mrow><mml:mi>F</mml:mi></mml:mrow><mml:mrow><mml:mi>I</mml:mi><mml:mi>I</mml:mi></mml:mrow><mml:mrow><mml:mi>R</mml:mi></mml:mrow></mml:msubsup></mml:math></inline-formula> and <inline-formula><mml:math id="M36"><mml:msubsup><mml:mrow><mml:mi>U</mml:mi></mml:mrow><mml:mrow><mml:mi>I</mml:mi><mml:mi>I</mml:mi></mml:mrow><mml:mrow><mml:mi>L</mml:mi></mml:mrow></mml:msubsup></mml:math></inline-formula>, respectively. So the new normalized range value of set <italic>G</italic> is <inline-formula><mml:math id="M37"><mml:msup><mml:mrow><mml:msubsup><mml:mrow><mml:mo>&#x00394;</mml:mo></mml:mrow><mml:mrow><mml:mi>G</mml:mi></mml:mrow><mml:mrow><mml:mi>M</mml:mi></mml:mrow></mml:msubsup></mml:mrow><mml:mrow><mml:mi>&#x02032;</mml:mi></mml:mrow></mml:msup><mml:mo>=</mml:mo><mml:msup><mml:mrow><mml:mi>m</mml:mi><mml:mi>a</mml:mi><mml:mi>x</mml:mi><mml:mi>G</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x02032;</mml:mi></mml:mrow></mml:msup><mml:mo>-</mml:mo><mml:msup><mml:mrow><mml:mi>m</mml:mi><mml:mi>i</mml:mi><mml:mi>n</mml:mi><mml:mi>G</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x02032;</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula>. 
The normalized <italic>M</italic>-values of set <italic>G</italic>, <inline-formula><mml:math id="M38"><mml:msup><mml:mrow><mml:msub><mml:mrow><mml:mi>M</mml:mi></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>G</mml:mi></mml:mrow><mml:mrow><mml:mi>I</mml:mi><mml:mi>I</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:msub></mml:mrow><mml:mrow><mml:mi>&#x02032;</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula>, are calculated as
<disp-formula id="E11"><label>(4)</label><mml:math id="M39"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:msup><mml:mrow><mml:msub><mml:mrow><mml:mi>M</mml:mi></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>G</mml:mi></mml:mrow><mml:mrow><mml:mi>I</mml:mi><mml:mi>I</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:msub></mml:mrow><mml:mrow><mml:mi>&#x02032;</mml:mi></mml:mrow></mml:msup><mml:mo>=</mml:mo><mml:msup><mml:mrow><mml:mi>m</mml:mi><mml:mi>i</mml:mi><mml:mi>n</mml:mi><mml:mi>G</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x02032;</mml:mi></mml:mrow></mml:msup><mml:mo>&#x0002B;</mml:mo><mml:msub><mml:mrow><mml:mi>d</mml:mi></mml:mrow><mml:mrow><mml:mi>f</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>M</mml:mi></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>G</mml:mi></mml:mrow><mml:mrow><mml:mi>I</mml:mi><mml:mi>I</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:msub><mml:mo>-</mml:mo><mml:mi>m</mml:mi><mml:mi>i</mml:mi><mml:mi>n</mml:mi><mml:mi>G</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
where <inline-formula><mml:math id="M40"><mml:msub><mml:mrow><mml:mi>d</mml:mi></mml:mrow><mml:mrow><mml:mi>f</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:msup><mml:mrow><mml:msubsup><mml:mrow><mml:mo>&#x00394;</mml:mo></mml:mrow><mml:mrow><mml:mi>G</mml:mi></mml:mrow><mml:mrow><mml:mi>M</mml:mi></mml:mrow></mml:msubsup></mml:mrow><mml:mrow><mml:mi>&#x02032;</mml:mi></mml:mrow></mml:msup><mml:mo>/</mml:mo><mml:msubsup><mml:mrow><mml:mo>&#x00394;</mml:mo></mml:mrow><mml:mrow><mml:mi>G</mml:mi></mml:mrow><mml:mrow><mml:mi>M</mml:mi></mml:mrow></mml:msubsup></mml:math></inline-formula> is the dilation factor. So, the normalized <italic>M</italic>-values for type II probes consist of <italic>q</italic> for <inline-formula><mml:math id="M41"><mml:msubsup><mml:mrow><mml:mi>U</mml:mi></mml:mrow><mml:mrow><mml:mi>I</mml:mi><mml:mi>I</mml:mi></mml:mrow><mml:mrow><mml:mi>L</mml:mi></mml:mrow></mml:msubsup></mml:math></inline-formula>, <inline-formula><mml:math id="M42"><mml:msup><mml:mrow><mml:msub><mml:mrow><mml:mi>M</mml:mi></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>G</mml:mi></mml:mrow><mml:mrow><mml:mi>I</mml:mi><mml:mi>I</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:msub></mml:mrow><mml:mrow><mml:mi>&#x02032;</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula>, and <italic>q</italic> for <inline-formula><mml:math id="M43"><mml:msubsup><mml:mrow><mml:mi>F</mml:mi></mml:mrow><mml:mrow><mml:mi>I</mml:mi><mml:mi>I</mml:mi></mml:mrow><mml:mrow><mml:mi>R</mml:mi></mml:mrow></mml:msubsup></mml:math></inline-formula>.
<disp-formula id="E12"><mml:math id="M44"><mml:mrow><mml:msub><mml:mi>M</mml:mi><mml:mrow><mml:mi>I</mml:mi><mml:mi>I</mml:mi></mml:mrow></mml:msub><mml:msup><mml:mrow></mml:mrow><mml:mo>&#x02032;</mml:mo></mml:msup><mml:mo>=</mml:mo><mml:mo stretchy='false'>(</mml:mo><mml:msub><mml:mi>q</mml:mi><mml:mrow><mml:msubsup><mml:mi>U</mml:mi><mml:mrow><mml:mi>I</mml:mi><mml:mi>I</mml:mi></mml:mrow><mml:mi>L</mml:mi></mml:msubsup></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:msub><mml:mi>M</mml:mi><mml:mrow><mml:msub><mml:mi>G</mml:mi><mml:mrow><mml:mi>I</mml:mi><mml:mi>I</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:msub><mml:msup><mml:mrow></mml:mrow><mml:mo>&#x02032;</mml:mo></mml:msup><mml:mo>,</mml:mo><mml:msub><mml:mi>q</mml:mi><mml:mrow><mml:msubsup><mml:mi>F</mml:mi><mml:mrow><mml:mi>I</mml:mi><mml:mi>I</mml:mi></mml:mrow><mml:mi>R</mml:mi></mml:msubsup></mml:mrow></mml:msub><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:math></disp-formula>
Lastly, the normalized <italic>M</italic>-values are transformed to &#x003B2;-values.</p>
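<p>As a minimal illustration of Equation (4) and of the final conversion step, the following Python sketch dilates the <italic>M</italic>-values of the middle group and maps the result to &#x003B2;-values. It is not the MGMIN implementation: the function and argument names are hypothetical, the interval bounds and widths are passed in as plain numbers, and the conversion assumes the standard relation &#x003B2; &#x0003D; 2<sup><italic>M</italic></sup>/(2<sup><italic>M</italic></sup> &#x0002B; 1) without an offset term.</p>
<preformat>
```python
import numpy as np


def dilate_middle(m_g2, min_g, delta_g, min_g_new, delta_g_new):
    """Apply Eq. (4): M' = minG' + d_f * (M - minG), with d_f = delta' / delta."""
    d_f = delta_g_new / delta_g  # dilation factor
    return min_g_new + d_f * (np.asarray(m_g2, dtype=float) - min_g)


def m_to_beta(m):
    """Convert M-values to beta-values via beta = 2^M / (2^M + 1) (assumed, no offset)."""
    m = np.asarray(m, dtype=float)
    return 2.0 ** m / (2.0 ** m + 1.0)


# Hypothetical numbers: the middle group starts at -2 with width 4 before
# the correction and is mapped onto an interval starting at -3 with width 6.
m_mid = np.array([-2.0, 0.0, 2.0])
m_new = dilate_middle(m_mid, min_g=-2.0, delta_g=4.0, min_g_new=-3.0, delta_g_new=6.0)
beta_new = m_to_beta(m_new)
print(m_new)
print(beta_new)
```
</preformat>
<p>In this toy example the dilation factor is 1.5, so the lower end of the interval is mapped onto the new minimum and the spacing between values is stretched by the same factor before the conversion to &#x003B2;-values.</p>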
<p>Several points are worth noting: (i) the initial values for &#x003BC; and &#x003C3; in the EM algorithm are set to (&#x02212;4, 0, 4) and (1, 1, 1), and small perturbations to these initial values do not affect the final model because MGMIN captures the natural property of the <italic>M</italic>-values of DNA methylation; (ii) <italic>K</italic><sub><italic>I</italic></sub> is changed to 4 automatically when <inline-formula><mml:math id="M45"><mml:msubsup><mml:mrow><mml:mi>&#x003BC;</mml:mi></mml:mrow><mml:mrow><mml:mi>I</mml:mi></mml:mrow><mml:mrow><mml:mi>F</mml:mi></mml:mrow></mml:msubsup><mml:mo>-</mml:mo><mml:msubsup><mml:mrow><mml:mi>&#x003C3;</mml:mi></mml:mrow><mml:mrow><mml:mi>I</mml:mi></mml:mrow><mml:mrow><mml:mi>F</mml:mi></mml:mrow></mml:msubsup></mml:math></inline-formula> is smaller than <inline-formula><mml:math id="M46"><mml:msubsup><mml:mrow><mml:mi>&#x003BC;</mml:mi></mml:mrow><mml:mrow><mml:mi>I</mml:mi><mml:mi>I</mml:mi></mml:mrow><mml:mrow><mml:mi>F</mml:mi></mml:mrow></mml:msubsup><mml:mo>-</mml:mo><mml:msubsup><mml:mrow><mml:mi>&#x003C3;</mml:mi></mml:mrow><mml:mrow><mml:mi>I</mml:mi><mml:mi>I</mml:mi></mml:mrow><mml:mrow><mml:mi>F</mml:mi></mml:mrow></mml:msubsup></mml:math></inline-formula>, in order to ensure that <inline-formula><mml:math id="M47"><mml:msubsup><mml:mrow><mml:mi>&#x003BC;</mml:mi></mml:mrow><mml:mrow><mml:mi>I</mml:mi></mml:mrow><mml:mrow><mml:mi>F</mml:mi></mml:mrow></mml:msubsup></mml:math></inline-formula> is always larger than <inline-formula><mml:math id="M48"><mml:msubsup><mml:mrow><mml:mi>&#x003BC;</mml:mi></mml:mrow><mml:mrow><mml:mi>I</mml:mi><mml:mi>I</mml:mi></mml:mrow><mml:mrow><mml:mi>F</mml:mi></mml:mrow></mml:msubsup></mml:math></inline-formula> and to avoid an unexpected peak in the transformed <italic>M</italic>-values of hypermethylated type II probes; (iii) if <italic>K</italic><sub><italic>I</italic></sub> &#x0003D; 4, 
<italic>F</italic><sub><italic>I</italic></sub> is the set of probes belonging to the component with the largest &#x003BC;, <italic>U</italic><sub><italic>I</italic></sub> contains the probes belonging to the component with the smallest &#x003BC;, and the other two components are assigned to <italic>H</italic><sub><italic>I</italic></sub>; (iv) no thresholds need to be set by default or estimated manually to distinguish the three states of DNA methylation.</p></sec>
<sec>
<title>2.3. Datasets</title>
<p>We selected the following public 450K datasets:</p>
<p>Dataset 1: GSE29290, downloaded from GEO and considered in Dedeurwaerder et al. (<xref ref-type="bibr" rid="B2">2011</xref>). We used the three replicates (GSM815136, GSM815137, and GSM815138) from the HCT116WT cell line and the matched bisulfite pyrosequencing (BPS) data for nine type II probes of sample GSM815138 (r3) (Table 1 in Dedeurwaerder et al., <xref ref-type="bibr" rid="B2">2011</xref>) to evaluate the performance of different methods.</p>
<p>Dataset 2: GSE38268, downloaded from GEO and considered in Lechner et al. (<xref ref-type="bibr" rid="B6">2013</xref>), which consists of 6 fresh frozen HNC samples. We selected the same 5 samples as Teschendorff et al. (<xref ref-type="bibr" rid="B12">2012</xref>), of which 2 were HPV&#x0002B; and 3 HPV&#x02212; (GSM937820 to GSM937824).</p>
<p>Dataset 3: GSE38266, downloaded from GEO and considered in Lechner et al. (<xref ref-type="bibr" rid="B6">2013</xref>), which contains 21 FFPE HPV&#x0002B; HNSCC samples and 21 FFPE HPV&#x02212; HNSCC samples. Note that the overall quality of dataset GSE38266 is not high.</p>
<p>Dataset 4: GSE95036 downloaded from GEO considered in Degli Esposti et al. (<xref ref-type="bibr" rid="B3">2017</xref>) which contains 6 HPV&#x0002B; HNC samples and 5 HPV&#x02212; HNC samples.</p></sec></sec>
<sec sec-type="results" id="s3">
<title>3. Results</title>
<sec>
<title>3.1. MGMIN Needs No Default Initial Values of Parameters</title>
<p>MGMIN follows the mixture-model approach of BMIQ but fits Gaussian mixture models to <italic>M</italic>-values instead of beta-mixture models to &#x003B2;-values. MGMIN also uses quantile information to transform the <italic>M</italic>-values of the type II probes into a distribution that is comparable with that of type I probes. Because MGMIN relies on the inherent Gaussian mixture distributions of the <italic>M</italic>-values of type I and type II probes, no parameters have to be set manually, in contrast to the default breakpoints in BMIQ. Thus, MGMIN needs less manual intervention than BMIQ. However, MGMIN is slightly inferior to BMIQ on Dataset 3 (<xref ref-type="table" rid="T1">Table 1</xref>), possibly because of the overall low quality of that dataset. Note that the PPV of BMIQ on Dataset 3 is also lower than that obtained with no normalization (RAW).</p>
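<p>The PPV estimates in <xref ref-type="table" rid="T1">Table 1</xref> follow directly from the reported counts, as the short Python sketch below illustrates. The counts are copied from Table 1 (the values from this reanalysis, not those in parentheses); the variable and function names are illustrative only.</p>
<preformat>
```python
# Counts copied from Table 1 (this reanalysis, not the values in parentheses).
n_dmp = {"Raw": 51, "BMIQ": 239, "MGMIN": 220}
n_tp = {
    "GSE38266": {"Raw": 16, "BMIQ": 55, "MGMIN": 37},
    "GSE95036": {"Raw": 3, "BMIQ": 13, "MGMIN": 27},
}


def ppv(n_true_positive, n_called):
    """Positive predictive value: validated DMPs divided by called DMPs."""
    return n_true_positive / n_called


# PPV per test dataset and method, rounded to two decimals as in Table 1.
ppv_table = {
    dataset: {method: round(ppv(tp, n_dmp[method]), 2) for method, tp in counts.items()}
    for dataset, counts in n_tp.items()
}
print(ppv_table)
```
</preformat>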
<table-wrap position="float" id="T1">
<label>Table 1</label>
<caption><p>Comparison of MGMIN and BMIQ in detecting differentially methylated probes (DMPs) associated with HPV status: the number of DMPs (nDMP; Dataset 2), the number of validated DMPs (nTP; Dataset 3: GSE38266 and Dataset 4: GSE95036), and the corresponding estimates of the positive predictive value (PPV = nTP/nDMP).</p></caption>
<table frame="hsides" rules="groups">
<thead><tr>
<th valign="top" align="left"><bold>Metric</bold></th>
<th valign="top" align="center"><bold>Raw</bold></th>
<th valign="top" align="center"><bold>BMIQ</bold></th>
<th valign="top" align="center"><bold>MGMIN</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">nDMP</td>
<td valign="top" align="center">51 (51<xref ref-type="table-fn" rid="TN1"><sup>a</sup></xref>)</td>
<td valign="top" align="center">239 (252<xref ref-type="table-fn" rid="TN1"><sup>a</sup></xref>)</td>
<td valign="top" align="center">220</td>
</tr>
<tr>
<td valign="top" align="left">nTP (GSE38266)</td>
<td valign="top" align="center">16 (13<xref ref-type="table-fn" rid="TN1"><sup>a</sup></xref>)</td>
<td valign="top" align="center">55 (51<xref ref-type="table-fn" rid="TN1"><sup>a</sup></xref>)</td>
<td valign="top" align="center">37</td>
</tr>
<tr>
<td valign="top" align="left">PPV (GSE38266)</td>
<td valign="top" align="center">0.31 (0.25<xref ref-type="table-fn" rid="TN1"><sup>a</sup></xref>)</td>
<td valign="top" align="center">0.23 (0.20<xref ref-type="table-fn" rid="TN1"><sup>a</sup></xref>)</td>
<td valign="top" align="center">0.17</td>
</tr>
<tr>
<td valign="top" align="left">nTP (GSE95036)</td>
<td valign="top" align="center">3</td>
<td valign="top" align="center">13</td>
<td valign="top" align="center">27</td>
</tr>
<tr>
<td valign="top" align="left">PPV (GSE95036)</td>
<td valign="top" align="center">0.06</td>
<td valign="top" align="center">0.05</td>
<td valign="top" align="center">0.12</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<fn id="TN1"><label>a</label><p><italic>Values reported in Teschendorff et al. (<xref ref-type="bibr" rid="B12">2012</xref>)</italic>.</p></fn>
</table-wrap-foot>
</table-wrap></sec>
<sec>
<title>3.2. MGMIN Reduces Technical Variation</title>
<p>MGMIN was applied to Dataset 1 to assess its ability to improve reproducibility. The standard deviation (SD) of each probe across the three replicates was computed with no normalization (RAW), with BMIQ, and with MGMIN. As can be seen in <xref ref-type="fig" rid="F1">Figure 1</xref>, both MGMIN and BMIQ made the density curves of the three replicates nearly coincide and markedly reduced the technical variation compared with no normalization. The standard deviation of type II probes adjusted by MGMIN is smaller than that obtained with BMIQ (<xref ref-type="fig" rid="F2">Figure 2</xref>). MGMIN also markedly reduced the average absolute difference in &#x003B2;-values of type II probes between the two samples in each of the three pairs of replicates (<xref ref-type="fig" rid="F3">Figure 3</xref>).</p>
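<p>The two reproducibility measures used here can be sketched in Python as follows. This is an illustrative sketch, not the code used for the figures: the &#x003B2;-values are hypothetical, the function names are our own, and the use of the sample standard deviation (<monospace>ddof=1</monospace>) is an assumption.</p>
<preformat>
```python
from itertools import combinations

import numpy as np


def per_probe_sd(beta):
    """Sample SD of each probe (row) across replicates (columns)."""
    return np.std(np.asarray(beta, dtype=float), axis=1, ddof=1)


def pairwise_mean_abs_diff(beta):
    """Mean absolute beta difference for every pair of replicate columns."""
    beta = np.asarray(beta, dtype=float)
    return {
        (i, j): float(np.mean(np.abs(beta[:, i] - beta[:, j])))
        for i, j in combinations(range(beta.shape[1]), 2)
    }


# Hypothetical beta-values: 4 probes (rows) x 3 replicates (columns).
beta = np.array([
    [0.10, 0.12, 0.11],
    [0.50, 0.55, 0.45],
    [0.90, 0.88, 0.92],
    [0.30, 0.30, 0.30],
])
sd = per_probe_sd(beta)
diffs = pairwise_mean_abs_diff(beta)
print(sd)
print(diffs)
```
</preformat>
<p>Applying both functions to the raw and to the normalized &#x003B2;-values of the type II probes would give the quantities compared between methods; smaller values indicate better agreement between replicates.</p>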
<fig id="F1" position="float">
<label>Figure 1</label>
<caption><p>The density curves of &#x003B2;-values for the three replicates in Dataset 1. The left panel is for the case of raw data with no normalization, middle panel for BMIQ and right panel for MGMIN.</p></caption>
<graphic xlink:href="fgene-11-538492-g0001.tif"/>
</fig>
<fig id="F2" position="float">
<label>Figure 2</label>
<caption><p>Boxplots of the standard deviations of &#x003B2;-values across the three replicates in Dataset 1, for raw &#x003B2;-values (RAW), &#x003B2;-values normalized by BMIQ (BMIQ), and &#x003B2;-values normalized by MGMIN (MGMIN). RAW-1 denotes the raw values of type I probes, RAW-2 the raw values of type II probes, and so on.</p></caption>
<graphic xlink:href="fgene-11-538492-g0002.tif"/>
</fig>
<fig id="F3" position="float">
<label>Figure 3</label>
<caption><p>Barplots of the average absolute difference in &#x003B2;-values of type II probes between two samples in each of the three pairs of the three replicates in Dataset 1.</p></caption>
<graphic xlink:href="fgene-11-538492-g0003.tif"/>
</fig></sec>
<sec>
<title>3.3. MGMIN Reduces Probe Design Bias</title>
<p>MGMIN reduces the probe design bias by correcting the <italic>M</italic>-values of the type II probes so that the distribution curves of the <italic>M</italic>-values of type I and type II probes show similar dynamic ranges and peaks (<xref ref-type="fig" rid="F4">Figure 4</xref>). Dedeurwaerder et al. (<xref ref-type="bibr" rid="B2">2011</xref>) provided &#x003B2;-values obtained by bisulfite pyrosequencing for nine type II probes in sample GSM815138 (r3), which can be used as a gold standard to evaluate the performance of different correction methods. Hence, we compared the values of these nine type II probes in the 450K array after normalization by MGMIN and by BMIQ. As shown in <xref ref-type="fig" rid="F5">Figure 5</xref>, although MGMIN performed slightly worse than BMIQ in terms of the maximum absolute deviation from the BPS data, it reduced the type II bias more than BMIQ and the raw data in terms of the mean and root mean square error (RMSE) of the absolute deviation from the matched BPS values.</p>
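<p>The three summary statistics of the deviation from the BPS reference can be computed as in the following Python sketch. The numbers are hypothetical and are not the values reported by Dedeurwaerder et al.; the function name is our own, and both inputs are assumed to be on the same &#x003B2;-value scale (0 to 1).</p>
<preformat>
```python
import numpy as np


def deviation_summary(array_beta, bps_beta):
    """Return MAX, MEAN and RMSE of the absolute deviation |array - BPS| over probes."""
    dev = np.abs(np.asarray(array_beta, dtype=float) - np.asarray(bps_beta, dtype=float))
    return {
        "MAX": float(dev.max()),
        "MEAN": float(dev.mean()),
        "RMSE": float(np.sqrt(np.mean(dev ** 2))),
    }


# Hypothetical values for three probes (not the published measurements).
bps = [0.80, 0.20, 0.50]
array_values = [0.70, 0.25, 0.50]
summary = deviation_summary(array_values, bps)
print(summary)
```
</preformat>
<p>Because the RMSE weights large deviations more heavily than the mean does, a method can have a slightly larger maximum deviation and still a smaller mean and RMSE when most probes move closer to the reference.</p>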
<fig id="F4" position="float">
<label>Figure 4</label>
<caption><p>The density curves of &#x003B2;-values for type I probes, type II probes and normalized type II probes (type II-MGMIN) for sample GSM815138 from GSE29290.</p></caption>
<graphic xlink:href="fgene-11-538492-g0004.tif"/>
</fig>
<fig id="F5" position="float">
<label>Figure 5</label>
<caption><p>Barplots for the maximum (MAX), mean (MEAN) and root mean square error (RMSE) of the absolute deviation from the matched BPS values of nine type II probes for GSM815138 (r3) in Dataset 1 considered in Dedeurwaerder et al. (<xref ref-type="bibr" rid="B2">2011</xref>) using no normalization (RAW), BMIQ, and MGMIN, respectively.</p></caption>
<graphic xlink:href="fgene-11-538492-g0005.tif"/>
</fig></sec>
<sec>
<title>3.4. MGMIN Robustly Finds Informative Differential Methylation Probes Associated With HPV Status</title>
<p>The goal of a bias correction approach is to reduce technical variation while still identifying biologically informative signals. We used a strategy similar to that of Teschendorff et al. (<xref ref-type="bibr" rid="B12">2012</xref>) to compare MGMIN and BMIQ in identifying the differentially methylated probes (DMPs) associated with HPV status. First, Dataset 2, consisting of two HPV&#x0002B; and three HPV&#x02212; fresh frozen HNC samples, was used as the training set to obtain the DMPs associated with HPV status using the <italic>limma</italic> method (Smyth, <xref ref-type="bibr" rid="B11">2005</xref>) and an FDR threshold of 0.35, the same as in Teschendorff et al. (<xref ref-type="bibr" rid="B12">2012</xref>). Dataset 3 and Dataset 4, described in the Methods section, were used as test sets. We reanalyzed Dataset 2 and obtained numbers of DMPs similar to those reported in Teschendorff et al. (<xref ref-type="bibr" rid="B12">2012</xref>) with no normalization (Raw) or with BMIQ (<xref ref-type="table" rid="T1">Table 1</xref>). The results in <xref ref-type="table" rid="T1">Table 1</xref> show that the positive predictive value (PPV) of MGMIN is slightly lower than that of BMIQ on GSE38266 (Dataset 3), whereas MGMIN outperforms BMIQ on GSE95036 (Dataset 4). The slightly inferior performance of MGMIN on Dataset 3 may be due to the overall low quality of that dataset (see <xref ref-type="fig" rid="F6">Figure 6</xref>): the ratio of samples passing the filters is &#x0003C;0.9 (<italic>r</italic> &#x0003D; 0.88) even under the least restrictive condition. Let &#x003C4;<sub><italic>p</italic></sub> represent the <italic>p</italic>-value threshold for bad probes and &#x003C4;<sub><italic>r</italic></sub> the threshold for the ratio of bad probes in a sample. We set the maximum value of &#x003C4;<sub><italic>r</italic></sub> to 0.3 because, in our opinion, a sample with more than 30% bad probes is unreliable. 
We can obtain from GSE38266 the same test dataset as the one described in Teschendorff et al. (<xref ref-type="bibr" rid="B12">2012</xref>), which consists of 18 HPV&#x0002B; and 14 HPV&#x02212; samples, under the following conditions: (i) &#x003C4;<sub><italic>p</italic></sub> &#x0003D; 1<italic>e</italic> &#x02212; 4 or 1<italic>e</italic> &#x02212; 3 and &#x003C4;<sub><italic>r</italic></sub> &#x0003D; 0.2 or 0.25, (ii) &#x003C4;<sub><italic>p</italic></sub> &#x0003D; 1<italic>e</italic> &#x02212; 2 and &#x003C4;<sub><italic>r</italic></sub> &#x0003D; 0.1 or 0.15. MGMIN identified more true positive features than BMIQ on Dataset 4 (27 vs. 13) but fewer on Dataset 3 (37 vs. 55).</p>
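<p>The sample-level quality filter defined by &#x003C4;<sub><italic>p</italic></sub> and &#x003C4;<sub><italic>r</italic></sub> can be sketched in Python as follows. The sketch rests on assumptions that are not spelled out in the text: a probe is treated as bad in a sample when its detection <italic>p</italic>-value exceeds &#x003C4;<sub><italic>p</italic></sub>, and a sample passes when its share of bad probes does not exceed &#x003C4;<sub><italic>r</italic></sub>. The function name and the toy <italic>p</italic>-values are hypothetical.</p>
<preformat>
```python
import numpy as np


def good_sample_ratio(det_p, tau_p, tau_r):
    """Fraction of samples whose share of bad probes does not exceed tau_r.

    det_p: detection p-values, probes in rows and samples in columns.
    A probe is treated as bad in a sample when its p-value exceeds tau_p
    (an assumed convention).
    """
    bad = np.asarray(det_p, dtype=float) > tau_p
    bad_ratio = bad.mean(axis=0)   # share of bad probes per sample
    passed = tau_r >= bad_ratio    # samples that pass the filter
    return float(passed.mean()), passed


# Hypothetical detection p-values: 4 probes (rows) x 3 samples (columns).
det_p = np.array([
    [1e-6, 1e-6, 0.5],
    [1e-6, 0.2, 0.5],
    [1e-6, 1e-6, 1e-6],
    [1e-6, 1e-6, 1e-6],
])
ratio, passed = good_sample_ratio(det_p, tau_p=1e-3, tau_r=0.3)
print(ratio, passed)
```
</preformat>
<p>Evaluating such a function over a grid of (&#x003C4;<sub><italic>p</italic></sub>, &#x003C4;<sub><italic>r</italic></sub>) pairs would give one ratio of good samples per quality control option, which is the kind of quantity summarized in Figure 6.</p>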
<fig id="F6" position="float">
<label>Figure 6</label>
<caption><p>Barplots of the ratio of good samples in GSE38266 under different quality control options (&#x003C4;<sub><italic>p</italic></sub>&#x00026;&#x003C4;<sub><italic>r</italic></sub>).</p></caption>
<graphic xlink:href="fgene-11-538492-g0006.tif"/>
</fig></sec></sec>
<sec id="s4">
<title>4. Discussions</title>
<p>In this paper, we have proposed a method called MGMIN for correcting the probe design bias of type II probes in Illumina Infinium 450K BeadChips, which can reduce the technical variation and improve the ability to find biologically meaningful differential methylation signals. We have shown that MGMIN outperforms BMIQ on multiple evaluation datasets in correcting the type II design bias and improving the data quality.</p>
<p>Similar to BMIQ, MGMIN uses quantile information to correct the <italic>M</italic>-values of type II probes while leaving the <italic>M</italic>-values of type I probes unchanged. The three-state beta-mixture model in BMIQ sets two default breakpoints (0.2, 0.75) to divide the &#x003B2;-values into three classes (hypomethylated, hemimethylated, and hypermethylated), which works well in most cases. However, the resulting curves of BMIQ are clearly inconsistent in some samples with high heterogeneity. We use 3 or 4 classes of probes, depending on the value of <inline-formula><mml:math id="M49"><mml:msubsup><mml:mrow><mml:mi>&#x003BC;</mml:mi></mml:mrow><mml:mrow><mml:mi>T</mml:mi></mml:mrow><mml:mrow><mml:mi>F</mml:mi></mml:mrow></mml:msubsup><mml:mo>-</mml:mo><mml:msubsup><mml:mrow><mml:mi>&#x003C3;</mml:mi></mml:mrow><mml:mrow><mml:mi>T</mml:mi></mml:mrow><mml:mrow><mml:mi>F</mml:mi></mml:mrow></mml:msubsup></mml:math></inline-formula>, to ensure that the fitted hypermethylated component of type II probes lies to the left of the hypermethylated component of type I probes, which can partly eliminate the effects of sample heterogeneity.</p>
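<p>The choice between 3 and 4 classes for type I probes, and the assignment of mixture components to methylation states, can be sketched in Python as follows. This is an illustration under stated assumptions rather than the MGMIN code: the function names are hypothetical, three components are taken as the default, and the fitted means and standard deviations are supplied as plain numbers instead of being estimated by EM.</p>
<preformat>
```python
def choose_k_type1(mu_f_1, sigma_f_1, mu_f_2, sigma_f_2):
    """Number of components for type I probes.

    Returns 4 when mu - sigma of the type I hypermethylated component is
    smaller than that of the type II hypermethylated component, and the
    assumed default of 3 otherwise.
    """
    return 4 if mu_f_2 - sigma_f_2 > mu_f_1 - sigma_f_1 else 3


def assign_states(means):
    """Map component indices to states by ordering the component means.

    Smallest mean: U (unmethylated); largest mean: F (fully methylated);
    every component in between: H (hemimethylated).
    """
    order = sorted(range(len(means)), key=lambda k: means[k])
    states = {k: "H" for k in order}
    states[order[0]] = "U"
    states[order[-1]] = "F"
    return states


# Hypothetical fitted parameters of the hypermethylated components.
k_type1 = choose_k_type1(mu_f_1=3.0, sigma_f_1=1.5, mu_f_2=2.5, sigma_f_2=0.5)
# Hypothetical component means of a four-component fit.
states = assign_states([-4.1, 0.3, 4.2, 1.9])
print(k_type1, states)
```
</preformat>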
<p>Based on the results for Dataset 3, we believe that high data quality is a prerequisite for normalization; in other words, there is little value in correcting low-quality samples. It should be pointed out that the parameter estimation of MGMIN is slower than that of BMIQ (by a factor of about 1.5), which can be mitigated by reducing the number of iterations.</p>
<p>MGMIN can be used together with other methods in the preprocessing of 450K methylation data to normalize the values of the two probe types and improve the data quality.</p></sec>
<sec sec-type="data-availability-statement" id="s5">
<title>Data Availability Statement</title>
<p>The datasets for this study can be found in GEO: GSE29290, GSE38268, GSE38266, and GSE95036.</p></sec>
<sec id="s6">
<title>Author Contributions</title>
<p>ZW performed the experiments and wrote the manuscript. All authors read and revised the final manuscript.</p></sec>
<sec id="s7">
<title>Conflict of Interest</title>
<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p></sec>
</body>
<back>
<ref-list>
<title>References</title>
<ref id="B1">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bibikova</surname> <given-names>M.</given-names></name> <name><surname>Barnes</surname> <given-names>B.</given-names></name> <name><surname>Tsan</surname> <given-names>C.</given-names></name> <name><surname>Ho</surname> <given-names>V.</given-names></name> <name><surname>Klotzle</surname> <given-names>B.</given-names></name> <name><surname>Le</surname> <given-names>J. M.</given-names></name> <etal/></person-group>. (<year>2011</year>). <article-title>High density DNA methylation array with single CpG site resolution</article-title>. <source>Genomics</source> <volume>98</volume>, <fpage>288</fpage>&#x02013;<lpage>295</lpage>. <pub-id pub-id-type="doi">10.1016/j.ygeno.2011.07.007</pub-id></citation></ref>
<ref id="B2">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Dedeurwaerder</surname> <given-names>S.</given-names></name> <name><surname>Defrance</surname> <given-names>M.</given-names></name> <name><surname>Calonne</surname> <given-names>E.</given-names></name> <name><surname>Denis</surname> <given-names>H.</given-names></name> <name><surname>Sotiriou</surname> <given-names>C.</given-names></name> <name><surname>Fuks</surname> <given-names>F.</given-names></name></person-group> (<year>2011</year>). <article-title>Evaluation of the infinium methylation 450K technology</article-title>. <source>Epigenomics</source> <volume>3</volume>, <fpage>771</fpage>&#x02013;<lpage>784</lpage>. <pub-id pub-id-type="doi">10.2217/epi.11.105</pub-id><pub-id pub-id-type="pmid">22126295</pub-id></citation></ref>
<ref id="B3">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Degli Esposti</surname> <given-names>D.</given-names></name> <name><surname>Sklias</surname> <given-names>A.</given-names></name> <name><surname>Lima</surname> <given-names>S. C.</given-names></name> <name><surname>Beghelli-de la Forest Divonne</surname> <given-names>S.</given-names></name> <name><surname>Cahais</surname> <given-names>V.</given-names></name> <name><surname>Fernandez-Jimenez</surname> <given-names>N.</given-names></name> <etal/></person-group>. (<year>2017</year>). <article-title>Unique DNA methylation signature in HPV-positive head and neck squamous cell carcinomas</article-title>. <source>Genome Med</source>. <volume>9</volume>:<fpage>33</fpage>. <pub-id pub-id-type="doi">10.1186/s13073-017-0419-z</pub-id><pub-id pub-id-type="pmid">28381277</pub-id></citation></ref>
<ref id="B4">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Du</surname> <given-names>P.</given-names></name> <name><surname>Zhang</surname> <given-names>X.</given-names></name> <name><surname>Huang</surname> <given-names>C.-C.</given-names></name> <name><surname>Jafari</surname> <given-names>N.</given-names></name> <name><surname>Kibbe</surname> <given-names>W. A.</given-names></name> <name><surname>Hou</surname> <given-names>L.</given-names></name> <etal/></person-group>. (<year>2010</year>). <article-title>Comparison of beta-value and <italic>M</italic>-value methods for quantifying methylation levels by microarray analysis</article-title>. <source>BMC Bioinformatics</source> <volume>11</volume>:<fpage>587</fpage>. <pub-id pub-id-type="doi">10.1186/1471-2105-11-587</pub-id><pub-id pub-id-type="pmid">21118553</pub-id></citation></ref>
<ref id="B5">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Irizarry</surname> <given-names>R. A.</given-names></name> <name><surname>Ladd-Acosta</surname> <given-names>C.</given-names></name> <name><surname>Wen</surname> <given-names>B.</given-names></name> <name><surname>Wu</surname> <given-names>Z.</given-names></name> <name><surname>Montano</surname> <given-names>C.</given-names></name> <name><surname>Onyango</surname> <given-names>P.</given-names></name> <etal/></person-group>. (<year>2009</year>). <article-title>The human colon cancer methylome shows similar hypo- and hypermethylation at conserved tissue-specific CpG island shores</article-title>. <source>Nat. Genet</source>. <volume>41</volume>, <fpage>178</fpage>&#x02013;<lpage>186</lpage>. <pub-id pub-id-type="doi">10.1038/ng.298</pub-id><pub-id pub-id-type="pmid">19151715</pub-id></citation></ref>
<ref id="B6">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lechner</surname> <given-names>M.</given-names></name> <name><surname>Fenton</surname> <given-names>T.</given-names></name> <name><surname>West</surname> <given-names>J.</given-names></name> <name><surname>Wilson</surname> <given-names>G.</given-names></name> <name><surname>Feber</surname> <given-names>A.</given-names></name> <name><surname>Henderson</surname> <given-names>S.</given-names></name> <etal/></person-group>. (<year>2013</year>). <article-title>Identification and functional validation of HPV-mediated hypermethylation in head and neck squamous cell carcinoma</article-title>. <source>Genome Med</source>. <volume>5</volume>:<fpage>15</fpage>. <pub-id pub-id-type="doi">10.1186/gm419</pub-id><pub-id pub-id-type="pmid">23419152</pub-id></citation></ref>
<ref id="B7">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Maksimovic</surname> <given-names>J.</given-names></name> <name><surname>Gordon</surname> <given-names>L.</given-names></name> <name><surname>Oshlack</surname> <given-names>A.</given-names></name></person-group> (<year>2012</year>). <article-title>SWAN: subset-quantile within array normalization for Illumina Infinium HumanMethylation450 BeadChips</article-title>. <source>Genome Biol</source>. <volume>13</volume>:<fpage>R44</fpage>. <pub-id pub-id-type="doi">10.1186/gb-2012-13-6-r44</pub-id><pub-id pub-id-type="pmid">22703947</pub-id></citation></ref>
<ref id="B8">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>McLachlan</surname> <given-names>G. J.</given-names></name> <name><surname>Bean</surname> <given-names>R.</given-names></name> <name><surname>Jones</surname> <given-names>L. B.-T.</given-names></name></person-group> (<year>2006</year>). <article-title>A simple implementation of a normal mixture approach to differential gene expression in multiclass microarrays</article-title>. <source>Bioinformatics</source> <volume>22</volume>, <fpage>1608</fpage>&#x02013;<lpage>1615</lpage>. <pub-id pub-id-type="doi">10.1093/bioinformatics/btl148</pub-id><pub-id pub-id-type="pmid">16632494</pub-id></citation></ref>
<ref id="B9">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Pan</surname> <given-names>W.</given-names></name> <name><surname>Lin</surname> <given-names>J.</given-names></name> <name><surname>Le</surname> <given-names>C. T.</given-names></name></person-group> (<year>2002</year>). <article-title>Model-based cluster analysis of microarray gene-expression data</article-title>. <source>Genome Biol</source>. <volume>3</volume>:<fpage>research0009-1</fpage>. <pub-id pub-id-type="doi">10.1186/gb-2002-3-2-research0009</pub-id><pub-id pub-id-type="pmid">11864371</pub-id></citation></ref>
<ref id="B10">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Paul</surname> <given-names>D. S.</given-names></name> <name><surname>Teschendorff</surname> <given-names>A. E.</given-names></name> <name><surname>Dang</surname> <given-names>M. A.</given-names></name> <name><surname>Lowe</surname> <given-names>R.</given-names></name> <name><surname>Hawa</surname> <given-names>M. I.</given-names></name> <name><surname>Ecker</surname> <given-names>S.</given-names></name> <etal/></person-group>. (<year>2016</year>). <article-title>Increased DNA methylation variability in type 1 diabetes across three immune effector cell types</article-title>. <source>Nat. Commun</source>. <volume>7</volume>:<fpage>13555</fpage>. <pub-id pub-id-type="doi">10.1038/ncomms13555</pub-id><pub-id pub-id-type="pmid">27898055</pub-id></citation></ref>
<ref id="B11">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Smyth</surname> <given-names>G. K.</given-names></name></person-group> (<year>2005</year>). <article-title>Limma: linear models for microarray data</article-title>, in <source>Bioinformatics and Computational Biology Solutions Using R and Bioconductor</source>, eds <person-group person-group-type="editor"><name><surname>Gentleman</surname> <given-names>R.</given-names></name> <name><surname>Carey</surname> <given-names>V.</given-names></name> <name><surname>Huber</surname> <given-names>W.</given-names></name> <name><surname>Irizarry</surname> <given-names>R.</given-names></name> <name><surname>Dudoit</surname> <given-names>S.</given-names></name></person-group> (<publisher-loc>New York, NY</publisher-loc>: <publisher-name>Springer</publisher-name>), <fpage>397</fpage>&#x02013;<lpage>420</lpage>. <pub-id pub-id-type="doi">10.1007/0-387-29362-0_23</pub-id></citation></ref>
<ref id="B12">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Teschendorff</surname> <given-names>A. E.</given-names></name> <name><surname>Marabita</surname> <given-names>F.</given-names></name> <name><surname>Lechner</surname> <given-names>M.</given-names></name> <name><surname>Bartlett</surname> <given-names>T.</given-names></name> <name><surname>Tegner</surname> <given-names>J.</given-names></name> <name><surname>Gomez-Cabrero</surname> <given-names>D.</given-names></name> <etal/></person-group>. (<year>2012</year>). <article-title>A beta-mixture quantile normalization method for correcting probe design bias in Illumina Infinium 450K DNA methylation data</article-title>. <source>Bioinformatics</source> <volume>29</volume>, <fpage>189</fpage>&#x02013;<lpage>196</lpage>. <pub-id pub-id-type="doi">10.1093/bioinformatics/bts680</pub-id><pub-id pub-id-type="pmid">23175756</pub-id></citation></ref>
<ref id="B13">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Touleimat</surname> <given-names>N.</given-names></name> <name><surname>Tost</surname> <given-names>J.</given-names></name></person-group> (<year>2012</year>). <article-title>Complete pipeline for Infinium<sup>&#x000AE;</sup> Human Methylation 450K BeadChip data processing using subset quantile normalization for accurate DNA methylation estimation</article-title>. <source>Epigenomics</source> <volume>4</volume>, <fpage>325</fpage>&#x02013;<lpage>341</lpage>. <pub-id pub-id-type="doi">10.2217/epi.12.21</pub-id><pub-id pub-id-type="pmid">22690668</pub-id></citation></ref>
<ref id="B14">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Yeung</surname> <given-names>K. Y.</given-names></name> <name><surname>Fraley</surname> <given-names>C.</given-names></name> <name><surname>Murua</surname> <given-names>A.</given-names></name> <name><surname>Raftery</surname> <given-names>A. E.</given-names></name> <name><surname>Ruzzo</surname> <given-names>W. L.</given-names></name></person-group> (<year>2001</year>). <article-title>Model-based clustering and data transformations for gene expression data</article-title>. <source>Bioinformatics</source> <volume>17</volume>, <fpage>977</fpage>&#x02013;<lpage>987</lpage>. <pub-id pub-id-type="doi">10.1093/bioinformatics/17.10.977</pub-id><pub-id pub-id-type="pmid">11673243</pub-id></citation></ref>
</ref-list>
<fn-group>
<fn fn-type="financial-disclosure"><p><bold>Funding.</bold> This work has been supported by the National Key Research and Development Program of China (Nos: 2017YFC1201201 and 2017YFC0907503).</p>
</fn>
</fn-group>
</back>
</article>