<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Archiving and Interchange DTD v2.3 20070202//EN" "archivearticle.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="methods-article">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Genet.</journal-id>
<journal-title>Frontiers in Genetics</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Genet.</abbrev-journal-title>
<issn pub-type="epub">1664-8021</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.3389/fgene.2020.604790</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Genetics</subject>
<subj-group>
<subject>Methods</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>Sc-GPE: A Graph Partitioning-Based Cluster Ensemble Method for Single-Cell</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name><surname>Zhu</surname> <given-names>Xiaoshu</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
<xref ref-type="aff" rid="aff2"><sup>2</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/1081774/overview"/>
</contrib>
<contrib contrib-type="author">
<name><surname>Li</surname> <given-names>Jian</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
</contrib>
<contrib contrib-type="author">
<name><surname>Li</surname> <given-names>Hong-Dong</given-names></name>
<xref ref-type="aff" rid="aff2"><sup>2</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/864058/overview"/>
</contrib>
<contrib contrib-type="author">
<name><surname>Xie</surname> <given-names>Miao</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/1095072/overview"/>
</contrib>
<contrib contrib-type="author" corresp="yes">
<name><surname>Wang</surname> <given-names>Jianxin</given-names></name>
<xref ref-type="aff" rid="aff2"><sup>2</sup></xref>
<xref ref-type="corresp" rid="c001"><sup>&#x0002A;</sup></xref>
</contrib>
</contrib-group>
<aff id="aff1"><sup>1</sup><institution>School of Computer Science and Engineering, Yulin Normal University</institution>, <addr-line>Yulin</addr-line>, <country>China</country></aff>
<aff id="aff2"><sup>2</sup><institution>Hunan Provincial Key Laboratory on Bioinformatics, School of Computer Science and Engineering, Central South University</institution>, <addr-line>Changsha</addr-line>, <country>China</country></aff>
<author-notes>
<fn fn-type="edited-by"><p>Edited by: Chunhou Zheng, Anhui University, China</p></fn>
<fn fn-type="edited-by"><p>Reviewed by: Xiujuan Lei, Shaanxi Normal University, China; Jin-Xing Liu, Qufu Normal University, China; Yannan Bin, Anhui University, China</p></fn>
<corresp id="c001">&#x0002A;Correspondence: Jianxin Wang  <email>jxwang&#x00040;mail.csu.edu.cn</email></corresp>
<fn fn-type="other" id="fn001"><p>This article was submitted to Computational Genomics, a section of the journal Frontiers in Genetics</p></fn></author-notes>
<pub-date pub-type="epub">
<day>15</day>
<month>12</month>
<year>2020</year>
</pub-date>
<pub-date pub-type="collection">
<year>2020</year>
</pub-date>
<volume>11</volume>
<elocation-id>604790</elocation-id>
<history>
<date date-type="received">
<day>10</day>
<month>09</month>
<year>2020</year>
</date>
<date date-type="accepted">
<day>23</day>
<month>11</month>
<year>2020</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#x000A9; 2020 Zhu, Li, Li, Xie and Wang.</copyright-statement>
<copyright-year>2020</copyright-year>
<copyright-holder>Zhu, Li, Li, Xie and Wang</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/"><p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.</p></license> </permissions>
<abstract><p>Clustering is an efficient way to analyze single-cell RNA sequencing data. It is commonly used to identify cell types, which can help in understanding cell differentiation processes. However, different single-cell clustering methods can produce different, sometimes conflicting, results, so biologists often struggle to obtain the right clustering results and to interpret their biological significance. The cluster ensemble strategy can be an effective solution to this problem. As graph partitioning-based clustering methods are well suited to single-cell data, we developed Sc-GPE, a novel cluster ensemble method combining five single-cell graph partitioning-based clustering methods. The five methods are SNN-cliq, PhenoGraph, SC3, SSNN-Louvain, and MPGS-Louvain. In Sc-GPE, a consensus matrix is constructed from the five clustering solutions by calculating the probability that each pair of cells is assigned to the same cluster. This avoids a problem of the hypergraph-based ensemble approach: the individual clustering methods assign different cluster labels, and it is difficult to match the corresponding labels across all methods. Then, to reflect the different importance of each method in the ensemble, a weighted consensus matrix is constructed using an importance score strategy. Finally, hierarchical clustering is performed on the weighted consensus matrix to cluster cells. To evaluate its performance, we compared Sc-GPE with the individual clustering methods and the state-of-the-art SAME-clustering on 12 single-cell RNA-seq datasets. The results show that Sc-GPE obtained the best average performance and achieved the highest NMI and ARI values on five datasets.</p></abstract>
<kwd-group>
<kwd>single-cell clustering</kwd>
<kwd>cluster ensemble</kwd>
<kwd>consensus matrix</kwd>
<kwd>importance score</kwd>
<kwd>graph partitioning</kwd>
</kwd-group>
<contract-num rid="cn001">61662028</contract-num>
<contract-num rid="cn001">61702555</contract-num>
<contract-num rid="cn001">61762087</contract-num>
<contract-num rid="cn001">61772557</contract-num>
<contract-num rid="cn001">61841603</contract-num>
<contract-sponsor id="cn001">National Natural Science Foundation of China<named-content content-type="fundref-id">10.13039/501100001809</named-content></contract-sponsor>
<contract-sponsor id="cn002">Natural Science Foundation of Guangxi Province<named-content content-type="fundref-id">10.13039/501100004607</named-content></contract-sponsor>
<counts>
<fig-count count="4"/>
<table-count count="2"/>
<equation-count count="8"/>
<ref-count count="41"/>
<page-count count="9"/>
<word-count count="5387"/>
</counts>
</article-meta>
</front>
<body>
<sec sec-type="intro" id="s1">
<title>Introduction</title>
<p>Single-cell RNA sequencing (scRNA-seq) measures the gene expression level in individual cells instead of the average gene expression level across cells, as in bulk RNA-seq (Stuart and Satija, <xref ref-type="bibr" rid="B25">2019</xref>). It therefore has advantages in accurately identifying the transcriptomic signatures of cell types (Gr&#x000FC;n et al., <xref ref-type="bibr" rid="B10">2015</xref>). With the rapid development of scRNA-seq technologies, the cost of sequencing has fallen and larger datasets are being generated, but with a higher error rate (Vitak et al., <xref ref-type="bibr" rid="B29">2017</xref>). This development has brought several computational challenges (Kiselev et al., <xref ref-type="bibr" rid="B14">2019</xref>; Zhu et al., <xref ref-type="bibr" rid="B39">2019a</xref>): (1) high noise: the drop-out rate caused by reverse transcription failure and sequencing depth can reach 80% (Soneson and Robinson, <xref ref-type="bibr" rid="B23">2018</xref>; Andrews and Hemberg, <xref ref-type="bibr" rid="B1">2019</xref>); (2) high dimension: the number of dimensions usually exceeds 10,000, making it difficult to measure the similarity of cell pairs; (3) large sample size: the sample size has grown from dozens to hundreds of thousands of cells, which increases the time and complexity involved in identifying cell types (Grun, <xref ref-type="bibr" rid="B9">2020</xref>).</p>
<p>Clustering is an efficient way of analyzing scRNA-seq data to identify novel cell types, and a number of single-cell clustering methods have been proposed (Xu et al., <xref ref-type="bibr" rid="B33">2019</xref>; Yip et al., <xref ref-type="bibr" rid="B36">2019</xref>). However, the results from different clustering methods differ in both the number of clusters and the cell assignments, and no method performs best on all scRNA-seq datasets. The reason is that the existing methods focus on different steps in identifying cell types, including data denoising (Wang et al., <xref ref-type="bibr" rid="B30">2018</xref>), dimensionality reduction (Wang and Gu, <xref ref-type="bibr" rid="B31">2018</xref>; Becht et al., <xref ref-type="bibr" rid="B2">2019</xref>), similarity measurement (Kim et al., <xref ref-type="bibr" rid="B13">2019</xref>), and clustering (Qi et al., <xref ref-type="bibr" rid="B21">2019</xref>; Zhu et al., <xref ref-type="bibr" rid="B40">2019b</xref>). Notably, similarity measurement plays an important role in identifying cell types, and some graph partitioning-based clustering methods achieved better performance because they measure similarity accurately. For example, SNN-cliq (Xu and Su, <xref ref-type="bibr" rid="B32">2015</xref>) constructed a weighted shared nearest neighbor (SNN) graph and clustered cells by partitioning the cliques on the graph. PhenoGraph (Levine et al., <xref ref-type="bibr" rid="B17">2015</xref>) used another weighting strategy to generate an SNN graph and partitioned the graph using the Louvain community detection method. SSNN-Louvain (Zhu et al., <xref ref-type="bibr" rid="B41">2020</xref>) integrated structural information to construct a structural SNN graph and clustered cells with a modified Louvain community detection method, in which the cells are sorted by their importance in the initialization step. MPGS-Louvain (Zhu et al., <xref ref-type="bibr" rid="B38">2019c</xref>) constructed a novel global and path-based similarity graph and also partitioned it using a modified Louvain community detection method. The challenge, therefore, is to enhance the accuracy of clustering by combining the clustering information from these multiple views.</p>
<p>A growing body of research shows that the cluster ensemble approach, which integrates the information that each clustering method captures from a different view, is effective (Kuncheva and Vetrov, <xref ref-type="bibr" rid="B16">2006</xref>; Vega-Pons and Ruiz-Shulcloper, <xref ref-type="bibr" rid="B28">2011</xref>; Liu et al., <xref ref-type="bibr" rid="B19">2019</xref>). ISSCE (Yu et al., <xref ref-type="bibr" rid="B37">2016</xref>) is a clustering ensemble strategy for high dimensional data that comprises three steps: first, an incremental approach selects the clustering members; second, random subspace division handles the high dimensional data; finally, a constraint propagation method integrates prior knowledge. Recently, some cluster ensemble methods for scRNA-seq data have been proposed. SC3 (Kiselev et al., <xref ref-type="bibr" rid="B15">2017</xref>) ensembled several clustering results from the <italic>k</italic>-means algorithm into a consensus matrix and clustered cells using hierarchical clustering (HC). SAFE-clustering (Yang et al., <xref ref-type="bibr" rid="B35">2019</xref>) implemented a hypergraph-based strategy to ensemble CIDR, Seurat, tSNE, and SC3 into a consensus matrix, and then clustered cells with <italic>k</italic>-means. They also proposed the SAME-clustering method (Huh et al., <xref ref-type="bibr" rid="B12">2020</xref>), which uses a consensus matrix-based strategy to ensemble the same four clustering methods and applies the Expectation-Maximization algorithm to cluster cells. These cluster ensemble methods are based on hypergraph-based or voting-based integrated learning and do not consider the different importance of the individual clustering methods.</p>
<p>Following the principle that the minority is subordinate to the majority, we assume that the more consistent the cluster labels predicted by different clustering methods are, the more accurate they will be. That is, an individual clustering method whose results are more similar to those of the others is more important in the cluster ensemble strategy. Based on this assumption, we propose a novel graph partitioning-based ensemble method for single-cell clustering (Sc-GPE), which integrates SNN-cliq, PhenoGraph, SSNN-Louvain, MPGS-Louvain, and SC3 by a weighted voting-based method. To measure the importance of each individual clustering method, we design a scoring strategy based on the adjusted Rand index (ARI) (Hubert and Arabie, <xref ref-type="bibr" rid="B11">1985</xref>). We then construct a weighted consensus matrix in which the weight is the importance score of each method. Finally, HC is performed to cluster cells. To assess its performance, Sc-GPE is compared with the five original clustering methods and the state-of-the-art cluster ensemble method &#x0201C;SAME-clustering.&#x0201D; The results show that Sc-GPE achieves the best average performance among the compared methods.</p>
</sec>
<sec sec-type="materials and methods" id="s2">
<title>Materials and Methods</title>
<p>The analysis above suggests that integrating multiple clustering results merges information from different views, and that different clustering methods play different roles in the integration. Inspired by these ideas, we propose Sc-GPE, which ensembles five clustering methods: SNN-cliq, PhenoGraph, SSNN-Louvain, MPGS-Louvain, and SC3. The main reasons for choosing these five methods are as follows. First, the first four are graph partitioning-based methods and the last one is a consensus matrix-based method; their good performance provides the basis for improving the accuracy of the cluster ensemble. Second, the five methods implement different strategies for similarity graph construction and graph partitioning, which should enhance the generalization ability of the clustering. Sc-GPE has the following three advantages: (1) it does not need to deal with the problem of different cluster labels from different clustering methods, so it is suitable for unsupervised clustering that lacks the true cluster labels; (2) it is easy to implement since no special parameters need to be adjusted; (3) the weighting strategy is comprehensible and effective.</p>
<sec>
<title>Sc-GPE</title>
<p>In Sc-GPE, a gene expression matrix with <italic>m</italic> rows (genes) and <italic>n</italic> columns (cells) is the input of the five clustering methods. The five resulting clustering solutions are ensembled into a consensus matrix with <italic>n</italic> rows (cells) and <italic>n</italic> columns (cells). Then, based on the consensus matrix, a weighted consensus matrix is constructed by measuring the importance of each individual clustering method. That is, the voting strategy in the original consensus matrix is replaced by a weighted voting strategy, in which the weight is determined by the similarity between pairs of clustering results. An overview of the Sc-GPE method is shown in <xref ref-type="fig" rid="F1">Figure 1</xref>.</p>
<fig id="F1" position="float">
<label>Figure 1</label>
<caption><p>The overview of the Sc-GPE method. <bold>(A)</bold> The gene expression matrix is input; <bold>(B)</bold> five individual clustering methods are performed to generate five clustering solutions; <bold>(C)</bold> the original consensus matrix is constructed; <bold>(D)</bold> the weighted consensus matrix is produced by measuring the importance of the individual clustering methods; <bold>(E)</bold> HC clustering is performed.</p></caption>
<graphic xlink:href="fgene-11-604790-g0001.tif"/>
</fig>
<p>Cells are defined as the set <italic>C</italic> = {<italic>c</italic><sub>1</sub>, &#x02026;, <italic>c</italic><sub><italic>n</italic></sub>}, where <italic>n</italic> is the number of cells. Let <italic>k</italic> be the number of individual clustering methods; the set of clustering results is defined as <italic>R</italic> = {<italic>R</italic><sup>1</sup>, &#x02026;, <italic>R</italic><sup><italic>k</italic></sup>}. Across the <italic>k</italic> clustering methods, the <italic>i</italic>-th cell <italic>c</italic><sub><italic>i</italic></sub> thus receives <italic>k</italic> predicted cluster labels, denoted as <italic>R</italic>(<italic>c</italic><sub><italic>i</italic></sub>) = {<italic>R</italic><sup>1</sup>(<italic>c</italic><sub><italic>i</italic></sub>), &#x02026;, <italic>R</italic><sup><italic>k</italic></sup>(<italic>c</italic><sub><italic>i</italic></sub>)}. The details of Sc-GPE are described as follows.</p>
<p>Firstly, the original consensus matrix is constructed. The consensus matrix <bold><italic>I</italic></bold><sub><italic>x, y</italic></sub> is calculated using Equations (1) and (2): when cell <italic>c</italic><sub><italic>x</italic></sub> and cell <italic>c</italic><sub><italic>y</italic></sub> are assigned to the same cluster by the <italic>l</italic>-th method, the value of <inline-formula><mml:math id="M1"><mml:mi>&#x003B4;</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msup><mml:mrow><mml:mi>R</mml:mi></mml:mrow><mml:mrow><mml:mi>l</mml:mi></mml:mrow></mml:msup><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>c</mml:mi></mml:mrow><mml:mrow><mml:mi>x</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>,</mml:mo><mml:msup><mml:mrow><mml:mi>R</mml:mi></mml:mrow><mml:mrow><mml:mi>l</mml:mi></mml:mrow></mml:msup><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>c</mml:mi></mml:mrow><mml:mrow><mml:mi>y</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:math></inline-formula> is 1; otherwise it is 0. Each element of the consensus matrix represents the probability that a pair of cells is assigned to the same cluster across the methods. For example, when <italic>k</italic> is 5, the element <bold><italic>I</italic></bold><sub><italic>x, y</italic></sub> equals the sum of <inline-formula><mml:math id="M2"><mml:mi>&#x003B4;</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msup><mml:mrow><mml:mi>R</mml:mi></mml:mrow><mml:mrow><mml:mi>l</mml:mi></mml:mrow></mml:msup><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>c</mml:mi></mml:mrow><mml:mrow><mml:mi>x</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>,</mml:mo><mml:msup><mml:mrow><mml:mi>R</mml:mi></mml:mrow><mml:mrow><mml:mi>l</mml:mi></mml:mrow></mml:msup><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>c</mml:mi></mml:mrow><mml:mrow><mml:mi>y</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:math></inline-formula> over the five methods, multiplied by the same weight 1/5. Because the matrix only records how often a pair of cells occurs in the same cluster, this strategy does not need to match the different cluster labels that each cell receives from the individual clustering methods.</p>
<disp-formula id="E1"><label>(1)</label><mml:math id="M3"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:msub><mml:mrow><mml:mi>I</mml:mi></mml:mrow><mml:mrow><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:mfrac><mml:mstyle displaystyle="true"><mml:munderover accentunder="false" accent="false"><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>l</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:munderover></mml:mstyle><mml:mi>&#x003B4;</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msup><mml:mrow><mml:mi>R</mml:mi></mml:mrow><mml:mrow><mml:mi>l</mml:mi></mml:mrow></mml:msup><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>c</mml:mi></mml:mrow><mml:mrow><mml:mi>x</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>,</mml:mo><mml:msup><mml:mrow><mml:mi>R</mml:mi></mml:mrow><mml:mrow><mml:mi>l</mml:mi></mml:mrow></mml:msup><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>c</mml:mi></mml:mrow><mml:mrow><mml:mi>y</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<disp-formula id="E2"><label>(2)</label><mml:math id="M4"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:mi>&#x003B4;</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>X</mml:mi><mml:mo>,</mml:mo><mml:mi>Y</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mrow><mml:mo stretchy="true">{</mml:mo><mml:mtable style="text-align:axis;" equalrows="false" columnlines="none none none none none none none none none none none none none none none none none none none" equalcolumns="false" class="array"><mml:mtr><mml:mtd><mml:mn>0</mml:mn><mml:mo>,</mml:mo><mml:mi>i</mml:mi><mml:mi>f</mml:mi><mml:mi>X</mml:mi><mml:mo>&#x02260;</mml:mo><mml:mi>Y</mml:mi></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mn>1</mml:mn><mml:mo>,</mml:mo><mml:mi>i</mml:mi><mml:mi>f</mml:mi><mml:mi>X</mml:mi><mml:mo>=</mml:mo><mml:mi>Y</mml:mi><mml:mo>,</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>where <italic>c</italic><sub><italic>x</italic></sub> and <italic>c</italic><sub><italic>y</italic></sub> are a pair of cells in the cell set <italic>C</italic>, <italic>k</italic> is the number of individual clustering methods, and <italic>R</italic><sup><italic>l</italic></sup> is the clustering result of the <italic>l</italic>-th method.</p>
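<p>As an illustration of Equations (1) and (2), the following minimal Python sketch builds the unweighted consensus matrix from a list of label vectors. The function name consensus_matrix and the toy labels are hypothetical and are not taken from the Sc-GPE implementation; the sketch only assumes that each method returns one label per cell.</p>
<preformat>
```python
def consensus_matrix(solutions):
    """Unweighted consensus matrix of Equations (1) and (2).

    solutions: list of k label lists (one per clustering method), each of
    length n. Returns an n x n nested list whose entry [x][y] is the
    fraction of methods that put cells x and y in the same cluster.
    """
    k = len(solutions)
    n = len(solutions[0])
    counts = [[0] * n for _ in range(n)]
    for labels in solutions:
        for x in range(n):
            for y in range(n):
                if labels[x] == labels[y]:  # delta(R^l(c_x), R^l(c_y)) = 1
                    counts[x][y] += 1
    # Multiply the vote counts by the constant weight 1/k.
    return [[count / k for count in row] for row in counts]


# Toy example (hypothetical data): 3 methods, 4 cells.
# The label names deliberately differ between methods.
toy_solutions = [
    [0, 0, 1, 1],
    ["a", "a", "a", "b"],
    [2, 2, 5, 5],
]
toy_consensus = consensus_matrix(toy_solutions)
```
</preformat>
<p>In this toy example the three methods use unrelated label names, yet the matrix is well defined because only the co-membership of each cell pair is compared.</p>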
<p>Next, based on the assumption that cluster labels predicted more consistently across the clustering methods are more accurate, we design an importance score for the individual clustering methods. As ARI is a popular index for measuring the agreement between two clustering solutions, we use it to measure the importance of each individual clustering method. The importance score is defined by Equations (3) and (4), where &#x003C9;<sub><italic>l</italic></sub> denotes the importance of the <italic>l</italic>-th clustering method among all <italic>k</italic> methods, and <italic>r</italic><sub><italic>l</italic></sub> represents the similarity between the <italic>l</italic>-th clustering method and the other methods, calculated by averaging the ARI between the clusters predicted by the <italic>l</italic>-th method and those predicted by each of the other methods.</p>
<disp-formula id="E3"><label>(3)</label><mml:math id="M5"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:msub><mml:mrow><mml:mi>&#x003C9;</mml:mi></mml:mrow><mml:mrow><mml:mi>l</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:msub><mml:mrow><mml:mi>r</mml:mi></mml:mrow><mml:mrow><mml:mi>l</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mrow><mml:mstyle displaystyle="true"><mml:munderover accentunder="false" accent="false"><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>j</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:munderover></mml:mstyle><mml:msub><mml:mrow><mml:mi>r</mml:mi></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:mfrac></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<disp-formula id="E4"><label>(4)</label><mml:math id="M6"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:msub><mml:mrow><mml:mi>r</mml:mi></mml:mrow><mml:mrow><mml:mi>l</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>k</mml:mi><mml:mo>-</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:mfrac><mml:mstyle displaystyle="true"><mml:munderover accentunder="false" accent="false"><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>j</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn><mml:mo>,</mml:mo><mml:mi>j</mml:mi><mml:mo>&#x02260;</mml:mo><mml:mi>l</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:munderover></mml:mstyle><mml:mi>A</mml:mi><mml:mi>R</mml:mi><mml:mi>I</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msup><mml:mrow><mml:mi>R</mml:mi></mml:mrow><mml:mrow><mml:mi>l</mml:mi></mml:mrow></mml:msup><mml:mo>,</mml:mo><mml:msup><mml:mrow><mml:mi>R</mml:mi></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msup></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>,</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>where &#x003C9;<sub><italic>l</italic></sub> is the importance score of the <italic>l</italic>-th clustering method. <italic>r</italic><sub><italic>l</italic></sub> is the average of ARI between predicted clusters from the <italic>l</italic>-th method and other methods, and <italic>k</italic> is the number of individual clustering methods.</p>
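<p>Equations (3) and (4) can be sketched in a few lines of Python. The code below is a hedged illustration rather than the authors' implementation: the function names are hypothetical, the ARI is computed from the contingency counts following Hubert and Arabie (1985), and the sketch assumes at least two cells, at least two methods, and average ARI values that do not sum to zero (the source does not state how zero or negative averages are handled).</p>
<preformat>
```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two labelings of the same n cells (n >= 2)."""
    n = len(labels_a)
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    sum_rows = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_cols = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_rows * sum_cols / comb(n, 2)
    maximum = 0.5 * (sum_rows + sum_cols)
    if maximum == expected:  # degenerate case: treated here as full agreement
        return 1.0
    return (sum_cells - expected) / (maximum - expected)


def importance_scores(solutions):
    """Importance score of each method: Equation (4) then Equation (3)."""
    k = len(solutions)
    r = []
    for idx in range(k):
        # Equation (4): mean ARI between method idx and every other method.
        total = sum(
            adjusted_rand_index(solutions[idx], solutions[j])
            for j in range(k)
            if j != idx
        )
        r.append(total / (k - 1))
    # Equation (3): normalize so that the scores sum to 1.
    # Assumption: the sum is non-zero (not guarded in this sketch).
    r_sum = sum(r)
    return [value / r_sum for value in r]


# Toy example (hypothetical data): methods 1 and 2 agree up to label names,
# method 3 disagrees with both.
toy_solutions = [
    [0, 0, 1, 1],
    ["x", "x", "y", "y"],
    [0, 0, 0, 1],
]
toy_weights = importance_scores(toy_solutions)
```
</preformat>
<p>In this toy example the two agreeing methods share all of the weight, which mirrors the assumption that methods more consistent with the others are more important.</p>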
<p>Then, the weighted consensus matrix is constructed by introducing the importance scores of the individual clustering methods into the original consensus matrix. The weighted consensus matrix <bold><italic>I</italic></bold><sub><italic>x, y</italic></sub>&#x02032; is defined by Equation (5), in which the contribution of each individual clustering method is multiplied by its importance score &#x003C9;<sub><italic>l</italic></sub> instead of the constant 1/<italic>k</italic> used in the original consensus matrix.</p>
<disp-formula id="E5"><label>(5)</label><mml:math id="M7"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:msup><mml:mrow><mml:msub><mml:mrow><mml:mi>I</mml:mi></mml:mrow><mml:mrow><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mrow><mml:mi>&#x02032;</mml:mi></mml:mrow></mml:msup><mml:mo>=</mml:mo><mml:mstyle displaystyle="true"><mml:munderover accentunder="false" accent="false"><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>l</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:munderover></mml:mstyle><mml:msub><mml:mrow><mml:mi>&#x003C9;</mml:mi></mml:mrow><mml:mrow><mml:mi>l</mml:mi></mml:mrow></mml:msub><mml:mo>&#x000D7;</mml:mo><mml:mi>&#x003B4;</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msup><mml:mrow><mml:mi>R</mml:mi></mml:mrow><mml:mrow><mml:mi>l</mml:mi></mml:mrow></mml:msup><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>c</mml:mi></mml:mrow><mml:mrow><mml:mi>x</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>,</mml:mo><mml:msup><mml:mrow><mml:mi>R</mml:mi></mml:mrow><mml:mrow><mml:mi>l</mml:mi></mml:mrow></mml:msup><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>c</mml:mi></mml:mrow><mml:mrow><mml:mi>y</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>,</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>Finally, the HC method is performed to cluster cells on the weighted consensus matrix.</p>
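<p>The last two steps can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names and the toy weights are hypothetical, the distance between two cells is taken to be one minus their weighted consensus value, and a naive average-linkage agglomeration with a user-supplied number of clusters (at least 1) stands in for HC, since the text does not specify the linkage or how the number of clusters is chosen.</p>
<preformat>
```python
def weighted_consensus_matrix(solutions, weights):
    """Weighted consensus matrix of Equation (5)."""
    n = len(solutions[0])
    matrix = [[0.0] * n for _ in range(n)]
    for labels, weight in zip(solutions, weights):
        for x in range(n):
            for y in range(n):
                if labels[x] == labels[y]:
                    matrix[x][y] += weight  # omega_l * delta(...)
    return matrix


def average_linkage(matrix, n_clusters):
    """Naive agglomerative clustering on the distance 1 - matrix[x][y].

    Assumptions (not from the source): average linkage, n_clusters >= 1.
    """
    clusters = [[i] for i in range(len(matrix))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                dist = sum(
                    1.0 - matrix[x][y] for x in clusters[a] for y in clusters[b]
                )
                dist /= len(clusters[a]) * len(clusters[b])
                if best is None or best[0] > dist:
                    best = (dist, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]  # merge the closest pair
        del clusters[b]  # b > a, so index a is unaffected
    labels = [0] * len(matrix)
    for cluster_id, members in enumerate(clusters):
        for cell in members:
            labels[cell] = cluster_id
    return labels


# Toy example (hypothetical data and weights that sum to 1).
toy_solutions = [
    [0, 0, 1, 1],
    ["x", "x", "y", "y"],
    [0, 0, 0, 1],
]
toy_weights = [0.4, 0.4, 0.2]
toy_weighted = weighted_consensus_matrix(toy_solutions, toy_weights)
toy_labels = average_linkage(toy_weighted, 2)
```
</preformat>
<p>With these toy inputs, cells 0 and 1 end up in one cluster and cells 2 and 3 in the other, because the two higher-weighted methods outvote the third.</p>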
</sec>
<sec>
<title>Evaluation Indices</title>
<p>We use two popular indices to evaluate the performance of clustering methods: Normalized Mutual Information (NMI) (Est&#x000E9;vez et al., <xref ref-type="bibr" rid="B6">2009</xref>) and the Adjusted Rand Index (ARI) (Hubert and Arabie, <xref ref-type="bibr" rid="B11">1985</xref>). Both are statistical indicators that quantify, from different views, the agreement between the predicted labels and the true ones. NMI measures this agreement using the mutual information and the entropies of the two clustering solutions, and ranges from 0 to 1. ARI measures, with a correction for chance, how consistently pairs of data points are placed in the same cluster in the true and the predicted clusterings, and ranges from &#x02212;1 to 1. Higher NMI or ARI values indicate better performance.</p>
<disp-formula id="E6"><label>(6)</label><mml:math id="M8"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:mi>N</mml:mi><mml:mi>M</mml:mi><mml:mi>I</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>P</mml:mi><mml:mo>,</mml:mo><mml:mi>Q</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mn>2</mml:mn><mml:mfrac><mml:mrow><mml:mi>I</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>P</mml:mi><mml:mo>;</mml:mo><mml:mi>Q</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mi>H</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>P</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>&#x0002B;</mml:mo><mml:mi>H</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>Q</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:mfrac><mml:mo>,</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>where <italic>I</italic>(<italic>P</italic>; <italic>Q</italic>) is the mutual information between <italic>P</italic> and <italic>Q</italic>, and <italic>H</italic>(<italic>P</italic>) and <italic>H</italic>(<italic>Q</italic>) are the entropies of <italic>P</italic> and <italic>Q</italic>, respectively.</p>
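<p>Equation (6) can be checked on small label vectors with the following Python sketch. The function names are hypothetical, natural logarithms are assumed (the base cancels in the ratio), and the value returned when both labelings contain a single cluster is a convention chosen here, not one stated in the text.</p>
<preformat>
```python
from collections import Counter
from math import log


def entropy(labels):
    """Shannon entropy H of a labeling (natural logarithm)."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def mutual_information(labels_p, labels_q):
    """Mutual information I(P; Q) between two labelings of the same cells."""
    n = len(labels_p)
    count_p = Counter(labels_p)
    count_q = Counter(labels_q)
    joint = Counter(zip(labels_p, labels_q))
    return sum(
        (c / n) * log(c * n / (count_p[p] * count_q[q]))
        for (p, q), c in joint.items()
    )


def nmi(labels_p, labels_q):
    """Normalized mutual information as in Equation (6)."""
    denominator = entropy(labels_p) + entropy(labels_q)
    if denominator == 0.0:  # both labelings have one cluster (convention)
        return 1.0
    return 2.0 * mutual_information(labels_p, labels_q) / denominator


# Toy examples (hypothetical data).
nmi_identical = nmi([0, 0, 1, 1], ["a", "a", "b", "b"])
nmi_independent = nmi([0, 0, 1, 1], [0, 1, 0, 1])
```
</preformat>
<p>The first toy pair describes the same partition under different label names, and the second pair describes two statistically independent partitions, which correspond to the two ends of the NMI range.</p>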
<disp-formula id="E7"><label>(7)</label><mml:math id="M9"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:mi>A</mml:mi><mml:mi>R</mml:mi><mml:mi>I</mml:mi><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mstyle displaystyle="true"><mml:munder class="msub"><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow></mml:munder></mml:mstyle><mml:mrow><mml:mo stretchy="true">(</mml:mo><mml:mrow><mml:mtable style="text-align:axis;" equalrows="false" columnlines="none none none none none none none none none none none none none none none none none none none" equalcolumns="false" class="array"><mml:mtr><mml:mtd><mml:msub><mml:mrow><mml:mi>n</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mn>2</mml:mn></mml:mtd></mml:mtr></mml:mtable></mml:mrow><mml:mo stretchy="true">)</mml:mo></mml:mrow><mml:mo>-</mml:mo><mml:mrow><mml:mo>[</mml:mo><mml:mrow><mml:mstyle displaystyle="true"><mml:munder class="msub"><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:munder></mml:mstyle><mml:mrow><mml:mo stretchy="true">(</mml:mo><mml:mrow><mml:mtable style="text-align:axis;" equalrows="false" columnlines="none none none none none none none none none none none none none none none none none none none" equalcolumns="false" class="array"><mml:mtr><mml:mtd><mml:msub><mml:mrow><mml:mi>a</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mn>2</mml:mn></mml:mtd></mml:mtr></mml:mtable></mml:mrow><mml:mo stretchy="true">)</mml:mo></mml:mrow><mml:mstyle displaystyle="true"><mml:munder class="msub"><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:munder></mml:mstyle><mml:mrow><mml:mo stretchy="true">(</mml:mo><mml:mrow><mml:mtable style="text-align:axis;" equalrows="false" columnlines="none none 
none none none none none none none none none none none none none none none none none" equalcolumns="false" class="array"><mml:mtr><mml:mtd><mml:msub><mml:mrow><mml:mi>b</mml:mi></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mn>2</mml:mn></mml:mtd></mml:mtr></mml:mtable></mml:mrow><mml:mo stretchy="true">)</mml:mo></mml:mrow></mml:mrow><mml:mo>]</mml:mo></mml:mrow><mml:mi>/</mml:mi><mml:mrow><mml:mo stretchy="true">(</mml:mo><mml:mrow><mml:mtable style="text-align:axis;" equalrows="false" columnlines="none none none none none none none none none none none none none none none none none none none" equalcolumns="false" class="array"><mml:mtr><mml:mtd><mml:mi>n</mml:mi></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mn>2</mml:mn></mml:mtd></mml:mtr></mml:mtable></mml:mrow><mml:mo stretchy="true">)</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mfrac><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:mfrac><mml:mrow><mml:mo>[</mml:mo><mml:mrow><mml:mstyle displaystyle="true"><mml:munder class="msub"><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:munder></mml:mstyle><mml:mrow><mml:mo stretchy="true">(</mml:mo><mml:mrow><mml:mtable style="text-align:axis;" equalrows="false" columnlines="none none none none none none none none none none none none none none none none none none none" equalcolumns="false" class="array"><mml:mtr><mml:mtd><mml:msub><mml:mrow><mml:mi>a</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mn>2</mml:mn></mml:mtd></mml:mtr></mml:mtable></mml:mrow><mml:mo stretchy="true">)</mml:mo></mml:mrow><mml:mo>&#x0002B;</mml:mo><mml:mstyle displaystyle="true"><mml:munder class="msub"><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:munder></mml:mstyle><mml:mrow><mml:mo stretchy="true">(</mml:mo><mml:mrow><mml:mtable 
style="text-align:axis;" equalrows="false" columnlines="none none none none none none none none none none none none none none none none none none none" equalcolumns="false" class="array"><mml:mtr><mml:mtd><mml:msub><mml:mrow><mml:mi>b</mml:mi></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mn>2</mml:mn></mml:mtd></mml:mtr></mml:mtable></mml:mrow><mml:mo stretchy="true">)</mml:mo></mml:mrow></mml:mrow><mml:mo>]</mml:mo></mml:mrow><mml:mo>-</mml:mo><mml:mrow><mml:mo>[</mml:mo><mml:mrow><mml:mstyle displaystyle="true"><mml:munder class="msub"><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:munder></mml:mstyle><mml:mrow><mml:mo stretchy="true">(</mml:mo><mml:mrow><mml:mtable style="text-align:axis;" equalrows="false" columnlines="none none none none none none none none none none none none none none none none none none none" equalcolumns="false" class="array"><mml:mtr><mml:mtd><mml:msub><mml:mrow><mml:mi>a</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mn>2</mml:mn></mml:mtd></mml:mtr></mml:mtable></mml:mrow><mml:mo stretchy="true">)</mml:mo></mml:mrow><mml:mstyle displaystyle="true"><mml:munder class="msub"><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:munder></mml:mstyle><mml:mrow><mml:mo stretchy="true">(</mml:mo><mml:mrow><mml:mtable style="text-align:axis;" equalrows="false" columnlines="none none none none none none none none none none none none none none none none none none none" equalcolumns="false" class="array"><mml:mtr><mml:mtd><mml:msub><mml:mrow><mml:mi>b</mml:mi></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mn>2</mml:mn></mml:mtd></mml:mtr></mml:mtable></mml:mrow><mml:mo stretchy="true">)</mml:mo></mml:mrow></mml:mrow><mml:mo>]</mml:mo></mml:mrow><mml:mi>/</mml:mi><mml:mrow><mml:mo 
stretchy="true">(</mml:mo><mml:mrow><mml:mtable style="text-align:axis;" equalrows="false" columnlines="none none none none none none none none none none none none none none none none none none none" equalcolumns="false" class="array"><mml:mtr><mml:mtd><mml:mi>n</mml:mi></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mn>2</mml:mn></mml:mtd></mml:mtr></mml:mtable></mml:mrow><mml:mo stretchy="true">)</mml:mo></mml:mrow></mml:mrow></mml:mfrac><mml:mo>,</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>where <italic>n</italic> is the number of cells. In the contingency table resulting from the overlap between true clusters and predicted ones, <italic>n</italic><sub><italic>ij</italic></sub> is the element in the <italic>i</italic>-th row and the <italic>j</italic>-th column, <italic>a</italic><sub><italic>i</italic></sub> is the summation of the elements in the <italic>i</italic>-th row, and <italic>b</italic><sub><italic>j</italic></sub> is the summation of the elements in the <italic>j</italic>-th column.</p>
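<p>As an illustration, Equation (7) can be evaluated directly from the contingency table. The following is a minimal sketch rather than the implementation used in this work; the function name <monospace>adjusted_rand_index</monospace>, the toy labels, and the convention of returning 1.0 in the degenerate case are assumptions made for the example.</p>
<preformat>
```python
from collections import Counter
from math import comb


def adjusted_rand_index(true_labels, pred_labels):
    """ARI computed from the contingency table, term by term as in Equation (7)."""
    if len(true_labels) != len(pred_labels):
        raise ValueError("label lists must have the same length")
    n = len(true_labels)
    if comb(n, 2) == 0:
        # Fewer than two cells: no pairs to compare (convention assumed here).
        return 1.0

    # n_ij: number of cells shared by true cluster i and predicted cluster j
    n_ij = Counter(zip(true_labels, pred_labels))
    a_i = Counter(true_labels)  # row sums of the contingency table
    b_j = Counter(pred_labels)  # column sums of the contingency table

    sum_ij = sum(comb(v, 2) for v in n_ij.values())
    sum_a = sum(comb(v, 2) for v in a_i.values())
    sum_b = sum(comb(v, 2) for v in b_j.values())

    expected = sum_a * sum_b / comb(n, 2)  # chance-level term
    max_index = 0.5 * (sum_a + sum_b)      # upper-bound term
    if max_index == expected:
        # Degenerate case, e.g., both partitions place all cells in one cluster
        # (returning 1.0 is an assumption of this sketch).
        return 1.0
    return (sum_ij - expected) / (max_index - expected)


# Hypothetical toy labels: six cells, two true cell types
truth = ["A", "A", "A", "B", "B", "B"]
pred = [0, 0, 1, 1, 1, 1]
print(round(adjusted_rand_index(truth, truth), 3))  # 1.0
print(round(adjusted_rand_index(truth, pred), 3))   # 0.324
```
</preformat>
<p>In the toy example, one cell of type A is placed in the wrong cluster, which lowers the ARI from 1.0 to about 0.324. Renaming the predicted cluster labels does not change the value, because only the counts in the contingency table enter the formula.</p>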
</sec>
<sec>
<title>Datasets</title>
<p>We collected 12 published scRNA-seq datasets that are commonly used as gold-standard datasets because their true cell labels are known. They are available from the Gene Expression Omnibus (GEO) or the European Bioinformatics Institute (EMBL-EBI). The datasets have been normalized to various units, including Transcripts Per Million (TPM), Fragments Per Kilobase of transcript per Million fragments mapped (FPKM), Reads Per Kilobase per Million mapped reads (RPKM), Reads Per Million (RPM), and Counts Per Million (CPM). The details of the datasets are presented in <xref ref-type="table" rid="T1">Table 1</xref>.</p>
<table-wrap position="float" id="T1">
<label>Table 1</label>
<caption><p>Details of the scRNA-seq datasets.</p></caption>
<table frame="hsides" rules="groups">
<thead><tr>
<th valign="top" align="left"><bold>Accession ID</bold></th>
<th valign="top" align="left"><bold>Datasets</bold></th>
<th valign="top" align="left"><bold>Data unit</bold></th>
<th valign="top" align="center"><bold>&#x00023;Cells</bold></th>
<th valign="top" align="center"><bold>&#x00023;Genes</bold></th>
<th valign="top" align="center"><bold>&#x00023;Cell types</bold></th>
<th valign="top" align="left"><bold>References</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">GSE38495</td>
<td valign="top" align="left">Ramskold</td>
<td valign="top" align="left">RPKM</td>
<td valign="top" align="center">33</td>
<td valign="top" align="center">21042</td>
<td valign="top" align="center">7</td>
<td valign="top" align="left">Ramsk&#x000F6;ld et al., <xref ref-type="bibr" rid="B22">2012</xref></td>
</tr>
<tr>
<td valign="top" align="left">GSE57249</td>
<td valign="top" align="left">Biase</td>
<td valign="top" align="left">FPKM</td>
<td valign="top" align="center">49</td>
<td valign="top" align="center">25384</td>
<td valign="top" align="center">3</td>
<td valign="top" align="left">Biase et al., <xref ref-type="bibr" rid="B3">2014</xref></td>
</tr>
<tr>
<td valign="top" align="left">GSE36552</td>
<td valign="top" align="left">Yan</td>
<td valign="top" align="left">RPKM</td>
<td valign="top" align="center">90</td>
<td valign="top" align="center">20214</td>
<td valign="top" align="center">6</td>
<td valign="top" align="left">Yan et al., <xref ref-type="bibr" rid="B34">2013</xref></td>
</tr>
<tr>
<td valign="top" align="left">E-MTAB-3321</td>
<td valign="top" align="left">Goolam</td>
<td valign="top" align="left">RPM</td>
<td valign="top" align="center">124</td>
<td valign="top" align="center">40315</td>
<td valign="top" align="center">5</td>
<td valign="top" align="left">Goolam et al., <xref ref-type="bibr" rid="B7">2016</xref></td>
</tr>
<tr>
<td valign="top" align="left">GSE70657</td>
<td valign="top" align="left">Grover</td>
<td valign="top" align="left">RPKM</td>
<td valign="top" align="center">135</td>
<td valign="top" align="center">15158</td>
<td valign="top" align="center">2</td>
<td valign="top" align="left">Grover et al., <xref ref-type="bibr" rid="B8">2016</xref></td>
</tr>
<tr>
<td valign="top" align="left">GSE70605</td>
<td valign="top" align="left">Liu</td>
<td valign="top" align="left">RPKM</td>
<td valign="top" align="center">145</td>
<td valign="top" align="center">18855</td>
<td valign="top" align="center">25</td>
<td valign="top" align="left">Liu et al., <xref ref-type="bibr" rid="B18">2016</xref></td>
</tr>
<tr>
<td valign="top" align="left">GSE51372</td>
<td valign="top" align="left">Ting</td>
<td valign="top" align="left">RPM</td>
<td valign="top" align="center">187</td>
<td valign="top" align="center">21583</td>
<td valign="top" align="center">7</td>
<td valign="top" align="left">Ting et al., <xref ref-type="bibr" rid="B26">2014</xref></td>
</tr>
<tr>
<td valign="top" align="left">GSE85908</td>
<td valign="top" align="left">Yeo</td>
<td valign="top" align="left">TPM</td>
<td valign="top" align="center">214</td>
<td valign="top" align="center">27473</td>
<td valign="top" align="center">4</td>
<td valign="top" align="left">Song et al., <xref ref-type="bibr" rid="B24">2017</xref></td>
</tr>
<tr>
<td valign="top" align="left">E-MTAB-2805</td>
<td valign="top" align="left">Pollen</td>
<td valign="top" align="left">TPM</td>
<td valign="top" align="center">249</td>
<td valign="top" align="center">6982</td>
<td valign="top" align="center">11</td>
<td valign="top" align="left">Pollen et al., <xref ref-type="bibr" rid="B20">2014</xref></td>
</tr>
<tr>
<td valign="top" align="left">GSE45719</td>
<td valign="top" align="left">Deng</td>
<td valign="top" align="left">RPKM</td>
<td valign="top" align="center">259</td>
<td valign="top" align="center">22147</td>
<td valign="top" align="center">10</td>
<td valign="top" align="left">Deng et al., <xref ref-type="bibr" rid="B5">2014</xref></td>
</tr>
<tr>
<td valign="top" align="left">GSE52529</td>
<td valign="top" align="left">Trapnell</td>
<td valign="top" align="left">FPKM</td>
<td valign="top" align="center">372</td>
<td valign="top" align="center">35988</td>
<td valign="top" align="center">4</td>
<td valign="top" align="left">Trapnell et al., <xref ref-type="bibr" rid="B27">2014</xref></td>
</tr>
<tr>
<td valign="top" align="left">GSE67835</td>
<td valign="top" align="left">Darmanis</td>
<td valign="top" align="left">CPM</td>
<td valign="top" align="center">466</td>
<td valign="top" align="center">22085</td>
<td valign="top" align="center">9</td>
<td valign="top" align="left">Darmanis et al., <xref ref-type="bibr" rid="B4">2015</xref></td>
</tr>
</tbody>
</table>
</table-wrap>
</sec>
</sec>
<sec id="s3">
<title>Experiments and Results</title>
<sec>
<title>Implementation of the Five Clustering Methods</title>
<p>To obtain good performance, we ran the five clustering methods with the default parameters given in their original publications. The parameter settings are described below.</p>
<p>For SNN-cliq, the nearest neighbor parameter <italic>k</italic> is set to 3; the connectivity parameter of quasi-cliques <italic>r</italic> is set to 0.7; the threshold of the overlap of quasi-cliques <italic>m</italic> is set to 0.5.</p>
<p>For PhenoGraph, the surface marker expression data are normalized by dividing by the maximum values. To construct the SNN graph, the nearest neighbor parameter <italic>k</italic> is set to 50.</p>
<p>For SC3, the normalized expression data are log-transformed as log<sub>2</sub>(<italic>x</italic>&#x0002B;1).</p>
<p>For SSNN-Louvain and MPGS-Louvain, SIMLR is performed with the default parameters in the initial similarity measurement step. The width parameter of the Gaussian kernel function &#x003C3; is set to 1.0, 1.25, 1.5, 1.75, and 2. The nearest neighbor parameter <italic>k</italic> is set to 10, 12, 14, &#x02026;, 30. Each (&#x003C3;, <italic>k</italic>) pair defines one kernel, so the 5 values of &#x003C3; and 11 values of <italic>k</italic> result in 55 Gaussian kernels. In SSNN-Louvain, to construct the structural SNN graph, the nearest neighbor parameter <italic>k</italic> is set to 0.1<italic>n</italic> (<italic>n</italic> is the number of nodes). In MPGS-Louvain, the path length <italic>l</italic> is set to 2 for high performance and low time complexity.</p>
<p>Furthermore, in SNN-cliq, PhenoGraph, SSNN-Louvain, and MPGS-Louvain, the number of categories is estimated automatically by quasi-clique partitioning or Louvain community detection, without prior knowledge of the true number of categories.</p>
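<p>The (&#x003C3;, <italic>k</italic>) grid used for SSNN-Louvain and MPGS-Louvain can be enumerated with a few lines of code. This is only a sketch of the parameter settings listed above, not of SIMLR itself; the helper <monospace>ssnn_neighbors</monospace>, its rounding rule, and its lower bound of 1 are assumptions made for the example.</p>
<preformat>
```python
from itertools import product

# Gaussian kernel widths and nearest neighbor counts as listed in the text
sigmas = [1.0, 1.25, 1.5, 1.75, 2.0]
ks = list(range(10, 31, 2))  # 10, 12, ..., 30

# One kernel per (sigma, k) pair
kernel_grid = list(product(sigmas, ks))
print(len(sigmas), len(ks), len(kernel_grid))  # 5 11 55


def ssnn_neighbors(n_cells, fraction=0.1):
    """Neighbor count for the structural SNN graph, k = 0.1 n.

    Rounding to the nearest integer and the floor of 1 are assumptions
    of this sketch; the text only states k = 0.1 n.
    """
    return max(1, round(fraction * n_cells))


print(ssnn_neighbors(466))  # 47
```
</preformat>
<p>With 466 cells, as in the largest dataset of <xref ref-type="table" rid="T1">Table 1</xref>, this rule gives 47 neighbors under the rounding assumed here.</p>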
</sec>
<sec>
<title>Similarity Measurement of the Individual Clustering Methods</title>
<p>To analyze how the predictions of the individual clustering methods differ, we calculate the ARI between each pair of clustering results and present the resulting consensus matrix as a heatmap. We select four scRNA-seq datasets: Ramskold, Yan, Yeo, and Liu, of which the Ramskold dataset is easy to partition while the Liu dataset is hard to cluster. The first three datasets have a small number of true categories, from four to seven, whereas the Liu dataset has 25 true categories. The heatmaps are shown in <xref ref-type="fig" rid="F2">Figure 2</xref>.</p>
<fig id="F2" position="float">
<label>Figure 2</label>
<caption><p>The similarity of the individual clustering methods. <bold>(A)</bold> Liu dataset; <bold>(B)</bold> Ramskold dataset; <bold>(C)</bold> Yan dataset; <bold>(D)</bold> Yeo dataset.</p></caption>
<graphic xlink:href="fgene-11-604790-g0002.tif"/>
</fig>
<p><xref ref-type="fig" rid="F2">Figure 2</xref> shows that only weak similarity exists among the solutions of the individual clustering methods, which is consistent with the results of Yang et al. (<xref ref-type="bibr" rid="B35">2019</xref>). The similarities between the individual clustering methods also vary across datasets. For example, SSNN-Louvain shows relatively high similarity with SC3 and PhenoGraph on the Liu dataset. MPGS-Louvain shows higher similarity with the other clustering methods on the Ramskold dataset. SC3 is highly similar to PhenoGraph on the Yan dataset. SNN-cliq shows low similarity with the other methods on the Yeo dataset. The agreement between SC3 and PhenoGraph varies greatly across datasets: their similarity is close to one on the Yan and Yeo datasets, but the opposite is observed on the Liu and Ramskold datasets.</p>
<p>Furthermore, we observe large differences between SNN-cliq and both SC3 and PhenoGraph on all four datasets. This suggests that different clustering methods capture information about scRNA-seq data from different perspectives.</p>
</sec>
<sec>
<title>Comparisons With the Individual Clustering Methods and SAME-Clustering</title>
<p>To test the performance of our proposed Sc-GPE method, we compare it with the five individual clustering methods and the state-of-the-art cluster ensemble algorithm SAME-clustering on the 12 scRNA-seq datasets in terms of NMI and ARI. The results are shown in <xref ref-type="fig" rid="F3">Figure 3</xref>. SAME-clustering yields NA for both NMI and ARI on the Pollen dataset because its clustering member Seurat failed to run on this dataset.</p>
<fig id="F3" position="float">
<label>Figure 3</label>
<caption><p>The performance of Sc-GPE, MPGS-Louvain, SSNN-Louvain, SNN-cliq, PhenoGraph, and SC3. <bold>(A)</bold> NMI; <bold>(B)</bold> ARI.</p></caption>
<graphic xlink:href="fgene-11-604790-g0003.tif"/>
</fig>
<p>From the experimental results, Sc-GPE achieves the highest average NMI and ARI among all methods. Sc-GPE outperforms the other six methods on five scRNA-seq datasets (Yan, Grover, Liu, Yeo, and Ramskold), while SC3 achieves the best performance on five datasets (Biase, Deng, Pollen, Ting, and Goolam). The average NMI and ARI obtained by Sc-GPE are 6.92% and 17.79% higher than those of SC3, respectively. SAME-clustering works best on three datasets: Biase, Darmanis, and Trapnell. The average NMI and ARI obtained by Sc-GPE are 21.84% and 20.19% higher than those of SAME-clustering, respectively. Large differences in clustering performance among the methods can be observed on the Grover, Liu, and Goolam datasets. Overall, the results show that Sc-GPE performs well and, on average, outperforms the other methods.</p>
<p>Moreover, we compare the numbers of clusters produced by the seven methods, as shown in <xref ref-type="table" rid="T2">Table 2</xref>. The number of predicted clusters has an obvious influence on the clustering solutions. For example, the numbers of clusters predicted by SNN-cliq and PhenoGraph differ considerably from those of the other methods, which is consistent with their relatively poor performance on most datasets. SNN-cliq generally predicts more clusters than the true number of categories, with the Pollen dataset as an exception, whereas PhenoGraph generally predicts fewer.</p>
<table-wrap position="float" id="T2">
<label>Table 2</label>
<caption><p>The comparison of the number of clusters from seven methods.</p></caption>
<table frame="hsides" rules="groups">
<thead><tr>
<th valign="top" align="left"><bold>Datasets</bold></th>
<th valign="top" align="center"><bold>Sc-GPE</bold></th>
<th valign="top" align="center"><bold>MPGS-Louvain</bold></th>
<th valign="top" align="center"><bold>SSNN-Louvain</bold></th>
<th valign="top" align="center"><bold>SNN-cliq</bold></th>
<th valign="top" align="center"><bold>PhenoGraph</bold></th>
<th valign="top" align="center"><bold>SC3</bold></th>
<th valign="top" align="center"><bold>SAME-clustering</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">Ramskold</td>
<td valign="top" align="center">7</td>
<td valign="top" align="center">3</td>
<td valign="top" align="center">8</td>
<td valign="top" align="center">7</td>
<td valign="top" align="center">2</td>
<td valign="top" align="center">2</td>
<td valign="top" align="center">2</td>
</tr>
<tr>
<td valign="top" align="left">Biase</td>
<td valign="top" align="center">3</td>
<td valign="top" align="center">3</td>
<td valign="top" align="center">4</td>
<td valign="top" align="center">6</td>
<td valign="top" align="center">2</td>
<td valign="top" align="center">3</td>
<td valign="top" align="center">3</td>
</tr>
<tr>
<td valign="top" align="left">Yan</td>
<td valign="top" align="center">6</td>
<td valign="top" align="center">6</td>
<td valign="top" align="center">8</td>
<td valign="top" align="center">18</td>
<td valign="top" align="center">3</td>
<td valign="top" align="center">3</td>
<td valign="top" align="center">3</td>
</tr>
<tr>
<td valign="top" align="left">Goolam</td>
<td valign="top" align="center">5</td>
<td valign="top" align="center">5</td>
<td valign="top" align="center">6</td>
<td valign="top" align="center">25</td>
<td valign="top" align="center">4</td>
<td valign="top" align="center">2</td>
<td valign="top" align="center">3</td>
</tr>
<tr>
<td valign="top" align="left">Grover</td>
<td valign="top" align="center">2</td>
<td valign="top" align="center">2</td>
<td valign="top" align="center">3</td>
<td valign="top" align="center">12</td>
<td valign="top" align="center">3</td>
<td valign="top" align="center">3</td>
<td valign="top" align="center">2</td>
</tr>
<tr>
<td valign="top" align="left">Liu</td>
<td valign="top" align="center">25</td>
<td valign="top" align="center">15</td>
<td valign="top" align="center">7</td>
<td valign="top" align="center">26</td>
<td valign="top" align="center">3</td>
<td valign="top" align="center">6</td>
<td valign="top" align="center">4</td>
</tr>
<tr>
<td valign="top" align="left">Ting</td>
<td valign="top" align="center">7</td>
<td valign="top" align="center">8</td>
<td valign="top" align="center">7</td>
<td valign="top" align="center">21</td>
<td valign="top" align="center">5</td>
<td valign="top" align="center">11</td>
<td valign="top" align="center">4</td>
</tr>
<tr>
<td valign="top" align="left">Yeo</td>
<td valign="top" align="center">4</td>
<td valign="top" align="center">5</td>
<td valign="top" align="center">3</td>
<td valign="top" align="center">28</td>
<td valign="top" align="center">3</td>
<td valign="top" align="center">5</td>
<td valign="top" align="center">3</td>
</tr>
<tr>
<td valign="top" align="left">Pollen</td>
<td valign="top" align="center">11</td>
<td valign="top" align="center">11</td>
<td valign="top" align="center">7</td>
<td valign="top" align="center">9</td>
<td valign="top" align="center">7</td>
<td valign="top" align="center">11</td>
<td valign="top" align="center">NA<xref ref-type="table-fn" rid="TN1"><sup>&#x0002A;</sup></xref></td>
</tr>
<tr>
<td valign="top" align="left">Deng</td>
<td valign="top" align="center">10</td>
<td valign="top" align="center">10</td>
<td valign="top" align="center">7</td>
<td valign="top" align="center">43</td>
<td valign="top" align="center">6</td>
<td valign="top" align="center">6</td>
<td valign="top" align="center">5</td>
</tr>
<tr>
<td valign="top" align="left">Trapnell</td>
<td valign="top" align="center">4</td>
<td valign="top" align="center">5</td>
<td valign="top" align="center">6</td>
<td valign="top" align="center">56</td>
<td valign="top" align="center">6</td>
<td valign="top" align="center">10</td>
<td valign="top" align="center">4</td>
</tr>
<tr>
<td valign="top" align="left">Darmanis</td>
<td valign="top" align="center">9</td>
<td valign="top" align="center">8</td>
<td valign="top" align="center">5</td>
<td valign="top" align="center">38</td>
<td valign="top" align="center">6</td>
<td valign="top" align="center">12</td>
<td valign="top" align="center">5</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<fn id="TN1"><label>&#x0002A;</label><p><italic>SAME-clustering yields NA on the Pollen dataset because its clustering member Seurat failed to run on this dataset</italic>.</p></fn>
</table-wrap-foot>
</table-wrap>
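<p>The relation between predicted and true cluster numbers can be checked by tallying, for each method, how often the predicted number in <xref ref-type="table" rid="T2">Table 2</xref> is above, equal to, or below the number of cell types in <xref ref-type="table" rid="T1">Table 1</xref>. The sketch below only re-counts the published table values; the function name <monospace>tally</monospace> and the data layout are choices made for the example.</p>
<preformat>
```python
# True numbers of cell types, copied from Table 1
TRUE_K = {
    "Ramskold": 7, "Biase": 3, "Yan": 6, "Goolam": 5, "Grover": 2, "Liu": 25,
    "Ting": 7, "Yeo": 4, "Pollen": 11, "Deng": 10, "Trapnell": 4, "Darmanis": 9,
}

METHODS = ["Sc-GPE", "MPGS-Louvain", "SSNN-Louvain", "SNN-cliq",
           "PhenoGraph", "SC3", "SAME-clustering"]

# Predicted numbers of clusters, copied from Table 2 (None marks the NA entry)
PREDICTED_K = {
    "Ramskold": [7, 3, 8, 7, 2, 2, 2],
    "Biase": [3, 3, 4, 6, 2, 3, 3],
    "Yan": [6, 6, 8, 18, 3, 3, 3],
    "Goolam": [5, 5, 6, 25, 4, 2, 3],
    "Grover": [2, 2, 3, 12, 3, 3, 2],
    "Liu": [25, 15, 7, 26, 3, 6, 4],
    "Ting": [7, 8, 7, 21, 5, 11, 4],
    "Yeo": [4, 5, 3, 28, 3, 5, 3],
    "Pollen": [11, 11, 7, 9, 7, 11, None],
    "Deng": [10, 10, 7, 43, 6, 6, 5],
    "Trapnell": [4, 5, 6, 56, 6, 10, 4],
    "Darmanis": [9, 8, 5, 38, 6, 12, 5],
}


def tally(method):
    """Count datasets where a method over-, exactly, or under-estimates K."""
    col = METHODS.index(method)
    counts = {"over": 0, "exact": 0, "under": 0, "missing": 0}
    for dataset, row in PREDICTED_K.items():
        pred = row[col]
        if pred is None:
            counts["missing"] += 1
            continue
        diff = pred - TRUE_K[dataset]
        if diff == 0:
            counts["exact"] += 1
        elif abs(diff) == diff:  # positive difference: too many clusters
            counts["over"] += 1
        else:
            counts["under"] += 1
    return counts


for name in METHODS:
    print(name, tally(name))
```
</preformat>
<p>With the values in the two tables, SNN-cliq predicts more clusters than the true number on 10 of the 12 datasets, the same number on Ramskold, and fewer on Pollen, whereas PhenoGraph predicts fewer on 10 datasets and more on Grover and Trapnell. The cluster numbers listed for Sc-GPE equal the true numbers on all 12 datasets.</p>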
<p>To further demonstrate the performance of Sc-GPE, we provide box plots of NMI and ARI for the seven methods across the 12 datasets, shown in <xref ref-type="fig" rid="F4">Figure 4</xref>. The box plots clearly show that Sc-GPE outperforms the other six methods. The worst ARI value of Sc-GPE, 0.249, occurs on the Trapnell dataset, where some cells are misassigned as a result of two poor individual clustering solutions. SNN-cliq achieves the worst results in terms of ARI, and PhenoGraph performs worst in terms of NMI.</p>
<fig id="F4" position="float">
<label>Figure 4</label>
<caption><p>The box plot of performance for the seven methods. <bold>(A)</bold> NMI; <bold>(B)</bold> ARI.</p></caption>
<graphic xlink:href="fgene-11-604790-g0004.tif"/>
</fig>
</sec>
</sec>
<sec sec-type="conclusions" id="s4">
<title>Conclusions</title>
<p>Various single-cell clustering algorithms have been proposed to accurately represent cell heterogeneity. However, the clusters predicted by different clustering methods can differ considerably, which limits their generalization capability. Combining the information from different clustering results is a good way to improve clustering performance. We therefore propose a novel cluster ensemble method, Sc-GPE, which integrates five clustering methods: SNN-cliq, PhenoGraph, SSNN-Louvain, MPGS-Louvain, and SC3.</p>
<p>Sc-GPE uses a consensus matrix-based ensemble model. This statistical approach avoids a problem that commonly arises in hypergraph-based cluster ensemble methods: because the individual clustering methods generate different cluster labels, it is difficult to determine which labels correspond to each other across methods. Furthermore, a weighting strategy is designed to measure the importance of each individual clustering method according to its similarity with the other methods. A weighted consensus matrix is then constructed from these weights, which distinguishes the contributions of the individual clustering methods.</p>
<p>Sc-GPE provides close-to-the-best clustering solutions by combining clustering methods that use various similarity measurements and graph partitioning algorithms. The experimental results on twelve scRNA-seq datasets show that Sc-GPE outperforms the five individual clustering methods and the state-of-the-art SAME-clustering method. However, the relatively small number of individual clustering methods may provide insufficient information and limit the performance of Sc-GPE; how to choose better individual clustering methods should be investigated in future work.</p>
</sec>
<sec sec-type="data-availability-statement" id="s5">
<title>Data Availability Statement</title>
<p>The datasets analyzed in this work are available in the following repositories: GEO: <ext-link ext-link-type="uri" xlink:href="https://xenabrowser.net/datapages/">https://xenabrowser.net/datapages/</ext-link>; EMBL-EBI: <ext-link ext-link-type="uri" xlink:href="https://www.ebi.ac.uk/">https://www.ebi.ac.uk/</ext-link> and details of the datasets can be found in <xref ref-type="table" rid="T1">Table 1</xref>.</p>
</sec>
<sec id="s6">
<title>Author Contributions</title>
<p>XZ and JW: conceptualization and design. XZ and H-DL: writing. H-DL and MX: data acquisition. XZ and JL: methodology. All authors: contributed to the article and approved the submitted version.</p>
</sec>
<sec sec-type="COI-statement" id="conf1">
<title>Conflict of Interest</title>
<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
</sec>
</body>
<back>
<ack><p>This paper is recommended by the 5th CCF Bioinformatics Conference.</p>
</ack>
<ref-list>
<title>References</title>
<ref id="B1">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Andrews</surname> <given-names>T. S.</given-names></name> <name><surname>Hemberg</surname> <given-names>M.</given-names></name></person-group> (<year>2019</year>). <article-title>M3Drop: dropout-based feature selection for scRNASeq</article-title>. <source>Bioinformatics</source> <volume>35</volume>, <fpage>2865</fpage>&#x02013;<lpage>2867</lpage>. <pub-id pub-id-type="doi">10.1093/bioinformatics/bty1044</pub-id><pub-id pub-id-type="pmid">30590489</pub-id></citation></ref>
<ref id="B2">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Becht</surname> <given-names>E.</given-names></name> <name><surname>Mcinnes</surname> <given-names>L.</given-names></name> <name><surname>Healy</surname> <given-names>J.</given-names></name> <name><surname>Dutertre</surname> <given-names>C.</given-names></name> <name><surname>Kwok</surname> <given-names>I. W. H.</given-names></name> <name><surname>Ng</surname> <given-names>L. G.</given-names></name> <etal/></person-group>. (<year>2019</year>). <article-title>Dimensionality reduction for visualizing single-cell data using UMAP</article-title>. <source>Nat. Biotechnol.</source> <volume>37</volume>, <fpage>38</fpage>&#x02013;<lpage>44</lpage>. <pub-id pub-id-type="doi">10.1038/nbt.4314</pub-id><pub-id pub-id-type="pmid">30531897</pub-id></citation></ref>
<ref id="B3">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Biase</surname> <given-names>F. H.</given-names></name> <name><surname>Cao</surname> <given-names>X.</given-names></name> <name><surname>Zhong</surname> <given-names>S.</given-names></name></person-group> (<year>2014</year>). <article-title>Cell fate inclination within 2-cell and 4-cell mouse embryos revealed by single-cell RNA sequencing</article-title>. <source>Genome Res.</source> <volume>24</volume>, <fpage>1787</fpage>&#x02013;<lpage>1796</lpage>. <pub-id pub-id-type="doi">10.1101/gr.177725.114</pub-id><pub-id pub-id-type="pmid">25096407</pub-id></citation></ref>
<ref id="B4">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Darmanis</surname> <given-names>S.</given-names></name> <name><surname>Sloan</surname> <given-names>S. A.</given-names></name> <name><surname>Zhang</surname> <given-names>Y.</given-names></name> <name><surname>Enge</surname> <given-names>M.</given-names></name> <name><surname>Caneda</surname> <given-names>C.</given-names></name> <name><surname>Shuer</surname> <given-names>L. M.</given-names></name> <etal/></person-group>. (<year>2015</year>). <article-title>A survey of human brain transcriptome diversity at the single cell level</article-title>. <source>Proc. Natl. Acad. Sci. U.S.A.</source> <volume>112</volume>, <fpage>7285</fpage>&#x02013;<lpage>7290</lpage>. <pub-id pub-id-type="doi">10.1073/pnas.1507125112</pub-id><pub-id pub-id-type="pmid">26060301</pub-id></citation></ref>
<ref id="B5">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Deng</surname> <given-names>Q.</given-names></name> <name><surname>Ramsk&#x000F6;ld</surname> <given-names>D.</given-names></name> <name><surname>Reinius</surname> <given-names>B.</given-names></name> <name><surname>Sandberg</surname> <given-names>R.</given-names></name></person-group> (<year>2014</year>). <article-title>Single-cell RNA-seq reveals dynamic, random monoallelic gene expression in mammalian cells</article-title>. <source>Science</source> <volume>343</volume>, <fpage>193</fpage>&#x02013;<lpage>196</lpage>. <pub-id pub-id-type="doi">10.1126/science.1245316</pub-id><pub-id pub-id-type="pmid">24408435</pub-id></citation></ref>
<ref id="B6">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Est&#x000E9;vez</surname> <given-names>P. A.</given-names></name> <name><surname>Tesmer</surname> <given-names>M.</given-names></name> <name><surname>Perez</surname> <given-names>C. A.</given-names></name> <name><surname>Zurada</surname> <given-names>J. M.</given-names></name></person-group> (<year>2009</year>). <article-title>Normalized mutual information feature selection</article-title>. <source>IEEE Trans. Neural Netw.</source> <volume>20</volume>, <fpage>189</fpage>&#x02013;<lpage>201</lpage>. <pub-id pub-id-type="doi">10.1109/TNN.2008.2005601</pub-id><pub-id pub-id-type="pmid">19150792</pub-id></citation></ref>
<ref id="B7">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Goolam</surname> <given-names>M.</given-names></name> <name><surname>Scialdone</surname> <given-names>A.</given-names></name> <name><surname>Graham</surname> <given-names>S. J.</given-names></name> <name><surname>Macaulay</surname> <given-names>I. C.</given-names></name> <name><surname>Jedrusik</surname> <given-names>A.</given-names></name> <name><surname>Hupalowska</surname> <given-names>A.</given-names></name> <etal/></person-group>. (<year>2016</year>). <article-title>Heterogeneity in Oct4 and Sox2 targets biases cell fate in 4-cell mouse embryos</article-title>. <source>Cell</source> <volume>165</volume>, <fpage>61</fpage>&#x02013;<lpage>74</lpage>. <pub-id pub-id-type="doi">10.1016/j.cell.2016.01.047</pub-id><pub-id pub-id-type="pmid">27015307</pub-id></citation></ref>
<ref id="B8">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Grover</surname> <given-names>A.</given-names></name> <name><surname>Sanjuan-Pla</surname> <given-names>A.</given-names></name> <name><surname>Thongjuea</surname> <given-names>S.</given-names></name> <name><surname>Carrelha</surname> <given-names>J.</given-names></name> <name><surname>Giustacchini</surname> <given-names>A.</given-names></name> <name><surname>Gambardella</surname> <given-names>A.</given-names></name> <etal/></person-group>. (<year>2016</year>). <article-title>Single-cell RNA sequencing reveals molecular and functional platelet bias of aged haematopoietic stem cells</article-title>. <source>Nat. Commun.</source> <volume>7</volume>:<fpage>11075</fpage>. <pub-id pub-id-type="doi">10.1038/ncomms11075</pub-id><pub-id pub-id-type="pmid">27009448</pub-id></citation></ref>
<ref id="B9">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Gr&#x000FC;n</surname> <given-names>D.</given-names></name></person-group> (<year>2020</year>). <article-title>Revealing dynamics of gene expression variability in cell state space</article-title>. <source>Nat. Methods</source> <volume>17</volume>, <fpage>45</fpage>&#x02013;<lpage>49</lpage>. <pub-id pub-id-type="doi">10.1038/s41592-019-0632-3</pub-id><pub-id pub-id-type="pmid">31740822</pub-id></citation></ref>
<ref id="B10">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Gr&#x000FC;n</surname> <given-names>D.</given-names></name> <name><surname>Lyubimova</surname> <given-names>A.</given-names></name> <name><surname>Kester</surname> <given-names>L.</given-names></name> <name><surname>Wiebrands</surname> <given-names>K.</given-names></name> <name><surname>Basak</surname> <given-names>O.</given-names></name> <name><surname>Sasaki</surname> <given-names>N.</given-names></name> <etal/></person-group>. (<year>2015</year>). <article-title>Single-cell messenger RNA sequencing reveals rare intestinal cell types</article-title>. <source>Nature</source> <volume>525</volume>, <fpage>251</fpage>&#x02013;<lpage>255</lpage>. <pub-id pub-id-type="doi">10.1038/nature14966</pub-id><pub-id pub-id-type="pmid">26287467</pub-id></citation></ref>
<ref id="B11">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hubert</surname> <given-names>L.</given-names></name> <name><surname>Arabie</surname> <given-names>P.</given-names></name></person-group> (<year>1985</year>). <article-title>Comparing partitions</article-title>. <source>J. Classif.</source> <volume>2</volume>, <fpage>193</fpage>&#x02013;<lpage>218</lpage>. <pub-id pub-id-type="doi">10.1007/BF01908075</pub-id></citation></ref>
<ref id="B12">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Huh</surname> <given-names>R.</given-names></name> <name><surname>Yang</surname> <given-names>Y.</given-names></name> <name><surname>Jiang</surname> <given-names>Y.</given-names></name> <name><surname>Shen</surname> <given-names>Y.</given-names></name> <name><surname>Li</surname> <given-names>Y.</given-names></name></person-group> (<year>2020</year>). <article-title>SAME-clustering: Single-cell Aggregated clustering via Mixture Model Ensemble</article-title>. <source>Nucleic Acids Res.</source> <volume>48</volume>, <fpage>86</fpage>&#x02013;<lpage>95</lpage>. <pub-id pub-id-type="doi">10.1093/nar/gkz959</pub-id><pub-id pub-id-type="pmid">31777938</pub-id></citation></ref>
<ref id="B13">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kim</surname> <given-names>T.</given-names></name> <name><surname>Chen</surname> <given-names>I. R.</given-names></name> <name><surname>Lin</surname> <given-names>Y.</given-names></name> <name><surname>Wang</surname> <given-names>A. Y.</given-names></name> <name><surname>Yang</surname> <given-names>J. Y. H.</given-names></name> <name><surname>Yang</surname> <given-names>P.</given-names></name></person-group> (<year>2019</year>). <article-title>Impact of similarity metrics on single-cell RNA-seq data clustering</article-title>. <source>Brief. Bioinform.</source> <volume>20</volume>, <fpage>2316</fpage>&#x02013;<lpage>2326</lpage>. <pub-id pub-id-type="doi">10.1093/bib/bby076</pub-id><pub-id pub-id-type="pmid">30137247</pub-id></citation></ref>
<ref id="B14">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kiselev</surname> <given-names>V. Y.</given-names></name> <name><surname>Andrews</surname> <given-names>T. S.</given-names></name> <name><surname>Hemberg</surname> <given-names>M.</given-names></name></person-group> (<year>2019</year>). <article-title>Challenges in unsupervised clustering of single-cell RNA-seq data</article-title>. <source>Nat. Rev. Genet.</source> <volume>20</volume>, <fpage>273</fpage>&#x02013;<lpage>282</lpage>. <pub-id pub-id-type="doi">10.1038/s41576-018-0088-9</pub-id></citation></ref>
<ref id="B15">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kiselev</surname> <given-names>V. Y.</given-names></name> <name><surname>Kirschner</surname> <given-names>K.</given-names></name> <name><surname>Schaub</surname> <given-names>M. T.</given-names></name> <name><surname>Andrews</surname> <given-names>T. S.</given-names></name> <name><surname>Yiu</surname> <given-names>A.</given-names></name> <name><surname>Chandra</surname> <given-names>T.</given-names></name> <etal/></person-group>. (<year>2017</year>). <article-title>SC3: consensus clustering of single-cell RNA-seq data</article-title>. <source>Nat. Methods</source> <volume>14</volume>, <fpage>483</fpage>&#x02013;<lpage>486</lpage>. <pub-id pub-id-type="doi">10.1038/nmeth.4236</pub-id><pub-id pub-id-type="pmid">28346451</pub-id></citation></ref>
<ref id="B16">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kuncheva</surname> <given-names>L. I.</given-names></name> <name><surname>Vetrov</surname> <given-names>D. P.</given-names></name></person-group> (<year>2006</year>). <article-title>Evaluation of stability of k-means cluster ensembles with respect to random initialization</article-title>. <source>IEEE Trans. Pattern Anal. Mach. Intell.</source> <volume>28</volume>, <fpage>1798</fpage>&#x02013;<lpage>1808</lpage>. <pub-id pub-id-type="doi">10.1109/TPAMI.2006.226</pub-id><pub-id pub-id-type="pmid">17063684</pub-id></citation></ref>
<ref id="B17">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Levine</surname> <given-names>J. H.</given-names></name> <name><surname>Simonds</surname> <given-names>E. F.</given-names></name> <name><surname>Bendall</surname> <given-names>S. C.</given-names></name> <name><surname>Davis</surname> <given-names>K. L.</given-names></name> <name><surname>Amir</surname> <given-names>E. D.</given-names></name> <name><surname>Tadmor</surname> <given-names>M. D.</given-names></name> <etal/></person-group>. (<year>2015</year>). <article-title>Data-driven phenotypic dissection of AML reveals progenitor-like cells that correlate with prognosis</article-title>. <source>Cell</source> <volume>162</volume>, <fpage>184</fpage>&#x02013;<lpage>197</lpage>. <pub-id pub-id-type="doi">10.1016/j.cell.2015.05.047</pub-id><pub-id pub-id-type="pmid">26095251</pub-id></citation></ref>
<ref id="B18">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Liu</surname> <given-names>W.</given-names></name> <name><surname>Liu</surname> <given-names>X.</given-names></name> <name><surname>Wang</surname> <given-names>C.</given-names></name> <name><surname>Gao</surname> <given-names>Y.</given-names></name> <name><surname>Gao</surname> <given-names>R.</given-names></name> <name><surname>Kou</surname> <given-names>X.</given-names></name> <etal/></person-group>. (<year>2016</year>). <article-title>Identification of key factors conquering developmental arrest of somatic cell cloned embryos by combining embryo biopsy and single-cell sequencing</article-title>. <source>Cell Discov.</source> <volume>2</volume>, <fpage>1</fpage>&#x02013;<lpage>15</lpage>. <pub-id pub-id-type="doi">10.1038/celldisc.2016.10</pub-id><pub-id pub-id-type="pmid">27462457</pub-id></citation></ref>
<ref id="B19">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Liu</surname> <given-names>Z.</given-names></name> <name><surname>Liu</surname> <given-names>F.</given-names></name> <name><surname>Hong</surname> <given-names>C.</given-names></name> <name><surname>Gao</surname> <given-names>M.</given-names></name> <name><surname>Chen</surname> <given-names>Y.</given-names></name> <name><surname>Liu</surname> <given-names>S.</given-names></name> <etal/></person-group>. (<year>2019</year>). <article-title>&#x0201C;Detection of cell types from single-cell RNA-seq data using similarity via kernel preserving learning embedding,&#x0201D;</article-title> <source>in 2019 IEEE International Conference on Bioinformatics and Biomedicine</source> (<publisher-loc>San Diego, CA</publisher-loc>: <publisher-name>IEEE</publisher-name>). <pub-id pub-id-type="doi">10.1109/BIBM47256.2019.8983395</pub-id></citation></ref>
<ref id="B20">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Pollen</surname> <given-names>A. A.</given-names></name> <name><surname>Nowakowski</surname> <given-names>T. J.</given-names></name> <name><surname>Shuga</surname> <given-names>J.</given-names></name> <name><surname>Wang</surname> <given-names>X.</given-names></name> <name><surname>Leyrat</surname> <given-names>A. A.</given-names></name> <name><surname>Lui</surname> <given-names>J. H.</given-names></name> <etal/></person-group>. (<year>2014</year>). <article-title>Low-coverage single-cell mRNA sequencing reveals cellular heterogeneity and activated signaling pathways in developing cerebral cortex</article-title>. <source>Nat. Biotechnol.</source> <volume>32</volume>, <fpage>1053</fpage>&#x02013;<lpage>1058</lpage>. <pub-id pub-id-type="doi">10.1038/nbt.2967</pub-id><pub-id pub-id-type="pmid">25086649</pub-id></citation></ref>
<ref id="B21">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Qi</surname> <given-names>R.</given-names></name> <name><surname>Ma</surname> <given-names>A.</given-names></name> <name><surname>Ma</surname> <given-names>Q.</given-names></name> <name><surname>Zou</surname> <given-names>Q.</given-names></name></person-group> (<year>2019</year>). <article-title>Clustering and classification methods for single-cell RNA-sequencing data</article-title>. <source>Brief. Bioinform.</source> <volume>21</volume>, <fpage>1196</fpage>&#x02013;<lpage>1208</lpage>. <pub-id pub-id-type="doi">10.1093/bib/bbz062</pub-id><pub-id pub-id-type="pmid">31271412</pub-id></citation></ref>
<ref id="B22">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ramsk&#x000F6;ld</surname> <given-names>D.</given-names></name> <name><surname>Luo</surname> <given-names>S.</given-names></name> <name><surname>Wang</surname> <given-names>Y.-C.</given-names></name> <name><surname>Li</surname> <given-names>R.</given-names></name> <name><surname>Deng</surname> <given-names>Q.</given-names></name> <name><surname>Faridani</surname> <given-names>O. R.</given-names></name> <etal/></person-group>. (<year>2012</year>). <article-title>Full-length mRNA-Seq from single-cell levels of RNA and individual circulating tumor cells</article-title>. <source>Nat. Biotechnol.</source> <volume>30</volume>, <fpage>777</fpage>&#x02013;<lpage>782</lpage>. <pub-id pub-id-type="doi">10.1038/nbt.2282</pub-id><pub-id pub-id-type="pmid">22820318</pub-id></citation></ref>
<ref id="B23">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Soneson</surname> <given-names>C.</given-names></name> <name><surname>Robinson</surname> <given-names>M. D.</given-names></name></person-group> (<year>2018</year>). <article-title>Bias, robustness and scalability in single-cell differential expression analysis</article-title>. <source>Nat. Methods</source> <volume>15</volume>, <fpage>255</fpage>&#x02013;<lpage>261</lpage>. <pub-id pub-id-type="doi">10.1038/nmeth.4612</pub-id><pub-id pub-id-type="pmid">29481549</pub-id></citation></ref>
<ref id="B24">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Song</surname> <given-names>Y.</given-names></name> <name><surname>Botvinnik</surname> <given-names>O. B.</given-names></name> <name><surname>Lovci</surname> <given-names>M. T.</given-names></name> <name><surname>Kakaradov</surname> <given-names>B.</given-names></name> <name><surname>Liu</surname> <given-names>P.</given-names></name> <name><surname>Xu</surname> <given-names>J. L.</given-names></name> <etal/></person-group>. (<year>2017</year>). <article-title>Single-cell alternative splicing analysis with expedition reveals splicing dynamics during neuron differentiation</article-title>. <source>Mol. Cell</source> <volume>67</volume>, <fpage>148</fpage>&#x02013;<lpage>161</lpage>.e145. <pub-id pub-id-type="doi">10.1016/j.molcel.2017.06.003</pub-id><pub-id pub-id-type="pmid">28673540</pub-id></citation></ref>
<ref id="B25">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Stuart</surname> <given-names>T.</given-names></name> <name><surname>Satija</surname> <given-names>R.</given-names></name></person-group> (<year>2019</year>). <article-title>Integrative single-cell analysis</article-title>. <source>Nat. Rev. Genet.</source> <volume>20</volume>, <fpage>257</fpage>&#x02013;<lpage>272</lpage>. <pub-id pub-id-type="doi">10.1038/s41576-019-0093-7</pub-id><pub-id pub-id-type="pmid">30696980</pub-id></citation></ref>
<ref id="B26">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ting</surname> <given-names>D. T.</given-names></name> <name><surname>Wittner</surname> <given-names>B. S.</given-names></name> <name><surname>Ligorio</surname> <given-names>M.</given-names></name> <name><surname>Jordan</surname> <given-names>N. V.</given-names></name> <name><surname>Shah</surname> <given-names>A. M.</given-names></name> <name><surname>Miyamoto</surname> <given-names>D. T.</given-names></name> <etal/></person-group>. (<year>2014</year>). <article-title>Single-cell RNA sequencing identifies extracellular matrix gene expression by pancreatic circulating tumor cells</article-title>. <source>Cell Rep.</source> <volume>8</volume>, <fpage>1905</fpage>&#x02013;<lpage>1918</lpage>. <pub-id pub-id-type="doi">10.1016/j.celrep.2014.08.029</pub-id><pub-id pub-id-type="pmid">25242334</pub-id></citation></ref>
<ref id="B27">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Trapnell</surname> <given-names>C.</given-names></name> <name><surname>Cacchiarelli</surname> <given-names>D.</given-names></name> <name><surname>Grimsby</surname> <given-names>J.</given-names></name> <name><surname>Pokharel</surname> <given-names>P.</given-names></name> <name><surname>Li</surname> <given-names>S.</given-names></name> <name><surname>Morse</surname> <given-names>M.</given-names></name> <etal/></person-group>. (<year>2014</year>). <article-title>The dynamics and regulators of cell fate decisions are revealed by pseudotemporal ordering of single cells</article-title>. <source>Nat. Biotechnol.</source> <volume>32</volume>, <fpage>381</fpage>&#x02013;<lpage>386</lpage>. <pub-id pub-id-type="doi">10.1038/nbt.2859</pub-id><pub-id pub-id-type="pmid">24658644</pub-id></citation></ref>
<ref id="B28">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Vega-Pons</surname> <given-names>S.</given-names></name> <name><surname>Ruiz-Shulcloper</surname> <given-names>J.</given-names></name></person-group> (<year>2011</year>). <article-title>A survey of clustering ensemble algorithms</article-title>. <source>Int. J. Pattern Recogn. Artif. Intell.</source> <volume>25</volume>, <fpage>337</fpage>&#x02013;<lpage>372</lpage>. <pub-id pub-id-type="doi">10.1142/S0218001411008683</pub-id></citation></ref>
<ref id="B29">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Vitak</surname> <given-names>S. A.</given-names></name> <name><surname>Torkenczy</surname> <given-names>K. A.</given-names></name> <name><surname>Rosenkrantz</surname> <given-names>J. L.</given-names></name> <name><surname>Fields</surname> <given-names>A. J.</given-names></name> <name><surname>Christiansen</surname> <given-names>L.</given-names></name> <name><surname>Wong</surname> <given-names>M. H.</given-names></name> <etal/></person-group>. (<year>2017</year>). <article-title>Sequencing thousands of single-cell genomes with combinatorial indexing</article-title>. <source>Nat. Methods</source> <volume>14</volume>, <fpage>302</fpage>&#x02013;<lpage>308</lpage>. <pub-id pub-id-type="doi">10.1038/nmeth.4154</pub-id><pub-id pub-id-type="pmid">28135258</pub-id></citation></ref>
<ref id="B30">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wang</surname> <given-names>B.</given-names></name> <name><surname>Pourshafeie</surname> <given-names>A.</given-names></name> <name><surname>Zitnik</surname> <given-names>M.</given-names></name> <name><surname>Zhu</surname> <given-names>J.</given-names></name> <name><surname>Bustamante</surname> <given-names>C.</given-names></name> <name><surname>Batzoglou</surname> <given-names>S.</given-names></name> <etal/></person-group>. (<year>2018</year>). <article-title>Network enhancement as a general method to denoise weighted biological networks</article-title>. <source>Nat. Commun.</source> <volume>9</volume>:<fpage>3108</fpage>. <pub-id pub-id-type="doi">10.1038/s41467-018-05469-x</pub-id><pub-id pub-id-type="pmid">30082777</pub-id></citation></ref>
<ref id="B31">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wang</surname> <given-names>D.</given-names></name> <name><surname>Gu</surname> <given-names>J.</given-names></name></person-group> (<year>2018</year>). <article-title>VASC: dimension reduction and visualization of single-cell RNA-seq data by deep variational autoencoder</article-title>. <source>Genom. Proteom. Bioinform.</source> <volume>16</volume>, <fpage>320</fpage>&#x02013;<lpage>331</lpage>. <pub-id pub-id-type="doi">10.1016/j.gpb.2018.08.003</pub-id><pub-id pub-id-type="pmid">30576740</pub-id></citation></ref>
<ref id="B32">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Xu</surname> <given-names>C.</given-names></name> <name><surname>Su</surname> <given-names>Z.</given-names></name></person-group> (<year>2015</year>). <article-title>Identification of cell types from single-cell transcriptomes using a novel clustering method</article-title>. <source>Bioinformatics</source> <volume>31</volume>, <fpage>1974</fpage>&#x02013;<lpage>1980</lpage>. <pub-id pub-id-type="doi">10.1093/bioinformatics/btv088</pub-id><pub-id pub-id-type="pmid">25805722</pub-id></citation></ref>
<ref id="B33">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Xu</surname> <given-names>Y.</given-names></name> <name><surname>Li</surname> <given-names>H-D.</given-names></name> <name><surname>Pan</surname> <given-names>Y.</given-names></name> <name><surname>Luo</surname> <given-names>F.</given-names></name> <name><surname>Wang</surname> <given-names>J.</given-names></name></person-group> (<year>2019</year>). <article-title>&#x0201C;BioRank: a similarity assessment method for single cell clustering,&#x0201D;</article-title> in <source>IEEE/ACM Transactions on Computational Biology and Bioinformatics</source> (<publisher-loc>Madrid</publisher-loc>), <fpage>157</fpage>&#x02013;<lpage>162</lpage>. <pub-id pub-id-type="doi">10.1109/TCBB.2019.2931582</pub-id><pub-id pub-id-type="pmid">31369384</pub-id></citation></ref>
<ref id="B34">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Yan</surname> <given-names>L.</given-names></name> <name><surname>Yang</surname> <given-names>M.</given-names></name> <name><surname>Guo</surname> <given-names>H.</given-names></name> <name><surname>Yang</surname> <given-names>L.</given-names></name> <name><surname>Wu</surname> <given-names>J.</given-names></name> <name><surname>Li</surname> <given-names>R.</given-names></name> <etal/></person-group>. (<year>2013</year>). <article-title>Single-cell RNA-seq profiling of human preimplantation embryos and embryonic stem cells</article-title>. <source>Nat. Struct. Mol. Biol.</source> <volume>20</volume>, <fpage>1131</fpage>&#x02013;<lpage>1139</lpage>. <pub-id pub-id-type="doi">10.1038/nsmb.2660</pub-id><pub-id pub-id-type="pmid">23934149</pub-id></citation></ref>
<ref id="B35">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Yang</surname> <given-names>Y.</given-names></name> <name><surname>Huh</surname> <given-names>R.</given-names></name> <name><surname>Culpepper</surname> <given-names>H. W.</given-names></name> <name><surname>Lin</surname> <given-names>Y.</given-names></name> <name><surname>Love</surname> <given-names>M. I.</given-names></name> <name><surname>Li</surname> <given-names>Y.</given-names></name></person-group> (<year>2019</year>). <article-title>SAFE-clustering: single-cell aggregated (from ensemble) clustering for single-cell RNA-seq data</article-title>. <source>Bioinformatics</source> <volume>35</volume>, <fpage>1269</fpage>&#x02013;<lpage>1277</lpage>. <pub-id pub-id-type="doi">10.1093/bioinformatics/bty793</pub-id><pub-id pub-id-type="pmid">30202935</pub-id></citation></ref>
<ref id="B36">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Yip</surname> <given-names>S. H.</given-names></name> <name><surname>Sham</surname> <given-names>P. C.</given-names></name> <name><surname>Wang</surname> <given-names>J.</given-names></name></person-group> (<year>2019</year>). <article-title>Evaluation of tools for highly variable gene discovery from single-cell RNA-seq data</article-title>. <source>Brief. Bioinform.</source> <volume>20</volume>, <fpage>1583</fpage>&#x02013;<lpage>1589</lpage>. <pub-id pub-id-type="doi">10.1093/bib/bby011</pub-id><pub-id pub-id-type="pmid">29481632</pub-id></citation></ref>
<ref id="B37">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Yu</surname> <given-names>Z.</given-names></name> <name><surname>Luo</surname> <given-names>P.</given-names></name> <name><surname>You</surname> <given-names>J.</given-names></name> <name><surname>Wong</surname> <given-names>H.-S.</given-names></name> <name><surname>Leung</surname> <given-names>H.</given-names></name> <name><surname>Wu</surname> <given-names>S.</given-names></name> <etal/></person-group>. (<year>2016</year>). <article-title>Incremental semi-supervised clustering ensemble for high dimensional data clustering</article-title>. <source>IEEE Trans. Knowl. Data Eng.</source> <volume>28</volume>, <fpage>701</fpage>&#x02013;<lpage>714</lpage>. <pub-id pub-id-type="doi">10.1109/TKDE.2015.2499200</pub-id></citation></ref>
<ref id="B38">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Zhu</surname> <given-names>X.</given-names></name> <name><surname>Guo</surname> <given-names>L.</given-names></name> <name><surname>Xu</surname> <given-names>Y.</given-names></name> <name><surname>Li</surname> <given-names>H.</given-names></name> <name><surname>Liao</surname> <given-names>X.</given-names></name> <name><surname>Wu</surname> <given-names>F.</given-names></name> <etal/></person-group>. (<year>2019c</year>). <article-title>&#x0201C;A global similarity learning for clustering of single-cell RNA-seq data,&#x0201D;</article-title> in <source>2019 IEEE International Conference on Bioinformatics and Biomedicine</source> (<publisher-loc>San Diego, CA</publisher-loc>: <publisher-name>IEEE</publisher-name>). <pub-id pub-id-type="doi">10.1109/BIBM47256.2019.8983200</pub-id></citation></ref>
<ref id="B39">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zhu</surname> <given-names>X.</given-names></name> <name><surname>Li</surname> <given-names>H-D.</given-names></name> <name><surname>Guo</surname> <given-names>L.</given-names></name> <name><surname>Wu</surname> <given-names>F-X.</given-names></name> <name><surname>Wang</surname> <given-names>J.</given-names></name></person-group> (<year>2019a</year>). <article-title>Analysis of single-cell RNA-seq data by clustering approaches</article-title>. <source>Curr. Bioinform.</source> <volume>14</volume>, <fpage>314</fpage>&#x02013;<lpage>322</lpage>. <pub-id pub-id-type="doi">10.2174/1574893614666181120095038</pub-id><pub-id pub-id-type="pmid">28263960</pub-id></citation></ref>
<ref id="B40">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zhu</surname> <given-names>X.</given-names></name> <name><surname>Li</surname> <given-names>H-D.</given-names></name> <name><surname>Xu</surname> <given-names>Y.</given-names></name> <name><surname>Guo</surname> <given-names>L.</given-names></name> <name><surname>Wu</surname> <given-names>F-X.</given-names></name> <name><surname>Duan</surname> <given-names>G.</given-names></name> <name><surname>Wang</surname> <given-names>J.</given-names></name></person-group> (<year>2019b</year>). <article-title>A hybrid clustering algorithm for identifying cell types from single-cell RNA-Seq data</article-title>. <source>Genes</source> <volume>10</volume>:<fpage>98</fpage>. <pub-id pub-id-type="doi">10.3390/genes10020098</pub-id><pub-id pub-id-type="pmid">30700040</pub-id></citation></ref>
<ref id="B41">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zhu</surname> <given-names>X.</given-names></name> <name><surname>Zhang</surname> <given-names>J.</given-names></name> <name><surname>Xu</surname> <given-names>Y.</given-names></name> <name><surname>Wang</surname> <given-names>J.</given-names></name> <name><surname>Peng</surname> <given-names>X.</given-names></name> <name><surname>Li</surname> <given-names>H.</given-names></name></person-group> (<year>2020</year>). <article-title>Single-cell clustering based on shared nearest neighbor and graph partitioning</article-title>. <source>Interdiscip. Sci. Comput. Life Sci.</source> <volume>12</volume>, <fpage>117</fpage>&#x02013;<lpage>130</lpage>. <pub-id pub-id-type="doi">10.1007/s12539-019-00357-4</pub-id><pub-id pub-id-type="pmid">32086753</pub-id></citation></ref>
</ref-list>
<fn-group>
<fn fn-type="financial-disclosure"><p><bold>Funding.</bold> This research was supported by the National Natural Science Foundation of China (Nos. 61762087, 61702555, 61662028, and 61772557), the Hunan Provincial Science and Technology Program (No. 2018WK4001), the 111 Project (No. B18059), and the Natural Science Foundation of Guangxi Province (No. 2018JJA170175).</p>
</fn>
</fn-group>
</back>
</article>