<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="research-article">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Big Data</journal-id>
<journal-title>Frontiers in Big Data</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Big Data</abbrev-journal-title>
<issn pub-type="epub">2624-909X</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="publisher-id">591315</article-id>
<article-id pub-id-type="doi">10.3389/fdata.2020.591315</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Big Data</subject>
<subj-group>
<subject>Original Research</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>CLUE: A Fast Parallel Clustering Algorithm for High Granularity Calorimeters in High-Energy Physics</article-title>
<alt-title alt-title-type="left-running-head">Rovere et al.</alt-title>
<alt-title alt-title-type="right-running-head">CLUE: A Fast Parallel Clustering Algorithm</alt-title>
</title-group>
<contrib-group>
<contrib contrib-type="author" corresp="yes">
<name>
<surname>Rovere</surname>
<given-names>Marco</given-names>
</name>
<xref ref-type="aff" rid="aff1">
<sup>1</sup>
</xref>
<xref ref-type="corresp" rid="c001">
<sup>&#x2a;</sup>
</xref>
<uri xlink:href="https://loop.frontiersin.org/people/829450/overview"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Chen</surname>
<given-names>Ziheng</given-names>
</name>
<xref ref-type="aff" rid="aff2">
<sup>2</sup>
</xref>
<uri xlink:href="https://loop.frontiersin.org/people/1051437/overview"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Di Pilato</surname>
<given-names>Antonio</given-names>
</name>
<xref ref-type="aff" rid="aff3">
<sup>3</sup>
</xref>
<xref ref-type="aff" rid="aff4">
<sup>4</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Pantaleo</surname>
<given-names>Felice</given-names>
</name>
<xref ref-type="aff" rid="aff1">
<sup>1</sup>
</xref>
<uri xlink:href="https://loop.frontiersin.org/people/699348/overview"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Seez</surname>
<given-names>Chris</given-names>
</name>
<xref ref-type="aff" rid="aff5">
<sup>5</sup>
</xref>
</contrib>
</contrib-group>
<aff id="aff1">
<label>
<sup>1</sup>
</label>European Organization for Nuclear Research (CERN), <addr-line>Meyrin</addr-line>, <country>Switzerland</country>
</aff>
<aff id="aff2">
<label>
<sup>2</sup>
</label>Northwestern University, <addr-line>Evanston</addr-line>, <addr-line>IL</addr-line>, <country>United States</country>
</aff>
<aff id="aff3">
<label>
<sup>3</sup>
</label>University of Bari, <addr-line>Bari</addr-line>, <country>Italy</country>
</aff>
<aff id="aff4">
<label>
<sup>4</sup>
</label>National Institute for Nuclear Physics (INFN)&#x2014;Sezione di Bari, <addr-line>Bari</addr-line>, <country>Italy</country>
</aff>
<aff id="aff5">
<label>
<sup>5</sup>
</label>Imperial College London, South Kensington Campus, <addr-line>London</addr-line>, <country>United Kingdom</country>
</aff>
<author-notes>
<fn fn-type="edited-by">
<p>
<bold>Edited by:</bold> <ext-link ext-link-type="uri" xlink:href="https://loop.frontiersin.org/people/676757/overview">Daniele D&#x2019;Agostino</ext-link>, National Research Council (CNR), Italy</p>
</fn>
<fn fn-type="edited-by">
<p>
<bold>Reviewed by:</bold> <ext-link ext-link-type="uri" xlink:href="https://loop.frontiersin.org/people/680427/overview">Corey Adams</ext-link>, Argonne Leadership Computing Facility (ALCF), United States</p>
<p>
<ext-link ext-link-type="uri" xlink:href="https://loop.frontiersin.org/people/Anushree/overview">Anushree Ghosh</ext-link>, University of Padua, Italy</p>
</fn>
<corresp id="c001">&#x2a;Correspondence: Marco Rovere, <email>marco.rovere@cern.ch</email>
</corresp>
<fn fn-type="other">
<p>This article was submitted to Big Data and AI in High Energy Physics, a section of the journal Frontiers in Big Data</p>
</fn>
</author-notes>
<pub-date pub-type="epub">
<day>27</day>
<month>11</month>
<year>2020</year>
</pub-date>
<pub-date pub-type="collection">
<year>2020</year>
</pub-date>
<volume>3</volume>
<elocation-id>591315</elocation-id>
<history>
<date date-type="received">
<day>04</day>
<month>08</month>
<year>2020</year>
</date>
<date date-type="accepted">
<day>25</day>
<month>09</month>
<year>2020</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#x00A9; 2020 Rovere, Chen, Di Pilato, Pantaleo and Seez.</copyright-statement>
<copyright-year>2020</copyright-year>
<copyright-holder>Rovere, Chen, Di Pilato, Pantaleo and Seez</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/">
<p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.</p>
</license>
</permissions>
<abstract>
<p>One of the challenges of high granularity calorimeters, such as that to be built to cover the endcap region in the CMS Phase-2 Upgrade for HL-LHC, is that the large number of channels causes a surge in the computing load when clustering numerous digitized energy deposits (hits) in the reconstruction stage. In this article, we propose a fast and fully parallelizable density-based clustering algorithm, optimized for high-occupancy scenarios, where the number of clusters is much larger than the average number of hits in a cluster. The algorithm uses a grid spatial index for fast querying of neighbors and its timing scales linearly with the number of hits within the range considered. We also show a comparison of the performance on CPU and GPU implementations, demonstrating the power of algorithmic parallelization in the coming era of heterogeneous computing in high-energy physics.</p>
</abstract>
<kwd-group>
<kwd>graphics processing unit</kwd>
<kwd>clustering</kwd>
<kwd>density</kwd>
<kwd>calorimeters</kwd>
<kwd>high granularity</kwd>
<kwd>HL-LHC</kwd>
<kwd>FCC</kwd>
</kwd-group>
<contract-sponsor id="cn001">U.S. Department of Energy<named-content content-type="fundref-id">10.13039/100000015</named-content>
</contract-sponsor>
<contract-sponsor id="cn002">Ministero dell&#x2019;Istruzione, dell&#x2019;Universit&#xe0; e della Ricerca<named-content content-type="fundref-id">10.13039/501100003407</named-content>
</contract-sponsor>
<counts>
<page-count count="0"/>
</counts>
</article-meta>
</front>
<body>
<sec id="s1">
<title>1. Introduction</title>
<p>Calorimeters with high lateral and longitudinal readout granularity, capable of providing a fine-grained image of electromagnetic and hadronic showers, have been suggested for future high-energy physics experiments (<xref ref-type="bibr" rid="B1">CALICE Collaboration, 2012</xref>). The silicon sensor readout cells of the CMS endcap calorimeter (HGCAL) (<xref ref-type="bibr" rid="B3">CMS Collaboration, 2017</xref>) for HL-LHC (<xref ref-type="bibr" rid="B2">Apollinari et al., 2017</xref>) have an area of about <inline-formula id="inf1">
<mml:math>
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mtext>&#x2009;</mml:mtext>
<mml:msup>
<mml:mrow>
<mml:mtext>cm</mml:mtext>
</mml:mrow>
<mml:mn>2</mml:mn>
</mml:msup>
</mml:mrow>
</mml:math>
</inline-formula>. When a particle showers, the deposited energy is collected by the sensors on the layers that the shower traverses. The purpose of the clustering algorithm, when applied to shower reconstruction, is to group individual energy deposits (hits) originating from a particle shower. Due to the high lateral granularity, the number of hits per layer is large, and it is computationally advantageous to first group hits into 2D clusters layer by layer (<xref ref-type="bibr" rid="B2">Chen et al., 2017</xref>) and then associate these 2D clusters across layers (<xref ref-type="bibr" rid="B3">CMS Collaboration, 2017</xref>).</p>
<p>However, a computational challenge emerges as a consequence of the large data scale and limited time budget. Event reconstruction is tightly constrained by a millisecond-level execution time. This constraint requires the clustering algorithm to be highly efficient while maintaining a low computational complexity. Furthermore, linear scalability is strongly desired in order to avoid bottlenecking the performance of the entire event reconstruction. Finally, it is highly preferable to have a fully parallelizable clustering algorithm that can take advantage of the trend toward heterogeneous computing with hardware accelerators, such as graphics processing units (GPUs), thereby achieving higher event throughput and better energy efficiency.</p>
<p>The input to the clustering algorithm is a set of <italic>n</italic> hits, whose number varies from a few thousand to a few million, depending on the longitudinal and transverse granularity of the calorimeter as well as on the number of particles entering the detector. The output is a set of <italic>k</italic> clusters whose number is usually one or two orders of magnitude smaller than <italic>n</italic> and in principle depends on both the number of incoming particles and the number of layers. Assuming that the lateral granularity of sensors is constant and finite, the average number of hits in clusters (<inline-formula id="inf2">
<mml:math>
<mml:mrow>
<mml:mi>m</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:mrow>
<mml:mi>n</mml:mi>
<mml:mo>/</mml:mo>
<mml:mi>k</mml:mi>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>) is also constant and finite. For example, in the CMS HGCAL, <italic>m</italic> is of the order of 10. The number of hits <italic>n</italic>, the number of clusters <italic>k</italic>, and the average number of hits in clusters <italic>m</italic> therefore satisfy <inline-formula id="inf3">
<mml:math>
<mml:mrow>
<mml:mi>n</mml:mi>
<mml:mo>&#x3e;</mml:mo>
<mml:mi>k</mml:mi>
<mml:mo>&#x226b;</mml:mo>
<mml:mi>m</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula>.</p>
<p>Most well-known algorithms do not simultaneously satisfy the requirements of linear scalability and easy parallelization for applications such as clustering hits in high granularity calorimeters, which are characterized by low dimensionality and <inline-formula id="inf4">
<mml:math>
<mml:mrow>
<mml:mi>n</mml:mi>
<mml:mo>&#x3e;</mml:mo>
<mml:mi>k</mml:mi>
<mml:mo>&#x226b;</mml:mo>
<mml:mi>m</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula>. It is therefore important to investigate new, fast, and parallelizable clustering algorithms, together with optimized accompanying spatial indices that can be conveniently constructed and queried in parallel.</p>
<p>In this study, we describe CLUstering of Energy (CLUE), a novel, parallel density-based clustering algorithm. Its development was inspired by the work described in (<xref ref-type="bibr" rid="B17">Rodriguez and Laio, 2014</xref>). In <xref ref-type="sec" rid="s2">Section 2</xref>, we describe the CLUE algorithm and its accompanying spatial index. Then in <xref ref-type="sec" rid="s3">Section 3</xref>, some details of the GPU implementation are discussed. Finally, in <xref ref-type="sec" rid="s4">Section 4</xref> we present CLUE&#x2019;s ability to handle nonspherical cluster shapes and to reject noise, followed by its computational performance when executed on CPU and GPU with synthetic data mimicking hits in high granularity calorimeters.</p>
</sec>
<sec id="s2">
<title>2. Clustering Algorithm</title>
<p>Clustering data is one of the most challenging tasks in several scientific domains. The definition of a cluster is itself not trivial, as it strongly depends on the context. Many clustering methods have been developed based on a variety of induction principles (<xref ref-type="bibr" rid="B11">Maimon and Rokach, 2005</xref>). Currently popular clustering algorithms include (but are not limited to) partitioning, hierarchical, and density-based approaches (<xref ref-type="bibr" rid="B11">Maimon and Rokach, 2005</xref>; <xref ref-type="bibr" rid="B8">Han et al., 2012</xref>). Partitioning approaches, such as k-means (<xref ref-type="bibr" rid="B10">Lloyd, 1982</xref>), form clusters by optimizing a dissimilarity function based on distance. However, in the application to high granularity calorimeters, partitioning approaches are impractical because the number of clusters <italic>k</italic> is not known a priori. Hierarchical methods build clusters by constructing a dendrogram through recursive splitting or merging. However, hierarchical methods do not scale well because each decision to merge or split requires scanning over many objects or clusters (<xref ref-type="bibr" rid="B8">Han et al., 2012</xref>). Therefore, they are not suitable for our application. Density-based methods, such as DBSCAN (<xref ref-type="bibr" rid="B6">Ester et al., 1996</xref>), OPTICS (<xref ref-type="bibr" rid="B1">Ankerst et al., 1999</xref>), and Clustering by Fast Search and Find of Density Peaks (CFSFDP) (<xref ref-type="bibr" rid="B17">Rodriguez and Laio, 2014</xref>), group points by detecting continuous high-density regions. They are capable of discovering clusters of arbitrary shape and are efficient for large spatial databases. If a spatial index is used, their computational complexity is <inline-formula id="inf5">
<mml:math>
<mml:mrow>
<mml:mi>O</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>n</mml:mi>
<mml:mtext>log</mml:mtext>
<mml:mi>n</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> (<xref ref-type="bibr" rid="B8">Han et al., 2012</xref>). However, one of the potential weaknesses of the currently well-known density-based algorithms is that they intrinsically include serial processes that are hard to parallelize: DBSCAN has to iteratively visit all points within an enclosure of density-connectedness before working on the next cluster (<xref ref-type="bibr" rid="B6">Ester et al., 1996</xref>); OPTICS needs to sequentially add points in an ordered list to obtain a dendrogram of reachability distance (<xref ref-type="bibr" rid="B1">Ankerst et al., 1999</xref>); CFSFDP needs to sequentially assign points to clusters in order of decreasing density (<xref ref-type="bibr" rid="B17">Rodriguez and Laio, 2014</xref>). In the application to high granularity calorimeters, as discussed in <xref ref-type="sec" rid="s1">Section 1</xref>, linear scalability and full parallelization are essential to handle a huge dataset efficiently by means of heterogeneous computing.</p>
<p>In order to satisfy these requirements, we propose a fast and fully parallelizable density-based algorithm (CLUE) inspired by CFSFDP. For the purpose of the algorithm, each sensor cell on a layer with its energy deposit is taken as a 2D point with an associated weight equal to its energy value. As in CFSFDP, two key variables are calculated for each point: the local density <italic>&#x3c1;</italic> and the separation <italic>&#x3b4;</italic> defined in <xref ref-type="disp-formula" rid="e3">Eqs 3</xref> <bold>and</bold> <xref ref-type="disp-formula" rid="e4">4</xref>, where <italic>&#x3b4;</italic> is the distance to the nearest point with higher density (&#x201c;nearest-higher&#x201d;), a definition slightly adapted from that in CFSFDP in order to take advantage of the spatial index. Then cluster seeds and outliers are identified based on thresholds on <italic>&#x3c1;</italic> and <italic>&#x3b4;</italic>. In contrast to cluster assignment in CFSFDP, which sorts points by density and adds them to clusters in order of decreasing density, CLUE first builds a list of followers for each point by registering each point as a follower of its nearest-higher. Then it expands clusters by passing cluster indices from the seeds to their followers iteratively. Since the expansions of different clusters are fully independent of one another, this approach not only avoids the costly density sorting in CFSFDP, but also enables a <italic>k</italic>-way parallelization. Unlike the noise identification in CFSFDP, CLUE rejects noise by identifying outliers and all of their descendant followers, as discussed in <xref ref-type="sec" rid="s4-1">Section 4.1</xref>.</p>
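<p>The follower-based expansion described above can be illustrated with a minimal Python sketch. It is not the CLUE implementation: the function name <monospace>expand_clusters</monospace>, the input arrays, and the toy data are hypothetical, and the sketch assumes that every point that is neither a seed nor an outlier has a valid nearest-higher index.</p>
<preformat preformat-type="code"><![CDATA[
```python
def expand_clusters(nearest_higher, is_seed, is_outlier):
    """Propagate cluster ids from seeds down their chains of followers."""
    n = len(nearest_higher)
    followers = [[] for _ in range(n)]
    for i in range(n):
        # Seeds and outliers follow nobody. Assumption: every other point
        # has a valid nearest-higher index (not -1).
        if not is_seed[i] and not is_outlier[i]:
            followers[nearest_higher[i]].append(i)

    cluster_id = [-1] * n  # -1 marks noise / unassigned points
    seeds = [i for i in range(n) if is_seed[i]]
    for cid, seed in enumerate(seeds):
        # Each seed reaches a disjoint set of points, so the iterations
        # of this loop are independent and could run in parallel.
        stack = [seed]
        while stack:
            point = stack.pop()
            cluster_id[point] = cid
            stack.extend(followers[point])
    return cluster_id


# Hypothetical toy input: points 0 and 3 are seeds, point 5 is an outlier.
nearest_higher = [-1, 0, 1, 0, 3, -1, 6 - 1]
is_seed = [True, False, False, True, False, False, False]
is_outlier = [False, False, False, False, False, True, False]
print(expand_clusters(nearest_higher, is_seed, is_outlier))
```
]]></preformat>
<p>In this toy input, point 6 follows the outlier 5, so neither of them is reached from any seed and both keep the noise label, which mirrors the noise rejection described in the text.</p>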
<sec id="s2-1">
<title>2.1. Spatial Index With Fixed-Grid</title>
<p>Neighborhood queries, which retrieve nearby points within a given distance, are among the most frequent operations in density-based clustering algorithms. CLUE uses a spatial index to access and query spatial data points efficiently. Given that the physical layout of sensor cells is a multi-layer tessellation, it is intuitive to index its data with a fixed-grid, which divides the space into fixed rectangular bins (<xref ref-type="bibr" rid="B9">Levinthal, 1966</xref>; <xref ref-type="bibr" rid="B3">Bentley and Friedman, 1979</xref>). Compared with data-driven structures such as KD-Tree (<xref ref-type="bibr" rid="B4">Bentley, 1975</xref>) and R-Tree (<xref ref-type="bibr" rid="B7">Guttman, 1984</xref>), the space partition in a fixed-grid is independent of any particular distribution of data points (<xref ref-type="bibr" rid="B16">Rigaux et al., 2001</xref>) and can thus be explicitly predefined before loading data points. In addition, both construction and query with a fixed-grid are computationally simple and can be easily parallelized. Therefore, CLUE uses a fixed-grid as spatial index for efficient neighborhood queries.</p>
<p>For each layer of the calorimeter, a fixed-grid spatial index is constructed by registering the indices of 2D points into the square bins in the grid according to the 2D coordinates of the points. When querying <inline-formula id="inf6">
<mml:math>
<mml:mrow>
<mml:msub>
<mml:mi>N</mml:mi>
<mml:mi>d</mml:mi>
</mml:msub>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>, the <italic>d</italic>-neighborhood of point <italic>i</italic>, CLUE only needs to loop over points in the bins touched by the square window <inline-formula id="inf7">
<mml:math>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mo>&#xb1;</mml:mo>
<mml:mi>d</mml:mi>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>y</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mo>&#xb1;</mml:mo>
<mml:mi>d</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> as shown in <xref ref-type="fig" rid="F1">Figure 1</xref>. We denote those points as <inline-formula id="inf8">
<mml:math>
<mml:mrow>
<mml:msub>
<mml:mtext>&#x3a9;</mml:mtext>
<mml:mi>d</mml:mi>
</mml:msub>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>, defined as:<disp-formula id="e1">
<mml:math>
<mml:mrow>
<mml:msub>
<mml:mtext>&#x3a9;</mml:mtext>
<mml:mi>d</mml:mi>
</mml:msub>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:mrow>
<mml:mo>{</mml:mo>
<mml:mrow>
<mml:mi>j</mml:mi>
<mml:mo>:</mml:mo>
<mml:mi>j</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:mtext>tiles&#xa0;touched&#xa0;by&#xa0;the&#xa0;square&#xa0;window&#xa0;</mml:mtext>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mo>&#xb1;</mml:mo>
<mml:mi>d</mml:mi>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>y</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mo>&#xb1;</mml:mo>
<mml:mi>d</mml:mi>
</mml:mrow>
<mml:mo>]</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo>}</mml:mo>
</mml:mrow>
<mml:mo>.</mml:mo>
</mml:mrow>
</mml:math>
<label>(1)</label>
</disp-formula>Here, <inline-formula id="inf9">
<mml:math>
<mml:mrow>
<mml:msub>
<mml:mtext>&#x3a9;</mml:mtext>
<mml:mi>d</mml:mi>
</mml:msub>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> is guaranteed to include all neighbors within a distance <italic>d</italic> from the point <italic>i</italic>. Namely,<disp-formula id="e2">
<mml:math>
<mml:mrow>
<mml:msub>
<mml:mi>N</mml:mi>
<mml:mi>d</mml:mi>
</mml:msub>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:mrow>
<mml:mo>{</mml:mo>
<mml:mrow>
<mml:mi>j</mml:mi>
<mml:mo>:</mml:mo>
<mml:msub>
<mml:mi>d</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x3c;</mml:mo>
<mml:mi>d</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>j</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:msub>
<mml:mtext>&#x3a9;</mml:mtext>
<mml:mi>d</mml:mi>
</mml:msub>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo>}</mml:mo>
</mml:mrow>
<mml:mo>&#x2286;</mml:mo>
<mml:msub>
<mml:mtext>&#x3a9;</mml:mtext>
<mml:mi>d</mml:mi>
</mml:msub>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>.</mml:mo>
</mml:mrow>
</mml:math>
<label>(2)</label>
</disp-formula>Here, <inline-formula id="inf10">
<mml:math>
<mml:mrow>
<mml:msub>
<mml:mi>d</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> is the distance between points <italic>i</italic> and <italic>j</italic>. Without any spatial index, the query of <inline-formula id="inf11">
<mml:math>
<mml:mrow>
<mml:msub>
<mml:mi>N</mml:mi>
<mml:mi>d</mml:mi>
</mml:msub>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> requires a sequential scan over all points. In contrast, with the grid spatial index, CLUE only needs to loop over the points in <inline-formula id="inf12">
<mml:math>
<mml:mrow>
<mml:msub>
<mml:mtext>&#x3a9;</mml:mtext>
<mml:mi>d</mml:mi>
</mml:msub>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> to acquire <inline-formula id="inf13">
<mml:math>
<mml:mrow>
<mml:msub>
<mml:mi>N</mml:mi>
<mml:mi>d</mml:mi>
</mml:msub>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>. Given that <italic>d</italic> is small and the maximum granularity of points is constant, the complexity of querying <inline-formula id="inf14">
<mml:math>
<mml:mrow>
<mml:msub>
<mml:mi>N</mml:mi>
<mml:mi>d</mml:mi>
</mml:msub>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> with a fixed-grid is <inline-formula id="inf15">
<mml:math>
<mml:mrow>
<mml:mi>O</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>.</p>
<fig id="F1" position="float">
<label>FIGURE 1</label>
<caption>
<p>2D points are indexed with a grid for fast neighborhood query in CLUE. Construction of this spatial index only involves registering the indices of points into the bins of the grid according to the points&#x2019; 2D spatial positions. To query the <italic>d</italic>-neighborhood <inline-formula id="inf16">
<mml:math>
<mml:mrow>
<mml:msub>
<mml:mi>N</mml:mi>
<mml:mi>d</mml:mi>
</mml:msub>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> defined in <xref ref-type="disp-formula" rid="e2">Eq. 2</xref>, taking the red (blue) point for example, we first locate its <inline-formula id="inf17">
<mml:math>
<mml:mrow>
<mml:msub>
<mml:mtext>&#x3a9;</mml:mtext>
<mml:mi>d</mml:mi>
</mml:msub>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> defined in <xref ref-type="disp-formula" rid="e1">Eq. 1</xref>, a set of all points in the bins touched by a square window <inline-formula id="inf18">
<mml:math>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mo>&#xb1;</mml:mo>
<mml:mi>d</mml:mi>
<mml:mo>,</mml:mo>
<mml:mtext>&#x2009;</mml:mtext>
<mml:msub>
<mml:mi>y</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mo>&#xb1;</mml:mo>
<mml:mi>d</mml:mi>
<mml:mo>]</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula>. The <inline-formula id="inf19">
<mml:math>
<mml:mrow>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mo>&#xb1;</mml:mo>
<mml:mi>d</mml:mi>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>y</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mo>&#xb1;</mml:mo>
<mml:mi>d</mml:mi>
</mml:mrow>
<mml:mo>]</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> window is shown as the orange (green) square, while <inline-formula id="inf20">
<mml:math>
<mml:mrow>
<mml:msub>
<mml:mtext>&#x3a9;</mml:mtext>
<mml:mi>d</mml:mi>
</mml:msub>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> is shown as orange (green) points. Then, we examine points in <inline-formula id="inf21">
<mml:math>
<mml:mrow>
<mml:msub>
<mml:mtext>&#x3a9;</mml:mtext>
<mml:mi>d</mml:mi>
</mml:msub>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> to identify those within a distance <italic>d</italic> from point <italic>i</italic>, shown as the ones contained in the red (blue) circle.</p>
</caption>
<graphic xlink:href="fdata-03-591315-g001.tif"/>
</fig>
</sec>
<sec id="s2-2">
<title>2.2. Clustering Procedure of CLUE</title>
<p>CLUE requires the following four parameters: <inline-formula id="inf22">
<mml:math>
<mml:mrow>
<mml:msub>
<mml:mi>d</mml:mi>
<mml:mi>c</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> is the cut-off distance in the calculation of local density; <inline-formula id="inf23">
<mml:math>
<mml:mrow>
<mml:msub>
<mml:mi>&#x3c1;</mml:mi>
<mml:mi>c</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> is the minimum density to promote a point as a seed or the maximum density to demote a point as an outlier; <inline-formula id="inf24">
<mml:math>
<mml:mrow>
<mml:msub>
<mml:mi>&#x3b4;</mml:mi>
<mml:mi>c</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> and <inline-formula id="inf25">
<mml:math>
<mml:mrow>
<mml:msub>
<mml:mi>&#x3b4;</mml:mi>
<mml:mi>o</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> are the minimum separation requirements for seeds and outliers, respectively. The choice of these four parameters can be based on physics: for example, <inline-formula id="inf26">
<mml:math>
<mml:mrow>
<mml:msub>
<mml:mi>d</mml:mi>
<mml:mi>c</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> can be chosen based on the shower size and the lateral granularity of detectors; <inline-formula id="inf27">
<mml:math>
<mml:mrow>
<mml:msub>
<mml:mi>&#x3c1;</mml:mi>
<mml:mi>c</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> can be chosen to exclude noise; <inline-formula id="inf28">
<mml:math>
<mml:mrow>
<mml:msub>
<mml:mi>&#x3b4;</mml:mi>
<mml:mi>c</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> and <inline-formula id="inf29">
<mml:math>
<mml:mrow>
<mml:msub>
<mml:mi>&#x3b4;</mml:mi>
<mml:mi>o</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> can be chosen based on the shower sizes and separations. These four parameters provide the degrees of freedom needed to tune CLUE for the desired physics goals.</p>
<p>
<xref ref-type="fig" rid="F2">Figure 2</xref> illustrates the main steps of the CLUE algorithm. The local density <italic>&#x3c1;</italic> in CLUE is defined as:<disp-formula id="e3">
<mml:math>
<mml:mrow>
<mml:msub>
<mml:mi>&#x3c1;</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:munder>
<mml:mstyle displaystyle="true">
<mml:mo>&#x2211;</mml:mo>
</mml:mstyle>
<mml:mrow>
<mml:mi>j</mml:mi>
<mml:mo>:</mml:mo>
<mml:mi>j</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:msub>
<mml:mi>N</mml:mi>
<mml:mrow>
<mml:msub>
<mml:mi>d</mml:mi>
<mml:mi>c</mml:mi>
</mml:msub>
</mml:mrow>
</mml:msub>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:munder>
<mml:mi>&#x3c7;</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>d</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:msub>
<mml:mi>w</mml:mi>
<mml:mi>j</mml:mi>
</mml:msub>
<mml:mo>,</mml:mo>
</mml:mrow>
</mml:math>
<label>(3)</label>
</disp-formula>where <inline-formula id="inf30">
<mml:math>
<mml:mrow>
<mml:msub>
<mml:mi>w</mml:mi>
<mml:mi>j</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> is the weight of point <italic>j</italic>, <inline-formula id="inf31">
<mml:math>
<mml:mrow>
<mml:mi>&#x3c7;</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>d</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> is a convolution kernel, which can be optimized according to specific applications. Obvious possible kernel options include flat, Gaussian, and exponential functions.</p>
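<p>A minimal Python sketch of the weighted local density in Eq. 3 is given below. For brevity it replaces the grid query with a brute-force scan over all points, which is not how CLUE obtains the neighborhood, and it uses a flat kernel equal to 1 for every neighbor; the function names, this kernel choice, and the toy data are assumptions made for this example.</p>
<preformat preformat-type="code"><![CDATA[
```python
import math


def flat_kernel(dist):
    """Flat kernel (assumed): every neighbor contributes its full weight."""
    return 1.0


def local_density(points, weights, d_c, kernel=flat_kernel):
    """rho_i = sum of kernel(d_ij) * w_j over points j with d_ij < d_c."""
    rho = []
    for xi, yi in points:
        total = 0.0
        for (xj, yj), wj in zip(points, weights):
            dist = math.hypot(xj - xi, yj - yi)
            if dist < d_c:  # brute-force stand-in for the grid query
                total += kernel(dist) * wj
        rho.append(total)
    return rho


# Hypothetical toy data: three nearby hits and one isolated hit.
points = [(0.0, 0.0), (0.3, 0.0), (0.6, 0.0), (3.0, 3.0)]
weights = [1.0, 2.0, 1.0, 0.5]
print(local_density(points, weights, d_c=0.5))
```
]]></preformat>
<p>A Gaussian or exponential kernel can be passed through the <monospace>kernel</monospace> argument without changing the rest of the sketch.</p>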
<fig id="F2" position="float">
<label>FIGURE 2</label>
<caption>
<p>Demonstration of the CLUE algorithm. Points are distributed inside a <inline-formula id="inf32">
<mml:math>
<mml:mrow>
<mml:mn>6</mml:mn>
<mml:mo>&#xd7;</mml:mo>
<mml:mn>6</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula> 2D area and CLUE parameters are set to <inline-formula id="inf33">
<mml:math>
<mml:mrow>
<mml:msub>
<mml:mi>d</mml:mi>
<mml:mi>c</mml:mi>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>0.5</mml:mn>
<mml:mo>,</mml:mo>
<mml:mtext>&#x2009;</mml:mtext>
<mml:msub>
<mml:mi>&#x3c1;</mml:mi>
<mml:mi>c</mml:mi>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>3.9</mml:mn>
<mml:mo>,</mml:mo>
<mml:mtext>&#x2009;</mml:mtext>
<mml:msub>
<mml:mi>&#x3b4;</mml:mi>
<mml:mi>c</mml:mi>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:msub>
<mml:mi>&#x3b4;</mml:mi>
<mml:mi>o</mml:mi>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>. Before the clustering procedure starts, a fixed-grid spatial index is constructed. In the first step, shown as <bold>(A)</bold>, CLUE calculates the local density <italic>&#x3c1;</italic> for each point, which is defined in <xref ref-type="disp-formula" rid="e3">Eq. 3</xref>. The color and size of points represent their local densities. In the second step, shown as <bold>(B)</bold>, CLUE calculates the nearest-higher <inline-formula id="inf34">
<mml:math>
<mml:mrow>
<mml:mi>n</mml:mi>
<mml:mi>h</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula> and the separation <italic>&#x3b4;</italic> for each point, which are defined in <xref ref-type="disp-formula" rid="e4">Eq. 4</xref>. The black arrows represent the relation from the nearest-higher of a point to the point itself. If the nearest-higher of a point is &#x2212;1, there is no arrow pointing to it. In the third step, shown as <bold>(C)</bold>, CLUE promotes a point as a seed if <inline-formula id="inf35">
<mml:math>
<mml:mrow>
<mml:mi>&#x3c1;</mml:mi>
<mml:mo>,</mml:mo>
<mml:mtext>&#x2009;</mml:mtext>
<mml:mi>&#x3b4;</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula> are both large, or demotes it to an outlier if <italic>&#x3c1;</italic> is small and <italic>&#x3b4;</italic> is large. Promoted seeds and demoted outliers are shown as stars and gray squares, respectively. In the fourth step, shown as <bold>(D)</bold>, CLUE propagates the cluster indices from seeds through their chains of followers defined in <xref ref-type="disp-formula" rid="e5">Eq. 5</xref>. Noise points, which are outliers and their descendant followers, are guaranteed not to receive any cluster ids from any seeds. The color of points represents the cluster ids. A gray square marks a point whose cluster id is undefined and which should be considered noise.</p>
</caption>
<graphic xlink:href="fdata-03-591315-g002.tif"/>
</fig>
<p>The nearest-higher and the distance to it <italic>&#x3b4;</italic> (separation) in CLUE are defined as:<disp-formula id="e4">
<mml:math>
<mml:mrow>
<mml:mi>n</mml:mi>
<mml:msub>
<mml:mi>h</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mrow>
<mml:mo>{</mml:mo>
<mml:mrow>
<mml:mtable columnalign="left">
<mml:mtr columnalign="left">
<mml:mtd columnalign="left">
<mml:mrow>
<mml:mtext>arg</mml:mtext>
<mml:munder>
<mml:mrow>
<mml:mtext>min</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:mi>j</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:msubsup>
<mml:mi>N</mml:mi>
<mml:mrow>
<mml:msub>
<mml:mi>d</mml:mi>
<mml:mi>m</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>&#x2032;</mml:mo>
</mml:msubsup>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:munder>
<mml:msub>
<mml:mi>d</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
</mml:mrow>
</mml:mtd>
<mml:mtd columnalign="left">
<mml:mrow>
<mml:mtext>if&#xa0;</mml:mtext>
<mml:mo>&#x7c;</mml:mo>
<mml:msubsup>
<mml:mi>N</mml:mi>
<mml:mrow>
<mml:msub>
<mml:mi>d</mml:mi>
<mml:mi>m</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>&#x2032;</mml:mo>
</mml:msubsup>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x7c;</mml:mo>
<mml:mo>&#x2260;</mml:mo>
<mml:mn>0</mml:mn>
</mml:mrow>
</mml:mtd>
</mml:mtr>
<mml:mtr columnalign="left">
<mml:mtd columnalign="left">
<mml:mrow>
<mml:mo>&#x2212;</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo>,</mml:mo>
</mml:mrow>
</mml:mtd>
<mml:mtd columnalign="left">
<mml:mrow>
<mml:mtext>otherwise</mml:mtext>
</mml:mrow>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:mrow>
</mml:mrow>
<mml:mo>,</mml:mo>
<mml:mtext>&#x2003;</mml:mtext>
<mml:msub>
<mml:mi>&#x3b4;</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mrow>
<mml:mo>{</mml:mo>
<mml:mrow>
<mml:mtable columnalign="left">
<mml:mtr columnalign="left">
<mml:mtd columnalign="left">
<mml:mrow>
<mml:msub>
<mml:mi>d</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>n</mml:mi>
<mml:msub>
<mml:mi>h</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
</mml:mrow>
</mml:mtd>
<mml:mtd columnalign="left">
<mml:mrow>
<mml:mtext>if&#xa0;</mml:mtext>
<mml:mo>&#x7c;</mml:mo>
<mml:msubsup>
<mml:mi>N</mml:mi>
<mml:mrow>
<mml:msub>
<mml:mi>d</mml:mi>
<mml:mi>m</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>&#x2032;</mml:mo>
</mml:msubsup>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x7c;</mml:mo>
<mml:mo>&#x2260;</mml:mo>
<mml:mn>0</mml:mn>
</mml:mrow>
</mml:mtd>
</mml:mtr>
<mml:mtr columnalign="left">
<mml:mtd columnalign="left">
<mml:mrow>
<mml:mo>&#x2b;</mml:mo>
<mml:mi>&#x221e;</mml:mi>
<mml:mo>,</mml:mo>
</mml:mrow>
</mml:mtd>
<mml:mtd columnalign="left">
<mml:mrow>
<mml:mtext>otherwise</mml:mtext>
</mml:mrow>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:mrow>
</mml:mrow>
<mml:mo>,</mml:mo>
</mml:mrow>
</mml:math>
<label>(4)</label>
</disp-formula>where <inline-formula id="inf36">
<mml:math>
<mml:mrow>
<mml:msub>
<mml:mi>d</mml:mi>
<mml:mi>m</mml:mi>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mtext>max</mml:mtext>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>&#x3b4;</mml:mi>
<mml:mi>o</mml:mi>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>&#x3b4;</mml:mi>
<mml:mi>c</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> and <inline-formula id="inf37">
<mml:math>
<mml:mrow>
<mml:msubsup>
<mml:mi>N</mml:mi>
<mml:mrow>
<mml:msub>
<mml:mi>d</mml:mi>
<mml:mi>m</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>&#x2032;</mml:mo>
</mml:msubsup>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:mrow>
<mml:mo>{</mml:mo>
<mml:mrow>
<mml:mi>j</mml:mi>
<mml:mo>:</mml:mo>
<mml:msub>
<mml:mi>&#x3c1;</mml:mi>
<mml:mi>j</mml:mi>
</mml:msub>
<mml:mo>&#x3e;</mml:mo>
<mml:msub>
<mml:mi>&#x3c1;</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:mi>j</mml:mi>
<mml:mo>&#x2208;</mml:mo>
<mml:msub>
<mml:mi>N</mml:mi>
<mml:mrow>
<mml:msub>
<mml:mi>d</mml:mi>
<mml:mi>m</mml:mi>
</mml:msub>
</mml:mrow>
</mml:msub>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo>}</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> is a subset of <inline-formula id="inf38">
<mml:math>
<mml:mrow>
<mml:msub>
<mml:mi>N</mml:mi>
<mml:mrow>
<mml:msub>
<mml:mi>d</mml:mi>
<mml:mi>m</mml:mi>
</mml:msub>
</mml:mrow>
</mml:msub>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> containing the points whose local densities are higher than <inline-formula id="inf39">
<mml:math>
<mml:mrow>
<mml:msub>
<mml:mi>&#x3c1;</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>.</p>
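<p>As an illustration of Eq. 4, the following minimal Python sketch computes the nearest-higher and the separation with a brute-force scan over all points. It is not the authors' implementation: the function name <monospace>nearest_higher</monospace> is hypothetical, the fixed-grid spatial index is replaced by an exhaustive loop for brevity, and the neighborhood is assumed to contain the points at a distance strictly smaller than the search radius.</p>
<preformat><![CDATA[
```python
import math


def nearest_higher(points, rho, d_m):
    """Return (nh, delta) following Eq. 4, using a brute-force O(n^2) scan.

    points: list of (x, y) tuples; rho: local density of each point;
    d_m: search radius, i.e. max(delta_o, delta_c).
    """
    n = len(points)
    nh = [-1] * n              # -1 means "no nearest-higher found"
    delta = [math.inf] * n     # separation defaults to +infinity
    for i in range(n):
        xi, yi = points[i]
        for j in range(n):
            if rho[j] <= rho[i]:
                continue       # only strictly denser points qualify
            d = math.hypot(points[j][0] - xi, points[j][1] - yi)
            # Assumption: the neighborhood uses a strict inequality d < d_m.
            if d < d_m and d < delta[i]:
                nh[i], delta[i] = j, d
    return nh, delta
```
]]></preformat>
<p>In this sketch the densest point of each neighborhood keeps the default values, a nearest-higher of &#x2212;1 and an infinite separation, which is what later makes it a candidate seed or outlier.</p>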
<p>After <italic>&#x3c1;</italic> and <italic>&#x3b4;</italic> are calculated, points with density <inline-formula id="inf40">
<mml:math>
<mml:mrow>
<mml:mi>&#x3c1;</mml:mi>
<mml:mo>&#x3e;</mml:mo>
<mml:msub>
<mml:mi>&#x3c1;</mml:mi>
<mml:mi>c</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> and large separation <inline-formula id="inf41">
<mml:math>
<mml:mrow>
<mml:mi>&#x3b4;</mml:mi>
<mml:mo>&#x3e;</mml:mo>
<mml:msub>
<mml:mi>&#x3b4;</mml:mi>
<mml:mi>c</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> are promoted as cluster seeds, while points with density <inline-formula id="inf42">
<mml:math>
<mml:mrow>
<mml:mi>&#x3c1;</mml:mi>
<mml:mo>&#x3c;</mml:mo>
<mml:msub>
<mml:mi>&#x3c1;</mml:mi>
<mml:mi>c</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> and large separation <inline-formula id="inf43">
<mml:math>
<mml:mrow>
<mml:mi>&#x3b4;</mml:mi>
<mml:mo>&#x3e;</mml:mo>
<mml:msub>
<mml:mi>&#x3b4;</mml:mi>
<mml:mi>o</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> are demoted to outliers. For each point, there is a list of followers defined as:<disp-formula id="e5">
<mml:math>
<mml:mrow>
<mml:msub>
<mml:mi>F</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mrow>
<mml:mo>{</mml:mo>
<mml:mrow>
<mml:mi>j</mml:mi>
<mml:mo>:</mml:mo>
<mml:mi>n</mml:mi>
<mml:msub>
<mml:mi>h</mml:mi>
<mml:mi>j</mml:mi>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mi>i</mml:mi>
</mml:mrow>
<mml:mo>}</mml:mo>
</mml:mrow>
<mml:mo>.</mml:mo>
</mml:mrow>
</mml:math>
<label>(5)</label>
</disp-formula>
</p>
<p>The lists of followers are built by registering the points that are neither seeds nor outliers to the follower lists of their nearest-highers. The cluster indices, associating a follower with a particular seed, are passed down from seeds through their chains of followers iteratively. Outliers and their descendant followers are guaranteed not to receive any cluster indices from seeds, which provides noise rejection, as shown in <xref ref-type="fig" rid="F3">Figure 3</xref>. The calculation of <inline-formula id="inf44">
<mml:math>
<mml:mrow>
<mml:mi>&#x3c1;</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>&#x3b4;</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula>, and the decision of seeds and outliers both support <italic>n</italic>-way parallelization, while the expansion of clusters can be done with <italic>k</italic>-way parallelization. Pseudocode of CLUE is included in <xref ref-type="sec" rid="s9">Supplementary Material</xref>.</p>
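<p>The seed and outlier decision, the follower lists of Eq. 5, and the cluster expansion can be sketched sequentially in Python as below. This is a hedged illustration rather than the pseudocode of the Supplementary Material: the function name <monospace>assign_clusters</monospace>, the use of &#x2212;1 as the noise label, and the explicit stack are assumptions made for the example.</p>
<preformat><![CDATA[
```python
def assign_clusters(rho, delta, nh, rho_c, delta_c, delta_o):
    """Promote seeds, demote outliers, register followers, expand clusters."""
    n = len(rho)
    cluster = [-1] * n                    # -1 marks noise / undefined id
    followers = [[] for _ in range(n)]    # F_i of Eq. 5
    seeds = []
    for i in range(n):
        is_seed = rho[i] > rho_c and delta[i] > delta_c
        is_outlier = rho[i] < rho_c and delta[i] > delta_o
        if is_seed:
            cluster[i] = len(seeds)       # each seed opens a new cluster
            seeds.append(i)
        elif not is_outlier and nh[i] >= 0:
            # Neither seed nor outlier: follow the nearest-higher.
            followers[nh[i]].append(i)
    # Pass cluster ids down the chains of followers, starting from seeds.
    stack = list(seeds)
    while stack:
        i = stack.pop()
        for j in followers[i]:
            cluster[j] = cluster[i]
            stack.append(j)
    return cluster, seeds
```
]]></preformat>
<p>Because outliers are never registered as followers, the expansion cannot reach them or their descendants, so these points keep the noise label without any additional check.</p>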
<fig id="F3" position="float">
<label>FIGURE 3</label>
<caption>
<p>Examples of CLUE clustering on synthetic datasets. Each sample includes 1,000 2D points with the same weight generated from certain distributions, including uniform noise points. The color of points represents their cluster ids. Black points represent outliers detached from any clusters. The links between pairs of points illustrate the relationship between nearest-higher and follower. The red stars highlight the cluster seeds.</p>
</caption>
<graphic xlink:href="fdata-03-591315-g003.tif"/>
</fig>
</sec>
</sec>
<sec id="s3">
<title>3. GPU Implementation</title>
<p>To parallelize CLUE on GPU, one GPU thread is assigned to each point, for a total of <italic>n</italic> threads, to construct the spatial index, calculate <italic>&#x3c1;</italic> and <italic>&#x3b4;</italic>, promote (demote) seeds (outliers), and register points to the corresponding lists of followers of their nearest-highers. Next, one thread is assigned to each seed, for a total of <italic>k</italic> threads, to expand clusters iteratively along chains of followers. The block size of all kernels, which in practice does not have a remarkable impact on the speed performance, is set to 1,024. In the test in <xref ref-type="table" rid="T1">Table 1</xref>, changing the block size from 1,024 to 256 on GPU decreases the sum of kernel execution times by only about 0.14&#xa0;ms. The details of parallelism for each kernel are listed in <xref ref-type="table" rid="T2">Table 2</xref>. Since the results of a CLUE step are required in the following steps, it is necessary to guarantee that all the threads are synchronized before moving to the next stage. Therefore, each CLUE step can be implemented as a separate kernel. To optimize the performance of accessing the GPU global memory with coalescing, the points on all layers are stored as a single structure-of-arrays (SoA), which holds their layer numbers, 2D coordinates, and weights. Thus, points on all layers are input into kernels in one shot. The total memory required to run CLUE with up to 1&#xa0;M hits is about 284&#xa0;MB. This includes the memory needed to store the input data, the output results, and all the intermediate structures needed by the algorithm.</p>
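<p>The idea behind the fixed-grid spatial index can be sketched for a single layer in Python as follows. This is only an illustration under stated assumptions: the helper names <monospace>build_grid</monospace> and <monospace>query</monospace> are hypothetical, a dictionary of lists stands in for the fixed-size GPU bins, and the sequential loop replaces the one-thread-per-point kernel with atomic registration.</p>
<preformat><![CDATA[
```python
from collections import defaultdict


def build_grid(points, bin_size):
    """Map each (ix, iy) bin to the indices of the points that fall in it."""
    grid = defaultdict(list)
    for idx, (x, y) in enumerate(points):
        grid[(int(x // bin_size), int(y // bin_size))].append(idx)
    return grid


def query(grid, points, bin_size, x, y, radius):
    """Indices of points closer than `radius` to (x, y).

    Only the bins overlapping the square search box are visited, so the
    cost depends on the local occupancy and not on the total point count.
    """
    lo_x = int((x - radius) // bin_size)
    hi_x = int((x + radius) // bin_size)
    lo_y = int((y - radius) // bin_size)
    hi_y = int((y + radius) // bin_size)
    found = []
    for ix in range(lo_x, hi_x + 1):
        for iy in range(lo_y, hi_y + 1):
            for idx in grid.get((ix, iy), ()):
                px, py = points[idx]
                if (px - x) ** 2 + (py - y) ** 2 < radius ** 2:
                    found.append(idx)
    return sorted(found)
```
]]></preformat>
<p>In the GPU version described above, the append performed in <monospace>build_grid</monospace> is the step at which several threads may target the same bin at once.</p>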
<table-wrap id="T1" position="float">
<label>TABLE 1</label>
<caption>
<p>Decomposition of CLUE execution time in the case of <inline-formula id="inf53">
<mml:math>
<mml:mrow>
<mml:msup>
<mml:mrow>
<mml:mn>10</mml:mn>
</mml:mrow>
<mml:mn>4</mml:mn>
</mml:msup>
</mml:mrow>
</mml:math>
</inline-formula> points per layer with 100 layers. The time of subprocesses on GPU is measured with the NVIDIA profiler, while that on CPU is measured with std::chrono timers in the C&#x2b;&#x2b; code. The uncertainties are the standard deviations of 200 trial runs of the same event (10,000 trial runs for the GPU). The uncertainties of subprocesses on GPU are negligible given that the maximum and minimum kernel execution times measured by the NVIDIA profiler are very close. The speed-up factors of the multi-threaded CPU with TBB and of the GPU with respect to the single-threaded CPU are given in parentheses. &#x201c;mem mgmt &#x2b; overhead&#x201d; represents the time spent in handling and copying data, together with the overhead of issuing instructions to the GPU.</p>
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th>CLUE step</th>
<th align="center">CPU [1T] (baseline)</th>
<th align="center">CPU TBB [10T]</th>
<th align="center">GPU</th>
</tr>
</thead>
<tbody>
<tr>
<td>Build fixed-grid spatial index</td>
<td align="char" char="plusmn">59.3 &#xb1; 1.6&#xa0;ms</td>
<td align="char" char="plusmn">117.7 &#xb1; 6.4&#xa0;ms (0.50x)</td>
<td align="center">0.28&#xa0;ms (208.6x)</td>
</tr>
<tr>
<td>Calculate local density</td>
<td valign="top" align="char" char="plusmn">218.4 &#xb1; 2.5&#xa0;ms</td>
<td valign="top" align="char" char="plusmn">33.7 &#xb1; 2.6&#xa0;ms (6.48x)</td>
<td align="center">0.51&#xa0;ms (430.6x)</td>
</tr>
<tr>
<td>Calculate nearest-higher and separation</td>
<td valign="top" align="char" char="plusmn">326.9 &#xb1; 2.9&#xa0;ms</td>
<td valign="top" align="char" char="plusmn">45.5 &#xb1; 2.5&#xa0;ms (7.19x)</td>
<td align="center">0.89&#xa0;ms (368.5x)</td>
</tr>
<tr>
<td>Decide seeds/outliers, register followers</td>
<td valign="top" align="char" char="plusmn">54.4 &#xb1; 2.5&#xa0;ms</td>
<td valign="top" align="char" char="plusmn">109.4 &#xb1; 7.7&#xa0;ms (0.50x)</td>
<td align="center">0.34&#xa0;ms (162.4x)</td>
</tr>
<tr>
<td>Expand clusters</td>
<td valign="top" align="char" char="plusmn">17.4 &#xb1; 1.5&#xa0;ms</td>
<td valign="top" align="char" char="plusmn">6.1 &#xb1; 1.3&#xa0;ms (2.86x)</td>
<td align="center">0.35&#xa0;ms (49.7x)</td>
</tr>
<tr>
<td>Mem mgmt &#x2b; overhead</td>
<td valign="top" align="char" char="plusmn">29.1 &#xb1; 1.7&#xa0;ms</td>
<td valign="top" align="char" char="plusmn">44.9 &#xb1; 15.7&#xa0;ms</td>
<td align="center">4.27&#xa0;ms</td>
</tr>
<tr>
<td>TOTAL (10,000 points per layer)</td>
<td valign="top" align="center">705.5 &#xb1; 7.9&#xa0;ms</td>
<td valign="top" align="char" char="plusmn">357.2 &#xb1; 19.7&#xa0;ms (2.0x)</td>
<td align="char" char="plusmn">6.63 &#xb1; 0.63&#xa0;ms (106.4x)</td>
</tr>
</tbody>
</table>
</table-wrap>
<table-wrap id="T2" position="float">
<label>TABLE 2</label>
<caption>
<p>Kernels and parallelism.</p>
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th>Kernels</th>
<th align="center">Parallelism</th>
<th align="center">Total threads</th>
<th align="center">Block size</th>
</tr>
</thead>
<tbody>
<tr>
<td>Build fixed-grid spatial index</td>
<td align="center">1 point/thread</td>
<td align="center">n</td>
<td align="center">1,024</td>
</tr>
<tr>
<td>Calculate local density</td>
<td align="center">1 point/thread</td>
<td align="center">n</td>
<td align="center">1,024</td>
</tr>
<tr>
<td>Calculate nearest-higher and separation</td>
<td align="center">1 point/thread</td>
<td align="center">n</td>
<td align="center">1,024</td>
</tr>
<tr>
<td>Decide seeds/outliers, register followers</td>
<td align="center">1 point/thread</td>
<td align="center">n</td>
<td align="center">1,024</td>
</tr>
<tr>
<td>Expand clusters</td>
<td align="center">1 seed/thread</td>
<td align="center">k</td>
<td align="center">1,024</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>When parallelizing CLUE on GPU, conflicts among threads that access and modify the same address in global memory can occur in the following three cases:<list list-type="roman-lower">
<list-item>
<p>multiple points need to register to the same bin simultaneously;</p>
</list-item>
<list-item>
<p>multiple points need to register to the list of seeds simultaneously;</p>
</list-item>
<list-item>
<p>multiple points need to register as followers to the same point simultaneously.</p>
</list-item>
</list>
</p>
<p>Therefore, atomic operations are necessary to avoid race conditions among threads in the global memory. During an atomic operation, a thread is granted exclusive access to read from and write to a memory location that is inaccessible to other concurrent threads until the atomic operation finishes.</p>
<p>This inevitably leads to some microscopic serialization among competing threads. The serialization in cases (i) and (iii) is negligible because bins are usually small, as is the number of followers of a given point. In contrast, serialization in case (ii) can be costly because the number of seeds <italic>k</italic> is large. This can cause delays in the execution of the kernel responsible for seed promotion. Since the atomic pushing back to the list of seeds is relatively fast in GPU memory compared to the data transfer between host and device, the total execution time of CLUE still does not suffer significantly from the serialization in case (ii). The speed performance is further discussed in <xref ref-type="sec" rid="s4">Section 4</xref>.</p>
</sec>
<sec id="s4">
<title>4. Performance Evaluation</title>
<sec id="s4-1">
<label>
</label>
<title>4.1. Clustering Results</title>
<p>We demonstrate the clustering results of CLUE with a set of synthetic datasets, shown in <xref ref-type="fig" rid="F3">Figure 3</xref>. Each example has 1,000 2D points and includes spatially uniform noise points. The datasets in <xref ref-type="fig" rid="F3">Figures 3A,C</xref> are from the scikit-learn package (<xref ref-type="bibr" rid="B14">Pedregosa et al., 2011</xref>). The dataset in <xref ref-type="fig" rid="F3">Figure 3B</xref> is taken from (<xref ref-type="bibr" rid="B17">Rodriguez and Laio, 2014</xref>). <xref ref-type="fig" rid="F3">Figures 3A,B</xref> include elliptical clusters and <xref ref-type="fig" rid="F3">Figure 3C</xref> contains two parabolic arcs. CLUE successfully detects density peaks in <xref ref-type="fig" rid="F3">Figures 3A&#x2013;C</xref>.</p>
<p>In the induction principle of density-based clustering, the confidence of assigning a low-density point to a cluster is established by maintaining the continuity of the cluster. Low-density points with large separation should be deprived of association to any clusters. CFSFDP uses a rather costly technique, which calculates a border region of each cluster and defines core-halo points in each cluster, to detach unreliable assignments from clusters (<xref ref-type="bibr" rid="B17">Rodriguez and Laio, 2014</xref>). In contrast, CLUE achieves this using cuts on <inline-formula id="inf54">
<mml:math>
<mml:mrow>
<mml:msub>
<mml:mi>&#x3b4;</mml:mi>
<mml:mi>o</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> and <inline-formula id="inf55">
<mml:math>
<mml:mrow>
<mml:msub>
<mml:mi>&#x3c1;</mml:mi>
<mml:mi>c</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>, while expanding a cluster, as described in <xref ref-type="sec" rid="s2">Section 2</xref>. The example in <xref ref-type="fig" rid="F4">Figure 4</xref> shows how cutting at different separation values helps to demote outliers. <xref ref-type="fig" rid="F4">Figure 4A</xref> represents the decision plot on the <inline-formula id="inf56">
<mml:math>
<mml:mrow>
<mml:mi>&#x3c1;</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mi>&#x3b4;</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula> plane. Points with density below <inline-formula id="inf57">
<mml:math>
<mml:mrow>
<mml:msub>
<mml:mi>&#x3c1;</mml:mi>
<mml:mi>c</mml:mi>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>80</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>, shown on the left side of the vertical blue line, can be demoted to outliers if their &#x3b4; is larger than a threshold. Once a point is demoted to an outlier, none of its descendant followers can attach to any cluster. While keeping <inline-formula id="inf58">
<mml:math>
<mml:mrow>
<mml:msub>
<mml:mi>&#x3c1;</mml:mi>
<mml:mi>c</mml:mi>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>80</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula> fixed, the effect of using three different values of <inline-formula id="inf59">
<mml:math>
<mml:mrow>
<mml:msub>
<mml:mi>&#x3b4;</mml:mi>
<mml:mi>o</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> (10, 20, 60), shown as orange dashed lines in <xref ref-type="fig" rid="F4">Figure 4A</xref>, has been investigated. The corresponding results are shown in <xref ref-type="fig" rid="F4">Figures 4B&#x2013;D</xref>, respectively.</p>
<fig id="F4" position="float">
<label>FIGURE 4</label>
<caption>
<p>Noise rejection using different values of &#x03B4;<sub>o</sub>. Noise is either an outlier or a descendant follower of an outlier. In this dataset (<xref ref-type="bibr" rid="B17">Rodriguez &#x0026; Laio, 2014</xref>), 4,000 points are distributed in a 500 &#x00D7; 500 2D square area. <bold>(A)</bold> represents the decision plot on the &#x03C1; &#x2212; &#x03B4; plane, where the fixed values &#x03C1;<sub>c</sub> = 80 and &#x03B4;<sub>c</sub> = 40 are shown as vertical and horizontal blue lines, respectively. Three different values of &#x03B4;<sub>o</sub> (10, 20, 60) are shown as orange dashed lines. <bold>(B&#x2212;D)</bold> show the results with &#x03B4;<sub>o</sub> = 10, 20, 60, respectively, illustrating how increasing &#x03B4;<sub>o</sub> loosens the continuity requirement and helps to demote outliers. The level of denoising should be chosen according to the user&#x0027;s needs.</p>
</caption>
<graphic xlink:href="fdata-03-591315-g004.tif"/>
</fig>
<p>The physics requirements of the clustering for the CMS HGCAL can be summarized as collecting a high fraction of the energy deposited by a single shower in a single cluster. The algorithm should form separate clusters of separate showers even when the showers overlap, as far as the granularity of the detector and the shower lateral size allow, but not split the energy deposited by a single shower into more than one cluster. This requirement is most easily definable for electromagnetic showers, which have a regular and repeatable form, and slightly less obvious for hadronic showers, which in the fine granularity of the HGCAL frequently have the form of a branching tree of subshowers.</p>
<p>It is found that the CLUE algorithm can be tuned to satisfy these physics requirements well by adjusting its parameters to the cluster characteristics in the calorimeter. In particular, the convolution kernel described in <xref ref-type="sec" rid="s2-2">Section 2.2</xref> is set to a highly simplified approximation of the lateral shower shape.</p>
</sec>
<sec id="s4-2">
<title>4.2. Execution Time and Scaling</title>
<p>We tested the computational performance of CLUE using a synthetic dataset that resembles high-occupancy events in high granularity calorimeters operated at HL-LHC. The dataset represents a calorimeter with 100 sensor layers. A fixed number of points on each layer are assigned a unit weight in such a way that the density represents circular clusters of energy whose magnitude decreases radially from the center of the cluster according to a Gaussian distribution with the standard deviation, <italic>&#x3c3;</italic>, set to 3&#xa0;cm. In addition, 5% of the points represent noise distributed uniformly over the layers. When clustering with CLUE, the bin size is set to 5&#xa0;cm, comparable to the width of the clusters, and the algorithm parameters are set to <inline-formula id="inf60">
<mml:math>
<mml:mrow>
<mml:msub>
<mml:mi>d</mml:mi>
<mml:mi>c</mml:mi>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>3</mml:mn>
<mml:mtext>&#xa0;cm</mml:mtext>
<mml:mo>,</mml:mo>
<mml:mtext>&#x2009;</mml:mtext>
<mml:msub>
<mml:mi>&#x3b4;</mml:mi>
<mml:mi>o</mml:mi>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:msub>
<mml:mi>&#x3b4;</mml:mi>
<mml:mi>c</mml:mi>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>5</mml:mn>
<mml:mtext>&#xa0;cm</mml:mtext>
<mml:mo>,</mml:mo>
<mml:mtext>&#x2009;</mml:mtext>
<mml:msub>
<mml:mi>&#x3c1;</mml:mi>
<mml:mi>c</mml:mi>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>8</mml:mn>
</mml:mrow>
</mml:math>
</inline-formula>. To test CLUE&#x2019;s linear scaling, the number of points on each layer is incremented from 1,000 to 10,000 in 10 equal steps. A total of 100 layers are input to CLUE simultaneously, which simulates the proposed CMS HGCAL design (<xref ref-type="bibr" rid="B3">The Phase-2 Upgrade of the CMS Endcap Calorimeter, 2017</xref>). Therefore, the total number of points in the test ranges from <inline-formula id="inf61">
<mml:math>
<mml:mrow>
<mml:msup>
<mml:mrow>
<mml:mn>10</mml:mn>
</mml:mrow>
<mml:mn>5</mml:mn>
</mml:msup>
</mml:mrow>
</mml:math>
</inline-formula> to <inline-formula id="inf62">
<mml:math>
<mml:mrow>
<mml:msup>
<mml:mrow>
<mml:mn>10</mml:mn>
</mml:mrow>
<mml:mn>6</mml:mn>
</mml:msup>
</mml:mrow>
</mml:math>
</inline-formula>. The linear scaling of execution time is validated in <xref ref-type="fig" rid="F5">Figure 5</xref>.</p>
<fig id="F5" position="float">
<label>FIGURE 5</label>
<caption>
<p>
<bold>(Upper)</bold> Execution time of CLUE on the single-threaded CPU, multi-threaded CPU with TBB, and GPU scales linearly with the number of input points, ranging from <inline-formula id="inf63">
<mml:math>
<mml:mrow>
<mml:msup>
<mml:mrow>
<mml:mn>10</mml:mn>
</mml:mrow>
<mml:mn>5</mml:mn>
</mml:msup>
</mml:mrow>
</mml:math>
</inline-formula> to <inline-formula id="inf64">
<mml:math>
<mml:mrow>
<mml:msup>
<mml:mrow>
<mml:mn>10</mml:mn>
</mml:mrow>
<mml:mn>6</mml:mn>
</mml:msup>
</mml:mrow>
</mml:math>
</inline-formula> in total. Execution time on the single-threaded CPU is shown as blue circles and on the multi-threaded CPU with TBB (10 threads) as blue squares, while the time on GPU is shown as green circles, scaled up by a factor of 50 to fit the same vertical scale. The stacked bars represent the decomposition of execution time. The gray narrower bars are latency for data traffic and memory management; wider bars represent the time of the essential CLUE steps. <bold>(Lower)</bold> Compared with the single-threaded CPU, the speed-up factors of the GPU range from 48 to 112, while the speed-up factors of the multi-threaded CPU with TBB range from 1.2 to 2.0, which is less than the number of concurrent threads on CPU because of atomic pushing to the data containers discussed in <xref ref-type="sec" rid="s3">Section 3</xref>. <xref ref-type="table" rid="T1">Table 1</xref> shows the details of the decomposition of the execution time in the case of <inline-formula id="inf65">
<mml:math>
<mml:mrow>
<mml:msup>
<mml:mrow>
<mml:mn>10</mml:mn>
</mml:mrow>
<mml:mn>4</mml:mn>
</mml:msup>
</mml:mrow>
</mml:math>
</inline-formula> points per layer.</p>
</caption>
<graphic xlink:href="fdata-03-591315-g005.tif"/>
</fig>
<p>The single-threaded version of the CLUE algorithm on CPU has been implemented in C&#x2b;&#x2b;, while the one on GPU has been implemented in C with CUDA (<xref ref-type="bibr" rid="B4">Nvidia Corporation, 2010</xref>). The multi-threaded version of CLUE on CPU uses the Threading Building Blocks (TBB) library (<xref ref-type="bibr" rid="B15">Reinders, 2007</xref>) and has been implemented using the Abstraction Library for Parallel Kernel Acceleration (Alpaka) (<xref ref-type="bibr" rid="B19">Zenker et al., 2016</xref>). The test of the execution time is performed on an Intel Xeon Silver 4114 CPU and an NVIDIA Tesla V100 GPU connected by a PCIe Gen-3 link. The time of each GPU kernel and CUDA API call is measured using the NVIDIA profiler. The total execution time is averaged over 200 identical events (10,000 identical events for the GPU). Since CLUE is performed event-by-event and it is not necessary to repeat memory allocation and release for each event when running on GPU, we perform a one-time allocation of enough GPU memory before processing events and a one-time GPU memory deallocation after finishing all events. Therefore, the one-time <italic>cudaMalloc</italic> and <italic>cudaFree</italic> are not included in the average execution time. This exclusion is legitimate because the number of events in high-energy physics experiments is extremely large, so the execution time of the one-time <italic>cudaMalloc</italic> and <italic>cudaFree</italic>, whose allocation is reused by each individual event, is negligible.</p>
<p>In <xref ref-type="fig" rid="F5">Figure 5</xref> (upper), the scaling of CLUE is linear, consistent with the expectation. The execution time on the single-threaded CPU, multi-threaded CPU with TBB, and GPU increases linearly with the total number of points. The stacked bars represent the decomposition of execution time. In the decomposition, the latency of data transfer between host and device, shown as the narrower gray bar, is unique to the GPU implementation, while the five CLUE steps are common to all three implementations. Compared with the single-threaded CPU, when building the spatial index and deciding seeds, shown as red and pink bars, the multi-threaded CPU using TBB does not give a notable speed-up due to the implementation of atomic operations in Alpaka (<xref ref-type="bibr" rid="B19">Zenker et al., 2016</xref>) as discussed in <xref ref-type="sec" rid="s3">Section 3</xref>, while the GPU performs markedly better thanks to its larger parallelization scale. For the GPU case, the seed-promotion kernel, in which serialization exists due to the atomic appending of points to the list of seeds, does not affect the total execution time significantly compared with the other subprocesses. In the two most compute-intensive steps, calculating density and separation, there are no thread conflicts and no atomic operations are required. Therefore, both the multi-threaded CPU using TBB and the GPU provide a significant speed-up. The details of the decomposition of execution time in the case of <inline-formula id="inf66">
<mml:math>
<mml:mrow>
<mml:msup>
<mml:mrow>
<mml:mn>10</mml:mn>
</mml:mrow>
<mml:mn>4</mml:mn>
</mml:msup>
</mml:mrow>
</mml:math>
</inline-formula> points per layer are listed in <xref ref-type="table" rid="T1">Table 1</xref>.</p>
<p>
<xref ref-type="fig" rid="F5">Figure 5</xref> (lower) shows the speed-up factors. Compared to the single-threaded CPU, the CUDA implementation on GPU is 48&#x2013;112 times faster, while the multi-threaded version using TBB via Alpaka with 10 threads on CPU is about 1.2&#x2013;2.0 times faster. The speed-up factors are constrained to be smaller than the number of concurrent threads because of the atomic operations that introduce serialization. In <xref ref-type="table" rid="T1">Table 1</xref>, the speed-up factors of the multi-threaded CPU using TBB drop below one in the steps that build the spatial index and that decide seeds and outliers and register followers, where atomic operations occur and limit the overall speed-up factor.</p>
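<p>The totals and overall speed-up factors of Table 1 can be cross-checked with a few lines of Python. The sketch below only re-derives numbers from the rounded table entries, so the column sums may differ from the reported TOTAL row in the last digit; the variable and function names are illustrative.</p>
<preformat><![CDATA[
```python
# Timings in ms copied from Table 1: (CPU 1T, CPU TBB 10T, GPU).
table1 = {
    "build index": (59.3, 117.7, 0.28),
    "local density": (218.4, 33.7, 0.51),
    "nearest-higher and separation": (326.9, 45.5, 0.89),
    "seeds/outliers, followers": (54.4, 109.4, 0.34),
    "expand clusters": (17.4, 6.1, 0.35),
    "mem mgmt + overhead": (29.1, 44.9, 4.27),
}
reported_total = (705.5, 357.2, 6.63)  # TOTAL row of Table 1


def column_sums(rows):
    """Sum each column of the table, rounded to the table's precision."""
    return tuple(round(sum(col), 2) for col in zip(*rows.values()))


def speedup(total):
    """Speed-up of TBB and GPU with respect to the single-threaded CPU."""
    cpu, tbb, gpu = total
    return round(cpu / tbb, 1), round(cpu / gpu, 1)


sums = column_sums(table1)
tbb_speedup, gpu_speedup = speedup(reported_total)
# Share of the GPU total spent outside the five CLUE kernels.
gpu_overhead_share = table1["mem mgmt + overhead"][2] / reported_total[2]
print(sums, tbb_speedup, gpu_speedup, round(gpu_overhead_share, 2))
```
]]></preformat>
<p>With these inputs, memory management and overhead account for roughly 64% of the GPU total, which is consistent with the observation that the kernels themselves are not the dominant cost on GPU.</p>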
</sec>
</sec>
<sec sec-type="conclusion" id="s5">
<title>5. Conclusion</title>
<p>Clustering is an important part of shower reconstruction in high granularity calorimeters, where it identifies hot regions of energy deposits. The algorithm is required to be computationally linear with data scale <italic>n</italic>, independent of prior knowledge of the number of clusters <italic>k</italic>, and conveniently parallelizable when <inline-formula id="inf67">
<mml:math>
<mml:mrow>
<mml:mi>n</mml:mi>
<mml:mo>&#x3e;</mml:mo>
<mml:mi>k</mml:mi>
<mml:mo>&#x226b;</mml:mo>
<mml:mi>m</mml:mi>
<mml:mo>&#x2261;</mml:mo>
<mml:mrow>
<mml:mi>n</mml:mi>
<mml:mo>/</mml:mo>
<mml:mi>k</mml:mi>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> in 2D. However, most of the well-known algorithms do not simultaneously support linear scalability and easy parallelization. CLUE is proposed to efficiently perform clustering tasks in low-dimensional space with <inline-formula id="inf68">
<mml:math>
<mml:mrow>
<mml:mi>n</mml:mi>
<mml:mo>&#x3e;</mml:mo>
<mml:mi>k</mml:mi>
<mml:mo>&#x226b;</mml:mo>
<mml:mi>m</mml:mi>
</mml:mrow>
</mml:math>
</inline-formula>, including (and beyond) the applications in high granularity calorimeters. The clustering time scales linearly with the number of input hits in the range of multiplicity that is relevant for, e.g., the high granularity calorimeter of the CMS experiment at CERN. We evaluated the performance of CLUE on synthetic data and demonstrated its capability on nonspherical cluster shapes with adjustable noise rejection. Furthermore, the studies suggest that CLUE on GPU outperforms the single-threaded CPU by more than an order of magnitude within the data scale ranging from <inline-formula id="inf69">
<mml:math>
<mml:mrow>
<mml:mi>n</mml:mi>
<mml:mo>&#x3d;</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mn>10</mml:mn>
</mml:mrow>
<mml:mn>5</mml:mn>
</mml:msup>
</mml:mrow>
</mml:math>
</inline-formula> to <inline-formula id="inf70">
<mml:math>
<mml:mrow>
<mml:msup>
<mml:mrow>
<mml:mn>10</mml:mn>
</mml:mrow>
<mml:mn>6</mml:mn>
</mml:msup>
</mml:mrow>
</mml:math>
</inline-formula>.</p>
</sec>
</body>
<back>
<sec id="s6">
<title>Data Availability Statement</title>
<p>The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found below: <ext-link ext-link-type="uri" xlink:href="https://gitlab.cern.ch/kalos/clue/tree/V_01_20">https://gitlab.cern.ch/kalos/clue/tree/V_01_20</ext-link>.</p>
</sec>
<sec id="s7">
<title>Author Contributions</title>
<p>The original algorithm was developed by MR and FP, who designed it to be parallel together with the tiling data structures and implemented it in Alpaka. ZC and AD implemented it in CUDA and wrote most of the text of the article. CS gave physics contributions to the clustering kernel definition and cuts to be used.</p>
</sec>
<sec sec-type="COI-statement" id="s8">
<title>Conflict of Interest</title>
<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
</sec>
<ack>
<p>The authors would like to thank the CMS and HGCAL colleagues for the many suggestions received during the development of this work, Vincenzo Innocente for suggestions and guidance while developing the clustering algorithm, and Benjamin Kilian and George Adamov for helpful discussions during the development of CLUE. This material is based upon work partially supported by the U.S. Department of Energy, Office of Science, Office of Basic Energy Sciences Energy Frontier Research Centers program under Award Number DE-SP0035530, and the European project with CUP H92H18000110006, within the &#x201c;Innovative research fellowships with industrial characterization&#x201d; in the National Operational Program FSE-ERDF Research and Innovation 2014&#x2013;2020, Axis I, Action I.1.</p>
</ack>
<sec id="s9">
<title>Supplementary Material</title>
<p>The Supplementary Material for this article can be found online at: <ext-link ext-link-type="uri" xlink:href="https://www.frontiersin.org/articles/10.3389/fdata.2020.591315/full#supplementary-material">https://www.frontiersin.org/articles/10.3389/fdata.2020.591315/full&#x23;supplementary-material</ext-link>.</p>
<supplementary-material xlink:href="datasheet1.pdf" id="SM1" mimetype="application/pdf" xmlns:xlink="http://www.w3.org/1999/xlink"/>
</sec>
<ref-list>
<title>References</title>
<ref id="B1">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Ankerst</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Breunig</surname>
<given-names>M. M.</given-names>
</name>
<name>
<surname>Kriegel</surname>
<given-names>H.-P.</given-names>
</name>
<name>
<surname>Sander</surname>
<given-names>J.</given-names>
</name>
</person-group> (<year>1999</year>). &#x201c;<article-title>OPTICS: ordering points to identify the clustering structure</article-title>,&#x201d; in <conf-name>Proceedings of the 1999 ACM SIGMOD international conference on management of data</conf-name>, <conf-loc>Philadelphia, PA, United States</conf-loc>, <conf-date>June 1&#x2013;3, 1999</conf-date> (<publisher-loc>New York, NY</publisher-loc>: <publisher-name>ACM</publisher-name>).</citation>
</ref>
<ref id="B2">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Apollinari</surname>
<given-names>G.</given-names>
</name>
<name>
<surname>Bejar Alonso</surname>
<given-names>I.</given-names>
</name>
<name>
<surname>Bruning</surname>
<given-names>O.</given-names>
</name>
<name>
<surname>Fessia</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Lamont</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Rossi</surname>
<given-names>L.</given-names>
</name>
<etal/>
</person-group> (<year>2017</year>). <source>High-luminosity large hadron collider (HL-LHC): technical design report V. 0.1</source>. <publisher-loc>Geneva, Switzerland</publisher-loc>: <publisher-name>CERN</publisher-name>, <fpage>599</fpage>.</citation>
</ref>
<ref id="B3">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Bentley</surname>
<given-names>J. L.</given-names>
</name>
<name>
<surname>Friedman</surname>
<given-names>J. H.</given-names>
</name>
</person-group> (<year>1979</year>). <article-title>Data structures for range searching</article-title>. <source>ACM Comput. Surv.</source> <volume>11</volume>, <fpage>397</fpage>. <pub-id pub-id-type="doi">10.1145/356789.356797</pub-id>
</citation>
</ref>
<ref id="B4">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Bentley</surname>
<given-names>J. L.</given-names>
</name>
</person-group> (<year>1975</year>). <article-title>Multidimensional binary search trees used for associative searching</article-title>. <source>Commun. ACM.</source> <volume>18</volume>, <fpage>509</fpage>. <pub-id pub-id-type="doi">10.1145/361002.361007</pub-id>
</citation>
</ref>
<ref id="B5">
<citation citation-type="web">
<collab>CALICE Collaboration</collab> (<year>2012</year>). <article-title>Calorimetry for lepton collider experiments &#x2013; CALICE results and activities</article-title>. <comment>Available at: <ext-link ext-link-type="uri" xlink:href="http://arXiv:1212.5127">arXiv:1212.5127</ext-link> (Accessed December 20, 2012)</comment>.</citation>
</ref>
<ref id="B6">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Chen</surname>
<given-names>Z.</given-names>
</name>
<name>
<surname>Lange</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Meschi</surname>
<given-names>E.</given-names>
</name>
<name>
<surname>Scott</surname>
<given-names>E.</given-names>
</name>
<name>
<surname>Seez</surname>
<given-names>C.</given-names>
</name>
</person-group> (<year>2017</year>). &#x201c;<article-title>Offline reconstruction algorithms for the CMS high granularity calorimeter for HL-LHC</article-title>,&#x201d; in <conf-name>2017 IEEE nuclear science symposium and medical imaging conference and 24th international symposium on room-temperature semiconductor X-ray &#x26; gamma-ray detectors</conf-name>, <conf-loc>Atlanta, GA, USA</conf-loc>, <conf-date>October 21&#x2013;28, 2017</conf-date>, p. <fpage>8532605</fpage>.</citation>
</ref>
<ref id="B7">
<citation citation-type="book">
<collab>CMS Collaboration</collab> (<year>2017</year>). <article-title>The phase-2 upgrade of the CMS endcap calorimeter</article-title>. <comment>Tech. Rep. CERN-LHCC-2017-023</comment>. <publisher-loc>Geneva, Switzerland</publisher-loc>: <publisher-name>CERN</publisher-name>.</citation>
</ref>
<ref id="B8">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Ester</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Kriegel</surname>
<given-names>H.-P.</given-names>
</name>
<name>
<surname>Sander</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Xu</surname>
<given-names>X.</given-names>
</name>
</person-group> (<year>1996</year>). &#x201c;<article-title>A density-based algorithm for discovering clusters in large spatial databases with noise</article-title>,&#x201d; in <conf-name>Proceedings of the second international conference on knowledge discovery and data mining</conf-name>, <publisher-loc>Portland, OR</publisher-loc>, <publisher-name>AAAI Press</publisher-name>, pp. <fpage>226</fpage>&#x2013;<lpage>231</lpage>. <comment>Available at: <ext-link ext-link-type="uri" xlink:href="http://dl.acm.org/citation.cfm?id=3001460.3001507">http://dl.acm.org/citation.cfm?id&#x3d;3001460.3001507</ext-link>
</comment>.</citation>
</ref>
<ref id="B9">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Guttman</surname>
<given-names>A.</given-names>
</name>
</person-group> (<year>1984</year>). <article-title>R-trees</article-title>. <source>SIGMOD Rec.</source> <volume>14</volume>, <fpage>47</fpage>. <pub-id pub-id-type="doi">10.1145/971697.602266</pub-id>
</citation>
</ref>
<ref id="B10">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Han</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Pei</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Kamber</surname>
<given-names>M.</given-names>
</name>
</person-group> (<year>2012</year>). <source>Data mining: concepts and techniques</source>, <publisher-loc>Amsterdam, Netherlands</publisher-loc>: <publisher-name>Elsevier</publisher-name>, <fpage>744</fpage>.</citation>
</ref>
<ref id="B11">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Levinthal</surname>
<given-names>C.</given-names>
</name>
</person-group> (<year>1966</year>). <article-title>Molecular model-building by computer</article-title>. <source>Sci. Am.</source> <volume>214</volume>, <fpage>42</fpage>&#x2013;<lpage>52</lpage>. <pub-id pub-id-type="doi">10.1038/scientificamerican0666-42</pub-id>
</citation>
</ref>
<ref id="B12">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Lloyd</surname>
<given-names>S.</given-names>
</name>
</person-group> (<year>1982</year>). <article-title>Least squares quantization in PCM</article-title>. <source>IEEE Trans. Inf. Theor.</source> <volume>28</volume>, <fpage>129</fpage>. <pub-id pub-id-type="doi">10.1109/tit.1982.1056489</pub-id>
</citation>
</ref>
<ref id="B13">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Maimon</surname>
<given-names>O.</given-names>
</name>
<name>
<surname>Rokach</surname>
<given-names>L.</given-names>
</name>
</person-group> (<year>2005</year>). <source>Data mining and knowledge discovery handbook</source>. <publisher-loc>Boston, MA</publisher-loc>: <publisher-name>Springer</publisher-name>.</citation>
</ref>
<ref id="B14">
<citation citation-type="book">
<collab>NVIDIA Corporation</collab> (<year>2010</year>). <source>NVIDIA CUDA C programming guide</source>.</citation>
</ref>
<ref id="B15">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Pedregosa</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Varoquaux</surname>
<given-names>G.</given-names>
</name>
<name>
<surname>Gramfort</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Michel</surname>
<given-names>V.</given-names>
</name>
<name>
<surname>Thirion</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Grisel</surname>
<given-names>O.</given-names>
</name>
<etal/>
</person-group> (<year>2011</year>). <article-title>Scikit-learn: machine learning in Python</article-title>. <source>J. Mach. Learn. Res.</source> <volume>12</volume>, <fpage>2825</fpage>.</citation>
</ref>
<ref id="B16">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Reinders</surname>
<given-names>J.</given-names>
</name>
</person-group> (<year>2007</year>). <source>Intel threading building blocks: outfitting C&#x2b;&#x2b; for multi-core processor parallelism</source>. <publisher-loc>Sebastopol, CA</publisher-loc>: <publisher-name>O&#x2019;Reilly Media, Inc.</publisher-name>.</citation>
</ref>
<ref id="B17">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Rigaux</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Scholl</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Voisard</surname>
<given-names>A.</given-names>
</name>
</person-group> (<year>2001</year>). <source>Spatial databases: with application to GIS</source>. <publisher-loc>Amsterdam, Netherlands</publisher-loc>: <publisher-name>Elsevier</publisher-name>.</citation>
</ref>
<ref id="B18">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Rodriguez</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Laio</surname>
<given-names>A.</given-names>
</name>
</person-group> (<year>2014</year>). <article-title>Clustering by fast search and find of density peaks</article-title>. <source>Science</source> <volume>344</volume>, <fpage>1492</fpage>. <pub-id pub-id-type="doi">10.1126/science.1242072</pub-id>
</citation>
</ref>
<ref id="B19">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Zenker</surname>
<given-names>E.</given-names>
</name>
<name>
<surname>Worpitz</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Widera</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Huebl</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Juckeland</surname>
<given-names>G.</given-names>
</name>
<name>
<surname>Kn&#xfc;pfer</surname>
<given-names>A.</given-names>
</name>
<etal/>
</person-group> (<year>2016</year>). &#x201c;<article-title>Alpaka&#x2013;an abstraction library for parallel kernel acceleration</article-title>,&#x201d; in <conf-name>IEEE international parallel and distributed processing symposium workshops (IPDPSW)</conf-name>, <conf-loc>Chicago, IL, USA</conf-loc>, <conf-date>May 23&#x2013;27, 2016</conf-date>, pp. <fpage>631</fpage>&#x2013;<lpage>640</lpage>.</citation>
</ref>
</ref-list>
</back>
</article>
