Abstract
Clustering algorithms are at the basis of several technological applications, and are fueling the development of rapidly evolving fields such as machine learning. In the recent past, however, it has become apparent that they face challenges stemming from datasets that span a growing number of spatial dimensions. In fact, the best-performing clustering algorithms scale linearly in the number of points, but quadratically with respect to the local density of points. In this work, we introduce qCLUE, a quantum clustering algorithm that scales linearly in both the number of points and their density. qCLUE is inspired by CLUE, an algorithm developed to address the challenging time and memory budgets of Event Reconstruction (ER) in future High-Energy Physics experiments. As such, qCLUE marries decades of development with the quadratic speedup provided by quantum computers. We numerically test qCLUE in several scenarios, demonstrating its effectiveness and showing it to be a promising route to handle complex data analysis tasks – especially in high-dimensional datasets with high densities of points.
1 Introduction
Clustering is a data analysis technique that is crucial in several fields, owing to its ability to uncover hidden patterns and structures within large datasets. It is essential for simplifying complex data, improving data organization, and enhancing decision-making processes (Wu et al., 2021). For instance, clustering has been applied in marketing, where it helps segment customers for targeted advertising (Wu et al., 2009), and in biology, for classifying genes and identifying protein interactions (Wang et al., 2010). In the realm of computer science and artificial intelligence, it is invaluable for speech recognition, image segmentation, and recommendation systems (Shepitsen et al., 2008; Schickel-Zuber and Faltings, 2007) used for personalizing user content. Finally, clustering techniques are pivotal for Event Reconstruction (ER), where data points that originated from the same “event” are to be grouped together. In High-Energy Physics, for instance, clustering algorithms are used to reconstruct the trajectories of subatomic particles in collider experiments. High volumes of data are expected at the endcap High Granularity CALorimeter (HGCAL), which is currently being built for the CMS detector at the High Luminosity Large Hadron Collider (HL-LHC). These volumes must be handled by new generations of clustering algorithms such as CLUE (Rovere et al., 2020). The discovery of the Higgs boson in 2012, which was followed by the 2013 Nobel Prize in Physics, was made possible by such algorithms.
ER enables the interpretation of data obtained from particle collision events, including those occurring at the Large Hadron Collider (LHC) at CERN. Several clustering algorithms, such as DBSCAN, K-Means, and Hierarchical Clustering (Rodenko et al., 2019), can be employed for ER. Our work is based on CERN’s CLUstering of Energy (CLUE) algorithm (Rovere et al., 2020), which is adopted by the CMS collaboration (Tumasyan et al., 2023). It is designed for the future HGCAL detector due to the limitations of the currently employed algorithms. Despite these limitations, such algorithms are already at the basis of several discoveries, such as the doubly charged tetraquark, the study of rare B meson decays to two muons (Tumasyan et al., 2023) and the observation of four-top quark production in proton-proton collisions.
The efficiency of clustering algorithms, as illustrated by the CLUE algorithm (Rovere et al., 2020), is crucial for handling large datasets. Initially designed for two-dimensional datasets, CLUE reduces the search complexity from to through the use of local density and a tiling procedure, where represents the (average) number of points (per tile).
In the context of CLUE, where the datasets in question are limited to two dimensions, is small, making this approach to ER particularly effective. However, as the dimensionality of the dataset is incremented, the value of generally increases exponentially. This is highlighted by Figure 1A, where for a -dimensional lattice with points per edge, follows the relation . This is a serious challenge to CLUE and classical clustering algorithms in general.
FIGURE 1
A first step towards extending CLUE to more dimensions is 3D-CLUE (Rovere et al., 2020;
Quantum computers provide a route to mitigate the complexity blow-up arising from higher-dimensional datasets. Wei et al. (2020) addresses the task of jet clustering in High-Energy Physics, while
In this work we develop qCLUE, a CLUE-inspired quantum algorithm. Similarly to other quantum algorithms (
Overall, we find that qCLUE performs well in a wide range of scenarios. With ER-inspired datasets as a specific example, we demonstrate that clusters are correctly reconstructed in typical experimental settings. Similar to other quantum approaches to clustering that rely on Grover Search (
The specific advantages of qCLUE are its CLUE-inspired approach to cluster reconstruction (which demonstrated to be extremely successful (
This paper is structured as follows. In Section 2, we describe our algorithm qCLUE. Specifically, we provide a general overview of its subroutines – namely the Compute Local Density, Find Nearest Higher, and the Find Seeds, Outliers and Assign Clusters steps. We describe the results of our simulated version of qCLUE on a classical computer in Section 3. In more detail, we explain the scoring metrics we use to quantify our results, and describe qCLUE performance when the dataset is subject to noise and different clusters overlap. Conclusions and outlook are finally presented in Section 4.
2 qCLUE
qCLUE is a quantum adaptation of CERN’s CLUE and 3D-CLUE algorithms (Rovere et al., 2020;
In Section 2.1, we offer an overview of the algorithm and its different subroutines. Section 2.2 is dedicated to the first subroutine of qCLUE, namely, calculating the Local Density. We then explain how to determine the Nearest Highers in Section 2.3. Finally, Section 2.4 covers the identification of Seeds and Outliers and the concluding Cluster Assignment subroutine, where the points in the dataset are hierarchically clustered.
2.1 Overview and setting
As for CLUE and 3D-CLUE (Rovere et al., 2020;
In dimensions, the spatial coordinates for point are , that are promptly generalized for larger values of. Both CLUE and qCLUE first perform tiling over the dataset to reduce the search and therefore enhance the efficiency of the algorithm. Tiling is the process of partitioning the dataset into a grid of rectangular tiles , where is the tile index (see Figure 2). Therefore, our input dataset comprises point and tile indices and , respectively, the coordinates , and a parameter associated with each point. Following CLUE’s notation, we call the weight, yet this should be considered as a label that can be employed to improve the clustering quality for any given dataset. The tiling procedure of qCLUE and CLUE enables searching only over Search Spaces marked by the tiles in green in Figure 2A as opposed to the full dataset. In the case of CLUE, this allowed for an improvement in scaling from to . The scaling of qCLUE is investigated below.
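The tiling idea can be illustrated with a minimal classical sketch in two dimensions. The function names (`tile_index`, `build_tiles`, `search_space`), the square tiles of edge `tile_size`, and the use of the circle's bounding square to select tiles are assumptions made for illustration; the bounding square yields a superset of the tiles that actually intersect the consideration circle.

```python
import math
from collections import defaultdict


def tile_index(x, y, tile_size):
    """Map a 2D point to the integer (column, row) index of its tile."""
    return (math.floor(x / tile_size), math.floor(y / tile_size))


def build_tiles(points, tile_size):
    """Group point indices by tile; `points` is a list of (x, y) pairs."""
    tiles = defaultdict(list)
    for j, (x, y) in enumerate(points):
        tiles[tile_index(x, y, tile_size)].append(j)
    return tiles


def search_space(tiles, center, radius, tile_size):
    """Indices of all points in tiles touched by the bounding square of the circle.

    Assumption: the bounding square is used instead of an exact
    circle/tile intersection test, so a few extra tiles may be included.
    """
    cx, cy = center
    col_lo, row_lo = tile_index(cx - radius, cy - radius, tile_size)
    col_hi, row_hi = tile_index(cx + radius, cy + radius, tile_size)
    found = []
    for col in range(col_lo, col_hi + 1):
        for row in range(row_lo, row_hi + 1):
            found.extend(tiles.get((col, row), []))
    return found


points = [(0.5, 0.5), (1.5, 0.5), (5.5, 5.5), (0.9, 1.1)]
tiles = build_tiles(points, tile_size=1.0)
candidates = search_space(tiles, center=(0.5, 0.5), radius=0.6, tile_size=1.0)
```

Only the points returned in `candidates` need to be examined by the subsequent search, which is what keeps the per-point cost tied to the tile occupancy rather than to the full dataset size.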
FIGURE 2

Pictorial representation of the main subroutines of qCLUE. In (A), the Local Density computation subroutine is represented. The consideration circle of radius (light blue) centered at the base point (black) contains all points (green) that satisfy . This consideration circle intersects 2 tiles (indexed by tile index ), highlighted in blue, that form the search space . As per Equation 2, the Local Density computation step determines the set of green points from all points in the search space (green and grey) and then computes the local density. In (B), we pictorially present the Find Nearest Higher subroutine. The consideration circle (green) around base point (black) has radius . This consideration circle, containing the green points as well as the Nearest Higher (pink), intersects the 4 tiles highlighted in green, which form the search space . In (C), we describe the Find Seeds, Outliers and Assign Clusters subroutines. The seeds (red) and outliers (blue) are determined via Grover search on the dataset. In this specific example there are two clusters in the dataset whose non-seed points are in orange and purple, respectively. Followers (see main text) in these clusters are connected by dashed arrows. The Cluster Assignment subroutine is shown to be working on the orange cluster where the cluster currently consists of the seed (red, dashed border) and the first of its followers (orange, dotted border). Followers are being found within the Dynamic Search Space (DSS, light red box with solid red border). The DSS is formed as the set of tiles covered partially or fully by the minimum bounding box of the square windows that contains all the search spaces of the points within .
In this work, we employ a qRAM to store and access data, which is an essential building block for quantum computers. Following
The qCLUE algorithm consists of the following steps:
2.1.1 Local density
The first step is to calculate the local density of all points [e.g., black point in Figure 2A], which is defined by Equation 2 and is indicative of the weight in a neighborhood of point . As can be seen from Equation 2 and Figure 2A, is a weighted sum over the weights of all points whose distance from the base point is within a user-specified critical radius that characterizes the consideration circle for the Local Density computation subroutine (light blue circle in the figure). As such, is the weight of the point which is away from point . The choice of weight factor for in the definition of in Equation 2 is empirically found to yield better performances for CLUE (Rovere et al., 2020).
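A brute-force classical sketch of this weighted sum is given below. It is not the quantum subroutine, and the exact form of Equation 2 is not reproduced here: the parameter `neighbor_factor` (with a placeholder default of 0.5), the full contribution of the base point's own weight, and the inclusive boundary `<= d_c` are all assumptions made for illustration.

```python
import math


def local_density(points, weights, j, d_c, neighbor_factor=0.5):
    """Weighted sum of the weights of all points within distance d_c of point j.

    Assumptions (not taken from the source): the base point contributes its
    own weight in full, every other point inside the consideration circle
    contributes `neighbor_factor` times its weight, and the boundary is
    inclusive.
    """
    xj, yj = points[j]
    rho = weights[j]
    for i, (x, y) in enumerate(points):
        if i == j:
            continue
        if math.hypot(x - xj, y - yj) <= d_c:
            rho += neighbor_factor * weights[i]
    return rho


points = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0)]
weights = [2.0, 4.0, 10.0]
densities = [local_density(points, weights, j, d_c=1.5) for j in range(len(points))]
```

In this toy example the third point has no neighbor within the critical radius, so its local density equals its own weight.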
2.1.2 Find nearest higher
After calculating the local densities, we determine the Nearest Highers. The Nearest Higher of a point is the point nearest to with a higher local density . As explained in more detail in Section 2.4, the Nearest Highers are used to hierarchically cluster points together in the Cluster Assignment process at the end of qCLUE. In Figure 2B, the Nearest Higher of the base point (black point) is the pink point.
2.1.3 Find seeds, outliers and assign clusters
As schematically represented in Figure 2C, seeds (red points) are the points whose distance from their Nearest Higher and whose local density are lower bounded by user defined thresholds. Outliers (blue points) are the points whose distance from Nearest Higher is similarly lower bounded but whose Local Density has an upper threshold. As such a point is
Here, is the Outlier Delta Factor that determines the upper bound on the allowed local density for outliers. Furthermore, is the critical density threshold – the lowest local density a point can have to be classified as a seed. Both and are user-specified and can be varied to enhance the quality of the output. Seeds are generally located in areas of high weight density, and will be employed as starting points to build clusters. Outliers are points that are likely to be noise in the dataset and are therefore discarded.
Once seeds and outliers are determined, the clusters are constructed. From the seeds, we iteratively combine “followers.” If point is the Nearest Higher of point , then point is termed as ’s follower. The follower of a point is most likely generated by the same process as the point itself (in the context of ER, by the same particle), and as such shall be included in the same cluster. In Figure 2C, the orange and purple points form two different clusters, and the followers of the points in the purple one are indicated by arrows.
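The follower-based construction can be sketched classically as a traversal that starts at each seed and repeatedly adds followers. The function name `assign_clusters` and the data layout (a list `nearest_higher` in which entry `j` holds the index of the Nearest Higher of point `j`, or `None`) are hypothetical choices for this illustration; the quantum version replaces the follower lookup with a Grover Search.

```python
def assign_clusters(nearest_higher, seeds, outliers):
    """Label every reachable point with the index of the seed it descends from.

    nearest_higher[j] is the Nearest Higher of point j (or None).
    Outliers are skipped, and seeds are never treated as followers.
    """
    followers = {}
    for j, nh in enumerate(nearest_higher):
        if nh is not None and j not in outliers and j not in seeds:
            followers.setdefault(nh, []).append(j)

    labels = {}
    for cluster_id, seed in enumerate(seeds):
        stack = [seed]
        while stack:
            p = stack.pop()
            labels[p] = cluster_id
            # Followers of p join the same cluster as p.
            stack.extend(followers.get(p, []))
    return labels


# Points 0 and 3 are seeds; point 6 is an outlier.
nearest_higher = [None, 0, 1, None, 3, 3, None]
labels = assign_clusters(nearest_higher, seeds=[0, 3], outliers={6})
```

Points 1 and 2 end up in the cluster of seed 0 because point 2 follows point 1, which in turn follows the seed.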
2.2 Local density computation
In this section, we describe the subroutine (schematically represented in Figure 3) that computes the Local Density of the point , as defined in Equation 2. To perform the computation, all points whose distance from point is smaller than the threshold need to be determined from the search space . This search space is the smallest set of tiles required to cover the consideration circle. In Figure 2A, is highlighted in light blue.
FIGURE 3

Algorithm flow for Local Density computation and for Assigning Clusters. The quantum state is initialized in the green “Initialize” box. For Local Density Computation (Cluster Assignment), it comprises all points in the DSS (in the DSS). The “Grover” (light blue) block performs and in succession times, and returns all points satisfying the required condition. The inset considers the case of Local Density computation where the condition is . For the cluster assignment step, we check if points in the DSS are followers of the points in the cluster (see Section 2.4). The output of the Grover subroutine is then measured to yield an index that is checked for validity in the grey “Valid?” diamond. If the point satisfies the chosen condition, the branch is executed. Within the “Update” (light blue) step this point is then removed from either or the DSS and stored to be returned in the “Return” orange box. Once all points are found, the “Valid?” condition triggers the branch to terminate the algorithm. Depending on the chosen subroutine, the returned indices are employed to compute the Local Density from Equation 2, or to construct .
We shall refer to as the local dataset that, as explained above, can be efficiently prepared with the qRAM (
At this stage, we must find the points [green dots in Figure 2A] that are within a radius of from the base point [black point in Figure 2A]. As shown in Figure 3, we perform Grover Search
Here, the first register of the Grover output contains all points characterized by indices such that . As shown in the inset of the figure, the Grover Search consists of repetitions (where is the number of points in ) of the and operators. is the diffusion operator and is the unitary associated with the oracle of Grover Search (
When the algorithm is run, measurement either yields a point that satisfies this distance condition, or (if there are no valid indices left) an index that does not satisfy this condition. This is verified by the grey “Valid?” diamond in Figure 3. The branched logic following this block ensures that the algorithm loops until all the required points are returned by the algorithm in the “Return” block.
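The measure, validate, and update loop can be mimicked with a small classical state-vector simulation. This is a sketch under stated assumptions: the helper names are hypothetical, the number of Grover iterations is chosen using the (normally unknown) count of marked items, and the loop stops after `max_misses` consecutive invalid measurements rather than after the first one, to make the toy simulation robust against the small failure probability of Grover Search.

```python
import math
import random


def grover_probabilities(n_items, marked, iterations):
    """Measurement probabilities after `iterations` Grover steps from a uniform state."""
    amp = [1.0 / math.sqrt(n_items)] * n_items
    for _ in range(iterations):
        # Oracle: flip the sign of the marked amplitudes.
        amp = [-a if i in marked else a for i, a in enumerate(amp)]
        # Diffusion: reflect every amplitude about the mean amplitude.
        mean = sum(amp) / n_items
        amp = [2.0 * mean - a for a in amp]
    return [a * a for a in amp]


def optimal_iterations(n_items, n_marked):
    """floor(pi / (4 * theta)) with sin(theta) = sqrt(n_marked / n_items)."""
    if n_marked == 0:
        return 0
    theta = math.asin(math.sqrt(n_marked / n_items))
    return int(math.floor(math.pi / (4.0 * theta)))


def find_all(distances, d_c, rng, max_misses=5):
    """Repeatedly search, validate, and remove points with distance <= d_c."""
    remaining = list(range(len(distances)))  # indices still in the search space
    found, misses = [], 0
    while remaining and misses < max_misses:
        marked = {pos for pos, j in enumerate(remaining) if distances[j] <= d_c}
        steps = optimal_iterations(len(remaining), len(marked))
        probs = grover_probabilities(len(remaining), marked, steps)
        pos = rng.choices(range(len(remaining)), weights=probs)[0]  # "measurement"
        j = remaining[pos]
        if distances[j] <= d_c:   # "Valid?" check
            found.append(j)
            remaining.pop(pos)    # "Update": remove the point from the search space
            misses = 0
        else:
            misses += 1
    return found


found = find_all([0.2, 3.0, 0.9, 5.0, 0.1, 7.0, 2.5, 0.4], d_c=1.0, rng=random.Random(0))
```

Every index in `found` passes the distance check by construction; whether all valid points are recovered in a given run depends on the sampled measurement outcomes.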
Once we have obtained all indices of points satisfying the distance condition , we perform the summation in Equation 2. This is computed and stored in the original dataset for each point. The database is now updated using qRAM with local density values for all points where the point in the database has the corresponding computed local density .
The scaling of the subroutine that determines the local density of a single point is given by the number of points in the blue consideration circle in Figure 2A such that . If we say this number is , runs are required. This is therefore a algorithm as opposed to the classical iterative algorithm for the Unstructured Search Problem.
As a final remark, we highlight that it is in principle possible to design a unitary that computes the Local Density directly and stores the output in a quantum register. This unitary would remove the requirement of finding individually the indices such that , thus removing the overhead of in . However, designing this circuit is non-trivial and its depth may be large. This is therefore left for future investigations.
2.3 Find nearest higher
Here, we describe qCLUE’s subroutine for finding the Nearest Highers introduced in Section 2.1. As a reminder, is the nearest point to the base point whose local density is more than the local density of the base point, see Equation 3a.
Similar to the initialization carried out for the Local Density Computation step, we use qRAM to initialize the quantum state
Here, the indices are within the tiles , as in Equation 4, and is the considered search space, schematically represented by the light green box in Figure 2B. This search space is determined from as opposed to , which is the user-defined threshold that is set to be . Note that the weight , employed for determining the densities in Section 2.2, is hereon not required.
To find the Nearest Higher, we use a Grover-Enhanced Binary Search (GEBS) where each search step is enhanced by Grover’s algorithm (Equation 5). The output of every Grover run, given in Equation 7, is a superposition over all points whose distance from the base point lies between the thresholds and . Furthermore, their local density should be higher than that of the base . At each step, and are updated based on whether a point satisfying the conditions in the grey diamond of Figure 4A is found. Ancilla registers are used here as detailed in Supplementary Appendix SA.
FIGURE 4

(A) Diagrammatic representation of the algorithm. GEBS determines successive candidates for the “Nearest Higher” until the proper one is found. The quantum state in Equation 6 is prepared in the “Initialize” step (green box). Grover Search (larger diamond) is then performed to find the points satisfying . If this condition is satisfied (“” branch), is updated and Grover run again. If not (“” branch), control flows to the “?” diamond. The branch is entered if the “?” condition is being checked for the first time or if branch was just run. Branch is entered if branch was just run. (B) The algorithm’s working is shown step-by-step (numbers at the bottom) for the search space in the inset in the top right corner. The points are mapped to a line where the height represents the distance from the base point (black dot at the bottom). The grey (orange) points are outside (inside) the green consideration circle with radius [see also Figure 2B]. At each step of GEBS, the thresholds and are updated according to the logic in panel (A). The dot with the red border indicates the current candidate for ; when filled (empty) it is (not) found by Grover Search at that step. The yellow point is the Nearest Higher that is found at the end of GEBS.
To better understand the algorithm, we provide a step-by-step walkthrough of the example in Figure 4B. The search space is schematically represented by the inset in the right hand side, where each dot represents a point with a size that is proportional to its local density. The consideration circle (light green, dotted border) highlights all points within a radius . In this work, we set the outlier delta factor to 2. The consideration circle in the inset corresponds to and , shown in step (I). In the main panel, vertical lines refer to the steps (I–VI) of GEBS that are reported below, and schematically represent the distances of all points (coloured dots) from the base point (black one at the bottom).
GEBS starts with the higher threshold set as and the lower threshold as shown in vertical line (I) of Figure 4B. Following the probabilistic nature of quantum mechanics, assume that the point with a red border indexed is found after measuring the output of the Grover Search in Equation 7. This triggers the updates in the branch in the diagram of Figure 4A, such that we assign and update . The point indexed is then removed from the search space, as can be seen in (II). Now, since no point satisfies the conditions in the diamond of the flow diagram [see (II)] and was just set to , the branch is carried out. This updates the thresholds and for the next iteration of the algorithm, see (III).
Now, assume that the new point with a red border is found [step (III)]. Updates in the branch of Figure 4A are carried out again with a new index and the search region is reduced to contain a single point. In the next step (IV), that point (yellow) is found and, for the third and last time, the nearest higher and the thresholds are triggered according to the branch. Next, since no point is found in (V), qCLUE executes the updates in the branch of the diagram. In the last iteration (VI), no points satisfy the desired conditions. The parameter was just set to , i.e., the subroutine just ran which means that the branch is now executed and is returned.
The runtime complexity of the GEBS procedure, with points in the search space , is as opposed to classically. The term is due to the binary search procedure and depends on the size of the quantum register used to encode the distance. Specifically, for a chosen precision used for the positions of the points in the datasets, .
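The interplay between the binary search over distance thresholds and the per-step search can be sketched classically as follows. This is a simplified stand-in and not the exact branch logic of Figure 4A: the function `query` replaces a Grover run with a linear scan, and the names `nearest_higher`, `radius`, and `precision` are assumptions for illustration. The number of loop iterations grows logarithmically with `radius / precision`, which mirrors the dependence on the size of the distance register.

```python
def query(dist, rho, rho_base, lo, hi):
    """Stand-in for one Grover run: any index with lo < dist <= hi and rho > rho_base."""
    for k, (d, r) in enumerate(zip(dist, rho)):
        if lo < d <= hi and r > rho_base:
            return k
    return None


def nearest_higher(dist, rho, rho_base, radius, precision=1e-6):
    """Binary search on the distance threshold for the closest higher-density point.

    dist[k] is the distance of point k from the base point and rho[k] its
    local density. Returns None if no higher-density point lies within `radius`.
    """
    best = query(dist, rho, rho_base, 0.0, radius)
    if best is None:
        return None
    lo, hi = 0.0, dist[best]  # no candidate is known below lo; `best` sits at hi
    while hi - lo > precision:
        mid = 0.5 * (lo + hi)
        k = query(dist, rho, rho_base, lo, mid)
        if k is not None:
            best, hi = k, dist[k]  # closer candidate found: tighten the upper threshold
        else:
            lo = mid               # nothing in (lo, mid]: raise the lower threshold
    return best


dist = [4.0, 1.5, 2.5, 0.7, 3.0]
rho = [9.0, 5.0, 8.0, 1.0, 7.0]
nh = nearest_higher(dist, rho, rho_base=4.0, radius=3.5)
```

In the example, the point at distance 0.7 is ignored because its density is lower than that of the base point, and the point at distance 4.0 lies outside the search radius.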
2.4 Find seeds, outliers, and assign clusters
Once the Nearest Highers are determined for all points in the dataset, Seeds and Outliers are found via another Grover Search over all points in the dataset. As per the definition in Equation 3a, Seeds [red points in Figure 2C] are the points with highest local density within a neighbourhood. Outliers [blue points in Figure 2C] are mathematically described by Equation 3b, are most likely noise, and therefore do not belong to any cluster.
Similar to the previous subroutines, the quantum registers for these procedures are initialized via qRAM. Seeds and outliers are then determined based on the corresponding conditions via Grover Search. Two quantum registers, the first marking whether a point is an outlier and the second to store the seed number – which is also the cluster number – are added to the quantum database.
The final subroutine of qCLUE is the assignment of points to clusters. At this stage, outliers have been removed from the input dataset, as they have been already identified. The algorithm flow is the same as that of the Local Density step in Figure 3. For a chosen seed , we define to be the set containing the indices of all points determined to be in the associated cluster at the end of this subroutine. To assign points to , we follow a procedure similar to that of the Local Density step in Figure 3. In the “Initialize” step, is initialized to and the quantum registers are initialized via qRAM to the state.
In the “Grover” block, we search over a superposition of points in the dataset which we call the Dynamic Search Space (DSS) created by qRAM as shown in Equations 8a, 8b. The DSS differs from the search space in the Local Density step as it is dynamic. This is because it depends on the points in , which are updated at each iteration. In Figure 2C, for instance, the red seed and the orange point both with black borders are the elements of the current . To find the DSS, a square window of edge is first opened for every point in (in the figure, the squares with the same border style as the corresponding points). A rectangular region (red box) is then obtained by finding the axis-aligned minimum bounding box for these windows. The set of tiles covered partially or fully by this minimum bounding box is the DSS. For example, in Figure 2C, it comprises the 9 tiles highlighted in light red.
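The construction of the DSS can be sketched in a few lines of classical code. The function name `dynamic_search_space`, the parameter `window_edge`, square tiles of edge `tile_size`, and windows centered on each cluster point are assumptions made for this illustration.

```python
import math


def dynamic_search_space(cluster_points, window_edge, tile_size):
    """Tiles (col, row) touched by the bounding box of the windows around cluster points.

    Assumption: each square window of edge `window_edge` is centered on its
    point, so the axis-aligned minimum bounding box of all windows is the
    bounding box of the points grown by half a window edge on every side.
    """
    half = window_edge / 2.0
    x_min = min(x for x, _ in cluster_points) - half
    x_max = max(x for x, _ in cluster_points) + half
    y_min = min(y for _, y in cluster_points) - half
    y_max = max(y for _, y in cluster_points) + half
    cols = range(math.floor(x_min / tile_size), math.floor(x_max / tile_size) + 1)
    rows = range(math.floor(y_min / tile_size), math.floor(y_max / tile_size) + 1)
    return [(c, r) for c in cols for r in rows]


# Current cluster: a seed and its first follower.
dss = dynamic_search_space([(1.5, 1.5), (2.2, 1.8)], window_edge=1.0, tile_size=1.0)
```

Because the bounding box is recomputed from the current cluster members, the returned tile set grows as followers are appended, which is what makes the search space dynamic.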
With a similar procedure as for the Local Density subroutine, the “Grover” block now systematically identifies all followers of all points within set . Here, in the “Update” step in Figure 3, as the point found by the “Grover” block has passed the “Valid” condition, it is appended to . Once no more points are found, the “Return” block yields , following the same flow as the Local Density computation subroutine.
The complexity of the Cluster Assignment step is similar to that of the Local Density Computation subroutine. The quantum advantage stems from the quadratic speedup provided by the Grover algorithm, which allows the followers to be determined faster than in CLUE. If there are points in a cluster and points in the corresponding DSS, the classical complexity of the Cluster Assignment step is , while the quantum algorithm has a runtime of .
3 Results
In this section, we test qCLUE in multiple scenarios, each designed to investigate its performance for different settings. In Section 3.1, we introduce the scoring metrics used for our analysis. In Section 3.2, we describe the performance of the algorithm applied on a single cluster in a uniform noisy environment. In Section 3.3, we study the performance on overlapping clusters. Finally, in Section 3.4, we study the performance of qCLUE on non-centroidal clusters with and without a weight profile.
3.1 Scoring metrics: homogeneity and completeness scores
It is more important to correctly classify high-weight points such as seeds as compared to low-weight points such as outliers. Since we would like our metric to be cognizant of this, we use modified, weight-aware versions (
As discussed in (
qCLUE applied to an input dataset yields homogeneity if all of the predicted clusters only contain data points that are members of a single true cluster. On the other hand, is obtained if all the data points that are members of a given true cluster are elements of the same reconstructed cluster. Therefore, these metrics are better suited to different scenarios. The impacts of noise and cluster overlap investigated in Sections 3.2, 3.3 are better captured by . Indeed, if qCLUE incorrectly classifies noise points into predicted clusters, is unaffected. On the other hand, shall be employed when studying non-centroidal clusters in Section 3.4, since if one true cluster is divided by qCLUE into many sub-clusters.
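For reference, the standard, unweighted homogeneity and completeness scores can be computed from label entropies as sketched below. This is not the weight-aware variant used in this work; it only illustrates the underlying definitions, with hypothetical helper names.

```python
import math
from collections import Counter


def _entropy(labels):
    """Shannon entropy of a list of labels."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def _conditional_entropy(a, b):
    """H(a | b) for two equally long label lists."""
    n = len(a)
    joint = Counter(zip(a, b))
    b_counts = Counter(b)
    return -sum((c / n) * math.log(c / b_counts[bv]) for (_, bv), c in joint.items())


def homogeneity(true_labels, pred_labels):
    """1 - H(true | pred) / H(true); equals 1 when every predicted cluster is pure."""
    h_true = _entropy(true_labels)
    if h_true == 0:
        return 1.0
    return 1.0 - _conditional_entropy(true_labels, pred_labels) / h_true


def completeness(true_labels, pred_labels):
    """1 - H(pred | true) / H(pred); equals 1 when no true cluster is split."""
    h_pred = _entropy(pred_labels)
    if h_pred == 0:
        return 1.0
    return 1.0 - _conditional_entropy(pred_labels, true_labels) / h_pred


true_labels = [0, 0, 0, 0, 1, 1, 1, 1]
split_pred = [0, 0, 1, 1, 2, 2, 3, 3]  # each true cluster is split into two
h = homogeneity(true_labels, split_pred)
c = completeness(true_labels, split_pred)
```

Splitting each true cluster into pure sub-clusters leaves the homogeneity at one while lowering the completeness, which is why completeness is the more informative score when a single true cluster risks being fragmented.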
3.2 Noise
Here, we study the performance of qCLUE for a single cluster in a noisy environment. We vary the number of noise points sampled from a uniform distribution over a square region of fixed size. A cluster of points with coordinates is sampled from the multivariate Gaussian distribution in Equation 10, where is the mean of the distribution (set to in our case) and the covariance matrix. Here, we choose , with being the identity matrix and a positive real number.
Examples of the generated clusters (in orange) and noise (in blue) are given in Figures 5A, B for at and at , respectively. The weight assigned to each point in the cluster is given by [see Equation 10] with . The weight of each noise point is randomly sampled between zero and one. This choice resembles the typical scenarios in ER tasks for which CLUE (Rovere et al., 2020) was designed.
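A dataset of this kind can be generated with the following sketch. Several details are assumptions made for illustration rather than the settings used in this work: the cluster is centered in the middle of the square, the covariance is taken to be `s**2` times the identity, and the names `make_dataset`, `s`, `side`, and `weight_factor` are hypothetical.

```python
import math
import random


def make_dataset(n_cluster, n_noise, s, side, weight_factor, seed=0):
    """One isotropic 2D Gaussian cluster plus uniform noise over a square of edge `side`.

    Cluster weights are the Gaussian density at the sampled point times
    `weight_factor`; noise weights are uniform in [0, 1).
    """
    rng = random.Random(seed)
    center = side / 2.0  # assumption: cluster mean at the center of the square
    points, weights, is_noise = [], [], []
    for _ in range(n_cluster):
        x, y = rng.gauss(center, s), rng.gauss(center, s)
        r2 = (x - center) ** 2 + (y - center) ** 2
        pdf = math.exp(-r2 / (2.0 * s * s)) / (2.0 * math.pi * s * s)
        points.append((x, y))
        weights.append(weight_factor * pdf)
        is_noise.append(False)
    for _ in range(n_noise):
        points.append((rng.uniform(0.0, side), rng.uniform(0.0, side)))
        weights.append(rng.random())
        is_noise.append(True)
    return points, weights, is_noise


points, weights, is_noise = make_dataset(
    n_cluster=100, n_noise=50, s=10.0, side=500.0, weight_factor=1000.0
)
```

With this construction the points closest to the cluster center carry the largest weights, while noise points carry weights that are unrelated to their position.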
FIGURE 5

Numerical results from qCLUE simulated on a classical machine. (A–C) qCLUE’s performance in noisy environments. The dataset generated for these experiments and visualized in panels (A, B) consists of a cluster (noise) with points sampled from the Gaussian distribution in Equation 10 (uniform distribution) over a square of size 500. The weight of noise points is sampled uniformly between zero and one, while each cluster point is assigned a weight that is the probability of being sampled multiplied by a factor . (A, B) Computed clusters at , and , respectively. In (C), is plotted against for the in the legend. (D–F) Performance for overlapping clusters. In (D), vs. is shown for and different ratios . Here, is the distance between the centers of two clusters with and points, and we assign to each point a weight that is equal to its sampling probability in Equation 10. (E, F) Computed clusters at , and , respectively. The shadowed regions in (C, D) represent the standard deviations of over 30 iterations. (G–J) Performance over non-centroidal clusters of 500 points each generated from (
In Figure 5C, we show the variation of homogeneity score with respect to the ratio . We employ the values of reported in the legend, associated with different colors in the plot. As can be seen, the clustering performance degrades as both and increase. When these parameters are small, the typical distance between cluster points is much smaller than that between noise points, and approaches unity. With a higher chance of labeling noise points as within the cluster, however, is lowered. As such, the degradation of is proportional to the probability of a noise point being in the cluster region, which increases with both and .
3.3 Overlap
Here, we consider the case of two circular clusters with and points respectively, each sampled from the multivariate Gaussian distribution in Equation 10 and with . The weight profile is determined by for coordinates . The centers and (two instances of ) are chosen to be and , respectively, such that the distance between the cluster centers is .
In Figure 5D, we study the variation of homogeneity score as a function of for several values of . The computed clusters for at and at are shown in panels (E) and (F), respectively, to showcase the typical scenarios considered here.
For all , is zero for low (high overlap). There is then a region where increases with and then saturates at unity for high (little to no overlap). When the two clusters are too close, i.e., , they are in fact indistinguishable and qCLUE labels all points together. Increasing the ratio makes the clusters move away from each other and thus qCLUE can discern them. This behavior can be observed in Figures 5E, F. Importantly, large values of are already attained when the clusters still have a significant overlap. In this scenario, employing the weight labels and the weight density considerably contributes to accurate cluster assignment. In fact, the nearest higher points are more likely to connect the points near or on the decision boundary with the more energetic core, thus separating the clusters better.
The performance of qCLUE is also affected by the ratio . When one cluster contains more points than the other, it is more likely to “capture” points from the smaller one. The resulting loss in homogeneity score for low ratios is evident from Figure 5D, where it can be seen that clusters of similar sizes are better distinguished from each other.
3.4 Non-centroidal clusters
Finally, we study the performance of qCLUE on non-centroidal clusters. For this purpose, we use the Moons and Circles datasets in Figures 5G–J, generated using (
In the latter case, we assign the highest value of the weight for each cluster to a single point and lower the weights of all other points proportionally to their coordinate. In the case of the moon dataset, for the upper moon (so the top point of the upper moon has the maximum weight in the cluster) and for the lower moon (so the bottom point has the highest weight in the cluster). For the circles, for the inner circle and for the outer one.
Since these datasets are noiseless and well separated, is always one and we employ to characterize the performance of qCLUE. In Figures 5G, H the weight profile is uniform, so several points satisfy the seed condition. Therefore, qCLUE groups each circle into several clusters, such that we obtain limited values for . On the contrary, cases with a weight profile assigned [Figures 5I, J] result in fewer seeds that are better recognized by qCLUE, and the completeness score is considerably enhanced.
4 Conclusion and outlook
We introduced qCLUE, a novel quantum clustering algorithm designed to address the computational challenges associated with high-dimensional datasets. qCLUE’s significance lies in its potential to efficiently cluster data by effectively leveraging quantum computing, mitigating the escalating computational complexity encountered by classical algorithms upon increasing dimensionality of datasets. The algorithm’s ability to navigate high-dimensional spaces is particularly promising on datasets with high point density, where local searches become too demanding for classical computers. Therefore, qCLUE will be beneficial in multiple scenarios, ranging from quantum-enhanced machine learning (
According to our numerical results, qCLUE works well and its performance is significantly enhanced when a weight profile is assigned. Specifically, we study qCLUE in noisy environments, on overlapping clusters, and on non-centroidal datasets that are commonly used to benchmark clustering algorithms (
As a first outlook, we identify the implementation of qCLUE on NISQ hardware (
Statements
Data availability statement
The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found below: https://zenodo.org/records/12655189.
Author contributions
DG: Writing–original draft, Writing–review and editing. LD: Writing–original draft, Writing–review and editing. AD: Writing–original draft, Writing–review and editing. WR: Writing–original draft, Writing–review and editing. FP: Writing–original draft, Writing–review and editing. MM: Writing–original draft, Writing–review and editing.
Funding
The author(s) declare that financial support was received for the research, authorship, and/or publication of this article. CERN Quantum Initiative. Wolfgang Gentner Programme of the German Federal Ministry of Education and Research (grant no. 13E18CHA). EPSRC quantum career development grant EP/W028301/1. NTT PHI Lab. Government of Canada through Innovation, Science and Economic Development Canada (ISED). Province of Ontario through the Ministry of Colleges and Universities.
Acknowledgments
We thank the CERN Quantum Initiative and Fabio Fracas for creating the fertile ground for starting this project, and Andrew J. Jena as well as Priyanka Mukhopadhyay for theoretical support. WR acknowledges the Wolfgang Gentner Programme of the German Federal Ministry of Education and Research (grant no. 13E18CHA). LD acknowledges the EPSRC quantum career development grant EP/W028301/1. DG and MM acknowledge the NTT PHI Lab for funding. Research at IQC is further supported by the Government of Canada through Innovation, Science and Economic Development Canada (ISED). Research at Perimeter Institute is supported in part by the Government of Canada through ISED and by the Province of Ontario through the Ministry of Colleges and Universities.
Conflict of interest
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
The author(s) declared that they were an editorial board member of Frontiers, at the time of submission. This had no impact on the peer review process and the final decision.
Publisher’s note
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
Supplementary material
The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/frqst.2024.1462004/full#supplementary-material
References
1
Aad, G., Abajyan, T., Abbott, B., Abdallah, J., Abdel Khalek, S., Abdelalim, A., et al. (2012). Observation of a new particle in the search for the Standard Model Higgs boson with the ATLAS detector at the LHC. Phys. Lett. B 716 (1), 1–29. doi: 10.1016/j.physletb.2012.08.020
2
Aaij, R., Abdelmotteleb, A., Abellan Beteta, C., Abudinén, F., Ackernley, T., Adeva, B., et al. (2023). First observation of a doubly charged tetraquark and its neutral partner. Phys. Rev. Lett. 131 (4), 041902. doi: 10.1103/PhysRevLett.131.041902
3
Aïmeur, E., Brassard, G., and Gambs, S. (2007). “Quantum clustering algorithms,” in Proceedings of the 24th International Conference on Machine Learning. ICML ’07, Corvallis, Oregon, June 20–24, 2007 (New York, NY: Association for Computing Machinery), 1–8. doi: 10.1145/1273496.1273497
4
Amaro, F. D., Antonietti, R., Baracchini, E., Benussi, L., Bianco, S., Borra, F., et al. (2023). Directional iDBSCAN to detect cosmic-ray tracks for the CYGNO experiment. Meas. Sci. Technol. 34 (12), 125024. doi: 10.1088/1361-6501/acf402
5
Arute, F., Arya, K., Babbush, R., Bacon, D., Bardin, J. C., Barends, R., et al. (2019). Quantum supremacy using a programmable superconducting processor. Nature 574, 505–510. doi: 10.1038/s41586-019-1666-5
6
Asur, S., Ucar, D., and Srinivasan, P. (2007). An ensemble framework for clustering protein–protein interaction networks. Bioinformatics 23 (13), i29–i40. doi: 10.1093/bioinformatics/btm212
7
Au, W.-H., Chan, K. C. C., Wong, A. K. C., and Wang, Y. (2005). Attribute clustering for grouping, selection, and classification of gene expression data. IEEE/ACM Trans. Comput. Biol. Bioinforma. 2 (2), 83–101. doi: 10.1109/TCBB.2005.17
8
Bernien, H., Schwartz, S., Keesling, A., Levine, H., Omran, A., Pichler, H., et al. (2017). Probing many-body dynamics on a 51-atom quantum simulator. Nature 551, 579–584. doi: 10.1038/nature24622
9
Brassard, G., Høyer, P., Mosca, M., and Tapp, A. (2002). Quantum amplitude amplification and estimation. Contemp. Math. 305, 53–74. doi: 10.1090/conm/305/05215
10
Brondolin, E. (2022). CLUE: a clustering algorithm for current and future experiments. Tech. Rep. doi: 10.1088/1742-6596/2438/1/012074
11
Caruso, G., Antonio Gattone, S., Fortuna, F., and Di Battista, T. (2018). “Cluster analysis as a decision-making tool: a methodological review,” in Decision economics: in the tradition of Herbert A. Simon’s heritage. Editors E. Bucciarelli, S.-H. Chen, and J. M. Corchado (Cham: Springer International Publishing), 48–55.
12
Celi, A., Vermersch, B., Viyuela, O., Pichler, H., Lukin, M. D., and Zoller, P. (2020). Emerging two-dimensional gauge theories in Rydberg configurable arrays. Phys. Rev. X 10 (2), 021057. doi: 10.1103/PhysRevX.10.021057
13
Chang, J., Wang, L., Meng, G., Xiang, S., and Pan, C. (2017). “Deep adaptive image clustering,” in Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy, October 22–29, 2017.
14
CMS Collaboration (2022). The TICL (v4) reconstruction at the CMS phase-2 high granularity calorimeter endcap.
15
CMS Collaboration (2023). Development of the CMS detector for the CERN LHC run.
16
CMS Collaboration (2024). Review of top quark mass measurements in CMS.
17
Coleman, G. B., and Andrews, H. C. (1979). Image segmentation by clustering. Proc. IEEE 67 (5), 773–785. doi: 10.1109/PROC.1979.11327
18
Córcoles, A. D., Magesan, E., Srinivasan, S. J., Cross, A. W., Steffen, M., Gambetta, J. M., et al. (2015). Demonstration of a quantum error detection code using a square lattice of four superconducting qubits. Nat. Commun. 6 (1), 6979. doi: 10.1038/ncomms7979
19
Dalitz, C., Ayyad, Y., Wilberg, J., Aymans, L., Bazin, D., and Mittig, W. (2019). Automatic trajectory recognition in Active Target Time Projection Chambers data by means of hierarchical clustering. Comput. Phys. Commun. 235, 159–168. doi: 10.1016/j.cpc.2018.09.010
20
Debnath, S., Linke, N. M., Figgatt, C., Landsman, K. A., Wright, K., and Monroe, C. (2016). Demonstration of a small programmable quantum computer with atomic qubits. Nature 536, 63–66. doi: 10.1038/nature18648
21
Didier, C., and Austin, B. (2017). The phase-2 upgrade of the CMS endcap calorimeter. CERN LHC Experiments Committee. doi: 10.17181/CERN.IV8M.1JY2
22
Duarte, M., Buffoni, L., and Omar, Y. (2023). Quantum density peak clustering. Quantum Mach. Intell. 5 (1), 9. doi: 10.1007/s42484-022-00090-0
23
Dutta, P., Saha, S., Pai, S., and Kumar, A. (2020). A protein interaction information-based generative model for enhancing gene clustering. Sci. Rep. 10 (1), 665. doi: 10.1038/s41598-020-57437-5
24
Fujita, K. (2021). Approximate spectral clustering using both reference vectors and topology of the network generated by growing neural gas. PeerJ Comput. Sci. 7, e679. doi: 10.7717/peerj-cs.679
25
Gaffey, M. J. (2010). Space weathering and the interpretation of asteroid reflectance spectra. Icarus 209 (2), 564–574. doi: 10.1016/j.icarus.2010.05.006
26
Galluccio, L., Michel, O., Bendjoya, P., Slezak, E., and Bailer-Jones, C. A. (2008). “Unsupervised clustering on astrophysics data: asteroids reflectance spectra surveys and hyperspectral images,” in Classification and discovery in large astronomical surveys. Editor C. A. L. Bailer-Jones (American Institute of Physics Conference Series), 1082, 165–171. doi: 10.1063/1.3059034
27
Gao, A., Rasmussen, B., Kulits, P., Scheller, E. L., Greenberger, R. N., and Ehlmann, B. L. (2021). “Generalized unsupervised clustering of hyperspectral images of geological targets in the near infrared,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, Nashville, TN, June 19–25, 2021, 4294–4303.
28
Giovannetti, V., Lloyd, S., and Maccone, L. (2008). Quantum random access memory. Phys. Rev. Lett. 100, 160501. doi: 10.1103/physrevlett.100.160501
29
Gong, C., Zhou, N., Xia, S., and Huang, S. (2024). Quantum particle swarm optimization algorithm based on diversity migration strategy. Future Gener. Comput. Syst. 157 (C), 445–458. doi: 10.1016/j.future.2024.04.008
30
Gong, L.-H., Ding, W., Li, Z., Wang, Y.-Z., and Zhou, N.-R. (2024a). Quantum K-nearest neighbor classification algorithm via a divide-and-conquer strategy. Adv. Quantum Technol. 7 (6), 2300221. doi: 10.1002/qute.202300221
31
Gong, L.-H., Pei, J.-J., Zhang, T.-F., and Zhou, N.-R. (2024b). Quantum convolutional neural network based on variational quantum circuits. Opt. Commun. 550, 129993. doi: 10.1016/j.optcom.2023.129993
32
Gong, L.-H., Xiang, L.-Z., Liu, S.-H., and Zhou, N.-R. (2022). Born machine model based on matrix product state quantum circuit. Phys. A Stat. Mech. its Appl. 593, 126907. doi: 10.1016/j.physa.2022.126907
33
Gopalakrishnan, D., Dellantonio, L., Di Pilato, A., Redjeb, W., Pantaleo, F., and Mosca, M. (2024). QLUE-algo/qlue: frontiers-paper. Version frontiers-paper. doi: 10.5281/zenodo.12655189
34
Gu, Z., and Hübschmann, D. (2022). SimplifyEnrichment: a bioconductor package for clustering and visualizing functional enrichment results. Genomics, Proteomics Bioinforma. 21 (1), 190–202. doi: 10.1016/j.gpb.2022.04.008
35
Haug, T., Self, C. N., and Kim, M. S. (2023). Quantum machine learning of large datasets using randomized measurements. Mach. Learn. Sci. Technol. 4 (1), 015005. doi: 10.1088/2632-2153/acb0b4
36
Hayrapetyan, A., Tumasyan, A., Adam, W., Andrejkovic, J. W., Bergauer, T., Chatterjee, S., et al. (2024). Search for new physics with emerging jets in proton-proton collisions at √s = 13 TeV. JHEP 07, 142. doi: 10.1007/JHEP07(2024)142
37
Hayrapetyan, A., Tumasyan, A., Adam, W., Andrejkovic, J., Bergauer, T., Chatterjee, S., et al. (2023). Observation of four top quark production in proton-proton collisions at √s = 13 TeV. Phys. Lett. B 847, 138290. doi: 10.1016/j.physletb.2023.138290
38
Huang, J.-J., Tzeng, G.-H., and Ong, C.-S. (2007). Marketing segmentation using support vector clustering. Expert Syst. Appl. 32 (2), 313–317. doi: 10.1016/j.eswa.2005.11.028
39
Jekaterina, J. (2023). A new trackster linking algorithm based on graph neural networks for the CMS experiment at the Large Hadron Collider at CERN. Present. 14 Jul 2023. Prague, Tech. U.
40
Karim, M. R., Beyan, O., Zappa, A., Costa, I. G., Rebholz-Schuhmann, D., Cochez, M., et al. (2020). Deep learning-based clustering approaches for bioinformatics. Briefings Bioinforma. 22 (1), 393–415. doi: 10.1093/bib/bbz170
41
Kerenidis, I., and Landman, J. (2021). Quantum spectral clustering. Phys. Rev. A 103 (4), 042415. doi: 10.1103/PhysRevA.103.042415
42
Kerenidis, I., Landman, J., Luongo, A., and Prakash, A. (2019). “q-means: a quantum algorithm for unsupervised machine learning,” in Advances in neural information processing systems. Editors H. Wallach, et al. (Red Hook, New York: Curran Associates, Inc.), 32.
43
Kishore Kumar, R., Birla, L., and Sreenivasa Rao, K. (2018). A robust unsupervised pattern discovery and clustering of speech signals. Pattern Recognit. Lett. 116, 254–261. doi: 10.1016/j.patrec.2018.10.035
44
Labuhn, H., Barredo, D., Ravets, S., de Léséleuc, S., Macrì, T., Lahaye, T., et al. (2016). Tunable two-dimensional arrays of single Rydberg atoms for realizing quantum Ising models. Nature 534 (7609), 667–670. doi: 10.1038/nature18274
45
Lanyon, B. P., Hempel, C., Nigg, D., Müller, M., Gerritsma, R., Zähringer, F., et al. (2011). Universal digital quantum simulation with trapped ions. Science 334, 57–61. doi: 10.1126/science.1208001
46
Grover, L. K. (1996). “A fast quantum mechanical algorithm for database search,” in Proceedings of the Twenty-Eighth Annual ACM Symposium on Theory of Computing. STOC ’96, Philadelphia, Pennsylvania, USA, May 22–24, 1996 (New York, NY: Association for Computing Machinery), 212–219. doi: 10.1145/237814.237866
47
Magano, D., Kumar, A., Kālis, M., Locāns, A., Glos, A., Pratapsi, S., et al. (2022). Quantum speedup for track reconstruction in particle accelerators. Phys. Rev. D 105 (7), 076012. doi: 10.1103/PhysRevD.105.076012
48
Nash, B., Gheorghiu, V., and Mosca, M. (2020). Quantum circuit optimizations for NISQ architectures. Quantum Sci. Technol. 5 (2), 025010. doi: 10.1088/2058-9565/ab79b1
49
Ng, H. P., Ong, S. H., Foong, K. W. C., Goh, P. S., and Nowinski, W. L. (2006). “Medical image segmentation using K-means clustering and improved watershed algorithm,” in 2006 IEEE Southwest Symposium on Image Analysis and Interpretation, Denver, CO, March 26–28, 2006, 61–65. doi: 10.1109/SSIAI.2006.1633722
50
Nicotra, D., Lucio Martinez, M., de Vries, J., Merk, M., Driessens, K., Westra, R., et al. (2023). A quantum algorithm for track reconstruction in the LHCb vertex detector. J. Instrum. 18, P11028. doi: 10.1088/1748-0221/18/11/p11028
51
Nielsen, M. A., and Chuang, I. L. (2010). Quantum computation and quantum information. 10th Anniversary Edition. Cambridge, England: Cambridge University Press.
52
Oyelade, J., Isewon, I., Oladipupo, F., Aromolaran, O., Uwoghiren, E., Ameh, F., et al. (2016). Clustering algorithms: their application to gene expression data. Bioinform. Biol. Insights 10, BBI.S38316. doi: 10.4137/BBI.S38316
53
Oyelade, J., Isewon, I., Oladipupo, O., Emebo, O., Omogbadegun, Z., Aromolaran, O., et al. (2019). “Data clustering: algorithms and its applications,” in 2019 19th International Conference on Computational Science and Its Applications (ICCSA), St. Petersburg, Russia, July 01–04, 2019, 71–81. doi: 10.1109/ICCSA.2019.000-1
54
Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., et al. (2018). Scikit-learn: machine learning in Python. Research Gate.
55
Pires, D., Bargassa, P., Seixas, J., and Omar, Y. (2021). A digital quantum algorithm for jet clustering in high-energy physics. Research Gate. doi: 10.48550/arXiv.2101.05618
56
Punj, G., and Stewart, D. W. (1983). Cluster analysis in marketing research: review and suggestions for application. J. Mark. Res. 20 (2), 134–148. doi: 10.1177/002224378302000204
57
Qaqish, B. F., O’Brien, J. J., Hibbard, J. C., and Clowers, K. J. (2017). Accelerating high-dimensional clustering with lossless data reduction. Bioinformatics 33 (18), 2867–2872. doi: 10.1093/bioinformatics/btx328
58
Rodenko, S. A., Mayorov, A. G., Malakhov, V. V., and Troitskaya, I. K., on behalf of PAMELA collaboration (2019). Track reconstruction of antiprotons and antideuterons in the coordinate-sensitive calorimeter of PAMELA spectrometer using the Hough transform. J. Phys. Conf. Ser. 1189 (1), 012009. doi: 10.1088/1742-6596/1189/1/012009
59
Rosenberg, A., and Hirschberg, J. (2007). “V-measure: a conditional entropy-based external cluster evaluation measure,” in Proceedings of the 2007 joint conference on empirical methods in natural language processing and computational natural language learning (EMNLP-CoNLL), Prague, Czech Republic, June 28–30, 2007, 410–420.
60
Rovere, M., Chen, Z., Di Pilato, A., Pantaleo, F., and Seez, C. (2020). CLUE: a fast parallel clustering algorithm for high granularity calorimeters in high-energy physics. Front. Big Data 3, 591315. doi: 10.3389/fdata.2020.591315
61
Schickel-Zuber, V., and Faltings, B. (2007). “Using hierarchical clustering for learning the ontologies used in recommendation systems,” in Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. KDD ’07, San Jose, California, USA, August 12–15, 2007 (New York, NY: Association for Computing Machinery), 599–608. doi: 10.1145/1281192.1281257
62
Seidel, R., Tcholtchev, N., Bock, S., Kai-Uwe Becker, C., and Hauswirth, M. (2021). Efficient floating point arithmetic for quantum computers. Research Gate.
63
Shepitsen, A., Gemmell, J., Mobasher, B., and Burke, R. (2008). “Personalized recommendation in social tagging systems using hierarchical clustering,” in Proceedings of the 2008 ACM Conference on Recommender Systems. RecSys ’08, Lausanne, Switzerland, October 23–25, 2008 (New York, NY: Association for Computing Machinery), 259–266. doi: 10.1145/1454008.1454048
64
Sinayskiy, I., Schuld, M., and Petruccione, F. (2015). An introduction to quantum machine learning. Contemp. Phys. 56 (2), 172–185. doi: 10.1080/00107514.2014.964942
65
Tiwari, P., Dehdashti, S., Karim Obeid, A., Melucci, M., and Bruza, P. (2020). Kernel method based on non-linear coherent state. Quantum Physics. doi: 10.48550/arXiv.2007.07887
66
Tumasyan, A., Adam, W., Andrejkovic, J., Bergauer, T., Chatterjee, S., Damanakis, K., et al. (2023). Measurement of the decay properties and search for the decay in proton-proton collisions at √s = 13 TeV. Phys. Lett. B 842, 137955. doi: 10.1016/j.physletb.2023.137955
67
Tüysüz, C., Carminati, F., Demirköz, B., Dobos, D., Fracas, F., Novotny, K., et al. (2020). “Particle track reconstruction with quantum algorithms,” in The European Physical Journal Conferences. Editor C. Doglioni, 09013.
68
Tüysüz, C., Rieger, C., Novotny, K., Demirköz, B., Dobos, D., Potamianos, K., et al. (2021). Hybrid quantum classical graph neural networks for particle track reconstruction. Quantum Mach. Intell. 3 (2), 29. doi: 10.1007/s42484-021-00055-9
69
Wang, J., Li, M., Deng, Y., and Pan, Y. (2010). Recent advances in clustering methods for protein interaction networks. BMC Genomics 11 (3), S10. doi: 10.1186/1471-2164-11-S3-S10
70
Wei, A. Y., Naik, P., Harrow, A. W., and Thaler, J. (2020). Quantum algorithms for jet clustering. Phys. Rev. D 101 (9), 094015. doi: 10.1103/PhysRevD.101.094015
71
Wu, J., Chen, X.-Y., Zhang, H., Xiong, L.-D., Lei, H., and Deng, S.-H. (2019). Hyperparameter optimization for machine learning models based on Bayesian optimization. J. Electron. Sci. Technol. 17 (1), 26–40. doi: 10.11989/JEST.1674-862X.80904120
72
Wu, T., Liu, X., Qin, J., and Herrera, F. (2021). Balance dynamic clustering analysis and consensus reaching process with consensus evolution networks in large-scale group decision making. IEEE Trans. Fuzzy Syst. 29 (2), 357–371. doi: 10.1109/TFUZZ.2019.2953602
73
Wu, X., Yan, J., Liu, N., Yan, S., Chen, Y., and Chen, Z. (2009). “Probabilistic latent semantic user segmentation for behavioral targeted advertising,” in Proceedings of the Third International Workshop on Data Mining and Audience Intelligence for Advertising. ADKDD ’09, Paris, France, June 28, 2009 (New York, NY: Association for Computing Machinery), 10–17. doi: 10.1145/1592748.1592751
74
Zeguendry, A., Jarir, Z., and Mohamed, Q. (2023). Quantum machine learning: a review and case studies. Entropy (Basel) 25, 287. doi: 10.3390/e25020287
75
Zhou, J., Zhai, L., and Pantelous, A. A. (2020). Market segmentation using high-dimensional sparse consumers data. Expert Syst. Appl. 145, 113136. doi: 10.1016/j.eswa.2019.113136
76
Zhou, N.-R., Liu, X.-X., Chen, Y.-L., and Du, N.-S. (2021). Quantum K-Nearest-Neighbor image classification algorithm based on K-L transform. Int. J. Theor. Phys. 60 (3), 1209–1224. doi: 10.1007/s10773-021-04747-7
77
Zlokapa, A., Anand, A., Vlimant, J. R., Duarte, J. M., Job, J., Lidar, D., et al. (2021). Charged particle tracking with quantum annealing optimization. Quantum Mach. Intell. 3 (2), 27. doi: 10.1007/s42484-021-00054-w
Summary
Keywords
clustering, cern, high energy physics (HEP), quantum, machine learning and artificial intelligence, quantum computation (QC)
Citation
Gopalakrishnan D, Dellantonio L, Di Pilato A, Redjeb W, Pantaleo F and Mosca M (2024) qCLUE: a quantum clustering algorithm for multi-dimensional datasets. Front. Quantum Sci. Technol. 3:1462004. doi: 10.3389/frqst.2024.1462004
Received
09 July 2024
Accepted
19 September 2024
Published
11 October 2024
Volume
3 - 2024
Edited by
Fedor Jelezko, University of Ulm, Germany
Reviewed by
Prasanta Panigrahi, Indian Institute of Science Education and Research Kolkata, India
Nanrun Zhou, Shanghai University of Engineering Sciences, China
Copyright
© 2024 Gopalakrishnan, Dellantonio, Di Pilato, Redjeb, Pantaleo and Mosca.
This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Dhruv Gopalakrishnan, dhruv.gopalakrishnan@gmail.com; Luca Dellantonio, l.dellantonio@exeter.ac.uk; Antonio Di Pilato, tony.dipilato@cern.ch; Wahid Redjeb, wahid.redjeb@cern.ch; Felice Pantaleo, felice.pantaleo@cern.ch; Michele Mosca, michele.mosca@uwaterloo.ca