<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="research-article">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Neurorobot.</journal-id>
<journal-title>Frontiers in Neurorobotics</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Neurorobot.</abbrev-journal-title>
<issn pub-type="epub">1662-5218</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.3389/fnbot.2021.680613</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Neuroscience</subject>
<subj-group>
<subject>Original Research</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>A Clustering Algorithm for Multi-Modal Heterogeneous Big Data With Abnormal Data</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name><surname>Yan</surname> <given-names>An</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
</contrib>
<contrib contrib-type="author">
<name><surname>Wang</surname> <given-names>Wei</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
<xref ref-type="aff" rid="aff2"><sup>2</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/1193456/overview"/>
</contrib>
<contrib contrib-type="author">
<name><surname>Ren</surname> <given-names>Yi</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
</contrib>
<contrib contrib-type="author" corresp="yes">
<name><surname>Geng</surname> <given-names>HongWei</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
<xref ref-type="corresp" rid="c001"><sup>&#x0002A;</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/708776/overview"/>
</contrib>
</contrib-group>
<aff id="aff1"><sup>1</sup><institution>Xinjiang Agricultural University</institution>, <addr-line>&#x000DC;r&#x000FC;mqi</addr-line>, <country>China</country></aff>
<aff id="aff2"><sup>2</sup><institution>Anyang Institute of Technology</institution>, <addr-line>Anyang</addr-line>, <country>China</country></aff>
<author-notes>
<fn fn-type="edited-by"><p>Edited by: Peng Li, Dalian University of Technology, China</p></fn>
<fn fn-type="edited-by"><p>Reviewed by: Haifeng Li, Dalian University of Technology, China; Daohua Liu, Xinyang Normal University, China</p></fn>
<corresp id="c001">&#x0002A;Correspondence: HongWei Geng <email>hw-geng&#x00040;163.com</email></corresp>
</author-notes>
<pub-date pub-type="epub">
<day>14</day>
<month>06</month>
<year>2021</year>
</pub-date>
<pub-date pub-type="collection">
<year>2021</year>
</pub-date>
<volume>15</volume>
<elocation-id>680613</elocation-id>
<history>
<date date-type="received">
<day>15</day>
<month>03</month>
<year>2021</year>
</date>
<date date-type="accepted">
<day>27</day>
<month>04</month>
<year>2021</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#x000A9; 2021 Yan, Wang, Ren and Geng.</copyright-statement>
<copyright-year>2021</copyright-year>
<copyright-holder>Yan, Wang, Ren and Geng</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/"><p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.</p></license> </permissions>
<abstract><p>Abnormal and missing data hinder traditional clustering of multi-modal heterogeneous big data. To address this issue, this paper establishes a multi-view heterogeneous big data clustering algorithm based on improved K-means clustering. First, for big data involving heterogeneous data, we propose an improved K-means algorithm built on a multi-view heterogeneous system, using multi-view data analysis to determine the similarity detection metrics. Then, a BP neural network is used to predict missing attribute values, complete the missing data, and restore the structure of the heterogeneous big data. Finally, we propose a data denoising algorithm to denoise the abnormal data. Based on the above methods, we construct a framework, named BPK-means, to resolve the problems of abnormal and missing data. Our approach is evaluated through a rigorous performance study. Both theoretical analysis and experimental results show that the proposed method greatly improves accuracy over the original algorithm.</p></abstract>
<kwd-group>
<kwd>multi-view</kwd>
<kwd>BP neural network</kwd>
<kwd>missing attributes</kwd>
<kwd>Kmeans</kwd>
<kwd>noise reduction processing</kwd>
<kwd>data integrity</kwd>
</kwd-group>
<counts>
<fig-count count="9"/>
<table-count count="4"/>
<equation-count count="18"/>
<ref-count count="27"/>
<page-count count="9"/>
<word-count count="5508"/>
</counts>
</article-meta>
</front>
<body>
<sec sec-type="intro" id="s1">
<title>Introduction</title>
<p>As the carrier of information, data must accurately and reliably reflect objective things in the real world (Murtagh and Pierre, <xref ref-type="bibr" rid="B15">2014</xref>; Brzezi&#x00144;ska and Hory&#x00144;, <xref ref-type="bibr" rid="B3">2020</xref>). Extracting effective information from large data sets requires not only effective data analysis techniques but also good data quality, which is the basic condition for all data mining (Adnan et al., <xref ref-type="bibr" rid="B1">2020</xref>). In the era of big data, data quality is a key issue that restricts the development of the data industry (Zeng et al., <xref ref-type="bibr" rid="B26">2021</xref>). Therefore, effectively ensuring the integrity and accuracy of data, and thereby improving data quality, has become an urgent problem. As data collection and data representation methods become increasingly diversified, it has become more convenient to obtain large amounts of multi-source heterogeneous data (Rashidi et al., <xref ref-type="bibr" rid="B16">2020</xref>; Wu et al., <xref ref-type="bibr" rid="B22">2020</xref>). The emergence of multi-source heterogeneous data, and the need to mine the information inherent in such data, naturally gave rise to model learning for multi-source heterogeneous data. Currently, multi-source heterogeneous data takes two main forms: multi-modal data and multi-view data. Multi-view data refers to data obtained by describing the same thing in different ways or from different angles (Kaur et al., <xref ref-type="bibr" rid="B9">2019</xref>). The meaning of multi-view includes multi-modality, so multi-view data can express a wider range of practical problems (Ma X. et al., <xref ref-type="bibr" rid="B14">2021</xref>). At present, data labels are usually difficult to obtain, and the manifestations of heterogeneous data differ greatly. In addition, the noise and outliers contained in the original data place higher requirements on the robustness of algorithms (Ma et al., <xref ref-type="bibr" rid="B13">2020</xref>; Yang et al., <xref ref-type="bibr" rid="B24">2020</xref>). In particular, multi-source heterogeneous data often contain more noise and outliers, which greatly affects algorithm performance in practical applications. Therefore, unsupervised learning for multi-source heterogeneous data has important theoretical research value and broad application scenarios (Li et al., <xref ref-type="bibr" rid="B10">2018</xref>).</p>
<p>In multi-view data, the information contained in different views usually complements each other (Sang, <xref ref-type="bibr" rid="B18">2020</xref>). Fully mining the data of each view to obtain more comprehensive information is the main goal of multi-source heterogeneous data learning. The earliest multi-source heterogeneous data learning model can be traced back to the two-source data learning model based on canonical correlation analysis (Ruan et al., <xref ref-type="bibr" rid="B17">2020</xref>), which mines the consistent structural information of the data on the basis of the correlation between the two sources. In addition, Bickel and Scheffer proposed a k-means-based multi-view clustering algorithm and used it to analyze data with two conditionally independent views for text clustering; the model proposed by Bickel and Scheffer (<xref ref-type="bibr" rid="B2">2004</xref>) is regarded as the first work to study multi-view clustering. De Sa (<xref ref-type="bibr" rid="B5">2005</xref>) proposed a simple and effective spectral clustering algorithm and used it to process web page data containing two views. This method first uses a similarity matrix to fuse the feature information of the two views, and then applies the classical spectral clustering algorithm to obtain the final clustering result (Zhou et al., <xref ref-type="bibr" rid="B27">2020</xref>).</p>
<p>Self-organizing map (SOM) is an algorithm that uses artificial neural networks for clustering (Ma J. et al., <xref ref-type="bibr" rid="B12">2021</xref>). This method processes all sample points one by one and maps the cluster centers to a two-dimensional space for visualization. Yu et al. (<xref ref-type="bibr" rid="B25">2015</xref>) proposed an intuitionistic fuzzy kernel clustering algorithm based on particle swarm optimization. Wu and Huang (<xref ref-type="bibr" rid="B23">2015</xref>) proposed a new DP-DBSCAN clustering algorithm based on differential privacy protection, which implements a differential protection mechanism. Deep learning is another leap-forward development of artificial neural networks (Ventura et al., <xref ref-type="bibr" rid="B20">2019</xref>), and clustering algorithms based on deep learning have also become a hot research topic. The above neural-network-based K-means algorithms improve the clustering effect in the optimization process once the cluster centers are given, but they do not provide a method for handling missing numerical attributes and abnormal data. At the same time, structured data have many types and attributes, and data collection is becoming increasingly complex, resulting in more and more missing and abnormal data. However, missing data are not worthless; often the value lies precisely in the missing data.</p>
<p>Aiming at the above problems, this paper proposes a BPK-means algorithm that improves on the BP neural network. The algorithm first predicts missing attribute values with a BP neural network, which greatly improves the integrity and reliability of the data; it then denoises the abnormal data, and finally clusters the same data through different views to study the relevance of different attributes to clustering.</p>
<p>The rest of the paper is organized as follows: Section Preliminaries discusses the basic algorithms. Section Problem Formalization formally defines the BPK-means problem. Section Optimization of Kmeans Clustering Algorithm proposes an optimization of the Kmeans clustering algorithm, namely the BPK-means algorithm. Finally, we analyze the proposed algorithm through experimental results in Section Experiment and Result Analysis.</p>
</sec>
<sec id="s2">
<title>Preliminaries</title>
<p>This part presents the basics of the Kmeans algorithm and the BP neural network, both of which are studied in this paper.</p>
<sec>
<title>Kmeans Algorithm</title>
<p>The traditional Kmeans algorithm is an unsupervised learning algorithm that clusters an unlabeled data set. Its main idea is as follows: first, <italic>K</italic> initial cluster centers are randomly selected, each representing a data cluster. Then each data point is assigned to the nearest cluster. Finally, the cluster centers are recalculated, and the process repeats until the clustering criterion function converges. The convergence function is defined as follows:</p>
<disp-formula id="E1"><label>(1)</label><mml:math id="M1"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:mi>E</mml:mi><mml:mo>=</mml:mo><mml:mstyle displaystyle="true"><mml:munderover accentunder="false" accent="false"><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:munderover></mml:mstyle><mml:msubsup><mml:mrow><mml:mstyle displaystyle="true"><mml:munder class="msub"><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>x</mml:mi><mml:mo>&#x02208;</mml:mo><mml:msub><mml:mrow><mml:mi>C</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:munder></mml:mstyle><mml:mo stretchy="false">&#x02016;</mml:mo><mml:mi>x</mml:mi><mml:mo>-</mml:mo><mml:msub><mml:mrow><mml:mi>u</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo stretchy="false">&#x02016;</mml:mo></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msubsup></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>where <italic>E</italic> is the minimum squared error of the clusters <italic>C</italic> &#x0003D; {<italic>C</italic><sub>1</sub>, <italic>C</italic><sub>2</sub>, &#x02026;, <italic>C</italic><sub><italic>k</italic></sub>} obtained by the K-means algorithm on <italic>D</italic> &#x0003D; {<italic>x</italic><sub>1</sub>, <italic>x</italic><sub>2</sub>, &#x02026;, <italic>x</italic><sub><italic>k</italic></sub>}, and <italic>u</italic><sub><italic>i</italic></sub> is:</p>
<disp-formula id="E2"><label>(2)</label><mml:math id="M2"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:msub><mml:mrow><mml:mi>u</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mo>|</mml:mo><mml:msub><mml:mrow><mml:mi>C</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>|</mml:mo></mml:mrow></mml:mfrac><mml:msub><mml:mrow><mml:mstyle displaystyle="true"><mml:mo>&#x02211;</mml:mo></mml:mstyle></mml:mrow><mml:mrow><mml:mi>x</mml:mi><mml:mo>&#x02208;</mml:mo><mml:msub><mml:mrow><mml:mi>C</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:msub><mml:mi>x</mml:mi></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>where <italic>u</italic><sub><italic>i</italic></sub> is the mean vector of cluster <italic>C</italic><sub><italic>i</italic></sub>. Intuitively, the above formula describes how closely the samples in a cluster gather around the cluster mean vector: the smaller <italic>E</italic> is, the higher the similarity of the samples within the cluster.</p>
<p><bold>Definition 1</bold> Euclidean distance is the straight-line distance between two points in Euclidean space. The Euclidean distance between samples <italic>x</italic><sub><italic>i</italic></sub> and <italic>x</italic><sub><italic>j</italic></sub> in m-dimensional space is as follows:</p>
<disp-formula id="E3"><label>(3)</label><mml:math id="M3"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:mi>d</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:msqrt><mml:mrow><mml:mstyle displaystyle="true"><mml:munderover accentunder="false" accent="false"><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>k</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>m</mml:mi></mml:mrow></mml:munderover></mml:mstyle><mml:msup><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mo>-</mml:mo><mml:msub><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mi>j</mml:mi><mml:mi>k</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:msqrt></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>The original K-means algorithm determines the similarity of samples according to Euclidean distance.</p>
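As a concrete illustration of the procedure described above, the following minimal sketch (an illustrative assumption of this edit, not the authors' implementation) runs plain K-means with Euclidean-distance assignment, mean-vector updates, and the squared-error objective of Equations (1)&#x02013;(3):

```python
import numpy as np

def kmeans(X, k, iters=100, seed=0):
    """Plain K-means: Euclidean assignment (Eq. 3), mean-vector update (Eq. 2)."""
    rng = np.random.default_rng(seed)
    # Randomly pick k initial cluster centers from the data set.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # Distance from every sample to every center (Eq. 3).
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)  # assign each point to its nearest cluster
        # Recompute each center as the mean vector of its cluster (Eq. 2).
        new_centers = np.array([X[labels == i].mean(axis=0)
                                if np.any(labels == i) else centers[i]
                                for i in range(k)])
        if np.allclose(new_centers, centers):  # criterion function converged
            break
        centers = new_centers
    # Squared-error objective E over all clusters (Eq. 1).
    E = sum(np.sum((X[labels == i] - centers[i]) ** 2) for i in range(k))
    return labels, centers, E
```

For two well-separated groups of one-dimensional points, the routine recovers the two clusters and the value of <italic>E</italic> matches a direct evaluation of Equation (1).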
</sec>
<sec>
<title>BP Neural Networks</title>
<p>The more layers a neural network has, the stronger its learning ability. In general, a multi-layer network learns better than a single-layer network, but it requires a stronger learning algorithm. The BP neural network algorithm is one such algorithm and is widely used.</p>
<p>The BP neural network algorithm is a multi-layer feedforward network algorithm. First, the difference between the network's output values and the expected values is calculated. Then the partial derivatives of this difference are obtained by differentiation, and feedback processing is carried out in the direction opposite to the signal transmission of the system.</p>
<p>The basic idea of the BP neural network learning algorithm is as follows (Hosseini and Azar, <xref ref-type="bibr" rid="B6">2016</xref>; Li et al., <xref ref-type="bibr" rid="B11">2017</xref>; Kanaan-Izquierdo et al., <xref ref-type="bibr" rid="B7">2018</xref>; Wu et al., <xref ref-type="bibr" rid="B21">2019</xref>): first, the selected samples are fed into the neural network. The results are then processed in the hidden layer, and each layer's output serves as the input of the next layer, yielding the error between the output-layer results and the expected values. Finally, the connection weights between interconnected neurons are adjusted in the direction that minimizes the error surface, and this process is repeated until the output error of the whole network reaches the required accuracy.</p>
<p>The learning rule of the BP neural network adopts the steepest descent method. Through back propagation, the weights and thresholds of the network are adjusted continuously to minimize the output error. The topological structure of a BP neural network includes an input layer, hidden layers, and an output layer. The BP neuron model is shown in <xref ref-type="fig" rid="F1">Figure 1</xref>.</p>
<fig id="F1" position="float">
<label>Figure 1</label>
<caption><p>Schematic diagram of BP neuron model.</p></caption>
<graphic xlink:href="fnbot-15-680613-g0001.tif"/>
</fig>
<p>Let the input signal of a BP neuron be <italic>p</italic>, the weight and threshold be <italic>w</italic> and <italic>b</italic>, respectively, and the processing result be <italic>a</italic>. Commonly used transfer functions are <italic>logsig</italic> and <italic>tansig</italic>. The formula of the <italic>logsig</italic> function is as follows:</p>
<disp-formula id="E4"><label>(4)</label><mml:math id="M4"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:mi>a</mml:mi><mml:mo>=</mml:mo><mml:mtext>log&#x000A0;</mml:mtext><mml:mi>s</mml:mi><mml:mi>i</mml:mi><mml:mi>g</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>w</mml:mi><mml:mi>p</mml:mi><mml:mo>&#x0002B;</mml:mo><mml:mi>b</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
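In its usual form the <italic>logsig</italic> transfer function is the logistic sigmoid 1/(1 + e<sup>&#x02212;x</sup>), applied to the weighted input <italic>wp</italic> + <italic>b</italic> of the neuron. A minimal sketch (our illustration, with hypothetical function names):

```python
import math

def logsig(x):
    """Logistic sigmoid transfer function: 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

def neuron_output(p, w, b):
    """Output of a single BP neuron, a = logsig(w*p + b)."""
    return logsig(w * p + b)
```

The function is monotonically increasing, maps any real input to (0, 1), and is symmetric about logsig(0) = 0.5.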
<p>The structure of the BP neural network is shown in <xref ref-type="fig" rid="F2">Figure 2</xref>, including the input, hidden, and output layers.</p>
<fig id="F2" position="float">
<label>Figure 2</label>
<caption><p>Schematic diagram of BP neural network structure.</p></caption>
<graphic xlink:href="fnbot-15-680613-g0002.tif"/>
</fig>
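The steepest-descent learning rule described above can be sketched as a one-hidden-layer network trained by back propagation on the squared-error loss. This is a minimal illustrative example (an assumption of this edit, not the authors' code), using the <italic>logsig</italic> transfer function throughout:

```python
import numpy as np

def logsig(x):
    """Logistic sigmoid transfer function."""
    return 1.0 / (1.0 + np.exp(-x))

def train_bp(X, T, hidden=4, lr=0.5, epochs=5000, seed=0):
    """One-hidden-layer BP network trained by steepest descent
    on the error E = sum((t_i - o_i)^2) / 2."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0, 1, (X.shape[1], hidden)); b1 = np.zeros(hidden)
    W2 = rng.normal(0, 1, (hidden, T.shape[1])); b2 = np.zeros(T.shape[1])
    for _ in range(epochs):
        # Forward pass: each layer's output feeds the next layer.
        H = logsig(X @ W1 + b1)
        O = logsig(H @ W2 + b2)
        # Backward pass: propagate the output error toward the input layer.
        dO = (O - T) * O * (1 - O)
        dH = (dO @ W2.T) * H * (1 - H)
        # Steepest-descent updates of weights and thresholds.
        W2 -= lr * H.T @ dO; b2 -= lr * dO.sum(axis=0)
        W1 -= lr * X.T @ dH; b1 -= lr * dH.sum(axis=0)
    return W1, b1, W2, b2

def predict(X, W1, b1, W2, b2):
    return logsig(logsig(X @ W1 + b1) @ W2 + b2)
```

Trained on a small Boolean function such as AND, the network's outputs fall on the correct side of 0.5 for all four input patterns.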
</sec>
</sec>
<sec id="s3">
<title>Problem Formalization</title>
<p>In this part, we introduce the system model and then formalize the clustering problem. To improve readability, we summarize the main notations used throughout this paper in <xref ref-type="table" rid="T1">Table 1</xref>.</p>
<table-wrap position="float" id="T1">
<label>Table 1</label>
<caption><p>The main notations.</p></caption>
<table border="all">
<thead><tr>
<th valign="top" align="left"><bold>Symbols</bold></th>
<th valign="top" align="left"><bold>Definitions</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left"><inline-formula><mml:math id="M5"><mml:mi>E</mml:mi><mml:mo>=</mml:mo><mml:mstyle displaystyle="true"><mml:munderover accentunder="false" accent="false"><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:munderover></mml:mstyle><mml:msubsup><mml:mrow><mml:mstyle displaystyle="true"><mml:munder><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>x</mml:mi><mml:mo>&#x02208;</mml:mo><mml:msub><mml:mrow><mml:mi>C</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:munder></mml:mstyle><mml:mo stretchy="false">&#x02016;</mml:mo><mml:mi>x</mml:mi><mml:mo>-</mml:mo><mml:msub><mml:mrow><mml:mi>u</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo stretchy="false">&#x02016;</mml:mo></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msubsup></mml:math></inline-formula></td>
<td valign="top" align="left">Convergence function</td>
</tr>
<tr>
<td valign="top" align="left"><italic>D</italic> &#x0003D; {<italic>x</italic><sub>1</sub>, <italic>x</italic><sub>2</sub>, &#x02026;, <italic>x</italic><sub><italic>k</italic></sub>}</td>
<td valign="top" align="left">Original data samples</td>
</tr>
<tr>
<td valign="top" align="left"><italic>C</italic> &#x0003D; {<italic>C</italic><sub>1</sub>, <italic>C</italic><sub>2</sub>, &#x02026;, <italic>C</italic><sub><italic>k</italic></sub>}</td>
<td valign="top" align="left">Clusters</td>
</tr>
<tr>
<td valign="top" align="left"><inline-formula><mml:math id="M6"><mml:msub><mml:mrow><mml:mi>u</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mo>|</mml:mo><mml:msub><mml:mrow><mml:mi>C</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>|</mml:mo></mml:mrow></mml:mfrac><mml:msub><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>x</mml:mi><mml:mo>&#x02208;</mml:mo><mml:msub><mml:mrow><mml:mi>C</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:msub><mml:mi>x</mml:mi></mml:math></inline-formula></td>
<td valign="top" align="left">The mean vector of the <italic>C</italic><sub><italic>i</italic></sub></td>
</tr>
<tr>
<td valign="top" align="left"><inline-formula><mml:math id="M7"><mml:mi>d</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:msqrt><mml:mrow><mml:mstyle displaystyle="true"><mml:munderover accentunder="false" accent="false"><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>k</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>m</mml:mi></mml:mrow></mml:munderover></mml:mstyle><mml:msup><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mo>-</mml:mo><mml:msub><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mi>j</mml:mi><mml:mi>k</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:msqrt></mml:math></inline-formula></td>
<td valign="top" align="left">Euclidean distance</td>
</tr>
<tr>
<td valign="top" align="left"><italic>a</italic> &#x0003D; log<italic>sig</italic>(<italic>wp</italic> &#x0002B; <italic>b</italic>)</td>
<td valign="top" align="left">Transfer function</td>
</tr>
<tr>
<td valign="top" align="left"><italic>p</italic> &#x02229; <italic>q</italic> &#x0003D; &#x02205;</td>
<td valign="top" align="left"><italic>p</italic> is the missing data and <italic>q</italic> is the noisy data; the two sets are disjoint</td>
</tr>
<tr>
<td valign="top" align="left"><inline-formula><mml:math id="M35"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:mi>&#x003B4;</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:mtable style="text-align:axis;" equalrows="false" columnlines="none" equalcolumns="false" class="array"><mml:mtr><mml:mtd><mml:mtext>&#x000A0;</mml:mtext><mml:mn>1</mml:mn><mml:mtext>&#x000A0;</mml:mtext><mml:mi>i</mml:mi><mml:mi>f</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>x</mml:mi><mml:mo>=</mml:mo><mml:mi>y</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mtext>&#x000A0;</mml:mtext><mml:mn>0</mml:mn><mml:mtext>&#x000A0;</mml:mtext><mml:mi>o</mml:mi><mml:mi>t</mml:mi><mml:mi>h</mml:mi><mml:mi>e</mml:mi><mml:mi>r</mml:mi><mml:mi>w</mml:mi><mml:mi>i</mml:mi><mml:mi>s</mml:mi><mml:mi>e</mml:mi></mml:mtd></mml:mtr></mml:mtable></mml:mrow></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></inline-formula></td>
<td valign="top" align="left">Represents an indicator function</td>
</tr>
<tr>
<td valign="top" align="left"><inline-formula><mml:math id="M8"><mml:mi>c</mml:mi><mml:mi>a</mml:mi><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mstyle displaystyle="true"><mml:munderover accentunder="false" accent="false"><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>n</mml:mi></mml:mrow></mml:munderover></mml:mstyle><mml:mi>&#x003B4;</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>t</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:mi>m</mml:mi><mml:mi>a</mml:mi><mml:mi>p</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>r</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mi>n</mml:mi></mml:mrow></mml:mfrac></mml:math></inline-formula></td>
<td valign="top" align="left">Clustering precision</td>
</tr>
<tr>
<td valign="top" align="left"><italic>B</italic> &#x0003D; {<italic>b</italic><sub>1</sub>, <italic>b</italic><sub>2</sub>, &#x02026;, <italic>b</italic><sub><italic>n</italic></sub>}</td>
<td valign="top" align="left">Data samples after attribute completion</td>
</tr>
<tr>
<td valign="top" align="left"><italic>A</italic> &#x0003D; {<italic>a</italic><sub>1</sub>, <italic>a</italic><sub>2</sub>, <italic>a</italic><sub>3</sub>, &#x02026;, <italic>a</italic><sub><italic>n</italic></sub>}</td>
<td valign="top" align="left">Data samples after noise reduction processing</td>
</tr>
<tr>
<td valign="top" align="left"><italic>R</italic> &#x0003D; {<italic>r</italic><sub>1</sub>, <italic>r</italic><sub>2</sub>, <italic>r</italic><sub>3</sub>, &#x02026;, <italic>r</italic><sub><italic>k</italic></sub>}</td>
<td valign="top" align="left">The divided clusters of the cluster</td>
</tr>
<tr>
<td valign="top" align="left">{<italic>v</italic><sub>1</sub>, <italic>v</italic><sub>2</sub>, <italic>v</italic><sub>3</sub>, &#x02026;, <italic>v</italic><sub><italic>k</italic></sub>}</td>
<td valign="top" align="left">The initial cluster center</td>
</tr>
<tr>
<td valign="top" align="left"><inline-formula><mml:math id="M9"><mml:msub><mml:mrow><mml:mi>d</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:msup><mml:mrow><mml:mo stretchy="false">&#x02016;</mml:mo><mml:msub><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>-</mml:mo><mml:msub><mml:mrow><mml:mi>v</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo stretchy="false">&#x02016;</mml:mo></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup></mml:math></inline-formula></td>
<td valign="top" align="left">The distance from <italic>x</italic><sub><italic>j</italic></sub> to each vector <italic>v</italic><sub><italic>i</italic></sub></td>
</tr>
<tr>
<td valign="top" align="left"><italic>r</italic><sub><italic>j</italic></sub> &#x0003D; arg min<sub><italic>i</italic>&#x02208;{1, 2, &#x02026;, <italic>k</italic>}</sub><italic>d</italic><sub><italic>ji</italic></sub></td>
<td valign="top" align="left">Mark the nearest center <italic>x</italic><sub><italic>j</italic></sub></td>
</tr>
<tr>
<td valign="top" align="left"><inline-formula><mml:math id="M10"><mml:mi>f</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mn>1</mml:mn><mml:mo>&#x0002B;</mml:mo><mml:msup><mml:mrow><mml:mi>e</mml:mi></mml:mrow><mml:mrow><mml:mo>-</mml:mo><mml:mi>x</mml:mi></mml:mrow></mml:msup></mml:mrow></mml:mfrac></mml:math></inline-formula></td>
<td valign="top" align="left">The S-type transfer function</td>
</tr>
<tr>
<td valign="top" align="left"><inline-formula><mml:math id="M11"><mml:mi>E</mml:mi><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mstyle displaystyle="true"><mml:munder><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:munder></mml:mstyle><mml:msup><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>t</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>-</mml:mo><mml:msub><mml:mrow><mml:mi>O</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:mfrac></mml:math></inline-formula></td>
<td valign="top" align="left">The error function</td>
</tr>
<tr>
<td valign="top" align="left"><inline-formula><mml:math id="M12"><mml:mi>l</mml:mi><mml:mo>=</mml:mo><mml:msqrt><mml:mrow><mml:mi>m</mml:mi><mml:mo>&#x0002B;</mml:mo><mml:mi>n</mml:mi></mml:mrow></mml:msqrt><mml:mo>&#x0002B;</mml:mo><mml:mi>a</mml:mi></mml:math></inline-formula></td>
<td valign="top" align="left">The number of hidden layers</td>
</tr>
<tr>
<td valign="top" align="left"><italic>traingdx</italic></td>
<td valign="top" align="left">The training function</td>
</tr>
<tr>
<td valign="top" align="left"><italic>mse</italic></td>
<td valign="top" align="left">The performance function</td>
</tr>
<tr>
<td valign="top" align="left"><italic>lr</italic></td>
<td valign="top" align="left">The learning rate</td>
</tr>
<tr>
<td valign="top" align="left"><italic>I</italic> &#x0003D; (<italic>i</italic><sub>1</sub>, <italic>i</italic><sub>2</sub>, &#x02026;, <italic>i</italic><sub><italic>k</italic></sub>)</td>
<td valign="top" align="left">Outlier points</td>
</tr>
<tr>
<td valign="top" align="left"><inline-formula><mml:math id="M13"><mml:mi>&#x003B5;</mml:mi><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mstyle displaystyle="true"><mml:munderover accentunder="false" accent="false"><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>j</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>n</mml:mi></mml:mrow></mml:munderover></mml:mstyle><mml:mo stretchy="false">|</mml:mo><mml:msub><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>-</mml:mo><mml:msub><mml:mrow><mml:mi>b</mml:mi></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo stretchy="false">|</mml:mo></mml:mrow><mml:mrow><mml:mi>n</mml:mi></mml:mrow></mml:mfrac></mml:math></inline-formula></td>
<td valign="top" align="left">Threshold of outliers error range</td>
</tr>
</tbody>
</table>
</table-wrap>
<sec>
<title>System Models</title>
<sec>
<title>Scientific Model</title>
<p>The model of multi-modal heterogeneous big data with abnormal data considered in this paper is data aggregation clustering, i.e., <italic>D</italic> &#x0003D; {<italic>x</italic><sub>1</sub>, <italic>x</italic><sub>2</sub>, &#x02026;, <italic>x</italic><sub><italic>k</italic></sub>}. First, we assume that the initial data <italic>D</italic> contain no missing or noisy records; for the multi-view clustering of <italic>D</italic>, the final clustering accuracy is then evaluated over combinations of different attributes of <italic>D</italic>. However, in the presence of abnormal data, such as missing and noisy records, the clustering quality of the same algorithm degrades considerably. We now assume that the original data <italic>D</italic> contain <italic>p</italic> records with missing values and <italic>q</italic> records with noise.</p>
</sec>
</sec>
<sec>
<title>Problem Formulations</title>
<p>Given the data <italic>D</italic>, suppose <italic>p</italic> records have missing attribute fields and <italic>q</italic> records are noisy. Suppose also that <italic>p</italic> &#x0222A; <italic>q</italic> &#x0003D; &#x003D5;, i.e., the missing and noisy records do not overlap. The highest clustering accuracy the traditional Kmeans algorithm can then reach is (<italic>M-p-q</italic>)<italic>/M</italic>, which in practice is essentially unattainable.</p>
<p>The goal of our proposed algorithm is an accuracy greater than (<italic>M-p-q</italic>)<italic>/M</italic>. In addition, in the multi-modal data view, the traditional Kmeans algorithm has limited fault tolerance for different modal data, whereas the proposed algorithm improves this fault tolerance. The end result is that the proposed algorithm attains higher clustering accuracy than the Kmeans algorithm. Formally, we express this through the indicator function &#x003B4;(<italic>x, y</italic>) and the accuracy <italic>ca</italic> as follows:</p>
<disp-formula id="E5"><label>(5)</label><mml:math id="M14"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:mi>&#x003B4;</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:mtable style="text-align:axis;" equalrows="false" columnlines="none" equalcolumns="false" class="array"><mml:mtr><mml:mtd><mml:mtext>&#x000A0;</mml:mtext><mml:mn>1</mml:mn><mml:mtext>&#x000A0;</mml:mtext><mml:mi>i</mml:mi><mml:mi>f</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>x</mml:mi><mml:mo>=</mml:mo><mml:mi>y</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mtext>&#x000A0;</mml:mtext><mml:mn>0</mml:mn><mml:mtext>&#x000A0;</mml:mtext><mml:mi>o</mml:mi><mml:mi>t</mml:mi><mml:mi>h</mml:mi><mml:mi>e</mml:mi><mml:mi>r</mml:mi><mml:mi>w</mml:mi><mml:mi>i</mml:mi><mml:mi>s</mml:mi><mml:mi>e</mml:mi></mml:mtd></mml:mtr></mml:mtable></mml:mrow></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<disp-formula id="E6"><label>(6)</label><mml:math id="M15"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:mi>c</mml:mi><mml:mi>a</mml:mi><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mstyle displaystyle="true"><mml:munderover accentunder="false" accent="false"><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>n</mml:mi></mml:mrow></mml:munderover></mml:mstyle><mml:mi>&#x003B4;</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>t</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:mi>m</mml:mi><mml:mi>a</mml:mi><mml:mi>p</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>r</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mi>n</mml:mi></mml:mrow></mml:mfrac></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>where <italic>t</italic><sub><italic>i</italic></sub> is the real label, <italic>r</italic><sub><italic>i</italic></sub> is the label after clustering, <italic>n</italic> is the total number of data points, and &#x003B4;(<italic>x, y</italic>) is an indicator function. The map function represents the optimal one-to-one reassignment of class labels.</p>
<disp-formula id="E7"><label>(7)</label><mml:math id="M16"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:mtext>Objective&#x000A0;</mml:mtext><mml:mn>1</mml:mn><mml:mo>:</mml:mo><mml:mtext>max&#x000A0;</mml:mtext><mml:mi>c</mml:mi><mml:mi>a</mml:mi></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<disp-formula id="E8"><label>(8)</label><mml:math id="M17"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:mtext>Subject&#x000A0;to</mml:mtext><mml:mo>:</mml:mo><mml:mi>p</mml:mi><mml:mo>&#x0222A;</mml:mo><mml:mi>q</mml:mi><mml:mo>=</mml:mo><mml:mi>&#x003D5;</mml:mi></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
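As an illustration, the indicator function of Equation (5) and the accuracy <italic>ca</italic> of Equation (6) can be sketched in Python (the language used in the experiments). The brute-force search over label permutations stands in for the optimal map function; this is our own simplification and is only feasible for small numbers of clusters.

```python
from itertools import permutations

def delta(x, y):
    """Indicator function of Equation (5): 1 if the labels match, 0 otherwise."""
    return 1 if x == y else 0

def clustering_accuracy(true_labels, cluster_labels):
    """Accuracy ca of Equation (6) under the best one-to-one relabeling.

    The optimal `map` function is realized here by brute force over all
    permutations of the class labels, an assumption suitable only for small k.
    """
    n = len(true_labels)
    classes = sorted(set(true_labels))
    clusters = sorted(set(cluster_labels))
    best = 0
    for perm in permutations(classes):
        mapping = dict(zip(clusters, perm))
        hits = sum(delta(t, mapping.get(r)) for t, r in zip(true_labels, cluster_labels))
        best = max(best, hits)
    return best / n
```

For example, a clustering that merely swaps the names of two classes still scores a <italic>ca</italic> of 1.0, since the map function relabels the clusters optimally before comparison.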
</sec>
</sec>
<sec id="s4">
<title>Optimization of Kmeans Clustering Algorithm</title>
<p>Generally, with the traditional Kmeans algorithm, incomplete data are preprocessed and checked, and the preprocessed data are then clustered. In addition, a large amount of noisy data arises from the variety of acquisition methods. The traditional Kmeans algorithm offers no good solution for filling missing attribute data and filtering noisy data. In view of these problems, and exploiting the ability of the BP neural network to predict and detect unknown data, this paper proposes the BPK-means algorithm, which improves the Kmeans algorithm with a BP neural network. The BPK-means algorithm flow chart is shown in <xref ref-type="fig" rid="F3">Figure 3</xref>. We use the BP neural network attribute completion algorithm to fill in missing attributes and the data denoising algorithm to process noisy data.</p>
<fig id="F3" position="float">
<label>Figure 3</label>
<caption><p>Flow Diagram of BPK-means algorithm.</p></caption>
<graphic xlink:href="fnbot-15-680613-g0003.tif"/>
</fig>
<sec>
<title>BPK-Means Algorithm</title>
<p>The improvement of the BPK-means algorithm over the traditional Kmeans algorithm lies in handling missing attribute data and denoising. Some records with missing attributes are still meaningful; discarding them indiscriminately makes the clustering result inaccurate. The BP neural network model can predict the missing data with an accuracy above 90%, which largely preserves the integrity of the data. In addition, outlier analysis is applied to suspected noisy data, and the BP neural network is then used to judge whether the data are valid. The BPK-means algorithm flow is described by Algorithm 1. We also define several functions used in Algorithm 1.</p>
<disp-formula id="E9"><label>(9)</label><mml:math id="M18"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:msub><mml:mrow><mml:mi>d</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mi>v</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:msup><mml:mrow><mml:mo stretchy="false">&#x02016;</mml:mo><mml:msub><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>-</mml:mo><mml:msub><mml:mrow><mml:mi>v</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo stretchy="false">&#x02016;</mml:mo></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>where <italic>d</italic><sub><italic>ij</italic></sub>(<italic>x</italic><sub><italic>j</italic></sub>, <italic>v</italic><sub><italic>i</italic></sub>) is the distance between <italic>x</italic><sub><italic>j</italic></sub> and <italic>v</italic><sub><italic>i</italic></sub>.</p>
<disp-formula id="E10"><label>(10)</label><mml:math id="M19"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:msub><mml:mrow><mml:mi>r</mml:mi></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mo class="qopname">arg</mml:mo><mml:msub><mml:mrow><mml:mo class="qopname">min</mml:mo></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>&#x02208;</mml:mo><mml:mrow><mml:mo stretchy="false">{</mml:mo><mml:mrow><mml:mn>1</mml:mn><mml:mo>,</mml:mo><mml:mn>2</mml:mn><mml:mo>,</mml:mo><mml:mo class="qopname">&#x02026;</mml:mo><mml:mo>,</mml:mo><mml:mi>k</mml:mi></mml:mrow><mml:mo stretchy="false">}</mml:mo></mml:mrow></mml:mrow></mml:msub><mml:msub><mml:mrow><mml:msub><mml:mrow><mml:mi>d</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p><italic>r</italic><sub><italic>j</italic></sub> is the index of the minimum distance <italic>d</italic><sub><italic>ij</italic></sub>(<italic>x</italic><sub><italic>j</italic></sub>, <italic>v</italic><sub><italic>i</italic></sub>), which determines the cluster label of <italic>x</italic><sub><italic>j</italic></sub>.</p>
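Equations (9) and (10) can be sketched directly; the function names below are illustrative, not part of the paper:

```python
def squared_distance(x, v):
    """d_ij(x_j, v_i) = ||x_j - v_i||^2 (Equation 9)."""
    return sum((a - b) ** 2 for a, b in zip(x, v))

def nearest_center(x, centers):
    """r_j = argmin_i d_ij (Equation 10): index of the closest cluster center."""
    return min(range(len(centers)), key=lambda i: squared_distance(x, centers[i]))
```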
<table-wrap position="float">
<label>Algorithm 1</label>
<caption><p>BPK-means algorithm.</p></caption>
<table border="all">
<tbody>
<tr>
<td align="left" valign="top">&#x000A0;&#x000A0;<bold>Input:</bold> Sample set <italic>D</italic> &#x0003D; {<italic>x</italic><sub>1</sub>, <italic>x</italic><sub>2</sub>, &#x02026;, <italic>x</italic><sub><italic>k</italic></sub>}; Number of clusters <italic>k</italic>;</td></tr>
<tr>
<td align="left" valign="top">&#x000A0;&#x000A0;<bold>Output:</bold> The divided clusters of the cluster <italic>R</italic> &#x0003D; {<italic>r</italic><sub>1</sub>, <italic>r</italic><sub>2</sub>, <italic>r</italic><sub>3</sub>, &#x02026;, <italic>r</italic><sub><italic>k</italic></sub>};</td></tr>
<tr>
<td align="left" valign="top">&#x000A0;&#x000A0;1:Use BP neural network to complete the missing attributes of data set <italic>D</italic>;</td></tr>
<tr><td align="left" valign="top">&#x000A0;&#x000A0;2: Using outliers and BP neural network to denoise the data, the processed data set sample is:
<disp-formula id="E11"><mml:math id="M20"><mml:mtable columnalign="left"><mml:mtr><mml:mtd><mml:mtext class="textrm" mathvariant="normal">A=</mml:mtext><mml:mrow><mml:mo stretchy="false">{</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mtext class="textrm" mathvariant="normal">a</mml:mtext></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mtext class="textrm" mathvariant="normal">,</mml:mtext><mml:msub><mml:mrow><mml:mtext class="textrm" mathvariant="normal">a</mml:mtext></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub><mml:mtext class="textrm" mathvariant="normal">,</mml:mtext><mml:msub><mml:mrow><mml:mtext class="textrm" mathvariant="normal">a</mml:mtext></mml:mrow><mml:mrow><mml:mn>3</mml:mn></mml:mrow></mml:msub><mml:mtext class="textrm" mathvariant="normal">,</mml:mtext><mml:mo>&#x02026;</mml:mo><mml:mtext class="textrm" mathvariant="normal">,</mml:mtext><mml:msub><mml:mrow><mml:mtext class="textrm" mathvariant="normal">a</mml:mtext></mml:mrow><mml:mrow><mml:mi>n</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">}</mml:mo></mml:mrow><mml:mo>;</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula></td></tr>
<tr><td align="left" valign="top">&#x000A0;&#x000A0;3: Randomly select k samples from A as the initial vector, that is, the initial cluster center is recorded as the vector:
<disp-formula id="E12"><mml:math id="M21"><mml:mtable columnalign="left"><mml:mtr><mml:mtd><mml:mrow><mml:mo stretchy="false">{</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>v</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mi>v</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mi>v</mml:mi></mml:mrow><mml:mrow><mml:mn>3</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:mo>&#x02026;</mml:mo><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mi>v</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">}</mml:mo></mml:mrow><mml:mo>;</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula></td></tr>
<tr><td align="left" valign="top">&#x000A0;&#x000A0;4: Order <italic>C</italic><sub><italic>i</italic></sub> &#x0003D; &#x02205;(1 &#x02264; <italic>i</italic> &#x02264; <italic>k</italic>);</td></tr>
<tr><td align="left" valign="top">&#x000A0;&#x000A0;5: Loop <italic>j</italic> &#x0003D; 1, 2, &#x02026;, <italic>n</italic>;</td></tr>
<tr><td align="left" valign="top">&#x000A0;&#x000A0;6: Calculate <italic>a</italic><sub><italic>j</italic></sub>the distance to each vector <italic>v</italic><sub><italic>i</italic></sub>(1 &#x02264; <italic>i</italic> &#x02264; <italic>k</italic>) and record it as<italic>d</italic><sub><italic>ij</italic></sub>(<italic>x</italic><sub><italic>j</italic></sub>, <italic>v</italic><sub><italic>i</italic></sub>);</td></tr>
<tr><td align="left" valign="top">&#x000A0;&#x000A0;7: The cluster mark determined according to the nearest center <italic>x</italic><sub><italic>j</italic></sub> point <italic>r</italic><sub><italic>j</italic></sub>;</td></tr>
<tr><td align="left" valign="top">&#x000A0;&#x000A0;8: Group the samples <italic>x</italic><sub><italic>j</italic></sub> into corresponding clusters:
<disp-formula id="E13"><mml:math id="M22"><mml:mtable columnalign="left"><mml:mtr><mml:mtd><mml:msub><mml:mrow><mml:mi>C</mml:mi></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>r</mml:mi></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mi>C</mml:mi></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>r</mml:mi></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:msub><mml:mo>&#x0222A;</mml:mo><mml:msub><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>;</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula></td></tr>
<tr><td align="left" valign="top">&#x000A0;&#x000A0;9: Circulation order <italic>i</italic> &#x0003D; 1, 2, 3, &#x02026;, <italic>k</italic>;</td></tr>
<tr><td align="left" valign="top">&#x000A0;&#x000A0;10: Calculate the new cluster vector<inline-formula><mml:math id="M23"><mml:msup><mml:mrow><mml:msub><mml:mrow><mml:mi>v</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mrow><mml:mi>&#x02032;</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula>;</td></tr>
<tr><td align="left" valign="top">&#x000A0;&#x000A0;11: If<inline-formula><mml:math id="M24"><mml:msup><mml:mrow><mml:msub><mml:mrow><mml:mi>v</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mrow><mml:mi>&#x02032;</mml:mi></mml:mrow></mml:msup><mml:mo>&#x02260;</mml:mo><mml:msub><mml:mrow><mml:mi>v</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula>, at this time, it is necessary to update the cluster class vector to<inline-formula><mml:math id="M25"><mml:msup><mml:mrow><mml:msub><mml:mrow><mml:mi>v</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mrow><mml:mi>&#x02032;</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula>;</td></tr>
<tr><td align="left" valign="top">&#x000A0;&#x000A0;12: Otherwise, keep the current cluster vector<italic>v</italic><sub><italic>i</italic></sub> unchanged;</td></tr>
<tr><td align="left" valign="top">&#x000A0;&#x000A0;13: End the loop until the cluster vector is not changed.</td></tr> 
</tbody>
</table>
</table-wrap>
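A minimal sketch of the clustering core of Algorithm 1 (steps 3&#x02013;13) follows, assuming the attribute completion and denoising of steps 1&#x02013;2 have already produced the sample set A; the function name and parameters are illustrative:

```python
import random

def kmeans_core(A, k, max_iter=100, seed=0):
    """Steps 3-13 of Algorithm 1: plain Kmeans on the preprocessed set A.

    A is a list of numeric tuples assumed to be already attribute-completed
    and denoised (steps 1-2 of Algorithm 1).
    """
    rng = random.Random(seed)
    centers = rng.sample(A, k)                      # step 3: initial vectors
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]           # step 4: C_i = empty set
        for x in A:                                 # steps 5-8: assign samples
            d = [sum((a - b) ** 2 for a, b in zip(x, c)) for c in centers]
            clusters[d.index(min(d))].append(x)
        new_centers = []                            # steps 9-10: recompute v_i'
        for i, c in enumerate(clusters):
            if c:
                new_centers.append(tuple(sum(col) / len(c) for col in zip(*c)))
            else:
                new_centers.append(centers[i])      # keep an empty cluster's center
        if new_centers == centers:                  # step 13: stop when unchanged
            break
        centers = new_centers                       # steps 11-12: update vectors
    return clusters, centers
```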
<p>The BPK-means algorithm adds integrity recovery of the data set and detection of noise, which guarantees the integrity of the data and prevents the loss of important attributes and the resulting low clustering accuracy. Algorithm 1 shows the execution process of the BPK-means algorithm intuitively.</p>
</sec>
<sec>
<title>BP Neural Network Attribute Completion Algorithm</title>
<p>In the first step of the BPK-means algorithm, the BP neural network is used to complete the missing attributes of data set <italic>D</italic>. Next, the implementation of this step is described in detail. The BP neural network attribute completion flow is described by Algorithm 2. In addition, we define several functions used in Algorithm 2.</p>
<disp-formula id="E14"><label>(11)</label><mml:math id="M26"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:mi>f</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mtext class="textrm" mathvariant="normal">=</mml:mtext><mml:mfrac><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mn>1</mml:mn><mml:mo>&#x0002B;</mml:mo><mml:msup><mml:mrow><mml:mi>e</mml:mi></mml:mrow><mml:mrow><mml:mo>-</mml:mo><mml:mi>x</mml:mi></mml:mrow></mml:msup></mml:mrow></mml:mfrac></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>The function <italic>f</italic>(<italic>x</italic>) is the S-type (sigmoid) transfer function used in the BP network.</p>
<disp-formula id="E15"><label>(12)</label><mml:math id="M27"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:mi>E</mml:mi><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mstyle displaystyle="true"><mml:munder class="msub"><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:munder></mml:mstyle><mml:msup><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>t</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>-</mml:mo><mml:msub><mml:mrow><mml:mi>O</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:mfrac></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>where <italic>E</italic> is the error function, <italic>t</italic><sub><italic>i</italic></sub> is the expected output, and <italic>O</italic><sub><italic>i</italic></sub> is the computed output of the network.</p>
<disp-formula id="E16"><label>(13)</label><mml:math id="M28"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:mi>l</mml:mi><mml:mo>=</mml:mo><mml:msqrt><mml:mrow><mml:mi>m</mml:mi><mml:mo>&#x0002B;</mml:mo><mml:mi>n</mml:mi></mml:mrow></mml:msqrt><mml:mo>&#x0002B;</mml:mo><mml:mi>a</mml:mi></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p><italic>l</italic> is the number of hidden-layer nodes, <italic>m</italic> is the number of neurons in the input layer, <italic>n</italic> is the number of neurons in the output layer, and <italic>a</italic> is a constant (Murtagh and Pierre, <xref ref-type="bibr" rid="B15">2014</xref>; Ma et al., <xref ref-type="bibr" rid="B13">2020</xref>). Based on extensive experimental data, this algorithm sets <italic>a</italic> &#x0003D; 3.</p>
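The three functions of Equations (11)&#x02013;(13) can be sketched as follows. Rounding <italic>l</italic> up to an integer is our own assumption, since the paper does not state how the square root is rounded, and the error function is written in the conventional squared-difference form:

```python
import math

def sigmoid(x):
    """S-type transfer function f(x) = 1 / (1 + e^-x) (Equation 11)."""
    return 1.0 / (1.0 + math.exp(-x))

def error(targets, outputs):
    """Error function E = (1/2) * sum_i (t_i - O_i)^2 (Equation 12)."""
    return sum((t - o) ** 2 for t, o in zip(targets, outputs)) / 2.0

def hidden_nodes(m, n, a=3):
    """Hidden-layer size l = sqrt(m + n) + a (Equation 13), rounded up;
    a = 3 follows the setting chosen in this algorithm."""
    return math.ceil(math.sqrt(m + n)) + a
```

With the Iris setup later in the paper (4 input attributes, 1 output node) this rule gives a hidden layer of 6 nodes.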
<table-wrap position="float">
<label>Algorithm 2</label>
<caption><p>BP neural network attribute completion algorithm.</p></caption>
<table border="all">
<tbody>
<tr><td align="left" valign="top">&#x000A0;&#x000A0;<bold>Input:</bold> sample set <italic>D</italic> &#x0003D; {<italic>x</italic><sub>1</sub>, <italic>x</italic><sub>2</sub>, &#x02026;, <italic>x</italic><sub><italic>k</italic></sub>};</td></tr>
<tr><td align="left" valign="top">&#x000A0;&#x000A0;<bold>Output:</bold> the whole data set <italic>B</italic> &#x0003D; {<italic>b</italic><sub>1</sub>, <italic>b</italic><sub>2</sub>, &#x02026;, <italic>b</italic><sub><italic>n</italic></sub>};</td></tr>
<tr><td align="left" valign="top">&#x000A0;&#x000A0;1: Scan the data set once. Find out the record number of the data set and record the data set with incomplete attributes as <italic>Q</italic> &#x0003D; {<italic>q</italic><sub>1</sub>, <italic>q</italic><sub>2</sub>, <italic>q</italic><sub>3</sub>, &#x02026;, <italic>q</italic><sub><italic>m</italic></sub>};</td></tr>
<tr><td align="left" valign="top">&#x000A0;&#x000A0;2: Judge the size of <italic>N</italic>. If <italic>N</italic> is more than 100000 records, then randomly select 20% as the training sample of neural network. If <italic>N</italic> is &#x02264;100,000 records, then select 60% of the data set as the training sample set;</td></tr>
<tr><td align="left" valign="top">&#x000A0;&#x000A0;3: Three layers BP neural network model is constructed, which are input layer, hidden layer and output layer;</td></tr>
<tr><td align="left" valign="top">&#x000A0;&#x000A0;4: The S type transfer function is set <italic>f</italic>(<italic>x</italic>).</td></tr>
<tr><td align="left" valign="top">&#x000A0;&#x000A0;5: The inverse error output is set, and the network weight and threshold are adjusted continuously to minimize the error function <italic>E</italic>. According to all the samples selected in the second step, the network is modeled. In this model, the attribute of data set is used as input, and the number of output nodes is set to 1, the <italic>l</italic> is used in the design of hidden layer.</td></tr>
<tr><td align="left" valign="top">&#x000A0;&#x000A0;6: The excitation function of hidden layer and output layer is <italic>Tansig</italic> and <italic>Logsig</italic>, respectively. The training function is <italic>Traingdx</italic>. The performance function is <italic>mse</italic>. The number of iterations is 50,000. The expected error goal is 0.000000001, and the learning rate <italic>lr</italic> is 0.01.</td></tr>
<tr><td align="left" valign="top">&#x000A0;&#x000A0;7: According to the setting of the network model in the previous steps, the network model is constructed and trained. In this way, the missing data set in <italic>Q</italic> &#x0003D; {<italic>q</italic><sub>1</sub>, <italic>q</italic><sub>2</sub>, <italic>q</italic><sub>3</sub>, &#x02026;, <italic>q</italic><sub><italic>m</italic></sub>} is predicted, and a complete data set is constructed and recorded as <italic>B</italic> &#x0003D; {<italic>b</italic><sub>1</sub>, <italic>b</italic><sub>2</sub>, &#x02026;, <italic>b</italic><sub><italic>n</italic></sub>}.</td></tr>
</tbody>
</table>
</table-wrap>
 
<p>The BP neural network attribute completion algorithm exploits the predictive power of the BP neural network. It builds a training sample set from valid data, trains the model, and predicts and evaluates the missing attribute values. This not only ensures the integrity of the data, but also makes the imputed values more scientific and accurate to a certain extent.</p>
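The size-dependent sampling rule of step 2 of Algorithm 2 can be sketched as follows; the helper names are illustrative:

```python
def training_fraction(n_records):
    """Sampling rule from step 2 of Algorithm 2: 20% of the records when the
    data set exceeds 100,000 records, otherwise 60%."""
    return 0.2 if n_records > 100_000 else 0.6

def training_sample_size(n_records):
    """Number of records drawn for BP network training (rounded down)."""
    return int(n_records * training_fraction(n_records))
```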
</sec>
<sec>
<title>Data Denoising Algorithm</title>
<p>In the second step of the BPK-means algorithm, the BP neural network is used to denoise the data set. Next, the implementation of this step is described in detail. The data denoising flow is described by Algorithm 3, in which we define the error range function &#x003B5;.</p>
<disp-formula id="E18"><label>(14)</label><mml:math id="M30"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:mi>&#x003B5;</mml:mi><mml:mtext class="textrm" mathvariant="normal">=</mml:mtext><mml:mfrac><mml:mrow><mml:mstyle displaystyle="true"><mml:munderover accentunder="false" accent="false"><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>j</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>n</mml:mi></mml:mrow></mml:munderover></mml:mstyle><mml:mo stretchy="false">|</mml:mo><mml:msub><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>-</mml:mo><mml:msub><mml:mrow><mml:mi>b</mml:mi></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo stretchy="false">|</mml:mo></mml:mrow><mml:mrow><mml:mi>n</mml:mi></mml:mrow></mml:mfrac></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>Data denoising keeps the data smooth, so that after processing the data can be clustered directly with improved clustering accuracy.</p>
<table-wrap position="float">
<label>Algorithm 3</label>
<caption><p>Data denoising algorithm.</p></caption>
<table border="all">
<tbody>
<tr><td align="left" valign="top">&#x000A0;&#x000A0;<bold>Input:</bold> Complete sample set <italic>B</italic> &#x0003D; {<italic>b</italic><sub>1</sub>, <italic>b</italic><sub>2</sub>, &#x02026;, <italic>b</italic><sub><italic>n</italic></sub>};</td></tr>
<tr><td align="left" valign="top">&#x000A0;&#x000A0;<bold>Output:</bold> Data set after noise reduction <italic>A</italic> &#x0003D; {<italic>a</italic><sub>1</sub>, <italic>a</italic><sub>2</sub>, <italic>a</italic><sub>3</sub>, &#x02026;, <italic>a</italic><sub><italic>n</italic></sub>};</td></tr>
<tr><td align="left" valign="top">&#x000A0;&#x000A0;1: <italic>B</italic> &#x0003D; {<italic>b</italic><sub>1</sub>, <italic>b</italic><sub>2</sub>, &#x02026;, <italic>b</italic><sub><italic>n</italic></sub>}, the data are clustered by initial algorithm using Kmeans algorithm;</td></tr>
<tr><td align="left" valign="top">&#x000A0;&#x000A0;2: Finding out the points outside the cluster set. It is called outlier points <italic>I</italic> &#x0003D; (<italic>i</italic><sub>1</sub>, <italic>i</italic><sub>2</sub>, &#x02026;, <italic>i</italic><sub><italic>k</italic></sub>);</td></tr>
<tr><td align="left" valign="top">&#x000A0;&#x000A0;3: For each outlier, BP neural network is used to predict the corresponding attribute value, which is compared with the existing values. We define an error range function&#x003B5;.<break/> If &#x003B5; is greater than the given threshold, it is considered as a noise point for noise processing. Finally, a noise free data set is formed:
<disp-formula id="E17"><mml:math id="M29"><mml:mtable columnalign="left"><mml:mtr><mml:mtd><mml:mi>A</mml:mi><mml:mo>=</mml:mo><mml:mrow><mml:mo stretchy="false">{</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>a</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mi>a</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mi>a</mml:mi></mml:mrow><mml:mrow><mml:mn>3</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:mo>&#x02026;</mml:mo><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mi>a</mml:mi></mml:mrow><mml:mrow><mml:mi>n</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">}</mml:mo></mml:mrow><mml:mo>.</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula></td></tr>
</tbody>
</table>
</table-wrap>
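The noise test of step 3 of Algorithm 3 can be sketched as follows; the function names are illustrative, and the threshold choice is left to the user, as in the paper:

```python
def error_range(predicted, observed):
    """Epsilon of Equation (14): the mean absolute deviation between the
    BP-predicted attribute values i_j and the recorded values b_j."""
    n = len(predicted)
    return sum(abs(i - b) for i, b in zip(predicted, observed)) / n

def is_noise(predicted, observed, threshold):
    """Step 3 of Algorithm 3: flag the record as noise when epsilon
    exceeds the given threshold."""
    return error_range(predicted, observed) > threshold
```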
</sec>
<sec>
<title>BPK-Means Algorithm Performance Analysis</title>
<p>The time complexity of the traditional Kmeans algorithm depends on the number of attributes per record, the size of the data, the number of iterations, and the number of clusters. Its time complexity is O(<italic>I</italic><sup>&#x0002A;</sup><italic>n</italic><sup>&#x0002A;</sup><italic>k</italic> <sup>&#x0002A;</sup><italic>m</italic>), where <italic>I</italic> is the number of iterations, <italic>k</italic> is the number of clusters, <italic>m</italic> is the scale of the data volume, and <italic>n</italic> is the number of attributes of each record. The BPK-means algorithm adds missing attribute field completion and attribute data prediction on top of the Kmeans algorithm.</p>
<p>Suppose <italic>p</italic> records have missing attribute fields and <italic>q</italic> records are noisy; then (<italic>p</italic> &#x0002B; <italic>q</italic>) records require neural network processing. The BP neural network here has three layers, with <italic>n1, n2</italic>, and <italic>n3</italic> neurons, respectively. A feedforward pass for one sample (an <italic>n1</italic> <sup>&#x0002A;</sup><italic>1</italic> vector) requires two matrix multiplications, of <italic>n1</italic><sup>&#x0002A;</sup><italic>n2</italic> and <italic>n2</italic><sup>&#x0002A;</sup><italic>n3</italic> operations, respectively. The numbers of nodes in the input and output layers (<italic>n1</italic> and <italic>n3</italic>) are determined by the data and can be regarded as constants, while the hidden layer size <italic>n2</italic> can be set freely. The time complexity of feedforward computation for one sample is therefore O(<italic>n1</italic><sup>&#x0002A;</sup><italic>n2</italic> &#x0002B; <italic>n2</italic><sup>&#x0002A;</sup><italic>n3</italic>) = O(<italic>n2</italic>). Back propagation has the same time complexity as the feedforward pass. If there are <italic>m</italic> training samples in total and each sample is trained only once, the time complexity of training the neural network is O(<italic>m</italic><sup>&#x0002A;</sup><italic>n2</italic>). Similarly, predicting one sample takes O(<italic>n2</italic>).</p>
<p>The total time to compute each data result is (<italic>m</italic>&#x0002B;2)O(<italic>n</italic><sub>2</sub>). Since the number of training samples <italic>m</italic> and <italic>n</italic><sub>2</sub> are essentially fixed, the time of each neural network run is also linear. Finally, the total time complexity of the BP neural network stage is (<italic>p</italic>&#x0002B;<italic>q</italic>)(<italic>m</italic>&#x0002B;2)O(<italic>n</italic><sub>2</sub>). The BPK-means algorithm puts more emphasis on accuracy, so its time complexity is slightly higher than that of the original algorithm; for scenarios with high accuracy requirements, this cost is worthwhile.</p>
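A rough operation-count sketch of this analysis follows; the values <italic>n1</italic> = 4 and <italic>n3</italic> = 1 mirror the Iris setup later in the paper (four attributes in, one value out) and are illustrative assumptions:

```python
def feedforward_cost(n1, n2, n3):
    """Multiply-accumulate count of one forward pass: n1*n2 + n2*n3,
    i.e., O(n2) once n1 and n3 are fixed constants."""
    return n1 * n2 + n2 * n3

def bp_total_cost(p, q, m, n2, n1=4, n3=1):
    """Rough total BP workload (p+q)(m+2)O(n2) from the analysis above:
    (p+q) abnormal records, each requiring roughly m training passes plus
    a prediction pass, each of order n2."""
    return (p + q) * (m + 2) * feedforward_cost(n1, n2, n3)
```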
</sec>
</sec>
<sec id="s5">
<title>Experiment and Result Analysis</title>
<sec>
<title>Experimental Setup and Experimental Environment</title>
<p>In order to verify the effectiveness of the algorithm, four groups of experiments are carried out.</p>
<list list-type="simple">
<list-item><p>Experiment 1: validate the missing-attribute completion algorithm on a UCI data set by comparing the predicted values of the missing data against the true values, thereby verifying the effectiveness of the algorithm.</p></list-item>
<list-item><p>Experiment 2: test the denoising algorithm proposed in this paper on a UCI data set to verify its denoising effect.</p></list-item>
<list-item><p>Experiment 3: compare and analyze the clustering accuracy of the BPK-means algorithm and the traditional K-means algorithm on different data sets.</p></list-item>
<list-item><p>Experiment 4: compare and analyze the clustering accuracy of the BPK-means algorithm and the traditional K-means algorithm across multiple views.</p></list-item>
</list>
<p>The experiments were run on the Windows 7 operating system with 4 GB of physical memory and a 3.10 GHz CPU; the programming language is Python 3.8.</p>
</sec>
<sec>
<title>Experiment and Result Analysis</title>
<sec>
<title>A Completion Algorithm for Verifying Missing Attributes</title>
<p>The Iris data set (UCI Iris, <xref ref-type="bibr" rid="B19">2021</xref>) from UCI is selected as the sample data set of the experiment. The data contain four attributes and 150 records. One hundred records are randomly selected as the training sample set; from the remaining records, 10 are then randomly selected and some of their attribute values are removed for the experiment.</p>
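A hypothetical reconstruction of this setup, using scikit-learn's MLPRegressor as a stand-in BP network (the hidden-layer size and the choice of which attribute is "missing" are assumptions, not taken from the paper):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.neural_network import MLPRegressor

# Predict one "missing" attribute (here, petal width) from the other three.
X = load_iris().data                   # 150 records, 4 attributes
rng = np.random.default_rng(0)
idx = rng.permutation(len(X))
train, test = idx[:100], idx[100:110]  # 100 training records, 10 with a gap

net = MLPRegressor(hidden_layer_sizes=(16,), max_iter=5000, random_state=0)
net.fit(X[train, :3], X[train, 3])     # learn attribute 4 from attributes 1-3

pred = net.predict(X[test, :3])        # completed values for the 10 records
err = np.abs(pred - X[test, 3]).mean() # mean absolute approximation error
print(round(err, 3))
```

The mean absolute error against the held-out true values plays the role of the approximation analysis shown in Figure 4.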
<p>The experiment compares the predicted values of the missing attributes against their true values. As <xref ref-type="fig" rid="F4">Figure 4</xref> shows, the BP neural network predicts the missing attribute data well and achieves the intended data-completion effect.</p>
<fig id="F4" position="float">
<label>Figure 4</label>
<caption><p>Approximation analysis graph of missing sample attributes.</p></caption>
<graphic xlink:href="fnbot-15-680613-g0004.tif"/>
</fig>
</sec>
<sec>
<title>Validation of Data Denoising Algorithm</title>
<p>The completed data set from Experiment 1 is selected, and noise is then added artificially to verify whether the denoising algorithm is feasible.</p>
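A minimal sketch of this corrupt-then-flag procedure, again using MLPRegressor as a stand-in BP network; the 3-sigma flagging rule and the noise magnitude are assumptions for illustration, not the paper's exact criterion:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.neural_network import MLPRegressor

# Corrupt a few records, then flag records whose reconstruction error is
# far above the typical error of the fitted network.
X = load_iris().data.copy()
rng = np.random.default_rng(1)
noisy = rng.choice(len(X), size=5, replace=False)
X[noisy, 3] += 5.0                      # inject obvious noise into attribute 4

net = MLPRegressor(hidden_layer_sizes=(16,), max_iter=5000, random_state=0)
net.fit(X[:, :3], X[:, 3])
err = np.abs(net.predict(X[:, :3]) - X[:, 3])

threshold = err.mean() + 3 * err.std()  # simple 3-sigma rule (an assumption)
flagged = np.flatnonzero(err > threshold)
print(sorted(flagged))
```

Because the network is fitted mostly to clean records, the injected points carry much larger residuals than the rest and stand out against the threshold.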
<p><xref ref-type="fig" rid="F5">Figures 5</xref>&#x02013;<xref ref-type="fig" rid="F7">7</xref> shows the effect drawing of data completion, the effect picture after noise addition and noise removal, respectively.</p>
<fig id="F5" position="float">
<label>Figure 5</label>
<caption><p>The clustering effect diagram of the completion data of experiment one.</p></caption>
<graphic xlink:href="fnbot-15-680613-g0005.tif"/>
</fig>
<fig id="F6" position="float">
<label>Figure 6</label>
<caption><p>Clustering effect map of the data after adding noise.</p></caption>
<graphic xlink:href="fnbot-15-680613-g0006.tif"/>
</fig>
<fig id="F7" position="float">
<label>Figure 7</label>
<caption><p>The clustering effect map of the data after noise removal.</p></caption>
<graphic xlink:href="fnbot-15-680613-g0007.tif"/>
</fig>
</sec>
<sec>
<title>Validation of BPK-Means Algorithm</title>
<p>A standard data set is selected for the clustering verification experiment; it comes from the public UCI data (UCI Iris, <xref ref-type="bibr" rid="B19">2021</xref>) repository. In the selected data set, 3&#x02013;10% of the attribute values are randomly set as missing in order to compare the clustering effect of the BPK-means algorithm with that of the K-means algorithm.</p>
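A hypothetical sketch of this comparison: K-means run on data whose missing attribute is filled by a plain column mean (the baseline) versus by a BP-style regressor (the BPK-means idea). The imputation targets, network size, and accuracy metric (adjusted Rand index) are illustrative assumptions:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.metrics import adjusted_rand_score
from sklearn.neural_network import MLPRegressor

X, y = load_iris(return_X_y=True)
rng = np.random.default_rng(2)
miss = rng.choice(len(X), size=10, replace=False)  # ~7% records lose attribute 4
obs = np.setdiff1d(np.arange(len(X)), miss)

# Baseline: fill the missing attribute with the observed column mean.
X_mean = X.copy()
X_mean[miss, 3] = X[obs, 3].mean()

# BPK-means-style: fill it with a BP-network prediction from the other attributes.
net = MLPRegressor(hidden_layer_sizes=(16,), max_iter=5000, random_state=0)
net.fit(X[obs, :3], X[obs, 3])
X_bp = X.copy()
X_bp[miss, 3] = net.predict(X[miss, :3])

for name, data in [("mean-filled", X_mean), ("BP-filled", X_bp)]:
    labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(data)
    print(name, round(adjusted_rand_score(y, labels), 3))
```

Comparing the two scores against the true Iris labels mirrors the accuracy comparison reported in Figure 8.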
<p>Comparing the clustering effects of the two algorithms on the UCI data sets in <xref ref-type="fig" rid="F8">Figure 8</xref> shows that the proposed algorithm achieves higher clustering accuracy when a small proportion of the data have missing attributes.</p>
<fig id="F8" position="float">
<label>Figure 8</label>
<caption><p>Comparison chart of clustering effect of different data sets.</p></caption>
<graphic xlink:href="fnbot-15-680613-g0008.tif"/>
</fig>
<fig id="F9" position="float">
<label>Figure 9</label>
<caption><p>View clustering effect diagram.</p></caption>
<graphic xlink:href="fnbot-15-680613-g0009.tif"/>
</fig>
</sec>
<sec>
<title>Validation the BPK-Means Algorithm in Multiple Views</title>
<p>The Iris data set is selected as the experimental data set. It contains four attribute columns and one label column. Noise and missing values are randomly added to the data. Sepal length and petal width form data1, petal length and petal width form data2, sepal length and petal length form data3, and sepal width and petal width form data4, in order to verify the accuracy of the algorithm under multiple views.</p>
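The four two-attribute views can be built as below; the column indices assume the standard Iris attribute ordering (sepal length, sepal width, petal length, petal width):

```python
from sklearn.datasets import load_iris

X = load_iris().data  # 150 records, 4 attributes
views = {
    "data1": X[:, [0, 3]],  # sepal length, petal width
    "data2": X[:, [2, 3]],  # petal length, petal width
    "data3": X[:, [0, 2]],  # sepal length, petal length
    "data4": X[:, [1, 3]],  # sepal width, petal width
}
for name, view in views.items():
    print(name, view.shape)  # each view: (150, 2)
```

Each view is then clustered separately to compare per-view accuracy, as in Figure 9.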
<p>Comparing the BPK-means algorithm across the different views in <xref ref-type="fig" rid="F9">Figure 9</xref> shows that its accuracy improves in every view, which further indicates that the BPK-means algorithm proposed in this paper handles multi-view clustering problems more accurately than traditional algorithms.</p>
<p>In summary, the above four experiments validate the proposed approach from three angles: attribute completion, data denoising, and clustering accuracy, the last by comparing the BPK-means algorithm against the traditional K-means algorithm on both single-view and multi-view data.</p>
</sec>
</sec>
</sec>
<sec sec-type="conclusions" id="s6">
<title>Conclusion</title>
<p>This paper proposes BPK-means, an improved K-means algorithm based on a BP neural network. The algorithm uses the predictive capability of the BP neural network to estimate missing data attribute values. After the data set has been completed, a denoising step, again using the BP neural network, removes suspected noise points to keep the clustering results stable; in this way, the problem of missing attribute values caused by inconsistencies in the form and type of collected data is solved. Four sets of experiments verify that the proposed BPK-means algorithm solves the missing-attribute problem and improves clustering accuracy, particularly in multi-view scenarios, where it is more accurate than traditional algorithms. However, on larger data sets the proposed algorithm incurs a high time complexity, so how to reduce the time complexity while further improving accuracy remains to be studied. Handling missing and noisy data more reasonably and effectively is likewise a direction for future improvement of this work.</p>
</sec>
<sec sec-type="data-availability-statement" id="s7">
<title>Data Availability Statement</title>
<p>Publicly available datasets were analyzed in this study. This data can be found at: <ext-link ext-link-type="uri" xlink:href="http://archive.ics.uci.edu/ml/datasets/Iris">http://archive.ics.uci.edu/ml/datasets/Iris</ext-link>.</p>
</sec>
<sec id="s8">
<title>Author Contributions</title>
<p>WW and AY designed the research, performed the experiments, analyzed the data, and wrote the manuscript. YR gave a lot of suggestions in the design of the experiment and the modification of the article. HG performed the experiments. All authors contributed to the article and approved the submitted version.</p>
</sec>
<sec sec-type="COI-statement" id="conf1">
<title>Conflict of Interest</title>
<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
</sec>
</body>
<back>
<ref-list>
<title>References</title>
<ref id="B1">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Adnan</surname> <given-names>R. M.</given-names></name> <name><surname>Khosravinia</surname> <given-names>P.</given-names></name> <name><surname>Karimi</surname> <given-names>B.</given-names></name> <name><surname>Kisi</surname> <given-names>O.</given-names></name></person-group> (<year>2020</year>). <article-title>Prediction of hydraulics performance in drain envelopes using Kmeans based multivariate adaptive regression spline</article-title>. <source>Appl. Soft Comput.</source> <volume>100</volume>:<fpage>107008</fpage>. <pub-id pub-id-type="doi">10.1016/j.asoc.2020.107008</pub-id></citation></ref>
<ref id="B2">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bickel</surname> <given-names>S.</given-names></name> <name><surname>Scheffer</surname> <given-names>T.</given-names></name></person-group> (<year>2004</year>). <article-title>&#x0201C;Multi-view clustering,&#x0201D;</article-title> in <source>Proceedings of the IEEE International Conference on Data Mining</source>, <fpage>19</fpage>&#x02013;<lpage>26</lpage>.</citation></ref>
<ref id="B3">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Brzezi&#x00144;ska</surname> <given-names>A. N.</given-names></name> <name><surname>Hory&#x00144;</surname> <given-names>C.</given-names></name></person-group> (<year>2020</year>). <article-title>Outliers in rules - the comparision of LOF, COF and KMEANS algorithms</article-title>. <source>Proc. Comput. Sci.</source> <volume>176</volume>, <fpage>1420</fpage>&#x02013;<lpage>1429</lpage>. <pub-id pub-id-type="doi">10.1016/j.procs.2020.09.152</pub-id></citation></ref>
<ref id="B4">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Chen</surname> <given-names>J.</given-names></name> <name><surname>Huang</surname> <given-names>G. B.</given-names></name></person-group> (<year>2021</year>). <article-title>Dual distance adaptive multiview clustering</article-title>. <source>Neurocomputing</source> <volume>441</volume>, <fpage>311</fpage>&#x02013;<lpage>322</lpage>. <pub-id pub-id-type="doi">10.1016/j.neucom.2021.01.132</pub-id></citation></ref>
<ref id="B5">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>De Sa</surname> <given-names>V. R.</given-names></name></person-group> (<year>2005</year>). <article-title>&#x0201C;Spectral clustering with two views,&#x0201D;</article-title> in <source>Proceedings of the ICML Workshop on Learning With Multiple Views</source> (<publisher-loc>Bonn</publisher-loc>), <fpage>20</fpage>&#x02013;<lpage>27</lpage>.</citation></ref>
<ref id="B6">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hosseini</surname> <given-names>M.</given-names></name> <name><surname>Azar</surname> <given-names>F. T.</given-names></name></person-group> (<year>2016</year>). <article-title>A new eigenvector selection strategy applied to develop spectral clustering</article-title>. <source>Multidimens. Syst. Signal Process.</source> <volume>28</volume>, <fpage>1227</fpage>&#x02013;<lpage>1248</lpage>. <pub-id pub-id-type="doi">10.1007/s11045-016-0391-6</pub-id></citation></ref>
<ref id="B7">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kanaan-Izquierdo</surname> <given-names>S.</given-names></name> <name><surname>Ziyatdinov</surname> <given-names>A.</given-names></name> <name><surname>Perera-Lluna</surname> <given-names>A.</given-names></name></person-group> (<year>2018</year>). <article-title>Multiview and multifeature spectral clustering using common eigenvectors</article-title>. <source>Pattern Recognit. Lett.</source> <volume>102</volume>, <fpage>30</fpage>&#x02013;<lpage>36</lpage>. <pub-id pub-id-type="doi">10.1016/j.patrec.2017.12.011</pub-id></citation></ref>
<ref id="B8">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kang</surname> <given-names>Z.</given-names></name> <name><surname>Zhao</surname> <given-names>X.</given-names></name> <name><surname>Peng</surname> <given-names>C.</given-names></name> <name><surname>Zhu</surname> <given-names>H.</given-names></name> <name><surname>Zhou</surname> <given-names>J. T.</given-names></name> <name><surname>Peng</surname> <given-names>X.</given-names></name> <etal/></person-group>. (<year>2020</year>). <article-title>Partition level multiview subspace clustering</article-title>. <source>Neural Netw.</source> <volume>122</volume>, <fpage>279</fpage>&#x02013;<lpage>288</lpage>. <pub-id pub-id-type="doi">10.1016/j.neunet.2019.10.010</pub-id><pub-id pub-id-type="pmid">31731045</pub-id></citation></ref>
<ref id="B9">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kaur</surname> <given-names>A.</given-names></name> <name><surname>Pal</surname> <given-names>S. K.</given-names></name> <name><surname>Singh</surname> <given-names>A. P.</given-names></name></person-group> (<year>2019</year>). <article-title>Hybridization of chaos and flower pollination algorithm over K-means for data clustering</article-title>. <source>Appl. Soft Comput.</source> <volume>97</volume>:<fpage>105523</fpage>. <pub-id pub-id-type="doi">10.1016/j.asoc.2019.105523</pub-id></citation></ref>
<ref id="B10">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Li</surname> <given-names>P.</given-names></name> <name><surname>Chen</surname> <given-names>Z.</given-names></name> <name><surname>Yang</surname> <given-names>L. T.</given-names></name> <name><surname>Gao</surname> <given-names>J.</given-names></name> <name><surname>Zhang</surname> <given-names>Q.</given-names></name> <name><surname>Deen</surname> <given-names>M. J.</given-names></name></person-group> (<year>2018</year>). <article-title>An incremental deep convolutional computation model for feature learning on industrial big data</article-title>. <source>IEEE Trans. Indust. Inform.</source> <volume>15</volume>, <fpage>1341</fpage>&#x02013;<lpage>1349</lpage>. <pub-id pub-id-type="doi">10.1109/TII.2018.2871084</pub-id></citation></ref>
<ref id="B11">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Li</surname> <given-names>P.</given-names></name> <name><surname>Chen</surname> <given-names>Z.</given-names></name> <name><surname>Yang</surname> <given-names>L. T.</given-names></name> <name><surname>Zhao</surname> <given-names>L.</given-names></name> <name><surname>Zhang</surname> <given-names>Q.</given-names></name></person-group> (<year>2017</year>). <article-title>A privacy-preserving high-order neuro-fuzzy c-means algorithm with cloud computing</article-title>. <source>Neurocomputing</source> <volume>256</volume>, <fpage>82</fpage>&#x02013;<lpage>89</lpage>. <pub-id pub-id-type="doi">10.1016/j.neucom.2016.08.135</pub-id></citation></ref>
<ref id="B12">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ma</surname> <given-names>J.</given-names></name> <name><surname>Zhang</surname> <given-names>Y.</given-names></name> <name><surname>Zhang</surname> <given-names>L.</given-names></name></person-group> (<year>2021</year>). <article-title>Discriminative subspace matrix factorization for multiview data clustering</article-title>. <source>Pattern Recognit.</source> <volume>111</volume>:<fpage>107676</fpage>. <pub-id pub-id-type="doi">10.1016/j.patcog.2020.107676</pub-id></citation></ref>
<ref id="B13">
<citation citation-type="web"><person-group person-group-type="author"><name><surname>Ma</surname> <given-names>W.</given-names></name> <name><surname>Li</surname> <given-names>Q.</given-names></name> <name><surname>Li</surname> <given-names>J.</given-names></name> <name><surname>Ding</surname> <given-names>L.</given-names></name> <name><surname>Yu</surname> <given-names>Q.</given-names></name></person-group> (<year>2020</year>). <article-title>A method for weighing broiler chickens using improved amplitude-limiting filtering algorithm and BP neural networks</article-title>. <source>Informat. Proc. Agric</source>. <pub-id pub-id-type="doi">10.1016/j.inpa.2020.07.001</pub-id> Available online at: <ext-link ext-link-type="uri" xlink:href="https://www.sciencedirect.com/science/article/pii/S2214317320301888">https://www.sciencedirect.com/science/article/pii/S2214317320301888</ext-link></citation></ref>
<ref id="B14">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ma</surname> <given-names>X.</given-names></name> <name><surname>Guan</surname> <given-names>Y.</given-names></name> <name><surname>Mao</surname> <given-names>R.</given-names></name> <name><surname>Zheng</surname> <given-names>S.</given-names></name> <name><surname>Wei</surname> <given-names>Q.</given-names></name></person-group> (<year>2021</year>). <article-title>Modeling of lead removal by living <italic>Scenedesmus obliquus</italic> using backpropagation (BP) neural network algorithm</article-title>. <source>Environ. Technol. Innovat.</source> <volume>22</volume>:<fpage>101410</fpage>. <pub-id pub-id-type="doi">10.1016/j.eti.2021.101410</pub-id></citation></ref>
<ref id="B15">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Murtagh</surname> <given-names>F.</given-names></name> <name><surname>Pierre</surname> <given-names>L.</given-names></name></person-group> (<year>2014</year>). <article-title>Ward's hierarchical agglomerative clustering method: which algorithms implement Ward's criterion?</article-title> <source>J. Classif</source>. <volume>31</volume>, <fpage>274</fpage>&#x02013;<lpage>295</lpage>. <pub-id pub-id-type="doi">10.1007/s00357-014-9161-z</pub-id></citation></ref>
<ref id="B16">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Rashidi</surname> <given-names>R.</given-names></name> <name><surname>Khamforoosh</surname> <given-names>K.</given-names></name> <name><surname>Sheikhahmadi</surname> <given-names>A.</given-names></name></person-group> (<year>2020</year>). <article-title>An analytic approach to separate users by introducing new combinations of initial centers of clustering</article-title>. <source>Phys. A Stat. Mech. Applic.</source> <volume>551</volume>:<fpage>124185</fpage>. <pub-id pub-id-type="doi">10.1016/j.physa.2020.124185</pub-id></citation></ref>
<ref id="B17">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ruan</surname> <given-names>X.</given-names></name> <name><surname>Zhu</surname> <given-names>Y.</given-names></name> <name><surname>Li</surname> <given-names>J.</given-names></name> <name><surname>Cheng</surname> <given-names>Y.</given-names></name></person-group> (<year>2020</year>). <article-title>Predicting the citation counts of individual papers via a BP neural network</article-title>. <source>J. Informet</source>. <volume>14</volume>:<fpage>101039</fpage>. <pub-id pub-id-type="doi">10.1016/j.joi.2020.101039</pub-id></citation></ref>
<ref id="B18">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Sang</surname> <given-names>B.</given-names></name></person-group> (<year>2020</year>). <article-title>Application of genetic algorithm and BP neural network in supply chain finance under information sharing</article-title>. <source>J. Comput. Appl. Math.</source> <volume>384</volume>:<fpage>113170</fpage>. <pub-id pub-id-type="doi">10.1016/j.cam.2020.113170</pub-id></citation></ref>
<ref id="B19">
<citation citation-type="web"><person-group person-group-type="author"><collab>UCI Iris</collab></person-group> (<year>2021</year>). Available online at: <ext-link ext-link-type="uri" xlink:href="http://archive.ics.uci.edu/ml/datasets/Iris">http://archive.ics.uci.edu/ml/datasets/Iris</ext-link></citation></ref>
<ref id="B20">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ventura</surname> <given-names>C.</given-names></name> <name><surname>Varas</surname> <given-names>D.</given-names></name> <name><surname>Vilaplana</surname> <given-names>V.</given-names></name> <name><surname>Giro-i-Nieto</surname> <given-names>X.</given-names></name> <name><surname>Marques</surname> <given-names>F.</given-names></name></person-group> (<year>2019</year>). <article-title>Multiresolution co-clustering for uncalibrated multiview segmentation</article-title>. <source>Signal Process. Image Commun.</source> <volume>76</volume>:<fpage>151</fpage>&#x02013;<lpage>166</lpage>. <pub-id pub-id-type="doi">10.1016/j.image.2019.04.010</pub-id></citation></ref>
<ref id="B21">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wu</surname> <given-names>H.</given-names></name> <name><surname>Jin</surname> <given-names>Q.</given-names></name> <name><surname>Zhang</surname> <given-names>C.</given-names></name> <name><surname>Guo</surname> <given-names>H.</given-names></name></person-group> (<year>2019</year>). <article-title>A selective mirrored task based fault tolerance mechanism for big data application using cloud</article-title>. <source>Wirel. Commun. Mobile Comput.</source> <volume>2019</volume>:<fpage>4807502</fpage>. <pub-id pub-id-type="doi">10.1155/2019/4807502</pub-id></citation></ref>
<ref id="B22">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wu</surname> <given-names>L.</given-names></name> <name><surname>Peng</surname> <given-names>Y.</given-names></name> <name><surname>Fan</surname> <given-names>J.</given-names></name> <name><surname>Wang</surname> <given-names>Y.</given-names></name> <name><surname>Huang</surname> <given-names>G.</given-names></name></person-group> (<year>2020</year>). <article-title>A novel kernel extreme learning machine model coupled with K-means clustering and firefly algorithm for estimating monthly reference evapotranspiration in parallel computation</article-title>. <source>Agric. Water Manage.</source> <volume>245</volume>:<fpage>106624</fpage>. <pub-id pub-id-type="doi">10.1016/j.agwat.2020.106624</pub-id></citation></ref>
<ref id="B23">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wu</surname> <given-names>W.</given-names></name> <name><surname>Huang</surname> <given-names>H.</given-names></name></person-group> (<year>2015</year>). <article-title>Research on DP-DB Scan clustering algorithm based on differential privacy protection</article-title>. <source>Comput. Eng. Sci.</source> <volume>37</volume>, <fpage>830</fpage>&#x02013;<lpage>834</lpage>.</citation></ref>
<ref id="B24">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Yang</surname> <given-names>Z.</given-names></name> <name><surname>Mao</surname> <given-names>L.</given-names></name> <name><surname>Yan</surname> <given-names>B.</given-names></name> <name><surname>Wang</surname> <given-names>J.</given-names></name> <name><surname>Gao</surname> <given-names>W.</given-names></name></person-group> (<year>2020</year>). <article-title>Performance analysis and prediction of asymmetric two-level priority polling system based on BP neural network</article-title>. <source>Appl. Soft Comput.</source> <volume>99</volume>:<fpage>106880</fpage>. <pub-id pub-id-type="doi">10.1016/j.asoc.2020.106880</pub-id></citation></ref>
<ref id="B25">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Yu</surname> <given-names>X.</given-names></name> <name><surname>Lei</surname> <given-names>Y.</given-names></name> <name><surname>Yue</surname> <given-names>S.</given-names></name> <name><surname>Wang</surname> <given-names>R.</given-names></name></person-group> (<year>2015</year>). <article-title>Research on intuitionistic fuzzy kernel clustering algorithm based on particle swarm optimization</article-title>. <source>J. Commun</source>. <volume>36</volume>, <fpage>78</fpage>&#x02013;<lpage>84</lpage>.</citation></ref>
<ref id="B26">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zeng</surname> <given-names>P.</given-names></name> <name><surname>Sun</surname> <given-names>F.</given-names></name> <name><surname>Liu</surname> <given-names>Y.</given-names></name> <name><surname>Wang</surname> <given-names>Y.</given-names></name> <name><surname>Li</surname> <given-names>G.</given-names></name> <name><surname>Che</surname> <given-names>Y.</given-names></name></person-group> (<year>2021</year>). <article-title>Mapping future droughts under global warming across China: a combined multi-timescale meteorological drought index and SOM-Kmeans approach</article-title>. <source>Weather Clim. Extrem.</source> <volume>31</volume>:<fpage>100304</fpage>. <pub-id pub-id-type="doi">10.1016/j.wace.2021.100304</pub-id></citation></ref>
<ref id="B27">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zhou</surname> <given-names>H.</given-names></name> <name><surname>Yin</surname> <given-names>H.</given-names></name> <name><surname>Li</surname> <given-names>Y.</given-names></name> <name><surname>Chai</surname> <given-names>Y.</given-names></name></person-group> (<year>2020</year>). <article-title>Multiview clustering via exclusive non-negative subspace learning and constraint propagation</article-title>. <source>Informat. Sci.</source> <volume>552</volume>:<fpage>102</fpage>&#x02013;<lpage>117</lpage>. <pub-id pub-id-type="doi">10.1016/j.ins.2020.11.037</pub-id></citation></ref>
</ref-list>
</back>
</article> 
