ORIGINAL RESEARCH article

Front. Phys., 28 July 2023

Sec. Optics and Photonics

Volume 11 - 2023 | https://doi.org/10.3389/fphy.2023.1240555

Sensor data reduction with novel local neighborhood information granularity and rough set approach

  • 1. School of Information Science and Technology, Nantong University, Nantong, China

  • 2. Department of Respiratory Medicine, The Sixth People’s Hospital of Nantong, Affiliated Nantong Hospital of Shanghai University, Nantong, China

  • 3. Jiangsu Vocational College of Business, Nantong, China

  • 4. School of Transportation and Civil Engineering, Nantong University, Nantong, China


Abstract

Data description and data reduction are important issues in sensor data acquisition, and rough-set-based models can be applied to it. Data description by rough set theory relies on information granularity, approximation methods, and attribute reduction. The distribution of actual data is complex and changeable, and current models lack the ability to distinguish different data regions, leading to decision-making errors. Based on the above, this paper proposes a neighborhood decision rough set based on justifiable granularity. Firstly, the rough affiliation of data points in different cases is given separately according to the samples in the neighborhood. Secondly, the original labels are rectified using pseudo-labels assigned to the detected label-noise data. New judgment criteria are proposed based on justifiable granularity, and the optimal neighborhood radius is obtained by the particle swarm optimization algorithm. Finally, attribute reduction is performed on the basis of the risky decision cost. The experimental results demonstrate that the method can effectively handle complex data.

1 Introduction

In sensor data processing systems, researchers are often confronted with large amounts of multimodal and complex sensing data. To deal with these sensing data, data description and data reduction are pivotal processes. For data acquisition, rough-set-based models have been considered effective approaches in recent years [1, 2]. Rough set theory [3] was proposed in 1982 by Pawlak as a mathematical tool for analyzing and handling imprecise, inconsistent, and incomplete information. Traditional rough set theory lacks fault tolerance and does not take errors in the classification process into account at all. Pawlak et al. proposed the probabilistic rough set model to improve rough set theory using probabilistic thresholds [4]. Yao and Wong [5] introduced a probabilistic rough set model grounded in Bayesian decision theory. Further, Yao proposed a three-way decision theory on the basis of decision-theoretic rough sets [6].

Currently, many scholars have been improving the research on decision rough sets from different aspects. Qian et al. [7] proposed the theoretical framework of the local rough set. Wang et al. [8] proposed the local neighborhood rough set, which integrates the neighborhood rough set and the local rough set. Sun et al. [9] combined Lebesgue and entropy measures and proposed a novel attribute reduction approach. Yang et al. [10] introduced pseudo-labels into rough sets and proposed a pseudo-label neighborhood relationship, which can distinguish samples by distance measures and pseudo-labels.

As mentioned above, scholars have improved the neighborhood decision rough set approach from multiple perspectives. However, for complex sensor data processing, neighborhood decision rough set methods still face some challenges. For example, in practical applications, the distribution of complex data is often uneven. In addition, the presence of abnormal data can greatly weaken the performance of rough set models, which cannot correctly classify abnormal data points. To address these issues, this paper proposes a local strategy to improve the calculation of rough membership. Additionally, the neighborhood radius of each sample is optimized by particle swarm optimization (the PSO algorithm) to offer the optimal neighborhood granularity for the model and to carry out attribute reduction.

The remainder of this paper is structured as follows. Section 2 introduces the relevant basic theories. Section 3 presents a decision rough calculation method based on justifiable granularity. Six datasets are chosen in Section 4 to evaluate the suggested methodology. Section 5 summarizes the full text.

2 Preliminary notion

2.1 Neighborhood relation and rough set

The construction of equivalence relations for numerical type data first requires the discretization of the original data, and this method will inevitably cause the loss of information. On the basis of neighborhood relations, a neighborhood rough set model was proposed by Hu et al. [11–13].

Assume that an information system is expressed as $IS = \langle U, AT \rangle$. Among them, $U = \{x_1, x_2, \ldots, x_n\}$ represents a non-empty finite set of objects, and $AT$ stands for the set of attributes, containing the conditional attribute set $C$ and the decision attribute set $D$.

Definition 1. Suppose the information system is $IS = \langle U, AT \rangle$, $B \subseteq C$, $x \in U$, and $\delta \geq 0$; the $\delta$-neighborhood of $x$ in $B$ is defined as

$\delta_B(x) = \{y \in U \mid dis_B(x, y) \leq \delta\}$,

where $dis(\cdot)$ represents the distance between any two objects, commonly the Euclidean distance.

Definition 2. Suppose the information system is $IS = \langle U, AT \rangle$, $B \subseteq C$, $X \subseteq U$, $x \in U$; the rough affiliation of $x$ to $X$ in $B$ is defined as

$\mu_X^B(x) = \Pr(X \mid \delta_B(x)) = \dfrac{|\delta_B(x) \cap X|}{|\delta_B(x)|}$,

where $\Pr(X \mid \delta_B(x))$ represents the conditional probability of classification and $|\cdot|$ represents the number of elements in a set.

Definition 3. Suppose the information system is $IS = \langle U, AT \rangle$; the lower and upper approximations of a decision class $X \subseteq U$ in $B$ are defined as

$\underline{N}_B(X) = \{x \in U \mid \delta_B(x) \subseteq X\}$, $\overline{N}_B(X) = \{x \in U \mid \delta_B(x) \cap X \neq \emptyset\}$.

The positive, negative, and boundary regions of $X$ in $B$ are defined as

$POS_B(X) = \underline{N}_B(X)$, $NEG_B(X) = U - \overline{N}_B(X)$, $BND_B(X) = \overline{N}_B(X) - \underline{N}_B(X)$.

From the above definitions, it can be seen that the conditions under which the neighborhood rough set takes acceptance and rejection decisions are too strict and lack fault tolerance. Only elements that are classified completely correctly are placed in the positive region, and only elements that are completely misclassified are placed in the negative region. As a result of such a definition, the boundary region becomes too large.
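The neighborhood construction of Definition 1 and the approximations of Definition 3 can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation; the function and variable names are ours:

```python
import numpy as np

def delta_neighborhood(X, i, delta):
    """Indices of all samples within Euclidean distance delta of sample i."""
    dists = np.linalg.norm(X - X[i], axis=1)
    return np.where(dists <= delta)[0]

def approximations(X, y, target, delta):
    """Lower and upper approximations of the decision class `target`."""
    lower, upper = [], []
    target_set = set(np.where(y == target)[0])
    for i in range(len(X)):
        nb = set(delta_neighborhood(X, i, delta))
        if nb <= target_set:      # neighborhood entirely inside the class
            lower.append(i)
        if nb & target_set:       # neighborhood intersects the class
            upper.append(i)
    return lower, upper

# Two well-separated clusters: every point is classified consistently,
# so lower and upper approximations of class 0 coincide.
X = np.array([[0.0], [0.1], [5.0], [5.1]])
y = np.array([0, 0, 1, 1])
lower, upper = approximations(X, y, target=0, delta=0.5)
```

With noisy or overlapping data, the lower approximation shrinks while the upper approximation grows, which is exactly the oversized boundary region criticized above.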

2.2 Rough set with neighborhood decision

The decision-theoretic rough set model put forth by Yao et al. [5] cannot directly process numerical data. To address this weakness, Li et al. [14] proposed a neighborhood-based decision-theoretic rough set model by integrating the neighborhood rough set and the decision rough set.

The decision rough set has two important thresholds, $\alpha$ and $\beta$. When different decision actions are taken, different losses occur. $\lambda_{PP}$, $\lambda_{BP}$, $\lambda_{NP}$ respectively represent the costs of taking the actions $a_P$, $a_B$, $a_N$ when the object belongs to $X$; $\lambda_{PN}$, $\lambda_{BN}$, $\lambda_{NN}$ respectively represent the costs of these actions when the object does not belong to $X$. Through cost risk analysis, the solution formulas for $\alpha$ and $\beta$ are given [5] as follows:

$\alpha = \dfrac{\lambda_{PN} - \lambda_{BN}}{(\lambda_{PN} - \lambda_{BN}) + (\lambda_{BP} - \lambda_{PP})}$, $\beta = \dfrac{\lambda_{BN} - \lambda_{NN}}{(\lambda_{BN} - \lambda_{NN}) + (\lambda_{NP} - \lambda_{BP})}$.

In addition, Yao proposed three decision rules based on the decision rough set model [5], namely the P rule, the N rule, and the B rule.

Definition 4. Suppose the information system $IS = \langle U, AT \rangle$, $B \subseteq C$, $X \subseteq U$; then the P, B, and N rules of $X$ on the $\delta$-neighborhood under attribute set $B$ are defined as:

P rule: if $\Pr(X \mid \delta_B(x)) \geq \alpha$, then $x \in POS_B(X)$;

B rule: if $\beta < \Pr(X \mid \delta_B(x)) < \alpha$, then $x \in BND_B(X)$;

N rule: if $\Pr(X \mid \delta_B(x)) \leq \beta$, then $x \in NEG_B(X)$.
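The threshold formulas of [5] and the three rules of Definition 4 translate directly into code. The following sketch uses hypothetical toy loss values of our own choosing:

```python
def dtrs_thresholds(l_pp, l_bp, l_np, l_pn, l_bn, l_nn):
    """Thresholds alpha and beta derived from the six loss values [5]."""
    alpha = (l_pn - l_bn) / ((l_pn - l_bn) + (l_bp - l_pp))
    beta = (l_bn - l_nn) / ((l_bn - l_nn) + (l_np - l_bp))
    return alpha, beta

def three_way_rule(pr, alpha, beta):
    """Apply the P/B/N rules to a conditional probability pr."""
    if pr >= alpha:
        return "POS"   # P rule: accept
    if pr <= beta:
        return "NEG"   # N rule: reject
    return "BND"       # B rule: defer

# Toy losses: misclassification costs 10, deferral costs 2, correct costs 0.
alpha, beta = dtrs_thresholds(0, 2, 10, 10, 2, 0)
```

With these losses, $\alpha = 0.8$ and $\beta = 0.2$, so a sample with $\Pr(X \mid \delta_B(x)) = 0.5$ is deferred to the boundary region rather than forced into an acceptance or rejection decision, which is the fault tolerance the classical model lacks.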

3 Neighborhood decision rough set model based on justifiable granularity

To solve the problems discussed above, this article first introduces the local neighborhood rough set model to eliminate the interference of some noise data on the approximate set.

3.1 Local rough neighborhood decision model

Definition 5. Suppose the information system $IS = \langle U, AT \rangle$, $B \subseteq C$, $X \subseteq U$; then the lower and upper approximations of $X$ under attribute set $B$ in the $\delta$-neighborhood-based local rough set are defined as

$\underline{N}'_B(X) = \{x \in X \mid \Pr(X \mid \delta_B(x)) \geq \alpha\}$, $\overline{N}'_B(X) = \{x \in X \mid \Pr(X \mid \delta_B(x)) > \beta\}$.

The positive, negative, and boundary regions of $X$ in $B$ are defined as in Definition 3, with these local approximations in place of the global ones.

The most significant difference between the local neighborhood rough set model and the neighborhood rough set model is the search scope used when computing the upper and lower approximation sets. In the neighborhood rough set model, finding the approximation set of each decision class requires traversing all data points in the dataset. In the local neighborhood rough set model, by contrast, the focus is on the data points of the same class, and only the data points of the same decision class need to be traversed. This greatly reduces the computational effort and increases the computational speed [14]. The model thus not only improves computational efficiency but also eliminates the interference of noisy points.

In addition, the traditional method of calculating rough affiliation does not take the complexity of the data into account. In this paper, the calculation of the affiliation degree is improved as follows:

Suppose $DS = \langle U, C \cup D \rangle$ is a decision system, $d(x)$ is the decision attribute value of each object $x \in U$ under the decision attribute set $D$, the neighborhood of $x$ is expressed as $\delta(x)$, and the set of decision values of the information system is $\{1, 2, \ldots, m\}$. Now suppose that the decision value of the sample $x$ to be investigated is $q$.

  • (1) If (N represents a small positive integer), this paper sets the rough membership degree to .

  • (2) If , , and , this paper sets the rough membership degree to , where represents the initial probability value, N represents the minimum neighborhood size, and s represents the search step.

  • (3) If , , and , the rough membership degree is set to .

Depending on where the data points in the neighborhood information granule are situated, the above rules are used to define the rough membership function for each category of data points.

Based on the above discussion, this paper designs Algorithm 1 to calculate the upper and lower approximation sets and to identify anomalous data. Unlike the classical method, which only considers the upper and lower approximation sets, Algorithm 1 identifies label-noise points and outlier points based on the neighborhood information, making the upper and lower approximation sets more accurate. It also appends category information to the label-noise data, which is referred to as pseudo-labeling in this paper.

Algorithm 1

  • Input: , neighborhood radius , cost matrix .

  • Output: lower approximate , upper approximate , outlier points set , labeled noise points set , and predicted pseudo-labels set .

  • 1:  Segment the entire dataset by label categories .

  • 2:  Using the cost matrix, the threshold values α and β are calculated according to Eqs 8, 9.

  • 3:  For

  • 4:   Compute the -neighborhood of on the conditional attribute set C and obtain the label category .

  • 5:  End

  • 6:  If

  • 7:   .

  • 8:   .

  • 9:  End

  • 10:  If &

  • 11:   .

  • 12:   .

  • 13:   

  • 14:  End

  • 15:  If

  • 16:   .

  • 17:  End

  • 18:  If

  • 19:   .

  • 20:   If

  • 21:    .

  • 22:   End

  • 23:  End

  • 24:  Return , , , , .


Algorithm 1 detects outlier points and label-noise points, and also identifies data points in high-density regions. In fact, some samples are not always outliers or noise; their status sometimes depends on the choice of neighborhood radius.
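The core of Algorithm 1 can be sketched as follows. Since the paper's case-specific membership values are not reproduced here, this sketch uses a simple majority-vote criterion of our own as a stand-in: a nearly empty neighborhood marks an outlier, and a neighborhood whose majority label disagrees with a sample's own label marks label noise and supplies the pseudo-label. The parameter `min_size` plays the role of the small positive integer N in the paper:

```python
import numpy as np

def detect_noise_and_outliers(X, y, delta, min_size=2):
    """Flag each sample as outlier or label noise from its delta-neighborhood."""
    outliers, noise, pseudo = [], [], {}
    for i in range(len(X)):
        nb = np.where(np.linalg.norm(X - X[i], axis=1) <= delta)[0]
        if len(nb) <= min_size:
            outliers.append(i)            # nearly empty neighborhood
            continue
        labels, counts = np.unique(y[nb], return_counts=True)
        majority = labels[np.argmax(counts)]
        if majority != y[i]:              # neighbors disagree with the label
            noise.append(i)
            pseudo[i] = majority          # pseudo-label from the majority vote
    return outliers, noise, pseudo

# Sample 1 carries a wrong label inside the cluster {0, 1, 2}; sample 3 is isolated.
X = np.array([[0.0], [0.1], [0.2], [10.0]])
y = np.array([0, 1, 0, 0])
outliers, noise, pseudo = detect_noise_and_outliers(X, y, delta=0.5)
```

The pseudo-labels returned here are exactly the category information that Algorithm 2 later uses to rectify the decision system.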

3.2 Selection of neighborhood information granularity based on justifiable granularity

According to the above-mentioned rough set model, a smaller neighborhood radius contains very little information, while a larger radius may cause the approximation set to be empty. This paper therefore introduces the justifiable granularity criterion [15, 16]. Two functions are generally used in the construction of information granules, namely the coverage function and the specificity function.

The coverage function describes how much data is contained in the constructed information granule. This paper designs the coverage index function $cov(\delta)$ based on the quantity $|POS(\delta)| - |BND(\delta)|$, i.e., the difference between the sizes of the positive and boundary regions.

The coverage index function mentioned above is considered from two perspectives, namely the neighborhood information granularity and the approximation sets. In terms of the specificity criterion, the smaller the neighborhood radius, the better. Therefore, the specificity function can be designed as a decreasing function of the radius, such as $sp(\delta) = 1 - \delta$.

Obviously, the two criteria are contradictory. Therefore, the performance function to be optimized can be written as the product of specificity and coverage: $V(\delta) = cov(\delta) \cdot sp(\delta)$.

In this way, the optimal neighborhood radius $\delta_{opt} = \arg\max_{\delta} V(\delta)$ can be obtained. To further elaborate, the cumulative performance can be represented in terms of the decision partition set as $V = V_1 + V_2 + \cdots + V_m$, where $V_1, V_2, \ldots, V_m$ correspond to the optimized value of each decision class.

To achieve the optimal value $V$ and the corresponding optimal neighborhood radius, this paper uses the PSO algorithm for optimization [17, 18], an evolutionary algorithm based on swarm intelligence proposed by Kennedy and Eberhart in 1995. The PSO algorithm is used to intelligently optimize the selection of neighborhoods and to select an appropriate granularity, thereby improving the accuracy of decision making.
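A one-dimensional PSO over the radius can be sketched as below. The paper does not reproduce its exact update equations, so this is a generic PSO with a decreasing inertia schedule in the spirit of [20]; the objective used in the example is a toy stand-in of our own for $V(\delta) = cov(\delta) \cdot sp(\delta)$, maximized at $\delta = 0.5$:

```python
import numpy as np

rng = np.random.default_rng(0)

def pso_radius(objective, lo, hi, n_particles=20, n_iter=50,
               c1=1.5, c2=1.5, vmax=0.5, w_start=0.9, w_end=0.4):
    """Minimal 1-D PSO maximizing `objective` over [lo, hi]."""
    pos = rng.uniform(lo, hi, n_particles)
    vel = np.zeros(n_particles)
    pbest = pos.copy()
    pbest_val = np.array([objective(p) for p in pos])
    gbest = pbest[np.argmax(pbest_val)]
    for k in range(n_iter):
        # Differentially decreasing inertia weight, cf. [20].
        w = w_start - (w_start - w_end) * (k / n_iter) ** 2
        r1, r2 = rng.random(n_particles), rng.random(n_particles)
        vel = np.clip(w * vel + c1 * r1 * (pbest - pos)
                      + c2 * r2 * (gbest - pos), -vmax, vmax)
        pos = np.clip(pos + vel, lo, hi)
        vals = np.array([objective(p) for p in pos])
        better = vals > pbest_val
        pbest[better], pbest_val[better] = pos[better], vals[better]
        gbest = pbest[np.argmax(pbest_val)]
    return gbest

# Toy stand-in for V(delta): maximized at delta = 0.5.
best_delta = pso_radius(lambda d: d * (1.0 - d), 0.0, 1.0)
```

In the actual method, the objective would evaluate $cov(\delta) \cdot sp(\delta)$ by running the neighborhood model at each candidate radius, which is far more expensive per evaluation than this toy function.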

Moreover, to update the dataset, one can utilize the noise identification strategy along with the set of predicted pseudo-decision labels. The main steps are described in Algorithm 2.

Algorithm 2

  •  1: Obtain the optimal neighborhood radius using PSO optimization algorithm;

  •  2: Execute Algorithm 1 to obtain the approximation set, the set of outlier points, the set of labeled noise points, and the pseudo-tags of labeled noise points;

  •  3: Updating decision labels for noisy data based on pseudo-labels;

  •  4: Update the approximation set using the modified decision system.


3.3 Attribute reduction based on neighborhood decision rough set model

In this paper, the risky decision cost is used to reduce attributes. It derives from the Bayesian decision process and is comparable to the classical rough set. The risky decision costs for the P, B, and N rules can be separately expressed as:

$COST_P = \sum_{x \in POS_B(X)} \left[\lambda_{PP}\Pr(X \mid \delta_B(x)) + \lambda_{PN}\left(1 - \Pr(X \mid \delta_B(x))\right)\right]$,

$COST_B = \sum_{x \in BND_B(X)} \left[\lambda_{BP}\Pr(X \mid \delta_B(x)) + \lambda_{BN}\left(1 - \Pr(X \mid \delta_B(x))\right)\right]$,

$COST_N = \sum_{x \in NEG_B(X)} \left[\lambda_{NP}\Pr(X \mid \delta_B(x)) + \lambda_{NN}\left(1 - \Pr(X \mid \delta_B(x))\right)\right]$.

As discussed above, the total risky decision cost over all decision rules can be obtained as $COST = COST_P + COST_B + COST_N$.

Obviously, the greater the reduction in $COST$ that an attribute brings, the greater the significance of that attribute.

Definition 6. Suppose the information system $IS = \langle U, AT \rangle$, $B \subseteq C$, $a \in C - B$; the significance of attribute $a$ is defined as $Sig(a, B, D) = COST_B - COST_{B \cup \{a\}}$.

A forward-search scheme based on the neighborhood decision rough set is designed to achieve the optimal reduct. Its specific steps are shown in Algorithm 3.

Algorithm 3

  •  1:  RED = ∅.

  •  2:  For

  •  3:   Calculate .

  •  4:  End

  •  5:  Select which satisfies .

  •  6:  If

  •  7:   .

  •  8:  Else

  •  9:   Break.

  •  10:  End

  •  11:  Return RED

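The greedy forward search of Algorithm 3 can be sketched as follows. The cost evaluation is abstracted into a caller-supplied function, and the toy cost used in the example is hypothetical, chosen only to make the selection behavior visible:

```python
def forward_reduction(attributes, cost, eps=1e-6):
    """Greedy forward search: repeatedly add the attribute whose inclusion
    lowers the risky decision cost the most, until no candidate helps."""
    red = []
    current = cost(red)
    while True:
        candidates = [a for a in attributes if a not in red]
        if not candidates:
            break
        # Significance of each candidate: the cost decrease it brings.
        sig = {a: current - cost(red + [a]) for a in candidates}
        best = max(sig, key=sig.get)
        if sig[best] <= eps:          # no attribute reduces the cost further
            break
        red.append(best)
        current -= sig[best]
    return red

# Toy cost: attribute 'a' removes 5 units of cost, 'b' removes 3, 'c' none.
toy_cost = lambda s: 10 - 5 * ('a' in s) - 3 * ('b' in s)
red = forward_reduction(['a', 'b', 'c'], toy_cost)
```

In the actual method, `cost` would compute the total risky decision cost $COST$ of the neighborhood decision rough set restricted to the candidate attribute subset.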

3.4 Evaluation index

To assess the effectiveness of the suggested approach, this article adopts the following two evaluation indicators: the approximation quality, based on the lower approximation, and the neighborhood number, based on the information granularity.

Approximation quality (AQ): Given a decision information system $DS = \langle U, C \cup D \rangle$, $A \subseteq C$, the approximation quality of $A$ relative to $D$ [19] is defined as

$AQ(A, D) = \dfrac{\left|\bigcup_{X \in U/D} \underline{N}_A(X)\right|}{|U|}$.

This value is the ratio of the number of objects correctly classified by the conditional attribute set $A$ to the total number of objects in the decision information system. The performance of the proposed granularity description is thus evaluated in terms of the lower approximation.

Neighborhood number (NN): For $x \in U$ with decision label $q$, suppose $\delta_q(x)$ is the set of data points with decision label $q$ in the neighborhood of $x$. Therefore, the numbers of same-label and different-label data points in the neighborhood can be described as:

A larger value of NN indicates that the information granule provides greater information value to the decision maker and a more reasonable granularity.
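The AQ indicator is straightforward to compute: a sample contributes to the union of the lower approximations exactly when every point in its neighborhood shares its decision label. A minimal sketch (our own naming, toy data):

```python
import numpy as np

def approximation_quality(X, y, delta):
    """AQ: fraction of samples whose delta-neighborhood lies entirely
    within their own decision class (union of the lower approximations)."""
    consistent = 0
    for i in range(len(X)):
        nb = np.where(np.linalg.norm(X - X[i], axis=1) <= delta)[0]
        if np.all(y[nb] == y[i]):
            consistent += 1
    return consistent / len(X)

# Clean clusters give AQ = 1; one flipped label drags two samples out of
# the lower approximations and halves the score.
X = np.array([[0.0], [0.1], [5.0], [5.1]])
aq_clean = approximation_quality(X, np.array([0, 0, 1, 1]), delta=0.5)
aq_noisy = approximation_quality(X, np.array([0, 1, 1, 1]), delta=0.5)
```

This sensitivity to flipped labels is precisely why AQ of the traditional model degrades as the noise ratio grows in the experiments of Section 4.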

4 Experiment analysis

In this section, six UCI datasets are utilized to illustrate the feasibility and validity of the suggested methodology. Table 1 describes the relevant information of the datasets.

TABLE 1

No. | Datasets                 | Sample | Attribute | Class
1   | Banknote Authentication  | 1372   | 5         | 2
2   | Cardiotocography         | 2126   | 23        | 10
3   | Glass Identification     | 214    | 10        | 7
4   | Ionosphere               | 351    | 34        | 2
5   | Sonar                    | 208    | 60        | 2
6   | WDBC                     | 569    | 31        | 2

Dataset description.

The PSO algorithm parameters are set as follows: the particle swarm size is initialized to 300; a maximum of 100 iterations is permitted; the individual experience learning factor is , the social experience learning factor is , the maximum particle velocity is 0.5, and the allowable error is set to 0.1. To assess the effect of the inertia weight $w$, a linearly differentially decreasing inertia weight [20] is used, expressed as

$w(k) = w_{start} - (w_{start} - w_{end}) \cdot \left(\dfrac{k}{k_{max}}\right)^2$,

where $w_{start}$ represents the initial inertia weight, $w_{end}$ represents the inertia weight at the maximum iteration, $k$ represents the current iteration number, and $k_{max}$ is the maximum number of iterations.

Figure 1 shows the performance of AQ and NN, respectively. The neighborhood decision rough set model based on justifiable granularity proposed in this paper is abbreviated as JGNDTRS, and NDTRS stands for the traditional neighborhood decision rough set. Each subfigure corresponds to a dataset, with various noise ratios on the x-axis. It can be seen intuitively from the figure that as the noise ratio increases, the approximation quality and NN of NDTRS both show a downward trend. Across the various noise ratios, JGNDTRS obtains the best and relatively stable values of AQ and NN on all datasets. Furthermore, JGNDTRS performs remarkably well in identifying anomalous data such as data points in high-density and sparse-density regions as well as label-noise points.

FIGURE 1

Figure 2 shows the comparison of the cost of JGNDTRS and NDTRS when performing attribute reduction. Each subplot represents a dataset, with various Universe sizes on the x-axis. On closer observation, we can conclude that the decision cost of both JGNDTRS and NDTRS shows a decreasing trend as the size of the Universe increases. In each dataset, the decision cost of JGNDTRS is always lower than that of NDTRS, regardless of the Universe size. This indicates that JGNDTRS achieves superior performance with lower cost when performing attribute reduction.

FIGURE 2

5 Conclusion

The proposed neighborhood decision rough set model compensates for the lack of fault tolerance of classical rough sets. However, existing models still face challenges when dealing with complex data. In this paper, we propose a neighborhood decision rough set model based on justifiable granularity. Firstly, the calculation of rough affiliation is improved according to the number of data points in the neighborhood and the corresponding decision label categories. Secondly, pseudo-labels are provided for the detected noisy data points in order to rectify the original labels. A justifiable granularity criterion is introduced, and the optimal neighborhood radius is obtained by the PSO algorithm. Finally, the risky decision cost is used for attribute reduction. The experimental results demonstrate that the neighborhood decision rough set model based on justifiable granularity performs well in identifying abnormal data points and can enhance classification performance. In future work, attribute reduction for the neighborhood decision rough set based on justifiable granularity will be further investigated.

Statements

Data availability statement

The original contributions presented in the study are included in the article/supplementary material, further inquiries can be directed to the corresponding authors.

Author contributions

The idea was proposed by PG and HJ; XF and XM simulated the algorithm, wrote the paper, and polished the English; TC and YS analysed the data and designed the experiments. All authors contributed to the article and approved the submitted version.

Funding

This work was supported by the National Natural Science Foundation of China under Grant 62006128 and the Jiangsu Innovation and Entrepreneurship Program.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

References

  • 1.

    Liu J, Lin Y, Du J, Zhang H, Chen Z, Zhang J. ASFS: A novel streaming feature selection for multi-label data based on neighborhood rough set. Appl Intell (2023) 53:1707–24. 10.1007/s10489-022-03366-x

  • 2.

    Wang W, Guo M, Han T, Ning S. A novel feature selection method considering feature interaction in neighborhood rough set. Intell Data Anal (2023) 27:345–59. 10.3233/IDA-216447

  • 3.

    Pawlak Z. Rough sets. Int J Parallel Program (1982) 11:341–56. 10.1007/BF01001956

  • 4.

    Pawlak Z, Wong S, Ziarko W. Rough sets: Probabilistic versus deterministic approach. Int J Man Mach Stud (1988) 29:81–95. 10.1016/S0020-7373(88)80032-4

  • 5.

    Yao Y, Wong S. A decision theoretic framework for approximating concepts. Int J Man Mach Stud (1992) 37:793–809. 10.1016/0020-7373(92)90069-W

  • 6.

    Yao Y. Three-way decisions with probabilistic rough sets. Inf Sci (2010) 180:341–53. 10.1016/j.ins.2009.09.021

  • 7.

    Qian Y, Liang X, Wang Q, Liang J, Liu B, Skowron A, et al. Local rough set: A solution to rough data analysis in big data. Int J Approx Reason (2018) 97:38–63. 10.1016/j.ijar.2018.01.008

  • 8.

    Wang Q, Qian Y, Liang X, Guo Q, Liang J. Local neighborhood rough set. Knowl Based Syst (2018) 153:53–64. 10.1016/j.knosys.2018.04.023

  • 9.

    Sun L, Wang L, Ding W, Qian Y, Xu J. Neighborhood multi-granulation rough sets-based attribute reduction using Lebesgue and entropy measures in incomplete neighborhood decision systems. Knowl Based Syst (2020) 192:105373. 10.1016/j.knosys.2019.105373

  • 10.

    Yang X, Liang S, Yu H, Gao S, Qian Y. Pseudo-label neighborhood rough set: Measures and attribute reductions. Int J Approx Reason (2019) 105:112–29. 10.1016/j.ijar.2018.11.010

  • 11.

    Hu Q, Liu J, Yu D. Mixed feature selection based on granulation and approximation. Knowl Based Syst (2008) 21:294–304. 10.1016/j.knosys.2007.07.001

  • 12.

    Hu Q, Yu D, Liu J, Wu C. Neighborhood rough set based heterogeneous feature subset selection. Inf Sci (2008) 178:3577–94. 10.1016/j.ins.2008.05.024

  • 13.

    Lin Y, Hu Q, Liu J, Chen J, Duan J. Multi-label feature selection based on neighborhood mutual information. Appl Soft Comput (2016) 38:244–56. 10.1016/j.asoc.2015.10.009

  • 14.

    Li W, Huang Z, Jia X, Cai X. Neighborhood based decision-theoretic rough set models. Int J Approx Reason (2016) 69:1–17. 10.1016/j.ijar.2015.11.005

  • 15.

    Pedrycz W, Homenda W. Building the fundamentals of granular computing: A principle of justifiable granularity. Appl Soft Comput (2013) 13:4209–18. 10.1016/j.asoc.2013.06.017

  • 16.

    Wang D, Liu H, Pedrycz W, Song W, Li H. Design Gaussian information granule based on the principle of justifiable granularity: A multi-dimensional perspective. Expert Syst Appl (2022) 197:116763. 10.1016/j.eswa.2022.116763

  • 17.

    Cui Y, Meng X, Qiao J. A multi-objective particle swarm optimization algorithm based on two-archive mechanism. Appl Soft Comput (2022) 119:108532. 10.1016/j.asoc.2022.108532

  • 18.

    Deng H, Liu L, Fang J, Yan L. The application of SOFNN based on PSO-ILM algorithm in nonlinear system modeling. Appl Intell (2023) 53:8927–40. 10.1007/s10489-022-03879-5

  • 19.

    Hu X, Cercone N. Learning in relational databases: A rough set approach. Comput Intell (1995) 11:323–38. 10.1111/j.1467-8640.1995.tb00035.x

  • 20.

    Salgotra R, Singh U, Singh S, Mittal N. A hybridized multi-algorithm strategy for engineering optimization problems. Knowl Based Syst (2021) 217:106790. 10.1016/j.knosys.2021.106790


Keywords

justifiable granularity, sensor data, local neighborhood decision rough set model, attribute reduction, granular computing

Citation

Fan X, Mao X, Cai T, Sun Y, Gu P and Ju H (2023) Sensor data reduction with novel local neighborhood information granularity and rough set approach. Front. Phys. 11:1240555. doi: 10.3389/fphy.2023.1240555

Received

15 June 2023

Accepted

10 July 2023

Published

28 July 2023

Volume

11 - 2023

Edited by

Xukun Yin, Xidian University, China

Reviewed by

Jing Ba, Jiangsu University of Science and Technology, China

Ke Lu, Nanjing University of Information Science and Technology, China

Heng Du, Nanjing Institute of Technology (NJIT), China


*Correspondence: Xiaojuan Mao, ; Pingping Gu, ; Hengrong Ju,

