A Neighborhood Rough Set Attribute Reduction Method Based on Attribute Importance

Abstract

Attribute reduction is a hot topic in rough set research. As an extension of classical rough sets, neighborhood rough sets can effectively avoid the information loss caused by data discretization. However, traditional greedy neighborhood rough set attribute reduction algorithms have high computational complexity and long processing times. In this paper, a novel attribute reduction algorithm based on attribute importance is proposed. Conditional information entropy is used to study the attribute reduction problem in neighborhood rough sets, and the importance of each attribute is measured by its conditional information gain. The algorithm ranks the conditional attributes by importance and discards the redundant ones, thus achieving attribute reduction. Six UCI datasets are selected, and the proposed algorithm SAR is compared with the L2-ELM, LapTELM, CTSVM, and TBSVM classifiers. The results demonstrate that SAR can effectively reduce the time consumption of attribute reduction while improving its accuracy.

Citation: Su, P., Qin, F. and Li, F. (2023) A Neighborhood Rough Set Attribute Reduction Method Based on Attribute Importance. American Journal of Computational Mathematics, 13, 578-593. doi: 10.4236/ajcm.2023.134031.

1. Introduction

Rough set theory [1] is an analytical theory for dealing with uncertain information, but it has significant limitations because it is built on strict equivalence relations. Many scholars have studied and extended it, proposing probabilistic rough sets [2], variable precision rough sets [3], neighborhood rough sets [4], and more. By granulating the universe with neighborhood relations, neighborhood rough sets can handle numerical data directly and effectively avoid the information loss caused by discretizing data.

Attribute reduction [5] [6], an important problem in rough set theory, focuses on deleting redundant attributes while keeping the partitioning ability of the attribute set over the universe unchanged. Redundant attributes are a crucial factor affecting the speed and accuracy of attribute reduction algorithms, and many scholars have studied attribute reduction on this foundation. For example, Yao Sheng [7] removed redundant attributes by calculating the rank correlation coefficient between conditional attributes. Liu Fang [8] introduced a tolerance relation matrix and, taking the boundary region as inspiration, obtained the optimal attribute reduction set through matrix operations. Wei Bipeng [9] proposed an attribute reduction method for incomplete decision information tables based on the 4-dominance relation. Jing Yunchao [10] presented an incremental matrix reduction algorithm for adding attributes and refining attribute values. Jing [11] designed an incremental attribute reduction method for dynamic data mining that considers changes in both instance objects and attribute sets. In attribute reduction, attribute importance, which represents the degree to which the classification changes before and after removing certain attributes, has also received widespread attention. For instance, Wei [12] proposed attribute reduction based on granular computing to calculate attribute importance and facilitate reduction. Hu [13] et al. proposed a forward greedy attribute reduction algorithm based on attribute importance. Liu [14] et al. improved the traditional attribute reduction algorithm of Hu et al. and put forward the FHARA algorithm. Entropy, as a measure of disorder and uncertainty, is used to define conditional information measures in [15] [16].

The attribute reduction algorithm proposed in [12] is a greedy approach that repeatedly performs high-dimensional operations during attribute selection and involves much repeated computation, resulting in significant time consumption. Therefore, in this paper, conditional information entropy is used as the measure of the importance of conditional attributes. The obtained importance values are used to rank all conditional attributes, and redundant attributes are then removed until the final reduction result is obtained. The rest of the paper is organized as follows: Section 2 reviews basic concepts of neighborhood rough sets and attribute reduction; Section 3 presents the neighborhood rough set attribute reduction method based on attribute importance; Section 4 compares the proposed method with the L2-ELM, LapTELM, CTSVM, and TBSVM classifiers; finally, Section 5 concludes the paper.

2. Preliminary Knowledge

To facilitate the research work, this section introduces the basic knowledge of neighborhood rough sets, attribute reduction methods, and other related concepts.

2.1. Neighborhood Rough Sets

Classical rough sets require actual sample data to be discretized before processing, and neither the integrity of the data nor the fidelity of the discretization result can be guaranteed. Neighborhood rough sets, as an extension of rough sets, can handle the continuous or mixed-type data common in practice and effectively address the information loss after discretization that affects classical rough set models.

Let $S = \{U, AT = A \cup D, \delta\}$ be a neighborhood rough set decision system, where $U$ is the set of all objects, $A$ is the set of conditional attributes, and $D$ is the set of decision attributes, with $A \cap D = \emptyset$.

Definition 1 [4]: Given a decision system $S$ and a neighborhood radius $\delta$, for $x_i \in U$, its $\delta$-neighborhood is defined as:

$\delta(x_i) = \{ x \in U \mid d(x, x_i) \le \delta \}$ (1)

Here $\delta \ge 0$ and $d(x, x_i)$ denotes the distance between $x$ and $x_i$; common choices include the Manhattan, Euclidean, and other norm-based distance functions. In this paper, the Euclidean distance is used to calculate $d(x, x_i)$. The set $\delta(x_i)$ is called the $\delta$-neighborhood information granule of $x_i$, or simply the neighborhood granule of $x_i$.
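To make Definition 1 concrete, the following is a minimal sketch (in Python, standing in for the paper's MATLAB environment) of how a $\delta$-neighborhood can be computed with the Euclidean distance; the function name and the assumption that attribute values are normalized to $[0, 1]$ are illustrative, not from the paper.

```python
import numpy as np

def delta_neighborhood(X, i, delta):
    """Indices of all samples in the delta-neighborhood of sample i (Eq. (1)).

    X is an (n_samples, n_attributes) array of conditional attribute values,
    assumed normalized to [0, 1]; the Euclidean distance is used.
    """
    dists = np.linalg.norm(X - X[i], axis=1)  # d(x, x_i) for every x in U
    return np.flatnonzero(dists <= delta)     # {x in U : d(x, x_i) <= delta}
```

Note that $x_i$ always belongs to its own neighborhood, since $d(x_i, x_i) = 0 \le \delta$.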

Property 1: Given a decision system $S$ and a neighborhood radius $\delta$, for any subset $B \subseteq A$ of conditional attributes, if $\delta_1 < \delta_2$, then $\delta_1(x) \subseteq \delta_2(x)$.

Proof: For any $x_i \in \delta_1(x)$, we have $d(x, x_i) \le \delta_1 < \delta_2$, so $x_i \in \delta_2(x)$. Hence $\delta_1(x) \subseteq \delta_2(x)$.

Definition 2 [4]: Given a decision system $S$ and the decision attribute set $D$, the decision classes $\{X_1, X_2, \ldots, X_n\}$ are obtained. The lower and upper approximations of the decision attribute $D$ with respect to the attribute subset $B$ are defined as follows:

$\underline{R}_B(D) = \bigcup_{i=1}^{n} \underline{R}_B(X_i), \quad \overline{R}_B(D) = \bigcup_{i=1}^{n} \overline{R}_B(X_i)$ (2)

where:

$\underline{R}_B(X_i) = \{ x_j \in U \mid \delta(x_j) \subseteq X_i \}, \quad \overline{R}_B(X_i) = \{ x_j \in U \mid \delta(x_j) \cap X_i \ne \emptyset \}$ (3)

The boundary of the decision system is defined as:

$BN(D) = \overline{R}_B(D) - \underline{R}_B(D)$ (4)

The positive region and negative region of the neighborhood decision system are defined as:

$Pos_B(D) = \underline{R}_B(D), \quad Neg_B(D) = U - \overline{R}_B(D)$ (5)

The dependency degree of decision attribute D on conditional attribute B is defined as:

$\gamma_B(D) = \dfrac{|Pos_B(D)|}{|U|}$ (6)
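Equations (2)-(6) can be computed directly from the neighborhoods. The sketch below, under the same illustrative assumptions as the previous one, derives the positive region and the dependency degree; it exploits the fact that $x_i \in Pos_B(D)$ exactly when $\delta(x_i)$ is contained in the decision class of $x_i$.

```python
import numpy as np

def positive_region(X, y, delta):
    """Positive region Pos_B(D) and dependency degree gamma_B(D)
    (Equations (5) and (6)); y is the array of decision labels."""
    pos = []
    for i in range(len(X)):
        nbr = np.flatnonzero(np.linalg.norm(X - X[i], axis=1) <= delta)
        # x_i is in the lower approximation of its own decision class
        # iff every neighbor shares its label (Equation (3)).
        if np.all(y[nbr] == y[i]):
            pos.append(i)
    gamma = len(pos) / len(X)  # gamma_B(D) = |Pos_B(D)| / |U|
    return np.array(pos), gamma
```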

Property 2: Given a decision system $S$ and a neighborhood radius $\delta$, for any subset $B \subseteq A$ of conditional attributes, the following properties hold:

1) For any $B_1 \subseteq B_2$, $\underline{R}^{\delta}_{B_1}(X) \subseteq \underline{R}^{\delta}_{B_2}(X)$;

2) If $\delta_1 < \delta_2$, then $Pos^{\delta_2}_{B} \subseteq Pos^{\delta_1}_{B}$;

3) For any $B_1 \subseteq B_2 \subseteq A$, $Pos_{B_1}(D) \subseteq Pos_{B_2}(D) \subseteq Pos_{A}(D)$, and hence $\gamma_{B_1} \le \gamma_{B_2} \le \gamma_{A}$.

2.2. Attribute Reduction

Attribute reduction is a core problem in rough set research. Its central idea is to remove redundant attributes while maintaining the classification ability of the information system, thereby extracting the key attributes and simplifying the system. Depending on how attribute importance is measured, commonly used criteria include attribute-dependency-based and conditional-information-entropy-based attribute reduction.

Definition 3 [6]: Given a decision system $DS = \{U, AT, V, f\}$, for any conditional attribute subset $B \subseteq A$ with $B \ne \emptyset$, the indiscernibility relation of the attribute subset $B$ on the universe $U$ is defined as:

$IND(B) = \{ (x, y) \in U \times U \mid \forall a \in B, f(x, a) = f(y, a) \}$ (7)

Here $x$ and $y$ are any two samples in the universe $U$. The indiscernibility relation determined by the attribute subset can be seen as dividing $U$ into equivalence classes, denoted $U / IND(B)$.
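For discrete data, the partition $U / IND(B)$ can be obtained simply by grouping objects on their attribute-value tuples. A minimal sketch follows; the helper name and the dict-based sample representation are illustrative assumptions.

```python
from collections import defaultdict

def quotient_set(samples, B):
    """U / IND(B): group objects that agree on every attribute in B
    (Equation (7)). Each sample is a dict mapping attribute -> value."""
    blocks = defaultdict(list)
    for idx, x in enumerate(samples):
        blocks[tuple(x[a] for a in B)].append(idx)
    return list(blocks.values())
```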

Definition 4 [6]: Given a knowledge base $K = (U, R)$ and a family of equivalence relations $P \subseteq R$ on the knowledge base, for any $G \subseteq P$, if $G$ satisfies the following conditions:

1) G is independent, meaning that every element in G is indispensable.

2) $IND(G) = IND(P)$, meaning that it does not affect the partition of the knowledge base.

Then $G$ is called a reduction of $P$, denoted as $G \in Red(P)$, where $Red(P)$ represents the set of all reductions of $P$. If a relation $G \in P$ satisfies the following condition:

$IND(P - \{G\}) \ne IND(P)$ (8)

Then $G$ is called indispensable in $P$. The set of all indispensable relations in $P$ is called the core of $P$, denoted $Core(P)$.

Definition 5 [6]: Given a decision system $DS = \{U, AT, V, f\}$, where $AT = A \cup D$ and $B \subseteq A$, if the conditional attribute subset $B$ satisfies the following conditions:

1) $\gamma_B(D) = \gamma_A(D)$, i.e., $Pos_B(D) = Pos_A(D)$;

2) For any $a \in B$, $\gamma_B(D) > \gamma_{B - \{a\}}(D)$.

Then the attribute subset $B$ is called a relative reduction of the conditional attribute set $A$. To express dependency-based attribute reduction in terms of information measures, Hu defined conditional information entropy in decision systems.

Definition 6 [15] [16]: Given a decision system $S$ with $U = \{x_1, x_2, \ldots, x_n\}$ and $B \subseteq A$, the conditional information entropy of $D$ with respect to $B$ is defined as:

$H(D|B) = -\dfrac{1}{n} \sum_{i=1}^{n} \log \dfrac{|[x_i]_B \cap [x_i]_D|}{|[x_i]_B|}$ (9)

Here $[x_i]_B$ represents the fuzzy equivalence class of $x_i$ under $B$ and $[x_i]_D$ the decision class of $x_i$. For any $a \in B$, $B$ is a dependency-based reduction if and only if $H(D|B) = H(D|A)$ and $H(D|B) > H(D|B - \{a\})$.
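For crisp partitions, Equation (9) can be evaluated object by object. A small sketch under the same illustrative conventions as above (class labels encoded as NumPy arrays; the function name is hypothetical):

```python
import numpy as np

def cond_entropy_crisp(labels_B, labels_D):
    """H(D|B) of Equation (9): labels_B[i] identifies the class [x_i]_B and
    labels_D[i] identifies [x_i]_D; both are NumPy arrays of length n."""
    n = len(labels_B)
    H = 0.0
    for i in range(n):
        in_B = labels_B == labels_B[i]                           # [x_i]_B
        inter = np.count_nonzero(in_B & (labels_D == labels_D[i]))
        H -= np.log(inter / np.count_nonzero(in_B)) / n
    return H
```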

3. Attribute Reduction Method for Neighborhood Rough Sets Based on Attribute Importance

As an effective measure of uncertainty, information entropy can be refined and combined with classical rough sets to deal effectively with mixed data. In attribute reduction based on attribute importance, conditional information entropy serves as the measure of importance, which effectively enhances the discriminative power of attributes: the larger the conditional information entropy, the higher the attribute importance in the neighborhood rough set, and vice versa.

Definition 7: Given a neighborhood decision system $S$, for $B \subseteq A$, if $H(D|B) = H(D|A)$ and, for any $a \in B$, $H(D|B) > H(D|B - \{a\})$, then the conditional attribute subset $B$ is a reduction of the conditional attribute set $A$. $B_1, B_2, \ldots, B_n$ denote all the possible reductions of $A$.

Definition 8: In a neighborhood rough set decision system $S = \{U, AT = A \cup D, \delta\}$, for $B \subseteq A$, the neighborhood information entropy of the conditional attribute subset $B$ is defined as:

$H(B) = \dfrac{1}{n} \sum_{i=1}^{n} \log \dfrac{1}{|\delta_B(x_i)|}$ (10)

Definition 9: In a neighborhood rough set decision system $S = \{U, AT = A \cup D, \delta\}$, for any conditional attribute subset $B \subseteq A$, the process of selecting a subset $B$ that discerns objects as well as the full attribute set $AT$ is called attribute reduction. The conditional information entropy of $D$ with respect to $B$ is defined as follows:

$H(D|B) = -\dfrac{1}{n} \sum_{i=1}^{n} \log \dfrac{|\delta_B(x_i) \cap [x_i]_D|}{|\delta_B(x_i)|}$ (11)

Definition 10: Given a neighborhood decision system $S$, for $a \in B$, the attribute importance of $a$ with respect to $B$ is defined as follows:

$Sig(a, B, D) = H(D|B) - H(D|B - \{a\})$ (12)
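Equations (11) and (12) replace the equivalence classes of Equation (9) with $\delta$-neighborhoods. The following sketch computes both; column indices stand in for attributes, and the helper names are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def nbr_cond_entropy(X, y, B, delta):
    """H(D|B) of Equation (11); B is a list of attribute (column) indices."""
    Xb, n = X[:, B], len(X)
    H = 0.0
    for i in range(n):
        nbr = np.flatnonzero(np.linalg.norm(Xb - Xb[i], axis=1) <= delta)
        inter = np.count_nonzero(y[nbr] == y[i])  # |delta_B(x_i) n [x_i]_D|
        H -= np.log(inter / len(nbr)) / n
    return H

def sig(a, B, X, y, delta):
    """Attribute importance Sig(a, B, D) of Equation (12)."""
    return (nbr_cond_entropy(X, y, B, delta)
            - nbr_cond_entropy(X, y, [b for b in B if b != a], delta))
```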

Conclusion 1: In a neighborhood rough set decision system $S = \{U, AT = A \cup D, \delta\}$, for any conditional attribute subsets $B_1, B_2 \subseteq A$, if $B_1 \subseteq B_2$, then $H(D|B_1) \ge H(D|B_2)$.

Proof: Since $B_1 \subseteq B_2$, we have $\delta_{B_1}(x_i) \supseteq \delta_{B_2}(x_i)$. It then follows from Equation (11) that $H(D|B_1) - H(D|B_2) \ge 0$, which implies $H(D|B_1) \ge H(D|B_2)$.

Conclusion 2: In a neighborhood rough set decision system $S = \{U, AT = A \cup D, \delta\}$, for $a \in B$, the attribute importance of $a$ with respect to $B$, denoted $Sig(a, B, D)$, is higher when the conditional information entropy $H(D|B)$ is larger. Conversely, when $H(D|B)$ is smaller, the attribute $a$ is considered less important.

Example 1: Given a neighborhood decision information system $S = \{U, AT = A \cup D, \delta\}$ (see Table 1), where $U = \{x_1, x_2, x_3, x_4, x_5, x_6\}$, $A = \{a_1, a_2, a_3\}$, $D = \{d_1, d_2, d_3\}$, $\delta_{a_1} = 0.134$, $\delta_{a_2} = 0.172$, $\delta_{a_3} = 0.154$. The neighborhood decision information table is shown in Table 1.

According to the decision attribute $D$, the universe $U$ is divided into $D_1 = \{x_1, x_5\}$, $D_2 = \{x_2, x_4\}$, and $D_3 = \{x_3, x_6\}$. The conditional information entropy is calculated as follows:

$H(D|B) = -\dfrac{1}{n} \sum_{i=1}^{n} \log \dfrac{|\delta_B(x_i) \cap [x_i]_D|}{|\delta_B(x_i)|} = 3.2749$ (13)

Then $H(D|B - \{a_1\}) = 2.5874$, $H(D|B - \{a_2\}) = 2.3854$, and $H(D|B - \{a_3\}) = 3.1028$. Applying Equation (12), the attribute importance values of the conditional attributes $a_1, a_2, a_3$ are $Sig(a_1, B, D) = 0.6875$, $Sig(a_2, B, D) = 0.8895$, and $Sig(a_3, B, D) = 0.1721$. The order of importance is therefore $a_2 > a_1 > a_3$, and attribute $a_3$ can be treated as a redundant attribute in the reduction process.

The traditional neighborhood rough set attribute reduction algorithm uses a greedy search strategy: in each cycle it adds a new candidate conditional attribute to produce a new attribute subset. This requires many cycles and has high time complexity. To address this, an attribute reduction algorithm based on attribute importance is proposed for neighborhood rough sets, using conditional information entropy as the measure of importance. The algorithm uses forward search with heuristic information and is described in detail as follows:

Table 1. Neighborhood decision information system.

Algorithm 1: Attribute Reduction Method for Neighborhood Rough Sets Based on Attribute Importance (SAR).

Input: Decision information system $S = \{U, AT = A \cup D, \delta\}$, radius $\delta$.

Output: Reduction set B

Step 1: Set $B = \emptyset$.

Step 2: Calculate the conditional information entropy $H(D|B)$ for the neighborhood decision system.

Step 3: DO

1) Calculate the conditional information entropy $H(D|B \cup \{a_i\})$ for each remaining conditional attribute $a_i$ ($i = 1, 2, \ldots, n$);

2) Calculate the attribute importance of $a_i$: $Sig(a_i, B, D) = H(D|B) - H(D|B \cup \{a_i\})$;

3) Select the attribute $a_k$ with the highest attribute importance, $Sig(a_k, B, D) = \max_i Sig(a_i, B, D)$;

4) If $Sig(a_k, B, D) > Sig(B, D)$, add $a_k$ to the reduction set $B$ and return to Step 3. Otherwise, go to Step 4.

Step 4: Output the reduction result.
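The following is a minimal, non-authoritative sketch of Algorithm 1, reusing nbr_cond_entropy from the sketch in the previous subsection. Since the threshold $Sig(B, D)$ in Step 3.4 is not spelled out explicitly above, a small constant eps is assumed as the stopping test; this is an assumption, not the paper's exact criterion.

```python
import numpy as np

def sar(X, y, delta, eps=1e-4):
    """Sketch of SAR: forward greedy selection by attribute importance."""
    candidates = list(range(X.shape[1]))
    B = []                                             # Step 1: B = empty set
    H_B = nbr_cond_entropy(X, y, B, delta)             # Step 2
    while candidates:                                  # Step 3
        gains = [H_B - nbr_cond_entropy(X, y, B + [a], delta)
                 for a in candidates]                  # Sig(a_i, B, D)
        k = int(np.argmax(gains))                      # most important attribute
        if gains[k] <= eps:                            # assumed stopping test
            break
        B.append(candidates.pop(k))                    # add a_k to the reduct
        H_B -= gains[k]                                # now H_B = H(D|B u {a_k})
    return B                                           # Step 4
```

Caching the current entropy $H(D|B)$ between passes, as above, avoids recomputing it for the selected subset in every cycle.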

This paper uses conditional information entropy to calculate attribute importance, and each importance value is calculated only once during the reduction process. The algorithm therefore improves efficiency and reduces computational cost when dealing with datasets with a large number of attributes, while also enhancing the accuracy of the reduction.

4. Experimental Analysis

To validate the effectiveness of the proposed SAR algorithm, six UCI datasets were selected for experimentation (see Table 2). The datasets were downloaded from the UCI Machine Learning Repository. The experimental environment was a computer running Windows 10 with an Intel Core i5-6300HQ CPU @ 2.30 GHz, using MATLAB R2021a.

A 10-fold cross-validation method was employed in the experiments. Ten radii were selected: $0.02, 0.04, \ldots, 0.2$. The data was divided into 10 subsets; each subset in turn served as the test set while the remaining 9 subsets were used as the training set. The average classification accuracy over the 10 runs was taken as the performance metric. The SAR algorithm proposed in this paper was compared with the L2-ELM, LapTELM, CTSVM, and TBSVM classifiers in classifying the test set samples.
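A sketch of this evaluation protocol is given below; it is an assumed harness, using scikit-learn's cross_val_score with a k-NN classifier standing in for the compared classifiers (which are not reimplemented here), and it reuses the sar function sketched above.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

def evaluate(X, y):
    """10-fold CV accuracy of the SAR reduct at each of the ten radii."""
    for delta in np.arange(0.02, 0.21, 0.02):   # radii 0.02, 0.04, ..., 0.2
        B = sar(X, y, delta)                    # reduct at this radius
        if not B:                               # no attribute selected
            continue
        acc = cross_val_score(KNeighborsClassifier(), X[:, B], y, cv=10).mean()
        print(f"delta={delta:.2f}  |B|={len(B)}  accuracy={acc:.4f}")
```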

An artificial dataset of 100 samples was also generated and divided into positive and negative classes, represented by + and *, respectively. To examine the impact of outliers on classification performance, 6 outliers were introduced into the artificial dataset, 3 positive and 3 negative (see Figure 1).

Table 2. Dataset description.

Figure 1. Distribution of artificial dataset with outliers.

4.1. Time Comparison for Attribute Reduction

During the experiments, the time consumed for attribute reduction was recorded for SAR, L2-ELM, LapTELM, CTSVM, and TBSVM. The comparison results are shown in Table 3.

4.2. Accuracy Comparison for Attribute Reduction

To demonstrate the effectiveness of SAR in terms of classification accuracy, SAR was compared with the L2-ELM, LapTELM, CTSVM, and TBSVM classifiers on the test set samples. The experimental results are shown in Figure 2.

The experimental results show that SAR achieves higher classification accuracy than the L2-ELM, LapTELM, CTSVM, and TBSVM classifiers for most radii. SAR can therefore effectively improve the classification accuracy of attribute reduction. To further validate its advantages, experiments were conducted on the artificial dataset using the five algorithms; the results are shown in Figure 3.

Table 3. Time comparison for five algorithms’ attribute reduction.


Figure 2. Accuracy comparison of five algorithms at different radii. (a) WDBC; (b) Vote; (c) QSAR; (d) Cancer; (e) Spectf; (f) Pima.


Figure 3. Accuracy comparison of five algorithms at different radii. (a) L2-ELM; (b) LapTELM; (c) CTSVM; (d) TBSVM; (e) SAR.

From Figure 3, it can be observed that SAR still exhibits good classification performance on the artificial dataset with outliers. To better illustrate the stability of SAR during classification, the variation rates of SAR, L2-ELM, LapTELM, CTSVM, and TBSVM were analyzed at radii $\delta = 0.1$ and $\delta = 0.2$; the results are shown in Figure 4.

The above experiments compare the classification accuracy and precision of the five algorithms on six datasets, as well as their variation rates at radii $\delta = 0.1$ and $\delta = 0.2$. The results indicate that, under the same conditions of algorithm, radius, and metric, SAR matches the classification performance of the L2-ELM, LapTELM, CTSVM, and TBSVM classifiers.


Figure 4. Comparison of the variation rates at $\delta = 0.1$ and $\delta = 0.2$ on different UCI datasets. (a) WDBC; (b) Vote; (c) QSAR; (d) Cancer; (e) Spectf; (f) Pima.

5. Conclusion

This paper proposes a neighborhood rough set attribute reduction method based on attribute importance, which uses conditional information entropy to calculate attribute importance and obtain the final reduction result. Comparing the time consumption and classification accuracy of SAR with the L2-ELM, LapTELM, CTSVM, and TBSVM classifiers shows that SAR achieves good classification performance at lower time cost. Although SAR runs faster than the other algorithms, its time cost is still considerable; in future work, we will focus on optimizing the algorithm to reduce it further.

Funding

This work was supported by the National Natural Science Foundation of China under grant numbers 11971210 and 12371459.


Conflicts of Interest

The authors declare no conflicts of interest regarding the publication of this paper.

References

[1] Pawlak, Z. (1982) Rough Sets. International Journal of Computer and Information Sciences, 11, 341-356.
https://doi.org/10.1007/BF01001956
[2] Pawlak, Z., Wong, S.K.M. and Ziarko, W. (1988) Rough Sets: Probabilistic versus Deterministic Approach. International Journal of Man-Machine Studies, 29, 81-95.
https://doi.org/10.1016/S0020-7373(88)80032-4
[3] Ziarko, W. (1993) Variable Precision Rough Set Model. Journal of Computer and System Sciences, 46, 39-59.
https://doi.org/10.1016/0022-0000(93)90048-2
[4] Lin, T.Y. (1989) Neighborhood Systems and Approximation in Relational Databases and Knowledge Bases. Proceedings of the Fourth International Symposium on Methodologies of Intelligent Systems, Charlotte, February 1988, 75-86.
[5] Ju, H., et al. (2017) Cost-Sensitive Rough Set: A Multi-Granulation Approach. Knowledge-Based Systems, 123, 137-153.
https://doi.org/10.1016/j.knosys.2017.02.019
[6] Wang, G., Yao, Y. and Yu, H. (2009) A Survey on Rough Set Theory and Its Applications. Chinese Journal of Computers, 32, 1229-1246.
https://doi.org/10.3724/SP.J.1016.2009.01229
[7] Yao, S., Wang, J., Xu, F., et al. (2018) Uncertainty Measurement and Attribute Reduction Group of Inconsistent Neighborhood Rough Sets. Journal of Chinese Computer Systems, 39, 700-706.
[8] Liu, F. and Li, T. (2016) Attribute Reduction Method for Incomplete Information System Based on Boundary Region. Computer Science, 43, 242-245, 284.
[9] Wei, B., Lv, Y. and Li, J. (2014) Attribute Reduction of Rough Set Model under 4-Dominance Relation. Journal of Chinese Intelligent Systems, 9, 251-258.
[10] Jing, Y., Jing, L. and Wang, B. (2020) Incremental Attribute Reduction Algorithm Based on Attribute Value and Attribute Change. Journal of Shandong University (Natural Science), 55, 62-68.
[11] Jing, Y., Li, T., Fujita, H., et al. (2018) An Incremental Attribute Reduction Method for Dynamic Data Mining. Information Sciences, 465, 202-218.
https://doi.org/10.1016/j.ins.2018.07.001
[12] Wei, W., Wu, X., Liang, J., et al. (2018) Discernibility Matrix Based Incremental Attribute Reduction for Dynamic Data. Knowledge-Based Systems, 140, 142-157.
https://doi.org/10.1016/j.knosys.2017.10.033
[13] Hu, Q., Yu, D. and Xie, Z. (2008) Numerical Attribute Reduction Based on Neighborhood Granulation and Rough Approximation. Journal of Software, 19, 640-649.
https://doi.org/10.3724/SP.J.1001.2008.00640
[14] Liu, Y., et al. (2014) Quick Attribute Reduction Algorithm for Neighborhood Rough Set Model. Information Sciences, 271, 65-81.
https://doi.org/10.1016/j.ins.2013.08.022
[15] Hu, Q., Yu, D., Xie, Z., et al. (2006) Fuzzy Probabilistic Approximations Spaces and Their Information Measures. IEEE Transactions on Fuzzy Systems, 14, 191-201.
https://doi.org/10.1109/TFUZZ.2005.864086
[16] Hu, Q., Yu, D. and Xie, Z. (2006) Information-Preserving Hybrid Data Reduction Based on Fuzzy Rough Techniques. Pattern Recognition Letters, 27, 414-423.
https://doi.org/10.1016/j.patrec.2005.09.004

Copyright © 2024 by the authors and Scientific Research Publishing Inc. This work and the related PDF file are licensed under a Creative Commons Attribution 4.0 International License.