TITLE:
Optimization of Data Structure in Classification and Clustering Problems
AUTHORS:
Vladimir N. Shats
KEYWORDS:
Feature Vector Ordering, Functional Dependencies of Features and Classes, Objects Closeness Concept
JOURNAL NAME:
Journal of Intelligent Learning Systems and Applications, Vol.17 No.3, July 4, 2025
ABSTRACT: The paper is devoted to optimizing the data structure in classification and clustering problems by mapping the original data onto a set of ordered feature vectors. During ordering, the elements of each feature vector are renumbered so that their values are arranged in non-decreasing order. In the updated structure, most computational operations are performed not on the multidimensional quantities that describe objects but on one-dimensional ones, namely the values of the objects' individual features. As a result, instead of a rather complex existing algorithm, the same very simple algorithm is applied repeatedly. The transition from original to ordered data decreases the entropy of the data distribution, which makes it possible to reveal the properties of the data. It is shown that classes differ in the functions that express feature values in terms of ordered object numbers. The set of these functions captures the information contained in the training sample and allows one to compute the class of any object in the test sample from the values of its features using the simple total probability formula. The paper also discusses using the ordered data matrix to partition a set into clusters of objects that share common properties.
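The ordering step described in the abstract amounts to sorting each feature column independently and recording the resulting permutation of object numbers, so that all later work is done on one-dimensional sorted vectors. The Python sketch below shows that step, followed by a hypothetical classification rule in the spirit of the total probability formula: P(c | x) = Σ_j P(j) · P(c | x_j), with equal weights P(j) = 1/m over the m features. The window-based estimate of P(c | x_j), the equal feature weights, the toy data, and the names `order_features`, `classify` and `window` are illustrative assumptions, not the author's method as given in the paper.

```python
from bisect import bisect_left
from collections import Counter


def order_features(X):
    """For each feature (column) of X, return the object indices sorted so
    that the feature values are non-decreasing, plus the sorted values."""
    ordered = []
    for j in range(len(X[0])):
        # New numbering: position k holds the original index of the object
        # with the k-th smallest value of feature j (Python's sort is stable,
        # so ties keep their original order).
        order = sorted(range(len(X)), key=lambda i: X[i][j])
        values = [X[i][j] for i in order]
        ordered.append((order, values))
    return ordered


def classify(x, ordered, y, window=3):
    """Hypothetical rule (assumption, not from the paper):
    P(c | x) = sum_j (1/m) * P(c | x_j), where P(c | x_j) is the class
    frequency among `window` training objects whose positions in the
    ordered vector of feature j surround the insertion point of x_j."""
    m = len(ordered)
    scores = Counter()
    for j, (order, values) in enumerate(ordered):
        # One-dimensional binary search in the sorted feature vector.
        pos = bisect_left(values, x[j])
        # Clamp the window so it stays inside the vector.
        lo = max(0, min(pos - window // 2, len(values) - window))
        neighbours = order[lo:lo + window]
        counts = Counter(y[i] for i in neighbours)
        for c, cnt in counts.items():
            scores[c] += (cnt / len(neighbours)) / m
    return max(scores, key=scores.get), dict(scores)


# Toy training sample (hypothetical data): two features, two classes.
X = [[1.0, 10.0], [1.2, 11.0], [5.0, 50.0],
     [5.5, 52.0], [1.1, 9.0], [5.2, 49.0]]
y = ["A", "A", "B", "B", "A", "B"]

ordered = order_features(X)
label_1, scores_1 = classify([1.15, 10.5], ordered, y)
label_2, scores_2 = classify([5.3, 51.0], ordered, y)
print(ordered[0])          # ([0, 4, 1, 2, 5, 3], [1.0, 1.1, 1.2, 5.0, 5.2, 5.5])
print(label_1, scores_1)
print(label_2, scores_2)
```

Sorting once per feature costs O(n log n), after which locating a test value in each feature is a single O(log n) binary search on a one-dimensional vector, which reflects the abstract's point that the same simple operation is reused instead of working with multidimensional distances.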