1. Introduction
In recent years, researchers from various areas of vision have focused their efforts on developing algorithms that are capable of imitating human vision. To do this, they developed artificial vision systems in which cameras seek to recognize patterns [1], monitor [2], analyze trajectories [3], or infer motion [4]. Among these activities, shape descriptors play an important role in describing the topology of an object and determining its general and local characteristics.
A shape descriptor is an effective tool for recognition [5], classification [6] or identification [7], because the shape is inherent in the understanding of an image, which remains stable despite changes in the lighting, color, or texture of an object. Due to these advantages, shape features have been widely used in various methods, such as: polygon approximation [8], spatial correlation features [9], moments [10], spatial scaling methods [11] and shape transformations [12]. The idea of robust shape descriptors is still a hard problem, and the challenges that make shape recognition difficult are due to transformations such as rotation, scaling, deformation caused by noise or occlusion [13], or undesirable effects caused by shadows [14].
Mainly, global descriptors can only identify shapes that differ greatly due to their simple properties. However, local descriptors focus more on aspects of the form to use them as discriminants. A shape can be described by different aspects such as the center of gravity [15], eccentricity [16], circularity ratio [17], elliptical variance [18] - [22], rectangularity [23], or area ratio [24], to name a few. In general, descriptors attempt to quantify a given shape using a set of numbers due to their characteristic aspects, so a good robust descriptor must satisfy the following requirements:
1) Preservation of similarity between the same types of forms.
2) Invariance to similar transformations, such as translation, rotation, and scaling; and related transformations.
3) Noise resistance.
4) Concealment invariance.
5) Statistical independence (compactness).
6) Reliability of the extracted features.
The techniques related to shape representation are broad and are classified into contour-based methods [25] and region-based methods [14]. Contour-based methods are subject to the detection of features found in the shape’s contour, these can refer to points, edges, and lines [26]. Whereas, region-based methods obtain a set of feature vectors of the regions occupied by objects [27]. In this paper, the analysis of shape curvature is presented, where non-smooth curves represent feature elements used to train a multi-perceptron neural network to classify the shape of a scene. This paper is organized as follows, including this section. Section 2 describes the related work. Section 3 shows the features extraction. The experiments and results obtained by this method are shown in Section 4. Finally, the conclusions of the methodology are exposed in Section 5.
2. Related Work
The general process of the traditional object recognition algorithm is as follows: to evaluate the shape of the object, first, the descriptors of the characteristics of the shape are extracted. Then, a matching algorithm is applied to obtain the distance between the descriptors. Finally, the form recognition classification is completed through classification and metric learning. During this process, the descriptor’s ability to distinguish shape directly affects the recognition result; thus, most of the work of shape recognition has concentrated on descriptors.
An example is the skeleton, which is a structure that describes the topology and geometry of the shape of an object, such representation, and also allow a simpler and computationally effective way to analyze the properties of the shape, the techniques currently for taxonomy analysis, they have properties such as: Homotopy, Invariance and Smoothing [12], that is, the object can be represented as a continuous function, which is invariant at scale; Reconstruction and Details [21], the object can be reconstructed from its structure, finding the main details of the object, reducing the amount of processing required to analyze it.
In this type of techniques, [1] obtains the taxonomy of the object through an iterative process to label the neighboring pixels of the border that is formed between the foreground and the background of the image, which are removed and re-labeled to generate a new border until thinning to one pixel. Similarly, an algorithm called ZS was developed in [5], in which they combined the subdivision method of the image into two sets of odd and even coordinates. At the same time [19], a mean axis-based parallel thinning algorithm was developed for contour extraction, from which they suppressed external constraints until a thinning of 1 pixel width was achieved. For their part, in [21] was proposed an improved slimming algorithm for detecting moving objects, confirming the speed of the algorithm evaluated in their study.
Although skeletons provide an alternative to reduce processing complexity and work directly with the object’s topology, the methods are not highly regarded because of the sensitivity to detail, and they can modify the object’s taxonomy, so other techniques seek locate characteristic elements in the structure of the stage, that define the objects present under conditions of variation and that explicitly represent the scales at which the different events occur. The complexity of these techniques is due to the fact that the changing conditions of spatial-scale image representations are caused by illumination changes, shadows, inherent noise in the camera, and scale changes. In [22] extracts features of large structures using a family of smoothed signals, representing data at multiple scales, by solving a diffusion equation defined as
, where L is the convolution between an image and a Gaussian operator, in order to construct a representation of invariant space at the scale of the image structure, on the other hand, in method [18] analyze the correlation between typical conditions on stage and the entropy patterns of a sequence of images, to provide an abstraction of a space-scale blob with a gray-level blob at a single scale level, that is, such an abstraction involves a projection from a Four-dimensional scale-space blob to a three-dimensional gray level blob and a projection of the three-dimensional blob to its two-dimensional support region.
In recent years, many methods have attempted to find an optimal one-to-one correspondence between two dense point shape boundaries, and from there compute the measure of dissimilarity. Among these methods, Shape context [13] is a classical method that is very popular among researchers. The shape context constructs a histogram for each contour point to describe the distribution of the remaining points relative to that point, and these histograms are used to find point-to-point correspondences. The well-known shape description method proposed in [14] uses inner distance instead of Euclidean distance to construct the context of the shape, where the inner distance is the shortest path between two contour points within the constraints of the shape. They report a recognition rate of 94.13% in the Swedish Leaf dataset, which is higher than the 88.12% obtained by the contextual form [13], owing to IDSC’s high classification accuracy. [6] has successfully applied it to develop a computer vision system to aid in the identification of plant species. The height functions [20] are another description of the shape and a matching method for optimal one-to-one matching. In this method, for each contour point of the object, a height function is constructed based on the distance of other contour points to its tangent. After smoothing the height function, a compact and robust shape descriptor is available for each contour point. In the shape matching stage, a dynamic programming algorithm is used to find the optimal match between the two shapes. The height function method has excellent discriminative power.
There are other techniques for shape analysis that focus on singular value decomposition (SVD) for object recognition, especially in people and cars, which was developed by [26] and has been extensively studied. Some of the applications developed have been in the area of image measurement and compression, facial recognition, blurring, or image enhancement, to name a few. Therefore, despite a lot of research in the field of image processing, very little has been done on the use of singular values for the identification.
3. Feature Extraction
Consider a 2D binary image, which is represented by I(x) with dimensions m x n, where x = [xm, yn] which represents the position of a particular pixel, then a uniform region A is a collection of pixels with values [0, 1] and which defines a binary object (blob) in picture I. Since it is not considered overlap in the forms A as in (1):
(1)
Now it is possible to determine the edge of region A as in (2)
(2)
The operation ? is defined as
such that
, where
for all
and
.
Now consider Figure 1 in which the radius of curvature and the center of curvature C of a segment of the curve in Figure 2 is shown, the vector Ti represents the tangential vector to the curve s, while Ti+1 is the displaced tangential vector, while C is the center of curvature.
The distance between Ti and the center of curvature C, is the radius of curvature d, which is defined as in (3).
(3)
Figure 2. Determination of characteristic points of the shape. Source: own elaboration.
where
(4)
(5)
So, the Equation (3) becomes as in (6).
(6)
The radius of curvature is defined as in (7).
(7)
Figure 2 shows a curve s in R2, where
, since the curvature of a curve is always zero on a line or segments of a line, a local maximum or minimum is a feature of the curvature of the shape.
To determine a candidate for a characteristic, three points (a, b and c) of the curve s are selected through which a circumference (C) is drawn, the closer these points are to P the radius of curvature (r) will be smaller, and the curvature will be larger and larger, this procedure is carried out until reaching the convergence of C in P.
Because some edge contours may vary from the actual shape, a threshold angle α is used. If the estimated angle
. If β is greater than the angle threshold α, then the candidate features are removed from the feature set. For the calculation of the angle β, it is done using the tangents aT1 and aT2 on both sides of the feature P.
If the three points are collinear and the radius of curvature is large, the feature will be discarded. The proposed method can be determined by the following algorithm:
The proposed method can be determined by the following algorithm:
1) Extract the outline of the shape by erosion or binary dilation or any method for extracting edges.
2) Determine the limit circumference or osculating circumference to find the curvature of each pixel of the shape.
3) Calculate the radius of curvature r to determine the candidate features.
4) Calculate the angle β at each candidate feature obtained in step 3 and compare it to the angle threshold α to eliminate possible false candidates.
5) A vector resulting from the non-removed candidate features is constructed, which is used to quantify the shape (see Figure 3).
The feature vector is shown in Figure 3(a), it can see the shape and the superimposed features found, which correspond to the feature vector shown in Figure 3(b), the calculation of the feature vector serves to quantify the shape and that this compared with other types of forms, while Figure 4 shows the characteristics found in comparison with the form signature.
The quantification of the vectors is presented in n dimensions (
) and by means of the k-means algorithm these characteristic vectors are associated with k groups where the distance is minimized between the classes
its centroid.
(8)
To do this grouping, perform the following steps:
(a) (b)
Figure 3. Feature vector. Source: own elaboration.
Figure 4. Feature comparison with the signature shape. Source: own elaboration.
● Initialize k groups and set k random centroids.
● Assign each datum to the group with the closest centroid.
● Calculate the new centroid for each group, taking the position of the mean of the data belonging to the k groups as the new centroid.
Repeat the steps of assigning data to groups and computing new centroids until convergence is reached.
4. Experiments and Results
In this section the experiments and performance of the descriptor are shown, first the results are shown before scale and rotation tests, followed by the evaluation with respect to other descriptors for which 5 sets of 20 images were evaluated, using MPEG7 CE-Shape-1 data sets (25). The MPEG7 CE-Shape data set consists of 1400 images of 70 classes, with each class containing 20 shapes. Examples of object shapes from the MPEG7 Part B dataset are shown in Figure 5.
Repeatability tests are shown in Figure 6, of which a subset of 20 images was taken, the tests include image scaling every 20%, rotations from 0˚ to 180˚, in the test results the vector of features, from which the features obtained decrease as the scale decreases, however, even though the algorithm is not designed for a space-scale, most of the features detected prevail, as an improvement is possible to consider the scale-space to reduce the error generated.
Figure 7 shows the tests carried out to evaluate the detection performance based on the repeatability of the characteristics belonging to the shape. Different sets of images have been used to assess the average repeatability, including “Camel-1”, “Beetle-1”, “Deer-1” and “Elephant-1”. The transformed images were implemented by applying the following six different types of transformations on each original image:
Figure 5. Example of objects selected from each class from the MPEG7 Part B dataset [25].
Figure 6. Rotation and scaling of the form. Source: own elaboration.
Experimental results showed that the proposed descriptor has strong discriminability (see Table 1). In this section, the parameters involved in the algorithm are α = 165, Bx = 3.
Figure 7. Repeatability of characteristic elements in different sets. Source: own elaboration.
Table 1. Quantified values of shapes.
Table 2. Scores of different descriptors on the MPEG-7.
The BER analysis rates of different forms representation techniques found in the literature are related to the proposed method and they are shown in Table 2. The proposed descriptor is among the first with the best percentage of classification of the approaches.
5. Conclusion
This paper proposes a method to distinguish the shapes of different classes of objects and compare them with similar objects. The results show that the accuracy of feature similarity is 85.4%, but the method lacks shape recovery. The main weakness of this algorithm is the error in describing the shape that increases as the image is scaled. The reason for this error is that some objects have few features, making it difficult to compare shapes. To solve this problem, spatial scale analysis should be considered to obtain eigenvectors.
Acknowledgements
This work has been partially carried out thanks to the support of the scholarship granted by Consejo Nacional de Ciencia y Tecnología (CONACyT).