Color Thresholding, Detection and Recognition of the Road Signs Using the Information Set Theory
1. Introduction
Road sign detection is not only a very important functionality of an Advanced Driver Assistance System (ADAS), but also a very challenging task. The challenges faced by road sign detection methods stem from variations in light and weather conditions, and the complex, cluttered background makes the detection of road signs even more difficult. As the road sign detector must be deployed in autonomous vehicles, it must operate under real-time conditions, which imposes computational constraints. Cognizant of these problems, a very efficient and simple detector is devised for the detection of the road signs. In this work, we attempt to define the concept of the pervasive information set for the representation of certainty/uncertainty in road signs resulting from their varied sizes and lighting conditions. This work gains importance because it can capture road signs of all sizes in a very short time.
1.1. Related Work
A lot of approaches have been put forward under ADAS, of which the salient ones deserve our attention. A few to mention are: The Integral Channel Features (ICFs) are explored in [1] for the detection of road signs using the sliding window approach. The performance of ICF is compared with that of the Aggregate Channel Features (ACFs) on the US road signs in [2]. The deep learning network-based OverFeat model is used in [3] on the Chinese dataset, Tsinghua-Tencent 100K, for both the detection and classification of road signs. In [4], various object-detection systems comprising the Faster Region-based Convolutional Neural Network (FRCNN), Region-based Fully Convolutional Networks (R-FCNs), the Single Shot Detector (SSD) and You Only Look Once version 2 (YOLOv2) are compared for road sign detection. FRCNN is used in [5] for the detection of road signs using images from both a Türkiye dataset and the German Traffic Sign Detection Benchmark (GTSDB). A deep learning network is trained on road signs to detect them at different scales in [6]. A new Chinese Traffic Sign Detection Benchmark (CTSDB) is created in [7], with an additional 4000 real traffic sign images and annotations. In [8], the Feature Pyramid Network (FPN) in YOLOv5 is replaced with AF-FPN, which combines the Feature Enhancement Module (FEM) and the Adaptive Attention Module (AAM), to improve multi-scale road sign detection. The architecture of Faster RCNN is changed in [9] to detect small-size road signs in real time using the Online Hard Examples Mining (OHEM) approach. The functionalities of the Segmentation Network (SegNet) and the encoder-decoder network termed U-Net are combined in [10] for road sign detection. A comprehensive review of the tricks devised for road sign detection is given in [11]. Some of the recent work on road sign recognition is listed below. The recent advances in the recognition of traffic signs are surveyed in [12]. The use of YOLOv5 for road sign recognition is reported in [13]. The transfer learning-based hybrid 2D-3D CNN models are applied to both traffic sign detection and recognition as a prelude to ADAS in [14].
A comprehensive review of the use of 5G technology for the detection of accidents is presented in [15]. The hand tracking of the sign language gestures of the deaf and the hearing-impaired using the Kalman filter is reported in [16]. With a view to improving the detection time of the road signs, the ghost module and the multi-scale attention module are fused into YOLOv8 trained on CTSDB and implemented on a Raspberry Pi in [17]. The backbone feature network of YOLOv5 is replaced with MobileNetV3 to minimize the computational time and reduce the model size of the lightweight detection network in [18].
1.2. Motivation
As revealed by the literature survey, it has become imperative to redress, firstly, the uncertainty in the road signs due to the varied lighting conditions, sizes and shapes and, secondly, the inaccurate fuzzy modelling due to the prevalence of random distributions of the color pixel intensities. The features from the color histograms of the normalized pixel intensities of the road signs are found to be insensitive to the lighting conditions in addition to their sizes and shapes, so we can use color as a cue to demarcate a road sign dominated by red pixels from a world scene that in turn is dominated by green pixels. It is therefore worthwhile to design a color-based detector using not only the normalized color histogram features but also the Histogram of Gradients (HOG) features that describe the shapes. As the distributions of the color intensities are random, the conversion of the color features into fuzzy hesitancy features is a wise attempt to account for the inaccurate fuzzy modelling. On this count, we will investigate the intuitionistic fuzzy entropy functions patterned after the fuzzy entropy functions to derive the fuzzy hesitancy features that help isolate a Region of Interest (ROI) or Regions of Interest (ROIs) purported to be a road sign from a world scene. We can use the Support Vector Machine (SVM) to classify the ROIs into road signs and non-signs. In order to classify the detected road signs into the correct class, we will modify the existing Hanman Transform Classifier. When we are dealing with a large number of feature vectors, the question of their independence or dependence matters a lot and the repercussions therefrom ought to be investigated.
1.3. Objectives of the Research Work
These objectives include the following:
1) To formulate different types of features using the histogram and uncertainty representations while coping with the inaccurate fuzzy modelling.
2) To develop a condition for the thresholding operation, and methods for both the detection and recognition tasks.
3) To extract CNN-based features for the recognition of the road signs.
4) To deal with the independent and dependent vectors by devising a new Hanman law.
The rest of the paper is organised as follows: Section 2 briefly introduces the information sets, fuzzy/intuitionistic fuzzy entropy functions and transforms, and formulation of intuitionistic fuzzy hesitancy features including the mean features, intuitionistic fuzzy transform features and hybrid entropy-transform features. Section 3 describes the pervasive information sets and the formulation of the histogram-based probabilistic-possibilistic entropy/transform features. Section 4 is devoted to the treatment of multivariate feature vectors and to the proposition of the Hanman law that directs how to use them if they are independent or dependent. Section 5 presents the methods for the color-based thresholding, detection and recognition and also provides the normalized Red Green Blue (RGB) color model-based features for the detection and CNN-based features for the recognition. The results of their implementation on the Belgium dataset are discussed in Section 6. The conclusions are given in Section 7.
2. Information Sets and Intuitionistic Fuzzy Sets
Road signs are generally captured based on color and shape. In this paper, a combination of the normalized color features and HOG is employed. As we all know, the color of an object is dependent on the illumination conditions that keep changing throughout the day; therefore, there is always an uncertainty associated with the varying intensities of the road signs. Fuzzy sets can be used for the detection of road signs because of the variability involved in the color intensities reflecting the fuzziness, but they cannot address the problem of uncertainty associated with that very variability. As the representation of uncertainty in the color intensities of the different road signs is difficult with fuzzy sets, we will see next how this is achieved with the information sets.
2.1. Information Sets
The information set concept originated from the Hanman-Anirban information theoretic entropy function in [19] with a view to enlarging the scope of a fuzzy set that suffers from certain shortcomings such as 1) the delinking of its elements as pairs, 2) the arbitrary selection of a membership function, 3) a lack of concern over what lies outside the set, and above all, 4) no provision to represent the certainty/uncertainty associated with a set of attribute values. To ameliorate these shortcomings one by one, an information value from a set of information values constituting an information set is defined to be the product of the attribute value, termed the information source value, and its membership value, thereby connecting the components of each pair of a fuzzy set into a single value to address the first shortcoming. A membership function is constructed by looking at the distribution of the information source values in terms of its statistical parameters like mean and variance, which help fit a Gaussian membership function, thereby addressing the second shortcoming. The membership function $\mu(\cdot)$ is generally chosen to be the Gaussian function, defined as:

$$\mu(I_i) = e^{-\frac{(I_i - \bar{I})^2}{2\sigma^2}}$$ (1)

where $\sigma$ is the standard deviation and $\bar{I}$ is the mean of the information source values $\{I_i\}$. The sum of the information values gives the certainty information that shows its allegiance to the concept/class, whereas the sum of the products of the information source values and the corresponding complement membership function values is the uncertainty information, thus addressing the fourth shortcoming. The conversion of a fuzzy set into an information set is facilitated by the Hanman-Anirban entropy function $H$, defined as:

$$H = \sum_{i=1}^{n} I_i\, e^{-\left(a I_i^3 + b I_i^2 + c I_i + d\right)}$$ (2)

where a, b, c, and d are the constant parameters. A choice of their values from the statistical parameters of the distribution of the information source values will convert the exponential gain function into the Gaussian function. To this end, let us substitute $a = 0$, $b = \frac{1}{2\sigma^2}$, $c = -\frac{\bar{I}}{\sigma^2}$ and $d = \frac{\bar{I}^2}{2\sigma^2}$ in Equation (2), which leads to:

$$H = \sum_{i=1}^{n} I_i\, \mu(I_i) = \sum_{i=1}^{n} H_i$$ (3)

where $H_i = I_i\, \mu(I_i)$ is the information value. To get a lot of mileage, we take the help of the generalized Hanman-Anirban entropy function that has a power $\alpha$ on its exponential gain function, i.e. $e^{-\left(a I_i^3 + b I_i^2 + c I_i + d\right)^{\alpha}}$. Then, $\mu$ becomes the generalized Gaussian function. By varying $\alpha$ from 0 to 5, it gives rise to several shapes such as trapezoidal, triangular, Gaussian, etc.
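As a concrete illustration of Equations (1)-(3), the following minimal sketch (ours, not the authors' code; all function names are hypothetical) computes the information values of a window of normalized intensities by fitting a Gaussian membership function from the mean and standard deviation of the information source values:

```python
import numpy as np

def gaussian_mf(I, alpha=1.0):
    """Gaussian MF of Equation (1); alpha > 0 generalizes the exponent as in
    the generalized Hanman-Anirban entropy function."""
    mean, sigma = I.mean(), I.std() + 1e-12   # statistical parameters of the distribution
    return np.exp(-(((I - mean) ** 2) / (2.0 * sigma ** 2)) ** alpha)

def information_values(I, alpha=1.0):
    """Information values H_i = I_i * mu(I_i) of Equation (3)."""
    return I * gaussian_mf(I, alpha)

# usage on the normalized intensities of an image window
I = np.random.rand(64)                         # stand-in for the information source values
print("certainty information:", information_values(I).sum())
print("uncertainty information:", (I * (1.0 - gaussian_mf(I))).sum())
```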
2.2. High-Order Information Sets
The problem with the basic information values $H_i = I_i \mu_i$ is that they lack the ability to represent the high-order certainty/uncertainty in the information source values. So, we are bent upon formulating a high-order information set like the Hanman transform, for which the need arises to seek the adaptive form of the Hanman-Anirban entropy function.
Derivation of the Hanman Transform
The adaptive Hanman-Anirban entropy function $H_a$, which contains the variable parameters in its exponential gain function, is defined as:

$$H_a = \sum_{i=1}^{n} I_i\, e^{-\left(a(\cdot)\, I_i^3 + b(\cdot)\, I_i^2 + c(\cdot)\, I_i + d(\cdot)\right)}$$ (4)

Now, we will derive the Hanman transform denoted by $H_T$ by substituting $a(\cdot) = b(\cdot) = d(\cdot) = 0$ and $c(\cdot) = \mu_i$ into Equation (4) as:

$$H_T = \sum_{i=1}^{n} I_i\, e^{-\mu_i I_i}$$ (5)

The higher form of the information set is denoted by $H_T$, since its exponential gain function is a function of the information values, whereas that of the basic information set is a function of the information source/attribute values. In the case of an incorrect membership function, the effectiveness of an information set diminishes, and in such a situation, an intuitionistic fuzzy set offers an olive branch. For the more recent works on the information sets, the readers may refer to the paper of Bansal and Madasu in [20] on ear-based authentication and that of Hanmandlu et al. in [21] on iris-based authentication.
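A corresponding sketch of the Hanman transform of Equation (5), again ours and merely illustrative, replaces the constant gain by one driven by the information values:

```python
import numpy as np

def hanman_transform(I):
    """Hanman transform H_T = sum_i I_i * exp(-mu_i * I_i), Equation (5);
    the exponential gain is a function of the information values mu_i * I_i."""
    mean, sigma = I.mean(), I.std() + 1e-12
    mu = np.exp(-((I - mean) ** 2) / (2.0 * sigma ** 2))   # Gaussian MF, Equation (1)
    return float(np.sum(I * np.exp(-mu * I)))
```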
2.3. Intuitionistic Fuzzy Sets
The intuitionistic fuzzy set [22] enhances the scope of a fuzzy set that has pairs of elements by empowering it with two additional elements, viz., the non-membership function and the hesitancy function, so that it can cater to the inaccurate fuzzy modeling. An intuitionistic fuzzy set $A$ having each of its elements as a 4-tuple is framed as:

$$A = \left\{ \left(x, \mu_A(x), \nu_A(x), \pi_A(x)\right) \mid x \in X \right\}$$ (6)

where the functions denoted by $\pi_A(x)$, $\nu_A(x)$ and $\mu_A(x)$ are the values of the hesitancy function, non-membership function, and membership function respectively of an attribute value $x$. The values of these three functions satisfy the following condition:

$$\mu_A(x) + \nu_A(x) + \pi_A(x) = 1$$ (7)
2.4. Definition of the Pervasive Information Set and Formulation of the Hesitancy Features
The pervasive membership is a combination of the modified $\mu_A(x)$ and $\nu_A(x)$ that accounts for the deficiency in the fuzzy modeling arising out of an inaccurate membership function, whereas the basic pervasive information values are the products of the information source values $x$ and the pervasive membership values $\mu_{per}(x)$, defined as under:

$$\mu_{per}(x) = \mu_A(x) + \nu_A(x)$$ (8)

The pervasive information set is denoted by $\{x\,\mu_{per}(x)\}$, and $\pi_A(x)$ is related to $\mu_{per}(x)$ from Equation (7) and Equation (8) as:

$$\pi_A(x) = 1 - \mu_{per}(x)$$ (9)

The variable hesitancy function is expressed as:

$$\pi_k(x) = 1 - \mu_A(x)^k - \nu_A(x)^k$$ (10)

where k is the hesitancy degree that helps modify both $\mu_A(x)$ and $\nu_A(x)$. The incremental hesitancy function is obtained by subtracting Equation (9) from Equation (10) as:

$$\Delta\pi(x) = \mu_A(x) + \nu_A(x) - \mu_A(x)^k - \nu_A(x)^k$$ (11)

This can be simplified as:

$$\Delta\pi(x) = \mu_A(x)\left(1 - \mu_A(x)^{k-1}\right) + \nu_A(x)\left(1 - \nu_A(x)^{k-1}\right)$$ (12)

Approximating the bracketed terms in the r.h.s. of Equation (12) as the exponential gain functions leads to:

$$\Delta\pi(x) = \mu_A(x)\, e^{-\mu_A(x)^{k-1}} + \nu_A(x)\, e^{-\nu_A(x)^{k-1}}$$ (13)

where $k > 1$ for $\Delta\pi(x)$ to be positive. The Intuitionistic Fuzzy (IF) Hanman hesitancy entropy function, as a sum of the incremental hesitancy functions, is given by:

$$H_{IF} = \frac{1}{n}\sum_{x} \left[ \mu_A(x)\, e^{-\mu_A(x)^{k-1}} + \nu_A(x)\, e^{-\nu_A(x)^{k-1}} \right]$$ (14)

It can be easily realized that the terms in the r.h.s. of Equation (13) are the fuzzy entropy values. Since k is the hesitancy degree, by varying it we can get different values of the incremental hesitancy function. In order to get the corresponding intuitionistic information value, we have to take the product of $x$ with the r.h.s. of Equation (13). This is applicable to any intuitionistic fuzzy (hesitancy) entropy/transform values.

Here, the problem is how to obtain the values of $\nu_A(x)$. For this, we take recourse to the variable complement membership function. We already have two variable complement functions, namely, the Sugeno complement $\nu_A(x) = \frac{1 - \mu_A(x)}{1 + s\,\mu_A(x)}$ and the Yager complement $\nu_A(x) = \left(1 - \mu_A(x)^s\right)^{1/s}$. In these complements, s is a scale parameter that helps compute $\nu_A(x)$ with different distributions. Equation (14) provides the mean hesitancy feature as it is an average of the values of the incremental hesitancy function.
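To fix ideas, a minimal sketch (ours) of the hesitancy features of Equations (13) and (14), with the non-membership values obtained from the Sugeno or Yager complement, is given below; the default parameter values are arbitrary:

```python
import numpy as np

def sugeno_complement(mu, s=0.5):
    """Sugeno complement nu = (1 - mu) / (1 + s * mu), with s > -1."""
    return (1.0 - mu) / (1.0 + s * mu)

def yager_complement(mu, s=2.0):
    """Yager complement nu = (1 - mu**s)**(1/s), with s > 0."""
    return (1.0 - mu ** s) ** (1.0 / s)

def incremental_hesitancy(mu, nu, k=2.0):
    """Incremental hesitancy function of Equation (13), with k > 1."""
    return mu * np.exp(-mu ** (k - 1.0)) + nu * np.exp(-nu ** (k - 1.0))

def mean_hesitancy(mu, nu, k=2.0):
    """IF Hanman hesitancy entropy of Equation (14): the mean of Equation (13)."""
    return float(incremental_hesitancy(mu, nu, k).mean())
```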
2.5. Relation with the Jyotsana-Hanman Fuzzy Entropy Function
To explain this relation, let us recall the Jyotsana-Hanman fuzzy entropy function [23]:

$$H_{JH} = K \sum_{x} \left[ \mu_x\, e^{-\left(a \mu_x^3 + b \mu_x^2 + c \mu_x + d\right)^{\alpha}} + (1 - \mu_x)\, e^{-\left(a (1-\mu_x)^3 + b (1-\mu_x)^2 + c (1-\mu_x) + d\right)^{\alpha}} \right]$$ (15)

where $\mu_x \in [0, 1]$, K is the normalizing factor and $\alpha$ is the power on the exponential gain function. This is derived from the Generalized Hanman-Anirban (GHA) entropy function in the probabilistic domain P, defined as:

$$H_{GHA} = K \sum_{i} p_i\, e^{-\left(a p_i^3 + b p_i^2 + c p_i + d\right)^{\alpha}}$$ (16)

The factor K is the same as given above, but the constant term is removed in Equation (15) because of the fuzzification after replacing $p_i$ with $\mu_x$ in Equation (16). If we choose K = 1/n as an approximation to the normalizing factor, a = b = d = 0 and c = 1 in Equation (16), we get one form of the Jyotsana-Hanman fuzzy entropy function, as under:

$$H_{JH} = \frac{1}{n}\sum_{x} \left[ \mu_x\, e^{-\mu_x^{\alpha}} + (1 - \mu_x)\, e^{-(1-\mu_x)^{\alpha}} \right]$$ (17)

This is similar to the r.h.s. of Equation (13) except that in place of $(1 - \mu_x)$ we have $\nu_x$, the complement membership function, and in place of $\alpha$ we have (k − 1). Note that Equation (17) is a specific function that becomes the incremental hesitancy function on replacing $(1 - \mu_x)$ with $\nu_x$ and $\alpha$ with (k − 1). Let us consider the adaptive Jyotsana-Hanman fuzzy entropy function having the variable parameters:

$$H_{JH}^{a} = K \sum_{x} \left[ \mu_x\, e^{-\left(a(\cdot) \mu_x^3 + b(\cdot) \mu_x^2 + c(\cdot) \mu_x + d(\cdot)\right)^{\alpha}} + (1 - \mu_x)\, e^{-\left(a(\cdot) (1-\mu_x)^3 + b(\cdot) (1-\mu_x)^2 + c(\cdot) (1-\mu_x) + d(\cdot)\right)^{\alpha}} \right]$$ (18)

Note that the constant parameters do not change during the summation operation, whereas the variable parameters in Equation (18) can be changed for each value of the index x. But the restriction is that their values should lie in the range (−1, 1). This is a replica of Equation (15) but with the variable parameters denoted as a(.), b(.), c(.) and d(.). Now, substituting K = 1/n, a(.) = b(.) = d(.) = 0 and c(.) = $I_x$, we get the Hanman Fuzzy Transform (FT), given by:

$$H_{FT} = \frac{1}{n}\sum_{x} \left[ \mu_x\, e^{-\left(I_x \mu_x\right)^{\alpha}} + (1 - \mu_x)\, e^{-\left(I_x (1-\mu_x)\right)^{\alpha}} \right]$$ (19)

This becomes the coveted Intuitionistic Hanman Fuzzy Transform $H_{IFT}$ on replacing $(1 - \mu_x)$ with $\nu_x$ and $\alpha$ with k − 1, shown as under:

$$H_{IFT} = \frac{1}{n}\sum_{x} \left[ \mu_x\, e^{-\left(I_x \mu_x\right)^{k-1}} + \nu_x\, e^{-\left(I_x \nu_x\right)^{k-1}} \right]$$ (20)

The assumption of the variable parameters in Equation (18) bestows on us the flexibility that they can be different in the two terms of the summation, provided K = 1/n does not pose any problem, as we can do away with the normalization in the possibilistic domain. In view of this, Equation (18) can be rewritten, with two sets of variable parameters $\{a_1(\cdot), b_1(\cdot), c_1(\cdot), d_1(\cdot)\}$ in the first term and $\{a_2(\cdot), b_2(\cdot), c_2(\cdot), d_2(\cdot)\}$ in the second term, as:

$$H = \sum_{x} \left[ \mu_x\, e^{-\left(a_1(\cdot) \mu_x^3 + b_1(\cdot) \mu_x^2 + c_1(\cdot) \mu_x + d_1(\cdot)\right)^{\alpha}} + (1 - \mu_x)\, e^{-\left(a_2(\cdot) (1-\mu_x)^3 + b_2(\cdot) (1-\mu_x)^2 + c_2(\cdot) (1-\mu_x) + d_2(\cdot)\right)^{\alpha}} \right]$$ (21)

By having different variable parameters in the two terms, Equation (21) bestows the facility to derive different forms of the intuitionistic fuzzy entropy functions. We will derive equations more general than Equation (18) and Equation (21), which are the variants of the Jyotsana-Hanman fuzzy entropy function, by invoking the Mamta-Hanman entropy function in [24].
2.6. Derivation of the Generalized Fuzzy Entropy Function
The Mamta-Hanman entropy function, an extension of the Hanman-Anirban entropy function, is defined as:

$$H_{MH} = \sum_{i} p_i^{\gamma}\, e^{-\left(a\, p_i^{\beta} + b\right)^{\alpha}}$$ (22)

where $\gamma$, $\beta$ and $\alpha$ are the powers and a and b are the constant parameters. For the purpose of simplification, the normalizing factor is taken as unity for the reason given above. The corresponding Mamta-Hanman fuzzy entropy function can be easily framed based on the Jyotsana-Hanman fuzzy entropy function as:

$$H_{MH}^{F} = \sum_{x} \left[ \mu_x^{\gamma}\, e^{-\left(a\, \mu_x^{\beta} + b\right)^{\alpha}} + (1-\mu_x)^{\gamma}\, e^{-\left(a\, (1-\mu_x)^{\beta} + b\right)^{\alpha}} \right]$$ (23)

It is now easy to write the adaptive form of Equation (23), with the variable parameters a(·) and b(·), as:

$$H_{MH}^{a} = \sum_{x} \left[ \mu_x^{\gamma}\, e^{-\left(a(\cdot)\, \mu_x^{\beta} + b(\cdot)\right)^{\alpha}} + (1-\mu_x)^{\gamma}\, e^{-\left(a(\cdot)\, (1-\mu_x)^{\beta} + b(\cdot)\right)^{\alpha}} \right]$$ (24)

By replacing $(1-\mu_x)$ with $\nu_x$ and $\alpha$ with k − 1, and by substituting a = 1, b = 0 and $\beta = 1$ in Equation (23), the Mamta-Hanman Intuitionistic Fuzzy (IF) entropy function is obtained as:

$$H_{MH}^{IF} = \sum_{x} \left[ \mu_x^{\gamma}\, e^{-\mu_x^{k-1}} + \nu_x^{\gamma}\, e^{-\nu_x^{k-1}} \right]$$ (25)

In the similar manner, we can get the Mamta-Hanman Intuitionistic Fuzzy Transform (IFT) by substituting $a(\cdot) = I_x$, $b(\cdot) = 0$ and $\beta = 1$, and replacing $(1-\mu_x)$ with $\nu_x$ and $\alpha$ with k − 1 in Equation (24), as given by:

$$H_{MH}^{IFT} = \sum_{x} \left[ \mu_x^{\gamma}\, e^{-\left(I_x \mu_x\right)^{k-1}} + \nu_x^{\gamma}\, e^{-\left(I_x \nu_x\right)^{k-1}} \right]$$ (26)
As can be observed, Equation (25) and Equation (26) have an extra parameter $\gamma$ over Equation (19) and Equation (20), which can be leveraged to our advantage just as in a Type-2 fuzzy set the variance is changed to elongate the shape of the Gaussian function. In an information set this option can be exercised, but not here, as we are dealing with the inaccurate fuzzy model, i.e. the non-membership function.
The benefit of having different variable parameters in Equation (21) is not yet exemplified. This exemplification is now on the cards with the invocation of the adaptive Mamta-Hanman fuzzy entropy function. To this effect, we will rewrite Equation (24), with the two different sets of variable parameters $\{a_1(\cdot), b_1(\cdot)\}$ and $\{a_2(\cdot), b_2(\cdot)\}$ in its two terms, as follows:

$$H = \sum_{x} \left[ \mu_x^{\gamma}\, e^{-\left(a_1(\cdot)\, \mu_x^{\beta} + b_1(\cdot)\right)^{\alpha}} + (1-\mu_x)^{\gamma}\, e^{-\left(a_2(\cdot)\, (1-\mu_x)^{\beta} + b_2(\cdot)\right)^{\alpha}} \right]$$ (27)

To test the flexibility offered by the different parameters in the r.h.s. of Equation (27), consider two sets of substitutions, with the first one as $a_1(\cdot) = 1$, $b_1(\cdot) = 0$, $a_2(\cdot) = I_x$, $b_2(\cdot) = 0$ and the second one as $a_1(\cdot) = I_x$, $b_1(\cdot) = 0$, $a_2(\cdot) = 1$, $b_2(\cdot) = 0$, together with $\beta = 1$, in addition to replacing $(1-\mu_x)$ with $\nu_x$ and $\alpha$ with k − 1. The resulting hybrid Mamta-Hanman Entropy-Transforms (ETs) are given as under:

$$H_{ET_1} = \sum_{x} \left[ \mu_x^{\gamma}\, e^{-\mu_x^{k-1}} + \nu_x^{\gamma}\, e^{-\left(I_x \nu_x\right)^{k-1}} \right]$$ (28)

$$H_{ET_2} = \sum_{x} \left[ \mu_x^{\gamma}\, e^{-\left(I_x \mu_x\right)^{k-1}} + \nu_x^{\gamma}\, e^{-\nu_x^{k-1}} \right]$$ (29)

It may be noted that we would have got the same equations, but for $\gamma$, from Equation (21) by keeping the other parameters the same and making the corresponding substitutions: 1) $c_1(\cdot) = 1$ and $c_2(\cdot) = I_x$ with the remaining variable parameters set to zero; 2) $c_1(\cdot) = I_x$ and $c_2(\cdot) = 1$ with the remaining variable parameters set to zero. This conversion from the fuzzy domain into the intuitionistic fuzzy domain would not be possible without the different parameter settings in the two terms. The story doesn't end here, as several ramifications spring forth with an appropriate choice of the variable parameters in the exponential gain functions in Equation (28) and Equation (29). For example, these gain functions can be converted into the logarithmic gain functions.
2.7. A Brief Discussion on the above Features
Let us gain insight into what is accomplished by the above formulations. We can see that both the membership and non-membership functions are modified by the choice of the parameters of the exponential functions as well as by the hesitancy degree, not to speak of the scale parameter in the construction of the non-membership function itself. The net effect of all these is to push the pervasive membership function values close to 1, thereby suppressing the underlying distribution of the information source values, if any exists. This amounts, in clear terms, to what we achieve by histogram equalization, enhancement, or a cut-set operation on the membership functions of an image to affect its look and feel. This analogy entices us to seek alternative ways of modifying the membership function itself rather than leaning on the non-membership function and the hesitancy degree. Though this is interesting, our cherished objectives don't allow us to take any more extra steps in this direction. As we can see, histogram equalization distributes an equal number of gray levels for each frequency of occurrence, and a cut set discards the lower values of a membership function, whereas the enhancement pushes the membership function values that lie above the crossover point to the vicinity of 1 and those below the crossover point close to 0. All these methods applicable to a fuzzy set have a limited effect in altering the membership function values of a well-defined mathematical function like the Gaussian, Bell, Generalized Gaussian, etc. Hence, they are of no use as far as adapting the intuitionistic fuzzy set at this juncture is concerned. Now coming to the point of usage, we will use as features only the terms on the r.h.s. of the equations of interest, viz., Equation (20), Equation (24) and Equation (25). Irrespective of the choice, we are concerned about how effective the features of a particular type are. This can only be ascertained while using them for the detection of the road signs.
3. A Brief Description of Pervasive Information Sets
The above formulations pave the way for the definition of the pervasive information set. Recall the certainty information value $H_i = I_i\,\mu(I_i)$ in Equation (3) that tells how much $I_i$ belongs to the concept (normal or abnormal) represented by the value of the Membership Function (MF), $\mu(I_i)$. The Hanman-Anirban entropy function helps us find the certainty information from the distribution of the values of $I_i$ by fitting an MF. By this, a fuzzy set comprising $\{I_i, \mu(I_i)\}$ as pairs is converted into the information set $\{I_i\,\mu(I_i)\}$.

Consider $x\,\nu_A(x)$, which contains the uncertainty (external) information. In the case of the intuitionistic information set, the sum of the two kinds of information, viz., certainty and uncertainty, is $x\,[\mu_A(x) + \nu_A(x)]$ and the residual information is $x\,\pi_A(x)$. By resorting to the accurate fuzzy modeling, the residual information becomes $x\,[1 - \mu_{per}(x)]$. At this juncture, we define the variable pervasive membership function $\mu_{per}^{v}(x)$, which leads to the variable pervasive information set, as the sum of $\mu_A(x)^k$ and $\nu_A(x)^k$, as given by:

$$\mu_{per}^{v}(x) = \mu_A(x)^k + \nu_A(x)^k = 1 - \pi_k(x)$$ (30)

This reminds us that in addition to k, the pervasive membership function is also dependent on the scale s used in the Yager and Sugeno complements. The pervasive information value is a variable, unlike the information value in an information set, and it is given by:

$$H_{per}(x) = x\,\mu_{per}^{v}(x) = x\left[\mu_A(x)^k + \nu_A(x)^k\right]$$ (31)

The pervasive information set is a collection of $H_{per}(x)$, i.e. $\{H_{per}(x)\}$. A flowchart showing the formation of the variable pervasive information set is given in Figure 1.

Figure 1. Flowchart on the formation of the variable pervasive information set.

As $\mu_A(x)^k$ and $\nu_A(x)^k$ are less than 1, we can investigate the Frank T-norm,

$$T_F(u, v) = \log_s\left(1 + \frac{(s^u - 1)(s^v - 1)}{s - 1}\right), \quad s > 0,\; s \neq 1$$ (32)

or the Dombi T-norm,

$$T_D(u, v) = \frac{1}{1 + \left[\left(\frac{1-u}{u}\right)^{\lambda} + \left(\frac{1-v}{v}\right)^{\lambda}\right]^{1/\lambda}}, \quad \lambda > 0$$ (33)

with $u = \mu_A(x)^k$ and $v = \nu_A(x)^k$. The above parametric T-norms are the possible variants of the $\mu_{per}^{v}(x)$ in which k acts as a parameter. However, a thorough experimentation would reveal which T-norm or S-norm is a suitable candidate for the role of a $\mu_{per}^{v}(x)$.
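For reference, the standard parametric forms of the two T-norms named above are sketched here (our rendering; the argument names u and v stand for $\mu_A(x)^k$ and $\nu_A(x)^k$):

```python
import numpy as np

def frank_tnorm(u, v, s=2.0):
    """Frank T-norm of Equation (32): log_s(1 + (s**u - 1)(s**v - 1)/(s - 1)),
    valid for s > 0, s != 1."""
    return np.log1p((s ** u - 1.0) * (s ** v - 1.0) / (s - 1.0)) / np.log(s)

def dombi_tnorm(u, v, lam=1.0, eps=1e-12):
    """Dombi T-norm of Equation (33) for u, v in (0, 1), with lambda > 0."""
    u = np.clip(u, eps, 1.0 - eps)
    v = np.clip(v, eps, 1.0 - eps)
    return 1.0 / (1.0 + (((1.0 - u) / u) ** lam
                         + ((1.0 - v) / v) ** lam) ** (1.0 / lam))
```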
3.1. Properties of Pervasive Membership Function
These are listed as follows:

1) Just as the Hanman transform, the pervasive Hanman transform can be easily derived as:

$$H_{pT} = \sum_{x} x\, e^{-x\,\mu_{per}^{v}(x)}$$ (34)

2) The pervasive membership function depends on the two non-statistical parameters s and k and therefore it doesn't fit any distribution.

3) The name pervasive denotes that the information provided by $\mu_A(x)$ and $\nu_A(x)$ covers both the inside and the outside of a set. For instance, consulting a family doctor for any ailment is to receive an advice (internal information), whereas seeking the opinion of an expert doctor on the same ailment is the external information.

4) The use of the pervasive information set is necessitated to circumvent the inaccurate fuzzy modeling of a membership function. When we are unable to fit a suitable $\mu_A(x)$ to the distribution of the information source values, it prompts us to go in for $\mu_{per}^{v}(x)$.

5) While dealing with the set-based operations, two pervasive membership functions of a pervasive information set are amenable to the union and intersection operations but not to the complement operation. The union of two pervasive membership functions $\mu_{per_1}(x)$ and $\mu_{per_2}(x)$ is $\max\{\mu_{per_1}(x), \mu_{per_2}(x)\}$ and their intersection is $\min\{\mu_{per_1}(x), \mu_{per_2}(x)\}$. They also do not satisfy the convex property due to Property 3. The components of the pervasive membership function can be subjected to the T-norm or S-norm operators as in Equation (32) and Equation (33).
3.2. Histogram Representation of Gray Levels in Road Signs
Let us first confine ourselves to the representation of the gray levels g from 1 to 255, which are normalized to (0, 1); extending it to the RGB color model is straightforward. The histogram is a plot of g vs. $f(g)$, where $f(g)$ is the frequency of occurrence of g, and it hardly displays any known kind of distribution. On the other hand, the normalized gray levels range from 0 to 1. If we select any standard membership function $\mu_g$ for g, then the information set is $\{f(g)\,\mu_g\}$ and the corresponding Hanman possibilistic-probabilistic transform is $\sum_g f(g)\, e^{-\mu_g f(g)}$. Now coming to the case of the inaccurate fuzzy modelling where $\mu_g$ is ill-defined, we need to tread in the intuitionistic domain. As we have an improper $\mu_g$, we have to select the non-membership function $\nu_g$ by seeking a Sugeno or Yager complement. The question of the hesitancy degree arises here too, but interestingly we can use the frequency of occurrence $f(g)$, akin to a probability, as the hesitancy degree for two reasons: firstly, it lies in the range of 0 to 1 and secondly, the more the probability, the lesser is the hesitancy. In light of this reasoning, the Jyotsana-Hanman fuzzy entropy function paves the way for the formulation of the possibilistic-probabilistic intuitionistic entropy function as:

$$H_{PPI} = \sum_{g=1}^{L} \left[ \mu_g\, e^{-\mu_g^{f(g)}} + \nu_g\, e^{-\nu_g^{f(g)}} \right]$$ (35)

This is born out of Equation (15) with the parameters set as K = 1, a = b = d = 0, c = 1 and with the replacement of $\mu_x$ with $\mu_g$, $(1 - \mu_x)$ with $\nu_g$ and $\alpha$ with $f(g)$. Here, L stands for the number of gray levels. However, if we use the variable parameters as part of the Jyotsana-Hanman fuzzy entropy function, the possibilistic-probabilistic intuitionistic Hanman transform is obtained as given by:

$$H_{PPT} = \sum_{g=1}^{L} \left[ \mu_g\, e^{-\left(g\,\mu_g\right)^{f(g)}} + \nu_g\, e^{-\left(g\,\nu_g\right)^{f(g)}} \right]$$ (36)

We will now give a clue for its implementation: g is divided into intervals such that each interval has a bin consisting of a set of gray levels and the corresponding frequencies of occurrence.

$$H_j = \sum_{g \in j} \left[ \mu_g\, e^{-\left(g\,\mu_g\right)^{f(g)}} + \nu_g\, e^{-\left(g\,\nu_g\right)^{f(g)}} \right], \quad j = 1, \ldots, N_b$$ (37)

where j indicates the jth bin and $n_j$ indicates the number of gray levels in that bin. If we have $N_b$ as the number of bins, then $\sum_{j=1}^{N_b} n_j = L$. Note the difference between Equation (36) and Equation (37): each bin term, denoted by $H_j$ in Equation (37), gives the possibilistic-probabilistic entropy function that forms one feature, whereas all the terms on the r.h.s. of Equation (36) qualify as features. It may be noteworthy for the readers that several probabilistic features are generated in [25] by combining the Hanman-Anirban entropy function with the Shannon, Renyi and Tsallis entropy functions.
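An implementation sketch of the binned features of Equation (37) follows (ours, under the assumptions that the membership function over the gray levels is Gaussian and the non-membership is a Yager complement):

```python
import numpy as np

def binned_pp_entropy_features(gray, n_bins=15, s=2.0):
    """One possibilistic-probabilistic entropy feature per histogram bin, as in
    Equation (37): bin-wise sums of mu_g e^{-mu_g^f(g)} + nu_g e^{-nu_g^f(g)},
    with the normalized frequency f(g) acting as the hesitancy degree."""
    g = gray.ravel().astype(float) / 255.0
    freq, edges = np.histogram(g, bins=256, range=(0.0, 1.0))
    f = freq / max(freq.sum(), 1)                              # frequency of occurrence per level
    levels = 0.5 * (edges[:-1] + edges[1:])
    mean, sigma = g.mean(), g.std() + 1e-12
    mu = np.exp(-((levels - mean) ** 2) / (2.0 * sigma ** 2))  # possibilistic MF over the levels
    nu = (1.0 - mu ** s) ** (1.0 / s)                          # Yager complement
    terms = mu * np.exp(-mu ** f) + nu * np.exp(-nu ** f)
    return [float(t.sum()) for t in np.array_split(terms, n_bins)]
```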
4. Analysis of Multi-Variables in the Information Set Theory
So far, we have dealt with a single input, i.e. an attribute or an information source, having either a proper distribution of values that is amenable to being fitted with a well-defined mathematical function like the Gaussian membership function based on the fuzzy set concept, or having a random distribution for which only fitting an inaccurate membership function is possible, in which case we go in for the intuitionistic fuzzy set concept. In either case, we can extract different types of features, as explained above, from the input, say, a road sign, using the information set concept. An important point to note here is that we can have one input producing either one output or many outputs. On the other hand, there can be multiple inputs producing either a single output or many outputs. By this, we have four categories that include:
1) Single Input Single Output (SISO): e.g. one road sign showing one identity.
2) Single Input Multiple Output (SIMO): e.g. one scene containing several constituents.
3) Multiple Input Single Output (MISO): e.g. one-color road sign displaying one identity.
4) Multiple Input Multiple Output (MIMO): e.g. one-color scene with three channels having several constituents.
The 4th category is applicable to the detection of a road sign from a world scene comprising the pixel intensities from the R, G, B channels. However, the detection of a road sign is difficult because of its small size as compared to the other constituents such as the track, trees, and buildings. But there is a silver lining associated with this detection, as most road signs are predominantly red in color, while the world scenes containing the road signs have trees (the green patches) on the two sides of the roads, besides a central patch of greenery. So, we shall exploit the color gradient as a cue in devising a threshold to segregate a road sign from the scene and in designing the Color-Based Detector.
4.1. Multi-Variables in the Histogram Representation
Here, we are concerned with a scene with the RGB color model. We have already discussed how a gray level g in a bin, with its membership function $\mu_g$, non-membership function $\nu_g$ and frequency of occurrence $f(g)$ that acts as the hesitancy degree, is represented by the possibilistic-probabilistic entropy function. We now look at the histogram representation of g in both the G and R channels with the corresponding histograms. For ease of notation, we consider a bin in the G-histogram and, at the same location, another bin in the R-histogram. Their gray levels are denoted by $g_G$ and $g_R$, their membership functions by $\mu_G$ and $\mu_R$, and their frequencies of occurrence by $f_G$ and $f_R$ respectively. It may be noted that $\mu_G$ and $\mu_R$ are the result of opting for a mathematical function for the gray levels in the two bins. The possibilistic-probabilistic information value for $g_G$ is $f_G\,\mu_G$ and that for $g_R$ is $f_R\,\mu_R$, while the possibilistic-probabilistic transform values for $g_G$ and $g_R$ are $f_G\, e^{-\mu_G f_G}$ and $f_R\, e^{-\mu_R f_R}$ respectively.
4.2. Multi-Variables in the Information Set Representation
Unlike in the histogram representation, here the G and R scene images are partitioned into windows of some size. The pixel intensities of each window are fitted with a mathematical function. Let $I_G$ and $I_R$ be the pixel intensities and $\mu_G$ and $\mu_R$ be the corresponding membership function values in the windows of the G-scene and R-scene respectively. Then, the basic information values and the Hanman transform values of the pixel intensities in the windows are $\{I_G \mu_G, I_R \mu_R\}$ and $\{I_G e^{-\mu_G I_G}, I_R e^{-\mu_R I_R}\}$ respectively. If we replace $\mu_G$ with $\mu_{pG}$ and $\mu_R$ with $\mu_{pR}$, where p stands for the pervasive membership function, we get the corresponding pervasive information values and pervasive transform values respectively. We can also make use of the intuitionistic fuzzy entropy and transform features derived above as alternatives by adapting them to the G and R channels. Here comes the need for distinguishing between the independent and dependent variables. To this effect, the Hanman law states how to distinguish them.

Hanman Law: Let the two variables $I_G$ and $I_R$, or their information values $I_G \mu_G$ and $I_R \mu_R$, from their respective samples be compared for ascertaining their relative significance; then their gradient information values, or simply the error values, need to be considered if they are independent, else their divergence information values if they are dependent.

Whether we have the gradient or the divergence information values, if they are extracted from a number of samples, then the T-norms of the information values provide the minimum information to be considered for accomplishing tasks such as detection, tracking, and recognition. It is tempting to ask whether the Bayesian law that plays with the a priori and posterior probability distributions has anything to do with the Hanman law, whose purpose is to offer guidance in the matter of the independence/dependence of the feature vectors derived from the distributions of the information source/attribute values. Some discussion to this effect is postponed to the conclusion section. As the probabilistic Hanman-Anirban entropy function is extended in [26] to embed the Bayesian learning into the information set fold, it won't be difficult to establish a connection between the Hanman and Bayesian laws.
Proof: It may be noted that the dependent variables can be called the conditional variables. Let R and G be the pixel intensities to be compared and $I_R \mu_R$ and $I_G \mu_G$ be the corresponding information values; then the gradient information is defined in the independent case as:

(38)

In the dependent case, the gradient information becomes:

(39)

where the second term is the cross-entropy function. If R depends on both G and B, then the relation between them can be written according to the Hanman law as:

(40)

The parametric T-norms such as the Frank and Yager T-norms are found to be effective. While implementing the T-norms, two terms are taken at a time. The resulting first T-norm is used to get the second T-norm by combining the first T-norm and the third term. This way, the T-norm of any number of terms can be obtained. Here, a term refers to an information value, or a gradient or divergent information value. In the simplest representation, we can have the product as the T-norm, that is:

$$T(x_1, x_2) = x_1 x_2$$ (41)

This can be extended to any number of values as:

$$T(x_1, x_2, \ldots, x_m) = \prod_{i=1}^{m} x_i$$ (42)
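A sketch of the product T-norm of Equations (41) and (42) applied to a stack of information vectors, together with a pairwise Frank-T-norm fold for comparison (our illustration):

```python
import numpy as np

def product_tnorm(vectors):
    """Equations (41)-(42): the elementwise product over any number of
    information, gradient or divergent information vectors at once."""
    return np.prod(np.stack(vectors, axis=0), axis=0)

def frank_fold(vectors, s=2.0):
    """The pairwise alternative: fold the parametric Frank T-norm over the
    vectors, two at a time, as described above."""
    out = np.asarray(vectors[0], dtype=float)
    for v in vectors[1:]:
        out = np.log1p((s ** out - 1.0) * (s ** v - 1.0) / (s - 1.0)) / np.log(s)
    return out
```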
Clarification on the Product T-Norm
This formula has much to say on the merging of several vectors consisting of 1) information values, 2) gradient information values, and 3) divergent information values, where they appear as the composition of the information source values, the membership function values and the exponential gain functions. If they happen to be the computed values, then we take recourse to the parametric T-norms. Otherwise, this gives a solace as it can be applied on all the vectors at a time to yield their T-norm, thus relieving us from the ordeal of satisfying the four properties, viz., monotonicity, commutativity, associativity and identity. The intersection operation gives the minimum of all the membership functions. Interestingly, we can use the T-norm operator on all the membership functions by taking two at a time, but this is a time-consuming process. Though we have clues at hand to improve upon it, we will explore this formula in a separate work.
Just as we have considered many samples of R, we can also have many samples of G, like $G_1$, $G_2$, $G_3$; in that case, either we can go for a parametric T-norm T($G_1$, $G_2$, $G_3$) by taking two terms ($G_1$, $G_2$) at a time and then combining the result with the third term $G_3$, or alternatively we can make use of the above product as the T-norm proposed by us. Let us now move on to the high-level representations of the attribute/information source values using the Hanman transform. The above gradient and divergent information values in both the independent and the dependent cases appear as:

(43)

(44)

where the superscripts I and D indicate the independent and the dependent cases respectively. In the situation of the inaccurate fuzzy modeling, the pervasive information set is the option for us, as detailed at length in the previous sections. So, the above expressions in Equation (43) and Equation (44) can be easily changed by replacing $\mu_G$ with $\mu_{pG}$ and $\mu_R$ with $\mu_{pR}$, where the subscript p represents the pervasive information set, in addition to having the hesitancy degree as (k − 1). The above expressions then take the forms given by:

(45)

(46)
5. Application of Information Theory Concepts to Detection, Tracking and Recognition
An application of the information set theory is made for the first time to the detection, tracking and recognition of the road signs though it has been applied to other fields such as biometric authentication involving several modalities like face, fingerprint, retina, ear, and gait; to the medical diagnosis of diseases like brain tumor, breast cancer, diabetic retinopathy, and to several imaging processing applications listed in [26]. We now pinpoint its application to the tasks of the pipeline in Table 1.
Table 1. An application of the information set theory to different tasks of the pipeline.
| Main tasks of the pipeline | Sub-tasks | Information set concept used |
| --- | --- | --- |
| Thresholding | Thresholding condition | The gradient of the fuzzy Hanman transform values |
| Detection | Feature extraction | Incremental hesitancy-based color features |
| Tracking | Learning model | Divergent Hanman transform values |
| Recognition | Design of a classifier | Mamta-Hanman transform as the criterion function |
5.1. Detection
The issue here is to separate a road sign from the world scene. The striking feature of most of the road signs is the red color, whereas the world scene is dominated by the green color. So, we wish to use this color contrast in devising a threshold condition for the segregation of a road sign from a world scene, based on the histogram representation of the green and red gray levels of the world scene. As already defined, the membership functions of the green and red channels are $\mu_G$ and $\mu_R$ and the corresponding frequencies of occurrence are $f_G$ and $f_R$; then the fuzzy Hanman-transform-based threshold condition is expressed as:

(47)

This condition is actually the gradient of the fuzzy Hanman transform values assuming the independence of the green and red pixel intensities. Having derived the threshold condition as a fuzzy gradient between the fuzzy Hanman transform values of the normalized green and red pixel intensities in the normalized RGB color space, the histogram representation of their distributions within a world scene is depicted in Figure 2. The fuzzy gradient attains a minimum at the value of 0.5, at which the normalized red pixel intensities satisfy $r_N > 0.5$ and those of the green pixels satisfy $g_N < 0.5$, where the subscript N denotes the normalization. Hence, this value is selected as the threshold.
Figure 2. Histogram distribution of the red and green channels.
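A minimal sketch (ours) of the resulting binarization rule in the normalized RGB space, under the assumption of the chromaticity normalization $r_N = R/(R+G+B)$:

```python
import numpy as np

def red_binarization(bgr):
    """Binarize a world scene by the 0.5 threshold selected from the fuzzy
    gradient around Equation (47): keep the red-dominated pixels and
    suppress the green-dominated ones."""
    img = bgr.astype(float)
    total = img.sum(axis=2) + 1e-12
    r_n = img[..., 2] / total            # normalized red channel (OpenCV BGR order)
    g_n = img[..., 1] / total            # normalized green channel
    return (r_n > 0.5) & (g_n < 0.5)
```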
All the red pixels of the scene demarcated by the threshold, leading to its binarization, are linked to form a contour or an edge. A bounding box or blob enclosing each contour is subjected to the height-width limits in terms of the minimum and maximum values shown in Table 2, and these in turn help in segregating the Regions of Interest (ROIs) designated as the road sign candidates; a sketch of this filtering step is given below Figure 3. The next step is to confirm the candidacy with a Color-Based Detector (CBD) by making use of different types of features such as the incremental hesitancy features and the mean hesitancy features given in Equation (13) and Equation (14). These features emerge from the conversion of the normalized R, G, B values, denoted by $r_N$, $g_N$ and $b_N$, of the candidate road signs on the application of the pervasive information set concepts. We have employed the incremental hesitancy features from Equation (13) and the mean hesitancy features from Equation (14) in the development of two CBDs, with the first one called the CBD using the Hesitancy features (CBDH) and the second one the CBD using the Mean Hesitancy features (CBDMH), in addition to the Basic CBD (BCBD) that is built using the normalized RGB color features and also the Histogram of Gradients (HOG) as the shape features. Though we are in a position to amass several variants of the CBD by the adaptation of different types of features, we focus on these three feature types only, and the details of their implementation are relegated to Section 6, which is entirely devoted to the results, where we will be using the Radial Basis Function-Support Vector Machine (RBF-SVM) as the detector to demonstrate the effectiveness of the features. The objective here is to confirm whether the captured ROI is that of a road sign with the help of any classifier in the role of a detector. The block diagram is shown in Figure 3.
Table 2. Limiting values for the features used to filter out ROIs.
| Minimum height (pixels) | Maximum height (pixels) | Minimum width (pixels) | Maximum width (pixels) |
| --- | --- | --- | --- |
| 12 | 500 | 12 | 500 |
Figure 3. Block diagram of the proposed pipeline.
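The contour linking and the height-width filtering of Table 2 can be sketched with OpenCV as follows (our illustration; the limits come from Table 2):

```python
import cv2
import numpy as np

def candidate_rois(mask, min_hw=12, max_hw=500):
    """Link the thresholded red pixels into contours and keep the bounding
    boxes that satisfy the height-width limits of Table 2 as the road sign
    candidates (ROIs)."""
    contours, _ = cv2.findContours(mask.astype(np.uint8) * 255,
                                   cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    rois = []
    for c in contours:
        x, y, w, h = cv2.boundingRect(c)
        if min_hw <= w <= max_hw and min_hw <= h <= max_hw:
            rois.append((x, y, w, h))
    return rois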
5.2. Tracking
In this, we will touch upon an implementation clue. The issue here is to find the location of the corners of the bounding box enclosing the detected road sign in a new frame using the location of the road sign in the previous frame. This is actually the prediction of the location of a corner. To achieve this, a tracking model is chosen along with an objective function. The unknown parameters of the model are found using an evolutionary learning model. Here, we choose a set of contenders called a population to achieve the goal of finding a corner in the next frame. As we are concerned with two diagonal corners, we need two X coordinates and two Y coordinates. This learning model has to be executed separately to determine each coordinate iteratively until the model converges. The values of the objective function provide the outcomes for which the membership functions are found. The second subscript o in a membership function indicates that it is outcome-based. The unknown parameters serve as the information sources, also called the effort values. We can use the prudent learning in [23] to achieve the goal. According to this approach, a contender not only competes with the achiever but also with the least performing contender. Then, the sum of the divergent Hanman transform values provides the increment in the updation of the effort values, as given under:

(48)

where $\lambda$ is the learning factor, r1 and r2 are the random numbers, and (it) refers to the old iteration with (it + 1) being the new iteration. The first subscript c of the membership function denotes a contender of interest, s refers to the achiever, and e refers to the least performing contender. The number of iterations is specified a priori. We will not go deep into this approach as it is worth a separate research work.
5.3. Recognition
The need for the recognition arises because some of the ROIs demarcated by the thresholding, and then enclosed in the bounding boxes to be confirmed as road signs by any CBD, are prone to error as they may turn out to be false positives. To mitigate this shortcoming of the detectors, we go in for the two-stage multi-scale Convolutional Neural Network (CNN) model in view of the widespread popularity of the deep learning neural networks. The architecture of this model is shown in Figure 4, and we tap the feature maps after either the first block or the second block of the CNN model depending on how effective the CNN/deep features extracted from them are. As explained in [27], it has two stages of learnable layers, with each stage containing a convolutional layer and a max-pooling layer followed by a local contrast normalization layer. The combined output from both the stages forms the input to the softmax classifier that uses the Adam optimizer. However, we have formulated the error/gradient information-based Hanman Transform (HT) classifier as an alternative to the above CNN-based softmax classifier. Unlike in the detection, the color pixel intensities are unimportant in the case of the recognition, as we are interested in the class of the detected road sign. So, all the three color channels are merged to provide a gray level image. We have proposed two methods for the recognition, discussed next.
Figure 4. CNN model for tapping the feature maps.
Method-1: In this method, the features drawn from the feature maps are classified using the Hanman Transform Classifier (HTC). First of all, we will discuss how the feature maps are generated from the detected road sign image that serves as the input to the CNN model. As shown in [28], an application of kernel function on an input image (i.e. road sign) in the convolutional layer in the first block converts it into a membership function matrix whose size is reduced in the max pooling layer of the same block. As we apply a specified number of kernels on the input image, we have as many membership function matrices as the number of kernels employed. Next, we move on to the second block where the inputs are the membership function matrices from the first block. The application of a kernel on each membership function matrix in the convolutional layer followed by the max-pooling operations in the second block leads to the substantially modified membership function matrices of reduced size. The feature maps can be chosen from either the first block or the second block. The deep features are so named as they evolve from feature maps in the CNN model. As there are 32 filters, 32 values of the membership function emanate for each pixel, which are averaged out to get a single value. Like this, there are 28 × 28 deep features after the first convolutional layer.
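The per-pixel averaging of the 32 membership responses described above reduces to a one-liner; the sketch below (ours) assumes the tapped maps are arranged as an array of shape (32, 28, 28):

```python
import numpy as np

def deep_features(feature_maps):
    """Average the 32 feature-map values at each pixel location to obtain
    the 28 x 28 = 784 deep features described above."""
    return feature_maps.mean(axis=0).ravel()
```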
Development of a Variant of the Hanman Transform Classifier
The first application of the Hanman Transform Classifier appeared in [29], followed by its variant in [25]. The deep features are given to the proposed Variant of the Hanman Transform Classifier (VHTC) that operates on the error or gradient vectors between the training feature vectors of each class and a single test feature vector and delivers the identity or class label of the test feature vector. It may be noted that the gradient vectors contain the residual information that reflects the variability involved in the road signs due to the varying lighting conditions, sizes and shapes. To process the many error vectors of a class, the parametric T-norm is advocated above. Following this principle, we have opted for the Frank T-norm that takes two error vectors at a time and gives the T-normed error vector. Instead of computing an overall T-norm encompassing all the error vectors, we seek to find all the possible T-normed error vectors that contain the necessary information about a class. But to get the sufficient information, the Hanman law recommends the gradient transform vector as the guiding factor. Here, we shall use the Mamta-Hanman (MH) Transform to get the coveted gradient transform vectors from the T-normed error vectors. The gradient transform vector with the minimum MH transform value is considered the representative of that class; hence this transform acts as the criterion function. The infimum over all the representatives bestows the class label on the unknown road sign.
A brief description of the VHTC is as under: Let $x_i^c$ denote the feature vector of the $i^{th}$ training sample of the $c^{th}$ class and $t$ be the feature vector of the test sample. The absolute error vector between the $i^{th}$ feature vector of the $c^{th}$ class and the test feature vector is computed using:

$$E_i^c = \left| x_i^c - t \right|$$ (49)

Next, the Frank T-norm [30] between every possible pair $\left(E_i^c, E_k^c\right)$ of the (i, k) error vectors in the $c^{th}$ class is calculated from:

$$E_{ik}^c = \log_s\left(1 + \frac{\left(s^{E_i^c} - 1\right)\left(s^{E_k^c} - 1\right)}{s - 1}\right)$$ (50)

where $i = 1, \ldots, n$ (n being the number of the training feature vectors/samples) and k = i + 1 but not equal to i, with s > 0. The number of the possible pairs or the T-normed error vectors is equal to $\frac{n(n-1)}{2}$. As we want to cash in on the certainty associated with each T-normed error vector belonging to a class, its degree of association with the class has to be found. For simplicity, we consider the exponential function of the T-normed error vector as its membership function vector, given by:

$$\mu_{ik}^c = e^{-E_{ik}^c}$$ (51)

To derive the Mamta-Hanman transform, let us consider the adaptive Mamta-Hanman entropy function, expressed based on Equation (22) as:

$$H_{MH}^{a} = \sum_{j} \left(E_{ik}^c(j)\right)^{\gamma}\, e^{-\left(a(\cdot)\, \left(E_{ik}^c(j)\right)^{\beta} + b(\cdot)\right)^{\alpha}}$$ (52)

Substituting $a(\cdot) = \mu_{ik}^c(j)$, $b(\cdot) = 0$, $\gamma = 1$, $\alpha = 1$ and the power $\beta$ arrived at by experimentation in Equation (52) leads to the Mamta-Hanman transform as:

$$MH_{ik}^c = \sum_{j} E_{ik}^c(j)\, e^{-\mu_{ik}^c(j)\, \left(E_{ik}^c(j)\right)^{\beta}}$$ (53)

where j indexes the components of the vectors.
As the criterion for the classification, the infimum of the Mamta-Hanman transform values of all the selected T-normed error vectors identifies the class label of the test sample. Apart from the VHTC, another variant would take birth if the pervasive membership function were swapped for the exponential membership function in Equation (53). Our endeavor has been to make the VHTC an indomitable classifier. The T-normed error vector with the minimum Mamta-Hanman transform value is qualified to be the representative of a class, as all the other T-normed error vectors possess the transform values that are well within the border adjoining another class, whereas the infimum of the representatives of all the classes gives the identity of the class label of the test feature vector. The representatives here resemble the support vectors of the SVM, a name given to the vectors that fix the hyperplanes fitted to the feature vectors of each class. The hyperparameters, viz., the regularization parameter C and the width γ of the radial basis (Gaussian) function, of the hyperplanes of the SVM are found by a complex optimization for the sake of an optimum performance. The other parameters that affect the performance of the SVM are K, the number of clusters, and s, which gives a shift of the hyperplane from a vector passing through the origin.
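The classification rule of Equations (49)-(53) can be condensed into the following sketch (ours; the Mamta-Hanman transform is written with the substitutions stated above, and beta is the experimentally chosen power):

```python
import numpy as np

def frank_tnorm(u, v, s=2.0):
    return np.log1p((s ** u - 1.0) * (s ** v - 1.0) / (s - 1.0)) / np.log(s)

def vhtc_predict(train_by_class, t, s=2.0, beta=1.0):
    """VHTC: per class, Frank-T-norm every pair of absolute error vectors
    (Equations (49)-(50)), score each T-normed vector by the Mamta-Hanman
    transform with mu = exp(-E) (Equations (51)-(53)), and return the class
    whose representative (minimum) score is the overall infimum."""
    best_label, best_score = None, np.inf
    for label, X in train_by_class.items():         # X: (n_samples, n_features)
        E = np.abs(np.asarray(X, dtype=float) - t)  # error vectors, Equation (49)
        n = E.shape[0]
        for i in range(n):
            for k in range(i + 1, n):
                Eik = frank_tnorm(E[i], E[k], s)    # T-normed error vector, Equation (50)
                mu = np.exp(-Eik)                   # membership vector, Equation (51)
                score = float(np.sum(Eik * np.exp(-mu * Eik ** beta)))  # Equation (53)
                if score < best_score:
                    best_score, best_label = score, label
    return best_label
```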
Method-2: As far as the generation of the feature maps is concerned, the procedure in the second method is the same as outlined in the first method, but the difference lies in the features derived from the selected feature maps; these are the deep fuzzy hesitancy features and the deep mean hesitancy features. The algorithm for the extraction of these features is given below.

An Algorithm for the Extraction of the Features:
1) Resize a test road sign to 32 × 32.
2) Convert the image into the YUV color space, retain the Y component, and then normalize it.
3) Extract the deep hesitancy features after the first convolutional layer itself.
4) Tap the feature maps of dimension 28 × 28, numbering 32, as there are 32 kernels.
5) Compute the Gaussian membership function at each pixel location using the 32 pixel intensities across the maps.
6) Compute the Yager complement at each pixel location.
7) Compute the deep fuzzy hesitancy features using Equation (13).
8) Compute the deep mean fuzzy hesitancy features using Equation (14).
We have used the SVM classifier here for the classification of the two types of the fuzzy hesitancy features computed in Steps 7) and 8).
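Steps 1)-8) translate into the following sketch (ours); conv_forward is a hypothetical callable standing for the first convolutional layer of the CNN model, assumed to return 32 feature maps of size 28 × 28, and k, s take the values of Table 6:

```python
import cv2
import numpy as np

def deep_hesitancy_features(roi_bgr, conv_forward, k=2.3, s=0.5):
    """Deep fuzzy hesitancy features (Equation (13)) and the deep mean
    hesitancy feature (Equation (14)) from the first-layer feature maps."""
    roi = cv2.resize(roi_bgr, (32, 32))                                  # Step 1
    y = cv2.cvtColor(roi, cv2.COLOR_BGR2YUV)[..., 0] / 255.0             # Step 2
    maps = conv_forward(y)                                               # Steps 3-4: (32, 28, 28)
    mean, sigma = maps.mean(axis=0), maps.std(axis=0) + 1e-12
    mu = np.exp(-((maps - mean) ** 2) / (2.0 * sigma ** 2)).mean(axis=0) # Step 5
    nu = (1.0 - mu ** s) ** (1.0 / s)                                    # Step 6: Yager complement
    inc = mu * np.exp(-mu ** (k - 1.0)) + nu * np.exp(-nu ** (k - 1.0))  # Step 7: Eq. (13)
    return inc.ravel(), float(inc.mean())                                # Step 8: Eq. (14)
```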
6. Results of Implementation
So far, we have described succinctly the formulation of the different feature types, the creation of a threshold condition, the design of the detectors, the process of tracking by a learning model without venturing into its implementation, and the modification of a brand of transform-based classifiers, each of which has a role to play in ADAS, realized through the proposed pipeline.
6.1. Datasets Used
The Belgium Traffic Signs Dataset (BTSD) hails from [31] and it contains 62 classes belonging to the mandatory, prohibitory, and danger road signs. As we are interested in red colored signs for both the thresholding and detection tasks, only 30 classes of prohibitory and danger road signs are found to be relevant for our study. The Belgium Traffic Sign Classification Dataset (BTSCD) [31] is a subset of BTSD containing the cropped images and these are employed for classification. However, for the extraction of deep features, we have used only 14 classes of the road signs similar to those in the CURE-TSD [32] under the unchallenging environments only. Now, it is time to present the results emanating from their implementation on the real world scene samples of BTSD [31].
6.2. Detection
The ROIs captured through the thresholding are passed on to the SVM for further confirmation as road signs with the intention of eliminating the false positives. In the baseline Color-Based Detector (BCBD), the histograms of the Normalized RGB (NRGB) gray values, a 15-bin equal density histogram from each channel yielding 45 color features in all, are concatenated with the HOG features; the HOG captures both the edge gradients and the intensity changes, hence it is excellent for filtering out the false positives. To extract the HOG features, the ROIs are resized to 32 × 32 and then converted into the grayscale. Further, they are divided into cells of 8 × 8 pixels and blocks of 2 × 2 cells. The number of orientations is taken as 9, with a consequent HOG feature vector of length 324. The length of the total feature vector thus becomes 369 for each ROI; a sketch of this feature extraction follows. To capture the certainty in the variation in the color intensity of the road signs, the above color features are converted into the hesitancy features as described in Section 2.4, as they help improve the performance of the road sign detector. A few typical samples of the world scenes from the BTSD are shown in Figure 5.
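A sketch of the baseline feature extraction (ours, using scikit-image's hog; the 15 uniform bins per normalized channel are an assumption about the "equal density" binning):

```python
import cv2
import numpy as np
from skimage.feature import hog

def bcbd_features(roi_bgr):
    """45 NRGB color features (15-bin histogram per normalized channel)
    concatenated with the 324-dimensional HOG descriptor: 369 in total."""
    img = roi_bgr.astype(float)
    nrgb = img / (img.sum(axis=2, keepdims=True) + 1e-12)        # normalized RGB
    color = np.concatenate([np.histogram(nrgb[..., c], bins=15,
                                         range=(0.0, 1.0), density=True)[0]
                            for c in range(3)])                  # 45 color features
    gray = cv2.cvtColor(cv2.resize(roi_bgr, (32, 32)), cv2.COLOR_BGR2GRAY)
    shape = hog(gray, orientations=9, pixels_per_cell=(8, 8),
                cells_per_block=(2, 2))                          # 324 HOG features
    return np.concatenate([color, shape])
```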
The color features from the sample of a world scene are converted into the incremental hesitancy features using Equation (13), as they increase the recall. Non-Maximum Suppression (NMS) further reduces the false positives by choosing the box with the highest score among the bounding boxes that overlap by more than 50% [33]. Out of the 5905 training images of BTSD, the SVM is trained on 1536 images of the red color road signs in the prohibitory and danger categories. The negative set is created from the training images containing no road signs: the color thresholding is applied on them and the ROIs picked up by this process are considered as the false positives, thus serving as the non-road signs. A comparison of the results of the three CBDs, viz., BCBD, CBDH and CBDMH, with those of YOLOv5 is given in Table 3, from which it can be observed that CBDH has an edge over BCBD and CBDMH, though YOLOv5 reigns supreme but with a toll of high computational time; hence it is not suitable as a real-time detector. The edge of CBDH is due to the conversion of the normalized RGB features into the hesitancy features, thereby ascertaining the importance of the variable pervasive membership function. To get the optimum F-scores with the SVM, the finetuning of its hyperparameters C and γ is also done here. The road signs detected by BCBD, CBDH and YOLOv5 are illustrated in Figure 6.
Figure 5. The world scenes from the Belgium streets.
Table 3. The performance comparison of the three CBD detectors with YOLOv5 using SVM.
| Name of detector | Recall | Precision | Best F-score | Avg. time per frame in secs. |
| --- | --- | --- | --- | --- |
| BCBD | 0.7018 | 0.6303 | 0.66 | 0.165 |
| CBDH | 0.758 | 0.632 | 0.689 | 0.169 |
| CBDMH | 0.7716 | 0.6172 | 0.685 | 0.169 |
| YOLOv5 | 0.955 | 0.73 | 0.82 | 0.9 |
Figure 6. Display of the detected road signs of BCBD, CBDH and YOLOv5.
6.3. Recognition
Figure 7. Attainment of F-scores with the varying values of epochs, regularization parameter, and rejection threshold for a given learning rate under the five-fold cross-validation of YCNN features using softmax.
It may be noted that some of the ROIs detected as the road signs by the three CBDs and YOLOv5 are the false positives. To recognize them, we move from the detection phase to the classification phase wherein we first resize the ROIs to 32 × 32 and convert the pixel intensities into the YUV components, of which the Y (luma) component is found to be the best, as shown in [27], while the U and V (chroma) components are discarded. The Y components of the ROIs due to BCBD, CBDH, CBDMH and YOLOv5 are denoted by Y-BCBD, Y-CBDH, Y-CBDMH and Y-YOLOv5 respectively. Second, we apply the multi-scale CNN model discussed above on these Y components to extract from the feature maps the corresponding CNN deep features denoted by YCNN-BCBD, YCNN-CBDH, YCNN-CBDMH and YCNN-YOLOv5. Third, these features are classified by the softmax. We recall here that the Type-2 HanmanNets in [28] also utilize the deep features, but extracted from the feature maps of the pretrained deep networks like AlexNet, GoogLeNet, etc.; so, they are parasites of the CNN architectures. We have used BTSCD [31] for the experimentation, wherein the five-fold cross-validation of the YCNN features yields the best performance at 90 epochs, and the learning rate, initially set at 1e-3, is dynamically altered by the Adam optimizer until the convergence of the softmax is achieved. As our main motive is to reduce the false positives, the maximum probability score of a road sign furnished by the softmax is checked to see if it is below a certain threshold, in which case it is rejected as a non-sign. The performance of the softmax is judged by the best weighted F-score, an amalgamation of both the true and false positives over all the classes. Figure 7 shows the F-scores attained for the varying epochs needed for training the CNN model, the regularization parameter and the rejection threshold.
Table 4. The performance comparison of YCNN features with softmax.
| Softmax with the detector used | Recall | Precision | Best F-score | Avg. time per frame in secs. |
| --- | --- | --- | --- | --- |
| YCNN-BCBD | 0.6712 | 0.659 | 0.665 | 0.178 |
| YCNN-CBDH | 0.722 | 0.662 | 0.6910 | 0.180 |
| YCNN-CBDMH | 0.733 | 0.653 | 0.692 | 0.180 |
| YCNN-YOLOv5 | 0.848 | 0.772 | 0.809 | 0.9129 |
A comparison of the recognition performance is shown in Table 4. As can be seen from this table, with all the YCNN features, the recall has degraded because some of the road signs are misclassified, but the precision has increased with the reduction of the false positives. The F-score of YCNN-YOLOv5 shows a decline after the recognition, but the YCNNs of the three CBDs witness an improvement.
Results of Experiments to Test the Effectiveness of Different Features and Classifiers
As a support to our experiments, we compare the recognition performance of a few prominent classifiers with that of the VHTC. Note that this comparison, shown in Table 5, is not connected with the pipeline, as it is only exercised on the cropped road signs of BTSCD, while Table 4 shows the recognition performance of the softmax applied on the road signs detected by the CBDs, as part and parcel of the pipeline.
Table 5. Accuracies of various classifiers on CNN features.
| Method | Training accuracy | Test accuracy | Avg. time per frame in secs. |
| --- | --- | --- | --- |
| Softmax | 100 | 97.5 | 0.05 |
| VHTC | 100 | 97.98 | 0.32 |
| Random forest classifier | 100 | 98.465 | 0.0019 |
| SVM (linear) | 100 | 97.89 | 0.00012 |
| SVM (rbf) | 100 | 97.25 | 0.00012 |
| Naive Bayes | 100 | 97.899 | 0.00010 |
| Logistic regression | 100 | 91.922 | 0.00013 |
| KNN | 100 | 97.89 | 0.00004 |
As can be noticed from Table 5, though the time taken by VHTC is large, its accuracy is better than that achieved by all the classifiers except the random forest classifier. The reason for this high computational time is that the training and testing are done simultaneously, unlike in the other classifiers, where the training is separate from the testing.
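The comparison in Table 5 can be reproduced in outline with scikit-learn, as in the hedged sketch below; the feature matrix and labels are synthetic stand-ins for the CNN features of the cropped BTSCD road signs, and the softmax and VHTC themselves are not reproduced here.

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split
    from sklearn.naive_bayes import GaussianNB
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.svm import SVC

    rng = np.random.default_rng(0)
    X = rng.random((500, 100))   # stand-in for the CNN feature vectors
    y = rng.integers(0, 14, 500) # stand-in for the BTSCD class labels
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

    classifiers = {
        "Random forest": RandomForestClassifier(random_state=0),
        "SVM (linear)": SVC(kernel="linear"),
        "SVM (rbf)": SVC(kernel="rbf"),
        "Naive Bayes": GaussianNB(),
        "Logistic regression": LogisticRegression(max_iter=1000),
        "KNN": KNeighborsClassifier(),
    }
    for name, clf in classifiers.items():
        clf.fit(X_tr, y_tr)
        print(name, clf.score(X_te, y_te))  # test accuracy, as in Table 5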
Table 6. Accuracies of deep hesitancy features using SVM with different settings.
K | s | C | γ | Layer | Number of features | Testing accuracy (%) | Avg. time per frame in ms
2.3 | 0.5 | 5 | 20 | After first convolution layer | 784 | 96.75 | 1.03
2.3 | 0.5 | 25 | 12 | After second convolution layer | 100 | 93.095 | 0.33
Table 7. Accuracies of CNN features with SVM.
C | γ | Layer | Number of features | Classifier | Testing accuracy (%) | Avg. time per frame in ms
25 | 0.001 | After first convolution layer | 25088 | SVM | 93.98 | 49.11
20 | 0.0001 | After second convolution layer | 6400 | SVM | 97.238 | 17.05
5 | 0.01 | Without softmax | 100 | SVM | 98.131 | 0.084
– | – | After applying softmax | – | Softmax | 98.21 | 28.11
Next, we utilize the hesitancy function for the extraction of the deep hesitancy features from the CNN feature maps. We name the hesitancy features derived from the feature maps "deep hesitancy features", as against the hesitancy features that result from the conversion of the normalized color features from the RGB channels of the road signs. Table 6 shows this comparison; here only 14 classes of BTSCD [31] are used. We extract the deep hesitancy features from the feature maps of the CNN model after the first and the second convolutional layers. The deep hesitancy features from the first layer give the best performance with the SVM, and the classification time is very low. Moreover, the execution time is reduced considerably with the features drawn from the first-layer feature map, as can be witnessed from Table 6. Thus, the use of the deep hesitancy features obviates the need for a deep neural network, as the feature maps tapped after the first convolutional layer are good enough. Through five-fold cross-validation, different values of K, s, C and γ are experimented with on the SVM, and the best results are achieved with K = 2.3, s = 0.5, C = 5 and γ = 20. If we make use of the CNN features from either the first or the second convolutional layer without converting them into the deep hesitancy features, the results obtained from the SVM are as given in Table 7. Note from Table 6 that the performance deteriorates as the number of deep hesitancy features is reduced.
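A hedged sketch of this parameter search is given below: K and s parameterize the hesitancy feature extraction, while C and γ are the SVM hyperparameters tuned by five-fold cross-validation. The hesitancy transform shown is a toy stand-in, not the paper's definition, and the feature maps and labels are synthetic.

    import numpy as np
    from sklearn.model_selection import GridSearchCV
    from sklearn.svm import SVC

    def deep_hesitancy_features(fmap, K, s):
        # Toy stand-in for the paper's hesitancy transform of a feature map;
        # the actual definition given earlier in the paper should replace this.
        return fmap * np.exp(-((1.0 - fmap) / s) ** K)

    rng = np.random.default_rng(0)
    fmaps = rng.random((210, 28, 28))    # stand-in for 28 x 28 = 784-value maps
    labels = np.tile(np.arange(14), 15)  # balanced stand-in for 14 BTSCD classes

    best = None
    for K in (1.5, 2.3, 3.0):            # candidate hesitancy parameters
        for s in (0.3, 0.5, 0.7):
            X = np.stack([deep_hesitancy_features(f, K, s).ravel() for f in fmaps])
            search = GridSearchCV(SVC(), {"C": [5, 20, 25], "gamma": [12, 20]}, cv=5)
            search.fit(X, labels)
            if best is None or search.best_score_ > best[0]:
                best = (search.best_score_, K, s, search.best_params_)
    print(best)  # best cross-validated accuracy and its (K, s, C, gamma)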
6.4. Strengths and Limitations of the Proposed Approaches
The proposed approaches for the demarcation of the ROIs from a world scene by thresholding, their confirmation as road signs by a detector, and lastly their classification by the variant of the classifier are all credited with the ability to represent the uncertainty in the pixel intensities during the feature extraction and that in the feature vectors during the classifier design. The formulation of the different feature types can cope with inaccurate fuzzy modelling, as most of these feature types are embedded with parameters that can be tuned to achieve the desired effects. No approach is foolproof, as each one has a loophole or two. As the features used in the detection and recognition tasks evolve from the histograms of pixel intensities, the concept of the information set and that of the pervasive information set, we have a variety of them to choose from; hence, the choice of an appropriate feature type is a limitation we have to live with. Some of the feature types are equipped with parameters that need to be found by experimentation, which is another limitation. Tackling the small sizes of the road signs during detection/recognition is beset with the problem of the false positives that endanger the performance of the approaches.
6.5. Contributions of the Paper
Several contributions are made while handling the detection and recognition tasks; they include the following:
1) Formulation of the Mamta-Hanman fuzzy entropy function and its modification to the intuitionistic fuzzy entropy function. As a byproduct, the membership function in a fuzzy set is elevated to the pervasive membership function in an intuitionistic fuzzy set.
2) Formulation of a variety of feature types that include the intuitionistic fuzzy entropy/transform, the hybrid fuzzy-transform, the possibilistic-probabilistic entropy function/transform, the normalized RGB color model, and the deep features from the CNN model. The utility of some of these feature types is not leveraged due to their unsuitability.
3) Creation of a threshold condition using the fuzzy Hanman transform gradient.
4) Development of CBDH that can cope with the inaccurate membership function by roping in the non-membership function.
5) Proposition of the Hanman law that provides guidance in managing the gradient/divergent values while tackling the independent and dependent vectors.
6) Design of a VHTC that unlike HTC is gifted with the ability to change the information source values in its new criterion function.
7. Conclusions
This paper presents methods for thresholding, which segregates the Region of Interest (ROI) from a world scene based on color distinction; detection, which attempts to verify whether an ROI is that of a road sign; and lastly recognition, which categorizes the confirmed road sign into the correct class among the prohibitory and danger signs. The underlying processes are mainly feature extraction and classification, which are given extensive coverage through the formulation of different feature types and a variant of the Hanman Transform Classifier.
A brief exposure to the concept of the information set, followed by that of the pervasive information set, adorns the introduction to the information set theory that has emerged from the fuzzy and intuitionistic fuzzy sets by enlarging their scopes. Consequently, in an information set the complement membership function has a role to play, whereas in an intuitionistic/pervasive information set the non-membership function has its own role in the sense that it can correct an inaccurate membership function. A comprehensive treatment of the fuzzy entropy functions and the corresponding intuitionistic fuzzy entropy functions, which lead to the different feature types, is given to ameliorate the drawbacks of inaccurate fuzzy modeling in the context of the detection of road signs from real-world scenes. While trying to detect the presence of a road sign in a world scene, the histogram representation of the scene proves to be a boon, as it facilitates the creation of a threshold condition in the form of the fuzzy gradient information values of the green and red gray levels according to the proposed Hanman law. This law aims to distinguish between the independent and dependent variables in terms of the fuzzy gradient information values for the former and the fuzzy divergent information values for the latter. The Hanman law differs from the Bayesian law, which is applicable to the prior and posterior probability distributions, in that it is eminently suitable for different types of information sets, such as the basic, pervasive, gradient and divergent information sets, that can provide succor to problems involving detection, recognition, learning, etc. A thorough investigation of the capabilities and handicaps of the Hanman and Bayesian laws is beyond the scope of this work.
The threshold condition fails to isolate the correct ROIs in the absence of color distinction and in the presence of small-size road signs, thereby giving birth to false positives. To counter these, the three Color-Based Detectors (CBDs) come to the rescue, as the color features are fairly immune to the varied lighting conditions and sizes. Of the three, the CBDH using the fuzzy hesitancy has the upper hand; YOLOv5 outperforms all the detectors, but it has its own black spot of high computational time. Though we have several feature types at our disposal, we could not evaluate all of them for want of time. It is observed that tracking helps eliminate the false positives by predicting the location of a road sign in the next frame from the knowledge of its current location, but this has not been explored, as it demands an entirely different learning framework. Enthused by the popularity of deep learning neural networks, we have embarked on the recognition of the road signs using the high-level deep features extracted from the feature maps of the handcrafted CNN model. It has been shown that operating a kernel function on an image in a convolutional layer amounts to producing a membership function matrix. As a result, the several kernel functions acting on the images in the different convolutional layers lead to feature maps that are substantially modified membership functions possessing the desirable characteristics. These are roped in while forming the deep hesitancy features classified by the SVM in Method-1. On the other hand, in Method-2 the features directly derived from the feature maps are classified using a variant of the Hanman Transform Classifier, where the error gradient vectors between the training and test feature vectors are converted into the fuzzy gradient transform values that imbibe the capability of identifying the unknown class of the test feature vector, as stated in the Hanman law. The results from the two methods vindicate the effectiveness of the deep features and the proposed classifier.
Although several feature types are created as part of the contributions, not all of them are investigated: some, like the possibilistic-probabilistic entropy/transform features, are unsuited to the present work, and the computational burden of implementing them and the space consumed by the results generated necessitate a future study. For reducing the number of color features, the fusion of any two channels' (say, G and B) features with the third channel's (i.e., R) features is an option; in the parlance of the information set theory, the fusion of the G and B channels' information values with the R channel's information values by the T-norms needs a concerted effort. A caveat lies in the slowness of the proposed Hanman Transform Classifier due to its requirement of both the training and test feature vectors simultaneously. To make it faster, the training has to be separated from the testing, for which all the training feature vectors have to be aggregated by the T-norms under the aegis of the Hanman law, which offers succor in dealing with a large number of feature vectors; an investigation of the parametric T-norms for this aggregation is thus demanded as future work. In this work, we have addressed the twin issues of the certainty representation and the inaccurate fuzzy modeling. It would be interesting to work on another issue, that of fuzzy roughness.