Application of Equality Test of Coefficients of Variation to the Heteroskedasticity Test
1. Introduction
The Gauss-Markov theorem states that the least squares estimator is BLUE, the Best Linear Unbiased Estimator, in the sense that it has the lowest variance among linear unbiased estimators ( [1], p. 53). However, the presence of heteroskedasticity in a regression model may bias the standard deviations of the parameters obtained by the Ordinary Least Squares (OLS) method ( [2], p. 31). In this case, several hypothesis tests on the model under consideration may be biased, for example Chow's coefficient stability test (or structural change test) ( [3], p. 25), Student's t-test and Fisher's F-test. Heteroskedasticity tests are already available in the literature. Examples include the Levene test, the Goldfeld-Quandt test, the White test, the Gleisjer test and the Breusch-Pagan test. Most of these tests are based on the comparison of variances.
More recently, tests for the equality of Coefficients of Variation (CVs) have appeared in the literature. Examples include Curto's test [4], the application of the Rényi divergence proposed by Pardo (1999) [5], the test based on a numerical approach by Gokpinar (2015) [6], Forkman's test [7], and McKay and Miller's statistics [8].
To our knowledge, the first use of the coefficient of variation in the detection of heteroskedasticity was proposed by Li and Yao (2017) [9]. Thus, the question is: is it possible to apply these CV equality tests to detect the existence of heteroskedasticity?
The rest of this article is organized as follows: Section 2 states the position of our problem; Section 3 presents a state of the art on heteroskedasticity tests; Section 4 proposes an approach for using a CV equality test to detect heteroskedasticity; finally, a conclusion is given at the end.
2. Position of the Problem
We have a simple linear regression model

Y_t = a X_t + b + ε_t, t = 1, …, n, (1)

such that the ε_t are the errors made when applying the model. We want to check whether the variance of the errors is constant for t ranging from 1 to n, that is, whether the model is homoskedastic or heteroskedastic. Figure 1 shows an example of a homoskedastic model, and Figures 2-4 show three examples of heteroskedastic models. We note that these four models all have the same regression line equation.

Figure 1. Homoskedastic model (the standard deviation σ_t of the errors is constant).

Figure 2. Heteroskedastic model (σ_t increases with the exogenous variable).

Figure 3. Heteroskedastic model (σ_t decreases with the exogenous variable).

Figure 4. Heteroskedastic model (σ_t has a concave profile).
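The four error structures illustrated by Figures 1-4 can be simulated directly. The following is a minimal sketch: the line y = 2x + 1 and the four standard-deviation profiles are illustrative assumptions, chosen only to reproduce the qualitative shapes of the figures.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
x = np.linspace(1.0, 10.0, n)
a, b = 2.0, 1.0  # illustrative regression line y = a*x + b

# Four error structures around the same line, as in Figures 1-4.
sigmas = {
    "homoskedastic": np.full(n, 1.0),                    # constant sigma
    "increasing":    0.3 * x,                            # sigma grows with x
    "decreasing":    3.0 / x,                            # sigma shrinks with x
    "concave":       1.0 + np.sin(np.pi * (x - 1) / 9),  # concave profile
}
models = {name: a * x + b + rng.normal(0.0, s) for name, s in sigmas.items()}
```

All four samples share the same regression line; only the spread of the errors around it differs.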
3. State of the Art on the Homoskedasticity Test
We consider the general linear regression model Y_t = a_1 X_{1t} + … + a_k X_{kt} + b + ε_t, t = 1, …, n. The various tests that we mention below consist in testing the following hypothesis:

H0: σ_1² = σ_2² = … = σ_n² (homoskedasticity) against H1: there exist t, t′ such that σ_t² ≠ σ_{t′}² (heteroskedasticity).
3.1. Breusch-Pagan Test
The Breusch-Pagan test assumes that the squares of the errors ε_t are related to the explanatory variables of the model. According to Leblond (2003) ( [2], p. 31), the Breusch-Pagan test is done in four steps:

1) Recover the residuals e_t of the regression;

2) Generate the squared residuals e_t²;

3) Regress the squared residuals on the explanatory variables of the original regression (e_t² = c_0 + c_1 X_{1t} + … + c_k X_{kt} + u_t, where the coefficients c_0, …, c_k are to be determined);

4) Test whether the coefficients are jointly significant (perform the F-test):

F = (R²/k) / [(1 − R²)/(n − k − 1)] (2)

where k is the number of explanatory variables, n is the sample size and R² is the coefficient of determination of the auxiliary regression.

Decision-making: We accept the null hypothesis H0 at the confidence level 1 − α if F < F_α(k, n − k − 1), where F_α(k, n − k − 1) is the critical value of the F-distribution at risk α, with k and n − k − 1 degrees of freedom.
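The four steps above can be sketched as follows. This is a minimal illustration of the F-variant of the Breusch-Pagan test; the function name and the data-generating choices in the usage example are ours, not the paper's.

```python
import numpy as np
from scipy import stats

def breusch_pagan_f(y, X, alpha=0.05):
    """F-variant of the Breusch-Pagan test.
    X: (n, k) matrix of explanatory variables (no intercept column)."""
    n, k = X.shape
    Z = np.column_stack([np.ones(n), X])          # design with intercept
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)  # step 1: OLS fit
    e2 = (y - Z @ beta) ** 2                      # step 2: squared residuals
    g, *_ = np.linalg.lstsq(Z, e2, rcond=None)    # step 3: auxiliary regression
    ss_res = np.sum((e2 - Z @ g) ** 2)
    ss_tot = np.sum((e2 - e2.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot                    # R^2 of the auxiliary fit
    F = (r2 / k) / ((1.0 - r2) / (n - k - 1))     # step 4: joint significance
    crit = stats.f.ppf(1.0 - alpha, k, n - k - 1)
    return float(F), bool(F > crit)               # (statistic, reject H0?)
```

On data whose error standard deviation grows with x, the test should reject H0 for moderate sample sizes.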
3.2. Goldfeld-Quandt Test
The Goldfeld-Quandt test assumes that there is an explanatory variable X_j that influences the variance of the errors, such that σ_t² = h(X_{jt}), where h is an increasing function ( [10], p. 103). The test is summarized as follows:

1) Sort the observations according to the increasing (or decreasing) values of the explanatory variable X_j suspected of being the source of heteroskedasticity.

2) Divide the observations into two groups of sizes n_1 and n_2, where n_1 + n_2 ≤ n (a central block of observations may be omitted).

3) Calculate the error-variance estimators for each sub-sample:

s_1² = RSS_1 / (n_1 − k − 1) (3)

s_2² = RSS_2 / (n_2 − k − 1) (4)

where RSS_1 and RSS_2 are the residual sums of squares of the least squares fits on the first and second sub-samples, â is the estimator of the parameter a by the least squares method, and k is the number of explanatory variables of the model.

4) Calculate the Goldfeld-Quandt statistic:

GQ = s_2² / s_1² (5)

The GQ statistic follows the F-distribution with n_2 − k − 1 and n_1 − k − 1 degrees of freedom, noted F(n_2 − k − 1, n_1 − k − 1).

Decision-making: The null hypothesis H0 is rejected at confidence level 1 − α if GQ > F_α(n_2 − k − 1, n_1 − k − 1).
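A sketch of this procedure for simple regression follows; the choice of dropping the central 20% of observations is a common convention, assumed here rather than taken from the text.

```python
import numpy as np
from scipy import stats

def goldfeld_quandt(y, x, drop_frac=0.2, alpha=0.05):
    """Goldfeld-Quandt test for y = a*x + b + eps: sort by x, drop a central
    block, compare the residual variances of the two outer sub-samples."""
    order = np.argsort(x)
    x, y = x[order], y[order]
    n, k = len(x), 1                       # one explanatory variable
    d = int(n * drop_frac)                 # central observations omitted
    n1 = (n - d) // 2                      # equal outer sub-sample sizes
    def s2(xs, ys):                        # error-variance estimate of a sub-sample
        Z = np.column_stack([np.ones(len(xs)), xs])
        beta, *_ = np.linalg.lstsq(Z, ys, rcond=None)
        r = ys - Z @ beta
        return float(r @ r) / (len(xs) - k - 1)
    GQ = s2(x[n - n1:], y[n - n1:]) / s2(x[:n1], y[:n1])
    crit = stats.f.ppf(1.0 - alpha, n1 - k - 1, n1 - k - 1)
    return GQ, bool(GQ > crit)
```

When the error variance increases with x, the upper sub-sample variance dominates and GQ exceeds 1 by a wide margin.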
3.3. Gleisjer’s Test
The Gleisjer test can detect both heteroskedasticity and the form that this heteroskedasticity takes ( [1], p. 150). It assumes that there is a relationship between the error ε_t of the model and the variable X_t assumed to be the cause of heteroskedasticity. The steps of the test are summarized as follows:

Step 1: Determination of the residuals generated by the suspected variable X.

1) Regress Y on X. This gives the simple regression model Y_t = a X_t + b + ε_t.

2) Calculate the estimators of a and b using the Ordinary Least Squares method: â = Cov(X, Y)/Var(X) and b̂ = Ȳ − â X̄.

3) Estimate the model's residuals by replacing the parameters with their estimators: e_t = Y_t − â X_t − b̂.

Thus, the vector of residuals (e_1, …, e_n) is known.

Step 2: Proposal of possible forms of existing heteroskedasticity.

Gleisjer suggests testing different forms of possible relationships between |e_t| and X_t, for example:

1) Type 1:

|e_t| = c_0 + c_1 X_t + v_t (6)

where v_t is the residue of this model. This relationship generates the type of heteroskedasticity σ_t² = c² X_t², where c is a non-zero real constant. Thus, the variance of the errors is a function of the square of the suspected explanatory variable X_t.

2) Type 2:

|e_t| = c_0 + c_1 √X_t + v_t (7)

This relationship generates the type of heteroskedasticity σ_t² = c² X_t. In this case, the variance of the errors is proportional to the values of the suspected explanatory variable.

3) Type 3:

|e_t| = c_0 + c_1 (1/X_t) + v_t (8)

This relationship leads to heteroskedasticity of type σ_t² = c²/X_t².

Step 3: Detection of heteroskedasticity.

Significance test of the regression coefficient c_1 in each model i:

t_i = |ĉ_1| / σ̂_{ĉ_1}, i = 1, 2, 3, (9)

with ĉ_1 the estimated slope of model i and σ̂_{ĉ_1} its estimated standard deviation, where t_i follows the t-distribution with n − 2 degrees of freedom.

Decision-making: The null hypothesis H0 is rejected at confidence level 1 − α if there is an i such that t_i > t_{α/2}(n − 2).

If the existence of heteroskedasticity is validated, then the relationship with the highest t_i represents the form of existing heteroskedasticity.
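The three steps can be sketched as follows. This assumes the three classical Glejser-type regressors (x, √x, 1/x); the function name is ours.

```python
import numpy as np
from scipy import stats

def gleisjer(y, x, alpha=0.05):
    """Regress |OLS residuals| on three candidate transforms of x and
    t-test each slope; keep the most significant form."""
    n = len(x)
    Z = np.column_stack([np.ones(n), x])
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)     # step 1: residuals
    abs_e = np.abs(y - Z @ beta)
    t_stats = {}
    for name, g in [("x", x), ("sqrt(x)", np.sqrt(x)), ("1/x", 1.0 / x)]:
        W = np.column_stack([np.ones(n), g])         # step 2: one candidate form
        c, *_ = np.linalg.lstsq(W, abs_e, rcond=None)
        r = abs_e - W @ c
        s2 = float(r @ r) / (n - 2)                  # residual variance
        se = np.sqrt(s2 * np.linalg.inv(W.T @ W)[1, 1])  # std. error of slope
        t_stats[name] = abs(c[1]) / se               # step 3: significance
    crit = stats.t.ppf(1.0 - alpha / 2, n - 2)
    best = max(t_stats, key=t_stats.get)
    return t_stats, best, bool(t_stats[best] > crit)
```

The returned `best` key names the candidate form with the highest t-statistic, i.e. the retained form of heteroskedasticity.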
3.4. White’s Test
White's test consists in testing the existence of a relationship between the square of the residuals and one or more explanatory variables or their squares. The test procedure can be summarized as follows:

Step 1: Determination of the model's residuals.

Once the parameters of the model Y_t = a X_t + b + ε_t are estimated, we have the estimated residuals e_t = Y_t − â X_t − b̂.

Step 2: Regression of e_t² on X_t and X_t², and validation.

1) We consider the auxiliary model:

e_t² = u_0 + u_1 X_t + u_2 X_t² + w_t (10)

which can be written in matrix form E = Zu + w, where E = (e_1², …, e_n²)′, u = (u_0, u_1, u_2)′ and Z is the n × 3 matrix whose t-th row is (1, X_t, X_t²).

2) The least squares estimator of u is û = (Z′Z)⁻¹ Z′E.

3) Calculate the variance of the errors: σ̂² = Σ_t ŵ_t² / (n − 3), with ŵ_t = e_t² − û_0 − û_1 X_t − û_2 X_t².

4) Calculate the variance-covariance matrix of the parameters û: V̂ = σ̂² (Z′Z)⁻¹. In this case, the variance of the i-th element of the vector û is the i-th element of the diagonal of V̂.

5) Significance test of the parameters u_1 and u_2: we calculate

t_1 = |û_1| / σ̂_{û_1} and t_2 = |û_2| / σ̂_{û_2}.

The statistics t_1 and t_2 follow the t-distribution with n − 3 degrees of freedom.

Decision-making: The null hypothesis H0 is rejected at the confidence level 1 − α if there is an i such that t_i > t_{α/2}(n − 3). That means the null hypothesis is rejected if there is a parameter u_i (i = 1, 2) significantly different from 0.
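A sketch of this procedure for the simple regression case follows; the function name is ours.

```python
import numpy as np
from scipy import stats

def white_test(y, x, alpha=0.05):
    """White-type test for simple regression: regress squared residuals on
    x and x^2, then t-test the two slope parameters."""
    n = len(x)
    Z = np.column_stack([np.ones(n), x])
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    e2 = (y - Z @ beta) ** 2                      # squared residuals E
    W = np.column_stack([np.ones(n), x, x ** 2])  # auxiliary design matrix
    u = np.linalg.inv(W.T @ W) @ W.T @ e2         # OLS estimate (Z'Z)^-1 Z'E
    w = e2 - W @ u
    s2 = float(w @ w) / (n - 3)                   # error-variance estimate
    cov = s2 * np.linalg.inv(W.T @ W)             # variance-covariance matrix
    t = [float(abs(u[i]) / np.sqrt(cov[i, i])) for i in (1, 2)]
    crit = stats.t.ppf(1.0 - alpha / 2, n - 3)
    return t, bool(max(t) > crit)
```

Note that the auxiliary regression is quite noisy, so this test typically needs larger samples than the Goldfeld-Quandt or Gleisjer tests to achieve comparable power.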
3.5. ANOVA Methods
1Maurice Stevenson Bartlett (June 18, 1910-January 8, 2002).
In order to determine the existence of heteroskedasticity, researchers have proposed the method of analysis of variance, commonly called ANOVA. According to the application example presented in ( [1], pp. 147-148), applying ANOVA consists in dividing the observations into several classes of values. Following this example by R. Bourbonnais, we propose the following steps:

1) Order the observations according to the increasing values of the explanatory variable X suspected to be the source of heteroskedasticity.

2) Group the values of the variable X into z classes of values. To determine z, one of the following expressions given in ( [11], p. 33) can be used:

a) z = [√n], where n is the total number of observations and [·] is the integer part function;

b) Sturges' formula: z = [1 + 3.322 log₁₀(n)];

c) Yule's formula: z = [2.5 n^(1/4)].

3) Group the values of the variable to be explained Y according to their corresponding classes (Y_t in the class corresponding to X_t). Thus, we obtain z samples of Y.

4) Apply the ANOVA test to the z samples of Y, then draw a conclusion.
In the following subsections, we will present some ANOVA tests that can be done in step 4.
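The three class-count rules of step 2 can be computed directly; the sketch below assumes their usual forms ([√n], Sturges' [1 + 3.322 log₁₀ n] and Yule's [2.5 n^(1/4)]).

```python
import math

def class_counts(n):
    """Three rules of thumb for the number z of value classes."""
    sqrt_rule = math.floor(math.sqrt(n))             # z = [sqrt(n)]
    sturges = math.floor(1 + 3.322 * math.log10(n))  # Sturges' formula
    yule = math.floor(2.5 * n ** 0.25)               # Yule's formula
    return sqrt_rule, sturges, yule
```

For n = 100 observations, the three rules give 10, 7 and 7 classes respectively, so the choice of rule matters mostly for large samples.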
3.5.1. Bartlett’s Test
Bartlett's statistic1 is defined as follows:

B = [(n − z) ln(S_p²) − Σ_{i=1}^{z} (n_i − 1) ln(S_i²)] / C (11)

where C = 1 + [Σ_{i=1}^{z} 1/(n_i − 1) − 1/(n − z)] / [3(z − 1)], S_p² = Σ_{i=1}^{z} (n_i − 1) S_i² / (n − z) is the pooled variance, S_i² is the variance of the i-th class, n_i is the number of observations belonging to the i-th class, and n = n_1 + … + n_z ( [12], p. 273).

Remark: Bartlett's statistic B follows the chi-square distribution with z − 1 degrees of freedom, noted χ²(z − 1), if the residuals ε_t are independent and follow the standard normal distribution N(0, 1).

Decision-making: The homoskedasticity hypothesis H0 is rejected at confidence level 1 − α if B > χ²_α(z − 1).
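Bartlett's statistic can be computed directly from its definition; as a sanity check, the test below compares it with `scipy.stats.bartlett`, which implements the same formula.

```python
import numpy as np
from scipy import stats

def bartlett_stat(groups):
    """Bartlett's B computed from its definition (z groups of sizes n_i)."""
    ni = np.array([len(g) for g in groups])
    si2 = np.array([np.var(g, ddof=1) for g in groups])  # group variances S_i^2
    z, n = len(groups), ni.sum()
    sp2 = np.sum((ni - 1) * si2) / (n - z)               # pooled variance S_p^2
    num = (n - z) * np.log(sp2) - np.sum((ni - 1) * np.log(si2))
    C = 1 + (np.sum(1.0 / (ni - 1)) - 1.0 / (n - z)) / (3 * (z - 1))
    return float(num / C)                                # ~ chi2(z-1) under H0
```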
3.5.2. Levene’s Test
Levene's statistic, proposed by Howard Levene in 1960 ( [13], p. 4), is defined as follows:

W = [(n − z)/(z − 1)] · [Σ_{i=1}^{z} n_i (Z̄_i − Z̄)²] / [Σ_{i=1}^{z} Σ_{j=1}^{n_i} (Z_{ij} − Z̄_i)²] (12)

where,

• z is the number of groups or value categories obtained;

• n_i is the number of observations belonging to the i-th class, and n = n_1 + … + n_z;

• Z_{ij} = |Y_{ij} − Ȳ_i|, where Y_{ij} is the j-th observation of the i-th class and Ȳ_i is the mean of the i-th class;

• Z̄_i is the average of the Z_{ij} in the i-th class;

• Z̄ is the average of all the Z_{ij}.

Remark: Levene's statistic W follows the F-distribution with z − 1 and n − z degrees of freedom, noted F(z − 1, n − z). Bartlett's test is not robust if the normality assumption on the residuals is not verified; however, the Levene test is stable even in the absence of this hypothesis.

Decision-making: The null hypothesis H0 is rejected at the confidence level 1 − α if W > F_α(z − 1, n − z).
3.5.3. Brown-Forsythe’s Test
The Brown-Forsythe test is an improvement on the Levene test. To get the Brown-Forsythe statistic, it suffices to change Z_{ij} = |Y_{ij} − Ȳ_i| to Z_{ij} = |Y_{ij} − Ỹ_i|, where Ỹ_i is the median of the i-th group of values. Brown-Forsythe's statistic is more robust than Levene's.
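Both variants are available in `scipy.stats.levene`, whose `center` parameter switches between deviations from the mean (Levene) and from the median (Brown-Forsythe). The two groups below are an illustrative example with clearly different spreads.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
g1 = rng.normal(0.0, 1.0, 60)   # group with small spread
g2 = rng.normal(0.0, 3.0, 60)   # group with large spread

# Levene: absolute deviations from the group means.
W_mean, p_mean = stats.levene(g1, g2, center='mean')
# Brown-Forsythe: same statistic with deviations from the group medians.
W_med, p_med = stats.levene(g1, g2, center='median')
```

With standard deviations 1 and 3, both variants reject equality of variances at the 5% level.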
3.5.4. Hartley’s Test
We define Hartley's statistic ( [14], p. 14) by:

H = S_max² / S_min² (13)

where S_max² = max_{1≤i≤z} S_i², S_min² = min_{1≤i≤z} S_i², and S_i² is the variance of the Y values of the i-th group.

Remark: The Hartley test cannot be used if the group sizes n_i are not equal. The critical values of the H statistic are tabulated in the Hartley table.

Decision-making: We reject the null hypothesis H0 at the confidence level 1 − α if H exceeds the critical value read from the Hartley table.
3.5.5. Cochran’s Test
Cochran's statistic is defined as follows:

C = S_max² / Σ_{i=1}^{z} S_i² (14)

Remarks: Cochran's test cannot be used if the group sizes n_i are not equal. The critical values of the C statistic are tabulated in Cochran's table.

Decision-making: We reject the null hypothesis H0 at the confidence level 1 − α if C exceeds the critical value read from Cochran's table.
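Both statistics are one-liners once the group variances are computed; the sketch below checks the equal-group-size requirement shared by the two tests (decision-making against the tabulated critical values is left out).

```python
import numpy as np

def hartley_cochran(groups):
    """Hartley's H = s2_max/s2_min and Cochran's C = s2_max/sum(s2_i);
    both require equal group sizes."""
    assert len({len(g) for g in groups}) == 1, "equal group sizes required"
    s2 = np.array([np.var(g, ddof=1) for g in groups])  # group variances
    return float(s2.max() / s2.min()), float(s2.max() / s2.sum())
```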
3.6. Zhaoyuan Li and Jianfeng Yao Test
Zhaoyuan Li and Jianfeng Yao [9] proposed two statistics to detect heteroskedasticity in a multivariate linear model.

1) Test based on the likelihood ratio:

(15)

a statistic built from the squared residuals ( [9], p. 9). Suitably centered and scaled, it follows the standard normal distribution N(0, 1) under H0; its centering involves Euler's constant γ ( [9], p. 10).

Decision-making: the H0 assumption is rejected at the confidence level 1 − α if the absolute value of the statistic exceeds z_{α/2}, the quantile of N(0, 1) at the risk threshold α. For α = 5%, we have z_{α/2} = 1.96.

2) Coefficient of variation test:

(16)

a statistic based on the coefficient of variation of the squared residuals ( [9], p. 11). Under H0, it also follows the standard normal distribution N(0, 1).

Decision-making: the H0 assumption is rejected at the confidence level 1 − α if the absolute value of the statistic exceeds z_{α/2}.

This last test shows a trend in the use of the coefficient of variation in the detection of heteroskedasticity.
4. Application of the Equality Test of Coefficients of Variation to the Heteroskedasticity Test
4.1. Our Approach
In this section, we show that a test of equality of coefficients of variation allows us to detect the existence of heteroskedasticity. The steps of our approach can be summarized as follows:

1) Estimate the parameter a of the regression model of Y on X, noted â.

2) Estimate the model's residuals: e_t = Y_t − Ŷ_t.

3) Calculate the squared residuals e_t².

4) As in the Goldfeld-Quandt method, divide the squared residuals into two groups of sizes n_1 and n_2, where n_1 + n_2 ≤ n.

5) Calculate Johannes Forkman's statistic ( [7], p. 10):

F = [ĉ_1² / (1 + ((n_1 − 1)/n_1) ĉ_1²)] / [ĉ_2² / (1 + ((n_2 − 1)/n_2) ĉ_2²)] (17)

where ĉ_j = s_j / x̄_j for j = 1, 2 is the estimated coefficient of variation of the j-th group, s_j and x̄_j being the standard deviation and the mean of the squared residuals of that group.

Decision-making: if F_{1−α/2}(n_1 − 1, n_2 − 1) < F < F_{α/2}(n_1 − 1, n_2 − 1), then we accept H0 at the confidence level 1 − α, where F_α(n_1 − 1, n_2 − 1) is the quantile of the F-distribution with n_1 − 1 and n_2 − 1 degrees of freedom at risk α.

We chose Forkman's statistic because it is stable provided the coefficients of variation do not exceed about 1/3 ( [7], p. 11).
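The five steps can be sketched end to end as follows. This is a minimal illustration, assuming the McKay-approximation form of Forkman's statistic and an even split of the squared residuals into two halves sorted by x; the function name is ours.

```python
import numpy as np
from scipy import stats

def cv_heteroskedasticity_test(y, x, alpha=0.05):
    """Sketch of the proposed procedure: split the squared OLS residuals,
    sorted by x, into two halves and compare their coefficients of
    variation with a Forkman-type F statistic (assumed form)."""
    order = np.argsort(x)
    x, y = x[order], y[order]
    n = len(x)
    Z = np.column_stack([np.ones(n), x])
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)  # steps 1-2: fit and residuals
    e2 = (y - Z @ beta) ** 2                      # step 3: squared residuals
    half = n // 2
    g1, g2 = e2[:half], e2[n - half:]             # step 4: two groups
    cv = lambda g: np.std(g, ddof=1) / np.mean(g)
    c1, c2 = cv(g1), cv(g2)
    n1 = n2 = half
    # Step 5: Forkman-type F statistic (assumed McKay-approximation form).
    F = (c1**2 / (1 + (n1 - 1) / n1 * c1**2)) / \
        (c2**2 / (1 + (n2 - 1) / n2 * c2**2))
    lo = stats.f.ppf(alpha / 2, n1 - 1, n2 - 1)
    hi = stats.f.ppf(1 - alpha / 2, n1 - 1, n2 - 1)
    return float(F), bool(not (lo < F < hi))      # reject H0 outside the band
```

The test is two-sided: heteroskedasticity is flagged whether the first or the second group has the larger coefficient of variation.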
4.2. Monte Carlo Simulation
Now, we will test the robustness of the measures proposed in the literature and of the one we have proposed.
4.2.1. Methodology
Like the Gleisjer method, our simulation consists of generating two variables X and Y of size n, such that Y_t = a X_t + b + ε_t (see Section 3.3). Thus, we consider three forms of heteroskedasticity: 1) σ_t² = c² X_t², 2) σ_t² = c² X_t and 3) σ_t² = c²/X_t² ( [1], p. 151).

Moreover, in order to enrich the forms of heteroskedasticity studied, we also take the three other forms considered by Li and Yao, numbered 4) to 6), in which the variance of the errors involves a random variable following the standard normal distribution N(0, 1) ( [9], p. 15).

In this simulation, we consider only the simple regression model. We repeat this test N times, and we count the number k of times the test rejects the H0 hypothesis at the 95% confidence level. Then, the probability p = k/N is calculated.
As p is a random variable, we repeat these procedures several times (1000 times), then we calculate the average probability p̄. We place ourselves in the case where the heteroskedastic effect is significantly non-negligible (value of the coefficient c sufficiently different from 0).

So, if p̄ is sufficiently high, then the test is considered robust. In addition, the measure with the highest p̄ is the measure considered most sensitive to the type i of heteroskedasticity considered (i = 1, …, 6).

As we want to test the robustness of the tests, it is better to check whether the test in question detects small variations or not. During our simulations, we fixed n, N and the coefficient c; the value of c was chosen different from 0, but judged subjectively low.
In Table 1, the probabilities p̄_1, …, p̄_8 correspond respectively to the rejection probabilities of the null hypothesis H0 for the Breusch-Pagan, Goldfeld-Quandt, Gleisjer, White, Bartlett, Levene, and Li and Yao tests, and for our proposal.
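The simulation loop can be sketched as follows. This is a minimal sketch: the heteroskedasticity form σ_t = 0.5 X_t, the sample sizes and the Levene-based rejection rule are illustrative assumptions, not the exact settings of our study.

```python
import numpy as np
from scipy import stats

def rejection_probability(test, n=50, N=200, seed=0):
    """Estimate p = k/N: the share of simulated heteroskedastic samples
    on which `test` rejects H0 at the 95% level."""
    rng = np.random.default_rng(seed)
    x = np.linspace(1.0, 10.0, n)
    k = 0
    for _ in range(N):
        y = 2.0 * x + 1.0 + rng.normal(0.0, 0.5 * x)  # sigma grows with x
        k += bool(test(y, x))
    return k / N

def levene_reject(y, x):
    """Illustrative rejection rule: Levene test (median-centered) on the
    two halves of the OLS residuals."""
    Z = np.column_stack([np.ones(len(x)), x])
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    e = y - Z @ beta
    h = len(e) // 2
    return stats.levene(e[:h], e[h:], center='median')[1] < 0.05
```

Repeating this estimate many times and averaging yields the p̄ values reported in Table 1 for each test.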
4.2.2. Simulation Results
From Table 1, we obtain the classifications in Tables 2-6.
4.3. Discussion
First of all, these simulations show that the Levene test is the most robust and the most sensitive of all the tests considered in this study.

However, these results also show that, among the six forms of heteroskedasticity proposed, our proposal can detect four for the first value of n considered, and five for the second.

In general, our proposal fails to detect only one form of heteroskedasticity, whichever value of n is used.

Furthermore, it is the second best test for detecting one of the types of heteroskedasticity considered.

In addition, our proposal seems better than the Li and Yao test, which is, to our knowledge, the first attempt to use the coefficient of variation to detect heteroskedasticity.
Table 1. Results of Monte Carlo simulations.

Table 2. Classification of the tests in ascending order according to their numbers of wrong acceptances of H0.

Table 3. Classification of the tests in ascending order according to their sensitivities to the three types of heteroskedasticity proposed by Gleisjer, for the first value of n.

Table 4. Classification of the tests in ascending order according to their sensitivities to the three types of heteroskedasticity proposed by Gleisjer, for the second value of n.

Table 5. Classification of the tests in ascending order according to their sensitivities to the three types of heteroskedasticity considered by Li and Yao, for the first value of n.

Table 6. Classification of the tests in ascending order according to their sensitivities to the three types of heteroskedasticity considered by Li and Yao, for the second value of n.
Finally, these results contribute to justifying the weakness of Bartlett's test. Indeed, we see from these results that this test is less robust than our proposal.
5. Conclusions
In this paper, we proposed a technique to detect the existence of heteroskedasticity by an equality test of coefficients of variation. To set out our state of the art, we first recalled some existing heteroskedasticity tests from the literature: the Breusch-Pagan test, the Goldfeld-Quandt test, the Gleisjer test, the White test, and some heteroskedasticity tests based on an analysis of variance (ANOVA): Bartlett's test, Levene's test, Brown-Forsythe's test, Hartley's test and Cochran's test.
Next, we presented the heteroskedasticity test of Zhaoyuan Li and Jianfeng Yao which, to the best of our knowledge, was the first attempt to use coefficients of variation to determine the existence of heteroskedasticity.
Among the equality tests of coefficients of variation available in the literature, we considered Forkman's test to illustrate our approach, as it is a test that remains robust and stable for the sample sizes considered here. The results of our performance tests have shown that our approach can detect five of the six types of heteroskedasticity considered in this paper.
At the end of this analysis, we affirm that an equality test of coefficients of variation allows us to detect the existence of possible heteroskedasticity in a simple regression model. Thus, our study contributes to a new application of several equality tests of coefficients of variation that have already appeared in the literature.
Acknowledgements
We thank the Editor and the referee for their comments and assistance.