Parker Test for Heteroskedasticity Based on Sample Fitted Values
1. Introduction
A basic assumption of classical linear regression analysis is that the random error terms $u_i$ of the model are homoskedastic, i.e. they all have the same variance $\sigma^2$. However, studies have shown that heteroskedasticity is almost universal when regression analysis is performed on cross-sectional or time-series data, so heteroskedasticity in econometric modelling has become a widely studied issue. Many tests for heteroskedasticity exist, for example the graphical test, the Parker test, Spearman's rank correlation test, the Glejser test, the White test and the Goldfeld-Quandt (G-Q) test [1] - [7]. Bai Xuemei [8] reviewed various methods for testing heteroskedasticity, including the Parker test model and its shortcomings. Liu Ming and Huang Hengjun [9] proposed using the sample fitted values $\hat{y}_i$ as the criterion for sorting sample points when applying the G-Q test to a linear regression model with multiple explanatory variables, replacing the repeated sorting by each explanatory variable in turn and thus improving efficiency.
Among these, the traditional Parker test can not only detect heteroskedasticity in the simple (one-regressor) linear regression model but also give the specific functional form of the heteroskedasticity. For multiple linear regression models, however, the traditional Parker test has no single test equation: each explanatory variable must be tested one at a time, which is tedious and computationally intensive and can produce several different heteroskedasticity models. Tan Xin et al. [10] addressed this problem with principal components, building one heteroskedasticity model on all sample principal components instead of on a single explanatory variable. This replaces multiple equations with one and greatly simplifies the traditional Parker test for multiple linear regression models. The improved method still has problems. First, a significant coefficient on a principal component does not mean that the heteroskedasticity is related to the corresponding explanatory variable, so the principal-component coefficients cannot be compared with the explanatory-variable coefficients of the traditional test. Second, the principal components of the sample must be computed before the test can be carried out. Therefore, without losing sample information, this paper uses the sample fitted values as a new explanatory variable in the auxiliary regression, tests the significance of its coefficient, and thereby obtains a new heteroskedasticity test that simplifies testing in multiple linear regression models. The new test is also compared with Tan Xin's improved Parker test.
To ensure the completeness of the study, a brief introduction to the definition of heteroskedasticity and the traditional Parker test is given below.
2. Heteroskedasticity Model
$$y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \cdots + \beta_k x_{ki} + u_i, \quad i = 1, 2, \ldots, n \tag{1}$$
The linear regression model (1) is a univariate (simple) regression model when $k = 1$ and a multiple regression model when $k \ge 2$, where $k$ is the number of explanatory variables, $y$ is called the explained variable (dependent variable) and $x_1, \ldots, x_k$ are called the explanatory variables (independent variables). In model (1), if $\operatorname{Var}(u_i) = \sigma^2$ for all $i$, regardless of the values taken by $x$, then the errors $u_i$ are homoskedastic. When homoskedasticity fails but the other basic assumptions hold, i.e. when $\operatorname{Var}(u_i) = \sigma_i^2$ is not a constant, the variances of the random error terms differ across observations, and the error term $u_i$ is said to be heteroskedastic. Such a model is called a linear regression model with heteroskedasticity, or a heteroskedasticity model for short.
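The difference between a constant $\sigma^2$ and an observation-specific $\sigma_i^2$ can be made concrete with a small simulation. The sketch below is purely illustrative and not taken from the paper: it assumes a single hypothetical positive regressor $x$ drawn uniformly on [1, 10] and compares errors with $\operatorname{Var}(u_i) = 1$ against errors with $\operatorname{Var}(u_i) = x_i^2$.

```python
import numpy as np

# Hypothetical illustration of the definition above (not the paper's data).
rng = np.random.default_rng(0)
n = 400
x = rng.uniform(1.0, 10.0, size=n)  # hypothetical positive regressor

# Homoskedastic errors: Var(u_i) = 1 for every observation.
u_homo = rng.normal(0.0, 1.0, size=n)
# Heteroskedastic errors: Var(u_i) = x_i**2, so the variance changes with i.
u_hetero = rng.normal(0.0, 1.0, size=n) * x

def spread_by_group(u, x):
    """Sample variance of u in the small-x half and in the large-x half."""
    small = x < np.median(x)
    return np.var(u[small]), np.var(u[~small])

print(spread_by_group(u_homo, x))    # two roughly equal variances
print(spread_by_group(u_hetero, x))  # large-x variance several times bigger
```

Splitting at the median of $x$ is just a quick diagnostic here; it is the informal idea that the Parker test below turns into a regression.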
In particular, heteroskedasticity is usually unavoidable with cross-sectional data. For example, in a linear regression of firm profits on several explanatory variables, the profits of large firms are clearly more volatile than those of small firms. Similarly, in a linear regression of household savings (the dependent variable) on household income (the explanatory variable), high-income households have a larger surplus after necessary expenditure and a larger discretionary component, so their savings vary more.
Heteroskedasticity in a model generally arises from four sources: omission of important explanatory variables, incorrect model specification, measurement error in the sample data, and changes in the error over time.
3. The Traditional Parker Test
The Park test was proposed by Park in 1966 (this paper refers to it as the Parker test). It formalizes the residual plot: Park suggests that $\sigma_i^2$ is a function of an explanatory variable $X_i$ and specifies this function as
$$\sigma_i^2 = \sigma^2 X_i^{\beta} e^{v_i}.$$
Taking logarithms gives
$$\ln \sigma_i^2 = \ln \sigma^2 + \beta \ln X_i + v_i.$$
Because $\sigma_i^2$ is unknown, Park proposed replacing it with the squared residual $\hat{u}_i^2$ and running the regression
$$\ln \hat{u}_i^2 = \ln \sigma^2 + \beta \ln X_i + v_i = \alpha + \beta \ln X_i + v_i.$$
If $\beta$ is statistically significant, the data are heteroskedastic; if $\beta$ is not statistically significant, the homoskedasticity assumption is not rejected.
Specific steps of the traditional Parker test:
Step 1: Estimate the original regression by ordinary least squares (OLS) and compute the squared sample residuals $\hat{u}_i^2$.
Step 2: Regress $\ln \hat{u}_i^2$ on the logarithm of the explanatory variable suspected of causing the heteroskedasticity:
$$\ln \hat{u}_i^2 = \alpha + \beta \ln X_i + v_i.$$
Step 3: Test the significance of $\beta$ in the above equation. Reject the null hypothesis of homoskedasticity if $\beta$ is statistically significant, and do not reject it if $\beta$ is not statistically significant.
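The three steps can be sketched in Python with NumPy and SciPy. This is a minimal illustration under stated assumptions, not the authors' code: `ols` and `park_test` are hypothetical helper names, p-values come from the usual two-sided t-test on each OLS coefficient, and the example data are invented so that the error standard deviation is proportional to $x_1$ (hence the true $\beta$ is 2).

```python
import numpy as np
from scipy import stats

def ols(X, y):
    """OLS with an intercept. Returns coefficients, fitted values,
    residuals and two-sided t-test p-values for each coefficient."""
    Xc = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(Xc, y, rcond=None)
    fitted = Xc @ beta
    resid = y - fitted
    n, p = Xc.shape
    sigma2 = resid @ resid / (n - p)                       # residual variance
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(Xc.T @ Xc)))
    pvals = 2 * stats.t.sf(np.abs(beta / se), df=n - p)
    return beta, fitted, resid, pvals

def park_test(X, y, j):
    """Traditional Park test against the j-th regressor (must be positive)."""
    _, _, resid, _ = ols(X, y)                  # Step 1: OLS residuals
    aux_y = np.log(resid ** 2)                  # ln(u_hat_i^2)
    aux_x = np.log(np.asarray(X)[:, j])         # ln X_i
    beta, _, _, pvals = ols(aux_x, aux_y)       # Step 2: auxiliary regression
    return beta[1], pvals[1]                    # Step 3: estimate and p-value of beta

# Hypothetical data: the error standard deviation is proportional to x1.
rng = np.random.default_rng(1)
n = 400
X = rng.uniform(1.0, 10.0, size=(n, 2))
y = 2.0 + 0.5 * X[:, 0] + 0.3 * X[:, 1] + rng.normal(size=n) * X[:, 0]
b, p = park_test(X, y, j=0)
print(f"beta = {b:.2f}, p-value = {p:.3g}")
```

With $k$ regressors, `park_test` has to be called once per column, which is the repetition the following sections try to avoid.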
4. The Parker Test Improved with Principal Components
Principal component analysis is applied to the explanatory variables $x_1, x_2, \ldots, x_k$, all the resulting principal components are log-transformed, and the following heteroskedasticity model is established on the new data:
$$\ln \hat{u}_i^2 = \alpha_0 + \alpha_1 \ln z_{1i} + \alpha_2 \ln z_{2i} + \cdots + \alpha_k \ln z_{ki} + v_i \tag{2}$$
where $z_1, z_2, \ldots, z_k$ denote the principal components generated from the explanatory variables $x_1, x_2, \ldots, x_k$.
The coefficients of model (2) are estimated by least squares, and the significance of the coefficients $\alpha_1, \ldots, \alpha_k$ is tested using p-values. Comparing each p-value with the chosen significance level, the presence of significant coefficients in the model indicates heteroskedasticity; otherwise, the assumption of homoskedasticity is satisfied.
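A sketch of the principal-component version follows, again using hypothetical helper names. Two details are assumptions not specified in the text: the regressors are standardized before extracting components, and because component scores of centered data can be negative, $\ln|z_{ji}|$ is used in place of $\ln z_{ji}$. The example data (pairwise correlation 0.5, error standard deviation $|x_1|$) are invented for illustration.

```python
import numpy as np
from scipy import stats

def ols(X, y):
    """OLS with an intercept; returns coefficients, fitted values, residuals, t-test p-values."""
    Xc = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(Xc, y, rcond=None)
    fitted = Xc @ beta
    resid = y - fitted
    n, p = Xc.shape
    se = np.sqrt(np.diag(resid @ resid / (n - p) * np.linalg.inv(Xc.T @ Xc)))
    return beta, fitted, resid, 2 * stats.t.sf(np.abs(beta / se), df=n - p)

def principal_components(X):
    """Scores on all principal components of the standardized regressors,
    ordered by decreasing eigenvalue (standardization is an assumption)."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    eigval, eigvec = np.linalg.eigh(np.cov(Z, rowvar=False))
    order = np.argsort(eigval)[::-1]
    return Z @ eigvec[:, order]

def pca_park_test(X, y):
    """p-values of alpha_1, ..., alpha_k in model (2)."""
    _, _, resid, _ = ols(X, y)
    scores = principal_components(X)
    # Assumption: component scores can be negative, so ln|z| replaces ln z.
    _, _, _, pvals = ols(np.log(np.abs(scores)), np.log(resid ** 2))
    return pvals[1:]

# Hypothetical correlated regressors with heteroskedasticity driven by x1.
rng = np.random.default_rng(3)
n = 400
cov = np.array([[1.0, 0.5, 0.5], [0.5, 1.0, 0.5], [0.5, 0.5, 1.0]])
X = rng.multivariate_normal(np.zeros(3), cov, size=n)
y = 1.0 + X.sum(axis=1) + rng.standard_normal(n) * np.abs(X[:, 0])
print(np.round(pca_park_test(X, y), 4))  # one p-value per principal component
```

Note that the output is one p-value per component, so a significant $\alpha_j$ points to a component rather than to a particular $x_j$, which is the interpretive limitation discussed in the introduction.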
5. An Improved Parker Test Based on Sample Fitted Values
The traditional Parker test described above must construct $k$ separate structural forms of heteroskedasticity to test the original model, which is complex and cumbersome. We therefore propose to use the sample fitted values $\hat{y}_i$ as the new explanatory variable in a regression whose dependent variable is the logarithm of the squared residuals, and to compare this method with the Parker test improved with principal components in terms of effectiveness and simplicity.
Specific steps of the improved Parker test:
Step 1: Estimate Equation (1) by OLS to obtain the sample fitted values $\hat{y}_i$ and the residuals $\hat{u}_i$.
Step 2: Regress the logarithm of the squared residuals (the explained variable) on the logarithm of the fitted values (the explanatory variable):
$$\ln \hat{u}_i^2 = \alpha + \beta \ln \hat{y}_i + v_i.$$
Step 3: Test the significance of $\beta$ in the above equation. If $\beta$ is statistically significant, reject the null hypothesis of homoskedasticity; if it is not, do not reject the null hypothesis of homoskedasticity.
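The proposed test needs only one auxiliary regression regardless of $k$. Below is a minimal sketch with hypothetical helper names; it assumes $\ln|\hat{y}_i|$ is used when fitted values can be non-positive (in the invented example all regressors are positive, so $\hat{y}_i > 0$ anyway). The example error standard deviation is $0.02\,s_i^2$ with $s_i = x_{1i} + x_{2i} + x_{3i}$, so $\ln u_i^2$ rises roughly like $4 \ln \hat{y}_i$.

```python
import numpy as np
from scipy import stats

def ols(X, y):
    """OLS with an intercept; returns coefficients, fitted values, residuals, t-test p-values."""
    Xc = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(Xc, y, rcond=None)
    fitted = Xc @ beta
    resid = y - fitted
    n, p = Xc.shape
    se = np.sqrt(np.diag(resid @ resid / (n - p) * np.linalg.inv(Xc.T @ Xc)))
    return beta, fitted, resid, 2 * stats.t.sf(np.abs(beta / se), df=n - p)

def fitted_value_park_test(X, y):
    """Improved Parker test: regress ln(u_hat^2) on ln(y_hat)."""
    _, fitted, resid, _ = ols(X, y)                       # Step 1
    # Assumption: ln|y_hat| guards against non-positive fitted values.
    beta, _, _, pvals = ols(np.log(np.abs(fitted)), np.log(resid ** 2))  # Step 2
    return beta[1], pvals[1]                              # Step 3: test beta

# Hypothetical data whose error spread grows with the mean of y.
rng = np.random.default_rng(4)
n = 400
X = rng.uniform(1.0, 10.0, size=(n, 3))
s = X.sum(axis=1)
y = 1.0 + s + rng.standard_normal(n) * 0.02 * s ** 2
b, p = fitted_value_park_test(X, y)
print(f"beta = {b:.2f}, p-value = {p:.3g}")
```

Compared with the two previous sketches, no choice of regressor and no principal-component step is needed: the fitted values already combine all the $x_j$.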
6. Random Simulation
Random simulation can be used to demonstrate the usefulness and validity of both the Parker test with the sample principal components as explanatory variables and the Parker test with the sample fitted values as the explanatory variable.
1) Generation of simulation data
To generate the simulation data, three sample variables $x_1, x_2, x_3$, each following the standard normal distribution (mean 0, variance 1), were generated with a sample size of 400. Because correlation among the three variables might affect the test results, a covariance (with unit variances) was additionally imposed between the variables, with the Pearson correlation coefficient set to 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8 and 0.9. The parameters $\beta_0$, $\beta_1$, $\beta_2$ and $\beta_3$ were set, and the explained variable $y$ was generated as follows:
$$y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \beta_3 x_{3i} + u_i \tag{3}$$
where $u_i$ ($i = 1, \ldots, 400$) is a normal random term with mean 0 whose specific form is constructed from mutually independent random variables that follow the standard normal distribution.
Clearly, a random term of this form is heteroskedastic by construction, and the heteroskedasticity is related to the explanatory variables.
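The data-generating process of model (3) can be sketched as follows. Several ingredients are assumptions, since they are not recoverable from the text: every pair of $x_1, x_2, x_3$ gets the same correlation $\rho$, the $\beta$ values are placeholders, and the error term $u_i = \varepsilon_i \exp(0.5\,x_{1i})$ is a hypothetical heteroskedastic form in which $\operatorname{Var}(u_i)$ depends on $x_1$ only.

```python
import numpy as np

def simulate(rho, n=400, beta=(5.0, 1.0, 1.0, 1.0), rng=None):
    """Generate one data set for model (3).

    Assumptions (not from the paper): equal pairwise correlation rho,
    placeholder beta values, and a hypothetical error whose standard
    deviation exp(0.5 * x1) depends on x1 only.
    """
    rng = np.random.default_rng() if rng is None else rng
    cov = np.full((3, 3), rho)
    np.fill_diagonal(cov, 1.0)                     # unit variances
    X = rng.multivariate_normal(np.zeros(3), cov, size=n)
    eps = rng.standard_normal(n)                   # independent N(0, 1) draws
    u = eps * np.exp(0.5 * X[:, 0])                # hypothetical heteroskedastic error
    y = beta[0] + X @ np.asarray(beta[1:]) + u
    return X, y

X, y = simulate(0.5, rng=np.random.default_rng(2))
print(X.shape, np.round(np.corrcoef(X, rowvar=False), 2))
```

The equal-correlation matrix is positive definite for every $\rho$ between 0.1 and 0.9, so `multivariate_normal` accepts all the correlation levels used in the study.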
In Table 1, the three p-values reported for the principal-component method are those of the coefficient tests on its three principal-component terms in model (2). Since the Parker test with the fitted value as the new explanatory variable yields only one p-value, its result is shown in a single centered cell in Table 1.
The following conclusions were drawn from Table 1.
Table 1. Significance tests for coefficients in the model (one simulation)
a) For the Parker test improved with principal components, the results show that not only are the coefficients expected to be significant indeed significant, but some of the other coefficients are also significant, which contradicts the design assumption about which explanatory variables the heteroskedasticity is correlated with. This is because each principal component combines information from all the explanatory variables, so a principal component also carries the information of the variable that drives the heteroskedasticity; a significant principal-component coefficient therefore does not mean that the coefficient of the corresponding explanatory variable is significant.
b) The two methods reach the same significance conclusion. Under the improved Parker method using the sample fitted values, the coefficient is significant, which agrees with the assumption that the heteroskedasticity is correlated with the explanatory variables and reflects the overall significance for this multiple linear regression model. Compared with the Parker test improved with principal components, the results of the new method are clear at a glance, nothing contradicts the design assumption, and a single p-value suffices to establish the presence of heteroskedasticity in the multiple linear regression model.
A further 10,000 randomised simulation experiments were conducted for each value of the correlation coefficient, and the simulated data were used to test for heteroskedasticity with both the Parker test using principal components as explanatory variables and the improved Parker test; the results are shown in Table 2.
The following conclusions were drawn from Table 2.
Table 2. Significance tests for coefficients in the model (10,000 simulations)
Regardless of the magnitude of the correlation coefficients between the explanatory variables, the test with the sample principal components as explanatory variables reports significance less often than the improved method does. In other words, the Parker test improved with principal components more often fails to detect the heteroskedasticity that was deliberately built into the multiple linear regression model, which further indicates that the improved method is superior to the method using the sample principal components as explanatory variables. It follows that the new method can replace the method using the principal components as explanatory variables, and it is also more concise and easier to compute.
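The repeated-simulation comparison can be sketched by combining the previous pieces. This is an illustration under assumptions, not a reproduction of Table 2: it uses 200 replications instead of 10,000 to keep run time short, the hypothetical data-generating process from the earlier sketch, $\ln|\cdot|$ for possibly negative regressors, and the rule that the principal-component test "detects" heteroskedasticity when any of its coefficients is significant at the 5% level.

```python
import numpy as np
from scipy import stats

def ols(X, y):
    """OLS with an intercept; returns coefficients, fitted values, residuals, t-test p-values."""
    Xc = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(Xc, y, rcond=None)
    fitted = Xc @ beta
    resid = y - fitted
    n, p = Xc.shape
    se = np.sqrt(np.diag(resid @ resid / (n - p) * np.linalg.inv(Xc.T @ Xc)))
    return beta, fitted, resid, 2 * stats.t.sf(np.abs(beta / se), df=n - p)

def pca_park_test(X, y):
    """p-values of the ln|z_j| coefficients (standardized PCA; assumptions as before)."""
    _, _, resid, _ = ols(X, y)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    eigval, eigvec = np.linalg.eigh(np.cov(Z, rowvar=False))
    scores = Z @ eigvec[:, np.argsort(eigval)[::-1]]
    return ols(np.log(np.abs(scores)), np.log(resid ** 2))[3][1:]

def fitted_value_park_test(X, y):
    """p-value of beta in ln(u_hat^2) = alpha + beta ln|y_hat| + v."""
    _, fitted, resid, _ = ols(X, y)
    return ols(np.log(np.abs(fitted)), np.log(resid ** 2))[3][1]

def simulate(rho, rng, n=400, beta=(5.0, 1.0, 1.0, 1.0)):
    """Hypothetical model (3): equal correlation rho, error sd exp(0.5 * x1)."""
    cov = np.full((3, 3), rho)
    np.fill_diagonal(cov, 1.0)
    X = rng.multivariate_normal(np.zeros(3), cov, size=n)
    u = rng.standard_normal(n) * np.exp(0.5 * X[:, 0])
    return X, beta[0] + X @ np.asarray(beta[1:]) + u

def count_detections(rho, reps=200, level=0.05, seed=0):
    """How many replications each test flags as heteroskedastic."""
    rng = np.random.default_rng(seed)
    pca_hits = fitted_hits = 0
    for _ in range(reps):
        X, y = simulate(rho, rng)
        if np.any(pca_park_test(X, y) < level):  # assumption: any significant alpha counts
            pca_hits += 1
        if fitted_value_park_test(X, y) < level:
            fitted_hits += 1
    return pca_hits, fitted_hits

for rho in (0.1, 0.5, 0.9):
    print(rho, count_detections(rho))
```

Because the error form, parameters and counting rule here are assumptions, the counts this sketch prints should not be read as the values in Table 2; the code only shows how such a comparison is organized.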
7. Analysis of Practical Examples
1) Data sources
Regional gross domestic product ($y$), per capita consumption expenditure ($x_1$), per capita regional general budget expenditure ($x_2$) and the price index of fixed asset investment ($x_3$) for 31 provinces in 2018 were collected from the statistical yearbook, with monetary values in RMB. First, the following multiple regression model is established:
$$y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \beta_3 x_{3i} + u_i \tag{4}$$
Model (4) is estimated by OLS to obtain a set of fitted values $\hat{y}_i$, from which the squared residuals $\hat{u}_i^2$ are calculated.
(5)
a) Steps of the method using the sample principal components as new variables:
Step 1: Estimate equation (5) by OLS to obtain a set of residuals $\hat{u}_i$.
Step 2: Compute the sample principal components to obtain the first, second and third principal components $z_1, z_2, z_3$; these new variables contain all the information in the explanatory variables $x_1, x_2, x_3$. Substituting the 31 groups of observations into the principal-component equations gives the values $z_{1i}, z_{2i}, z_{3i}$.
Step 3: Regress the logarithm of $\hat{u}_i^2$ on $\ln z_1$, $\ln z_2$ and $\ln z_3$, i.e.
$$\ln \hat{u}_i^2 = \alpha_0 + \alpha_1 \ln z_{1i} + \alpha_2 \ln z_{2i} + \alpha_3 \ln z_{3i} + v_i.$$
Step 4: Compute the significance tests of the regression coefficients.
The result shows that the multiple regression model (4) is heteroskedastic.
b) Steps of the method using the fitted values $\hat{y}_i$ as the new variable:
Step 1: Estimate the regression model by OLS to obtain a set of residuals $\hat{u}_i$;
Step 2: Compute the sample fitted values $\hat{y}_i$ (chosen because they reflect the overall variation of the data and capture information on changes in variance) and obtain the series $\ln \hat{y}_i$ and $\ln \hat{u}_i^2$;
Step 3: Regress $\ln \hat{u}_i^2$ on $\ln \hat{y}_i$, i.e. $\ln \hat{u}_i^2 = \alpha + \beta \ln \hat{y}_i + v_i$;
Step 4: Compute the significance test of the regression coefficient.
Table 3. Output of the Parker test with the sample principal components as new variables.
Table 4. Output of the Parker test with the sample fitted values as new variables.
At the significance level $\alpha = 0.05$, the results in Table 3 show that the p-value of the coefficient on the second principal component is less than 0.05, so this regression coefficient is significantly non-zero: the logarithm of the squared residuals is related to the second principal component. It is still not possible, however, to determine whether the heteroskedasticity is related to $x_1$, $x_2$ or $x_3$.
At the significance level $\alpha = 0.05$, Table 4 shows that the p-value of the regression coefficient is less than 0.05, so the coefficient is significantly non-zero: the logarithm of the squared residuals is correlated with the logarithm of the fitted values. The modified Parker test thus reaches the same conclusion: the multiple regression model is heteroskedastic.
In summary, the Parker test modified with principal components and the Parker test modified with sample fitted values reach the same conclusion. Although both methods need only one heteroskedasticity model to establish the presence of heteroskedasticity in the original model, the version based on sample fitted values is simpler and faster to carry out, since it omits computing the sample principal components, and its results are more valid. Moreover, the sample fitted values also contain all the information in the explanatory variables and better reflect the variance trend of the overall data. We therefore conclude that using the sample fitted values as the new variable is more effective than using the sample principal components, and that the improved method can replace the Parker test with principal components as explanatory variables.
8. Conclusions
There are many methods for testing heteroskedasticity in regression models, and researchers in China and abroad have proposed many tests that are more effective than the traditional ones. This paper proposes a new test based on the traditional Parker test: in the auxiliary regression, the logarithm of the squared residuals of the original regression model is regressed on the logarithm of the fitted values computed by ordinary least squares (OLS). Simulation and an empirical example demonstrate the effectiveness and simplicity of the proposed method.
Limitations and future work:
When constructing the auxiliary regression model, the improved Parker test proposed here only regresses the logarithm of the squared residuals on the logarithm of the fitted values and does not identify the specific form of the heteroskedasticity, which requires further study in future work.