A New Stochastic Restricted Liu Estimator for the Logistic Regression Model
1. Introduction
Consider the multiple logistic regression model

$$y_i = \pi_i + \varepsilon_i, \quad i = 1, 2, \ldots, n, \quad (1.1)$$

where $y_i$ follows a Bernoulli distribution with parameter $\pi_i$ given by

$$\pi_i = \frac{\exp(x_i'\beta)}{1 + \exp(x_i'\beta)}, \quad (1.2)$$

where $\beta$ is a $(p+1) \times 1$ vector of coefficients, $x_i$ is the $i$th row of $X$, which is an $n \times (p+1)$ data matrix with $p$ explanatory variables, and $\varepsilon_i$ is independent with mean zero and variance $\pi_i(1-\pi_i)$ of the response $y_i$
. The maximum likelihood method is the most commonly used method for estimating the parameters, and the Maximum Likelihood Estimator (MLE) is defined as

$$\hat{\beta}_{MLE} = C^{-1} X' \hat{W} \hat{Z}, \quad C = X' \hat{W} X, \quad (1.3)$$

where $\hat{W} = \mathrm{diag}\{\hat{\pi}_i(1-\hat{\pi}_i)\}$; $\hat{\pi}_i = \exp(x_i'\hat{\beta}) / (1 + \exp(x_i'\hat{\beta}))$ and $\hat{Z}$ is the column vector with $i$th element equal to $\log\!\big(\hat{\pi}_i/(1-\hat{\pi}_i)\big) + (y_i - \hat{\pi}_i)/\big(\hat{\pi}_i(1-\hat{\pi}_i)\big)$. The MLE is an asymptotically unbiased estimator of $\beta$, and its asymptotic covariance matrix is

$$\mathrm{Cov}(\hat{\beta}_{MLE}) = C^{-1} = (X' \hat{W} X)^{-1}. \quad (1.4)$$
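For readers who want to reproduce the computations, the following minimal sketch (not taken from the paper) shows how the MLE in (1.3) can be obtained by iteratively reweighted least squares using NumPy; the function name, iteration count and clipping are illustrative assumptions.

```python
import numpy as np

def logistic_mle(X, y, n_iter=25):
    """Illustrative IRLS: returns beta_MLE = (X' W X)^{-1} X' W Z and (X' W X)^{-1}."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = X @ beta
        pi = np.clip(1.0 / (1.0 + np.exp(-eta)), 1e-10, 1 - 1e-10)  # fitted probabilities
        w = pi * (1.0 - pi)                        # diagonal of W-hat
        z = eta + (y - pi) / w                     # working response Z-hat
        XtW = X.T * w                              # X' W without forming diag(w)
        beta = np.linalg.solve(XtW @ X, XtW @ z)   # (X' W X)^{-1} X' W Z
    cov = np.linalg.inv(XtW @ X)                   # asymptotic covariance, Equation (1.4)
    return beta, cov
```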
Multicollinearity inflates the variance of the MLE in logistic regression. Therefore, the MLE is no longer a reliable estimator of the parameters in the logistic regression model.
To overcome the problem of multicollinearity in logistic regression, many scholars have conducted research. Schaffer et al. (1984) [1] proposed Ridge Logistic Regression (RLR), Aguilera et al. (2006) [2] proposed the Principal Component Logistic Estimator (PCLE), Nja et al. (2013) [3] proposed the Modified Logistic Ridge Regression Estimator (MLRE), and Inan and Erdogan (2013) [4] proposed the Liu-type estimator (LLE).
Some scholars have also improved estimation by imposing restrictions, which may be exact or stochastic, on the unknown parameters of the model. When additional exact linear restrictions on the parameter vector are assumed to hold, Duffy and Santer (1989) [5] proposed the Restricted Maximum Likelihood Estimator (RMLE), Siray et al. (2014) [6] proposed the Restricted Liu Estimator (RLE), and Asar et al. (2016) [7] proposed the Restricted Ridge Estimator. When additional stochastic linear restrictions on the parameter vector are assumed to hold, Nagarajah and Wijekoon (2015) [8] proposed the Stochastic Restricted Maximum Likelihood Estimator (SRMLE), Varathan and Wijekoon (2016) [9] proposed the Stochastic Restricted Liu Maximum Likelihood Estimator (SRLMLE), and Varathan and Wijekoon (2016) [10] proposed the Stochastic Restricted Ridge Maximum Likelihood Estimator (SRRMLE).
In this article, we propose a new estimator, called the Stochastic Restricted Liu Estimator (SRLE), for the case where linear stochastic restrictions are available in addition to the logistic regression model. The article is structured as follows. The model specification and the new estimator are presented in Section 2. In Section 3, the mean square error matrix (MSEM) comparisons of the SRLE with the MLE and the other estimators are derived. Section 4 gives a numerical example. In Section 5, a Monte Carlo simulation is used to verify the theoretical results. Some concluding remarks are given in Section 6.
2. The Proposed Estimators
For the unrestricted model given in Equation (1.1), the LLE proposed by Liu (1993), Urgan and Tez (2008) and Mansson et al. (2012) is defined as

$$\hat{\beta}_{LLE} = Z_d \hat{\beta}_{MLE} = (C + I)^{-1}(C + d I)\hat{\beta}_{MLE}, \quad (2.1)$$

where $0 < d < 1$ is a parameter and $Z_d = (C + I)^{-1}(C + d I)$. The bias and covariance matrices of the LLE are

$$\mathrm{Bias}(\hat{\beta}_{LLE}) = (Z_d - I)\beta = -(1-d)(C + I)^{-1}\beta, \quad (2.2)$$

$$\mathrm{Cov}(\hat{\beta}_{LLE}) = Z_d C^{-1} Z_d'. \quad (2.3)$$
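The Liu-type shrinkage in (2.1) is a simple linear transformation of the MLE. A hedged sketch, reusing the weights produced by the previous snippet, might look as follows; the function name and the default value of d are illustrative.

```python
import numpy as np

def logistic_liu(X, w, beta_mle, d=0.5):
    """Illustrative LLE from (2.1): Z_d beta_MLE with Z_d = (C + I)^{-1}(C + d I)."""
    C = (X.T * w) @ X                          # C = X' W X, with w the diagonal of W-hat
    I = np.eye(C.shape[0])
    Z_d = np.linalg.solve(C + I, C + d * I)    # (C + I)^{-1} (C + d I)
    return Z_d @ beta_mle
```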
In addition to the sample model (1.1), let us be given some prior information about $\beta$ in the form of a set of $j$ independent linear stochastic restrictions as follows:

$$h = H\beta + v, \quad (2.4)$$

where $H$ is a $j \times (p+1)$ matrix of full rank $j \le p+1$ with known elements, $h$ is a $j \times 1$ stochastic known vector, and $v$ is a $j \times 1$ random vector of disturbances with mean $0$ and dispersion matrix $\Omega$, where $\Omega$ is assumed to be a known $j \times j$ positive definite matrix. Further, it is assumed that $v$ is stochastically independent of $\varepsilon = (\varepsilon_1, \ldots, \varepsilon_n)'$, i.e. $E(v\varepsilon') = 0$.
For the restricted model specified by Equations (1.1) and (2.4), the SRMLE proposed by Nagarajah and Wijekoon (2015) and the SRLMLE proposed by Varathan and Wijekoon (2016) are defined as

$$\hat{\beta}_{SRMLE} = \hat{\beta}_{MLE} + C^{-1}H'(\Omega + H C^{-1} H')^{-1}(h - H\hat{\beta}_{MLE}), \quad (2.5)$$

$$\hat{\beta}_{SRLMLE} = Z_d \hat{\beta}_{SRMLE} = (C + I)^{-1}(C + d I)\hat{\beta}_{SRMLE}, \quad (2.6)$$

respectively. The bias and covariance matrices of the SRMLE and the SRLMLE are

$$\mathrm{Bias}(\hat{\beta}_{SRMLE}) = 0, \quad (2.7)$$

$$\mathrm{Cov}(\hat{\beta}_{SRMLE}) = C^{-1} - C^{-1}H'(\Omega + H C^{-1} H')^{-1}H C^{-1}, \quad (2.8)$$

$$\mathrm{Bias}(\hat{\beta}_{SRLMLE}) = (Z_d - I)\beta, \quad (2.9)$$

and

$$\mathrm{Cov}(\hat{\beta}_{SRLMLE}) = Z_d \left[ C^{-1} - C^{-1}H'(\Omega + H C^{-1} H')^{-1}H C^{-1} \right] Z_d', \quad (2.10)$$

respectively.
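Assuming the SRMLE and SRLMLE take the forms written in (2.5) and (2.6) above, a brief illustrative implementation is sketched below; H, h and Omega are the restriction quantities from (2.4), and the function names are ours, not the cited authors'.

```python
import numpy as np

def srmle(C, beta_mle, H, h, Omega):
    """Illustrative SRMLE following (2.5); C = X' W X from the unrestricted fit."""
    C_inv = np.linalg.inv(C)
    S = Omega + H @ C_inv @ H.T                     # Omega + H C^{-1} H'
    return beta_mle + C_inv @ H.T @ np.linalg.solve(S, h - H @ beta_mle)

def srlmle(C, beta_srmle, d=0.5):
    """Illustrative SRLMLE following (2.6): the Liu transformation applied to the SRMLE."""
    I = np.eye(C.shape[0])
    return np.linalg.solve(C + I, C + d * I) @ beta_srmle
```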
We propose the Mixed Maximum Likelihood Estimator (MME) [11] for the logistic regression model, constructed by analogy with the Ordinary Mixed Estimator (OME) [12] in the linear model. It is defined as

(2.11)

and its bias and covariance matrices can be derived accordingly.
In this paper, we propose a new estimator, named the Stochastic Restricted Liu Estimator (SRLE). It is defined as

(2.12)

The bias and covariance matrices of the SRLE are

(2.13)

and

(2.14)

respectively. In what follows we write $b = \mathrm{Bias}(\hat{\beta}_{SRLE})$ for the bias vector in (2.13) and $\mathrm{Cov}(\hat{\beta}_{SRLE})$ for the dispersion matrix in (2.14).
We now give a theorem and a lemma that will be used in the following sections.

Theorem 2.1. [13] (Rao and Toutenburg, 1995) Let $A$ and $B$ be two $n \times n$ matrices such that $A > 0$ and $B \ge 0$. Then $A + B > 0$.

Lemma 2.1. [14] (Rao et al., 2008) Let the two $n \times n$ matrices $M > 0$ and $N \ge 0$. Then $M - N > 0$ if $\lambda_{\max}(N M^{-1}) < 1$, where $\lambda_{\max}(N M^{-1})$ is the largest eigenvalue of $N M^{-1}$.
3. Mean Square Error Matrix (MSEM) Comparisons of the Estimators
In this section, we compare the SRLE with the MLE, LLE, SRMLE and SRLMLE under the MSEM criterion.
First, the MSEM of an estimator $\hat{\beta}$ of $\beta$ is

$$\mathrm{MSEM}(\hat{\beta}) = D(\hat{\beta}) + \mathrm{Bias}(\hat{\beta})\mathrm{Bias}(\hat{\beta})', \quad (3.1)$$

where $\mathrm{Bias}(\hat{\beta}) = E(\hat{\beta}) - \beta$ is the bias vector and $D(\hat{\beta})$ is the dispersion matrix. For two given estimators $\hat{\beta}_1$ and $\hat{\beta}_2$, the estimator $\hat{\beta}_2$ is considered to be better than $\hat{\beta}_1$ in the MSEM criterion if and only if

$$\mathrm{MSEM}(\hat{\beta}_1) - \mathrm{MSEM}(\hat{\beta}_2) \ge 0. \quad (3.2)$$

The scalar mean square error (MSE) is defined as

$$\mathrm{MSE}(\hat{\beta}) = \mathrm{tr}\left[\mathrm{MSEM}(\hat{\beta})\right] = \mathrm{tr}\left[D(\hat{\beta})\right] + \mathrm{Bias}(\hat{\beta})'\mathrm{Bias}(\hat{\beta}). \quad (3.3)$$
Since superiority in the MSEM criterion implies superiority in the scalar MSE criterion, we only consider MSEM comparisons among the estimators.
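As a reading aid, the following small helpers (not part of the paper) implement the definitions (3.1)-(3.3) and the superiority check (3.2) numerically; the non-negative-definiteness test uses an eigenvalue tolerance, which is an implementation choice.

```python
import numpy as np

def msem(cov, bias):
    """Equation (3.1): MSEM = D + bias bias'."""
    return cov + np.outer(bias, bias)

def scalar_mse(cov, bias):
    """Equation (3.3): tr(D) + bias' bias."""
    return np.trace(cov) + bias @ bias

def msem_superior(msem_1, msem_2, tol=1e-10):
    """Equation (3.2): estimator 2 is better than estimator 1 if MSEM_1 - MSEM_2 >= 0."""
    return bool(np.all(np.linalg.eigvalsh(msem_1 - msem_2) >= -tol))
```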
3.1. MSEM Comparisons of the MLE and SRLE
In this subsection, we make the MSEM comparison between the MLE and the SRLE.

First, the MSEMs of the MLE and the SRLE are

$$\mathrm{MSEM}(\hat{\beta}_{MLE}) = C^{-1} \quad (3.4)$$

and

$$\mathrm{MSEM}(\hat{\beta}_{SRLE}) = \mathrm{Cov}(\hat{\beta}_{SRLE}) + b b', \quad (3.5)$$

respectively, where $b = \mathrm{Bias}(\hat{\beta}_{SRLE})$.

We now compare these two estimators under the MSEM criterion by considering the difference

$$\mathrm{MSEM}(\hat{\beta}_{MLE}) - \mathrm{MSEM}(\hat{\beta}_{SRLE}) = D_1 - b b', \quad (3.6)$$

where $D_1 = C^{-1} - \mathrm{Cov}(\hat{\beta}_{SRLE})$ and $b = \mathrm{Bias}(\hat{\beta}_{SRLE})$. Obviously, $b b'$ is a non-negative definite matrix, and $C^{-1}$ and $\mathrm{Cov}(\hat{\beta}_{SRLE})$ are positive definite. Using Theorem 2.1, it can be shown that $D_1$ is a positive definite matrix. By Lemma 2.1, if $\lambda_1 < 1$, where $\lambda_1$ is the largest eigenvalue of $b b' D_1^{-1}$, then $\mathrm{MSEM}(\hat{\beta}_{MLE}) - \mathrm{MSEM}(\hat{\beta}_{SRLE})$ is a positive definite matrix. Based on the above discussion, the following theorem can be stated.

Theorem 3.1. For the restricted logistic regression model specified by Equations (1.1) and (2.4), the SRLE is superior to the MLE in the MSEM sense if and only if $\lambda_{\max}(b b' D_1^{-1}) < 1$.
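Because $b b'$ has rank one, the largest eigenvalue of $b b' D_1^{-1}$ equals $b' D_1^{-1} b$, so the condition of Theorem 3.1 reduces to a scalar inequality. A one-line numerical check, under the notation defined in this subsection, could look like this (illustrative only).

```python
import numpy as np

def srle_beats_mle(D1, b):
    """Check the Theorem 3.1 condition: lambda_max(b b' D1^{-1}) = b' D1^{-1} b < 1."""
    return float(b @ np.linalg.solve(D1, b)) < 1.0
```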
3.2. MSEM Comparisons of the LLE and SRLE
First, the MSEM of the LLE is

$$\mathrm{MSEM}(\hat{\beta}_{LLE}) = Z_d C^{-1} Z_d' + (Z_d - I)\beta\beta'(Z_d - I)'. \quad (3.7)$$

We now compare these two estimators under the MSEM criterion by considering the difference

(3.8)

where the resulting difference matrix $\mathrm{MSEM}(\hat{\beta}_{LLE}) - \mathrm{MSEM}(\hat{\beta}_{SRLE})$ is denoted $D_2$. Obviously, $D_2$ is a positive definite matrix. Based on the above discussion, the following theorem can be stated.
Theorem 3.2. For the restricted logistic regression model specified by Equations (1.1) and (2.4), the SRLE is always superior to the LLE in the MSEM sense.
3.3. MSEM Comparisons of the SRMLE and SRLE
First, the MSEM of the SRMLE is

$$\mathrm{MSEM}(\hat{\beta}_{SRMLE}) = \mathrm{Cov}(\hat{\beta}_{SRMLE}) = C^{-1} - C^{-1}H'(\Omega + H C^{-1} H')^{-1}H C^{-1}, \quad (3.9)$$

since the SRMLE is asymptotically unbiased.

We now compare these two estimators under the MSEM criterion by considering the difference

$$\mathrm{MSEM}(\hat{\beta}_{SRMLE}) - \mathrm{MSEM}(\hat{\beta}_{SRLE}) = D_3 - b b', \quad (3.10)$$

where $D_3 = \mathrm{Cov}(\hat{\beta}_{SRMLE}) - \mathrm{Cov}(\hat{\beta}_{SRLE})$ and $b = \mathrm{Bias}(\hat{\beta}_{SRLE})$. Obviously, $b b'$ is a non-negative definite matrix, and $\mathrm{Cov}(\hat{\beta}_{SRMLE})$ and $\mathrm{Cov}(\hat{\beta}_{SRLE})$ are positive definite. Using Theorem 2.1, it can be shown that $D_3$ is a positive definite matrix. By Lemma 2.1, if $\lambda_3 < 1$, where $\lambda_3$ is the largest eigenvalue of $b b' D_3^{-1}$, then $\mathrm{MSEM}(\hat{\beta}_{SRMLE}) - \mathrm{MSEM}(\hat{\beta}_{SRLE})$ is a positive definite matrix. Based on the above discussion, the following theorem can be stated.

Theorem 3.3. For the restricted logistic regression model specified by Equations (1.1) and (2.4), the SRLE is superior to the SRMLE in the MSEM sense if and only if $\lambda_{\max}(b b' D_3^{-1}) < 1$.
3.4. MSEM Comparisons of the SRLMLE and SRLE
First, the MSEM of the SRLMLE is

$$\mathrm{MSEM}(\hat{\beta}_{SRLMLE}) = Z_d\left[C^{-1} - C^{-1}H'(\Omega + H C^{-1} H')^{-1}H C^{-1}\right]Z_d' + (Z_d - I)\beta\beta'(Z_d - I)'. \quad (3.11)$$

Now, we consider the following difference

$$\mathrm{MSEM}(\hat{\beta}_{SRLMLE}) - \mathrm{MSEM}(\hat{\beta}_{SRLE}) = D_4 + b_2 b_2' - b b', \quad (3.12)$$

where $D_4 = \mathrm{Cov}(\hat{\beta}_{SRLMLE}) - \mathrm{Cov}(\hat{\beta}_{SRLE})$, $b_2 = \mathrm{Bias}(\hat{\beta}_{SRLMLE}) = (Z_d - I)\beta$ and $b = \mathrm{Bias}(\hat{\beta}_{SRLE})$. Obviously, $b b'$ and $b_2 b_2'$ are non-negative definite matrices, and $D_4$ and $D_4 + b_2 b_2'$ are positive definite matrices. By Lemma 2.1, if $\lambda_4 < 1$, where $\lambda_4$ is the largest eigenvalue of $b b'(D_4 + b_2 b_2')^{-1}$, then $\mathrm{MSEM}(\hat{\beta}_{SRLMLE}) - \mathrm{MSEM}(\hat{\beta}_{SRLE})$ is a positive definite matrix. Based on the above discussion, the following theorem can be stated.

Theorem 3.4. For the restricted logistic regression model specified by Equations (1.1) and (2.4), the SRLE is superior to the SRLMLE in the MSEM sense if and only if $\lambda_{\max}\left(b b'(D_4 + b_2 b_2')^{-1}\right) < 1$.
4. Numerical Example
In this section, we consider the Iris data set from the UCI repository to illustrate our theoretical results.
A binary logistic regression model is set up in which the dependent variable is coded 0 if the plant is Iris-setosa and 1 if it is Iris-versicolor. The explanatory variables are $x_1$: sepal length; $x_2$: petal length; and $x_3$: petal width.
The sample consists of the first 80 observations. The correlation matrix can be seen in Table A1 (Appendix A). From Table A1, it can be seen that the correlations among the regressors are all greater than 0.80, some of them close to 0.98, and the condition number is 55.4984, indicating that there is a severe multicollinearity problem in these data.
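The data preparation described above can be reproduced roughly as follows; this sketch assumes scikit-learn's bundled copy of the UCI Iris data, and the condition number it prints may differ from the value reported above depending on the scaling convention used.

```python
import numpy as np
from sklearn.datasets import load_iris

iris = load_iris()
X = iris.data[:80, [0, 2, 3]]             # first 80 observations; sepal length, petal length, petal width
y = (iris.target[:80] == 1).astype(int)   # Iris-versicolor coded as 1, Iris-setosa as 0

print(np.corrcoef(X, rowvar=False))       # correlation matrix, cf. Table A1
print(np.linalg.cond(X))                  # one common condition-number measure
```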
From Table A2 (Appendix A) we can conclude the following:
1) As d increases, the MSE values of the LRE, SRRMLE, SRLRE, SRLMLE and SRLE decrease. 2) As d increases, the MSE values of the MLE, SRMLE and MME remain unchanged. 3) The new estimator is always superior to the other estimators.
5. Monte Carlo Simulation
To illustrate the above theoretical results, a Monte Carlo simulation is used to generate the data. Following McDonald and Galarneau (1975) [15] and Kibria (2003) [16], the explanatory variables are generated using the following equation:

$$x_{ij} = (1-\rho^2)^{1/2} z_{ij} + \rho z_{i,p+1}, \quad i = 1, 2, \ldots, n; \; j = 1, 2, \ldots, p, \quad (5.1)$$

where the $z_{ij}$ are pseudo-random numbers from the standard normal distribution and $\rho^2$ represents the correlation between any two explanatory variables.
In this section, we set $\rho$ to take the values 0.70, 0.80 and 0.99, and $n$ to take the values 20, 100 and 200, for models with two and four explanatory variables. The dependent variable $y_i$ in (1.1) is obtained from the Bernoulli($\pi_i$) distribution, where $\pi_i = \exp(x_i'\beta)/(1 + \exp(x_i'\beta))$. The parameter values of $\beta$ are chosen so that $\beta'\beta = 1$ and $\beta_1 = \beta_2 = \cdots = \beta_p$. Further, for the Liu parameter $d$, selected values are chosen such that $0 < d < 1$. Moreover, for the restriction, we choose
(5.2)
The simulation is repeated 2000 times by generating new pseudo-random numbers, and the simulated MSE values of the estimators are obtained using the following equation:

$$\widehat{\mathrm{MSE}}(\hat{\beta}) = \frac{1}{2000}\sum_{r=1}^{2000}(\hat{\beta}_r - \beta)'(\hat{\beta}_r - \beta), \quad (5.3)$$

where $\hat{\beta}_r$ denotes the estimate of $\beta$ obtained in the $r$th replication.
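A condensed sketch of this simulation design, covering the regressor generation (5.1), the Bernoulli responses and the MSE estimate (5.3), is given below; the `estimator` argument stands for any fitting routine that returns a coefficient vector, no intercept term is included here, and the default parameter values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_X(n, p, rho):
    """Equation (5.1): collinear regressors with pairwise correlation rho^2."""
    z = rng.standard_normal((n, p + 1))
    return np.sqrt(1.0 - rho**2) * z[:, :p] + rho * z[:, [p]]

def simulate_mse(estimator, n=100, p=4, rho=0.99, reps=2000):
    """Equation (5.3): average squared estimation error over the replications."""
    beta = np.ones(p) / np.sqrt(p)             # beta'beta = 1 with equal components
    sse = 0.0
    for _ in range(reps):
        X = make_X(n, p, rho)
        pi = 1.0 / (1.0 + np.exp(-X @ beta))
        y = rng.binomial(1, pi)
        beta_hat = estimator(X, y)             # any routine returning a coefficient vector
        sse += (beta_hat - beta) @ (beta_hat - beta)
    return sse / reps
```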
The results of the simulation are reported in Tables A3-A9 (Appendix A) and also displayed in Figures A1-A3 (Appendix B).
From Tables A3-A9 and Figures A1-A3, we can conclude the following:
1) The MSE values of all the estimators increase as $\rho$ increases; 2) The MSE values of all the estimators decrease as $n$ increases; 3) The SRLE is always superior to the MLE, LLE, SRMLE and SRLMLE for all values of $d$, $n$ and $\rho$.
6. Concluding Remarks
In this paper, we proposed the Stochastic Restricted Liu Estimator (SRLE) for the logistic regression model when linear stochastic restrictions are available. In the MSEM sense, we obtained necessary and sufficient conditions, or sufficient conditions, under which the SRLE is superior to the MLE, LLE, SRMLE and SRLMLE, and we verified its superiority using a Monte Carlo simulation. How to reduce the bias of the new estimator while guaranteeing that its mean square error does not increase is the focus of our next step.
Acknowledgements
This work was supported by the Natural Science Foundation of Henan Province of China (No. 152300410112).
Appendix A
Table A1. The correlation matrix of the dataset.
Table A2. The estimated MSEM values for different d.
Table A3. The estimated MSEM values for different d when
and
.
Table A4. The estimated MSEM values for different d when
and
.
Table A5. The estimated MSEM values for different d when
and
.
Table A6. The estimated MSEM values for different d when
and
.
Table A7. The estimated MSEM values for different d when
and
.
Table A8. The estimated MSEM values for different d when
and
.
Table A9. The estimated MSEM values for different d when
and
.
Appendix B
Figure A1. The estimated MSE values for MLE, LLE, SRMLE, SRLMLE and SRLE for
.
Figure A2. The estimated MSE values for MLE, LLE, SRMLE, SRLMLE and SRLE for
.
Figure A3. The estimated MSE values for MLE, LLE, SRMLE, SRLMLE and SRLE for
.