Robust Estimators for Poisson Regression

Idriss Abdelmajid Idriss; Weihu Cheng

doi:10.4236/ojs.2023.131007

Open Journal of Statistics > Vol.13 No.1, February 2023

College of Applied Science, Department of Statistics, Beijing University of Technology, Beijing, China.

Abstract

The present paper proposes new robust estimators for Poisson regression models. We use weighted maximum likelihood estimators, which are regarded as Mallows-type estimators. We perform a Monte Carlo simulation study to assess the performance of the suggested estimators compared with the maximum likelihood estimator and some robust methods. The results show that, in general, all robust methods in this paper perform better than the classical maximum likelihood estimator when the model contains outliers. The proposed estimators showed the best performance among the robust estimators.

Keywords

Poisson Regression Model, Maximum Likelihood Estimator, Robust Estimation, Contaminated Model, Weighted Maximum Likelihood Estimator

Share and Cite:

Idriss, I. and Cheng, W. (2023) Robust Estimators for Poisson Regression. Open Journal of Statistics, 13, 112-118. doi: 10.4236/ojs.2023.131007.

1. Introduction

The Poisson regression model is widely used for modeling count response variables; it is discussed in [1] . In practice, the most common method for estimating its parameters is the maximum likelihood estimator (MLE). Unfortunately, this technique is highly sensitive to outliers in the data; see ( [2] [3] ). To overcome this issue, many robust estimators have been proposed as alternatives to the maximum likelihood estimator. One of the first robust methods used to estimate the parameters of Poisson regression models is the conditionally unbiased bounded influence (CUBIF) estimator introduced by [4] . [5] discussed the M-estimator for the Poisson regression model; these estimates belong to the Mallows type. [6] developed robust M-estimates for generalized linear models (GLM); these estimates are asymptotically normal and consistent. [7] proposed a fast and stable technique based on the breakdown point of the trimmed maximum likelihood for generalized linear models. [8] introduced a class of M-estimators based on the quasi-likelihood estimators proposed by [9] . [10] developed a robust estimator for the Poisson regression model based on the Mallows quasi-likelihood estimator. [11] discussed the behaviour of the maximum likelihood estimator in the presence of outliers. [12] discussed a robust resistant estimator based on the misclassification model. [13] generalized the optimally bounded score function discussed by [14] for linear models to the generalized linear model. More recently, [15] introduced a robust method for logistic regression models, and [16] discussed robust estimators for the Poisson regression model with outliers.

In this paper, we introduce a robust method for Poisson regression that uses the weight functions proposed by [17]; these weight functions are based on the Mallows-type estimator. We then evaluate the performance of the new methods against the MLE, Mallows, and CUBIF estimators in a Monte Carlo simulation study. In Section 2, we discuss the Poisson regression model and the maximum likelihood estimator. In Section 3, we present robust estimators for Poisson regression. In Section 4, we show the results of the Monte Carlo simulation study. In Section 5, we offer conclusions.

2. Poisson Regression Model and ML Estimator

Poisson regression is an appropriate method for modeling count data. The probability mass function is

$P (Y = y) = \frac{e^{- μ} μ^{y}}{y!}; y = 0, 1, 2, \dots$ (1)

where $E (Y) = μ$ and $V a r (Y) = μ$ ; that is, the Poisson distribution has equal mean and variance. Based on a sample $(y_{1}, y_{2}, \dots, y_{n})$ with $E (y_{i}) = μ_{i}$ , the Poisson regression model can be written in terms of the mean of the response as follows:

$y_{i} = E (y_{i}) + ε_{i}, i = 1, 2, \dots, n,$ (2)

where $ε_{i}$ are disturbance terms. The relationship between the mean of the dependent variable and the explanatory variables can be described using the log-link function:

$μ_{i} = e^{x_{i}^{T} β},$ (3)

where $x_{i} = {(x_{i 1}, x_{i 2}, \dots, x_{i p})}^{T}$ is the vector of explanatory variables and $β = {(β_{1}, β_{2}, \dots, β_{p})}^{T}$ is the vector of regression parameters. The most popular method for estimating the parameters of Poisson regression models is maximum likelihood estimation. The likelihood function of the response variables $(y_{1}, y_{2}, \dots, y_{n})$ is:

$l (Y, μ) = \prod_{i = 1}^{n} p_{i} (y_{i}) = \prod_{i = 1}^{n} \frac{e^{- μ_{i}} μ_{i}^{y_{i}}}{y_{i}!},$ (4)

$\log l (Y, μ) = \sum_{i = 1}^{n} y_{i} \log (μ_{i}) - \sum_{i = 1}^{n} μ_{i} - \sum_{i = 1}^{n} \log (y_{i}!)$

Equivalently, $\log l (y_{1}, y_{2}, \dots, y_{n} | β, x_{1}, x_{2}, \dots, x_{n}) = \sum_{i = 1}^{n} \log p (Y_{i} = y_{i} | β, x_{i})$ , where, according to (1), $p (Y_{i} = y_{i} | β, x_{i}) = \frac{\exp (- μ_{i}) μ_{i}^{y_{i}}}{y_{i}!}$ and $μ_{i} = \exp (β^{t} x_{i})$ . We can write the total log-likelihood as follows:

$\log l (β) = \sum_{i = 1}^{n} [y_{i} β^{t} x_{i} - \exp (β^{t} x_{i}) - \log (y_{i}!)]$

We can define the MLE as ${\hat{β}}_{M L} = \arg \max_{β} \log l (β)$ . To obtain the maximum likelihood estimates for this model, we maximize the log-likelihood function by differentiating it with respect to $β$ . Because this maximization has no closed-form solution, Fisher scoring or the iteratively weighted least squares (IWLS) algorithm may be used to obtain the maximum likelihood estimates; see ( [1] [18]).
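As a short worked step (a standard result, written here in the notation of this section rather than quoted from the paper), differentiating the log-likelihood with respect to $β$ under the log link $μ_{i} = \exp (β^{t} x_{i})$ gives the score equations

$\frac{\partial \log l (β)}{\partial β} = \sum_{i = 1}^{n} (y_{i} - \exp (β^{t} x_{i})) x_{i} = 0 .$

These equations are nonlinear in $β$ , which is why an iterative scheme is needed.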

In this paper, we focus on maximum weighted likelihood estimators for the Poisson regression model. In its IWLS form, the estimator is:

${\hat{β}}_{M L} = {(X^{T} \hat{W} X)}^{- 1} X^{T} \hat{W} \hat{Z}$

where $\hat{Z} = {({\hat{Z}}_{1}, {\hat{Z}}_{2}, \dots, {\hat{Z}}_{n})}^{T}$ , with

${\hat{Z}}_{i} = \log ({\hat{μ}}_{i}) + \frac{y_{i} - {\hat{μ}}_{i}}{{\hat{μ}}_{i}}$ , $\hat{W} = diag ({\hat{μ}}_{i})$ , and $X = \begin{bmatrix} x_{11} & \dots & x_{1 p} \\ \vdots & \ddots & \vdots \\ x_{n 1} & \dots & x_{n p} \end{bmatrix}$
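The IWLS update of this section can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function name `poisson_iwls`, the starting value, the tolerance, and the toy data (an intercept plus one covariate with illustrative coefficients) are all assumptions made for the example.

```python
import numpy as np

def poisson_iwls(X, y, n_iter=50, tol=1e-10):
    """Fit a log-link Poisson regression by iteratively weighted least squares."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Hypothetical starting value: least squares on log(y + 0.5).
    beta = np.linalg.lstsq(X, np.log(y + 0.5), rcond=None)[0]
    for _ in range(n_iter):
        eta = X @ beta                    # linear predictor x_i^T beta
        mu = np.exp(eta)                  # fitted means under the log link
        z = eta + (y - mu) / mu           # working response Z_i
        XtW = X.T * mu                    # X^T W with W = diag(mu_i)
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return beta

# Toy data (illustrative values only, not the paper's design).
rng = np.random.default_rng(0)
n = 500
X = np.column_stack([np.ones(n), rng.standard_normal(n)])
beta_true = np.array([0.5, 0.3])
y = rng.poisson(np.exp(X @ beta_true))

beta_hat = poisson_iwls(X, y)
print(beta_hat)
```

At convergence the score $X^{T} (y - \hat{μ})$ is numerically zero, which is a convenient check that the iteration has reached the maximum likelihood estimate.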

3. Robust Poisson Regression

In robust Poisson regression, the Mallows-type estimator introduced by [4] can be applied to fit count data; this method maximizes a weighted log-likelihood function. [5] studied the Mallows-type estimator in depth and introduced a robust method for generalized linear models. The leverage of an observation x can be measured by:

$h_{n} (x) = {({(x - {\hat{μ}}_{n})}^{T} {\hat{Σ}}_{n}^{- 1} (x - {\hat{μ}}_{n}))}^{1 / 2},$ (5)

where ${\hat{μ}}_{n}$ is the robust location estimator and ${\hat{Σ}}_{n}$ is the robust variance-covariance matrix of the predictor variables $(x_{1}, x_{2}, \dots, x_{n})$ . The robust scale and location estimators $\hat{Σ}$ and $\hat{μ}$ can be calculated using the minimum covariance determinant (MCD) method. We obtain the Mallows-type estimator for Poisson regression by maximizing the weighted log-likelihood:

$\sum_{i = 1}^{n} w_{i} [y_{i} \log (μ_{i}) - μ_{i} - \log (y_{i}!)],$ (6)

where $w_{i} = w (h_{n} (x_{i}))$ and w is a non-increasing function such that $w (u)$ is bounded. [5] proposed a choice of w that depends on a constant $c > 0$ :

$w (u) = {(1 - \frac{u^{2}}{c^{2}})}^{3} I (| u | \leq c),$

This estimator is known as the Mallows-type estimator or weighted maximum likelihood estimator (WMLE).
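The leverage measure (5) and the weight function of [5] can be sketched as follows. This is a hedged illustration: the function names, the cutoff `c=3.0`, and the toy data are assumptions, and the classical sample mean and covariance are used only as stand-ins for the robust MCD estimates described in the text.

```python
import numpy as np

def leverage(X, location, scatter):
    """h_n(x): Mahalanobis-type distance of each row of X from `location`."""
    diff = np.asarray(X, dtype=float) - location
    d2 = np.einsum("ij,jk,ik->i", diff, np.linalg.inv(scatter), diff)
    return np.sqrt(d2)

def mallows_weight(u, c):
    """w(u) = (1 - u^2 / c^2)^3 for |u| <= c, and 0 otherwise."""
    u = np.asarray(u, dtype=float)
    inside = np.abs(u) <= c
    return np.where(inside, (1.0 - (u / c) ** 2) ** 3, 0.0)

# Toy predictors (illustrative only).
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 2))

# Assumption: classical mean/covariance in place of the MCD estimates.
h = leverage(X, X.mean(axis=0), np.cov(X, rowvar=False))
w = mallows_weight(h, c=3.0)  # c = 3.0 is a hypothetical cutoff
print(w.min(), w.max())
```

Observations near the centre of the predictor cloud receive weights close to 1, while those with leverage beyond `c` receive weight 0 and drop out of the weighted log-likelihood (6).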

In this paper, we introduce robust methods for the Poisson regression model that are based on maximum weighted likelihood estimators. The weights of these methods depend on the function introduced by [17]. We first calculate the initial scatter and location estimators of the predictors, ${\hat{Σ}}^{(0)}$ and ${\hat{μ}}^{(0)}$ respectively. Then we compute the squared Mahalanobis distances of the predictors, defined as:

$m^{2} = {(x_{i} - {\hat{μ}}^{(0)})}^{T} {({\hat{Σ}}^{(0)})}^{- 1} (x_{i} - {\hat{μ}}^{(0)}) .$

The weight functions we introduce are defined as follows. The first weight is $w_{1} = (0.8 * m^{2} + 0.2)$ , where $m^{2}$ denotes the squared Mahalanobis distance, so that:

$w_{1} = (0.8 * {(x_{i} - {\hat{μ}}^{(0)})}^{T} {({\hat{Σ}}^{(0)})}^{- 1} (x_{i} - {\hat{μ}}^{(0)}) + 0.2),$

The second weight is $w_{2} = (0.8 * {(m^{2})}^{2} + 0.2)$ , which can be written in the form:

$w_{2} = (0.8 * {({(x_{i} - {\hat{μ}}^{(0)})}^{T} {({\hat{Σ}}^{(0)})}^{- 1} (x_{i} - {\hat{μ}}^{(0)}))}^{2} + 0.2) .$

The maximum weighted likelihood estimators for the Poisson regression model are then obtained by maximizing:

$\sum_{i = 1}^{n} w_{i} [y_{i} \log (μ_{i}) - μ_{i} - \log (y_{i}!)] .$ (7)

To compute the maximum weighted likelihood estimators, we used the Mallows-type algorithm introduced by [5] .
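One way to maximize the weighted log-likelihood (7) is a weighted version of IWLS, sketched below. This is an assumption-laden illustration and not necessarily the algorithm of [5]: the function names, the starting value, the toy data, and the use of the classical mean and covariance as initial location and scatter estimates are all choices made for the example. The weights $w_{1}$ and $w_{2}$ are coded exactly as written in the text.

```python
import numpy as np

def squared_mahalanobis(Z, location, scatter):
    """Squared Mahalanobis distance m^2 of each row of Z."""
    diff = np.asarray(Z, dtype=float) - location
    return np.einsum("ij,jk,ik->i", diff, np.linalg.inv(scatter), diff)

def weighted_poisson_fit(X, y, w, n_iter=100, tol=1e-10):
    """Maximize sum_i w_i [y_i log(mu_i) - mu_i - log(y_i!)] with a log link."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    # Hypothetical starting value: least squares on log(y + 0.5).
    beta = np.linalg.lstsq(X, np.log(y + 0.5), rcond=None)[0]
    for _ in range(n_iter):
        eta = X @ beta
        mu = np.exp(eta)
        z = eta + (y - mu) / mu           # working response
        XtW = X.T * (w * mu)              # X^T diag(w_i * mu_i)
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return beta

# Toy data (illustrative only): intercept plus two standard normal predictors.
rng = np.random.default_rng(0)
n = 300
Z = rng.standard_normal((n, 2))
X = np.column_stack([np.ones(n), Z])
y = rng.poisson(np.exp(X @ np.array([0.5, 0.3, -0.2])))

# Assumption: classical mean/covariance as the initial location and scatter.
m2 = squared_mahalanobis(Z, Z.mean(axis=0), np.cov(Z, rowvar=False))
w1 = 0.8 * m2 + 0.2                       # first weight, as stated in the text
w2 = 0.8 * m2 ** 2 + 0.2                  # second weight, as stated in the text
beta_w1 = weighted_poisson_fit(X, y, w1)
beta_w2 = weighted_poisson_fit(X, y, w2)
print(beta_w1, beta_w2)
```

Setting all weights to 1 recovers the ordinary maximum likelihood fit, so the same routine can serve as the unweighted baseline.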

4. Evaluation of the Robust Methods

To test the performance of the above estimators, we conducted a Monte Carlo simulation study comparing the new methods with the maximum likelihood estimator (MLE), the Mallows-type estimator of [5], and the CUBIF estimator of [4] .

4.1. Monte Carlo Simulation Study

In this subsection, we examine the performance of the new robust methods (WMLE₁, WMLE₂) and compare them with the maximum likelihood estimator (MLE), the Mallows-type estimator (Mallows) of [5], and the conditionally unbiased bounded influence estimator (CUBIF) of [4] . The simulation study includes three models: the first is a clean model, the second has 5% of the data contaminated, and the third has 10% of the data contaminated. In all three models, we generated the explanatory variables $x_{i}$ from the standard normal distribution $N_{p} (0,1)$ , and the vector of parameters is $β = {(1,2,2)}^{T}$ , with four sample sizes $n = (100,200,300,400)$ ; these values were chosen to represent moderate and large samples.

The response variables $y_{i}$ are generated from a Poisson distribution $P (μ_{i})$ with $μ_{i} = h (η_{i})$ , where $η_{i} = x_{i}^{T} β$ and $h (η) = e^{η}$ is the inverse of the log link in (3). The outliers are drawn from a Poisson distribution with mean $3 IQR (e^{X β})$ , where IQR is the interquartile range. To examine the performance of these estimators, we compute the bias and mean squared error (MSE) for the three models. For all scenarios, we run 1000 replications. A good estimator is one with small bias and MSE. Therefore, we compute the bias and MSE for each parameter as follows:

$Bias = | \frac{1}{1000} \sum_{i = 1}^{1000} {\hat{β}}_{i} - β |,$

and

$MSE = \frac{1}{1000} \sum_{i = 1}^{1000} {| {\hat{β}}_{i} - β |}^{2} .$
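The data-generating design and the two criteria above can be sketched as follows. Several details are assumptions, since the text does not specify them: the contaminated observations are chosen at random and their responses are replaced, the three covariates are used without an intercept, and the function names and seed are hypothetical.

```python
import numpy as np

def simulate_poisson_data(n, beta, contamination, rng):
    """Generate (X, y) with a fraction `contamination` of outlying responses."""
    beta = np.asarray(beta, dtype=float)
    X = rng.standard_normal((n, beta.size))   # x_i ~ N_p(0, I)
    mu = np.exp(X @ beta)                     # log link
    y = rng.poisson(mu)
    n_out = int(round(contamination * n))
    # Assumption: outlier positions are drawn at random without replacement.
    outlier_idx = rng.choice(n, size=n_out, replace=False)
    q75, q25 = np.percentile(mu, [75, 25])
    # Outliers: Poisson with mean 3 * IQR(exp(X beta)).
    y[outlier_idx] = rng.poisson(3.0 * (q75 - q25), size=n_out)
    return X, y, outlier_idx

def bias_and_mse(estimates, beta_true):
    """Per-parameter bias and MSE over replications (rows of `estimates`)."""
    estimates = np.asarray(estimates, dtype=float)
    beta_true = np.asarray(beta_true, dtype=float)
    bias = np.abs(estimates.mean(axis=0) - beta_true)
    mse = np.mean((estimates - beta_true) ** 2, axis=0)
    return bias, mse

rng = np.random.default_rng(2023)             # hypothetical seed
beta_true = np.array([1.0, 2.0, 2.0])
X, y, outlier_idx = simulate_poisson_data(100, beta_true, 0.05, rng)
print(X.shape, y.shape, outlier_idx.size)
```

In a full study, each replication would fit every estimator to one simulated data set, stack the resulting coefficient vectors row by row, and pass the stacked array to `bias_and_mse`.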

4.2. Results from the Monte Carlo Simulation Study

Table 1 shows that, for the clean model, the bias and MSE of the maximum likelihood estimator (MLE), Mallows, and CUBIF are smaller than those of the new weighted estimators (WMLE₁, WMLE₂). We conclude that the WMLE₁ and WMLE₂ estimators perform worse than the others in the clean model. However, when 5% of the data are contaminated in the second scenario (Table 2) and 10% of the data are contaminated in the third scenario (Table 3), the weighted maximum likelihood estimators (WMLE₁ and WMLE₂) have lower bias and MSE than the other methods; that is, the new methods perform better than the other estimators under contamination.

Table 1. Bias and MSE of estimators for clean model.

Table 2. Bias and MSE of estimators when 5% of data are contaminated.

Table 3. Bias and MSE of estimators when 10% of data are contaminated.

5. Conclusion

In this paper, we suggested new robust estimators for Poisson regression: the weighted maximum likelihood estimators (WMLE₁ and WMLE₂). To examine the performance of the suggested estimators, we conducted a Monte Carlo simulation study comparing them with the classical maximum likelihood estimator (MLE), Mallows, and CUBIF. The simulation results in Table 1 (clean model) show that the maximum likelihood estimator, CUBIF, and Mallows perform close to each other, while the proposed estimators perform worse than the other estimators. In the contaminated models (Table 2 and Table 3), the new weighted maximum likelihood estimators (WMLE₁ and WMLE₂) perform better than the other estimators. Extending the estimators proposed in this paper to other generalized linear models would be an interesting subject for future work.

Acknowledgements

We thank the editor and the referee for their comments.

Conflicts of Interest

The authors declare no conflicts of interest.

References

[1]	McCullagh, P. and Nelder, J.A. (1989) Generalized Linear Models. Chapman and Hall, London. https://doi.org/10.1007/978-1-4899-3242-6
[2]	Agresti, A. (2001) Categorical Data Analysis. Wiley, New York. https://doi.org/10.1002/0471249688
[3]	Cameron, A. and Trivedi, P. (2013) Regression Analysis of Count Data. Cambridge University Press, Cambridge. https://doi.org/10.1017/CBO9781139013567
[4]	Kunsch, H.R., Stefanski, L.A. and Carroll, R.J. (1989) Conditionally Unbiased Bounded-Influence Estimation in General Regression Models, with Applications to Generalized Linear Models. Journal of the American Statistical Association, 84, 460-466. https://doi.org/10.1080/01621459.1989.10478791
[5]	Carroll, R.J. and Pederson, S. (1993) On Robustness in the Logistic Regression Model. Journal of the Royal Statistical Society. Series B (Methodological), 55, 693-706. https://doi.org/10.1111/j.2517-6161.1993.tb01934.x
[6]	Bianco, A.M. and Yohai, V.J. (1996) Robust Estimation in the Logistic Regression Model. In: Rieder, H., Ed., Robust Statistics, Data Analysis, and Computer Intensive Methods, Springer, Berlin, 17-34. https://doi.org/10.1007/978-1-4612-2380-1_2
[7]	Croux, C. and Haesbroeck, G. (2003) Implementing the Bianco and Yohai Estimator for Logistic Regression. Computational Statistics and Data Analysis Journal, 44, 273-295. https://doi.org/10.1016/S0167-9473(03)00042-2
[8]	Cantoni, E. and Ronchetti, E. (2001) Robust Inference for Generalized Linear Models. Journal of the American Statistical Association, 96, 1022-1030. https://doi.org/10.1198/016214501753209004
[9]	Wedderburn, R.W.M. (1974) Quasi-Likelihood Functions, Generalized Linear Models, and the Gauss-Newton Method. Biometrika, 61, 439-447. https://doi.org/10.1093/biomet/61.3.439
[10]	Hosseinian, S. and Morgenthaler, S. (2011) Weighted Maximum Likelihood Estimates in Poisson Regression. International Conference on Robust Statistics, Antalya, 2008.
[11]	Croux, C., Flandre, C. and Haesbroeck, G. (2002) The Breakdown Behavior of the Maximum Likelihood Estimator in the Logistic Regression Model. Statistics and Probability Letters, 60, 377-386. https://doi.org/10.1016/S0167-7152(02)00292-4
[12]	Copas, J.B. (1988) Binary Regression Models for Contaminated Data. Journal of the Royal Statistical Society B, 50, 225-265. https://doi.org/10.1111/j.2517-6161.1988.tb01723.x
[13]	Stefanski, L.A., Carroll, R.J. and Ruppert, D. (1986) Optimally Bounded Score Functions for Generalized Linear Models with Applications to Logistic Regression. Biometrika, 73, 413-424. https://doi.org/10.1093/biomet/73.2.413
[14]	Krasker, W.S. and Welsch, R.E. (1982) Efficient Bounded-Influence Regression Estimation. Journal of the American Statistical Association, 77, 595-604. https://doi.org/10.1080/01621459.1982.10477855
[15]	Ahmed, I.A.I. and Cheng, W.H. (2020) The Performance of Robust Methods in Logistic Regression Model. Open Journal of Statistics, 10, 127-138. https://doi.org/10.4236/ojs.2020.101010
[16]	Abonazel, M. and Saber, O. (2020) A Comparative Study of Robust Estimators for Poisson Regression Model with Outliers. Journal of Statistics Applications and Probability, 9, 279-286. https://doi.org/10.18576/jsap/090208
[17]	Simeckova, M. (2005) Maximum Weighted Likelihood Estimator in Logistic Regression. Charles University, Faculty of Mathematics and Physics, Prague, 144-148.
[18]	Dobson, A. and Barnett, A. (2008) An Introduction to Generalized Linear Models. Chapman and Hall. https://doi.org/10.1201/9780367807849
