Endogeneity Effect on AR (1) Models in Small Samples


This study examines the effect of endogeneity on first-order autoregressive, AR(1), linear models in small samples, using the Ordinary Least Squares (OLS), Two-Stage Least Squares (2SLS), and Generalized Method of Moments (GMM) estimators. It is based on a sensitivity analysis of sample size and specification error in the choice of estimator for the linear regression model, carried out through Monte Carlo simulation and an application to real-life data. The simulation indicates that the 2SLS and GMM estimators show the smallest biases as the sample size varies over n = 10, 25, 50, and 100. GMM performs best at n = 10 across all levels of autocorrelation (ρ) and of the significance level of the correlation (α), with 10,000 replications. In the real-life data, OLS and 2SLS exhibit stronger endogeneity characteristics. In the empirical analysis, based on the MSE criterion, GMM is the best estimator for handling external shock factors to inflation in a linear model with endogeneity. When both endogeneity and autocorrelation are present in a linear AR(1) model, the GMM estimator gives better results in small samples than 2SLS and OLS.

Share and Cite:

Kanyir, Y., Olaomi, J. and Luguterah, A. (2022) Endogeneity Effect on AR (1) Models in Small Samples. Modern Economy, 13, 1194-1205. doi: 10.4236/me.2022.139063.

1. Introduction

Much research has provided evidence for large-sample theory in assessing model frameworks. What is still lacking is how to obtain a complete sample for the regression model, an issue that has not yet been dealt with appropriately (Kramer, 1998). Reliance on asymptotic theory leads to problems of bias, and sometimes of inferential accuracy, when the sample size is small (Philips, 1982; Olaomi & Shangodoyin, 2010). Many statisticians are concerned about which statistics are least subject to sampling fluctuation when the model assumptions appear to fail. Wooldridge (2002) identified the sources of endogeneity bias in models as measurement error, simultaneity, and omitted variables.

Cochrane and Orcutt (1949) found that the error terms in most economic relations were highly positively autocorrelated. Rao and Griliches (1969) found that there is more to gain from addressing the assumptions about the predictor variables and disturbance terms in a linear regression than from using the model in its original form. Endogeneity refers to a variable or change that arises from within a system or model. A shift in consumers' food choices from high-cholesterol to low-cholesterol products, for example, is an endogenous change that may affect any meaningful matching model (Bedri et al., 2010).

Kennedy (2008) stated that four issues can introduce endogeneity into OLS regression models: errors in variables (measurement error), autoregression, omitted variables, and simultaneous causality. In all of these scenarios, OLS regression often reports biased coefficients. Instead of estimating the true relationship between the independent variable and the dependent variable, OLS mistakenly absorbs the correlation between the independent variable and the error term into the estimate of the independent variable's coefficient.
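The mechanism described above can be made explicit with a short worked equation. This is a sketch for the simple regression $Y_t = \alpha + \beta X_t + U_t$ only; the notation $\hat{\beta}_{OLS}$ and the probability limit are standard textbook conventions and are not taken from the paper's own derivations.

```latex
\hat{\beta}_{OLS}
  = \frac{\sum_t (X_t - \bar{X})(Y_t - \bar{Y})}{\sum_t (X_t - \bar{X})^2}
  = \beta + \frac{\sum_t (X_t - \bar{X})\,U_t}{\sum_t (X_t - \bar{X})^2},
\qquad
\operatorname{plim}\,\hat{\beta}_{OLS}
  = \beta + \frac{\operatorname{Cov}(X_t, U_t)}{\operatorname{Var}(X_t)} .
```

The second term vanishes only when $\operatorname{Cov}(X_t, U_t) = 0$. Under endogeneity it does not, so the OLS slope picks up the regressor-error correlation no matter how large the sample is.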

Violations of the assumptions about the predictors and the disturbance term have important consequences for the OLS model. For instance, significance tests on the parameters may give wrong results. The coefficient estimates are also less efficient than they would be if the autocorrelation were taken into account when estimating the parameters of the regression model. Lastly, because of the nature of the predictor variables, the violated structure often carries information that could be used when predicting future values in the linear model.

Clougherty et al. (2016) note that endogeneity bias makes coefficient estimates from standard regressions difficult to interpret, because the estimates are inconsistent in the sense that they do not converge to the true coefficient values. Several studies have used Monte Carlo designs to examine estimators of linear models when the least squares assumptions of independent error terms and zero correlation between the regressors and the error terms are violated (Olaomi & Shangodoyin, 2010). Other studies emphasize that endogeneity, however small, can lead to biased and inconsistent results, which undermines causal inference (Semadeni et al., 2014).

“Little experience is sufficient to show that the traditional machinery of statistical processes is wholly unsuited to the needs of practical research. Not only does it take cannon to shoot a sparrow, but it misses the sparrow. The elaborate mechanism built on the theory of infinitely large samples is not accurate enough for simple laboratory data. Only by systematically tackling small sample problems on their merits does it seem possible to apply the accurate test to practical data” (Fisher, 1925).

It is known that in an autocorrelated but non-endogenized model, the Feasible Generalized Least Squares (FGLS) estimator is more efficient than the OLS estimator. Similarly, the Two-Stage Least Squares (2SLS) estimator performs better than other estimators when endogeneity is present in the model and autocorrelation is absent (Olaomi & Iyaniwura, 2006; Olaomi, 2008).

Violations of the assumptions underlying the independence of the regressors and the disturbance terms in most linear models give rise to the problems of autocorrelation and multicollinearity. Both affect the estimates and, in turn, the predictions (Kayode et al., 2012).

Reeb et al. (2012) held that much work remains to be done to improve the understanding of endogeneity in models and to give researchers methods for resolving this crucial methodological problem. The large-sample properties of estimators can usually be established, whereas their small-sample properties typically remain a problem (Adedayo, 2008). In some situations, one estimation procedure may be preferred over the others because it gives more precise parameter estimates (Kayode, 2007).

Blundell and Bond (1998) proposed another approach to estimation under endogeneity in a linear model, based on the Generalized Method of Moments (GMM), which aims to exploit all the conditions linking the dependent variables and the disturbance term.

Nicola and Matthias (2017) did extensive work on the endogeneity of political preferences, examining whether preferences are affected by support for democracy over a number of years. They studied within-country changes in individuals' interest in democracy and the effect of these changes on the preference for it.

Fair (1973, 1984) developed several estimation methods for such models. What remains unaddressed are violations of the least squares assumptions, which can make these estimators unreliable. They therefore need further study with regard to sample size, specification error, and significance level, and we do this by comparing our results with those of other estimators in the literature.

Accordingly, a well-designed study must be clear about how and why variables influence one another, and must specify the logic and direction of the relationship (Larcker & Rusticus, 2007). This paper therefore presents results on the effect of endogeneity on AR(1) models in small samples, using the existing OLS, 2SLS, and GMM estimators. The results are based on a sensitivity analysis of sample size and specification error in the choice of estimator for the linear regression model, carried out by Monte Carlo simulation when the least squares assumptions of no autocorrelation and zero correlation between the regressor and the error term are violated.

These estimators have established asymptotic (large-sample) properties. This study seeks to establish their behavior in a small-sample environment and to identify the estimator that copes best when the least squares assumptions of no autocorrelation and zero correlation between the regressor and the error term are violated, that is, when both endogeneity and autocorrelation are present in the model. Also of interest are the effects of the correlation between the regressor and the error term, the significance level, and increasing autocorrelation in the model. The study therefore also included large samples in the design, and the simulation confirmed the asymptotic behavior known from the literature.

2. Model

We consider the following linear and nonlinear regression models:

$$Y_t = \alpha + \beta X_1 + U_t \tag{1}$$

$$Y_t = \alpha + \beta X_1 + \gamma X_2 + U_t \tag{2}$$

$$Y_t = \alpha\, l^{x\beta} + U_t \tag{3}$$

$$Y_t = \alpha\, l^{\beta x_1} + \gamma\, l^{x_2} + U_t \tag{4}$$

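$$U_t = \rho U_{t-1} + \varepsilon_t, \quad X_t = \lambda X_{t-1} + v_t, \quad \varepsilon_t \sim N(0, \sigma^2), \quad U_t \sim AR(1)$$

$$E(X_i, U) \neq 0, \quad E(U_i, U_j) \neq 0, \quad E(\varepsilon_1, \varepsilon_2) \neq 0, \quad |\rho|, |\lambda| < 1, \quad (\alpha, \beta) = (1, 1), \quad r = \operatorname{Cor}(U_t, X_t)$$

$$U_t \sim N\!\left(0, \frac{\sigma^2}{1 - \rho^2}\right), \quad X_t \sim N\!\left(0, \frac{\sigma^2}{1 - \lambda^2}\right),$$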

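Here $Y_t$ is the endogenous variable, $U_t$ and $X_t$ are first-order autoregressive variables, $\varepsilon_t$ denotes white noise processes, and $\rho$ and $\lambda$ are the stationarity parameters. The coefficients $\alpha$ and $\beta$ are assumed fixed, usually at unity. The symbol $\alpha$ is also used for the significance level of the correlation when $E(\varepsilon_1, \varepsilon_2) \neq 0$, and $\rho$ for the autocorrelation level.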
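The variance of the stationary AR(1) process stated above follows from a one-line argument, sketched here for $U_t$ under the usual assumption that $\varepsilon_t$ is independent of $U_{t-1}$. The same steps apply to $X_t$ with $\lambda$ in place of $\rho$.

```latex
\operatorname{Var}(U_t)
  = \rho^2 \operatorname{Var}(U_{t-1}) + \operatorname{Var}(\varepsilon_t)
  = \rho^2 \operatorname{Var}(U_{t-1}) + \sigma^2 ,
\qquad
\operatorname{Var}(U_t) = \operatorname{Var}(U_{t-1})
  \;\Longrightarrow\;
\operatorname{Var}(U_t) = \frac{\sigma^2}{1 - \rho^2}, \quad |\rho| < 1 .
```

For example, with $\sigma^2 = 1$ and $\rho = 0.8$ the stationary variance is $1 / 0.36 \approx 2.78$, so stronger autocorrelation inflates the variance of the disturbance.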

2.1. Experimental Structure

The study investigated the efficiency of the estimators across autocorrelation levels ($\rho$) and significance levels ($\alpha$) of the correlation between $X_t$ and $U_t$, and the effects on the endogenous variable $Y_t$, using the Mean Square Error (MSE) and bias criteria together. We carried out a sensitivity analysis of GMM, 2SLS, and OLS in estimating the parameters $\alpha$ and $\beta$ when $E(x_i, u_j) \neq 0$, $E(u_i, u_j) \neq 0$, and $E(\varepsilon_i, \varepsilon_j) \neq 0$ hold simultaneously, so that the estimators' assumptions are violated, by means of a Monte Carlo experiment.

Using Model (3) above, a start-up value $U_0$ was generated for each sample size by drawing a random value $\varepsilon_0$ from $N(0, 1)$ and dividing it by $\sqrt{1 - \rho^2}$. Successive values $\varepsilon_t$ were then drawn from $N(0, 1)$ and used to compute the autoregressive series $U_t$; $X_t$ and $Z_t$ were generated as AR(1) series in the same way. In all of the Monte Carlo experiments involving endogeneity, $Z_t$ was drawn once and then held constant throughout the replications (Nelson & Startz, 1990).
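The start-up and recursion just described can be sketched in a few lines of Python. This is a minimal illustration rather than the study's own code: the function name `simulate_ar1` and its parameters are hypothetical, and only the standard library is used.

```python
import math
import random


def simulate_ar1(n, rho, sigma=1.0, rng=None):
    """Return n observations of a stationary AR(1) series u_t = rho * u_{t-1} + eps_t."""
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must satisfy |rho| < 1 for stationarity")
    rng = rng or random.Random()
    # Start-up value: a N(0, sigma^2) draw divided by sqrt(1 - rho^2),
    # so that u_0 already has the stationary variance sigma^2 / (1 - rho^2).
    u = rng.gauss(0.0, sigma) / math.sqrt(1.0 - rho ** 2)
    series = []
    for _ in range(n):
        # Each new value is rho times the previous one plus fresh white noise.
        u = rho * u + rng.gauss(0.0, sigma)
        series.append(u)
    return series


if __name__ == "__main__":
    # One small-sample draw with strong autocorrelation (illustrative settings).
    print(simulate_ar1(10, 0.8, rng=random.Random(42)))
```

Dividing the first draw by $\sqrt{1 - \rho^2}$ avoids a burn-in period: the series is stationary from the first observation, which matters when the sample has only 10 points.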

The study used a simulation approach because, with $\operatorname{Cov}(X_t, U_t) \neq 0$, an analytical treatment of the small-sample behavior is close to intractable; the sensitivity analysis was therefore carried out through a Monte Carlo design.
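To illustrate why simulation is informative here, the sketch below runs a deliberately simplified Monte Carlo experiment on the linear Model (1). It is not the paper's design: the draws are independent rather than AR(1), the sample size is moderate, and the names `one_replication` and `average_bias`, as well as the coefficients 0.8 and 0.6, are assumptions chosen so that $\operatorname{Cov}(X, U) = 0.8$ and $\operatorname{Var}(X) = 2$.

```python
import random


def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)


def one_replication(n, beta, rng):
    """Draw one sample with an endogenous regressor; return (OLS, IV) slope estimates."""
    z = [rng.gauss(0, 1) for _ in range(n)]              # instrument, independent of u
    v = [rng.gauss(0, 1) for _ in range(n)]              # shock shared by x and u
    u = [0.8 * vi + 0.6 * rng.gauss(0, 1) for vi in v]   # error term, Cov(x, u) = 0.8
    x = [zi + vi for zi, vi in zip(z, v)]                # regressor, Var(x) = 2
    y = [1.0 + beta * xi + ui for xi, ui in zip(x, u)]
    b_ols = cov(x, y) / cov(x, x)                        # ignores the endogeneity
    b_iv = cov(z, y) / cov(z, x)                         # simple instrumental-variable slope
    return b_ols, b_iv


def average_bias(n=200, beta=1.0, reps=300, seed=0):
    """Average bias of the OLS and IV slopes over `reps` replications."""
    rng = random.Random(seed)
    draws = [one_replication(n, beta, rng) for _ in range(reps)]
    ols_bias = sum(d[0] for d in draws) / reps - beta
    iv_bias = sum(d[1] for d in draws) / reps - beta
    return ols_bias, iv_bias


if __name__ == "__main__":
    print(average_bias())
```

In this setup the OLS bias should settle near $\operatorname{Cov}(X, U) / \operatorname{Var}(X) = 0.4$, while the instrumental-variable bias stays close to zero. Shrinking `n` toward the paper's sample sizes makes the instrumental-variable estimates much noisier, which is the regime the study investigates.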

In this sensitivity investigation, the degree of autocorrelation (ρ) was set to 0.4, 0.8, and 0.9, and the sample size to 10, 15, 25, 50, and 100, with 10,000 replications for each setting. The performance of the estimators was assessed using the bias and MSE criteria. The design comprised 27 scenarios, spread across the sample sizes mentioned above, for the data generation process.
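The two accuracy criteria can be computed from the replicated estimates as sketched below. The function name `bias_and_mse` is hypothetical, and the variance divides by the number of replications (an assumption) so that the decomposition MSE = variance + bias² holds exactly.

```python
def bias_and_mse(estimates, true_value):
    """Return (bias, variance, mse) for a list of replicated estimates."""
    reps = len(estimates)
    mean_est = sum(estimates) / reps
    bias = mean_est - true_value
    # Divide by reps (not reps - 1) so that mse == variance + bias ** 2 exactly.
    variance = sum((e - mean_est) ** 2 for e in estimates) / reps
    mse = sum((e - true_value) ** 2 for e in estimates) / reps
    return bias, variance, mse


if __name__ == "__main__":
    # Four illustrative slope estimates of a parameter whose true value is 1.
    print(bias_and_mse([1.2, 0.9, 1.4, 1.1], 1.0))
```

For the four illustrative estimates the bias is 0.15, the variance is 0.0325, and the MSE is 0.055, which equals 0.0325 + 0.15².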

2.2. Data Generation

The data generated were $U_t$, $Z_t$, and $X_t$, each as an AR(1) series from the model above. In the replication process, $Z$ is drawn once and held constant, with $\operatorname{Cor}(Z, X) > 0.8$. Holding $Z$ constant ensures that the estimators are not driven by weak or varying instrumental variables.

The absolute values of $\operatorname{Cor}(U_t, X_t)$ and $\operatorname{Cor}(U_t, Z_t)$ were computed and tested at the 1%, 2%, and 5% significance levels. After each simulated draw, the series $U_t$ was selected whenever $\operatorname{Cor}(U_t, X_t)$ was significant and $\operatorname{Cor}(U_t, Z_t)$ was not; otherwise it was discarded. This selection procedure was replicated 10,000 times for each combination of ρ, α, and N. After selection, $Y_t$ was computed as the endogenous variable from each selected $U_t$ and $X_t$ to form the model.
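The selection rule can be sketched as follows. The paper does not say which significance test was applied to the correlations, so this example assumes a two-sided test based on the Fisher z approximation; the names `pearson_r`, `is_significant`, and `keep_draw` are hypothetical.

```python
import math
from statistics import NormalDist


def pearson_r(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)


def is_significant(r, n, level):
    """Two-sided test of zero correlation via the Fisher z approximation (needs n > 3)."""
    if abs(r) >= 1.0:
        return True  # perfect correlation; atanh would be undefined
    z = math.atanh(r) * math.sqrt(n - 3)
    critical = NormalDist().inv_cdf(1.0 - level / 2.0)
    return abs(z) > critical


def keep_draw(u, x, z, level=0.05):
    """Keep a draw only if Cor(u, x) is significant and Cor(u, z) is not."""
    n = len(u)
    return (is_significant(pearson_r(u, x), n, level)
            and not is_significant(pearson_r(u, z), n, level))
```

A draw that passes `keep_draw` has an error term correlated with the regressor (endogeneity) but not with $Z_t$, which is what allows $Z_t$ to serve as an instrument. With n = 10 the normal approximation is rough, so an exact t-based test may be preferable in practice.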

3. Best Estimator of the Model Results

Monte Carlo simulation results for Model (3) (Table 1 and Table 2)

$$y_t = \alpha\, l^{x\beta} + U_t$$

Table 1. Average bias (estimations).

Table 2. Average MSE = Variance + (Bias)².

3.1. Best Estimator of the Model Results

Table 3 summarizes the best estimator in each scenario according to the bias and MSE criteria in small samples for Model (3), an intrinsically nonlinear model with one variable. The sensitivity analysis shows that the GMM and 2SLS estimators have the smallest biases, while OLS has substantial bias. Under the MSE criterion, GMM and 2SLS again have the smallest values, and OLS has the worst performance overall. GMM performs best at sample size n = 10 across all levels of autocorrelation (ρ) and significance (α), while 2SLS does somewhat better at n = 100. For n = 10, 25, and 50, GMM gives the best outcomes at all levels of ρ and α, and 2SLS performs somewhat better only at some levels of ρ and α.

Table 3. Best estimator of Model (3) for each scenario.

Our results indicate that OLS is a biased estimator in this setting, which is consistent with findings for models of similar character in other econometric sensitivity analyses. GMM and 2SLS appear unbiased in the analysis and behave as consistent estimators, in line with Koutsoyannis (2003) and Fair (1984), and they can be relied on when researchers must work with small samples in studies that use an intrinsically nonlinear model of the kind simulated here. Ranked by the bias criterion, the estimators are ordered GMM, 2SLS, OLS, and the MSE criterion gives the same ranking. For n = 10, 15, 25, and 50, GMM gives the best outcomes under the bias criterion at all levels of ρ and α, while 2SLS does well at n = 100 with ρ = 0.9. Under the MSE criterion, GMM likewise performs best at all levels of ρ and α for n = 10, 15, 25, and 50.

3.2. Real-Life Data Application

In this analysis, three different datasets, drawn from the World Bank, the Bank of Ghana, the Ministry of Finance, and the Ghana Statistical Service, were used. Each dataset is a small sample of 20 yearly observations from 1998 to 2017. The data comprise the exchange rate (monthly average, GHC/USD) from the Bank of Ghana and the Ministry of Finance, the international oil price (in $) from the World Bank, inflation from the Ghana Statistical Service, and trade openness from the World Bank. The factors that contribute to changes in inflation over the period 1998 to 2017 are assessed for autocorrelation, endogeneity, and other econometric features.

3.3. Ordinary Least Square Parameter Estimations

The dataset is applied to Model (4) to investigate the correlational characteristics of external shock factors (exchange rate, oil price, and trade openness) on Ghana’s inflation. The model is

$$\text{Inf} = \varpi_0 + \varpi_1(\text{exch}) + \varpi_2(\text{oil}) + \varpi_3(\text{To}) \tag{4}$$

3.4. Endogeneity Test

The Wu-Hausman test in Table 4 was conducted to check for correlation between the independent variables and the error term in the model. The test returned a p-value greater than 0.05, which is taken here to indicate the presence of endogeneity in the model.
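For reference, one common regression-based form of the Wu-Hausman test is sketched below for a single suspect regressor $X_t$ with instrument $Z_t$. The symbols $\pi_0$, $\pi_1$, $\delta$, $V_t$, and $e_t$ are introduced here for illustration only, and this is not necessarily the exact variant behind Table 4.

```latex
\text{Stage 1:}\quad X_t = \pi_0 + \pi_1 Z_t + V_t
  \;\;\Rightarrow\;\; \hat{V}_t = X_t - \hat{\pi}_0 - \hat{\pi}_1 Z_t ,
\qquad
\text{Stage 2:}\quad Y_t = \alpha + \beta X_t + \delta \hat{V}_t + e_t ,
\qquad
H_0 : \delta = 0 .
```

Under $H_0$ the regressor is exogenous. Rejecting $H_0$, that is, finding a significant $\delta$, is evidence that $X_t$ is correlated with the error term.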

3.5. The 2SLS Estimates

The 2SLS estimation technique was used on the dataset based on the instrumental variable model

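$$\text{Inf} = \varpi_0 + \varpi_1(\text{exch}) + \varpi_2(\text{oil}) + \varpi_3(\text{To}) \;\big|\; \varpi_1(\text{exch}) + \varpi_2(\text{oil}) + \varpi_4(\text{Exp}) \tag{5}$$

where $\varpi_v = (\varpi_0, \varpi_1, \varpi_2, \varpi_3, \varpi_4)^T$ is a vector of parameters.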
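The two stages behind this notation can be sketched for the simplest case of one endogenous regressor and one instrument. This is a reduced illustration, not the multi-regressor model in (5); the function names `ols_fit` and `two_stage_least_squares` are hypothetical, and standard errors (which need a correction after the second stage) are not computed.

```python
def ols_fit(x, y):
    """Intercept and slope of a simple least-squares regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return my - slope * mx, slope


def two_stage_least_squares(y, x, z):
    """2SLS for one endogenous regressor x and one instrument z."""
    # Stage 1: project the endogenous regressor on the instrument.
    a0, a1 = ols_fit(z, x)
    x_hat = [a0 + a1 * zi for zi in z]
    # Stage 2: regress the outcome on the fitted values from stage 1.
    return ols_fit(x_hat, y)


if __name__ == "__main__":
    z = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
    x = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]
    y = [3.0, 5.0, 8.0, 9.0, 12.0, 14.0]
    print(two_stage_least_squares(y, x, z))
```

In this just-identified case the second-stage slope equals the ratio of the sample covariance of z and y to that of z and x, so the two-stage procedure and the simple instrumental-variable formula coincide.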

Table 5 shows the 2SLS estimates, which suggest that Ghana’s inflation increases (75.306) when the exchange rate (−2.518), oil prices (−0.234), and openness to trade (−51.169) decrease. The 2SLS results for the model are not satisfactory, as none of the parameters is significant. This is attributed to 2SLS having no power to control for endogeneity in the model.

3.6. The Generalised Method of Moment Estimator

Table 4. Endogeneity test results.

Table 5. 2SLS estimates of Model (5).

The GMM model controls for endogeneity by internally transforming the data and by including lagged values of the dependent variable. In this respect, GMM provides a better estimation method than OLS and 2SLS. The results in Table 6 show that all the factors are significant: exchange rate (0.019), oil price (0.002), and openness to trade (0.005). Only the oil price (p-value = 0.002) is negatively correlated with inflation (t-value = −3.043); the exchange rate and openness to trade show a positive correlation with inflation in the GMM estimation.
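For orientation, the generic linear GMM estimator can be written as below. This is the textbook form, not the specific specification estimated in Table 6; the instrument vector $Z_t$, regressor vector $X_t$, parameter vector $\theta$, and weighting matrix $W$ are notation introduced for this sketch.

```latex
E\!\left[ Z_t \left( Y_t - X_t^{\top}\theta \right) \right] = 0 ,
\qquad
g_n(\theta) = \frac{1}{n} \sum_{t=1}^{n} Z_t \left( Y_t - X_t^{\top}\theta \right) ,
\qquad
\hat{\theta}_{GMM} = \arg\min_{\theta}\; g_n(\theta)^{\top} W\, g_n(\theta)
  = \left( X^{\top} Z\, W\, Z^{\top} X \right)^{-1} X^{\top} Z\, W\, Z^{\top} Y .
```

Choosing $W = (Z^{\top} Z)^{-1}$ reproduces the 2SLS estimator, so 2SLS is a special case of GMM. A weighting matrix that accounts for autocorrelation in the moments is one way GMM can differ from 2SLS when the errors are AR(1).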

3.7. Selection of Best Estimator for the Dataset

All three estimators (OLS, 2SLS, and GMM) showed different econometric properties in small samples when applied to the dataset. After estimation, they were compared on the mean square error (MSE) criterion. The results in Table 7 show that GMM is a better estimator than OLS and 2SLS for a small sample with endogeneity and autocorrelation, when estimating the parameters of a model of inflation that depends on shock factors such as the exchange rate, oil prices, and openness to trade. A smaller MSE indicates a closer fit to the data and a better estimator, hence the GMM estimator is the most suitable.

Table 6. GMM estimates of Model (4).

Table 7. Selection of best estimator for the dataset.

4. Conclusion

In this sensitivity analysis of the effect of endogeneity on a first-order autoregressive model in small samples, we used an intrinsically nonlinear regression model with one variable to determine the best estimator among OLS, 2SLS, and GMM, and the simulation produced the expected results. When endogeneity and autocorrelation are both present in a nonlinear autoregressive model of order 1, the GMM and 2SLS estimators give better results in small samples than the OLS estimator. Furthermore, GMM also gives better results than 2SLS and OLS when the sample size is n = 100 across all specifications. When autocorrelation increases and the sample size is small, efficiency falls for 2SLS and OLS, but not for GMM, at all levels of ρ and α. Sample size has been a persistent concern in applied empirical studies; the simulation suggests that such researchers can use the GMM and 2SLS estimators as a solution in settings like Model (3), an intrinsically nonlinear regression model, under the conditions studied. The analysis covered the effect of the error term, the extent of correlation in Model (3), and specification error when handling endogeneity with minimum bias, and the GMM estimator gave the best outcome at all levels. The ranking of the estimators from the analysis is GMM, 2SLS, OLS. In the dataset analyzed, OLS and 2SLS exhibit stronger endogeneity characteristics. The empirical analysis puts GMM as the best estimator for handling and controlling endogeneity in the effect of external shock factors on inflation and, by extension, in small samples whenever endogeneity is present in the model.

The study has two limitations. It is difficult to determine which sample size should count as the smallest, and the 2SLS estimator has no built-in mechanism for internally transforming the data when endogeneity is detected in the dataset.



Conflicts of Interest

The authors declare no conflicts of interest regarding the publication of this paper.


[1] Adedayo, A. A. (2008). Comparative Performance of the Limited Information Techniques in a Two-Equation Structural Model. European Journal of Scientific Research, 20, 197-205.
[2] Bedri, K., Onur, T., & Selahattin, T. (2010). A Direct Test of the Endogeneity of Money. Turkish Economic Association Discussion, 29, 577-585.
[3] Blundell, R., & Bond, S. (1998). Initial Conditions and Moment Restrictions in Dynamic Panel Data Models. Journal of Econometrics, 87, 115-143.
[4] Clougherty, J. A., Duso, T., & Muck, J. (2016). Correcting for Self-Selection Based Endogeneity in Management Research: Review, Recommendations and Simulations. Organizational Research Methods, 19, 286-347.
[5] Cochrane, D., & Orcutt, G. H. (1949). Application of Least Square to Relationship Containing Autocorrelated Error Terms. Journal of the American Statistical Association, 44, 32-61.
[6] Fair, R. C. (1973). A Comparison of Alternative Estimators of Macroeconomic Models. International Economic Review, 14, 261-277.
[7] Fair, R. C. (1984). Specification, Estimation, and Analysis of Macro Econometric Models (pp. 210-214). Harvard University Press.
[8] Fisher, R. (1925). Theory of Statistical Estimation. Proceedings of the Cambridge Philosophical Society, 22, 700-725.
[9] Kayode, A. (2007). Performance of Some Estimators of a Linear Model with Autocorrelated Error Terms in the Presence of Multicollinearity. Research Journal of Applied Science, 2, 536-543.
[10] Kayode, A., Alao, R. F., & Femi, J. A. (2012). Effect of Multicollinearity and Autocorrelation on Predictive Ability of Some Estimators of Linear Regression Model. Mathematical Theory and Modeling, 2, 41-52.
[11] Kennedy, P. (2008). A Guide to Econometrics (2nd ed.). Blackwell Publishing Ltd.
[12] Koutsoyannis, A. (2003). Theory of Econometrics (2nd ed., pp. 200-223). Palgrave Publishers.
[13] Kramer, W. (1998). Asymptotic Equivalence of Ordinary Least Squares and Generalized Least Squares with Trending Regressors and Stationary Autoregressive Disturbances. In R. Galata, & H. Küchenhoff (Eds.), Econometrics in Theory and Practice (pp. 137-142). Physica-Verlag HD.
[14] Larcker, D., & Rusticus, T. (2007). Endogeneity and Empirical Accounting Research. European Accounting Review, 16, 207-215.
[15] Nelson, C. R., & Startz, R. (1990). The Distributions of Instrumental Variables Estimator and Its T-Ratio When the Instrument Is a Poor One. The Journal of Business, 63, 125-140.
[16] Nicola, F. S., & Matthias, S. (2017). On the Endogeneity of Political Preferences. Political Economy Research Reports, 347, 1145-1148.
[17] Olaomi, J. O. (2008). Performance of the Estimators of Linear Regression Model with Autocorrelated Error Terms Which Are Also Correlated with the Geometric Trended Regressor. European Journal of Scientific Research, 20, 187-196.
[18] Olaomi, J. O., & Iyaniwura, J. O. (2006). The Efficiency of GLS Estimators in a Linear Regression Model with Autocorrelated Error Terms Which Are Also Correlated with the Regressor. International Journal of Biological and Physical Sciences, No. 11, 129-133.
[19] Olaomi, J. O., & Shangodoyin, D. K. (2010). Comparative Study of Estimators in Autocorrelated-Endogenized Linear Model. Interstat, No. 6, 1-10.
[20] Philips, P. C. B. (1982). Small Sample Distribution Theory in Econometric Models of Simultaneous Equations. Discussion Paper, Cowles Foundation.
[21] Reeb, D., Sakakibara, M., & Mahmood, I. (2012). From the Editors: Endogeneity in International Business Research. Journal of International Business Studies, 43, 211-218.
[22] Semadeni, M., Withers, M. C., & Certo, S. T. (2014). The Perils of Endogeneity and Instrumental Variables in Strategy Research: Understanding through Simulations. Strategic Management Journal, 35, 1070-1079.
[23] Wooldridge, J. M. (2002). Econometric Analysis of Cross Section and Panel Data. The MIT Press.
[24] Rao, P., & Griliches, Z. (1969). Small Sample Properties of Several Two-Stage Regression Methods in the Context of Autocorrelated Errors. Journal of the American Statistical Association, 64, 251-272.

Copyright © 2022 by authors and Scientific Research Publishing Inc.


This work and the related PDF file are licensed under a Creative Commons Attribution 4.0 International License.