Comparison of Four Methods for Handling Missing Data in Longitudinal Data Analysis through a Simulation Study

Missing data occur frequently in longitudinal data analysis, and many methods have been proposed in the literature to handle them. Complete case (CC) analysis, mean substitution (MS), last observation carried forward (LOCF), and multiple imputation (MI) are the four methods most frequently used in practice. In real-world data analysis, the missing data can be missing completely at random (MCAR), missing at random (MAR), or missing not at random (MNAR), depending on the reasons the data are missing. In this paper, simulations under various scenarios (varying missing mechanisms, missing rates, and slope sizes) were conducted to evaluate the performance of the four methods, using bias, root mean squared error (RMSE), and 95% coverage probability as evaluation criteria. The results showed that LOCF has the largest bias and the poorest 95% coverage probability in most cases under both the MAR and MCAR missing mechanisms. Hence, LOCF should not be used in longitudinal data analysis. Under the MCAR mechanism, CC and MI perform equally well. Under the MAR mechanism, MI has the smallest bias, the smallest RMSE, and the best 95% coverage probability. Therefore, either CC or MI is appropriate under MCAR, while MI is the more reliable and better-grounded statistical method under MAR.
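To make the evaluation criteria concrete, the following is a minimal sketch of such a simulation, not the design used in the paper. Everything specific in it is an assumption made for illustration: four equally spaced visits, a random-intercept data model, monotone MCAR dropout, the mean of per-subject least-squares slopes as the estimator, a normal-approximation confidence interval, and all function and parameter names. Multiple imputation is left out to keep the example short.

```python
import numpy as np

def complete_cases(y):
    """CC: keep only subjects observed at every visit."""
    return y[~np.isnan(y).any(axis=1)]

def mean_substitution(y):
    """MS: replace each missing value with the observed mean of its visit."""
    y = y.copy()
    visit_means = np.nanmean(y, axis=0)
    rows, cols = np.where(np.isnan(y))
    y[rows, cols] = visit_means[cols]
    return y

def locf(y):
    """LOCF: carry each subject's last observed value forward in time."""
    y = y.copy()
    for t in range(1, y.shape[1]):
        gap = np.isnan(y[:, t])
        y[gap, t] = y[gap, t - 1]
    return y

def subject_slopes(y, times):
    """Least-squares slope of each row of y against the visit times."""
    t = times - times.mean()
    return y @ t / (t @ t)

def simulate(n_rep=200, n=100, slope=1.0, miss_rate=0.3, seed=0):
    """Return bias, RMSE and 95% coverage of the mean slope per method."""
    rng = np.random.default_rng(seed)
    times = np.arange(4.0)  # assumed: four equally spaced visits
    methods = {"CC": complete_cases, "MS": mean_substitution, "LOCF": locf}
    estimates = {name: [] for name in methods}
    hits = {name: 0 for name in methods}
    for _ in range(n_rep):
        # Assumed data model: random intercept + common slope + noise.
        y = (rng.normal(size=(n, 1)) + slope * times
             + rng.normal(size=(n, times.size)))
        # MCAR monotone dropout: independent of y, starting at visit 1, 2 or 3.
        drops = rng.random(n) < miss_rate
        first_missing = rng.integers(1, times.size, size=n)
        missing = drops[:, None] & (np.arange(times.size) >= first_missing[:, None])
        y_obs = np.where(missing, np.nan, y)
        for name, handle in methods.items():
            b = subject_slopes(handle(y_obs), times)
            est = b.mean()
            se = b.std(ddof=1) / np.sqrt(b.size)
            estimates[name].append(est)
            hits[name] += int(abs(est - slope) <= 1.96 * se)
    results = {}
    for name, values in estimates.items():
        err = np.asarray(values) - slope
        results[name] = {
            "bias": float(err.mean()),
            "rmse": float(np.sqrt((err ** 2).mean())),
            "coverage": hits[name] / n_rep,
        }
    return results

if __name__ == "__main__":
    for name, metrics in simulate().items():
        print(name, metrics)
```

Because LOCF holds a subject's trajectory flat after dropout, its mean slope is pulled toward zero whenever the true slope is non-zero, which is the kind of bias the criteria above are meant to expose. Extending the sketch to MAR would require making the dropout probability depend on previously observed values.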

Conflicts of Interest

The authors declare no conflicts of interest.

Cite this paper

Zhu, X. (2014) Comparison of Four Methods for Handing Missing Data in Longitudinal Data Analysis through a Simulation Study. Open Journal of Statistics, 4, 933-944. doi: 10.4236/ojs.2014.411088.
