TITLE:
Cross-Validation, Shrinkage and Variable Selection in Linear Regression Revisited
AUTHORS:
Hans C. van Houwelingen, Willi Sauerbrei
KEYWORDS:
Cross-Validation; LASSO; Shrinkage; Simulation Study; Variable Selection
JOURNAL NAME:
Open Journal of Statistics, Vol. 3 No. 2, April 24, 2013
ABSTRACT:
In deriving a regression model, analysts often have to use variable
selection, despite the problems introduced by data-dependent model
building. Resampling approaches are proposed to handle some of the critical
issues. To assess and compare several strategies, we conduct a
simulation study with 15 predictors and a complex correlation structure in
the linear regression model. Using sample sizes of 100 and 400 and two values of
the residual variance, corresponding to R² of 0.50 and 0.71, we consider 4 scenarios with varying amounts of information.
We also consider two examples with 24 and 13 predictors, respectively. We
discuss the value of cross-validation, shrinkage and backward
elimination (BE) with varying significance levels. We assess whether 2-step
approaches using global shrinkage or parameterwise shrinkage factors (PWSF) can improve the selected models, and we compare the results with
models derived with the LASSO procedure. Besides MSE, we use model
sparsity and further criteria for model assessment. The amount of information
in the data influences both the selected models and the comparison of the
procedures. None of the approaches was best in all scenarios. Backward
elimination with a suitably chosen significance level performed no worse than the LASSO, and
the BE models selected were much sparser, an important advantage for interpretation and transportability.
PWSF performed better than global shrinkage. Provided that the amount of
information is not too small, we conclude that BE followed by PWSF is a suitable
approach when variable selection is a key part of data analysis.
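The two-step strategy named above (BE first, then shrinkage of the selected model) can be illustrated with a minimal sketch. The abstract does not give estimation details, so the following is an assumption in the spirit of cross-validation-based shrinkage: the global factor is the slope from regressing y on the leave-one-out linear predictor, and the parameterwise factors are the coefficients from regressing y on the leave-one-out partial predictors x_ij * b_j(-i). The function names `backward_elimination` and `shrinkage_factors` are hypothetical, the p-values use a normal approximation to the t distribution for simplicity, and alpha = 0.157 is only an illustrative default (it roughly matches AIC for 1-df tests). This is not the authors' implementation.

```python
import math

import numpy as np


def ols(X, y):
    """Least-squares coefficients; the caller includes the intercept column."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


def backward_elimination(X, y, alpha=0.157):
    """Repeatedly drop the least significant predictor until all p-values < alpha.

    Returns the kept column indices of X (original order preserved).
    Assumption: Wald tests with a normal approximation instead of the t distribution.
    """
    n = len(y)
    keep = list(range(X.shape[1]))
    while keep:
        Xd = np.column_stack([np.ones(n), X[:, keep]])
        beta = ols(Xd, y)
        resid = y - Xd @ beta
        sigma2 = resid @ resid / (n - Xd.shape[1])
        se = np.sqrt(np.diag(sigma2 * np.linalg.inv(Xd.T @ Xd)))
        z = beta[1:] / se[1:]  # skip the intercept
        pvals = np.array([math.erfc(abs(v) / math.sqrt(2.0)) for v in z])
        worst = int(np.argmax(pvals))
        if pvals[worst] < alpha:
            break  # every remaining predictor is significant
        keep.pop(worst)
    return keep


def shrinkage_factors(X, y):
    """Leave-one-out global and parameterwise shrinkage factors for an OLS fit."""
    n, p = X.shape
    Xd = np.column_stack([np.ones(n), X])
    partial = np.empty((n, p))  # partial[i, j] = x_ij * b_j estimated without obs i
    for i in range(n):
        mask = np.arange(n) != i
        b = ols(Xd[mask], y[mask])
        partial[i] = X[i] * b[1:]
    lp = partial.sum(axis=1)  # cross-validated linear predictor (intercept excluded)
    global_c = ols(np.column_stack([np.ones(n), lp]), y)[1]
    pw_c = ols(np.column_stack([np.ones(n), partial]), y)[1:]
    return global_c, pw_c
```

In use, one would call `kept = backward_elimination(X, y)` and then `shrinkage_factors(X[:, kept], y)`; the shrunken coefficients are the refitted OLS slopes multiplied by either the single global factor or the corresponding parameterwise factors.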