Regularization and Estimation in Regression with Cluster Variables

PP. 814-825
DOI: 10.4236/ojs.2014.410077

ABSTRACT

This paper proposes Clustering Lasso, a new regularization method for linear regression. Clustering Lasso selects variables while preserving the correlation structure among them. It also encourages the selection of clusters of variables, so that variables that predict the response through the same mechanism are selected together in the regression model. A real microarray data example and simulation studies show that Clustering Lasso outperforms Lasso in prediction performance, particularly when the variables are collinear and/or when the number of predictors exceeds the number of observations. Clustering Lasso paths can be obtained using any established algorithm for computing Lasso solutions. An algorithm is proposed to construct the variable correlation structures and to compute Clustering Lasso paths efficiently.
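
The abstract notes that Clustering Lasso paths can be obtained with any established Lasso algorithm. As a point of reference only, one such algorithm, cyclic coordinate descent for the ordinary Lasso, might be sketched as follows. This is not the Clustering Lasso penalty or the paper's correlation-structure algorithm, neither of which is specified in the abstract. The function names, the 1/(2n) scaling of the squared-error term, and the toy data are assumptions made for illustration.

```python
def soft_threshold(z, t):
    """Soft-thresholding operator: sign(z) * max(|z| - t, 0)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_cd(X, y, lam, n_iter=200):
    """Ordinary Lasso by cyclic coordinate descent (illustrative sketch).

    Minimizes (1/(2n)) * ||y - X b||^2 + lam * ||b||_1.
    X is a list of n rows with p entries each; y is a list of n responses.
    No intercept is fitted, so center the data first if one is needed.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X beta; equals y while beta is all zeros
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # an all-zero column cannot enter the model
            # Inner product of column j with the partial residual, i.e. the
            # residual with column j's current contribution added back.
            rho = sum(X[i][j] * (resid[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new_bj
    return beta


def lasso_path(X, y, lambdas):
    """Coefficients for each penalty level, from the largest lam to the smallest."""
    return [(lam, lasso_cd(X, y, lam)) for lam in sorted(lambdas, reverse=True)]


# Hypothetical toy data: two orthogonal predictors, four observations.
X_demo = [[1, 0], [0, 1], [-1, 0], [0, -1]]
y_demo = [2.0, 0.5, -2.0, -0.5]
path_demo = lasso_path(X_demo, y_demo, [0.1, 0.5, 2.0])
```

As `lam` decreases along the path, more coefficients become nonzero. Any method that can be recast as a Lasso problem could reuse a solver of this kind.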

Share and Cite:

Yu, Q. and Li, B. (2014) Regularization and Estimation in Regression with Cluster Variables. Open Journal of Statistics, 4, 814-825. doi: 10.4236/ojs.2014.410077.

Copyright © 2024 by authors and Scientific Research Publishing Inc.

This work and the related PDF file are licensed under a Creative Commons Attribution 4.0 International License.