TITLE:
Dimensionality Reduction of High-Dimensional Highly Correlated Multivariate Grapevine Dataset
AUTHORS:
Uday Kant Jha, Peter Bajorski, Ernest Fokoue, Justine Vanden Heuvel, Jan van Aardt, Grant Anderson
KEYWORDS:
High-Dimensional Data, Multi-Step Adaptive Elastic Net, Minimax Concave Penalty, Sure Independence Screening, Functional Data Analysis
JOURNAL NAME:
Open Journal of Statistics, Vol.7 No.4, August 25, 2017
ABSTRACT: Viticulturists have traditionally had a keen interest in the relationship between the biochemistry of grapevine leaves and petioles and their associated spectral reflectance, in order to understand fruit ripening rate, water status, nutrient levels, and disease risk. In this paper, we use imaging spectroscopy (hyperspectral) reflectance data for the reflective 330-2510 nm wavelength region (986 total spectral bands) to assess vineyard nutrient status; this constitutes a high-dimensional dataset with an ill-conditioned covariance matrix. Identifying the variables (wavelength bands) that contribute useful information for nutrient assessment and prediction plays a pivotal role in multivariate statistical modeling. In recent years, researchers have successfully
developed many continuous, nearly unbiased, sparse, and accurate variable selection methods to address this problem. This paper compares four regularized regression methods and one functional regression method for wavelength variable selection: Elastic Net, Multi-Step Adaptive Elastic Net, Minimax Concave Penalty, and iterative Sure Independence Screening, together with Functional Data Analysis. Thereafter, the predictive performance of these regularized sparse models is improved using stepwise regression. This comparative study of regression methods on a high-dimensional, highly correlated grapevine hyperspectral dataset revealed that variable selection with Elastic Net yields the best predictive ability.
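
As an illustration of how an Elastic Net penalty performs variable selection on correlated predictors, the following is a minimal sketch in plain Python. It is not the authors' implementation. The synthetic data, the sizes `n` and `p`, the choice of informative bands (indices 10 and 11), the mixing weight `alpha`, and the penalty level `0.1 * lam_max` are all assumptions made for the example. The fit minimizes (1/2n)||y - Xb||^2 + lam * (alpha * ||b||_1 + (1 - alpha)/2 * ||b||^2) by cyclic coordinate descent.

```python
import random

def center(values):
    """Return the values with their mean subtracted."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]

def soft_threshold(value, threshold):
    """Shrink value toward zero; returns exactly 0.0 when |value| <= threshold."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0

def elastic_net(cols, y, lam, alpha=0.5, sweeps=200):
    """Cyclic coordinate descent for the Elastic Net on centered data.

    cols is a list of predictor columns, y the centered response.
    """
    n, p = len(y), len(cols)
    coef = [0.0] * p
    resid = list(y)  # residual y - X @ coef, kept up to date
    z = [sum(v * v for v in c) / n for c in cols]
    for _ in range(sweeps):
        for j, c in enumerate(cols):
            # Correlation of column j with the partial residual (j excluded).
            rho = sum(x * (r + x * coef[j]) for x, r in zip(c, resid)) / n
            new = soft_threshold(rho, lam * alpha) / (z[j] + lam * (1 - alpha))
            if new != coef[j]:
                delta = new - coef[j]
                resid = [r - x * delta for x, r in zip(c, resid)]
                coef[j] = new
    return coef

# Hypothetical stand-in for spectral data: each band is a noisy copy of its
# neighbour, so adjacent bands are strongly correlated.
random.seed(0)
n, p = 40, 30
bands = [[random.gauss(0, 1) for _ in range(n)]]
for _ in range(p - 1):
    bands.append([0.9 * v + 0.3 * random.gauss(0, 1) for v in bands[-1]])
# Assumed ground truth: only bands 10 and 11 drive the response.
response = [bands[10][i] + bands[11][i] + 0.1 * random.gauss(0, 1)
            for i in range(n)]

X = [center(c) for c in bands]
y = center(response)

alpha = 0.5
# Smallest penalty at which every coefficient is zero.
lam_max = max(abs(sum(x * t for x, t in zip(c, y)) / n) for c in X) / alpha
coef = elastic_net(X, y, lam=0.1 * lam_max, alpha=alpha)
selected = [j for j, b in enumerate(coef) if b != 0.0]
print("selected band indices:", selected)
```

In practice the penalty level would be chosen by cross-validation rather than fixed as a fraction of `lam_max`, and an established library implementation would normally be preferred over a hand-written loop.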