Abstract
Variable selection is important for fine-tuning partial least squares (PLS) regression models. This study introduces a novel variable weighting method for PLS regression in which the univariate response variable y guides the variable weighting in a recursive manner; the method is called recursive weighted PLS, or simply rPLS. The method iteratively reweights the variables using the regression coefficients calculated by PLS. Using the regression vector to form the weights is reasonable because its coefficients ideally reflect the importance of the variables. In contrast to many other variable selection methods, rPLS has the advantage that only one parameter needs to be estimated: the number of latent factors used in the PLS model. The rPLS model has the notable property that, under normal conditions, it converges to a very limited number of variables (useful for interpretation), but its optimal regression performance is reached before convergence, normally while covarying neighboring variables are still included. This study examines the properties of rPLS by applying it to a near-infrared (NIR) spectroscopy dataset of feed samples, predicting protein content, and to a metabolomics dataset, modeling a reference metabolic parameter (creatinine) from nuclear magnetic resonance (NMR) spectra of human urine.
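The recursive reweighting loop described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the update rule (multiplying the current weights by the absolute PLS regression coefficients and rescaling so the largest weight is 1), the NIPALS PLS1 routine, and the names `pls1_coef` and `rpls` are assumptions chosen for clarity.

```python
import numpy as np


def pls1_coef(X, y, n_components, tol=1e-10):
    """NIPALS PLS1 on mean-centred X (n x p) and y (n,); returns the regression vector b."""
    X = np.array(X, dtype=float)  # work on copies, deflation modifies them
    y = np.array(y, dtype=float)
    n_features = X.shape[1]
    W = np.zeros((n_features, n_components))
    P = np.zeros((n_features, n_components))
    q = np.zeros(n_components)
    first_norm, k = None, 0
    for a in range(n_components):
        w = X.T @ y
        norm = np.linalg.norm(w)
        if first_norm is None:
            first_norm = norm
        # Stop early if nothing is left to model (e.g. fewer non-zero variables than factors)
        if norm <= tol * first_norm:
            break
        w /= norm
        t = X @ w                      # scores
        tt = t @ t
        p = X.T @ t / tt               # X loadings
        q_a = (y @ t) / tt             # y loading
        X -= np.outer(t, p)            # deflate X
        y -= q_a * t                   # deflate y
        W[:, a], P[:, a], q[a] = w, p, q_a
        k += 1
    if k == 0:
        return np.zeros(n_features)
    W, P, q = W[:, :k], P[:, :k], q[:k]
    return W @ np.linalg.solve(P.T @ W, q)  # b = W (P'W)^-1 q


def rpls(X, y, n_components, n_iter=30):
    """Hypothetical rPLS loop: returns (weights, coefficients) for every iteration."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    weights = np.ones(X.shape[1])      # start with all variables equally weighted
    history = []
    for _ in range(n_iter):
        b = pls1_coef(Xc * weights, yc, n_components)  # PLS on the weighted variables
        coef = weights * b             # regression vector for the unweighted, centred X
        history.append((weights.copy(), coef))
        new_w = weights * np.abs(b)    # assumed update: reweight by |regression coefficients|
        if new_w.max() == 0:
            break
        weights = new_w / new_w.max()  # rescale so the largest weight is 1
    return history
```

Because the abstract notes that the best predictions occur before convergence, one would in practice evaluate each entry of `history` (for example by cross-validation, not shown here) and keep the best iteration, while the final entry shows the small set of variables the procedure converges to. The only tuning parameter in this sketch, apart from the iteration cap, is `n_components`, mirroring the single parameter mentioned above.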
Field | Value |
---|---|
Original language | English |
Journal | Journal of Chemometrics |
Volume | 28 |
Issue number | 5 |
Pages (from-to) | 439–447 |
Number of pages | 9 |
ISSN | 0886-9383 |
Publication status | Published - May 2014 |