Recursive weighted partial least squares (rPLS): an efficient variable selection method using PLS

Åsmund Rinnan; Martin Andersson; Carsten Ridder; Søren Balling Engelsen

doi:10.1002/cem.2582

Food Analytics and Biotechnology

50 Citations (Scopus)

Abstract

Variable selection is important in fine-tuning partial least squares (PLS) regression models. This study introduces a novel variable weighting method for PLS regression in which the univariate response variable y guides the variable weighting in a recursive manner; the method is called recursive weighted PLS, or simply rPLS. The method iteratively reweights the variables using the regression coefficients calculated by PLS. Using the regression vector to form the weights is reasonable because its coefficients ideally reflect the importance of the variables. In contrast to many other variable selection methods, rPLS has the advantage that only one parameter needs to be estimated: the number of latent factors used in the PLS model. Under normal conditions, the rPLS model converges to a very limited number of variables (useful for interpretation), but it exhibits optimal regression performance before convergence, at a stage that normally still includes covarying neighbor variables. This study examines the properties of rPLS by applying it to a near-infrared spectroscopy dataset of feed samples, predicting the protein content, and to a metabolomics dataset, modeling a reference metabolic parameter (creatinine) from nuclear magnetic resonance spectra of human urine.
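The recursive reweighting described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract only says that the variables are reweighted iteratively using the PLS regression coefficients, so the specific update rule (multiplying the current weights by the absolute coefficients and rescaling to a maximum of 1), the stopping tolerance, and the function names `pls1_coefficients` and `rpls_weights` are all hypothetical choices made here. The PLS step is a plain NIPALS PLS1 written in NumPy, and the data are synthetic.

```python
import numpy as np


def pls1_coefficients(X, y, n_components):
    """Regression vector of a PLS1 model (NIPALS) fitted on mean-centered data."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    n_features = X.shape[1]
    W = np.zeros((n_features, n_components))  # weight vectors
    P = np.zeros((n_features, n_components))  # X loadings
    q = np.zeros(n_components)                # y loadings
    used = 0
    for a in range(n_components):
        w = Xc.T @ yc
        norm = np.linalg.norm(w)
        if norm < 1e-12:          # nothing left in X that covaries with y
            break
        w /= norm
        t = Xc @ w                # scores
        tt = t @ t
        p = Xc.T @ t / tt
        q[a] = yc @ t / tt
        Xc = Xc - np.outer(t, p)  # deflate X
        yc = yc - q[a] * t        # deflate y
        W[:, a], P[:, a] = w, p
        used = a + 1
    if used == 0:
        return np.zeros(n_features)
    W, P, q = W[:, :used], P[:, :used], q[:used]
    return W @ np.linalg.solve(P.T @ W, q)


def rpls_weights(X, y, n_components, n_iter=50, tol=1e-8):
    """Iteratively reweight the columns of X by the PLS regression coefficients.

    Assumed update rule (not taken from the paper): new weight = old weight
    times |coefficient|, rescaled so that the largest weight equals 1.
    """
    weights = np.ones(X.shape[1])
    for _ in range(n_iter):
        b = pls1_coefficients(X * weights, y, n_components)
        new = weights * np.abs(b)
        peak = new.max()
        if peak == 0.0:
            break
        new /= peak
        converged = np.max(np.abs(new - weights)) < tol
        weights = new
        if converged:
            break
    return weights


# Synthetic example: only column 2 carries information about y.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 10))
y = 3.0 * X[:, 2] + 0.01 * rng.normal(size=40)
weights = rpls_weights(X, y, n_components=1)
print(np.round(weights, 3))
```

In this sketch the number of latent factors (`n_components`) is the only model parameter, which mirrors the claim in the abstract; `n_iter` and `tol` merely control when the loop stops. Stopping after fewer iterations would correspond to the pre-convergence stage, where more than a handful of variables still carry non-negligible weight.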

Original language	English
Journal	Journal of Chemometrics
Volume	28
Issue number	5
Pages (from-to)	439–447
Number of pages	9
ISSN	0886-9383
DOIs	https://doi.org/10.1002/cem.2582
Publication status	Published - May 2014

Cite this

@article{5bce8e12ea504effb7a52b25b21f78cb,

title = "Recursive weighted partial least squares (rPLS): an efficient variable selection method using PLS",

abstract = "Variable selection is important in fine-tuning partial least squares (PLS) regression models. This study introduces a novel variable weighting method for PLS regression in which the univariate response variable y guides the variable weighting in a recursive manner; the method is called recursive weighted PLS, or simply rPLS. The method iteratively reweights the variables using the regression coefficients calculated by PLS. Using the regression vector to form the weights is reasonable because its coefficients ideally reflect the importance of the variables. In contrast to many other variable selection methods, rPLS has the advantage that only one parameter needs to be estimated: the number of latent factors used in the PLS model. Under normal conditions, the rPLS model converges to a very limited number of variables (useful for interpretation), but it exhibits optimal regression performance before convergence, at a stage that normally still includes covarying neighbor variables. This study examines the properties of rPLS by applying it to a near-infrared spectroscopy dataset of feed samples, predicting the protein content, and to a metabolomics dataset, modeling a reference metabolic parameter (creatinine) from nuclear magnetic resonance spectra of human urine.",

author = "Rinnan, {\AA}smund and Andersson, Martin and Ridder, Carsten and Engelsen, S{\o}ren Balling",

year = "2014",

month = may,

doi = "10.1002/cem.2582",

language = "English",

volume = "28",

pages = "439--447",

journal = "Journal of Chemometrics",

issn = "0886-9383",

publisher = "Wiley",

number = "5",

}

TY - JOUR

T1 - Recursive weighted partial least squares (rPLS)

T2 - an efficient variable selection method using PLS

AU - Rinnan, Åsmund

AU - Andersson, Martin

AU - Ridder, Carsten

AU - Engelsen, Søren Balling

PY - 2014/5

Y1 - 2014/5

N2 - Variable selection is important in fine-tuning partial least squares (PLS) regression models. This study introduces a novel variable weighting method for PLS regression in which the univariate response variable y guides the variable weighting in a recursive manner; the method is called recursive weighted PLS, or simply rPLS. The method iteratively reweights the variables using the regression coefficients calculated by PLS. Using the regression vector to form the weights is reasonable because its coefficients ideally reflect the importance of the variables. In contrast to many other variable selection methods, rPLS has the advantage that only one parameter needs to be estimated: the number of latent factors used in the PLS model. Under normal conditions, the rPLS model converges to a very limited number of variables (useful for interpretation), but it exhibits optimal regression performance before convergence, at a stage that normally still includes covarying neighbor variables. This study examines the properties of rPLS by applying it to a near-infrared spectroscopy dataset of feed samples, predicting the protein content, and to a metabolomics dataset, modeling a reference metabolic parameter (creatinine) from nuclear magnetic resonance spectra of human urine.

U2 - 10.1002/cem.2582

DO - 10.1002/cem.2582

M3 - Journal article

SN - 0886-9383

VL - 28

SP - 439

EP - 447

JO - Journal of Chemometrics

JF - Journal of Chemometrics

IS - 5

ER -
