Benchmarking support vector regression against partial least squares regression and artificial neural network: Effect of sample size on model performance

Rikke Ingemann Tange; Morten Arendt Rasmussen; Eizo Taira; Rasmus Bro

doi:10.1177/0967033517734945


Food Analytics and Biotechnology

10 Citations (Scopus)

Abstract

It has become easy to obtain high-dimensional multivariate chemical data. However, it may be expensive or time-consuming to obtain a large number of samples or to acquire reference measurements, so the number of samples available for multivariate calibration modelling may be limited. If the data contain nonlinear relationships, nonlinear methods are required for the calibration task. The combination of limited amounts of high-dimensional data and highly flexible nonlinear methods may result in overfitted models, which in turn perform poorly on new data. Therefore, for real-world applications, it is desirable to understand how sample size affects model prediction performance. For this purpose, we compared partial least squares regression, artificial neural network, and support vector regression applied to three real-world nonlinear datasets, two of which were high-dimensional. We (i) evaluated the effect of calibration sample size on test set performance, including variation in test set performance due to sampling variation, and (ii) tested whether the cross-validated performance was adequate for assessing predictive ability. We demonstrated the applicability of artificial neural network and support vector regression for real-world data of limited size and showed that support vector regression had advantages over artificial neural network: (i) fewer calibration samples were required to obtain a desired model performance, (ii) support vector regression was less sensitive to sampling variation for small sample sets, and (iii) cross-validation was an approximately unbiased option for evaluating the true support vector regression model performance even for small sample sets.
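The evaluation protocol summarised in the abstract can be sketched as follows. It repeatedly draws calibration subsets of a given size, scores each fitted model on a fixed test set, and compares that test error with the cross-validated error on the calibration subset. This is a hypothetical illustration, not the authors' code. Several pieces are assumptions, chosen so that the sketch runs with only the Python standard library:

- the toy data generator `make_data` (a noisy sine curve);
- a k-nearest-neighbour regressor standing in for PLS, ANN or SVR;
- all function names and settings (5 folds, 20 repeats, 3 neighbours).

```python
import math
import random
import statistics


def make_data(n, rng):
    """Hypothetical nonlinear toy data, y = sin(x) + noise (assumed stand-in for real spectra)."""
    xs = [rng.uniform(0.0, 6.0) for _ in range(n)]
    return [(x, math.sin(x) + rng.gauss(0.0, 0.1)) for x in xs]


def knn_predict(train, x, n_neighbors=3):
    """k-nearest-neighbour regression: an assumed simple nonlinear stand-in model."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:n_neighbors]
    return sum(y for _, y in nearest) / len(nearest)


def rmse(train, test, n_neighbors=3):
    """Root-mean-square error of prediction on an independent test set (RMSEP)."""
    sq = [(knn_predict(train, x, n_neighbors) - y) ** 2 for x, y in test]
    return math.sqrt(sum(sq) / len(sq))


def cv_rmse(cal, folds=5, n_neighbors=3):
    """k-fold cross-validated RMSE (RMSECV), computed on the calibration set only."""
    sq = []
    for f in range(folds):
        val = cal[f::folds]  # every folds-th sample forms the validation fold
        train = [p for i, p in enumerate(cal) if i % folds != f]
        sq += [(knn_predict(train, x, n_neighbors) - y) ** 2 for x, y in val]
    return math.sqrt(sum(sq) / len(sq))


def learning_curve(pool, test, sizes, repeats=20, seed=0):
    """For each calibration size, resample the calibration set `repeats` times and
    summarise test-set error, its spread (sampling variation) and the CV estimate."""
    rng = random.Random(seed)
    results = {}
    for n in sizes:
        test_scores, cv_scores = [], []
        for _ in range(repeats):
            cal = rng.sample(pool, n)  # random calibration subset of size n
            test_scores.append(rmse(cal, test))
            cv_scores.append(cv_rmse(cal))
        rmsep_mean = statistics.mean(test_scores)
        rmsecv_mean = statistics.mean(cv_scores)
        results[n] = {
            "rmsep_mean": rmsep_mean,
            "rmsep_sd": statistics.stdev(test_scores),
            "rmsecv_mean": rmsecv_mean,
            "cv_bias": rmsecv_mean - rmsep_mean,  # >0: CV pessimistic, <0: optimistic
        }
    return results


if __name__ == "__main__":
    rng = random.Random(42)
    pool = make_data(200, rng)  # samples available for calibration
    test = make_data(100, rng)  # fixed independent test set
    for n, r in learning_curve(pool, test, sizes=[10, 25, 50, 100]).items():
        print(n, {k: round(v, 3) for k, v in r.items()})
```

Each part of the output maps onto the abstract's questions:

- `rmsep_sd` across sizes shows how strongly test performance depends on which samples happen to be in a small calibration set.
- `cv_bias` indicates how far the cross-validated estimate departs from the test-set error.

In a real study the stand-in model would be replaced by PLS, ANN or SVR fits, with hyperparameters tuned inside each calibration subset.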

Original language	English
Journal	Journal of Near Infrared Spectroscopy
Volume	25
Issue number	6
Pages (from-to)	381-390
Number of pages	10
ISSN	0967-0335
DOIs	https://doi.org/10.1177/0967033517734945
Publication status	Published - 1 Jan 2017

Cite this

Benchmarking support vector regression against partial least squares regression and artificial neural network: Effect of sample size on model performance. / Tange, Rikke Ingemann; Rasmussen, Morten Arendt; Taira, Eizo et al.
In: Journal of Near Infrared Spectroscopy, Vol. 25, No. 6, 01.01.2017, p. 381-390.

Research output: Contribution to journal › Journal article › Research › peer-review

@article{c8548e30735947618d80b628565bde27,

title = "Benchmarking support vector regression against partial least squares regression and artificial neural network: Effect of sample size on model performance",

abstract = "It has become easy to obtain multivariate chemical data of high dimensions. However, it may be expensive or time consuming to obtain a large number of samples or to acquire reference measures, so the number of samples available for multivariate calibration modelling may be limited. If data contains nonlinear relationships, nonlinear methods are required for the calibration task. The combination of limited amounts of data of high dimensions and highly flexible nonlinear methods may result in overfitted models which in turn perform badly on new data. Therefore, for real world applications, it is desirable to understand how the sample size affects model prediction performance. For this purpose, we compared partial least squares regression, artificial neural network, and support vector regression applied to three real world nonlinear datasets of which two were of high dimensions. We evaluated the effect of calibration sample size (i) on test set performance, including variation in test set performance due to sampling variation and (ii) tested if the cross-validated performance was adequate for assessing the predictive ability. We demonstrated the applicability of artificial neural network and support vector regression for real world data of limited size and showed that support vector regression had advantages over artificial neural network: (i) fewer calibration samples were required to obtain a desired model performance, (ii) support vector regression was less sensitive to sampling variation for small sample sets and (iii) cross-validation was an approximately unbiased option for evaluating the true support vector regression model performance even for small sample sets.",

author = "Tange, {Rikke Ingemann} and Rasmussen, {Morten Arendt} and Eizo Taira and Rasmus Bro",

year = "2017",

month = jan,

day = "1",

doi = "10.1177/0967033517734945",

language = "English",

volume = "25",

pages = "381--390",

journal = "Journal of Near Infrared Spectroscopy",

issn = "0967-0335",

publisher = "N I R Publications",

number = "6",

}

TY - JOUR

T1 - Benchmarking support vector regression against partial least squares regression and artificial neural network: Effect of sample size on model performance

AU - Tange, Rikke Ingemann

AU - Rasmussen, Morten Arendt

AU - Taira, Eizo

AU - Bro, Rasmus

PY - 2017/1/1

Y1 - 2017/1/1

N2 - It has become easy to obtain multivariate chemical data of high dimensions. However, it may be expensive or time consuming to obtain a large number of samples or to acquire reference measures, so the number of samples available for multivariate calibration modelling may be limited. If data contains nonlinear relationships, nonlinear methods are required for the calibration task. The combination of limited amounts of data of high dimensions and highly flexible nonlinear methods may result in overfitted models which in turn perform badly on new data. Therefore, for real world applications, it is desirable to understand how the sample size affects model prediction performance. For this purpose, we compared partial least squares regression, artificial neural network, and support vector regression applied to three real world nonlinear datasets of which two were of high dimensions. We evaluated the effect of calibration sample size (i) on test set performance, including variation in test set performance due to sampling variation and (ii) tested if the cross-validated performance was adequate for assessing the predictive ability. We demonstrated the applicability of artificial neural network and support vector regression for real world data of limited size and showed that support vector regression had advantages over artificial neural network: (i) fewer calibration samples were required to obtain a desired model performance, (ii) support vector regression was less sensitive to sampling variation for small sample sets and (iii) cross-validation was an approximately unbiased option for evaluating the true support vector regression model performance even for small sample sets.

AB - It has become easy to obtain multivariate chemical data of high dimensions. However, it may be expensive or time consuming to obtain a large number of samples or to acquire reference measures, so the number of samples available for multivariate calibration modelling may be limited. If data contains nonlinear relationships, nonlinear methods are required for the calibration task. The combination of limited amounts of data of high dimensions and highly flexible nonlinear methods may result in overfitted models which in turn perform badly on new data. Therefore, for real world applications, it is desirable to understand how the sample size affects model prediction performance. For this purpose, we compared partial least squares regression, artificial neural network, and support vector regression applied to three real world nonlinear datasets of which two were of high dimensions. We evaluated the effect of calibration sample size (i) on test set performance, including variation in test set performance due to sampling variation and (ii) tested if the cross-validated performance was adequate for assessing the predictive ability. We demonstrated the applicability of artificial neural network and support vector regression for real world data of limited size and showed that support vector regression had advantages over artificial neural network: (i) fewer calibration samples were required to obtain a desired model performance, (ii) support vector regression was less sensitive to sampling variation for small sample sets and (iii) cross-validation was an approximately unbiased option for evaluating the true support vector regression model performance even for small sample sets.

U2 - 10.1177/0967033517734945

DO - 10.1177/0967033517734945

M3 - Journal article

SN - 0967-0335

VL - 25

SP - 381

EP - 390

JO - Journal of Near Infrared Spectroscopy

JF - Journal of Near Infrared Spectroscopy

IS - 6

ER -