Abstract
When partial least squares regression is used to predict the chemical composition of food samples from near-infrared spectroscopy, the origin of the predictive information is rarely understood in depth. We aim to open the Pandora's box of how the prediction of protein proceeds in a unique set of chemically diverse barley mutant samples. An external validation of the natural sources of covariation that chemometric models exploit would provide a framework for manipulating the decisive information and so make expensive calibration more economical. The barley samples were supplemented by two designed data sets: one mirroring the coarse composition of the barley samples by mixing six main chemical components, and one in which the biological covariance between the six chemical components had been reduced. The three original data sets give remarkably comparable prediction models, although their regression coefficients are quite different. The origin of the predictive ability of the data is elucidated by splitting the natural barley data into two parts: one based on simulated biology extracted from a set of chemical mixtures, and the residual that remains after the chemistry has been removed from the raw data. As much as 98.1% of the spectral information in the natural barley data is explained by the simulated biology, leaving as little as 1.9% for unexplained biological variation and noise. Nevertheless, the unexplained biological variation still gives a fair prediction of protein (RMSECV = 1.23 and r² = 0.80, compared with RMSECV = 0.46 and r² = 0.97 for the natural data), and it gives a clear separation of the three genotype classes in principal component analysis.
The results were interpreted by spectral inspection of the origin of the unique covariate patterns that appear in self-organised biological systems. This should motivate researchers and industry to investigate the compressive effect that the model has on the essential deterministic biological data.
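
The split described in the abstract, into a part explained by chemical mixtures and a residual, can be illustrated with a minimal sketch. The abstract does not state how the split was computed, so the sketch assumes an orthogonal projection of each spectrum onto the space spanned by the mixture spectra and measures "spectral information" as a raw sum of squares. All function names and the toy numbers are hypothetical and are not taken from the paper.

```python
import math

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def orthonormal_basis(vectors, tol=1e-12):
    """Gram-Schmidt orthonormalisation; drops vectors that add no new direction."""
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            c = dot(w, b)
            w = [wi - c * bi for wi, bi in zip(w, b)]
        norm = math.sqrt(dot(w, w))
        if norm > tol:
            basis.append([wi / norm for wi in w])
    return basis

def split_spectrum(x, basis):
    """Split x into the part inside the basis span and the residual outside it."""
    explained = [0.0] * len(x)
    for b in basis:
        c = dot(x, b)
        explained = [e + c * bi for e, bi in zip(explained, b)]
    residual = [xi - ei for xi, ei in zip(x, explained)]
    return explained, residual

def explained_fraction(spectra, basis):
    """Fraction of the total sum of squares captured by the basis (assumed metric)."""
    total = sum(dot(x, x) for x in spectra)
    resid = sum(dot(r, r) for r in (split_spectrum(x, basis)[1] for x in spectra))
    return 1.0 - resid / total

def rmsecv(y_true, y_pred_cv):
    """Root mean square error of cross-validated predictions."""
    n = len(y_true)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred_cv)) / n)

# Hypothetical toy data: two mixture spectra over four wavelengths.
mixture_spectra = [[1.0, 0.0, 0.0, 0.0], [1.0, 1.0, 0.0, 0.0]]
basis = orthonormal_basis(mixture_spectra)

# One hypothetical "natural" spectrum with a small component outside the span.
sample = [3.0, 4.0, 0.0, 1.0]
explained, residual = split_spectrum(sample, basis)
frac = explained_fraction([sample], basis)

# Hypothetical reference values and cross-validated predictions.
error = rmsecv([10.0, 12.0, 14.0], [10.0, 12.0, 16.0])
```

In this toy case the residual holds only the fourth wavelength, which the mixtures cannot reproduce. A regression model fitted to such residuals alone would correspond, under the stated assumptions, to the "unexplained biological variation" model in the abstract.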
Original language | English |
---|---|
Journal | Journal of Chemometrics |
Volume | 26 |
Issue number | 8-9 |
Pages (from-to) | 487-495 |
Number of pages | 9 |
ISSN | 0886-9383 |
Publication status | Published - Apr 2012 |