Functional Data Analysis Applied in Chemometrics: With Focus on NMR Nutri-Metabolomics

Martha Muller

Abstract

In this thesis we explore the use of functional data analysis as a method to analyse chemometric data, more specifically spectral data in metabolomics. Functional data analysis is a vibrant field in statistics. It has been rapidly expanding in both methodology and applications since it was made well known by Ramsay & Silverman's monograph in 1997. In functional data analysis, the data are curves instead of data points. Each curve is measured at discrete points along a continuum, for example, time or frequency. It is assumed that the underlying process generating the curves is smooth, but it is not assumed that the adjacent points measured along the continuum are independent. Standard chemometric methods originate from the field of multivariate analysis, where variables are often assumed to be independent. Typically these methods do not explore the rich functional nature of spectral data.

Metabolomics studies the 'unique chemical fingerprints' (Daviss, 2005) that cellular processes create in living systems. Metabolomics is used to study the influence of nutrition on the human metabolome. Nutritional metabolomics shows great potential for the discovery of novel biomarkers of food consumption, personal nutritional status and metabolic phenotype. We want to understand how metabolomic spectra can be analysed using functional data analysis to detect the influence of different factors on specific metabolites. These factors can include, for example, gender, diet culture or dietary intervention. In Paper I we apply wavelet-based functional mixed model methodology and use bootstrap-based inference on functions to find jointly significant differences in metabolites, or spectral regions. In more detail, wavelets are used to model sharp, localised peaks in the spectra. Wavelet shrinkage reduces the noise and provides a sparse representation of each spectrum. Subset selection of wavelet coefficients generates the input to mixed models. Mixed-model methodology enables us to take the study design into account while modelling covariates. Bootstrap-based inference preserves the correlation structure between curves and enables the estimation of functional confidence intervals for mean curves. We also discuss the many practical considerations in wavelet estimation and thresholding, and the important influence the choices can have on the resulting estimates.
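The wavelet shrinkage step described above can be illustrated with a minimal sketch. It is not the procedure used in Paper I: the Haar wavelet, the soft-thresholding rule, the universal threshold with a median-absolute-deviation noise estimate, and the simulated single-peak spectrum are all assumptions chosen only to keep the example short and self-contained. The function names (haar_dwt, haar_idwt, soft_threshold, haar_shrink) are hypothetical.

```python
import numpy as np

def haar_dwt(signal):
    """Full Haar decomposition; the signal length must be a power of two."""
    approx = np.asarray(signal, dtype=float)
    details = []  # details[0] holds the finest-scale coefficients
    while approx.size > 1:
        even, odd = approx[0::2], approx[1::2]
        details.append((even - odd) / np.sqrt(2.0))
        approx = (even + odd) / np.sqrt(2.0)
    return approx, details

def haar_idwt(approx, details):
    """Invert haar_dwt, rebuilding from the coarsest scale to the finest."""
    approx = np.asarray(approx, dtype=float)
    for detail in reversed(details):
        out = np.empty(2 * approx.size)
        out[0::2] = (approx + detail) / np.sqrt(2.0)
        out[1::2] = (approx - detail) / np.sqrt(2.0)
        approx = out
    return approx

def soft_threshold(coeffs, threshold):
    """Shrink coefficients towards zero; small ones become exactly zero."""
    return np.sign(coeffs) * np.maximum(np.abs(coeffs) - threshold, 0.0)

def haar_shrink(signal):
    """Denoise by soft-thresholding all detail coefficients (assumed rule)."""
    approx, details = haar_dwt(signal)
    # Noise level estimated from the finest scale via the MAD (an assumption).
    sigma = np.median(np.abs(details[0])) / 0.6745
    threshold = sigma * np.sqrt(2.0 * np.log(len(signal)))
    shrunk = [soft_threshold(d, threshold) for d in details]
    return haar_idwt(approx, shrunk), shrunk

# Simulated spectrum: one sharp peak plus Gaussian noise (illustrative only).
rng = np.random.default_rng(0)
x = np.linspace(-5.0, 5.0, 1024)
clean = 1.0 / (1.0 + (x / 0.1) ** 2)
noisy = clean + rng.normal(scale=0.05, size=x.size)
denoised, shrunk = haar_shrink(noisy)
```

After thresholding, most detail coefficients are exactly zero, which is the sparse representation referred to above. The choice of wavelet family and threshold rule changes the estimate noticeably, in line with the practical considerations mentioned in the text.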

On a conceptual level, the purpose of this thesis is to build a stronger connection between the worlds of statistics and chemometrics. We want to provide a glimpse of the essential and complex data pre-processing that is well known to chemometricians, but is generally unknown to statisticians. Pre-processing can potentially have a strong influence on the results of subsequent data analysis. Our focus is on nuclear magnetic resonance data and we discuss the inherent structure in this type of data. However, many of the methods covered in this thesis are also applicable to other spectral data, e.g. mass spectrometry or infrared.

In Paper II we give a brief overview of functional data analysis, a field that is known to
statisticians, but often obscured from chemometricians. We illustrate the rich nature of
functional derivatives in simulated nuclear magnetic peaks with characteristic Lorentzian
line shape. Using phase-plane plots to explore the anatomy of NMR peaks, we introduce
the novelty of heart plots for spectral data.
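
As a minimal sketch of the ingredients involved, the following computes a simulated Lorentzian peak together with its analytic first and second derivatives, and pairs the two derivatives as they would be plotted against each other in a phase-plane plot. The parameterisation (centre x0, half-width gamma, unit area), the grid and the function names are assumptions made for illustration; the sketch does not reproduce the heart plots of Paper II.

```python
import numpy as np

def lorentzian(x, x0=0.0, gamma=1.0):
    """Unit-area Lorentzian with centre x0 and half-width gamma (assumed form)."""
    u = x - x0
    return gamma / (np.pi * (u**2 + gamma**2))

def lorentzian_d1(x, x0=0.0, gamma=1.0):
    """Analytic first derivative of the Lorentzian."""
    u = x - x0
    return -2.0 * gamma * u / (np.pi * (u**2 + gamma**2) ** 2)

def lorentzian_d2(x, x0=0.0, gamma=1.0):
    """Analytic second derivative of the Lorentzian."""
    u = x - x0
    return 2.0 * gamma * (3.0 * u**2 - gamma**2) / (np.pi * (u**2 + gamma**2) ** 3)

# Evaluate on a grid; each row of phase_plane is one point of the
# (first derivative, second derivative) trajectory across the peak.
x = np.linspace(-5.0, 5.0, 2001)
velocity = lorentzian_d1(x)
acceleration = lorentzian_d2(x)
phase_plane = np.column_stack([velocity, acceleration])
```

At the peak centre the first derivative vanishes and the second derivative is negative, while the second derivative changes sign at x0 ± gamma/sqrt(3). These features are what such a plot makes visible.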

The important aspect of registration, also called warping or alignment, emerges from both
the chemometric and statistical perspectives. In Paper III we apply functional registration
in the context of biomechanics, specifically to data from a juggling experiment. The novelty
of this work is that the registration is done towards an idealized biomechanical model. In
this way, the warping is performed subject to biomechanical constraints.

The supplemental paper, Paper IV, demonstrates the application of classical mixed-model
methodology in the context of targeted metabolomics. Dietary effects on biomarkers of
bone turnover in children were investigated as part of the pan-European DiOGenes dietary
intervention trial. The metabolomics data in Paper I originated from a pilot study of the
DiOGenes trial.

Overall this thesis gives an indication of the huge possibilities for functional data analysis in metabolomics and chemometrics. Spectral data are inherently functional in nature. Functional data analysis provides access to many functional equivalents of methods currently used in chemometrics, with the benefit of requiring no strong assumptions regarding neighbouring observations. Functional data analysis also provides access to the data's derivatives and opens up the ability to analyse information that is otherwise locked away in the data. The use of functional data analysis in metabolomics can make a valuable contribution to the emerging technology in personalised medicine and health care, including personalised nutrition for prevention and treatment.

Original language: English
Publisher: Department of Mathematical Sciences, Faculty of Science, University of Copenhagen
Number of pages: 166
ISBN (Print): 978-87-7078-968-4
Status: Published - 2014
