Integrative analysis of metabolomics and transcriptomics data: a unified model framework to identify underlying system pathways

Kasper Brink-Jensen; Søren Bak; Kirsten Jørgensen; Claus Thorn Ekstrøm

doi:10.1371/journal.pone.0072116

Abstract

The abundance of high-dimensional measurements in the form of gene expression and mass spectroscopy calls for models to elucidate the underlying biological system. For widely studied organisms like yeast, it is possible to incorporate prior knowledge from a variety of databases, an approach used in several recent studies. However, if such information is not available for a particular organism, these methods fall short. In this paper we propose a statistical method that is applicable to a dataset consisting of Liquid Chromatography-Mass Spectroscopy (LC-MS) and gene expression (DNA microarray) measurements from the same samples, to identify genes controlling the production of metabolites. Due to the high dimensionality of both LC-MS and DNA microarray data, dimension reduction and variable selection are key elements of the analysis. Our proposed approach starts by identifying the basis functions ("building blocks") that constitute the output from a mass spectrometry experiment. Subsequently, the weights of these basis functions are related to the observations from the corresponding gene expression data in order to identify which genes are associated with specific patterns seen in the metabolite data. The modeling framework is extremely flexible as well as computationally fast and can accommodate treatment effects and other variables related to the experimental design. We demonstrate that within the proposed framework, genes regulating the production of specific metabolites can be identified correctly unless the variation in the noise is more than twice that of the signal.
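The two-stage idea in the abstract (first reduce the spectra to basis functions, then relate the basis weights to gene expression) can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify the decomposition or the selection procedure, so the sketch substitutes a leading singular vector (found by power iteration) for the basis functions and a simple correlation ranking for the variable selection. The data are simulated, and all names (`leading_basis`, `pearson`, `CAUSAL_GENE`) and all sizes and noise levels are assumptions made for illustration.

```python
import math
import random

random.seed(0)

N_SAMPLES, N_GENES, N_BINS = 30, 5, 40
CAUSAL_GENE = 2  # hypothetical: the one simulated gene that drives the peak

# Simulated gene expression: samples x genes, standard normal values.
expression = [[random.gauss(0.0, 1.0) for _ in range(N_GENES)]
              for _ in range(N_SAMPLES)]

# One simulated "building block": a Gaussian-shaped peak centred on bin 20.
peak = [math.exp(-0.5 * ((j - 20) / 2.0) ** 2) for j in range(N_BINS)]

# Simulated spectra: peak height depends on the causal gene, plus noise.
spectra = []
for i in range(N_SAMPLES):
    height = 5.0 + 2.0 * expression[i][CAUSAL_GENE]
    spectra.append([height * p + random.gauss(0.0, 0.05) for p in peak])


def leading_basis(matrix, iterations=100):
    """Leading right singular vector of a samples x bins matrix (power
    iteration), plus each sample's weight on that vector."""
    n_bins = len(matrix[0])
    v = [1.0 / math.sqrt(n_bins)] * n_bins
    for _ in range(iterations):
        w = [sum(x * b for x, b in zip(row, v)) for row in matrix]
        v_new = [sum(w[i] * matrix[i][j] for i in range(len(matrix)))
                 for j in range(n_bins)]
        norm = math.sqrt(sum(x * x for x in v_new))
        v = [x / norm for x in v_new]
    weights = [sum(x * b for x, b in zip(row, v)) for row in matrix]
    return v, weights


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


# Stage 1: dimension reduction of the spectra to one basis function.
basis, weights = leading_basis(spectra)

# Stage 2: rank genes by how strongly they track the basis weights.
ranking = sorted(
    range(N_GENES),
    key=lambda g: abs(pearson([row[g] for row in expression], weights)),
    reverse=True,
)
print("genes ranked by |correlation| with basis weights:", ranking)
```

With real data, the correlation ranking would typically be replaced by a penalized regression or another variable-selection method, and several basis functions would be extracted rather than one; the sketch only shows how the two stages connect.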

Original language	English
Article number	e72116
Journal	PLoS ONE
Volume	8
Issue number	9
Number of pages	8
ISSN	1932-6203
DOIs	https://doi.org/10.1371/journal.pone.0072116
Publication status	Published - 25 Sept 2013

Access to Document

DOI: 10.1371/journal.pone.0072116 (Licence: CC BY)

Final published version, 494 KB (Licence: CC BY)

Cite this

@article{9fcd19670ce2463a972996845e0efa99,

title = "Integrative analysis of metabolomics and transcriptomics data: a unified model framework to identify underlying system pathways",

abstract = "The abundance of high-dimensional measurements in the form of gene expression and mass spectroscopy calls for models to elucidate the underlying biological system. For widely studied organisms like yeast, it is possible to incorporate prior knowledge from a variety of databases, an approach used in several recent studies. However, if such information is not available for a particular organism, these methods fall short. In this paper we propose a statistical method that is applicable to a dataset consisting of Liquid Chromatography-Mass Spectroscopy (LC-MS) and gene expression (DNA microarray) measurements from the same samples, to identify genes controlling the production of metabolites. Due to the high dimensionality of both LC-MS and DNA microarray data, dimension reduction and variable selection are key elements of the analysis. Our proposed approach starts by identifying the basis functions ({"}building blocks{"}) that constitute the output from a mass spectrometry experiment. Subsequently, the weights of these basis functions are related to the observations from the corresponding gene expression data in order to identify which genes are associated with specific patterns seen in the metabolite data. The modeling framework is extremely flexible as well as computationally fast and can accommodate treatment effects and other variables related to the experimental design. We demonstrate that within the proposed framework, genes regulating the production of specific metabolites can be identified correctly unless the variation in the noise is more than twice that of the signal.",

author = "Kasper Brink-Jensen and S{\o}ren Bak and Kirsten J{\o}rgensen and Ekstr{\o}m, {Claus Thorn}",

year = "2013",

month = sep,

day = "25",

doi = "10.1371/journal.pone.0072116",

language = "English",

volume = "8",

journal = "PLoS ONE",

issn = "1932-6203",

publisher = "Public Library of Science",

number = "9",

}

TY - JOUR

T1 - Integrative analysis of metabolomics and transcriptomics data

T2 - a unified model framework to identify underlying system pathways

AU - Brink-Jensen, Kasper

AU - Bak, Søren

AU - Jørgensen, Kirsten

AU - Ekstrøm, Claus Thorn

PY - 2013/9/25

Y1 - 2013/9/25

AB - The abundance of high-dimensional measurements in the form of gene expression and mass spectroscopy calls for models to elucidate the underlying biological system. For widely studied organisms like yeast, it is possible to incorporate prior knowledge from a variety of databases, an approach used in several recent studies. However, if such information is not available for a particular organism, these methods fall short. In this paper we propose a statistical method that is applicable to a dataset consisting of Liquid Chromatography-Mass Spectroscopy (LC-MS) and gene expression (DNA microarray) measurements from the same samples, to identify genes controlling the production of metabolites. Due to the high dimensionality of both LC-MS and DNA microarray data, dimension reduction and variable selection are key elements of the analysis. Our proposed approach starts by identifying the basis functions ("building blocks") that constitute the output from a mass spectrometry experiment. Subsequently, the weights of these basis functions are related to the observations from the corresponding gene expression data in order to identify which genes are associated with specific patterns seen in the metabolite data. The modeling framework is extremely flexible as well as computationally fast and can accommodate treatment effects and other variables related to the experimental design. We demonstrate that within the proposed framework, genes regulating the production of specific metabolites can be identified correctly unless the variation in the noise is more than twice that of the signal.

U2 - 10.1371/journal.pone.0072116

DO - 10.1371/journal.pone.0072116

M3 - Journal article

C2 - 24086255

SN - 1932-6203

VL - 8

JO - PLoS ONE

JF - PLoS ONE

IS - 9

M1 - e72116

ER -
