Understanding data fusion within the framework of coupled matrix and tensor factorizations

Evrim Acar Ataman; Morten Arendt Rasmussen; Francesco Savorani; Tormod Næs; Rasmus Bro

doi:10.1016/j.chemolab.2013.06.006

Food Analytics and Biotechnology

61 citations (Scopus)

Abstract

Recent technological advances enable us to collect huge amounts of data from multiple sources. Jointly analyzing such multi-relational data from different sources, i.e., data fusion (also called multi-block, multi-view or multi-set data analysis), often enhances knowledge discovery. For instance, in metabolomics, biological fluids are measured using a variety of analytical techniques such as Liquid Chromatography-Mass Spectrometry and Nuclear Magnetic Resonance Spectroscopy. Data measured using different analytical methods may be complementary and their fusion may help in the identification of chemicals related to certain diseases. Data fusion has proved useful in many fields including social network analysis, collaborative filtering, neuroscience and bioinformatics. In this paper, unlike many studies demonstrating the success of data fusion, we explore the limitations as well as the advantages of data fusion. We formulate data fusion as a coupled matrix and tensor factorization (CMTF) problem, which jointly factorizes multiple data sets in the form of higher-order tensors and matrices by extracting a common latent structure from the shared mode. Using numerical experiments on simulated and real data sets, we assess the performance of coupled analysis compared to the analysis of a single data set in terms of missing data estimation and demonstrate cases where coupled analysis outperforms analysis of a single data set and vice versa.
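The CMTF formulation in the abstract can be illustrated with a small sketch. It assumes a third-order tensor X (I x J x K) coupled with a matrix Y (I x M) through the first mode, and minimizes ||X - [[A, B, C]]||^2 + ||Y - A V^T||^2, where A is the shared factor matrix. The function names (`cmtf_als`, `cmtf_loss`), the equal weighting of the two terms, and the use of plain alternating least squares are assumptions made for brevity. This is not necessarily the optimization scheme used in the paper, and the sketch does not handle missing entries.

```python
import numpy as np

def cmtf_als(X, Y, rank, n_iter=200, seed=0):
    """Fit X[i,j,k] ~ sum_r A[i,r] B[j,r] C[k,r] and Y ~ A @ V.T with a shared A.

    Illustrative alternating least squares; all names here are hypothetical.
    """
    rng = np.random.default_rng(seed)
    I, J, K = X.shape
    M = Y.shape[1]
    A = rng.standard_normal((I, rank))
    B = rng.standard_normal((J, rank))
    C = rng.standard_normal((K, rank))
    V = rng.standard_normal((M, rank))
    for _ in range(n_iter):
        # Shared factor A: one least-squares solve over both data sets.
        rhs = np.einsum('ijk,jr,kr->ir', X, B, C) + Y @ V
        gram = (B.T @ B) * (C.T @ C) + V.T @ V
        A = rhs @ np.linalg.pinv(gram)
        # Factors that appear only in the tensor model.
        B = np.einsum('ijk,ir,kr->jr', X, A, C) @ np.linalg.pinv((A.T @ A) * (C.T @ C))
        C = np.einsum('ijk,ir,jr->kr', X, A, B) @ np.linalg.pinv((A.T @ A) * (B.T @ B))
        # Factor that appears only in the matrix model.
        V = Y.T @ A @ np.linalg.pinv(A.T @ A)
    return A, B, C, V

def cmtf_loss(X, Y, A, B, C, V):
    """Sum of squared residuals of the tensor model and the matrix model."""
    X_hat = np.einsum('ir,jr,kr->ijk', A, B, C)
    return np.sum((X - X_hat) ** 2) + np.sum((Y - A @ V.T) ** 2)

if __name__ == "__main__":
    # Synthetic rank-2 data sharing the first-mode factor A0.
    rng = np.random.default_rng(1)
    A0, B0, C0, V0 = (rng.standard_normal((n, 2)) for n in (6, 5, 4, 3))
    X = np.einsum('ir,jr,kr->ijk', A0, B0, C0)
    Y = A0 @ V0.T
    A, B, C, V = cmtf_als(X, Y, rank=2)
    print("final loss:", cmtf_loss(X, Y, A, B, C, V))
```

Because A enters both residual terms, its update pools information from X and Y, which is the sense in which the factorization extracts a common latent structure from the shared mode.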

Original language	English
Journal	Chemometrics and Intelligent Laboratory Systems
Volume	129
Pages (from-to)	53-63
Number of pages	11
ISSN	0169-7439
DOI	https://doi.org/10.1016/j.chemolab.2013.06.006
Status	Published - 15 Nov 2013

Access to document

10.1016/j.chemolab.2013.06.006

Citation formats

@article{dd780b973eb24dcc8700ce4b1b75c995,

title = "Understanding data fusion within the framework of coupled matrix and tensor factorizations",

abstract = "Recent technological advances enable us to collect huge amounts of data from multiple sources. Jointly analyzing such multi-relational data from different sources, i.e., data fusion (also called multi-block, multi-view or multi-set data analysis), often enhances knowledge discovery. For instance, in metabolomics, biological fluids are measured using a variety of analytical techniques such as Liquid Chromatography-Mass Spectrometry and Nuclear Magnetic Resonance Spectroscopy. Data measured using different analytical methods may be complementary and their fusion may help in the identification of chemicals related to certain diseases. Data fusion has proved useful in many fields including social network analysis, collaborative filtering, neuroscience and bioinformatics. In this paper, unlike many studies demonstrating the success of data fusion, we explore the limitations as well as the advantages of data fusion. We formulate data fusion as a coupled matrix and tensor factorization (CMTF) problem, which jointly factorizes multiple data sets in the form of higher-order tensors and matrices by extracting a common latent structure from the shared mode. Using numerical experiments on simulated and real data sets, we assess the performance of coupled analysis compared to the analysis of a single data set in terms of missing data estimation and demonstrate cases where coupled analysis outperforms analysis of a single data set and vice versa.",

author = "{Acar Ataman}, Evrim and Rasmussen, {Morten Arendt} and Francesco Savorani and Tormod N{\ae}s and Rasmus Bro",

year = "2013",

month = nov,

day = "15",

doi = "10.1016/j.chemolab.2013.06.006",

language = "English",

volume = "129",

pages = "53--63",

journal = "Chemometrics and Intelligent Laboratory Systems",

issn = "0169-7439",

publisher = "Elsevier",

}

TY - JOUR

T1 - Understanding data fusion within the framework of coupled matrix and tensor factorizations

AU - Acar Ataman, Evrim

AU - Rasmussen, Morten Arendt

AU - Savorani, Francesco

AU - Næs, Tormod

AU - Bro, Rasmus

PY - 2013/11/15

Y1 - 2013/11/15

AB - Recent technological advances enable us to collect huge amounts of data from multiple sources. Jointly analyzing such multi-relational data from different sources, i.e., data fusion (also called multi-block, multi-view or multi-set data analysis), often enhances knowledge discovery. For instance, in metabolomics, biological fluids are measured using a variety of analytical techniques such as Liquid Chromatography-Mass Spectrometry and Nuclear Magnetic Resonance Spectroscopy. Data measured using different analytical methods may be complementary and their fusion may help in the identification of chemicals related to certain diseases. Data fusion has proved useful in many fields including social network analysis, collaborative filtering, neuroscience and bioinformatics. In this paper, unlike many studies demonstrating the success of data fusion, we explore the limitations as well as the advantages of data fusion. We formulate data fusion as a coupled matrix and tensor factorization (CMTF) problem, which jointly factorizes multiple data sets in the form of higher-order tensors and matrices by extracting a common latent structure from the shared mode. Using numerical experiments on simulated and real data sets, we assess the performance of coupled analysis compared to the analysis of a single data set in terms of missing data estimation and demonstrate cases where coupled analysis outperforms analysis of a single data set and vice versa.

U2 - 10.1016/j.chemolab.2013.06.006

DO - 10.1016/j.chemolab.2013.06.006

M3 - Journal article

SN - 0169-7439

VL - 129

SP - 53

EP - 63

JO - Chemometrics and Intelligent Laboratory Systems

JF - Chemometrics and Intelligent Laboratory Systems

ER -
