Link prediction in heterogeneous data via generalized coupled tensor factorization

Beyza Ermis; Evrim Acar Ataman; A. Taylan Cemgil

doi:10.1007/s10618-013-0341-y

Food Analytics and Biotechnology

75 citations (Scopus)

Abstract

This study deals with missing link prediction, the problem of predicting the existence of missing connections between entities of interest. We approach the problem as filling in missing entries in a relational dataset represented by several matrices and multiway arrays, which we will simply call tensors. Consequently, we address link prediction through data fusion, formulated as the simultaneous factorization of several observation tensors whose latent factors are shared across the observations. Previous studies on joint factorization of such heterogeneous datasets have focused on a single loss function (mainly squared Euclidean distance or Kullback–Leibler divergence) and on specific tensor factorization models (CANDECOMP/PARAFAC and/or Tucker). In this paper, by contrast, we use the generalized coupled tensor factorization framework to study various alternative tensor models and loss functions, including those already studied in the literature. Through extensive experiments on two real-world datasets, we demonstrate that (i) joint analysis of data from multiple sources via coupled factorization significantly improves link prediction performance, (ii) selecting a suitable loss function and tensor factorization model is crucial for accurate missing link prediction, and loss functions that have not previously been studied for link prediction may outperform the commonly used ones, (iii) joint factorization of datasets can handle difficult cases such as the cold start problem, which arises when a new entity enters the dataset, and (iv) our approach scales to large datasets.
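The abstract frames link prediction as filling in missing entries of several tensors whose factorizations share latent factors. Below is a minimal sketch of that idea under assumptions that are not taken from the paper:

- One third-order tensor `X` with a CP model is coupled to one matrix `Y` through a shared first-mode factor `A`.
- The loss is squared Euclidean distance restricted to observed entries, marked by the mask `W`.
- The factors are fitted by plain gradient descent, not by the generalized coupled tensor factorization (GCTF) procedure used in the paper.

All function names (`fit_coupled`, `coupled_loss`, `cp_entry`, `mf_entry`, `random_factor`) are hypothetical, and the data are synthetic.

```python
import random

# Hypothetical sketch: one CP-modelled tensor X (I x J x K) coupled to one
# matrix Y (I x M) through a shared factor A. Not the GCTF algorithm.


def random_factor(rows, rank, rng):
    """Random positive factor matrix stored as a list of rows."""
    return [[rng.uniform(0.1, 1.0) for _ in range(rank)] for _ in range(rows)]


def cp_entry(A, B, C, i, j, k):
    """CP model entry: sum over r of A[i][r] * B[j][r] * C[k][r]."""
    return sum(a * b * c for a, b, c in zip(A[i], B[j], C[k]))


def mf_entry(A, V, i, m):
    """Matrix model entry: sum over r of A[i][r] * V[m][r]."""
    return sum(a * v for a, v in zip(A[i], V[m]))


def coupled_loss(X, W, Y, A, B, C, V):
    """Squared Euclidean loss on observed tensor entries (W == 1) plus matrix loss."""
    loss = 0.0
    for i, plane in enumerate(X):
        for j, fiber in enumerate(plane):
            for k, x in enumerate(fiber):
                if W[i][j][k]:
                    loss += (x - cp_entry(A, B, C, i, j, k)) ** 2
    for i, row in enumerate(Y):
        for m, y in enumerate(row):
            loss += (y - mf_entry(A, V, i, m)) ** 2
    return loss


def fit_coupled(X, W, Y, rank, steps=2000, lr=0.01, seed=0):
    """Fit A, B, C, V by full-batch gradient descent on coupled_loss."""
    rng = random.Random(seed)
    I, J, K, M = len(X), len(X[0]), len(X[0][0]), len(Y[0])
    A = random_factor(I, rank, rng)  # shared between X and Y
    B = random_factor(J, rank, rng)
    C = random_factor(K, rank, rng)
    V = random_factor(M, rank, rng)
    for _ in range(steps):
        gA = [[0.0] * rank for _ in range(I)]
        gB = [[0.0] * rank for _ in range(J)]
        gC = [[0.0] * rank for _ in range(K)]
        gV = [[0.0] * rank for _ in range(M)]
        for i in range(I):
            for j in range(J):
                for k in range(K):
                    if not W[i][j][k]:
                        continue  # missing entry: excluded from the loss
                    e = cp_entry(A, B, C, i, j, k) - X[i][j][k]
                    for r in range(rank):
                        gA[i][r] += 2 * e * B[j][r] * C[k][r]
                        gB[j][r] += 2 * e * A[i][r] * C[k][r]
                        gC[k][r] += 2 * e * A[i][r] * B[j][r]
        for i in range(I):
            for m in range(M):
                e = mf_entry(A, V, i, m) - Y[i][m]
                for r in range(rank):
                    gA[i][r] += 2 * e * V[m][r]  # coupling: A also fits Y
                    gV[m][r] += 2 * e * A[i][r]
        # Apply all updates only after every gradient has been computed.
        for F, G in ((A, gA), (B, gB), (C, gC), (V, gV)):
            for row, grow in zip(F, G):
                for r in range(rank):
                    row[r] -= lr * grow[r]
    return A, B, C, V


# Synthetic data generated from known rank-2 factors (illustration only).
rng = random.Random(42)
I, J, K, M, R = 4, 3, 2, 3, 2
A0, B0, C0 = random_factor(I, R, rng), random_factor(J, R, rng), random_factor(K, R, rng)
V0 = random_factor(M, R, rng)
X = [[[cp_entry(A0, B0, C0, i, j, k) for k in range(K)] for j in range(J)] for i in range(I)]
Y = [[mf_entry(A0, V0, i, m) for m in range(M)] for i in range(I)]
W = [[[1] * K for _ in range(J)] for _ in range(I)]
W[0][1][0] = 0  # treat two tensor entries as missing links
W[2][2][1] = 0

init = fit_coupled(X, W, Y, rank=R, steps=0)
fitted = fit_coupled(X, W, Y, rank=R)
loss_before = coupled_loss(X, W, Y, *init)
loss_after = coupled_loss(X, W, Y, *fitted)
prediction = cp_entry(*fitted[:3], 0, 1, 0)  # score for a hidden entry
print(f"loss {loss_before:.4f} -> {loss_after:.4f}")
print(f"hidden X[0][1][0] = {X[0][1][0]:.3f}, predicted = {prediction:.3f}")
```

The masked entries of `W` stand in for missing links. After fitting, evaluating `cp_entry` at a masked index gives the predicted score. Because `A` receives gradient from both `X` and `Y`, the matrix can inform the tensor's missing entries, which is the coupling idea in miniature. Switching to another loss (for example Kullback–Leibler divergence) or another model (for example Tucker) would change the gradient terms. The paper handles those choices within the GCTF framework, which this sketch does not implement.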

Original language	English
Journal	Data Mining and Knowledge Discovery
Volume	29
Issue number	1
Pages (from-to)	203-236
Number of pages	34
ISSN	1384-5810
DOI	https://doi.org/10.1007/s10618-013-0341-y
Status	Published - Jan 2013

Citation formats

@article{f6f1fb293c7444efb4140633b8ff058d,
title = "Link prediction in heterogeneous data via generalized coupled tensor factorization",
abstract = "This study deals with missing link prediction, the problem of predicting the existence of missing connections between entities of interest. We approach the problem as filling in missing entries in a relational dataset represented by several matrices and multiway arrays, that will be simply called tensors. Consequently, we address the link prediction problem by data fusion formulated as simultaneous factorization of several observation tensors where latent factors are shared among each observation. Previous studies on joint factorization of such heterogeneous datasets have focused on a single loss function (mainly squared Euclidean distance or Kullback–Leibler-divergence) and specific tensor factorization models (CANDECOMP/PARAFAC and/or Tucker). However, in this paper, we study various alternative tensor models as well as loss functions including the ones already studied in the literature using the generalized coupled tensor factorization framework. Through extensive experiments on two real-world datasets, we demonstrate that (i) joint analysis of data from multiple sources via coupled factorization significantly improves the link prediction performance, (ii) selection of a suitable loss function and a tensor factorization model is crucial for accurate missing link prediction and loss functions that have not been studied for link prediction before may outperform the commonly-used loss functions, (iii) joint factorization of datasets can handle difficult cases, such as the cold start problem that arises when a new entity enters the dataset, and (iv) our approach is scalable to large-scale data.",
author = "Ermis, Beyza and {Acar Ataman}, Evrim and Cemgil, {A. Taylan}",
year = "2013",
month = jan,
doi = "10.1007/s10618-013-0341-y",
language = "English",
volume = "29",
pages = "203--236",
journal = "Data Mining and Knowledge Discovery",
issn = "1384-5810",
publisher = "Springer",
number = "1",
}

TY  - JOUR
T1  - Link prediction in heterogeneous data via generalized coupled tensor factorization
AU  - Ermis, Beyza
AU  - Acar Ataman, Evrim
AU  - Cemgil, A. Taylan
PY  - 2013/1
Y1  - 2013/1
N2  - This study deals with missing link prediction, the problem of predicting the existence of missing connections between entities of interest. We approach the problem as filling in missing entries in a relational dataset represented by several matrices and multiway arrays, that will be simply called tensors. Consequently, we address the link prediction problem by data fusion formulated as simultaneous factorization of several observation tensors where latent factors are shared among each observation. Previous studies on joint factorization of such heterogeneous datasets have focused on a single loss function (mainly squared Euclidean distance or Kullback–Leibler-divergence) and specific tensor factorization models (CANDECOMP/PARAFAC and/or Tucker). However, in this paper, we study various alternative tensor models as well as loss functions including the ones already studied in the literature using the generalized coupled tensor factorization framework. Through extensive experiments on two real-world datasets, we demonstrate that (i) joint analysis of data from multiple sources via coupled factorization significantly improves the link prediction performance, (ii) selection of a suitable loss function and a tensor factorization model is crucial for accurate missing link prediction and loss functions that have not been studied for link prediction before may outperform the commonly-used loss functions, (iii) joint factorization of datasets can handle difficult cases, such as the cold start problem that arises when a new entity enters the dataset, and (iv) our approach is scalable to large-scale data.
AB  - This study deals with missing link prediction, the problem of predicting the existence of missing connections between entities of interest. We approach the problem as filling in missing entries in a relational dataset represented by several matrices and multiway arrays, that will be simply called tensors. Consequently, we address the link prediction problem by data fusion formulated as simultaneous factorization of several observation tensors where latent factors are shared among each observation. Previous studies on joint factorization of such heterogeneous datasets have focused on a single loss function (mainly squared Euclidean distance or Kullback–Leibler-divergence) and specific tensor factorization models (CANDECOMP/PARAFAC and/or Tucker). However, in this paper, we study various alternative tensor models as well as loss functions including the ones already studied in the literature using the generalized coupled tensor factorization framework. Through extensive experiments on two real-world datasets, we demonstrate that (i) joint analysis of data from multiple sources via coupled factorization significantly improves the link prediction performance, (ii) selection of a suitable loss function and a tensor factorization model is crucial for accurate missing link prediction and loss functions that have not been studied for link prediction before may outperform the commonly-used loss functions, (iii) joint factorization of datasets can handle difficult cases, such as the cold start problem that arises when a new entity enters the dataset, and (iv) our approach is scalable to large-scale data.
U2  - 10.1007/s10618-013-0341-y
DO  - 10.1007/s10618-013-0341-y
M3  - Journal article
SN  - 1384-5810
VL  - 29
SP  - 203
EP  - 236
JO  - Data Mining and Knowledge Discovery
JF  - Data Mining and Knowledge Discovery
IS  - 1
ER  -