Significant subgraph mining with multiple testing correction

Mahito Sugiyama; Felipe Llinares López; Niklas Kasenburg; Karsten M. Borgwardt

doi:10.1137/1.9781611974010.5


Department of Computer Science

18 Citations (Scopus)

Abstract

The problem of finding itemsets that are statistically significantly enriched in a class of transactions is complicated by the need to correct for multiple hypothesis testing. Pruning untestable hypotheses was recently proposed as a strategy for this task of significant itemset mining. On real-world datasets, it was shown to lead to greater statistical power, that is, the discovery of more truly significant itemsets, than the standard Bonferroni correction. An open question, however, is whether this strategy of excluding untestable hypotheses also leads to greater statistical power in subgraph mining, in which the number of hypotheses is much larger than in itemset mining. Here we answer this question by an empirical investigation on eight popular graph benchmark datasets. We propose a new efficient search strategy, which always returns the same solution as the state-of-the-art approach and is approximately two orders of magnitude faster. Moreover, we exploit the dependence between subgraphs by considering the effective number of tests and thereby further increase the statistical power.
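The idea of pruning untestable hypotheses mentioned in the abstract can be illustrated with a small sketch. It is not the search strategy proposed in the paper; it assumes a one-sided Fisher's exact test and a Tarone-style correction, and the function names `min_p_value` and `tarone_threshold` are hypothetical. A pattern that occurs in only a few transactions has a minimum attainable p-value that can never fall below the corrected threshold, so it need not be counted as a test.

```python
from math import comb


def min_p_value(x, n1, n):
    """Smallest one-sided Fisher p-value attainable by a pattern with support x.

    n is the total number of transactions and n1 the size of the class of
    interest. The minimum is reached by the most extreme table, in which as
    many of the x occurrences as possible fall inside the class.
    """
    a = min(x, n1)  # occurrences inside the class in the most extreme table
    return comb(n1, a) * comb(n - n1, x - a) / comb(n, x)


def tarone_threshold(supports, n1, n, alpha=0.05):
    """Return (threshold, k): the smallest k such that at most k patterns
    are testable at level alpha / k, together with that level."""
    psi = [min_p_value(x, n1, n) for x in supports]
    k = 1
    while True:
        testable = sum(1 for p in psi if p <= alpha / k)
        if testable <= k:
            return alpha / k, k
        k += 1  # terminates: testable never exceeds len(psi)


# Toy data (made up for illustration): six patterns, 20 transactions,
# 10 of them in the class of interest.
threshold, k = tarone_threshold([1, 2, 3, 5, 8, 10], n1=10, n=20)
```

In this toy setting the correction factor `k` is 3, whereas a Bonferroni correction over all six patterns would divide `alpha` by 6, which is the source of the gain in statistical power.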

Original language	Danish
Title	Proceedings of the 2015 SIAM International Conference on Data Mining
Number of pages	9
Publisher	Society for Industrial and Applied Mathematics
Publication date	2015
Pages	37-45
ISBN (Electronic)	978-1-61197-401-0
DOI	https://doi.org/10.1137/1.9781611974010.5
Status	Published - 2015
Event	SIAM International Conference on Data Mining 2015 - Hotel Vancouver, British Columbia, Canada. Duration: 30 Apr 2015 → 2 May 2015

Conference

Conference	SIAM International Conference on Data Mining 2015
Location	Hotel Vancouver
Country/Territory	Canada
City	British Columbia
Period	30/04/2015 → 02/05/2015

Access to the document

10.1137/1.9781611974010.5

Significant Subgraph Mining with Multiple Testing Correction (publisher's published version, 200 KB)

Citation formats

Significant subgraph mining with multiple testing correction. / Sugiyama, Mahito; López, Felipe Llinares; Kasenburg, Niklas et al.

Proceedings of the 2015 SIAM International Conference on Data Mining. Society for Industrial and Applied Mathematics, 2015. pp. 37-45.

Publication: Contribution to book/anthology/report › Conference contribution in proceedings › Research › peer-reviewed

Sugiyama, M, López, FL, Kasenburg, N & Borgwardt, KM 2015, Significant subgraph mining with multiple testing correction. in Proceedings of the 2015 SIAM International Conference on Data Mining. Society for Industrial and Applied Mathematics, pp. 37-45, SIAM International Conference on Data Mining 2015, British Columbia, Canada, 30/04/2015. https://doi.org/10.1137/1.9781611974010.5

@inproceedings{f3375ad8cde944e69369d3ee1c52733e,

title = "Significant subgraph mining with multiple testing correction",

abstract = "The problem of finding itemsets that are statistically significantly enriched in a class of transactions is complicated by the need to correct for multiple hypothesis testing. Pruning untestable hypotheses was recently proposed as a strategy for this task of significant itemset mining. On real-world datasets, it was shown to lead to greater statistical power, that is, the discovery of more truly significant itemsets, than the standard Bonferroni correction. An open question, however, is whether this strategy of excluding untestable hypotheses also leads to greater statistical power in subgraph mining, in which the number of hypotheses is much larger than in itemset mining. Here we answer this question by an empirical investigation on eight popular graph benchmark datasets. We propose a new efficient search strategy, which always returns the same solution as the state-of-the-art approach and is approximately two orders of magnitude faster. Moreover, we exploit the dependence between subgraphs by considering the effective number of tests and thereby further increase the statistical power.",

author = "Mahito Sugiyama and L{\'o}pez, {Felipe Llinares} and Niklas Kasenburg and Borgwardt, {Karsten M.}",

year = "2015",

doi = "10.1137/1.9781611974010.5",

language = "Danish",

pages = "37--45",

booktitle = "Proceedings of the 2015 SIAM International Conference on Data Mining",

publisher = "Society for Industrial and Applied Mathematics",

address = "USA",

note = "SIAM International Conference on Data Mining 2015 ; Conference date: 30-04-2015 Through 02-05-2015",

}

TY - GEN

T1 - Significant subgraph mining with multiple testing correction

AU - Sugiyama, Mahito

AU - López, Felipe Llinares

AU - Kasenburg, Niklas

AU - Borgwardt, Karsten M.

PY - 2015

Y1 - 2015

AB - The problem of finding itemsets that are statistically significantly enriched in a class of transactions is complicated by the need to correct for multiple hypothesis testing. Pruning untestable hypotheses was recently proposed as a strategy for this task of significant itemset mining. On real-world datasets, it was shown to lead to greater statistical power, that is, the discovery of more truly significant itemsets, than the standard Bonferroni correction. An open question, however, is whether this strategy of excluding untestable hypotheses also leads to greater statistical power in subgraph mining, in which the number of hypotheses is much larger than in itemset mining. Here we answer this question by an empirical investigation on eight popular graph benchmark datasets. We propose a new efficient search strategy, which always returns the same solution as the state-of-the-art approach and is approximately two orders of magnitude faster. Moreover, we exploit the dependence between subgraphs by considering the effective number of tests and thereby further increase the statistical power.

U2 - 10.1137/1.9781611974010.5

DO - 10.1137/1.9781611974010.5

M3 - Conference contribution in proceedings

SP - 37

EP - 45

BT - Proceedings of the 2015 SIAM International Conference on Data Mining

PB - Society for Industrial and Applied Mathematics

T2 - SIAM International Conference on Data Mining 2015

Y2 - 30 April 2015 through 2 May 2015

ER -