Significant subgraph mining with multiple testing correction

Mahito Sugiyama, Felipe Llinares López, Niklas Kasenburg, Karsten M. Borgwardt

18 Citationer (Scopus)

Abstract

The problem of finding itemsets that are statistically significantly enriched in a class of transactions is complicated by the need to correct for multiple hypothesis testing. Pruning untestable hypotheses was recently proposed as a strategy for this task of significant item-set mining. It was shown to lead to greater statistical power, the discovery of more truly significant itemsets, than the standard Bonferroni correction on real-world dataseis. An open question, however, is whether this strategy of excluding untestable hypotheses also leads to greater statistical power in subgraph mining, in which the number of hypotheses is much larger than in item-set mining. Here we answer this question by an empirical investigation on eight popular graph benchmark datasets. We propose a new efficient search strategy, which always returns the same solution as the state-of-the-art approach and is approximately two orders of magnitude faster. Moreover, we exploit the dependence between subgraphs by considering the effective number of tests and thereby further increase the statistical power.

OriginalsprogDansk
TitelProceedings of the 2015 SIAM International Conference on Data Mining
Antal sider9
ForlagSociety for Industrial and Applied Mathematics
Publikationsdato2015
Sider37-45
ISBN (Elektronisk)978-1-61197-401-0
DOI
StatusUdgivet - 2015
BegivenhedSIAM International Conference on Data Mining 2015 - Hotel Vancouver, British Columbia, Canada
Varighed: 30 apr. 20152 maj 2015

Konference

KonferenceSIAM International Conference on Data Mining 2015
LokationHotel Vancouver
Land/OmrådeCanada
ByBritish Columbia
Periode30/04/201502/05/2015

Citationsformater