Abstract
The problem of finding itemsets that are statistically significantly enriched in a class of transactions is complicated by the need to correct for multiple hypothesis testing. Pruning untestable hypotheses was recently proposed as a strategy for this task of significant itemset mining. On real-world datasets, it was shown to yield greater statistical power, that is, the discovery of more truly significant itemsets, than the standard Bonferroni correction. An open question, however, is whether this strategy of excluding untestable hypotheses also leads to greater statistical power in subgraph mining, in which the number of hypotheses is much larger than in itemset mining. Here we answer this question by an empirical investigation on eight popular graph benchmark datasets. We propose a new efficient search strategy, which always returns the same solution as the state-of-the-art approach and is approximately two orders of magnitude faster. Moreover, we exploit the dependence between subgraphs by considering the effective number of tests and thereby further increase the statistical power.
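To make the idea of pruning untestable hypotheses concrete, the following is a minimal sketch rather than the search strategy proposed in the paper. It assumes a one-sided Fisher's exact test for enrichment and uses the testability criterion usually attributed to Tarone: a pattern whose smallest attainable p-value exceeds the corrected threshold can never be significant, so it need not count towards the correction factor. The function names `min_attainable_p` and `tarone_factor` and the toy support counts are hypothetical and chosen only for illustration.

```python
from math import comb


def min_attainable_p(x, n, n1):
    """Smallest one-sided Fisher p-value for a pattern with support x.

    x  : number of transactions containing the pattern
    n  : total number of transactions
    n1 : number of transactions in the class of interest
    The minimum is reached when as many occurrences as possible
    fall into the class of interest (assumed test: enrichment only).
    """
    a = min(x, n1)  # most extreme count inside the class
    return comb(n1, a) * comb(n - n1, x - a) / comb(n, x)


def tarone_factor(supports, n, n1, alpha=0.05):
    """Smallest k such that at most k patterns are testable at alpha / k."""
    p_mins = [min_attainable_p(x, n, n1) for x in supports]
    k = 1
    while True:
        # A pattern is testable at level alpha / k only if its
        # minimum attainable p-value can fall below that level.
        testable = sum(p <= alpha / k for p in p_mins)
        if testable <= k:
            return k
        k += 1


# Toy example: seven candidate patterns in 20 transactions, 10 per class.
supports = [1, 1, 2, 2, 5, 8, 10]
k = tarone_factor(supports, n=20, n1=10)
print("Bonferroni factor:", len(supports), "Tarone factor:", k)
```

In this toy setting the four rare patterns can never reach significance, so the correction factor is smaller than the Bonferroni factor and the per-test threshold `alpha / k` is less strict. In subgraph mining the candidate set cannot be enumerated up front like this, which is why an efficient search strategy matters.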
Original language | Danish |
---|---|
Title | Proceedings of the 2015 SIAM International Conference on Data Mining |
Number of pages | 9 |
Publisher | Society for Industrial and Applied Mathematics |
Publication date | 2015 |
Pages | 37-45 |
ISBN (Electronic) | 978-1-61197-401-0 |
DOI | |
Status | Published - 2015 |
Event | SIAM International Conference on Data Mining 2015 - Hotel Vancouver, British Columbia, Canada Duration: 30 Apr 2015 → 2 May 2015 |
Conference
Conference | SIAM International Conference on Data Mining 2015 |
---|---|
Location | Hotel Vancouver |
Country/Territory | Canada |
City | British Columbia |
Period | 30/04/2015 → 02/05/2015 |