Entropy as a Measure of Log Variability

Christoffer Olling Back; Søren Debois; Tijs Slaats

doi:10.1007/s13740-019-00105-3

Corresponding author: Christoffer Olling Back

1 citation (Scopus)

Abstract

Process mining algorithms fall in two classes: imperative miners output flow diagrams, showing all possible paths, whereas declarative miners output constraints, showing the rules governing a process. But given a log, how do we know which of the two to apply? Assuming that logs exhibiting a large degree of variability are more suited for declarative miners, we can attempt to answer this question by defining a suitable measure of the variability of the log. This paper reports on an exploratory study into the use of entropy measures as metrics of variability. We survey notions of entropy used, e.g. in physics; we propose variant notions likely more suitable for the field of process mining; we provide an implementation of every entropy notion discussed; and we report entropy measures for a collection of both synthetic and real-life logs. Finally, based on anecdotal indications of which logs are better suited for declarative/imperative mining, we identify the most promising measures for future studies. For estimating overall entropy, global block and k-nearest neighbour estimators of entropy appear most promising and excel at identifying noise in logs. For estimating entropy rate we identify Lempel–Ziv and certain variants of k-block estimators performing well, and note that the former is more stable, but sensitive to noise, while the latter is less stable, being sensitive to cut-off constraints determining block size.
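To make two of the estimator families named in the abstract more concrete, here is a minimal Python sketch of textbook entropy-rate estimators. The first is the k-block (conditional block entropy) estimate h_k = H_k - H_{k-1}. The second is the Lempel–Ziv (LZ78 phrase-counting) estimate c log2(c) / n. These are standard information-theoretic definitions, not the paper's own implementation. Several details are assumptions of this sketch: a log is modelled as a list of traces; blocks are counted only inside traces; and traces are joined with a hypothetical "|" end-of-trace marker before the Lempel–Ziv estimate is applied. The function names are likewise hypothetical.

```python
import math
from collections import Counter
from typing import Hashable, List, Sequence


def block_entropy(traces: Sequence[Sequence[Hashable]], k: int) -> float:
    """Shannon entropy H_k (bits) of the empirical distribution of length-k blocks.

    Blocks are taken inside each trace only, so no block spans two traces.
    """
    counts = Counter(
        tuple(trace[i:i + k])
        for trace in traces
        for i in range(len(trace) - k + 1)
    )
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def block_entropy_rate(traces: Sequence[Sequence[Hashable]], k: int) -> float:
    """k-block entropy-rate estimate h_k = H_k - H_{k-1}, in bits per event (k >= 1)."""
    if k < 1:
        raise ValueError("k must be at least 1")
    return block_entropy(traces, k) - block_entropy(traces, k - 1)


def flatten_log(traces: Sequence[Sequence[Hashable]], end_marker: Hashable = "|") -> List[Hashable]:
    """Concatenate traces into one symbol stream, appending a hypothetical end-of-trace marker."""
    stream: List[Hashable] = []
    for trace in traces:
        stream.extend(trace)
        stream.append(end_marker)
    return stream


def lz78_entropy_rate(sequence: Sequence[Hashable]) -> float:
    """Lempel-Ziv estimate c * log2(c) / n bits per symbol, where c counts LZ78 phrases."""
    n = len(sequence)
    if n == 0:
        return 0.0
    seen = set()
    phrase: tuple = ()
    c = 0
    for symbol in sequence:
        phrase = phrase + (symbol,)
        if phrase not in seen:  # shortest phrase not parsed before: close it
            seen.add(phrase)
            c += 1
            phrase = ()
    if phrase:  # an incomplete final phrase still counts as one phrase
        c += 1
    return c * math.log2(c) / n


if __name__ == "__main__":
    log = [["a", "b", "c"], ["a", "c", "b"], ["a", "b", "c"]]
    for k in (1, 2):
        print(k, block_entropy(log, k), block_entropy_rate(log, k))
    print(lz78_entropy_rate(flatten_log(log)))
```

Both estimators are only asymptotically consistent. On short logs, the k-block value depends on the chosen cut-off k. The Lempel–Ziv value depends on the log length and on how trace boundaries are encoded.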

Original language	English
Journal	Journal on Data Semantics
Volume	8
Issue number	2
Pages (from-to)	129–156
Number of pages	28
ISSN	1861-2032
DOI	https://doi.org/10.1007/s13740-019-00105-3
Status	Published: 1 June 2019

Keywords

Faculty of Science

Access to document

10.1007/s13740-019-00105-3

Citation formats

@article{6138a20588854e0182f4aa3595788ac6,
title = "Entropy as a Measure of Log Variability",
abstract = "Process mining algorithms fall in two classes: imperative miners output flow diagrams, showing all possible paths, whereas declarative miners output constraints, showing the rules governing a process. But given a log, how do we know which of the two to apply? Assuming that logs exhibiting a large degree of variability are more suited for declarative miners, we can attempt to answer this question by defining a suitable measure of the variability of the log. This paper reports on an exploratory study into the use of entropy measures as metrics of variability. We survey notions of entropy used, e.g. in physics; we propose variant notions likely more suitable for the field of process mining; we provide an implementation of every entropy notion discussed; and we report entropy measures for a collection of both synthetic and real-life logs. Finally, based on anecdotal indications of which logs are better suited for declarative/imperative mining, we identify the most promising measures for future studies. For estimating overall entropy, global block and k-nearest neighbour estimators of entropy appear most promising and excel at identifying noise in logs. For estimating entropy rate we identify Lempel–Ziv and certain variants of k-block estimators performing well, and note that the former is more stable, but sensitive to noise, while the latter is less stable, being sensitive to cut-off constraints determining block size.",
keywords = "Faculty of Science, Process Mining, Hybrid Models, Process Variability, Process Flexibility, Information Theory, Entropy, Knowledge Work",
author = "Back, {Christoffer Olling} and S{\o}ren Debois and Tijs Slaats",
year = "2019",
month = jun,
day = "1",
doi = "10.1007/s13740-019-00105-3",
language = "English",
volume = "8",
pages = "129--156",
journal = "Journal on Data Semantics",
issn = "1861-2032",
publisher = "Springer Verlag (Germany)",
number = "2",
}

TY  - JOUR
T1  - Entropy as a Measure of Log Variability
AU  - Back, Christoffer Olling
AU  - Debois, Søren
AU  - Slaats, Tijs
PY  - 2019/6/1
Y1  - 2019/6/1
N2  - Process mining algorithms fall in two classes: imperative miners output flow diagrams, showing all possible paths, whereas declarative miners output constraints, showing the rules governing a process. But given a log, how do we know which of the two to apply? Assuming that logs exhibiting a large degree of variability are more suited for declarative miners, we can attempt to answer this question by defining a suitable measure of the variability of the log. This paper reports on an exploratory study into the use of entropy measures as metrics of variability. We survey notions of entropy used, e.g. in physics; we propose variant notions likely more suitable for the field of process mining; we provide an implementation of every entropy notion discussed; and we report entropy measures for a collection of both synthetic and real-life logs. Finally, based on anecdotal indications of which logs are better suited for declarative/imperative mining, we identify the most promising measures for future studies. For estimating overall entropy, global block and k-nearest neighbour estimators of entropy appear most promising and excel at identifying noise in logs. For estimating entropy rate we identify Lempel–Ziv and certain variants of k-block estimators performing well, and note that the former is more stable, but sensitive to noise, while the latter is less stable, being sensitive to cut-off constraints determining block size.
AB  - Process mining algorithms fall in two classes: imperative miners output flow diagrams, showing all possible paths, whereas declarative miners output constraints, showing the rules governing a process. But given a log, how do we know which of the two to apply? Assuming that logs exhibiting a large degree of variability are more suited for declarative miners, we can attempt to answer this question by defining a suitable measure of the variability of the log. This paper reports on an exploratory study into the use of entropy measures as metrics of variability. We survey notions of entropy used, e.g. in physics; we propose variant notions likely more suitable for the field of process mining; we provide an implementation of every entropy notion discussed; and we report entropy measures for a collection of both synthetic and real-life logs. Finally, based on anecdotal indications of which logs are better suited for declarative/imperative mining, we identify the most promising measures for future studies. For estimating overall entropy, global block and k-nearest neighbour estimators of entropy appear most promising and excel at identifying noise in logs. For estimating entropy rate we identify Lempel–Ziv and certain variants of k-block estimators performing well, and note that the former is more stable, but sensitive to noise, while the latter is less stable, being sensitive to cut-off constraints determining block size.
KW  - Faculty of Science
KW  - Process Mining
KW  - Hybrid Models
KW  - Process Variability
KW  - Process Flexibility
KW  - Information Theory
KW  - Entropy
KW  - Knowledge Work
U2  - 10.1007/s13740-019-00105-3
DO  - 10.1007/s13740-019-00105-3
M3  - Journal article
SN  - 1861-2032
VL  - 8
SP  - 129
EP  - 156
JO  - Journal on Data Semantics
JF  - Journal on Data Semantics
IS  - 2
ER  -
