TY - GEN
T1 - L1-depth revisited
T2 - European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, ECML-PKDD 2018
AU - Pham, Ninh
PY - 2019
Y1 - 2019
N2 - Angle-based outlier detection (ABOD) has recently emerged as an effective method to detect outliers in high dimensions. Instead of examining neighborhoods as proximity-based concepts do, ABOD assesses the broadness of the angle spectrum of a point as an outlier factor. Despite being a parameter-free and robust measure in high-dimensional space, the exact solution of ABOD suffers from a cubic cost O(n^3) in the data size n, and hence cannot be used on large-scale data sets. In this work we present a conceptual relationship between the ABOD intuition and the L1-depth concept in statistics, one of the earliest methods used for detecting outliers. Deriving from this relationship, we propose to use L1-depth as a variant of angle-based outlier factors, since it requires only quadratic computational time, like proximity-based outlier factors. Empirically, L1-depth is competitive with (and often superior to) proximity-based and other proposed angle-based outlier factors in detecting high-dimensional outliers, regarding both efficiency and accuracy. In order to avoid the quadratic computational time, we introduce a simple but efficient sampling method named SamDepth for estimating the L1-depth measure. We also present theoretical analysis to guarantee the reliability of SamDepth. Empirical experiments on many real-world high-dimensional data sets demonstrate that SamDepth with √n samples often achieves very competitive accuracy and runs several orders of magnitude faster than other proximity-based and ABOD competitors. Data related to this paper are available at: https://www.dropbox.com/s/nk7nqmwmdsatizs/Datasets.zip. Code related to this paper is available at: https://github.com/NinhPham/Outlier.
AB - Angle-based outlier detection (ABOD) has recently emerged as an effective method to detect outliers in high dimensions. Instead of examining neighborhoods as proximity-based concepts do, ABOD assesses the broadness of the angle spectrum of a point as an outlier factor. Despite being a parameter-free and robust measure in high-dimensional space, the exact solution of ABOD suffers from a cubic cost O(n^3) in the data size n, and hence cannot be used on large-scale data sets. In this work we present a conceptual relationship between the ABOD intuition and the L1-depth concept in statistics, one of the earliest methods used for detecting outliers. Deriving from this relationship, we propose to use L1-depth as a variant of angle-based outlier factors, since it requires only quadratic computational time, like proximity-based outlier factors. Empirically, L1-depth is competitive with (and often superior to) proximity-based and other proposed angle-based outlier factors in detecting high-dimensional outliers, regarding both efficiency and accuracy. In order to avoid the quadratic computational time, we introduce a simple but efficient sampling method named SamDepth for estimating the L1-depth measure. We also present theoretical analysis to guarantee the reliability of SamDepth. Empirical experiments on many real-world high-dimensional data sets demonstrate that SamDepth with √n samples often achieves very competitive accuracy and runs several orders of magnitude faster than other proximity-based and ABOD competitors. Data related to this paper are available at: https://www.dropbox.com/s/nk7nqmwmdsatizs/Datasets.zip. Code related to this paper is available at: https://github.com/NinhPham/Outlier.
UR - http://www.scopus.com/inward/record.url?scp=85061153578&partnerID=8YFLogxK
U2 - 10.1007/978-3-030-10925-7_7
DO - 10.1007/978-3-030-10925-7_7
M3 - Article in proceedings
AN - SCOPUS:85061153578
SN - 9783030109240
VL - 1
T3 - Lecture notes in computer science
SP - 105
EP - 121
BT - Machine Learning and Knowledge Discovery in Databases - European Conference, ECML PKDD 2018, Proceedings
A2 - Bonchi, Francesco
A2 - Gärtner, Thomas
A2 - Hurley, Neil
A2 - Ifrim, Georgiana
A2 - Berlingerio, Michele
PB - Springer
Y2 - 10 September 2018 through 14 September 2018
ER -