L1-depth revisited: A robust angle-based outlier factor in high-dimensional space

Ninh Pham*

*Corresponding author for this work
    1 Citation (Scopus)

    Abstract

    Angle-based outlier detection (ABOD) has recently emerged as an effective method for detecting outliers in high dimensions. Instead of examining neighborhoods, as proximity-based concepts do, ABOD uses the broadness of a point's angle spectrum as its outlier factor. ABOD is a parameter-free and robust measure in high-dimensional space. However, its exact solution has a cubic cost, O(n³), in the data size n, so it cannot be used on large-scale data sets. In this work we present a conceptual relationship between the ABOD intuition and the L1-depth concept in statistics, which is one of the earliest methods used for detecting outliers. Building on this relationship, we propose L1-depth as a variant of angle-based outlier factors, since, like proximity-based outlier factors, it requires only quadratic computational time. Empirically, L1-depth is competitive with (and often superior to) proximity-based and other proposed angle-based outlier factors in detecting high-dimensional outliers, in terms of both efficiency and accuracy. To avoid the quadratic computational time, we introduce a simple but efficient sampling method named SamDepth for estimating the L1-depth measure. We also present a theoretical analysis that guarantees the reliability of SamDepth. Experiments on many real-world high-dimensional data sets show that SamDepth with √n samples often achieves very competitive accuracy and runs several orders of magnitude faster than other proximity-based and ABOD competitors. Data related to this paper are available at: https://www.dropbox.com/s/nk7nqmwmdsatizs/Datasets.zip. Code related to this paper is available at: https://github.com/NinhPham/Outlier.
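
    The abstract's link between angles and L1-depth can be illustrated with a short sketch. The sketch assumes the common statistical form of L1-depth for a point p: one minus the norm of the mean of the unit vectors (x - p)/||x - p|| pointing from p to the other points. The squared norm of a sum of m unit vectors equals m plus the sum of all their pairwise cosines. A point whose directions to the rest of the data are all similar (a narrow angle spectrum, in ABOD terms) therefore has a mean unit vector of norm close to 1 and a depth close to 0. The sampled variant only approximates the idea of SamDepth: it assumes uniform sampling of √n points without replacement, and the paper's exact procedure and normalization may differ. The function names `l1_depth`, `exact_scores` and `sampled_scores` are hypothetical and are not taken from the linked repository.

```python
import math
import random
from typing import List, Optional, Sequence

Vector = Sequence[float]


def l1_depth(p: Vector, points: List[Vector]) -> float:
    """Assumed L1-depth of p: 1 - || mean of unit vectors (x - p) / ||x - p|| ||.

    Points that coincide with p (zero distance) are skipped.
    Low depth means the other points all lie in a similar direction from p,
    which suggests that p is an outlier.
    """
    dim = len(p)
    total = [0.0] * dim
    count = 0
    for x in points:
        diff = [xi - pi for xi, pi in zip(x, p)]
        norm = math.sqrt(sum(d * d for d in diff))
        if norm == 0.0:
            continue  # no direction is defined for a coincident point
        for k in range(dim):
            total[k] += diff[k] / norm
        count += 1
    if count == 0:
        return 1.0  # convention in this sketch: no usable neighbours, treat as not outlying
    mean_norm = math.sqrt(sum((t / count) ** 2 for t in total))
    return 1.0 - mean_norm


def exact_scores(points: List[Vector]) -> List[float]:
    """Exact depth of every point against the whole data set: O(n^2 * d) time."""
    return [l1_depth(p, points) for p in points]


def sampled_scores(points: List[Vector], num_samples: Optional[int] = None,
                   seed: int = 0) -> List[float]:
    """Sampling estimate (assumed scheme): score each point against one uniform
    sample of about sqrt(n) points, giving O(n * sqrt(n) * d) time."""
    rng = random.Random(seed)
    n = len(points)
    s = num_samples if num_samples is not None else max(1, math.isqrt(n))
    sample = rng.sample(points, min(s, n))
    return [l1_depth(p, sample) for p in points]


if __name__ == "__main__":
    data = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 0.5), (10.0, 10.0)]
    scores = exact_scores(data)
    # the isolated point (10, 10), at index 5, gets the lowest depth
    print(scores.index(min(scores)))
    print(sampled_scores(data))
```

    In this toy data set, the centre point (0.5, 0.5) has depth 0.8: the unit vectors from it to the four corners cancel out, and only the direction towards (10, 10) remains. All unit vectors from the isolated point point almost the same way, so its depth is close to 0. Using a single shared sample for every point keeps the sketch simple. Drawing a fresh sample per point is an equally plausible alternative.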

    Original language: English
    Title of host publication: Machine Learning and Knowledge Discovery in Databases - European Conference, ECML PKDD 2018, Proceedings
    Editors: Francesco Bonchi, Thomas Gärtner, Neil Hurley, Georgiana Ifrim, Michele Berlingerio
    Number of pages: 17
    Volume: 1
    Publisher: Springer
    Publication date: 2019
    Pages: 105-121
    ISBN (Print): 9783030109240
    DOIs:
    Publication status: Published - 2019
    Event: European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, ECML-PKDD 2018 - Dublin, Ireland
    Duration: 10 Sept 2018 to 14 Sept 2018

    Conference

    Country/Territory: Ireland
    City: Dublin
    Series: Lecture Notes in Computer Science
    Volume: 11051
    ISSN: 0302-9743
