Query-centric failure recovery for distributed stream processing engines

    Abstract

    Correlated failures, which usually involve a number of nodes failing simultaneously, have a significant effect on systems' availability, especially for streaming applications that require real-time analysis. Most state-of-the-art distributed stream processing engines focus on recovering from individual operator failures. By analyzing the existing recovery techniques, we identify the challenges and propose a fault-tolerance framework that can tolerate both individual and correlated failures with minimum overhead during the system's normal execution. Our progressive and query-centric recovery paradigm carefully schedules the recovery of failed operators based on the current availability of resources, such that the outputs of queries can be recovered as early as possible. We also formulate the new problem of recovery scheduling under correlated failures and design algorithms to optimize the recovery latency with a performance guarantee.

    Original language: English
    Journal: Proceedings - International Conference on Data Engineering
    Volume: 2018
    Pages (from-to): 1280-1283
    Number of pages: 4
    ISSN: 1084-4627
    DOI
    Status: Published - 24 Oct. 2018
    Event: 34th IEEE International Conference on Data Engineering, ICDE 2018 - Paris, France
    Duration: 16 Apr. 2018 to 19 Apr. 2018

    Conference

    Conference: 34th IEEE International Conference on Data Engineering, ICDE 2018
    Country/Territory: France
    City: Paris
    Period: 16/04/2018 to 19/04/2018
