Progressive recovery of correlated failures in distributed stream processing engines

3 Citations (Scopus)

Abstract

Correlated failures in large-scale clusters have significant effects on systems’ availability, especially for streaming data applications that run continuously and require low processing latency. Most state-of-the-art distributed stream processing engines (DSPEs) adopt a blocking recovery paradigm, which, upon correlated failure, would block the progress of recovery until sufficient new resources for recovery are available. As the arrival of new resources is usually progressive, a blocking paradigm fails to minimize the recovery latency. To address this problem, we propose a progressive and query-centric recovery paradigm where the recovery of the failed operators would be carefully scheduled to progressively recover the outputs of queries as early as possible based on the current availability of resources. In this work, we propose and implement a fault-tolerance framework which supports progressive recovery after correlated failures with minimum overhead during the system’s normal execution. We also formulate the new problem of recovery scheduling under correlated failures and design effective algorithms to optimize the recovery latency. The proposed methods are implemented on Apache Storm and preliminary experiments are conducted to verify their validity.

Original language: English
Title of host publication: Advances in Database Technology: Proceedings of the 20th International Conference on Extending Database Technology
Editors: Volker Markl, Salvatore Orlando, Bernhard Mitschang, Periklis Andritsos, Kai-Uwe Sattler, Sebastian Breß
Number of pages: 4
Publisher: OpenProceedings.org
Publication date: 2017
Pages: 518-521
ISBN (Electronic): 978-3-89318-073-8
Publication status: Published - 2017
Externally published: Yes
Event: 20th International Conference on Extending Database Technology - Venice, Italy
Duration: 21 Mar 2017 to 24 Mar 2017
Conference number: 20

Conference

Conference: 20th International Conference on Extending Database Technology
Number: 20
Country/Territory: Italy
City: Venice
Period: 21/03/2017 to 24/03/2017
Series: Advances in Database Technology
Volume: 2017
ISSN: 2367-2005
