CBP: a new parallelization paradigm for massively distributed stream processing

Qingsong Guo, Yongluan Zhou

Abstract

Resource efficiency is essential for distributed stream processing engines (DSPEs), in which a streaming application is modeled as an operator graph and each operator is parallelized into a number of instances to meet low-latency and high-throughput requirements. The major objectives of optimizing resource efficiency in DSPEs include minimizing the communication cost by collocating tasks that exchange large volumes of data, and dynamically configuring the system according to load variations at runtime. In the current literature, most proposals handle these two optimizations separately, and a shallow integration of these techniques, such as performing the two optimizations one after another, would result in a suboptimal solution. In this paper, we present component-based parallelization (CBP), a new paradigm for optimizing the resource efficiency of DSPEs that provides a framework for a deeper integration of the two optimizations. In the CBP paradigm, the operators are encapsulated into a set of non-overlapping components. Within each component, operators are parallelized consistently, i.e., using the same partitioning key, and hence intra-component communication is eliminated. As the workload changes, each component can be adaptively partitioned into multiple instances, each of which is deployed on a computing node. We build a cost model to capture both the communication cost and the adaptation cost of a CBP plan, and then propose several optimization algorithms. We implement the CBP scheme and the optimization algorithms on top of Apache Storm, and verify their efficiency through an extensive experimental study.
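
To make the component idea in the abstract concrete, the following is a minimal Python sketch, not the paper's algorithm. It assumes a linear chain of operators, groups consecutive operators that share a partitioning key into one component, and counts only the data rate on edges that cross component boundaries. The names Operator, build_components and cross_component_cost, as well as the operator names, keys and rates, are all hypothetical and chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Operator:
    name: str
    key: str  # partitioning key used to parallelize this operator (assumed)


def build_components(chain):
    """Group consecutive operators that share a partitioning key."""
    components = []
    for op in chain:
        if components and components[-1][-1].key == op.key:
            components[-1].append(op)   # same key: stays in the current component
        else:
            components.append([op])     # key changes: start a new component
    return components


def cross_component_cost(rates, components):
    """Sum the rates of edges whose endpoints lie in different components."""
    owner = {op.name: i for i, comp in enumerate(components) for op in comp}
    return sum(rate for (src, dst), rate in rates.items()
               if owner[src] != owner[dst])


# Hypothetical three-operator pipeline with made-up tuple rates per edge.
chain = [
    Operator("parse", "user_id"),
    Operator("enrich", "user_id"),
    Operator("count", "region"),
]
rates = {("parse", "enrich"): 1000, ("enrich", "count"): 200}

components = build_components(chain)
cost = cross_component_cost(rates, components)
```

In this toy input, parse and enrich share a key and fall into one component, so only the enrich-to-count edge contributes to the cost. The sketch ignores the adaptation cost and the general operator-graph case that the abstract's cost model is said to cover.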

Original language: English
Title of host publication: Database Systems for Advanced Applications: 22nd International Conference, DASFAA 2017, Suzhou, China, March 27-30, 2017, Proceedings, Part II
Editors: Selçuk Candan, Lei Chen, Torben Bach Pedersen, Lijun Chang, Wen Hua
Number of pages: 17
Volume: Part II
Publisher: Springer
Publication date: 2017
Pages: 304-320
ISBN (Print): 978-3-319-55698-7
ISBN (Electronic): 978-3-319-55699-4
Publication status: Published - 2017
Externally published: Yes
Event: 22nd International Conference on Database Systems for Advanced Applications - Suzhou, China
Duration: 27 Mar 2017 to 30 Mar 2017
Conference number: 22

Conference

Conference: 22nd International Conference on Database Systems for Advanced Applications
Number: 22
Country/Territory: China
City: Suzhou
Period: 27/03/2017 to 30/03/2017
