Abstract
Resource efficiency is essential for distributed stream processing engines (DSPEs), in which a streaming application is modeled as an operator graph where each operator is parallelized into a number of instances to meet low-latency and high-throughput requirements. The major objectives of optimizing resource efficiency in DSPEs include minimizing the communication cost by collocating tasks that exchange large volumes of data, and dynamically configuring the system according to load variations at runtime. In the current literature, most proposals handle these two optimizations separately, and a shallow integration of these techniques, such as performing the two optimizations one after another, would result in a suboptimal solution. In this paper, we present component-based parallelization (CBP), a new paradigm for optimizing the resource efficiency of DSPEs, which provides a framework for a deeper integration of the two optimizations. In the CBP paradigm, the operators are encapsulated into a set of non-overlapping components, within which operators are parallelized consistently, i.e., using the same partitioning key, so that intra-component communication is eliminated. As the workload changes, each component can be adaptively partitioned into multiple instances, each of which is deployed on a computing node. We build a cost model to capture both the communication cost and the adaptation cost of a CBP plan, and then propose several optimization algorithms. We implement the CBP scheme and the optimization algorithms on top of Apache Storm, and verify their efficiency through an extensive experimental study.
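
The effect of grouping operators into components can be illustrated with a minimal sketch. It is not the cost model from the paper: the `Operator` class, the `communication_cost` function, the operator names, and the data rates are all hypothetical, and the sketch assumes that every tuple crossing a component boundary is transferred over the network while tuples exchanged inside a component are not.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Operator:
    name: str
    partition_key: str  # key used to partition this operator's input


def communication_cost(edges, component_of, operators):
    """Sum the data rates of edges that cross component boundaries.

    Assumption: edges inside a component cost nothing, which only holds
    when both endpoints use the same partitioning key.
    """
    total = 0.0
    for (src, dst), rate in edges.items():
        if component_of[src] == component_of[dst]:
            if operators[src].partition_key != operators[dst].partition_key:
                raise ValueError(
                    f"{src} and {dst} share a component but not a key"
                )
            continue  # intra-component edge: no transfer counted
        total += rate
    return total


# Hypothetical three-operator pipeline.
operators = {
    "parse": Operator("parse", "user_id"),
    "enrich": Operator("enrich", "user_id"),
    "aggregate": Operator("aggregate", "region"),
}
# Hypothetical data rates (tuples per second) on each edge.
edges = {("parse", "enrich"): 100.0, ("enrich", "aggregate"): 40.0}

# Plan A: every operator is its own component.
separate = {"parse": "c1", "enrich": "c2", "aggregate": "c3"}
# Plan B: operators with the same key share a component.
grouped = {"parse": "c1", "enrich": "c1", "aggregate": "c2"}

separate_cost = communication_cost(edges, separate, operators)
cbp_cost = communication_cost(edges, grouped, operators)
print(separate_cost, cbp_cost)
```

Under these assumptions, placing `parse` and `enrich` in one component removes the cost of the edge between them, and only the edge into `aggregate` still counts. The sketch covers communication cost only and does not model the adaptation cost that the abstract also mentions.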
Original language | English |
---|---|
Title | Database Systems for Advanced Applications : 22nd International Conference, DASFAA 2017, Suzhou, China, March 27-30, 2017, Proceedings, Part II |
Editors | Selçuk Candan, Lei Chen, Torben Bach Pedersen, Lijun Chang, Wen Hua |
Number of pages | 17 |
Volume | Part II |
Publisher | Springer |
Publication date | 2017 |
Pages | 304-320 |
ISBN (Print) | 978-3-319-55698-7 |
ISBN (Electronic) | 978-3-319-55699-4 |
DOI | |
Status | Published - 2017 |
Published externally | Yes |
Event | 22nd International Conference on Database Systems for Advanced Applications - Suzhou, China. Duration: 27 Mar 2017 → 30 Mar 2017. Conference number: 22 |
Conference
Conference | 22nd International Conference on Database Systems for Advanced Applications |
---|---|
Number | 22 |
Country/Territory | China |
City | Suzhou |
Period | 27/03/2017 → 30/03/2017 |