Towards Low-Latency Batched Stream Processing by Pre-Scheduling

Hai Jin, Fei Chen, Song Wu, Yin Yao, Zhiyi Liu, Lin Gu, Yongluan Zhou


    Abstract

    Many stream processing frameworks have been developed to meet the requirements of real-time processing. Among them, batched stream processing frameworks are widely advocated for their fault tolerance, high throughput, and unified runtime with batch processing. In batched stream processing frameworks, stragglers, which arise from uneven task execution times, have been regarded as a major hurdle for latency-sensitive applications. Existing straggler mitigation techniques, whether reactive or proactive, are all post-scheduling methods and therefore inevitably incur either high resource overhead or long job completion times. We observe that batched stream processing jobs are usually recurring, with predictable characteristics. Exploiting this observation, we present Lever, a pre-scheduling straggler mitigation framework. Lever first identifies potential stragglers and evaluates each node's capacity by analyzing execution information from historical jobs. Lever then carefully pre-schedules job input data to each node before task scheduling so as to mitigate potential stragglers. We implement Lever and contribute it as an extension of Apache Spark Streaming. Our experimental results show that Lever reduces job completion time by 30.72 to 42.19 percent compared with Spark Streaming, a widely adopted batched stream processing system, and significantly outperforms traditional techniques.
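    The abstract describes two steps: estimating node capacity from historical runs of a recurring job, then assigning input data to nodes before task scheduling. The Python sketch below illustrates that general idea under simplifying assumptions that are ours, not the paper's: a node's capacity is taken to be its historical throughput (records per second), a potential straggler is a node whose predicted time under an even split exceeds an assumed `slack` factor times the median, and input is split in proportion to capacity. The names `NodeHistory`, `estimate_capacity`, `flag_potential_stragglers` and `pre_assign` are hypothetical; this is not Lever's actual algorithm and not part of the Spark Streaming API.

```python
import statistics
from dataclasses import dataclass


@dataclass
class NodeHistory:
    """Hypothetical per-node summary of earlier runs of a recurring job."""
    node: str
    records_processed: int  # input records this node handled in past batches
    busy_seconds: float     # total task execution time spent on those records


def estimate_capacity(history):
    """Estimate each node's throughput (records/second) from historical runs."""
    return {h.node: h.records_processed / h.busy_seconds
            for h in history if h.busy_seconds > 0}


def flag_potential_stragglers(total_records, capacity, slack=1.5):
    """Return nodes whose predicted time under an even split exceeds
    `slack` times the median predicted time (slack is an assumed threshold)."""
    per_node = total_records / len(capacity)
    times = {n: per_node / c for n, c in capacity.items()}
    median = statistics.median(times.values())
    return sorted(n for n, t in times.items() if t > slack * median)


def pre_assign(total_records, capacity):
    """Split a batch's input across nodes in proportion to estimated capacity,
    so that predicted finishing times are roughly equal."""
    if not capacity:
        raise ValueError("no node capacity estimates available")
    total_cap = sum(capacity.values())
    shares = {n: total_records * c / total_cap for n, c in capacity.items()}
    assignment = {n: int(s) for n, s in shares.items()}
    # Largest-remainder rounding: give leftover records to the nodes
    # whose exact share was truncated the most.
    leftover = total_records - sum(assignment.values())
    by_remainder = sorted(shares, key=lambda n: shares[n] - assignment[n],
                          reverse=True)
    for n in by_remainder[:leftover]:
        assignment[n] += 1
    return assignment


if __name__ == "__main__":
    history = [NodeHistory("a", 1000, 10.0),
               NodeHistory("b", 1000, 20.0),
               NodeHistory("c", 1000, 10.0)]
    cap = estimate_capacity(history)
    print(flag_potential_stragglers(1000, cap))  # ['b']
    print(pre_assign(1000, cap))  # {'a': 400, 'b': 200, 'c': 400}
```

    In this toy example, splitting in proportion to throughput gives every node a predicted time of 4 seconds, whereas an even split would leave node b finishing about twice as late as the others. A real system would also have to weigh data locality and changing node load, which this sketch ignores.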

    Original language: English
    Article number: 8444732
    Journal: IEEE Transactions on Parallel and Distributed Systems
    Volume: 30
    Issue number: 3
    Pages (from-to): 710-722
    ISSN: 1045-9219
    Publication status: Published, 1 Mar 2019

    Keywords

    • data assignment
    • recurring jobs
    • scheduling
    • straggler
    • stream processing
