Adaptive priority-based data placement and multi-task scheduling in geo-distributed cloud systems

Authors:

Highlights:

Abstract

With the rapid development and widespread use of cloud computing, the number of users distributed across different regions has grown exponentially. As a result, geo-distributed cloud systems have become a research hotspot, and big data processing technologies have emerged alongside them. Spark is currently the most widely used big data processing framework. However, as massive amounts of data are generated every moment and processing procedures become increasingly complex, the execution efficiency of Spark is greatly affected. For the data placement problem in Spark on geo-distributed cloud systems, we introduce a data placement strategy based on dynamic RDD weights, in which a target node with strong computation capacity is selected to hold the data. For the multi-task scheduling problem, a task is scheduled to a node whose computation capacity satisfies the task's requirement. Then, taking job classification and computing node performance into account, an optimized task scheduling strategy is introduced. Experiments show that our algorithms can effectively adjust the weight of node data placement according to actual task execution information and shorten task execution time.
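The two ideas in the abstract (placing data on a node with strong computation capacity, and scheduling each task to a node whose capacity satisfies its requirement) can be illustrated with a minimal sketch. This is not the paper's algorithm: the `Node` fields, the best-fit rule, the smoothing factor `alpha`, and the weight update formula are all hypothetical choices made only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    capacity: float      # hypothetical abstract compute units
    weight: float = 1.0  # hypothetical placement weight, adjusted from feedback
    load: float = 0.0    # capacity already committed to scheduled tasks


def pick_placement_node(nodes):
    """Pick the node with the largest weighted free capacity (assumed rule)."""
    return max(nodes, key=lambda n: n.weight * (n.capacity - n.load))


def schedule(tasks, nodes):
    """Assign each (task_id, demand) pair to a node that can satisfy the demand.

    Tasks are handled largest-first; among feasible nodes the tightest fit wins.
    A task with no feasible node is mapped to None.
    """
    assignment = {}
    for task_id, demand in sorted(tasks, key=lambda t: -t[1]):
        candidates = [n for n in nodes if n.capacity - n.load >= demand]
        if not candidates:
            assignment[task_id] = None
            continue
        best = min(candidates, key=lambda n: (n.capacity - n.load) - demand)
        best.load += demand
        assignment[task_id] = best.name
    return assignment


def update_weight(node, expected_time, observed_time, alpha=0.5):
    """Move the weight toward expected/observed time (exponential smoothing)."""
    ratio = expected_time / observed_time
    node.weight = (1 - alpha) * node.weight + alpha * ratio
    return node.weight
```

In this sketch a node that finishes tasks slower than expected sees its weight drop, so later data placement favors other nodes. That mirrors, in spirit only, the abstract's claim that placement weights are adjusted from actual task execution information.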

Keywords: Distributed cloud, Data stream, Spark frame, Multi-task scheduling

Article history: Received 27 January 2020, Revised 1 April 2021, Accepted 13 April 2021, Available online 24 April 2021, Version of Record 27 April 2021.

Official paper URL: https://doi.org/10.1016/j.knosys.2021.107050