Materialized view selection using evolutionary algorithm for speeding up big data query processing

Authors: Rajib Goswami, D. K. Bhattacharyya, Malayananda Dutta

Abstract

To speed up query processing on Big Data, frequent sub-queries or views may be materialized so that the query processing cost is minimized while the cost of maintaining the materialized views and/or queries stays at an optimum. Materializing frequent sub-queries and views means that the result sets of the views reside in the memory of one or more nodes in the cluster, which reduces the MapReduce cost as well as the submission and scheduling cost of Distributed File System jobs for query processing. We define materialized views as the result data of frequent sub-queries and aggregation functions of a set of Big Data warehousing queries, saved to enhance query performance. The problem is defined as a multi-objective optimization problem: minimize the total MapReduce cost of query processing, the MapReduce cost of maintaining the materialized views, and the number of views selected for materialization, while maximizing the total size of the selected views. We applied the Differential Evolution algorithm and NSGA-II and studied their performance, with the aim of developing a recommendation system for selecting views to materialize in Big Data warehousing.
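
The four objectives named in the abstract can be pictured with a small sketch. This is not the authors' formulation: the cost model (a fixed saving, maintenance cost, and size per candidate view), the function names `objectives`, `dominates`, and `pareto_front`, and all numbers are assumptions made purely for illustration. It only shows how a view selection, encoded as a bit vector, might be mapped to an objective vector and compared by Pareto dominance, the relation that multi-objective methods such as NSGA-II rely on.

```python
from itertools import product
from typing import List, Sequence, Tuple

Objectives = Tuple[float, float, int, float]

def objectives(selection: Sequence[int],
               base_cost: float,
               saving: Sequence[float],
               maint: Sequence[float],
               size: Sequence[float]) -> Objectives:
    """Map a 0/1 view selection to an objective vector (all minimized).

    Assumed, simplified cost model: each materialized view lowers the
    query cost by a fixed saving and adds a fixed maintenance cost.
    The total size is negated so that maximizing it becomes minimizing.
    """
    chosen = [i for i, bit in enumerate(selection) if bit]
    query_cost = base_cost - sum(saving[i] for i in chosen)
    maint_cost = sum(maint[i] for i in chosen)
    total_size = sum(size[i] for i in chosen)
    return (query_cost, maint_cost, len(chosen), -total_size)

def dominates(a: Objectives, b: Objectives) -> bool:
    """True if a is no worse than b everywhere and better somewhere."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))

def pareto_front(points: List[Objectives]) -> List[Objectives]:
    """Keep only the points that no other point dominates."""
    return [p for p in points
            if not any(dominates(q, p) for q in points if q != p)]

# Hypothetical numbers for three candidate views.
BASE_COST = 100.0
SAVING = [40.0, 25.0, 10.0]
MAINT = [8.0, 3.0, 6.0]
SIZE = [5.0, 2.0, 1.0]

all_points = [objectives(s, BASE_COST, SAVING, MAINT, SIZE)
              for s in product([0, 1], repeat=3)]
front = pareto_front(all_points)
```

With these made-up numbers, selecting only the third view is dominated by selecting only the second one, so it drops out of the front. Exhaustive enumeration is feasible only for a handful of candidate views, which is one reason an evolutionary search is a natural fit for larger candidate sets.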

Keywords: Big data warehouse, Differential evolution algorithm, Hadoop, Hive, Materialized view, Multi-objective optimization, NSGA-II

Paper URL: https://doi.org/10.1007/s10844-017-0455-6