Experimenting with big data computing for scaling data quality-aware query processing

Authors:

Highlights:

• Empirical study aimed at “scaling up” data quality (DQ) management applications.

• Execution of data quality-aware queries over sensor-collected traffic data sets.

• Exploration of Apache Spark and the Pandas library to speed up query execution.

• Insights into the choice of computational infrastructure for deploying DQ management tools.

Keywords: Data quality-aware queries, Big data computing, Empirical evaluation

Review history: Received 27 November 2019, Revised 9 July 2020, Accepted 3 March 2021, Available online 13 March 2021, Version of Record 19 April 2021.

Official paper URL: https://doi.org/10.1016/j.eswa.2021.114858