Experimenting with big data computing for scaling data quality-aware query processing
Authors:
Highlights:
• Empirical study aimed at “scaling up” data quality (DQ) management applications.
• Execution of data quality-aware queries over sensor-collected traffic data sets.
• Exploration of Apache Spark and the pandas library to speed up query execution.
• Insights on choice of computational infrastructure to deploy DQ management tools.
Keywords: Data quality-aware queries, Big data computing, Empirical evaluation
Review history: Received 27 November 2019, Revised 9 July 2020, Accepted 3 March 2021, Available online 13 March 2021, Version of Record 19 April 2021.
Official paper URL: https://doi.org/10.1016/j.eswa.2021.114858