SOFA: An extensible logical optimizer for UDF-heavy data flows

Authors:

Highlights:

• We identify a small set of UDF properties crucial for data flow optimization.

• SOFA employs taxonomy and reasoning to enhance extensibility of data flow languages.

• SOFA is capable of rewriting DAG-shaped data flows given proper operator annotations.

• We evaluate our approach using a diverse set of data flows across different domains.

• SOFA finds more efficient plans compared to existing data flow optimizers.

• Optimization with SOFA is even more beneficial when working on very large data sets.


Keywords: Data flow optimization, User-defined operators, Map/reduce

Publication history: Received 4 February 2015, Revised 4 April 2015, Accepted 7 April 2015, Available online 20 April 2015, Version of Record 25 May 2015.

Official paper URL: https://doi.org/10.1016/j.is.2015.04.002