A provenance model for control-flow driven scientific workflows

Abstract

Provenance in the context of workflows, both for their specification and for the data they derive, is essential for result reproducibility, sharing, and knowledge reuse in the scientific community. The information models that capture the provenance of scientific workflows, known as provenance models, have attracted wide interest and have been studied in many fields, including the semantic Web. However, most existing provenance models rely on the commonly perceived data-driven execution paradigm of scientific workflows and overlook the control constructs that may appear in them. The provenance models proposed by the semantic Web community for data-driven scientific workflows underspecify the structure of the workflows (i.e., workflow provenance). Such an underspecified workflow structure can lead to the misinterpretation of a scientific experiment and precludes conformance checking, thus restricting provenance gains. This paper shows that designing a provenance model for control-flow constructs, together with the means to integrate it with existing provenance models, can maximise the provenance gains for the users of scientific workflows. First, we specify the need for a control-flow driven scientific workflow provenance model and detail the minimal characteristics that such a model should address. Second, we present a formal model to specify the control flows that may appear in scientific workflows and describe how existing provenance models can be complemented by the proposed model. Finally, we show that the proposed model can capture accurate provenance information for scientific workflows, which makes it possible to understand, reproduce, and validate workflows and their output.

Keywords: Workflow provenance, Provenance model, Control-flow patterns

Article history: Received 18 November 2019, Revised 11 January 2021, Accepted 2 February 2021, Available online 25 February 2021, Version of Record 10 March 2021.

Official paper URL: https://doi.org/10.1016/j.datak.2021.101877