Measuring data quality in information systems research

Authors:

Highlights:

• The quality of IS research data can be measured using a formal, rule-based approach.

• Rule-based procedures yield data quality measurements on an ordinal scale.

• A hierarchical predicate structure can be built based on domain knowledge.

• Uncertainty theories make it possible to handle unknown information about the data.

• IS data quality thresholds can be defined based on the results of a formal measurement.
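The highlights describe quality levels that are reached by checking a hierarchy of predicates. The following minimal Python sketch illustrates that idea. It assumes that quality is the highest ordinal level whose predicates, together with those of every lower level, all hold. This is an illustrative reading and not necessarily the paper's exact definition. The survey predicates, the field names (such as `sampling_described`), and the 0.2 response-rate cutoff are all hypothetical.

```python
from typing import Callable, Mapping, Sequence

# A predicate inspects a record describing how the data were collected
# and validated, and returns True or False.
Predicate = Callable[[Mapping[str, object]], bool]


def ordinal_quality(levels: Sequence[Sequence[Predicate]],
                    record: Mapping[str, object]) -> int:
    """Return the highest level k such that every predicate on levels 1..k holds.

    levels[0] holds the predicates required for level 1, levels[1] those for
    level 2, and so on. Level 0 means even level 1 is not satisfied.
    """
    quality = 0
    for k, predicates in enumerate(levels, start=1):
        if not all(p(record) for p in predicates):
            break  # hierarchical: a failed level blocks every higher level
        quality = k
    return quality


# Hypothetical survey-data rules, for illustration only.
SURVEY_LEVELS = [
    [lambda r: bool(r.get("sampling_described"))],
    [lambda r: bool(r.get("response_rate_reported")),
     lambda r: float(r.get("response_rate", 0.0)) >= 0.2],  # assumed cutoff
    [lambda r: bool(r.get("nonresponse_bias_tested"))],
]

survey = {"sampling_described": True, "response_rate_reported": True,
          "response_rate": 0.35, "nonresponse_bias_tested": False}
print(ordinal_quality(SURVEY_LEVELS, survey))  # 2
```

Because levels are checked in order and the loop stops at the first failure, the result is an ordinal value. A data set that satisfies the level-3 rule but not a level-2 rule still scores 1.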

Abstract

Although contemporary research relies heavily on data, data quality in Information Systems research has so far received little attention. This paper presents a framework for measuring scientific data quality based on the principles of rule-based measurement. The framework can handle data quality problems caused both by incorrect execution and by incorrect description of data collection and validation processes. The paper then argues that uncertainty can arise during measurement and that this complicates data quality assessment. The framework is therefore extended to handle uncertainty about the truth values of predicates. Instead of a single numerical quality level, data quality is then expressed as either a probability distribution or a possibility distribution over the ordinal quality scale. Finally, the paper shows how quality thresholds can be formulated from the results of the quality measurement. Throughout the paper, the framework's usefulness is illustrated by constructing a possible measurement system for survey data quality and then applying that system to a realistic example.
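The sketch below illustrates how uncertain predicate truth values can turn a single ordinal level into a probability distribution over the scale. It uses a hierarchical reading of the scale, in which level k is reached only if all predicates on levels 1..k hold. It also assumes, purely for illustration, that each predicate is true with a known probability and that predicates are independent. The function names, the example probabilities, and the threshold rule `P(quality >= min_level) >= min_prob` are hypothetical. The possibility-distribution variant mentioned in the abstract is not covered here.

```python
from math import prod
from typing import Sequence


def quality_distribution(level_probs: Sequence[Sequence[float]]) -> list[float]:
    """Probability distribution over ordinal levels 0..n.

    level_probs[k - 1] lists, for each predicate on level k, the probability
    that the predicate is true. ASSUMPTION: predicates are independent.
    """
    # reach[k] = P(all predicates on levels 1..k hold); level 0 is always reached.
    reach = [1.0]
    for probs in level_probs:
        reach.append(reach[-1] * prod(probs))
    # P(quality == k) = P(reach k) - P(reach k + 1); the top level has no successor.
    dist = [reach[k] - reach[k + 1] for k in range(len(level_probs))]
    dist.append(reach[-1])
    return dist


def meets_threshold(dist: Sequence[float], min_level: int, min_prob: float) -> bool:
    """Hypothetical threshold rule: accept if P(quality >= min_level) >= min_prob."""
    return sum(dist[min_level:]) >= min_prob


dist = quality_distribution([[1.0], [0.9, 0.8], [0.5]])
print([round(p, 2) for p in dist])     # [0.0, 0.28, 0.36, 0.36]
print(meets_threshold(dist, 2, 0.7))   # True
print(meets_threshold(dist, 3, 0.5))   # False
```

Under these assumptions, the example data set reaches at least level 2 with probability 0.72. It therefore passes a 0.7 requirement at level 2 but fails a 0.5 requirement at level 3.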

Keywords: Data quality, Rule-based measurement, Information systems, Uncertainty modelling

Article history: Received 15 February 2019, Revised 17 May 2019, Accepted 16 August 2019, Available online 19 August 2019, Version of Record 4 September 2019.

Publisher link: https://doi.org/10.1016/j.dss.2019.113138