Deep learning for detecting financial statement fraud

Highlights:

• Combining financial and text data enhances the detection of fraudulent financial statements.

• HAN, GPT-2, ANN and XGB detect financial misstatements based on textual cues.

• Novel NLP techniques capture both the content and the context of MD&As.

• Interpretability is offered through “red-flag” sentences in the MD&As of annual reports.

• The proposed models provide decision support for stakeholders.

Abstract

Financial statement fraud is an area of significant concern for potential investors, auditing companies, and state regulators. This paper proposes an approach for detecting financial statement fraud by combining information from financial ratios with managerial comments in corporate annual reports. We employ a hierarchical attention network (HAN) to extract text features from the Management Discussion and Analysis (MD&A) section of annual reports. The model is designed to offer two distinct features. First, it reflects the structured hierarchy of documents, which previous approaches were unable to capture. Second, the model incorporates two attention mechanisms, one at the word level and one at the sentence level, which allow content to be weighted by its importance when constructing the document representation. As a result of its architecture, the model captures both the content and the context of managerial comments, which serve as supplementary predictors to financial ratios in the detection of fraudulent reporting. Additionally, the model provides interpretable indicators, denoted as “red-flag” sentences, which help stakeholders determine whether a specific annual report requires further investigation. Empirical results demonstrate that textual features of MD&A sections extracted by HAN yield promising classification results and substantially reinforce financial ratios.
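
To make the two-level attention idea concrete, the following is a minimal NumPy sketch, not the implementation used in this paper. It assumes randomly initialized (untrained) parameters, omits the recurrent encoders that a full HAN applies before each attention layer, and uses placeholder values for the financial ratios. All names (`attention_pool`, `encode_document`, `word_params`, `sent_params`, `ratios`) are hypothetical.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))   # subtract the max for numerical stability
    return e / e.sum()

def attention_pool(H, W, b, u):
    """Attention pooling over the rows of H (n, d).

    Returns the importance weights (n,) and the weighted sum (d,).
    """
    U = np.tanh(H @ W + b)      # hidden representation of each row, (n, a)
    alpha = softmax(U @ u)      # similarity to context vector u, sums to 1
    return alpha, alpha @ H

def encode_document(doc, word_params, sent_params):
    """doc: list of sentences, each an (n_words, d) array of word vectors."""
    # Word level: pool the words of each sentence into a sentence vector.
    sent_vecs = np.stack([attention_pool(s, *word_params)[1] for s in doc])
    # Sentence level: pool the sentence vectors into a document vector.
    sent_alpha, doc_vec = attention_pool(sent_vecs, *sent_params)
    return sent_alpha, doc_vec

rng = np.random.default_rng(0)
d, a = 8, 4                     # assumed embedding and attention sizes

def init_params():
    # Untrained parameters, for illustration only.
    return rng.normal(size=(d, a)), np.zeros(a), rng.normal(size=a)

word_params, sent_params = init_params(), init_params()

# A toy "MD&A" with three sentences of 5, 3, and 7 words.
doc = [rng.normal(size=(n, d)) for n in (5, 3, 7)]
sent_alpha, doc_vec = encode_document(doc, word_params, sent_params)

ratios = rng.normal(size=6)     # placeholder financial ratios (assumption)
features = np.concatenate([doc_vec, ratios])   # input to a downstream classifier
red_flag = int(np.argmax(sent_alpha))          # index of the highest-weighted sentence
```

In this sketch, the sentence-level weights `sent_alpha` play the role of the interpretability signal: the sentence with the largest weight is the candidate “red-flag” sentence, and `features` shows one simple way in which text features and financial ratios could be combined before classification.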