Leveraging deep learning with LDA-based text analytics to detect automobile insurance fraud

Highlights：

• A novel deep learning methodology is proposed for automobile insurance fraud detection.

• LDA-based text analytics is developed to extract the features in the text description of the accidents.

• Text features and traditional numeric features are explored for the detection of fraudulent claims.

• Our research contributes to advance the computational method for analyzing and detecting insurance fraud.

摘要

Automobile insurance fraud represents a pivotal percentage of property insurance companies' costs and affects the companies' pricing strategies and social economic benefits in the long term. Automobile insurance fraud detection has become critically important for reducing the costs of insurance companies. Previous studies on automobile insurance fraud detection examined various numeric factors, such as the time of the claim and the brand of the insured car. However, the textual information in the claims has rarely been studied to analyze insurance fraud. This paper proposes a novel deep learning model for automobile insurance fraud detection that uses Latent Dirichlet Allocation (LDA)-based text analytics. In our proposed method, LDA is first used to extract the text features hiding in the text descriptions of the accidents appearing in the claims, and deep neural networks then are trained on the data, which include the text features and traditional numeric features for detecting fraudulent claims. Based on the real-world insurance fraud dataset, our experimental results reveal that the proposed text analytics-based framework outperforms a traditional one. Furthermore, the experimental results show that the deep neural networks outperform widely used machine learning models, such as random forests and support vector machine. Therefore, our proposed framework that combines deep neural networks and LDA is a suitable potential tool for automobile insurance fraud detection.