Predicting crime using Twitter and kernel density estimation

Authors:

Highlights:

• We model 25 crime types in a major United States city.

• We incorporate spatiotemporally tagged Twitter messages into a kernel density model.

• Twitter messages improve the prediction of 19 of the 25 crime types we studied.

• The runtime of some text processing modules must be improved to be practical.

Abstract

Twitter is used extensively in the United States as well as globally, creating many opportunities to augment decision support systems with Twitter-driven predictive analytics. Twitter is an ideal data source for decision support: its users, who number in the millions, publicly discuss events, emotions, and innumerable other topics; its content is authored and distributed in real time at no charge; and individual messages (also known as tweets) are often tagged with precise spatial and temporal coordinates. This article presents research investigating the use of spatiotemporally tagged tweets for crime prediction. We use Twitter-specific linguistic analysis and statistical topic modeling to automatically identify discussion topics across a major city in the United States. We then incorporate these topics into a crime prediction model and show that, for 19 of the 25 crime types we studied, the addition of Twitter data improves crime prediction performance versus a standard approach based on kernel density estimation. We identify a number of performance bottlenecks that could impact the use of Twitter in an actual decision support system. We also point out important areas of future work for this research, including deeper semantic analysis of message content, temporal modeling, and incorporation of auxiliary data sources. This research has implications specifically for criminal justice decision makers in charge of resource allocation for crime prevention. More generally, this research has implications for decision makers concerned with geographic spaces occupied by Twitter-using individuals.
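To make the baseline mentioned in the abstract concrete, the following is a minimal sketch of two-dimensional kernel density estimation over past incident locations. It is not the authors' implementation: the Gaussian kernel, the fixed bandwidth, the function name `gaussian_kde_2d`, and the sample coordinates are all assumptions chosen for illustration, and the Twitter topic features described in the abstract are not modeled here.

```python
import math


def gaussian_kde_2d(points, bandwidth):
    """Build a density function from historical (x, y) incident locations.

    Assumption: an isotropic Gaussian kernel with one fixed bandwidth.
    The paper's actual kernel and bandwidth selection are not given here.
    """
    if not points:
        raise ValueError("need at least one historical point")
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")

    # Normalizing constant so the estimate integrates to 1 over the plane.
    norm = 1.0 / (len(points) * 2.0 * math.pi * bandwidth ** 2)

    def density(x, y):
        total = 0.0
        for px, py in points:
            squared_distance = (x - px) ** 2 + (y - py) ** 2
            total += math.exp(-squared_distance / (2.0 * bandwidth ** 2))
        return norm * total

    return density


# Hypothetical incident coordinates (arbitrary units), for illustration only.
incidents = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (2.0, 2.0)]
density = gaussian_kde_2d(incidents, bandwidth=0.5)

near_cluster = density(0.05, 0.05)  # close to three past incidents
far_away = density(5.0, 5.0)        # far from every past incident
print(near_cluster > far_away)
```

In a setup like this, locations with a higher estimated density would be ranked as more likely sites of future incidents. A model of the kind the abstract describes would add further features, such as topic proportions derived from nearby tweets, on top of this density value.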

Keywords: Crime prediction, Twitter, Topic modeling, Density estimation

Article history: Received 11 September 2013, Revised 28 December 2013, Accepted 13 February 2014, Available online 22 February 2014.

Paper URL: https://doi.org/10.1016/j.dss.2014.02.003