A three-phase method for patent classification

Authors:

Abstract

An automatic patent categorization system would be invaluable to individual inventors and patent attorneys, saving them time and effort by quickly identifying conflicts with existing patents. In recent years, it has become increasingly common to classify all patent documents using the International Patent Classification (IPC), a complex hierarchical classification system comprising eight sections, 128 classes, 648 subclasses, about 7,200 main groups, and approximately 72,000 subgroups. So far, however, no patent categorization method has been developed that can classify patents down to the subgroup level (the bottom level of the IPC). This paper therefore presents a novel categorization method, the three-phase categorization (TPC) algorithm, which classifies patents down to the subgroup level with reasonable accuracy. Experimental results for the TPC algorithm on the WIPO-alpha collection indicate that the method achieves 36.07% accuracy at the subgroup level, approximately a 25,764-fold improvement over a random guess.
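
The fold-improvement figure can be checked with a few lines of arithmetic. The following is a minimal sketch, assuming that "random guess" means choosing uniformly among all subgroup categories, so that baseline accuracy is 1 / N. The function name `fold_over_random` and the variable names are illustrative and do not come from the paper. With the rounded count of 72,000 subgroups the ratio comes out near 25,970; the reported 25,764 corresponds to roughly 71,428 categories, which is presumably the exact subgroup count the authors used (an inference from the numbers, not something the abstract states).

```python
def fold_over_random(accuracy: float, num_categories: int) -> float:
    """Ratio of an observed accuracy to the accuracy of a uniform random guess.

    A uniform guess over num_categories classes is right with probability
    1 / num_categories, so the ratio is accuracy * num_categories.
    """
    if num_categories <= 0:
        raise ValueError("num_categories must be positive")
    return accuracy * num_categories


# Figures quoted in the abstract.
reported_accuracy = 0.3607   # 36.07% at the subgroup level
reported_fold = 25_764       # "approximately a 25,764-fold improvement"

# Assumption: the rounded subgroup count given in the abstract.
approx_subgroups = 72_000

fold_with_rounded_count = fold_over_random(reported_accuracy, approx_subgroups)

# Category count implied by the reported fold figure (inferred, not stated).
implied_categories = reported_fold / reported_accuracy

print(f"fold improvement with 72,000 subgroups: {fold_with_rounded_count:.1f}")
print(f"category count implied by 25,764-fold:  {implied_categories:.0f}")
```

The small gap between the two values reflects only the rounding of the subgroup count in the abstract.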

Keywords: Patent classification, Vector space model (VSM), IPC taxonomy, Support vector machines (SVM), K-means, K nearest neighbors (KNN)

Review history: Received 13 January 2011, Revised 21 November 2011, Accepted 29 November 2011, Available online 9 January 2012.

Paper URL: https://doi.org/10.1016/j.ipm.2011.11.001