Automated categorization of German-language patent documents

Authors:

Abstract

Categorizing patent documents is a difficult task, and we study how to automate it as accurately as possible. We report the results of applying a variety of machine learning algorithms to train expert systems for German-language patent classification. The taxonomy employed is the International Patent Classification (IPC), a complex hierarchical classification scheme, of which we use 115 classes and 367 subclasses. The system is designed to handle natural-language input in the form of the full text of a patent application. We report how categorization precision is affected by indexing either the patent claims or the patent descriptions. We describe several ways of measuring categorization success that account for the attribution of multiple classification codes to each patent document. We show how the hierarchical information inherent in the taxonomy can be used to improve the precision of automated categorization. Our results are compared with an earlier study of automated English-language patent categorization.
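
The abstract does not spell out its success measures or how the hierarchy is exploited, so the following is only an illustrative sketch of what such measures might look like, not the authors' method. It assumes each document carries one main International Patent Classification (IPC) code plus a set of all assigned codes, and that a classifier returns a ranked list of subclass codes. All function names are hypothetical. The roll-up relies on the IPC convention that a subclass symbol such as "A01B" belongs to the class given by its first three characters ("A01"); summing subclass scores per class is one assumed way of using the hierarchy.

```python
from typing import Dict, Sequence, Set


def top_prediction_hit(ranked: Sequence[str], main_code: str) -> bool:
    """True if the highest-ranked prediction equals the main code."""
    return len(ranked) > 0 and ranked[0] == main_code


def top_k_hit(ranked: Sequence[str], main_code: str, k: int = 3) -> bool:
    """True if the main code appears among the first k predictions."""
    return main_code in ranked[:k]


def any_code_hit(ranked: Sequence[str], all_codes: Set[str]) -> bool:
    """True if the highest-ranked prediction equals any assigned code."""
    return len(ranked) > 0 and ranked[0] in all_codes


def to_class_level(subclass_code: str) -> str:
    """Map an IPC subclass symbol (e.g. 'A01B') to its class ('A01')."""
    return subclass_code[:3]


def rollup_scores(subclass_scores: Dict[str, float]) -> Dict[str, float]:
    """Sum subclass scores per parent class (an assumed aggregation rule)."""
    class_scores: Dict[str, float] = {}
    for code, score in subclass_scores.items():
        parent = to_class_level(code)
        class_scores[parent] = class_scores.get(parent, 0.0) + score
    return class_scores
```

Averaging any of the three boolean measures over a test set gives a precision figure. The stricter measure credits only the main code, while the more lenient ones tolerate a correct answer at a lower rank or a match on a secondary code.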

Keywords: Patent, Intellectual Property, Categorization, Hierarchical taxonomy, International Patent Classification

Publication history: Available online 9 August 2003.

Official URL: https://doi.org/10.1016/S0957-4174(03)00141-6