Toward the automatic identification of sublanguage vocabulary

Authors:

Abstract

A sublanguage is the language used in a restricted or specialized domain or field, such as computer science. Information about the vocabulary and structure of a sublanguage is used in any domain-related natural language processing application; however, such information is very time-consuming to gather, and much of it must be found and organized manually. Additionally, information retrieval strategies using lexical information depend on finding the appropriate dictionary entry for general and technical words. The ability to automatically identify terms belonging to a sublanguage could aid in these and other applications. In this paper, a simple but effective method is developed for automatic identification of sublanguage vocabulary words as they occur in abstracts. This procedure may significantly reduce the effort required to extract sublanguage vocabulary for sublanguage analysis and other applications, such as information retrieval. First, the sublanguage vocabulary identification procedures are described using abstracts from computer science and library and information science as the sublanguage sources. The results of the experiments are evaluated using three different criteria. Finally, the practical and theoretical significance of this research is discussed along with plans for further experiments on the vocabulary and structure of sublanguages.

Keywords:

Review history: Received 12 June 1992, Accepted 14 September 1992, Available online 12 July 2002.

Official paper URL: https://doi.org/10.1016/0306-4573(93)90101-I