Integrating natural language understanding with document structure analysis

Authors: Suzanne Liebowitz Taylor, Deborah A. Dahl, Mark Lipshutz, Carl Weir, Lewis M. Norton, Roslyn Weidner Nilson, Marcia C. Linebarger

Abstract

Document understanding, the interpretation of a document from its image form, is a technology area which benefits greatly from the integration of natural language processing with image processing. We have developed a prototype of an Intelligent Document Understanding System (IDUS) which employs several technologies: image processing, optical character recognition, document structure analysis and text understanding in a cooperative fashion. This paper discusses those areas of research during development of IDUS where we have found the most benefit from the integration of natural language processing and image processing: document structure analysis, optical character recognition (OCR) correction, and text analysis. We also discuss two applications which are supported by IDUS: text retrieval and automatic generation of hypertext links.

Keywords: document analysis, natural language processing, image processing, vision, optical character recognition

Paper URL: https://doi.org/10.1007/BF00849077