Adaptive page segmentation for color technical journals' cover images

Abstract

Page segmentation, which locates text blocks, is a primary first step in document processing, and in particular in understanding a journal's cover page. In most documents, text, graphics and images are spatially separated; on cover pages, however, text may be overlaid onto graphics or images. This paper proposes a new adaptive page segmentation method that extracts text blocks from various types of color technical journal cover images. Although color carries information that helps resolve the overlapping problem, color processing imposes a heavy computational load. A complexity analysis is therefore included to adjust the processing steps adaptively. Simple cover images, with few colors and no text–graphics/image overlapping, are treated as monochrome images to shorten processing time; complex cover images, with many colors and text–graphics/image overlapping, can still be segmented correctly at the cost of more processing time. The method comprises several components. First, to reduce the processing complexity of true-color images, a new simple quantization method reduces the number of colors from 24-bit true color to 42 colors or fewer. In the block segmentation stage, smearing, labeling and complexity analysis techniques are combined with edge and color information to find coherent blocks adaptively. Then, in the block classification stage, some conventional and some new features are computed from each block to decide whether it is a text block. Finally, in the post-processing stage, spatial relations are used to rectify the classification results. Experimental results demonstrate the feasibility and practicality of the proposed approach.
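
To make the early stages of the pipeline more concrete, the following is a minimal Python sketch of three ideas named in the abstract: color quantization, a color-count complexity check, and horizontal smearing. It is an illustration under stated assumptions, not the paper's method. The uniform (3, 4, 3) per-channel binning (36 colors, within the 42-color budget), the `max_colors` threshold, the `max_gap` parameter and all function names are hypothetical; the abstract does not specify the actual quantizer, complexity measure or smearing rule.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # 24-bit RGB, each channel in 0..255


def quantize_uniform(pixel: Pixel, levels: Tuple[int, int, int] = (3, 4, 3)) -> int:
    """Map an RGB pixel to a palette index using uniform per-channel bins.

    Assumption: (3, 4, 3) levels yield 3 * 4 * 3 = 36 colors, which fits the
    "42 colors or fewer" budget; the paper's own quantizer may differ.
    """
    r, g, b = pixel
    lr, lg, lb = levels
    ri = r * lr // 256  # bin index for red
    gi = g * lg // 256  # bin index for green
    bi = b * lb // 256  # bin index for blue
    return (ri * lg + gi) * lb + bi


def count_colors(image: List[List[Pixel]]) -> int:
    """Count the distinct quantized colors present in the image."""
    return len({quantize_uniform(p) for row in image for p in row})


def is_simple(image: List[List[Pixel]], max_colors: int = 2) -> bool:
    """Hypothetical complexity check: few colors means the image can take the
    cheaper monochrome path. The threshold is an assumed value."""
    return count_colors(image) <= max_colors


def smear_row(row: List[int], max_gap: int) -> List[int]:
    """Horizontal run-length smearing on a binary row (1 = foreground).

    Background runs of length <= max_gap that lie between two foreground
    pixels are filled, merging nearby characters into one block.
    """
    out = list(row)
    last_fg = None  # index of the most recent foreground pixel
    for i, value in enumerate(row):
        if value:
            if last_fg is not None and 0 < i - last_fg - 1 <= max_gap:
                for j in range(last_fg + 1, i):
                    out[j] = 1
            last_fg = i
    return out
```

In a fuller pipeline, the smeared rows would be passed to a connected-component labeling step to obtain candidate blocks, which would then be classified as text or non-text.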

Keywords: Page segmentation, Text extraction, Color quantization, Block classification, Document processing, Complex background

Review history: Received 20 September 1996, Revised 18 September 1997, Accepted 8 December 1997, Available online 5 January 1999.

Official URL: https://doi.org/10.1016/S0262-8856(98)00062-6