Automatic caption localization for photographs on World Wide Web pages

Authors:

Abstract

A variety of software tools index the text of the World Wide Web, but little attention has been paid to its many photographs. We explore an indirect method: locating, for indexing, the likely explicit and implicit captions of photographs. We use multimodal clues, including the specific words used, the syntax, the surrounding layout of the Web page, and the general appearance of the associated image. Our MARIE-3 system thus avoids full image processing and full natural-language processing, yet shows a surprising degree of success. Experiments with a semi-random set of Web pages showed 41% recall with 41% precision for the task of distinguishing captions from other text, or, trading precision for recall, 70% recall with 30% precision. This is much better than chance, since actual captions made up only 1.4% of the text on pages with photographs.
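The comparison with chance can be made concrete with a little arithmetic. The sketch below is an illustration, not part of MARIE-3. It assumes that a selector picking text at random would reach an expected precision equal to the caption base rate (1.4%), and that this base rate is measured in the same units as the candidates being classified. The function and constant names are hypothetical; the numbers are the ones reported in the abstract.

```python
def lift_over_chance(precision: float, base_rate: float) -> float:
    """Ratio of observed precision to the expected precision of a random selector.

    Assumption: a random selector's expected precision equals the base rate.
    """
    if not 0 < base_rate <= 1:
        raise ValueError("base_rate must be in (0, 1]")
    return precision / base_rate


# Figures reported in the abstract.
BASE_RATE = 0.014  # captions as a fraction of text on pages with photographs
OPERATING_POINTS = [(0.41, 0.41), (0.70, 0.30)]  # (recall, precision)


def summarize():
    """Return (recall, precision, lift) for each reported operating point."""
    return [(r, p, lift_over_chance(p, BASE_RATE)) for r, p in OPERATING_POINTS]


if __name__ == "__main__":
    for recall, precision, lift in summarize():
        print(f"recall={recall:.0%} precision={precision:.0%} lift={lift:.1f}x")
```

Under that assumption, the reported precision is roughly 29 times the chance level at the first operating point and roughly 21 times at the second.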

Keywords:

Article history: Received 18 November 1996, Accepted 10 July 1997, Available online 11 June 1998.

Article URL: https://doi.org/10.1016/S0306-4573(97)00048-4