Information extraction from calls for papers with conditional random fields and layout features

Author: Karl-Michael Schneider

Abstract

For members of the research community it is vital to stay informed about conferences, workshops, and other research meetings relevant to their field. These events are typically announced in calls for papers (CFPs) that are distributed via mailing lists. We employ Conditional Random Fields for the task of extracting key information such as conference names, titles, dates, locations and submission deadlines from CFPs. Extracting this information from CFPs automatically has applications in building automated conference calendars and search engines for CFPs. We combine a variety of features, including generic token classes, domain-specific dictionaries and layout features. Layout features prove particularly useful in the absence of grammatical structure, improving average F1 by 30% in our experiments.
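The three feature families named above (generic token classes, dictionary lookups, and layout features) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not list the actual features, so the class names, the `MONTHS` dictionary, and the layout cues (indentation, neighbouring blank lines) are hypothetical stand-ins rather than the author's feature set. The sketch only builds per-token feature dictionaries of the kind a CRF toolkit could consume; it does not train a model.

```python
import re

# Hypothetical mini-dictionary; the paper's real dictionaries are not given in the abstract.
MONTHS = {
    "january", "february", "march", "april", "may", "june", "july",
    "august", "september", "october", "november", "december",
}


def token_class(word):
    """Map a word to a coarse, generic token class (illustrative classes only)."""
    if re.fullmatch(r"(19|20)\d{2}", word):
        return "YEAR"
    if word.isdigit():
        return "NUMBER"
    if word.isalpha() and word.isupper():
        return "ALLCAPS"
    if word[:1].isupper():
        return "CAPITALIZED"
    if word.isalpha():
        return "LOWER"
    return "OTHER"


def line_layout(lines, i):
    """Layout cues for line i: indentation and whether blank lines surround it."""
    line = lines[i]
    return {
        "indent": len(line) - len(line.lstrip(" ")),
        "after_blank": i == 0 or not lines[i - 1].strip(),
        "before_blank": i == len(lines) - 1 or not lines[i + 1].strip(),
    }


def extract_features(lines):
    """Return one feature dict per whitespace-separated token, in reading order."""
    rows = []
    for i, line in enumerate(lines):
        layout = line_layout(lines, i)
        for token in line.split():
            word = token.strip(".,;:()")  # drop surrounding punctuation for lookups
            features = {
                "token": token,
                "class": token_class(word),
                "is_month": word.lower() in MONTHS,
            }
            features.update(layout)  # every token inherits its line's layout cues
            rows.append(features)
    return rows


if __name__ == "__main__":
    cfp = ["    ACL 2007", "", "Submission deadline: January 15, 2007"]
    for row in extract_features(cfp):
        print(row)
```

Attaching line-level layout cues to every token is one simple way to expose layout to a sequence model when the text has little grammatical structure, as in a CFP header where the conference name sits on its own indented line.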

Keywords: Information extraction, Layout features, Conditional random fields

Paper URL: https://doi.org/10.1007/s10462-007-9019-4