Using TREC for developing semantic information retrieval benchmark for Urdu
Authors:
Highlights:
• Large corpus of over 2,887,169 Urdu documents in the TREC-defined SGML format.
• A collection of 35 Urdu queries from 14 domains for assessment.
• Human benchmark of candidate relevant documents using a pooling-based approach.
• Non-binary relevance judgements at four levels.
Keywords: Information Retrieval, Benchmark dataset, Urdu news documents, Non-binary ranking, Urdu language processing, Information retrieval queries, Text REtrieval Conference (TREC)
Review history: Received 20 December 2021, Revised 27 March 2022, Accepted 3 April 2022, Available online 30 April 2022, Version of Record 30 April 2022.
Official paper URL: https://doi.org/10.1016/j.ipm.2022.102939