Improving educational web search for question-like queries through subject classification

Authors:

Highlights:

• We build an educational subject classifier for queries. There is no publicly available dataset of such queries in Turkish. Therefore, we use educational questions posted on educational Q&A websites and manually label them with their educational subjects. Since students also tend to submit natural-language queries, these questions can safely be regarded as queries.

• We use a diverse set of lexical, syntactic and semantic features, query results from a search engine, and query expansion to classify the educational subject of queries, reaching 83.58% accuracy.

• We propose point-wise and list-wise re-ranking mechanisms that optimize the ranking order based on the predictions of the educational subject classifier. Two of our re-ranking methods achieve statistically significant improvements in the Normalized Discounted Cumulative Gain (NDCG) metric over the original ranking obtained from a large-scale commercial web search engine.

• To the best of our knowledge, this is the first manuscript that addresses Turkish question answering and search engines in the educational domain with this number of features and datasets. Additionally, we implemented for Turkish some features that are already available in English, and there is no obvious obstacle to applying our general solution to other languages. For some features that have good implementations in English but not in Turkish (e.g., headwords), we used replacement features (e.g., object and subject) and derived other features from them (e.g., object and subject phrases, semantic features).
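The highlights do not say how the classifier's predictions are combined with the engine's ranking or which NDCG gain variant is used. The sketch below is therefore only an illustration under stated assumptions: a hypothetical point-wise re-ranker that linearly blends a reciprocal-rank prior with the classifier's probability that the query belongs to the subject assigned to each result, evaluated with the common exponential-gain NDCG. The weight `alpha`, the field names `rank`, `subject` and `rel`, how each result page gets a `subject` label, and the toy data are all assumptions, not the authors' method.

```python
import math
from typing import Dict, List, Sequence


def dcg_at_k(relevances: Sequence[float], k: int) -> float:
    """DCG using the common exponential gain (2^rel - 1) / log2(position + 1)."""
    return sum(
        (2 ** rel - 1) / math.log2(i + 2)  # i is 0-based, so position 1 uses log2(2) = 1
        for i, rel in enumerate(list(relevances)[:k])
    )


def ndcg_at_k(relevances: Sequence[float], k: int) -> float:
    """DCG of the given order normalized by the DCG of the ideal (descending) order."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


def pointwise_rerank(
    results: List[Dict],
    query_subject_probs: Dict[str, float],
    alpha: float = 0.5,
) -> List[Dict]:
    """Hypothetical point-wise re-ranker: each result is scored independently.

    score = (1 - alpha) * reciprocal original rank
            + alpha * P(query subject == subject assigned to this result)
    """
    def score(result: Dict) -> float:
        prior = 1.0 / (result["rank"] + 1)  # 'rank' is the 0-based engine position
        match = query_subject_probs.get(result["subject"], 0.0)
        return (1 - alpha) * prior + alpha * match

    return sorted(results, key=score, reverse=True)


# Toy data (illustrative only): engine order a, b, c with graded relevance labels.
results = [
    {"url": "a", "rank": 0, "subject": "history", "rel": 0},
    {"url": "b", "rank": 1, "subject": "physics", "rel": 2},
    {"url": "c", "rank": 2, "subject": "physics", "rel": 1},
]
probs = {"physics": 0.9, "history": 0.1}  # hypothetical classifier output for the query

reranked = pointwise_rerank(results, probs)
before = [r["rel"] for r in results]
after = [r["rel"] for r in reranked]
print([r["url"] for r in reranked])  # ['b', 'c', 'a']
print(round(ndcg_at_k(before, 3), 3), round(ndcg_at_k(after, 3), 3))  # 0.659 1.0
```

Because each result is scored on its own, this is point-wise by construction; a list-wise variant would instead choose the order of the whole result list jointly and is not sketched here.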

Keywords: Educational web search, Question classification, Search engine result page ranking, K-12

Article history: Received 24 February 2018, Revised 7 October 2018, Accepted 16 October 2018, Available online 24 October 2018, Version of Record 24 October 2018.

Official paper URL: https://doi.org/10.1016/j.ipm.2018.10.013