Search-by-example over SQL repositories using structural and intent-driven similarity

Authors:

Highlights:

Abstract

Searching the query log of a database system has a variety of applications. In a complex database, relevant queries in the log can serve as initial examples for query formulation, or may show how to query the data in an optimized manner. Searching for queries that may cause a security or privacy breach can help detect leaks of sensitive data. More generally, the queries in the log provide valuable information about how data have been accessed and used. Finding relevant queries requires searching a repository of SQL queries. However, expressing the information need, that is, specifying which queries should be retrieved, is not easy. In this paper we study the approach of search-by-example, where, given an SQL query Q, the goal is to retrieve queries that are similar to Q. We distinguish between two types of search: structural search and intent-driven search. In structural search, queries are considered similar if their textual formulations are similar, i.e., a small number of edit operations transforms one query into the other. In intent-driven search, two queries are deemed similar if they were written for the same task. We illustrate these two types of similarity and the differences between them. We present four heuristics for testing query similarity. Two of the methods are exhaustive, and two are less accurate but more efficient. We explain how to use the efficient methods to speed up a search that relies on the exhaustive methods. An experimental evaluation and a user study illustrate the effectiveness of the methods.
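The structural notion of similarity and the idea of using a cheap method to narrow the search before applying an expensive one can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the four heuristics, so the sketch substitutes a token-level Levenshtein distance for the exhaustive structural test and a bag-of-tokens cosine similarity for the efficient filter. The names `tokenize`, `edit_distance`, `cosine`, and `search`, as well as the `shortlist` and `k` parameters, are hypothetical and chosen only for illustration.

```python
import re
from collections import Counter
from math import sqrt


def tokenize(sql):
    # Crude lexer (an assumption, not a real SQL parser): identifiers/keywords,
    # integers, and single punctuation characters, all case-folded.
    return re.findall(r"[A-Za-z_][A-Za-z_0-9]*|\d+|[^\sA-Za-z_0-9]", sql.lower())


def edit_distance(a, b):
    # Levenshtein distance over token lists: the minimum number of token
    # insertions, deletions, and substitutions that turn a into b.
    prev = list(range(len(b) + 1))
    for i, ta in enumerate(a, 1):
        cur = [i]
        for j, tb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ta
                           cur[j - 1] + 1,             # insert tb
                           prev[j - 1] + (ta != tb)))  # substitute or match
        prev = cur
    return prev[-1]


def cosine(a, b):
    # Cosine similarity of token-count vectors: cheap, but ignores token order.
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca)
    norm = (sqrt(sum(v * v for v in ca.values()))
            * sqrt(sum(v * v for v in cb.values())))
    return dot / norm if norm else 0.0


def search(example, repository, shortlist=10, k=3):
    # Filter-then-refine: shortlist by the cheap vector score, then rank only
    # the shortlist by the more expensive edit distance.
    q = tokenize(example)
    tokenized = [(sql, tokenize(sql)) for sql in repository]
    candidates = sorted(tokenized, key=lambda p: cosine(q, p[1]),
                        reverse=True)[:shortlist]
    ranked = sorted(candidates, key=lambda p: edit_distance(q, p[1]))
    return [sql for sql, _ in ranked[:k]]
```

For example, `search("SELECT name FROM emp WHERE age > 30", log)` would rank a logged query that differs only in the constant ahead of unrelated statements. A token-level distance captures only textual closeness; it says nothing about intent-driven similarity, where two differently written queries may serve the same task.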

Keywords: SQL, Structural search, Intention-driven search, Semantic search, Syntactic search, Query similarity, Tree edit distance, SQL vector model

Article history: Received 26 December 2018, Revised 4 December 2019, Accepted 12 March 2020, Available online 16 March 2020, Version of Record 5 August 2020.

Official article URL: https://doi.org/10.1016/j.datak.2020.101811