Surfacing code in the dark: an instant clone search approach
作者:Jin-woo Park, Mu-Woong Lee, Jong-Won Roh, Seung-won Hwang, Sunghun Kim
摘要
In this paper, we study how to “surface” code for instant reference. A traditional mode of surfacing code has been treating code as text and applying keyword search techniques. However, many prior work observes the limitation of such approach: (1) semantic description of code is limited to comments and (2) syntactic keyword is often not selective enough. In contrast, we discuss enabling techniques and scenarios of instant semantic-based surfacing. For example, developers, during a development session, may reference the existing code sharing similar semantics, using his code so far as a query. In addition to such semantic-based surfacing, we also enhance keyword-based surfacing with semantics, by instantly adding semantic tags for code submitted to the repository. To achieve this goal, we first propose scalable indexing structures on vector abstractions of code. Our experimental results show our techniques outperform a state-of-the-art tool in efficiency without compromising accuracy. We then deploy our technique for instant search and tagging scenarios: For instant code search scenario, we demonstrate an instant clone search tool using our techniques, supporting sub-second search over 54 million LOC. For instant code tagging scenario, we propose an automatic instant code tagging algorithm to mine the meaningful tags from clones.
论文关键词:Code indexing, Instant code search, Instant code tagging, Software development
论文评审过程:
论文官网地址:https://doi.org/10.1007/s10115-013-0677-z