Systems Engineering and Electronics ›› 2020, Vol. 42 ›› Issue (10): 2399-2408. doi: 10.3969/j.issn.1001-506X.2020.10.31

Identification method of similar bugs based on historical software repository and abstract syntax tree

Dan GONG1,2, Tiantian WANG1, Xiaohong SU1, Meihan DONG1

  1. School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
  2. Department of Computer Science and Technology, Harbin Huade University, Harbin 150001, China
  • Received: 2020-01-29 Online: 2020-10-01 Published: 2020-09-19

Abstract:

During software development, developers often search the historical software repository (HSR) and copy and paste the code they need in order to reuse it. The HSR also stores the bugs of the reused code and the information on how they were fixed, which can help in handling similar bugs. A method for identifying similar bugs by mining the HSR is therefore proposed. First, the change log is analyzed to extract modules with known bugs from the HSR, and a bug module library is built from them. Then, a similar-code detection method based on the abstract syntax tree (AST) is used to identify code in the software under test that is similar to code in the bug module library. Using the corresponding bug and fix information stored in the HSR, the modules in the software under test that may contain potential bugs are identified. In addition, the AST-based code feature measurement is optimized to improve the accuracy of similar-code recognition. Experimental results on 18 C programs and 164 code clones show that the proposed method identifies all of the similar code and outperforms existing tools. The effect of code similarity on similar-bug identification is verified on a manually built bug module library. Finally, an empirical study is conducted on 8 large real-world C projects. The average bug recall rate is 94%, which shows that mining the HSR can effectively support bug understanding when similar code spreads across projects.
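
The matching step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the paper targets C programs and an optimized AST feature measurement, whereas this sketch assumes Python's standard `ast` module as a stand-in parser, uses plain node-type counts as the feature vector, and compares vectors with cosine similarity. The names `ast_feature_vector`, `cosine_similarity`, `find_similar_bug_modules`, the sample library entry, and the 0.9 threshold are all hypothetical choices made for illustration.

```python
import ast
import math
from collections import Counter


def ast_feature_vector(source: str) -> Counter:
    """Count AST node types; identifier names and literal values are ignored."""
    tree = ast.parse(source)
    return Counter(type(node).__name__ for node in ast.walk(tree))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def find_similar_bug_modules(candidate: str, bug_library: dict, threshold: float = 0.9):
    """Return (module name, score, fix note) for library entries similar to candidate.

    bug_library maps a module name to (source code, fix note); the threshold
    is an assumed value, not one taken from the paper.
    """
    cand_vec = ast_feature_vector(candidate)
    hits = []
    for name, (source, fix_note) in bug_library.items():
        score = cosine_similarity(cand_vec, ast_feature_vector(source))
        if score >= threshold:
            hits.append((name, score, fix_note))
    return sorted(hits, key=lambda h: h[1], reverse=True)


# Hypothetical bug module library entry: a known buggy module and its fix note.
BUG_LIBRARY = {
    "sum_upto": (
        "def sum_upto(values, n):\n"
        "    total = 0\n"
        "    for i in range(n + 1):\n"
        "        total += values[i]\n"
        "    return total\n",
        "off-by-one: loop bound should be n, not n + 1",
    ),
}

# Code in the software under test: a copy of the buggy module with renamed identifiers.
CANDIDATE = (
    "def accumulate(data, count):\n"
    "    acc = 0\n"
    "    for k in range(count + 1):\n"
    "        acc += data[k]\n"
    "    return acc\n"
)

if __name__ == "__main__":
    for name, score, fix_note in find_similar_bug_modules(CANDIDATE, BUG_LIBRARY):
        print(f"similar to {name} (score {score:.2f}): {fix_note}")
```

Because the feature vector ignores identifier names, a copied module with renamed variables still matches its library entry, and the stored fix note points to the likely bug. Node-type counts discard tree shape, so a practical detector would need richer structural features to limit false matches.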

Key words: software reuse, historical software repository (HSR), clone code, similar bug, abstract syntax tree (AST)
