Systems Engineering and Electronics (系统工程与电子技术) ›› 2020, Vol. 42 ›› Issue (10): 2399-2408. doi: 10.3969/j.issn.1001-506X.2020.10.31

• Reliability

Identification method of similar bugs based on historical software repository and abstract syntax tree

Dan GONG (1, 2), Tiantian WANG (1), Xiaohong SU (1), Meihan DONG (1)

  1. School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
  2. Department of Computer Science and Technology, Harbin Huade University, Harbin 150001, China
  • Received: 2020-01-29 Online: 2020-10-01 Published: 2020-09-19
  • About the authors: Dan GONG (1979-), female, associate professor, Ph.D. candidate; main research interest: software engineering reliability. E-mail: gongdan1979@hotmail.com | Tiantian WANG (1980-), female, associate professor, master's supervisor, Ph.D.; main research interests: program analysis and automated software debugging. E-mail: sweetwtt@126.com | Xiaohong SU (1966-), female, professor, doctoral supervisor, Ph.D.; main research interests: program analysis and software fault localization. E-mail: sxh@hit.edu.cn | Meihan DONG (1994-), female, master's degree; main research interest: automated software debugging. E-mail: dongmeihanhaomi@163.com
  • Supported by:
    National Natural Science Foundation of China (61672191); National Key Research and Development Program of China, 13th Five-Year Plan (2017YFC0702204)

Abstract:

During software development, developers often search a historical software repository (HSR) and then copy and paste code to reuse it. The HSR also stores the bugs and fix information of the reused code, which can help in handling similar bugs. On this basis, a similar-bug identification method based on HSR mining is proposed. First, by analyzing the change log, modules with known bugs are extracted from the HSR to build a bug module library. Then, a similar-code detection method based on the abstract syntax tree (AST) identifies code in the software under test that is similar to code in the bug module library, and, with the help of the corresponding bug and fix information stored in the HSR, the modules of the software under test that may contain latent bugs are identified. To improve the accuracy of similar-code identification, the AST-based code feature metrics are also optimized. Experiments on 18 C programs and 164 clone pairs show that the proposed method identifies all the similar code and outperforms existing tools. The effect of code similarity on similar-bug identification is verified on a manually built bug module library. Finally, an empirical study on 8 large real-world C projects yields an average bug recall of 94%, which shows that mining the HSR can effectively support bug understanding for similar code that spreads across projects.

Key words: software reuse, historical software repository (HSR), clone code, similar bug, abstract syntax tree (AST)
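
The pipeline summarized in the abstract (change-log filtering, a bug module library, and AST-based similarity matching) can be illustrated with a minimal sketch. It is not the authors' implementation. As assumptions made only for illustration, it parses Python snippets with Python's standard `ast` module as a stand-in for a C front end, treats node-type counts as the feature vector (the paper's optimized metrics are not described in the abstract), and uses a hypothetical keyword pattern and a hypothetical threshold of 0.9.

```python
import ast
import math
import re
from collections import Counter

# Hypothetical heuristic: a change-log entry counts as a bug fix if it mentions these words.
FIX_PATTERN = re.compile(r"\b(fix(es|ed)?|bugs?|defects?)\b", re.IGNORECASE)


def build_bug_library(change_log):
    """Keep the (message, buggy_source) entries whose message looks like a bug fix."""
    return [(msg, src) for msg, src in change_log if FIX_PATTERN.search(msg)]


def ast_features(source):
    """Count AST node types; identifier names are ignored, so renamed clones still match."""
    return Counter(type(node).__name__ for node in ast.walk(ast.parse(source)))


def cosine(a, b):
    """Cosine similarity between two node-type count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def find_similar_bugs(target_source, bug_library, threshold=0.9):
    """Return (message, score) for every library module similar to the target code."""
    target = ast_features(target_source)
    hits = []
    for msg, src in bug_library:
        score = cosine(target, ast_features(src))
        if score >= threshold:
            hits.append((msg, score))
    return hits


# Illustrative data only: a two-entry change log and a renamed copy of the buggy module.
change_log = [
    ("Fix off-by-one bug in sum loop",
     "def total(xs):\n    s = 0\n    for i in range(len(xs) - 1):\n        s += xs[i]\n    return s\n"),
    ("Add logging helper",
     "def log(msg):\n    print(msg)\n"),
]
target = (
    "def accumulate(values):\n    acc = 0\n"
    "    for j in range(len(values) - 1):\n        acc += values[j]\n    return acc\n"
)

bug_library = build_bug_library(change_log)
hits = find_similar_bugs(target, bug_library)
for message, score in hits:
    print(f"possible similar bug: {message} (similarity {score:.2f})")
```

Because the features ignore identifier names, a copy whose variables were renamed still matches the library entry, and the matching change-log message can point a reviewer to the known bug and its fix. A production tool for C would need a real C parser and richer structural features than plain node counts.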

CLC number: