Systems Engineering and Electronics, 2020, Vol. 42, Issue (10): 2399-2408. doi: 10.3969/j.issn.1001-506X.2020.10.31
Dan GONG1,2, Tiantian WANG1, Xiaohong SU1, Meihan DONG1
Received: 2020-01-29
Online: 2020-10-01
Published: 2020-09-19
Dan GONG, Tiantian WANG, Xiaohong SU, Meihan DONG. Identification method of similar bugs based on historical software repository and abstract syntax tree[J]. Systems Engineering and Electronics, 2020, 42(10): 2399-2408.
Table 1 Information of the subject programs
| File name | Lines of code | Statements | Functions |
|---|---|---|---|
| fannkuch | 105 | 65 | 2 |
| n-body | 141 | 68 | 4 |
| nsieve-bits | 36 | 26 | 1 |
| partialsums | 68 | 52 | 3 |
| puzzle | 84 | 58 | 7 |
| recursive | 55 | 30 | 6 |
| spectral-norm | 53 | 37 | 5 |
| Bubblesort | 171 | 92 | 5 |
| FloatMM | 160 | 84 | 6 |
| IntMM | 159 | 83 | 6 |
| Oscar | 323 | 169 | 10 |
| Perm | 169 | 90 | 7 |
| Puzzle | 225 | 174 | 8 |
| Queens | 188 | 103 | 6 |
| Quicksort | 174 | 103 | 6 |
| RealMM | 161 | 84 | 6 |
| Towers | 218 | 129 | 12 |
| Treesort | 187 | 118 | 8 |
Table 5 Manually implanted bug module library
| Type | Similar blocks: source file (implanted bug line) | Dissimilar blocks: source file (implanted bug line) |
|---|---|---|
| 1 | Towers.c(137), Towers.c(150) | fannkuch.c(70), Queens.c(169) |
| 2 | Bubblesort.c(137), Quicksort.c(136), Treesort.c(138) | Treesort.c(153) |
| 3 | Bubblesort.c(169), Perm.c(166), Quicksort.c(171), Towers.c(216), Treesort.c(185) | Bubblesort.c(149) |
| 4 | FloatMM.c(140), IntMM.c(140), RealMM.c(142) | Bubblesort.c(132) |
| 5 | Bubblesort.c(136), Quicksort.c(135), Treesort.c(137) | Queens.c(145) |
| 6 | Bubblesort.c(135), Quicksort.c(134), Treesort.c(136) | n-body.c(66) |
| 7 | FloatMM.c(129), IntMM.c(129), RealMM.c(131) | FloatMM.c(152) |
| 8 | Bubblesort.c(116), Puzzle.c(116), Queens.c(116), Towers.c(116), Treesort.c(116) | FloatMM.c(120) |
Table 8 Identification results of manually implanted similar bugs
| Type | Group | TP | FP | TN | FN | Accuracy | Recall |
|---|---|---|---|---|---|---|---|
| 1 | similar | 2 | 0 | 25 | 0 | 1.00 | 1.00 |
| 1 | all | 2 | 2 | 30 | 2 | 0.89 | 0.50 |
| 2 | similar | 3 | 0 | 24 | 0 | 1.00 | 1.00 |
| 2 | all | 3 | 1 | 31 | 1 | 0.94 | 0.75 |
| 3 | similar | 5 | 0 | 22 | 0 | 1.00 | 1.00 |
| 3 | all | 5 | 1 | 29 | 1 | 0.94 | 0.83 |
| 4 | similar | 3 | 0 | 24 | 0 | 1.00 | 1.00 |
| 4 | all | 3 | 1 | 31 | 1 | 0.94 | 0.75 |
| 5 | similar | 3 | 0 | 24 | 0 | 1.00 | 1.00 |
| 5 | all | 3 | 1 | 31 | 1 | 0.94 | 0.75 |
| 6 | similar | 3 | 0 | 24 | 0 | 1.00 | 1.00 |
| 6 | all | 4 | 0 | 32 | 0 | 1.00 | 1.00 |
| 7 | similar | 3 | 0 | 24 | 0 | 1.00 | 1.00 |
| 7 | all | 3 | 1 | 31 | 1 | 0.94 | 0.75 |
| 8 | similar | 5 | 0 | 22 | 0 | 1.00 | 1.00 |
| 8 | all | 6 | 0 | 30 | 0 | 1.00 | 1.00 |
| Total | similar | 27 | 0 | 189 | 0 | 1.00 | 1.00 |
| Total | all | 29 | 7 | 245 | 7 | 0.95 | 0.81 |
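The Accuracy and Recall columns of Table 8 can be checked against the raw counts. The sketch below assumes the standard confusion-matrix definitions Accuracy = (TP + TN) / (TP + FP + TN + FN) and Recall = TP / (TP + FN); these reproduce every value in the table, but the paper's own formulas are not part of this excerpt. The dictionaries are transcribed from Table 8, and the helper names (`accuracy`, `recall`, `summarize`) are illustrative only.

```python
"""Recompute the Accuracy and Recall columns of Table 8 from TP/FP/TN/FN.

Assumes the standard confusion-matrix definitions; the paper's own
formulas are not part of this excerpt.
"""

# (TP, FP, TN, FN) per bug type, transcribed from Table 8.
SIMILAR_GROUP = {
    1: (2, 0, 25, 0), 2: (3, 0, 24, 0), 3: (5, 0, 22, 0), 4: (3, 0, 24, 0),
    5: (3, 0, 24, 0), 6: (3, 0, 24, 0), 7: (3, 0, 24, 0), 8: (5, 0, 22, 0),
}
ALL_GROUP = {
    1: (2, 2, 30, 2), 2: (3, 1, 31, 1), 3: (5, 1, 29, 1), 4: (3, 1, 31, 1),
    5: (3, 1, 31, 1), 6: (4, 0, 32, 0), 7: (3, 1, 31, 1), 8: (6, 0, 30, 0),
}


def accuracy(tp, fp, tn, fn):
    """Share of all judged cases that were classified correctly."""
    return (tp + tn) / (tp + fp + tn + fn)


def recall(tp, fn):
    """Share of actual positives that were reported: TP / (TP + FN)."""
    return tp / (tp + fn)


def summarize(rows):
    """Return pooled counts and rounded (accuracy, recall) per type plus a 'total' entry."""
    results = {}
    for bug_type, (tp, fp, tn, fn) in rows.items():
        results[bug_type] = (round(accuracy(tp, fp, tn, fn), 2), round(recall(tp, fn), 2))
    # The total row pools the raw counts (column sums) before dividing.
    totals = [sum(column) for column in zip(*rows.values())]
    tp, fp, tn, fn = totals
    results["total"] = (round(accuracy(tp, fp, tn, fn), 2), round(recall(tp, fn), 2))
    return totals, results


if __name__ == "__main__":
    for name, rows in (("similar", SIMILAR_GROUP), ("all", ALL_GROUP)):
        totals, results = summarize(rows)
        print(name, "pooled (TP, FP, TN, FN):", totals)
        for key, (acc, rec) in results.items():
            print(f"  {key}: accuracy={acc:.2f} recall={rec:.2f}")
```

A side check that follows from the same counts: in every "all" row, TP + FN equals the number of implanted blocks listed for that type in Table 5 (for type 1, two similar plus two dissimilar blocks give 2 + 2 = 4), and in every "similar" row it equals the number of similar blocks alone.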
Table 9 Bug identification results of real bug module library
| Project | P | N | TP | FP | Recall | FPP |
|---|---|---|---|---|---|---|
| gmp | 14 | 7 | 13 | 3 | 0.93 | 0.43 |
| gzip | 75 | 36 | 72 | 21 | 0.96 | 0.58 |
| libtiff | 1 675 | 822 | 1 566 | 411 | 0.93 | 0.50 |
| lighttpd | 187 | 91 | 170 | 47 | 0.91 | 0.52 |
| php | 737 | 363 | 730 | 139 | 0.99 | 0.38 |
| python | 284 | 139 | 261 | 100 | 0.92 | 0.72 |
| valgrind | 187 | 87 | 156 | 67 | 0.83 | 0.77 |
| wireshark | 403 | 198 | 386 | 112 | 0.96 | 0.57 |
| total | 3 562 | 1 743 | 3 354 | 900 | 0.94 | 0.52 |
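The Recall and FPP columns of Table 9 are consistent with Recall = TP / P and FPP = FP / N, where FPP is read here as the false-positive proportion. These definitions are inferred from the numbers, not quoted from the paper, so treat the following Python sketch as a consistency check under that assumption. The counts are transcribed from Table 9; `rates` and `pooled_total` are hypothetical helper names.

```python
"""Recompute the Recall and FPP columns of Table 9 from the raw counts.

Assumption: Recall = TP / P and FPP = FP / N. These definitions reproduce
every value in the table but are inferred, not quoted from the paper.
"""

# (P, N, TP, FP) per project, transcribed from Table 9.
PROJECTS = {
    "gmp": (14, 7, 13, 3),
    "gzip": (75, 36, 72, 21),
    "libtiff": (1675, 822, 1566, 411),
    "lighttpd": (187, 91, 170, 47),
    "php": (737, 363, 730, 139),
    "python": (284, 139, 261, 100),
    "valgrind": (187, 87, 156, 67),
    "wireshark": (403, 198, 386, 112),
}


def rates(p, n, tp, fp):
    """Return (recall, false-positive proportion), rounded to two decimals."""
    return round(tp / p, 2), round(fp / n, 2)


def pooled_total(projects):
    """Sum each count column across projects, as in the table's 'total' row."""
    return tuple(sum(column) for column in zip(*projects.values()))


if __name__ == "__main__":
    for name, counts in PROJECTS.items():
        recall, fpp = rates(*counts)
        print(f"{name:10s} recall={recall:.2f} fpp={fpp:.2f}")
    total = pooled_total(PROJECTS)
    print("total", total, rates(*total))
```

The "total" row matches pooled counts (3 354 / 3 562 and 900 / 1 743) rather than the mean of the per-project rates; averaging the eight recall values instead would give about 0.93, so the two aggregation choices differ slightly here.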
References
[1] LI J Y, ERNST M D. CBCD: cloned buggy code detector[C]//Proc. of the International Conference on Software Engineering, 2012: 310-320.
[2] LI Z M, LU S, MYAGMAR S, et al. CP-Miner: a tool for finding copy-paste and related bugs in operating system code[C]//Proc. of the 6th Conference on Symposium on Operating Systems Design & Implementation, 2004.
[3] JUERGENS E, DEISSENBOECK F, HUMMEL B, et al. Do code clones matter?[C]//Proc. of the 31st IEEE International Conference on Software Engineering, 2009: 485-495.
[4] SU X H, ZHANG F L. A survey for management-oriented code clone research[J]. Chinese Journal of Computers, 2018, 41(3): 628-651. (in Chinese)
[5] KRINKE J, GOLD N, JIA Y, et al. Cloning and copying between GNOME projects[C]//Proc. of the 7th IEEE Working Conference on Mining Software Repositories, 2010: 98-101.
[6] BAUER V, HAUPTMANN B. Assessing cross-project clones for reuse optimization[C]//Proc. of the 7th International Workshop on Software Clones, 2013: 60-61.
[7] Simian: similarity analyser[EB/OL]. [2019-12-01]. http://www.harukizaemon.com/simian/index.html.
[8] ROY C K, CORDY J R. NICAD: accurate detection of near-miss intentional clones using flexible pretty-printing and code normalization[C]//Proc. of the 16th IEEE International Conference on Program Comprehension, 2008: 172-181.
[9] KAMIYA T, KUSUMOTO S, INOUE K. CCFinder: a multilinguistic token-based code clone detection system for large scale source code[J]. IEEE Trans. on Software Engineering,
[10] GODE N, KOSCHKE R. Incremental clone detection[C]//Proc. of the 13th European Conference on Software Maintenance and Reengineering, 2009: 219-228.
[11] JIANG L X. DECKARD: scalable and accurate tree-based detection of code clones[C]//Proc. of the 29th International Conference on Software Engineering, 2007: 96-105.
[12] BULYCHEV P, MINEA M. Duplicate code detection using anti-unification[EB/OL]. [2018-10-8]. http://cyberleninka.ru/article/n/duplicate-code-detection-using-anti-unification.
[13] KRINKE J. Identifying similar code with program dependence graphs[C]//Proc. of the Conference on Reverse Engineering, 2001: 301-309.
[14] GABEL M, JIANG L X, SU Z D. Scalable detection of semantic clones[C]//Proc. of the 30th ACM/IEEE International Conference on Software Engineering, 2008: 321-330.
[15] KONTOGIANNIS K A, DEMORI R, MERLO E, et al. Pattern matching for clone and concept detection[J]. Automated Software Engineering, 1996, 3: 77-108. doi: 10.1007/BF00126960.
[16] MAYRAND J, LEBLANC C, MERLO E M. Experiment on the automatic detection of function clones in a software system using metrics[C]//Proc. of the International Conference on Software Maintenance, 1996: 244-253.
[17] LI Z, ZOU D Q, XU S H, et al. VulPecker: an automated vulnerability detection system based on code similarity analysis[C]//Proc. of the ACM International Conference Proceeding Series on Computer Security Applications, 2016: 201-213.
[18] ZHANG T, YANG G, LEE B, et al. Predicting severity of bug report by mining bug repository with concept profile[C]//Proc. of the 30th Annual ACM Symposium on Applied Computing, 2015: 1553-1558.
[19] BHATTACHARYA P, NEAMTIU I. Bug-fix time prediction models: can we do better?[C]//Proc. of the 8th International Working Conference on Mining Software Repositories, 2011: 207-210.
[20] ROCHA H, VALENTE M T, MARQUES-NETO H, et al. An empirical study on recommendations of similar bugs[C]//Proc. of the 23rd International Conference on Software Analysis, Evolution and Reengineering, 2016.
[21] LAZAR A, RITCHEY S, SHARIF B. Improving the accuracy of duplicate bug report detection using textual similarity measures[C]//Proc. of the International Conference on Software Engineering, 2014: 308-311.
[22] KEVIC K, MULLER S C, FRITZ T, et al. Collaborative bug triaging using textual similarities and change set analysis[C]//Proc. of the 6th International Workshop on Cooperative and Human Aspects of Software Engineering, 2013: 17-24.
[23] Clang: a C language family frontend for LLVM[EB/OL]. [2020-8-25]. http://clang.llvm.org/.
[24] Git[EB/OL]. [2020-8-25]. https://git-scm.com/.
[25] Test-suite guide[EB/OL]. [2020-8-25]. http://www.llvm.org/docs/TestSuiteGuide.html.
[26] LLVM download page[EB/OL]. [2020-8-25]. http://releases.llvm.org/download.html.
[27] GOUES L C, HOLTSCHULTE N, SMITH K E, et al. ManyBugs and IntroClass benchmarks for automated repair of C programs[EB/OL]. [2020-8-25]. https://repairbenchmarks.cs.umass.edu/.
[28] GOUES L C, HOLTSCHULTE N, SMITH K E, et al. The ManyBugs and IntroClass benchmarks for automated repair of C programs[J]. IEEE Trans. on Software Engineering, 2015, 41(12): 1236-1256. doi: 10.1109/TSE.2015.2454513.
[29] LE G C, DEWEY-VOGT M, FORREST S, et al. A systematic study of automated program repair: fixing 55 out of 105 bugs for $8 each[C]//Proc. of the 34th International Conference on Software Engineering, 2012: 3-13.
[30] WEIMER W, FRY Z P, FORREST S. Leveraging program equivalence for adaptive program repair: models and first results[C]//Proc. of the 28th IEEE/ACM International Conference on Automated Software Engineering, 2013: 356-366.
[31] LONG F, RINARD M. Staged program repair with condition synthesis[C]//Proc. of the ACM SIGSOFT Symposium on the Foundations of Software Engineering, 2015: 166-178.
[32] MECHTAEV S, YI J, ROYCHOUDHURY A. Angelix: scalable multiline program patch synthesis via symbolic analysis[C]//Proc. of the 38th IEEE/ACM International Conference on Software Engineering, 2016: 691-701.
[33] PAN K, KIM S, WHITEHEAD E J. Toward an understanding of bug fix patterns[J]. Empirical Software Engineering, 2009, 14(3): 286-315.
[34] CAMPOS E C, MAIA M D A. Common bug-fix patterns: a large-scale observational study[C]//Proc. of the ACM/IEEE International Symposium on Empirical Software Engineering and Measurement, 2017: 404-413.