Code vulnerability static detection method based on graph representation and MHGAT

doi:10.12305/j.issn.1001-506X.2023.05.31

Abstract

Existing static analysis techniques struggle to detect software security vulnerabilities promptly and accurately. To address this problem, a static code vulnerability detection method based on graph representation and a multi-head graph attention network (MHGAT) is proposed. First, vulnerability-related code snippets are extracted from the system dependency graph of the source code by program slicing; an adjacency matrix describing the connections between statements is built from the system dependency graph, and an embedding algorithm produces the feature matrix of each code snippet. Next, the adjacency matrices and feature matrices of multiple code snippets are concatenated in the form of a disjoint graph. Finally, several convolution-pooling basic blocks extract features of the code graph data at different levels, and a jumping knowledge network integrates the outputs of the basic blocks. Experimental results show that, compared with other vulnerability detection methods, the proposed method improves both the efficiency and the effectiveness of vulnerability detection through its changes to the data representation and the algorithm.

Keywords: vulnerability detection, program slicing, graph representation learning, graph attention network, multi-head self-attention
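
The disjoint-graph concatenation step described in the abstract can be illustrated with a minimal NumPy sketch. It assumes each code snippet is already available as a dense adjacency matrix and a node-feature matrix; the function name `disjoint_batch` and the toy graphs are hypothetical and are not taken from the paper.

```python
import numpy as np

def disjoint_batch(adjs, feats):
    """Combine several graphs into one disjoint graph.

    adjs:  list of (n_i, n_i) adjacency matrices, one per code snippet
    feats: list of (n_i, d) node-feature matrices, one per code snippet
    Returns the block-diagonal adjacency (N, N), the stacked features (N, d),
    and an index (N,) that maps every node back to its source graph.
    """
    sizes = [a.shape[0] for a in adjs]
    total = sum(sizes)
    big_adj = np.zeros((total, total), dtype=adjs[0].dtype)
    offset = 0
    for a, n in zip(adjs, sizes):
        # Each graph occupies its own diagonal block, so no edges cross graphs.
        big_adj[offset:offset + n, offset:offset + n] = a
        offset += n
    big_x = np.vstack(feats)
    graph_index = np.repeat(np.arange(len(adjs)), sizes)
    return big_adj, big_x, graph_index

# Two toy snippets (hypothetical): 2 statements and 3 statements, 4-d features.
a1 = np.array([[0, 1],
               [0, 0]])
a2 = np.array([[0, 1, 1],
               [0, 0, 1],
               [0, 0, 0]])
x1 = np.ones((2, 4))
x2 = np.zeros((3, 4))

adj, x, idx = disjoint_batch([a1, a2], [x1, x2])
```

Because the off-diagonal blocks stay zero, message passing on the combined graph never mixes nodes from different snippets, and `idx` lets a later pooling step reduce each snippet separately.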

CLC number: TP393.08

CHENG Jingyun, WANG Buhong, LUO Peng. Code vulnerability static detection method based on graph representation and MHGAT[J]. Systems Engineering and Electronics, 2023, 45(5): 1535-1543.

Figures and tables (19): Figures 1-14; Tables 1-5.


Item	Value
Vulnerable	56 795
Non-vulnerable	157 247
Number of nodes	3 243 342
Number of edges	6 872 625
CWE IDs	CWE-23, CWE-36, CWE-78, CWE-121, CWE-122, CWE-124, CWE-126, CWE-127, CWE-134, CWE-190, CWE-400, CWE-401, CWE-606, CWE-761, CWE-762, CWE-789

Actual \ Predicted	Vulnerable	Non-vulnerable
Vulnerable	TP	FN
Non-vulnerable	FP	TN
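
The Acc, Pre, Rec and F1 columns reported in the result tables can be derived from the four confusion-matrix cells. The sketch below assumes the standard definitions of these metrics; the function name and the example counts are hypothetical and are not taken from the paper's experiments.

```python
def classification_metrics(tp, fn, fp, tn):
    """Compute accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + fn + fp + tn
    acc = (tp + tn) / total
    # Guard against empty denominators when a class is never predicted or present.
    pre = tp / (tp + fp) if (tp + fp) else 0.0
    rec = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * pre * rec / (pre + rec) if (pre + rec) else 0.0
    return {"Acc": acc, "Pre": pre, "Rec": rec, "F1": f1}

# Hypothetical counts, for illustration only.
m = classification_metrics(tp=90, fn=10, fp=5, tn=95)
```

F1 is the harmonic mean of precision and recall, so it stays low whenever either of the two is low.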

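Parameter	Setting
Feature vector dimension	50
Epochs	50
Batch size	128
GAT neurons	64
Attention heads	3
GAT activation function	ReLU
SAG pooling ratio	0.85
SAG activation function	sigmoid
GAP neurons	16
Fully connected layer neurons	32/16
Fully connected layer activation function	ReLU
Dropout rate	0.5
Optimizer	Adamax
Loss function	categorical_crossentropy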
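
To make the GAT settings above concrete (50-dimensional node features, 64 neurons, 3 attention heads, ReLU), here is a minimal NumPy sketch of one multi-head graph attention layer in the standard formulation. It is not the authors' implementation: the weights are random, the heads are assumed to be concatenated rather than averaged, the LeakyReLU slope of 0.2 is assumed, and row i of the adjacency matrix is assumed to list the nodes that node i attends to.

```python
import numpy as np

def leaky_relu(z, slope=0.2):
    return np.where(z > 0, z, slope * z)

def gat_head(x, adj, w, a_src, a_dst):
    """One attention head. x: (N, F), adj: (N, N) 0/1, w: (F, C), a_src/a_dst: (C,)."""
    h = x @ w  # linear projection of node features, shape (N, C)
    # e[i, j] = LeakyReLU(a_src . h_i + a_dst . h_j), i.e. a^T [h_i || h_j]
    e = leaky_relu((h @ a_src)[:, None] + (h @ a_dst)[None, :])
    # Each node attends only to its neighbours and to itself.
    mask = (adj + np.eye(adj.shape[0])) > 0
    e = np.where(mask, e, -np.inf)
    e = e - e.max(axis=1, keepdims=True)  # numerically stable softmax
    alpha = np.exp(e)
    alpha = alpha / alpha.sum(axis=1, keepdims=True)
    return alpha @ h, alpha

def multi_head_gat(x, adj, heads):
    """Concatenate the outputs of all heads, then apply ReLU."""
    outs = [gat_head(x, adj, *params)[0] for params in heads]
    return np.maximum(np.concatenate(outs, axis=1), 0.0)

rng = np.random.default_rng(0)
n, f, c, k = 5, 50, 64, 3  # nodes, feature dim, neurons per head, heads
x = rng.normal(size=(n, f))
adj = np.zeros((n, n))
adj[0, 1] = adj[1, 2] = adj[2, 3] = adj[3, 4] = 1  # toy chain graph
heads = [(rng.normal(size=(f, c)) * 0.1, rng.normal(size=c), rng.normal(size=c))
         for _ in range(k)]

out = multi_head_gat(x, adj, heads)       # shape (5, 192) with concatenation
_, alpha0 = gat_head(x, adj, *heads[0])   # attention weights of the first head
```

With concatenation, the output width is the number of heads times the neurons per head; averaging the heads instead would keep the width at 64.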

Model	Acc (%)	F1 (%)	Rec (%)	Pre (%)
GCN	90.87	90.61	88.12	93.25
GraphSAGE	95.31	95.12	91.66	98.86
ARMA	95.68	95.54	92.60	98.68
Edge	96.34	96.27	94.38	98.23
GCS	96.86	96.83	95.88	97.79
GAT-1	94.85	94.78	93.47	96.12
GAT-2	96.51	96.48	95.66	97.32
GAT-3	97.50	97.50	97.52	97.48

Pooling method	Acc (%)	F1 (%)	Rec (%)	Pre (%)
Diff	95.18	95.22	96.08	94.38
MinCut	93.76	93.79	94.28	93.31
TopK	95.40	95.35	94.30	96.43
SAG	97.50	97.50	97.52	97.48
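
For orientation, the idea behind self-attention graph (SAG) pooling can be sketched in NumPy as follows. This is a simplified illustration rather than the authors' code: node scores come from a single graph-convolution step with a hypothetical weight vector `w`, the adjacency matrix is assumed to be symmetric, and the sigmoid gate and the 0.85 ratio are taken from the parameter settings reported above.

```python
import numpy as np

def sag_pool(x, adj, w, ratio=0.85):
    """Keep the top ceil(ratio * N) nodes ranked by a graph-convolution score.

    x: (N, F) node features, adj: (N, N) symmetric 0/1 adjacency, w: (F,) weights.
    Returns the gated features, the induced adjacency, and the kept node indices.
    """
    n = adj.shape[0]
    a_hat = adj + np.eye(n)                    # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    a_norm = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    # Attention score per node, computed from features and graph structure.
    score = 1.0 / (1.0 + np.exp(-(a_norm @ x @ w)))
    k = int(np.ceil(ratio * n))
    keep = np.sort(np.argsort(-score)[:k])     # top-k nodes, original order kept
    x_pooled = x[keep] * score[keep, None]     # gate the features by the score
    adj_pooled = adj[np.ix_(keep, keep)]       # induced subgraph
    return x_pooled, adj_pooled, keep

rng = np.random.default_rng(0)
n, f = 10, 4
x = rng.normal(size=(n, f))
adj = np.zeros((n, n))
for i in range(n):                             # toy ring graph
    adj[i, (i + 1) % n] = adj[(i + 1) % n, i] = 1
w = rng.normal(size=f)

x_p, adj_p, keep = sag_pool(x, adj, w, ratio=0.85)
```

Unlike plain top-k pooling on features alone, the score here depends on each node's neighbourhood through the normalized adjacency.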