PHP Taint-Style Vulnerability Detection Method Based on Heterogeneous Graph Neural Networks

Abstract: PHP lacks built-in input-validation functions and is highly user-interactive, and together these traits give rise to a large number of taint-style vulnerabilities. Recent detection methods based on graph neural networks have substantially improved detection accuracy and precision. However, the code property graphs they use to represent vulnerabilities often contain a great deal of redundant information, so false-positive and false-negative rates remain high in practice. To address this, HG-VulD (Heterogeneous Graph-Vulnerability Detection), a PHP taint-style vulnerability detection method based on sub-property graphs and heterogeneous graph neural networks, is proposed. First, a Sub-Property Graph (SPG) is built by traversing backward from the sinks of taint-style vulnerabilities to their taint sources. This retains the vulnerability-relevant semantic and structural information, removes irrelevant nodes, and significantly reduces graph complexity. Second, HG-VulD converts code nodes into vectors by combining semantic features with type features, and uses a Heterogeneous Graph Neural Network (HGNN) to learn separately the structural, control, and dependency information carried by the Abstract Syntax Tree (AST), Control Flow Graph (CFG), and Program Dependence Graph (PDG). An attention mechanism then performs weighted aggregation over these edge types to improve classification. Finally, on a synthetic dataset of 260,000 files, HG-VulD reaches an F1 score of 96.05%, outperforming RIPS, WAP, and VulEye. On a real-world software vulnerability dataset, it achieves detection accuracies of 73.82% for XSS and 67.81% for SQLI (SQL injection), indicating good generalization in practical settings.
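The SPG construction step can be illustrated with a minimal sketch. The abstract only states that the graph is built by traversing backward from sinks to taint sources and discarding unrelated nodes, so the details here are assumptions: the code property graph is given as hypothetical `(src, dst, edge_type)` triples, the traversal follows only `PDG` dependence edges, and a node is kept when it lies on a dependence path from a source to a sink (a program chop). The names `build_spg` and `_reach` are illustrative, not taken from HG-VulD.

```python
from collections import defaultdict, deque


def _reach(starts, adj):
    """Breadth-first search: every node reachable from `starts` through `adj`."""
    seen = set(starts)
    queue = deque(starts)
    while queue:
        node = queue.popleft()
        for nxt in adj[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def build_spg(edges, sources, sinks, follow=("PDG",)):
    """Hypothetical SPG builder: keep nodes on a dependence path from a source to a sink.

    edges   -- iterable of (src, dst, edge_type) triples from a code property graph
    sources -- set of taint-source node ids; sinks -- set of sink node ids
    follow  -- edge types used for traversal (an assumption, not from the paper)
    """
    edges = list(edges)
    fwd, bwd = defaultdict(set), defaultdict(set)
    for src, dst, etype in edges:
        if etype in follow:
            fwd[src].add(dst)
            bwd[dst].add(src)
    # Reverse traversal: start at the sinks and walk dependence edges backward.
    reaches_sink = _reach(sinks, bwd)
    # Keep only the part of that backward slice that a taint source actually reaches.
    tainted = _reach(set(sources) & reaches_sink, fwd)
    kept = reaches_sink & tainted
    # Retain every edge (AST, CFG, PDG, ...) whose two endpoints both survive.
    kept_edges = [(s, d, t) for s, d, t in edges if s in kept and d in kept]
    return kept, kept_edges


# Toy CPG for:  $x = $_GET['q']; $z = 1; $y = trim($x); echo $y; $w = $z + 1;
cpg_edges = [
    ("n1", "n2", "PDG"),  # $x flows into trim($x)
    ("n2", "n4", "PDG"),  # $y flows into echo (XSS sink)
    ("n3", "n5", "PDG"),  # $z flows into $w (unrelated to the sink)
    ("n1", "n3", "CFG"), ("n3", "n2", "CFG"), ("n2", "n4", "CFG"), ("n4", "n5", "CFG"),
]
nodes, spg_edges = build_spg(cpg_edges, sources={"n1"}, sinks={"n4"})
print(sorted(nodes))  # ['n1', 'n2', 'n4']
```

In this toy example the unrelated statements `n3` and `n5` are pruned, while every surviving edge, including the CFG edge between `n2` and `n4`, is kept so that later stages still see several edge types.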
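The node-encoding step combines semantic features with type features. The abstract does not name the embedding model, so this sketch substitutes a hashed bag-of-tokens vector for the semantic part and a one-hot vector over a hypothetical AST node-type vocabulary (`NODE_TYPES`) for the type part, then concatenates the two. In a real system a learned embedding would replace `semantic_features`; all names here are illustrative.

```python
import hashlib
import math
import re

# Hypothetical node-type vocabulary; a real system would use the full AST type set.
NODE_TYPES = ["AST_ASSIGN", "AST_CALL", "AST_VAR", "AST_ECHO", "AST_DIM"]
SEM_DIM = 8  # size of the (stand-in) semantic vector


def type_features(node_type):
    """One-hot encoding of the node type; all zeros for an unknown type."""
    vec = [0.0] * len(NODE_TYPES)
    if node_type in NODE_TYPES:
        vec[NODE_TYPES.index(node_type)] = 1.0
    return vec


def semantic_features(code, dim=SEM_DIM):
    """Hashed bag-of-tokens, L2-normalized (a stand-in for a learned embedding)."""
    vec = [0.0] * dim
    for token in re.findall(r"\$?\w+", code):
        digest = hashlib.sha256(token.encode("utf-8")).hexdigest()
        vec[int(digest, 16) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def encode_node(node_type, code):
    """Concatenate semantic and type features into one node vector."""
    return semantic_features(code) + type_features(node_type)


vec = encode_node("AST_ECHO", "echo $y;")
print(len(vec))  # 13
```

Keeping the type one-hot separate from the semantic part lets two nodes with the same source text but different syntactic roles still receive different vectors.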
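The heterogeneous aggregation step can be sketched as a single message-passing layer. Neighbor features are averaged separately for each edge type (AST, CFG, PDG), passed through a per-type weight matrix, and the per-type results are combined with softmax attention weights. This is a simplified, per-node variant of type-level attention in the spirit of heterogeneous attention networks, with hand-set rather than trained weights; the function and parameter names are hypothetical and do not describe HG-VulD's actual architecture.

```python
import math


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def mean(vectors, dim):
    """Element-wise mean; a zero vector when a node has no neighbors of this type."""
    if not vectors:
        return [0.0] * dim
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def hetero_attention_layer(x, edges, weights, attn):
    """One simplified heterogeneous layer (hypothetical, not HG-VulD's exact design).

    x       -- node id -> feature vector
    edges   -- (src, dst, edge_type) triples
    weights -- edge_type -> square weight matrix
    attn    -- attention vector scoring each per-type representation
    """
    dim = len(next(iter(x.values())))
    etypes = sorted(weights)
    out = {}
    for v in x:
        # Learn each edge type separately: aggregate only that type's in-neighbors.
        per_type = []
        for t in etypes:
            neigh = [x[s] for s, d, et in edges if d == v and et == t]
            per_type.append(matvec(weights[t], mean(neigh, dim)))
        # Score every edge type, normalize with softmax, and take the weighted sum.
        scores = [sum(a * math.tanh(h) for a, h in zip(attn, hv)) for hv in per_type]
        alpha = softmax(scores)
        out[v] = [sum(al * hv[i] for al, hv in zip(alpha, per_type)) for i in range(dim)]
    return out


identity = [[1.0, 0.0], [0.0, 1.0]]
features = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [0.0, 0.0]}
graph = [("a", "c", "AST"), ("b", "c", "PDG")]
out = hetero_attention_layer(features, graph, {"AST": identity, "PDG": identity}, attn=[1.0, 0.0])
print(out["c"])  # AST view weighted more heavily than the PDG view
```

Because `attn` here scores only the first feature dimension, the AST view receives the larger weight for node `c`. In a trained model both the per-type matrices and the attention vector would be learned, and nodes would usually also keep a self-connection so that nodes without in-neighbors (such as `a`) do not collapse to zero.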

       
