Abstract:
PHP’s lack of built-in validation functions and its user-interactive nature lead to widespread taint-style vulnerabilities. Graph neural networks improve detection accuracy, but code property graphs often introduce redundant information, which results in high false-positive and false-negative rates. To address this, HG-VulD (Heterogeneous Graph-Vulnerability Detection), a PHP taint-style vulnerability detection method based on sub-property graphs and heterogeneous graph neural networks, is proposed. The Sub-Property Graph (SPG) is built by reverse traversal from sinks to sources; it retains vulnerability-relevant semantics and structure while removing irrelevant nodes to reduce complexity. HG-VulD encodes code nodes using semantic and type features and uses a Heterogeneous Graph Neural Network (HGNN) to learn syntax (Abstract Syntax Tree, AST), control-flow (Control Flow Graph, CFG), and dependency (Program Dependence Graph, PDG) information independently, with attention-based edge aggregation to enhance classification. Evaluations on a synthetic dataset of 260k files show an F1 score of 96.05%, surpassing RIPS, WAP, and VulEye. Tests on real-world code achieve accuracies of 73.82% (XSS) and 67.81% (SQLI), demonstrating practical generalization.
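
The reverse traversal from sinks to sources can be illustrated with a minimal sketch. It is not the HG-VulD implementation: the edge-list representation, the function name `extract_sub_property_graph`, and the toy node labels are all assumptions made for illustration. The sketch keeps only the nodes from which a sink is reachable and drops everything else.

```python
from collections import deque


def extract_sub_property_graph(edges, sinks):
    """Keep only nodes that can reach a sink (hypothetical helper).

    edges: list of (src, dst, edge_type) tuples, e.g. edge_type in {"AST", "CFG", "PDG"}.
    sinks: iterable of node labels treated as sensitive sinks.
    """
    # Index predecessors so the graph can be walked backwards.
    preds = {}
    for src, dst, _etype in edges:
        preds.setdefault(dst, []).append(src)

    # Breadth-first search from the sinks along reversed edges.
    kept = set(sinks)
    queue = deque(kept)
    while queue:
        node = queue.popleft()
        for pred in preds.get(node, []):
            if pred not in kept:
                kept.add(pred)
                queue.append(pred)

    # Retain only edges whose endpoints both survived.
    kept_edges = [(s, d, t) for s, d, t in edges if s in kept and d in kept]
    return kept, kept_edges


# Toy graph (assumed labels): one tainted path and one unrelated statement.
edges = [
    ("$_GET['id']", "$id", "PDG"),
    ("$id", "mysql_query($id)", "PDG"),
    ("$title", "echo 'static'", "CFG"),
]
kept, kept_edges = extract_sub_property_graph(edges, ["mysql_query($id)"])
```

In this toy case the unrelated `$title` statement is pruned, while the path from the `$_GET` source to the query sink is preserved together with its edge types, so a heterogeneous model could still treat each edge type separately.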