xAST (the family of Application Security Testing technologies) plays an increasingly important role in ensuring software security and reliability. Dozens of commercial and open-source products now exist for each category (SAST, IAST, DAST, SCA, MAST, etc.), and some enterprises also build their own xAST products in house. Whether procuring a commercial product, choosing an open-source one, or building in house, everyone faces the same problem: how do you objectively measure the technical capability of an xAST product?
So far, neither industry nor academia has established evaluation standards for the technical capabilities of xAST products. Evaluation is typically based on test results over a handful of vulnerability sample sets. Commonly used vulnerability sample sets are shown in Figure 1, and an example of test results in Figure 2.
On the one hand, as Figure 1 shows, the sample sets differ enormously in scale (from tens of samples to hundreds of thousands). The root cause is that they lack systematic design: they are simply piles of vulnerability samples, so the completeness of the test results cannot be guaranteed. The functional points the samples cover are unevenly distributed, so the test results also lack soundness.
On the other hand, as Figure 2 shows, without an underlying evaluation system the results are a "black box" to users: they provide only overall recall and false positive rates and cannot characterize a product's technical strengths and weaknesses at a fine granularity.
To address the industry's lack of effective standards for measuring technical capability in the xAST field, the Ant Security Team, together with more than 20 experts and scholars from the Ant Program Analysis Team and the Network Space Security Institute of Zhejiang University, jointly designed the xAST evaluation system and its test sample suite, the Benchmark, aiming to serve as the "measuring stick" for application security testing tools.
Objectives: To create an xAST capability evaluation system with technical standards that carry industry consensus.
Values: Measure the technical capabilities of xAST products, guide the direction of xAST technology development, and assist enterprises in product selection.
Traditional vulnerability sample sets generally lack evaluation item design. They typically demonstrate "completeness" simply by piling up samples, so many samples may test the same functional point, and neither the completeness nor the soundness of the test results can be guaranteed.
Before designing the test sample set, we first designed an evaluation system containing evaluation items across multiple dimensions, a first in the industry. We then derived the corresponding test sample sets from that system, which improves completeness and soundness over the traditional approach.
Traditional vulnerability sample sets generally take vulnerability types as the evaluation perspective: samples of different vulnerability types make up the set, and all types of xAST products are tested against the same samples. But xAST technologies work on different principles, and one sample set is hard to reuse across product types. For example, a path-sensitivity sample designed for static analysis (SAST) means little to dynamic analysis (IAST), which instruments concrete executions and therefore only ever observes paths that actually run. This, too, undermines the soundness of the test results; a hedged sketch of such a sample follows.
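As a hedged illustration (a hypothetical sample, not taken from the Benchmark): the tainted assignment below sits on an infeasible branch, so no tainted data can reach the sink at runtime. A path-sensitive SAST engine should stay silent, a path-insensitive one may report a false positive, and an IAST tool simply never executes the dead branch, so the sample tells us little about its engine.

```java
// Hypothetical path-sensitivity sample (illustrative only, not from the Benchmark).
public class PathSensitivitySample {

    // Assume `taint` carries user-controlled data (the taint source).
    public void entry(String taint) throws java.io.IOException {
        String value = "echo safe";
        boolean flag = false;      // constant condition: the branch below is infeasible

        if (flag) {
            value = taint;         // dead assignment, never executed at runtime
        }

        // Sink: a path-insensitive engine that merges both branches may report
        // a false positive here; a path-sensitive engine should not.
        Runtime.getRuntime().exec(value);
    }
}
```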
In response, we changed the evaluation perspective, shifting from a vulnerability-centric to a tool-centric view for the first time in the industry. Different tools get different evaluation items, as do different languages, making the design of both evaluation items and samples more sound.
Essentially, xAST capabilities are layered. Some are low-level, such as the ability to track tainted data; these are usually hard to implement, costly, and deserve users' attention. Others sit at the upper layer, such as support for a particular rule or a framework's sink points; these can be added through simple configuration and matter less than engine capability. Traditional vulnerability sample sets do not distinguish these layers, so their results cannot tell whether a missed detection stems from a missing rule configuration or from a missing engine capability. The sketch below illustrates the distinction.
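As a hedged illustration (hypothetical samples, not from the Benchmark): the first method exercises engine capability, because detecting it requires propagating taint through StringBuilder operations; the second exercises only rule capability, because the data flow is trivial once the framework-specific sink is configured.

```java
// Hypothetical samples contrasting engine capability with rule capability
// (illustrative only, not from the Benchmark).
public class LayeredCapabilitySample {

    // Engine capability: the engine must propagate taint through
    // StringBuilder operations to connect the source to a well-known sink.
    public void engineLevel(String taint) throws java.io.IOException {
        StringBuilder sb = new StringBuilder();
        sb.append("cmd /c ");
        sb.append(taint);                          // taint flows into the builder
        Runtime.getRuntime().exec(sb.toString());  // standard JDK sink
    }

    // Rule capability: the data flow is a single step; detection only requires
    // that the framework API below is configured as a sink.
    public void ruleLevel(String taint) {
        SomeFrameworkTemplate.execute(taint);      // hypothetical framework sink
    }

    // Stand-in for a third-party framework API (an assumption for illustration).
    static class SomeFrameworkTemplate {
        static void execute(String cmd) { /* framework internals elided */ }
    }
}
```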
We propose, for the first time in the industry, dividing the capabilities of an xAST product into three layers: engine capability, rule capability, and productization capability. The evaluation system and test samples are designed per layer, which reduces the complexity of evaluating each layer and lets the test results point directly at the layer where a problem lies.
Traditional vulnerability sample sets lack the guidance of an evaluation system, so what each sample actually tests is vague. Evaluation results are a "black box", providing only recall and accuracy figures and no finer-grained information.
In our design, each evaluation item in the evaluation system corresponds to a test sample, giving every sample a clearly defined "test functional point". The test results then read like a detailed "physical examination report": they can be interpreted at fine granularity, and the reasons behind each result are visible. A sketch of such a mapping follows.
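A hedged sketch of what a one-item-one-sample mapping could look like (the annotation and its fields are hypothetical, not the Benchmark's actual format): each sample declares the single evaluation item it exercises and the expected outcome, so a miss or a false positive points straight at that item.

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;

// Hypothetical sample metadata (an assumption; the Benchmark's real format may differ).
@Retention(RetentionPolicy.RUNTIME)
@interface EvaluationItem {
    String id();             // identifier of the evaluation item this sample tests
    String layer();          // capability layer: engine, rule, or productization
    boolean expectFinding(); // whether a correct tool should report a finding here
}

public class FieldSensitivitySample {

    @EvaluationItem(id = "engine.taint.field-sensitivity",
                    layer = "engine",
                    expectFinding = false)
    public void entry(String taint) throws java.io.IOException {
        Holder h = new Holder();
        h.a = taint;        // taint is stored only in field `a`
        h.b = "echo safe";  // field `b` stays clean

        // A field-sensitive engine distinguishes h.a from h.b and stays silent;
        // a field-insensitive one taints the whole object and reports falsely.
        Runtime.getRuntime().exec(h.b);
    }

    static class Holder { String a; String b; }
}
```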
To verify the completeness of the evaluation system and its Benchmark, we also cross-validated them against the Benchmarks commonly used in the industry, confirming that the functional points those Benchmarks test are all reflected in our evaluation system.
The overall structure of the project is shown in Figure 8. It consists of several sub-modules: SAST (Static Application Security Testing), IAST (Interactive Application Security Testing), DAST (Dynamic Application Security Testing), SCA (Software Composition Analysis), MAST (Mobile Application Security Testing), large-model/machine-learning vulnerability detection, and so on.
Each sub-module of the evaluation system includes an engine capability evaluation system (differentiated by development language) and a rule capability evaluation system. Taking the SAST-Java engine capability evaluation system as an example, it consists of evaluation indicator items plus a Benchmark of test sample code built from those items, as shown in Figure 9.
By combining a product's actual results on the Benchmark with the evaluation indicator items, users gain a full picture of the tested product's capabilities; a sketch of such an aggregation follows.
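As a hedged sketch of how that interpretation could be automated (the result format and all names are assumptions, not the project's actual tooling): aggregating per-item pass/fail results by capability layer turns raw findings into the layered "examination report" view described above.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical report aggregator (an assumption, not the project's tooling).
public class CapabilityReport {

    // One raw result: the evaluation item, its capability layer, and whether
    // the tool's finding matched the sample's declared expectation.
    record ItemResult(String itemId, String layer, boolean passed) {}

    // Group results by capability layer and compute a pass rate per layer,
    // so a weakness surfaces at the layer it belongs to (engine vs. rule).
    static Map<String, Double> passRateByLayer(List<ItemResult> results) {
        return results.stream().collect(Collectors.groupingBy(
                ItemResult::layer,
                Collectors.averagingDouble(r -> r.passed() ? 1.0 : 0.0)));
    }

    public static void main(String[] args) {
        List<ItemResult> results = List.of(
                new ItemResult("engine.taint.field-sensitivity", "engine", true),
                new ItemResult("engine.taint.path-sensitivity", "engine", false),
                new ItemResult("rule.sink.framework-x", "rule", true));

        // Prints e.g. {rule=1.0, engine=0.5}: the gap lies in engine
        // capability, not in rule coverage.
        System.out.println(passRateByLayer(results));
    }
}
```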
Contact: xast-contact@service.alipay.com
This project is licensed under the Apache License 2.0