SeCloud project on classifying Node.js sources and sinks. Based on the OWASP list of JavaScript vulnerabilities. Inspired by the paper by Rasthofer et al., "A Machine-learning Approach for Classifying and Categorizing Android Sources and Sinks".
The classification app can be found in the secloudapp folder.
The data is extracted from multiple files downloaded from Node.js, which are located in the json folder.
Currently only the 'textRaw' and 'params' fields are taken into account. These are aggregated in data.json.
Format is as follows:
{
"cl": 0,
"params": [
"value",
"message"
],
"textRaw": "assert(value[, message])"
}
The "cl" field refers to the class. There are three classes in this dataset:
neither: 0
source: 1
sink: 2
For unknown class:
cl: -1
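
As a rough illustration of the record format and class labels above, the following sketch counts how many records fall into each class. It assumes that data.json holds a list of such records, which is not confirmed here; the second sample record and the names CLASS_NAMES and count_classes are made up for the example and are not part of processJSON.py.

```python
import json
from collections import Counter

# Label mapping as described above; -1 marks a record with unknown class.
CLASS_NAMES = {0: "neither", 1: "source", 2: "sink", -1: "unknown"}


def count_classes(records):
    """Count how many records fall into each class name."""
    return Counter(CLASS_NAMES[r["cl"]] for r in records)


# Inline sample in the record format shown above. The second record is
# hypothetical. To use the real file, something like
# json.load(open("data.json")) should work, assuming it holds a list.
sample = json.loads("""
[
  {"cl": 0, "params": ["value", "message"], "textRaw": "assert(value[, message])"},
  {"cl": -1, "params": [], "textRaw": "hypothetical.example()"}
]
""")

counts = count_classes(sample)
print(counts)
```

Records with cl set to -1 would typically be left out of training and only passed to the classifier for prediction.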
The Python file that handles parsing is processJSON.py.
The handcrafted features used as input are defined in helperJSON.py.
Currently the features are binary (whether a feature is present or not) and are extracted from method names. They are based on the OWASP list of JavaScript vulnerabilities; for example, a method name containing "get" usually indicates a source of information. There are 15 such features.
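
The idea of binary features on method names can be sketched as below. This is only an illustration: the keyword list is hypothetical (the actual 15 features are in helperJSON.py), and the helper names method_name and binary_features are assumptions, as is the simple substring match.

```python
# Hypothetical keyword list for illustration only; the real 15 features
# are defined in helperJSON.py and may differ.
KEYWORDS = ["get", "read", "write", "send", "exec"]


def method_name(text_raw):
    """Return the part of textRaw before the first '(', e.g. 'fs.readFile'."""
    return text_raw.split("(", 1)[0].strip()


def binary_features(text_raw, keywords=KEYWORDS):
    """Return one 0/1 flag per keyword: 1 if it occurs in the method name."""
    name = method_name(text_raw).lower()
    return [1 if kw in name else 0 for kw in keywords]


# Illustrative textRaw value in the same style as the record shown earlier.
features = binary_features("fs.readFile(path[, options], callback)")
print(features)  # [0, 1, 0, 0, 0]
```

Substring matching is deliberately crude here: a name such as "target" would also set the "get" flag, so a real feature extractor might split names on camelCase or dots first.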
- The dataset is small, with 265 hand-annotated examples.
- The handcrafted features do not cover all possible cases of a source or a sink in Node.js, so some information that would be valuable for classification is missing.