A tool for detecting botnet traffic, built on an existing P2P botnet packet dataset and a healthy packet dataset. It uses machine learning to distinguish botnet traces from normal traces.
- Extract features with Tshark and numpy
- Train and generate results with sklearn's ExtraTreesClassifier
- High true rate of 99%
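
As a rough illustration of the Tshark step, the sketch below builds a tshark command that dumps a few per-packet fields as comma-separated text. The field selection (frame.time_epoch, ip.src, ip.dst, frame.len) and the helper names build_tshark_command and extract_to_csv are assumptions made for this example; the fields this project actually extracts are not listed here.

```python
import subprocess

# Hypothetical field selection; the project's real field list may differ.
FIELDS = ("frame.time_epoch", "ip.src", "ip.dst", "frame.len")

def build_tshark_command(pcap_path, fields=FIELDS):
    """Return the argument list for a tshark run that prints fields as CSV."""
    cmd = ["tshark", "-r", pcap_path, "-T", "fields"]
    for field in fields:
        cmd += ["-e", field]
    # Comma-separated output with a header row naming each field.
    cmd += ["-E", "separator=,", "-E", "header=y"]
    return cmd

def extract_to_csv(pcap_path, csv_path):
    """Run tshark (must be installed) and write its output to csv_path."""
    with open(csv_path, "w") as out:
        subprocess.run(build_tshark_command(pcap_path), stdout=out, check=True)
```

Calling extract_to_csv("name.pcap", "name.csv") would then produce one row per packet, assuming tshark is on the PATH.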
- dataExtraction.py: Extracts packet data from pcap files and saves it as name.csv.
- generateFlow.py: Combines packets with the same source IP and destination IP into flows and saves the flows as name.flow.csv.
- featuresExtraction.py: Extracts features from the flows and saves them as name.features.csv.
- flowMix: Generates the train and test files by combining the normal dataset and the malicious dataset, and saves them as test.csv, train.csv, and testStandard.csv.
- getResult: Trains the classifier, produces the results, and reports the true rate.
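
To make the flow step more concrete, here is a minimal sketch of grouping packets that share a source IP and destination IP into flows and computing a few per-flow statistics. The packet layout (timestamp, source IP, destination IP, length), the function names, and the chosen features are assumptions for illustration, not the exact ones used by generateFlow.py and featuresExtraction.py. The sketch uses only the standard library for brevity, whereas the project itself uses numpy.

```python
from collections import defaultdict
from statistics import mean

def group_into_flows(packets):
    """Group (timestamp, src_ip, dst_ip, length) packets by (src_ip, dst_ip)."""
    flows = defaultdict(list)
    for timestamp, src, dst, length in packets:
        flows[(src, dst)].append((timestamp, length))
    return dict(flows)

def flow_features(flow_packets):
    """Compute simple illustrative features for one flow."""
    times = [t for t, _ in flow_packets]
    sizes = [s for _, s in flow_packets]
    return {
        "packet_count": len(flow_packets),
        "total_bytes": sum(sizes),
        "mean_size": mean(sizes),
        "duration": max(times) - min(times),
    }

# Tiny hand-made sample: two packets in one flow, one packet in another.
packets = [
    (0.0, "10.0.0.1", "10.0.0.2", 60),
    (0.5, "10.0.0.1", "10.0.0.2", 100),
    (0.7, "10.0.0.3", "10.0.0.2", 40),
]
flows = group_into_flows(packets)
features = {key: flow_features(pkts) for key, pkts in flows.items()}
```

Grouping is directional in this sketch: traffic from A to B and from B to A ends up in two separate flows.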
- Put a healthy pcap dataset and a botnet/suspicious pcap dataset in the root directory.
- Modify constants.py: put the file names you want to use in FILENAMES.
- Modify all.sh: on the fourth line, change the second and third parameters to healthy pcap filename + .features.csv and malicious pcap filename + .features.csv. This program uses half of the data for training and the other half for testing; you can change the ratio yourself.
- In the terminal, run the command: . all.sh
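
The half-and-half split and the training step can be sketched as follows. This is not the project's own code: split_rows and its train_ratio parameter are hypothetical names, the data is synthetic stand-in data rather than real flow features, and the score printed is plain accuracy, which may not be exactly what "true rate" means above.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier

def split_rows(rows, train_ratio=0.5, seed=0):
    """Shuffle the rows and split them into train and test parts."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(rows))
    cut = int(len(rows) * train_ratio)
    return rows[order[:cut]], rows[order[cut:]]

# Synthetic stand-in data: 3 features per flow, last column is the label
# (0 = normal, 1 = botnet). Real data would come from the *.features.csv files.
rng = np.random.default_rng(42)
normal = np.hstack([rng.normal(0.0, 1.0, (200, 3)), np.zeros((200, 1))])
botnet = np.hstack([rng.normal(10.0, 1.0, (200, 3)), np.ones((200, 1))])
data = np.vstack([normal, botnet])

# Half of the rows for training, the other half for testing.
train, test = split_rows(data, train_ratio=0.5)
clf = ExtraTreesClassifier(n_estimators=50, random_state=0)
clf.fit(train[:, :-1], train[:, -1])
accuracy = clf.score(test[:, :-1], test[:, -1])
print(f"accuracy on synthetic data: {accuracy:.3f}")
```

Changing train_ratio (for example to 0.7) is one way to adjust how much of the data goes to training.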