By Thejineaswar Guhan and Krishna Yadav, part of SAI LAB
We trained the global model across six different datasets that vary in their number of features:
- CICIDS 2018
- CICIDS 2017
- BOT IoT
- NSL_KDD
- TON_IoT
- UNSW_NB15
Our results suggest that the global model was able to generalize across the intrusion classes present in all the datasets.
System heterogeneity is widespread in IoT devices today. When an IoT device is deployed as a sensor node collecting outside information, the dataset it generates varies in its number of features. To train one global model through a collaborative machine learning process such as federated learning, all peer nodes must share the same number of features. We solve this problem by reducing each dataset to a common set of features with an autoencoder, so that all the datasets can take part in federated training. The trained global model is able to generalize across the labels present in all the datasets.
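The repository's actual autoencoder lives in `AE_feature_extraction/ae_all_datasets_with_hyper_mae.py`; as a rough illustrative sketch (NumPy only, all names hypothetical, not the repo's implementation), a per-dataset linear autoencoder can map feature vectors of different widths onto a shared bottleneck dimension:

```python
import numpy as np

def train_linear_autoencoder(X, bottleneck_dim, lr=0.01, epochs=200, seed=0):
    """Train a single-layer linear autoencoder X -> Z -> X_hat by gradient descent."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W_enc = rng.normal(0, 0.1, (d, bottleneck_dim))
    W_dec = rng.normal(0, 0.1, (bottleneck_dim, d))
    for _ in range(epochs):
        Z = X @ W_enc          # encode to the shared bottleneck width
        X_hat = Z @ W_dec      # decode back to the original width
        err = X_hat - X        # reconstruction error
        # gradients of the mean-squared reconstruction loss
        g_dec = Z.T @ err / n
        g_enc = X.T @ (err @ W_dec.T) / n
        W_dec -= lr * g_dec
        W_enc -= lr * g_enc
    return W_enc

BOTTLENECK = 8  # plays the role of the AE_HIDDEN_UNITS setting

# Two hypothetical peers whose raw datasets have different feature counts.
rng = np.random.default_rng(42)
X_a = rng.normal(size=(100, 20))   # peer A: 20 raw features
X_b = rng.normal(size=(100, 35))   # peer B: 35 raw features

Z_a = X_a @ train_linear_autoencoder(X_a, BOTTLENECK)
Z_b = X_b @ train_linear_autoencoder(X_b, BOTTLENECK)

# Both peers now emit the same feature width and can join federated training.
print(Z_a.shape, Z_b.shape)  # (100, 8) (100, 8)
```

Each peer trains its own encoder, but every encoder outputs the same bottleneck width, which is what lets a single global model accept data from all peers.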
- To get the autoencoder-processed dataset, run `AE_feature_extraction/ae_all_datasets_with_hyper_mae.py`.
- To change the dimensions of the AE bottleneck layer, change the value of the variable `AE_HIDDEN_UNITS`.
- Next, process the labels by executing `AE_feature_extraction/ae_merge_all_labels.py`. This only needs to be executed once.
- The data and labels are stored in `Dataset/AE_formed_data`. Note that whenever `AE_feature_extraction/ae_all_datasets_with_hyper_mae.py` is executed, the contents of this directory are replaced.
- If you have changed the bottleneck dimension, update the key `num_columns` in `federated_client_server/model_params.py`.
- Execute `federated_client_server/server.py` to run the FL code. Change the number of epochs and training rounds if necessary.
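The server-side aggregation that `federated_client_server/server.py` performs is presumably a federated-averaging step repeated over training rounds. A minimal sketch of weighted FedAvg aggregation (NumPy only; the client list, sizes, and round count here are made-up placeholders, not the repo's configuration):

```python
import numpy as np

def fed_avg(client_weights, client_sizes):
    """Weighted average of per-client model weights, weighted by dataset size."""
    total = sum(client_sizes)
    return [
        sum(w[i] * (n / total) for w, n in zip(client_weights, client_sizes))
        for i in range(len(client_weights[0]))
    ]

# Three hypothetical clients sharing one model architecture: one dense layer
# (an 8x6 kernel plus a bias), matching an 8-dimensional bottleneck input.
rng = np.random.default_rng(0)
clients = [[rng.normal(size=(8, 6)), rng.normal(size=(6,))] for _ in range(3)]
sizes = [1000, 500, 250]  # samples held by each client

global_weights = fed_avg(clients, sizes)

NUM_ROUNDS = 5  # analogous to the training rounds configured in server.py
for _ in range(NUM_ROUNDS):
    # In the real system each client trains locally starting from the current
    # global weights; here we simulate the returned updates with small noise.
    updates = [[w + rng.normal(scale=0.01, size=w.shape) for w in global_weights]
               for _ in clients]
    global_weights = fed_avg(updates, sizes)

print([w.shape for w in global_weights])  # [(8, 6), (6,)]
```

Increasing `NUM_ROUNDS` corresponds to more aggregation rounds on the server, while local epochs are a per-client setting.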