MaherDissem/Unsupervised-Anomaly-Detection-in-Noisy-Time-Series-Data-for-Enhancing-Load-Forecasting

Unsupervised anomaly detection and imputation in noisy time series data for enhancing load forecasting

Overview

This project introduces an approach to unsupervised anomaly detection and imputation designed for noisy time series data, with the aim of improving the accuracy of load forecasting models.

Our system involves synthesizing realistic load anomalies, contaminating load data, and employing a custom pipeline to detect and impute these anomalies. The ultimate goal is to compare the performance of a load forecasting model trained on contaminated data with one trained on the cleaned data.
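
The contamination step can be illustrated with a minimal sketch. It is not the repository's anomaly generator: the function names, the "scale a short segment" anomaly type, and all parameter values below are assumptions chosen for illustration only. It injects anomalies into a synthetic daily load curve at a given contamination rate, then cuts the series into sliding windows.

```python
import numpy as np


def inject_scaled_segments(load, contamination_rate=0.05, segment_len=8,
                           scale=1.5, seed=0):
    """Return a contaminated copy of `load` and a boolean mask of injected points.

    Hypothetical anomaly type: short segments multiplied by `scale`.
    """
    rng = np.random.default_rng(seed)
    load = np.asarray(load, dtype=float)
    contaminated = load.copy()
    mask = np.zeros(load.shape[0], dtype=bool)
    # Number of segments needed to reach (at most) the requested rate.
    n_segments = int(contamination_rate * load.shape[0] / segment_len)
    for _ in range(n_segments):
        start = int(rng.integers(0, load.shape[0] - segment_len + 1))
        stop = start + segment_len
        # Scale from the clean series so overlapping segments are not scaled twice.
        contaminated[start:stop] = load[start:stop] * scale
        mask[start:stop] = True
    return contaminated, mask


def sliding_windows(series, window_size, stride):
    """Stack fixed-size windows taken every `stride` steps."""
    starts = range(0, len(series) - window_size + 1, stride)
    return np.stack([series[s:s + window_size] for s in starts])


# Synthetic week of half-hourly load (48 samples per day), for illustration only.
t = np.arange(7 * 48)
load = 100 + 20 * np.sin(2 * np.pi * t / 48)

contaminated, mask = inject_scaled_segments(load, contamination_rate=0.05)
windows = sliding_windows(contaminated, window_size=48, stride=24)
```

The mask plays the role of ground-truth labels, which is what makes it possible to evaluate a detector on data that has no real annotations.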

Modules

  • Data Processing

    Prepare data by preprocessing, generating and injecting synthetic anomalies, and saving data in a convenient format.

    Separate scripts are provided for each dataset, facilitating both anomaly detection and forecasting stages, with different customizable parameters such as the contamination rate, the sliding window's size and stride, the data split ratios, etc.

  • Anomaly Detection

    Train, evaluate and save the AD model. The model generates initial time series features, fills a memory bank with patch features extracted through a backbone, and denoises the bank because the training data may contain anomalies. At inference, the anomaly score is computed as the distance to the saved features.

    Execute with python src/anomaly_detection/main.py.

  • Anomaly Imputation

    Train a bi-LSTM-based denoising recurrent autoencoder to impute sequences of missing values in time series data. The model is trained by randomly omitting values from anomaly-free samples.

    Execute with python src/anomaly_imputation/main.py.

  • Load Forecasting

    Train and evaluate a forecasting model on either the contaminated or the "cleaned" data where detected anomalies are imputed.

    We train and evaluate the following models given parameters like the sequence size, forecast horizon, etc.

    • Seq2seq: a GRU-based seq2seq model for time series forecasting.

    • SCINet: a recursive downsample-convolve-interact architecture.

    Execute with python src/forecasting/main.py --model_choice seq2seq.
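
The memory-bank idea behind the Anomaly Detection module can be sketched in a few lines of NumPy. This is a simplified illustration under stated assumptions, not the code in src/anomaly_detection: random vectors stand in for backbone patch features, the bank is denoised by dropping the entries with the largest mean k-nearest-neighbour distance (a stand-in for the repository's denoising step), and the function names and the drop_ratio and k values are hypothetical.

```python
import numpy as np


def pairwise_dist(a, b):
    """Euclidean distances between every row of `a` and every row of `b`."""
    diff = a[:, None, :] - b[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def build_denoised_bank(train_features, drop_ratio=0.1, k=3):
    """Keep the training features that look least like outliers."""
    d = pairwise_dist(train_features, train_features)
    np.fill_diagonal(d, np.inf)  # ignore each point's distance to itself
    knn_dist = np.sort(d, axis=1)[:, :k].mean(axis=1)
    n_keep = int(round(len(train_features) * (1 - drop_ratio)))
    keep = np.argsort(knn_dist)[:n_keep]  # smallest k-NN distance first
    return train_features[keep]


def anomaly_score(features, bank):
    """Score each feature vector by its distance to the nearest bank entry."""
    return pairwise_dist(features, bank).min(axis=1)


rng = np.random.default_rng(0)
normal_train = rng.normal(size=(200, 16))
# Ten contaminated training samples, each shifted by a different offset.
outlier_train = rng.normal(size=(10, 16)) + np.linspace(10, 100, 10)[:, None]
train = np.vstack([normal_train, outlier_train])

bank = build_denoised_bank(train, drop_ratio=0.1)

test_normal = rng.normal(size=(5, 16))
test_anomalous = test_normal + 10.0
scores = anomaly_score(np.vstack([test_normal, test_anomalous]), bank)
```

Because the contaminated samples are removed before scoring, anomalous inputs end up far from every bank entry, while normal inputs have a close neighbour in the bank.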

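The training scheme of the Anomaly Imputation module, randomly omitting values in anomaly-free samples, can be sketched as follows. Only the data corruption is shown, not the bi-LSTM autoencoder itself. The function name, the single contiguous gap, the maximum gap length, and the fill value of 0 are assumptions made for illustration.

```python
import numpy as np


def mask_random_segment(window, max_gap=12, fill_value=0.0, rng=None):
    """Hide one random contiguous segment of `window`.

    Returns the corrupted window and a boolean mask of the hidden positions.
    """
    rng = np.random.default_rng() if rng is None else rng
    gap = int(rng.integers(1, max_gap + 1))               # gap length in [1, max_gap]
    start = int(rng.integers(0, len(window) - gap + 1))   # gap fits inside the window
    mask = np.zeros(len(window), dtype=bool)
    mask[start:start + gap] = True
    corrupted = np.where(mask, fill_value, window)
    return corrupted, mask


rng = np.random.default_rng(0)
clean_window = 2.0 + np.sin(np.linspace(0, 2 * np.pi, 48))  # anomaly-free sample
corrupted_window, hidden = mask_random_segment(clean_window, rng=rng)

# A denoising autoencoder would take `corrupted_window` as input and be
# trained to reconstruct `clean_window`, so that at inference time the
# positions flagged as anomalous can be treated as missing and filled in.
```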
Pipeline

Each module can be run individually with its own arguments (see the corresponding main.py file). In addition, python src/pipeline.py automates the sequential training and evaluation of every module in the pipeline for a given set of parameters.

Datasets

In our experiments, we leverage the following datasets:

  • Australian Energy Market Operator:

    Aggregated electricity demand for the states of Australia.

    Collect data: python src/data_processing/collect_aemo_data.py.

  • Industrial Park:

    Load data for 4 different types of buildings (commercial, office, public and residential). Data is obtained from here.

  • Predis-MHI:

    Load data collected in the GreEn-ER living lab (contains genuine unlabeled anomalies). This is a private dataset that's available upon request from the owner, link.

Results Replication

To replicate our results, run the following:

python -m venv venv
source venv/Scripts/activate  # Windows (Git Bash); on Linux/macOS use: source venv/bin/activate
pip install -r requirements.txt

python src/run_parallel_experiments.py

Result metrics, visualizations and weights are saved to results/ and logged to an MLflow server. Start it with mlflow ui -p 8080.

Acknowledgement

Our codebase builds heavily on the following projects:

  • SoftPatch: Anomaly detection for image data.

  • SCINet: One of the forecasting models we employ in our experiments.

Thanks for open-sourcing!

Citation

If you find this repository useful for your research work, please consider citing it as follows:

@Article{Dissem2024,
    author={Dissem, Maher
    and Amayri, Manar},
    title={Unsupervised anomaly detection and imputation in noisy time series data for enhancing load forecasting},
    journal={Applied Intelligence},
    year={2024},
    month={Nov},
    day={22},
    volume={55},
    number={1},
    pages={11},
    issn={1573-7497},
    doi={10.1007/s10489-024-05856-6},
    url={https://doi.org/10.1007/s10489-024-05856-6}
}