
LSTM-FCN SageMaker Algorithm

The Time Series Classification (LSTM-FCN) Algorithm from AWS Marketplace performs time series classification with the Long Short-Term Memory Fully Convolutional Network (LSTM-FCN). It implements both training and inference from CSV data and supports both CPU and GPU instances. The training and inference Docker images were built by extending the PyTorch 2.1.0 Python 3.10 SageMaker containers. The algorithm can be used for binary, multiclass and multilabel classification of univariate time series.

Model Description

The LSTM-FCN model includes two blocks: a recurrent block and a convolutional block. The recurrent block consists of a single LSTM layer (either general or with attention) followed by a dropout layer. The convolutional block consists of three convolutional layers, each followed by batch normalization and a ReLU activation, and a final global average pooling layer.

The input time series are passed to both blocks. The convolutional block processes each time series as a single feature observed across multiple time steps, while the recurrent block processes each time series as multiple features observed at a single time step (referred to as dimension shuffling). The outputs of the two blocks are concatenated and passed to a final linear layer.

LSTM-FCN architecture (source: doi: 10.1109/ACCESS.2017.2779939)

Model Resources: [Paper] [Code]
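
For orientation, below is a minimal PyTorch sketch of this architecture. The class name, default sizes and layer arrangement are illustrative assumptions, not the algorithm's actual implementation.

```python
import torch
import torch.nn as nn

class LSTMFCN(nn.Module):
    """Minimal LSTM-FCN sketch: recurrent block + convolutional block (illustrative only)."""

    def __init__(self, seq_len, num_classes, hidden_size=128, dropout=0.8,
                 filters=(128, 256, 128), kernel_sizes=(8, 5, 3)):
        super().__init__()
        # Recurrent block: the whole series is treated as seq_len features
        # observed at a single time step (dimension shuffling).
        self.lstm = nn.LSTM(input_size=seq_len, hidden_size=hidden_size, batch_first=True)
        self.dropout = nn.Dropout(dropout)
        # Convolutional block: three conv layers, each followed by batch norm and ReLU.
        layers, in_channels = [], 1
        for f, k in zip(filters, kernel_sizes):
            layers += [nn.Conv1d(in_channels, f, k, padding="same"),
                       nn.BatchNorm1d(f), nn.ReLU()]
            in_channels = f
        self.conv = nn.Sequential(*layers)
        self.pool = nn.AdaptiveAvgPool1d(1)   # global average pooling
        self.linear = nn.Linear(hidden_size + filters[-1], num_classes)

    def forward(self, x):
        # x: (batch, seq_len) univariate time series
        # Recurrent branch: one "time step" with seq_len features.
        h, _ = self.lstm(x.unsqueeze(1))               # (batch, 1, hidden_size)
        h = self.dropout(h[:, -1, :])                  # (batch, hidden_size)
        # Convolutional branch: one channel over seq_len time steps.
        c = self.pool(self.conv(x.unsqueeze(1))).squeeze(-1)  # (batch, filters[-1])
        # Concatenate the two branches and apply the final linear layer.
        return self.linear(torch.cat([h, c], dim=1))
```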

SageMaker Algorithm Description

The algorithm implements the model described above with two deviations: it implements only the general LSTM layer (the attention LSTM layer is not available), and it allows for multiple stacked LSTM layers instead of only a single LSTM layer.

Training

The training algorithm has two input data channels: training and validation. The training channel is mandatory, while the validation channel is optional.

The training and validation datasets should be provided as CSV files. The column names of the one-hot encoded class labels should start with "y" (e.g. "y1", "y2", ...), while the column names of the time series values should start with "x" (e.g. "x1", "x2", ...).

All the time series should have the same length and should not contain missing values. The time series are scaled internally by the algorithm, so there is no need to scale them beforehand.

See the sample input files train.csv and valid.csv.
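
As a sketch of the expected layout, the snippet below generates a synthetic training CSV with the one-hot encoded labels in the "y" columns and the time series values in the "x" columns; the number of samples, classes and the series length are arbitrary examples.

```python
import numpy as np
import pandas as pd

# Illustrative shapes only: 50 samples, 3 classes (one-hot), series of length 100.
n_samples, n_classes, seq_len = 50, 3, 100
labels = np.eye(n_classes)[np.random.randint(0, n_classes, n_samples)]
series = np.random.randn(n_samples, seq_len)

train = pd.DataFrame(
    np.hstack([labels, series]),
    columns=[f"y{i + 1}" for i in range(n_classes)] + [f"x{t + 1}" for t in range(seq_len)],
)
train.to_csv("train.csv", index=False)
```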

See notebook.ipynb for an example of how to launch a training job.
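
As a rough outline of what the notebook does, a training job could be launched with the SageMaker Python SDK's AlgorithmEstimator; the algorithm ARN, IAM role, instance type and S3 paths below are placeholders.

```python
import sagemaker
from sagemaker.algorithm import AlgorithmEstimator

# Placeholders: substitute the Marketplace algorithm ARN, IAM role and S3 paths.
estimator = AlgorithmEstimator(
    algorithm_arn="arn:aws:sagemaker:<region>:<account>:algorithm/<lstm-fcn-algorithm>",
    role=sagemaker.get_execution_role(),
    instance_count=1,
    instance_type="ml.m5.2xlarge",
    # hyperparameters={...}  # see the Hyperparameters section below
)

estimator.fit({
    "training": "s3://<bucket>/train.csv",
    "validation": "s3://<bucket>/valid.csv",  # optional channel
})
```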

Distributed Training

The algorithm supports multi-GPU training on a single instance, which is implemented through torch.nn.DataParallel. The algorithm does not support multi-node (or distributed) training across multiple instances.
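
Conceptually, the multi-GPU behaviour corresponds to the standard torch.nn.DataParallel pattern sketched below (reusing the illustrative LSTMFCN class from the model sketch above); this is not the container's actual training code.

```python
import torch

# When more than one GPU is visible on the instance, DataParallel splits each
# batch across the GPUs and gathers the outputs on the default device.
model = LSTMFCN(seq_len=100, num_classes=3)
if torch.cuda.device_count() > 1:
    model = torch.nn.DataParallel(model)
model = model.to("cuda" if torch.cuda.is_available() else "cpu")
```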

Incremental Training

The algorithm supports incremental training. The model artifacts generated by a previous training job can be used to continue training the model on the same dataset or to fine-tune the model on a different dataset.
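
One way to wire this up with the SageMaker Python SDK is to pass the previous job's model artifacts to the new estimator; the model_uri argument and the paths below are illustrative assumptions, and notebook.ipynb shows the supported workflow.

```python
import sagemaker
from sagemaker.algorithm import AlgorithmEstimator

# Hypothetical example: continue training from a previous job's artifacts.
previous_artifacts = "s3://<bucket>/<previous-training-job>/output/model.tar.gz"

estimator = AlgorithmEstimator(
    algorithm_arn="arn:aws:sagemaker:<region>:<account>:algorithm/<lstm-fcn-algorithm>",
    role=sagemaker.get_execution_role(),
    instance_count=1,
    instance_type="ml.m5.2xlarge",
    model_uri=previous_artifacts,  # previous artifacts passed as an additional input
)
estimator.fit({"training": "s3://<bucket>/train.csv"})
```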

Hyperparameters

The training algorithm takes as input the following hyperparameters:

  • num-layers: int. The number of LSTM layers.
  • hidden-size: int. The number of hidden units of each LSTM layer.
  • dropout: float. The dropout rate applied after each LSTM layer.
  • filters-1: int. The number of filters of the first convolutional layer.
  • filters-2: int. The number of filters of the second convolutional layer.
  • filters-3: int. The number of filters of the third convolutional layer.
  • kernel-size-1: int. The size of the kernel of the first convolutional layer.
  • kernel-size-2: int. The size of the kernel of the second convolutional layer.
  • kernel-size-3: int. The size of the kernel of the third convolutional layer.
  • lr: float. The learning rate used for training.
  • batch-size: int. The batch size used for training.
  • epochs: int. The number of training epochs.
  • task: str. The type of classification task, either "binary", "multiclass" or "multilabel".

All the hyperparameters are tunable, except for the type of classification task, which needs to be defined beforehand; an example configuration is sketched below.
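
For illustration, a configuration along these lines could be passed via the estimator's hyperparameters argument; the values below are arbitrary examples, not recommended settings.

```python
# Arbitrary example values; pass this dict as the `hyperparameters` argument
# of the AlgorithmEstimator shown in the Training section.
hyperparameters = {
    "num-layers": 1,
    "hidden-size": 128,
    "dropout": 0.8,
    "filters-1": 128,
    "filters-2": 256,
    "filters-3": 128,
    "kernel-size-1": 8,
    "kernel-size-2": 5,
    "kernel-size-3": 3,
    "lr": 0.001,
    "batch-size": 32,
    "epochs": 100,
    "task": "multiclass",
}
```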

Metrics

The training algorithm logs the following metrics:

  • train_loss: float. Training loss.
  • train_accuracy: float. Training accuracy.

If the validation channel is provided, the training algorithm also logs the following additional metrics:

  • valid_loss: float. Validation loss.
  • valid_accuracy: float. Validation accuracy.

See notebook.ipynb for an example of how to launch a hyperparameter tuning job.
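
As a rough sketch of such a tuning job with the SageMaker Python SDK, assuming valid_accuracy is exposed as an objective metric and using arbitrary example ranges (estimator is the AlgorithmEstimator from the Training section):

```python
from sagemaker.tuner import HyperparameterTuner, ContinuousParameter, IntegerParameter

# Arbitrary example ranges; only tunable hyperparameters may be included.
tuner = HyperparameterTuner(
    estimator=estimator,
    objective_metric_name="valid_accuracy",
    objective_type="Maximize",
    hyperparameter_ranges={
        "hidden-size": IntegerParameter(64, 256),
        "dropout": ContinuousParameter(0.0, 0.8),
        "lr": ContinuousParameter(1e-4, 1e-2),
    },
    max_jobs=10,
    max_parallel_jobs=2,
)

tuner.fit({
    "training": "s3://<bucket>/train.csv",
    "validation": "s3://<bucket>/valid.csv",
})
```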

Inference

The inference algorithm takes as input a CSV file containing the time series values. The column names of the time series values should start with "x" (e.g. "x1", "x2", ...).

All the time series should have the same length and should not contain missing values. The time series are scaled internally by the algorithm, so there is no need to scale them beforehand.

See the sample input file test_data.csv in the data/inference/input folder.

The inference algorithm outputs the predicted class labels, which are returned in CSV format. See the sample output files batch_predictions.csv and real_time_predictions.csv.

See notebook.ipynb for an example of how to launch a batch transform job.
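
As a rough outline (the instance type and S3 paths are placeholders), a batch transform job could be run from the trained estimator as follows.

```python
# Create a transformer from the trained estimator and run it on the test CSV.
transformer = estimator.transformer(
    instance_count=1,
    instance_type="ml.m5.2xlarge",
)

transformer.transform(
    data="s3://<bucket>/data/inference/input/test_data.csv",
    content_type="text/csv",
)
transformer.wait()  # predictions are written to the transformer's S3 output path
```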

Endpoints

The algorithm supports only real-time inference endpoints. The inference image is too large to be uploaded to a serverless inference endpoint.

See notebook.ipynb for an example of how to deploy the model to an endpoint, invoke the endpoint and process the response.
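
For orientation, the deployment and invocation could look roughly like this with the SageMaker Python SDK; the instance type and payload are placeholders, and notebook.ipynb documents the exact request format.

```python
from sagemaker.serializers import CSVSerializer
from sagemaker.deserializers import CSVDeserializer

# Deploy the trained estimator to a real-time inference endpoint.
predictor = estimator.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.2xlarge",
    serializer=CSVSerializer(),
    deserializer=CSVDeserializer(),
)

# Invoke the endpoint with a CSV payload of time series values,
# e.g. a NumPy array of shape (n_samples, seq_len).
predictions = predictor.predict(test_data)

predictor.delete_endpoint()  # clean up when done
```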

Additional Resources: [Sample Notebook] [Blog Post]

References

  • F. Karim, S. Majumdar, H. Darabi and S. Chen, "LSTM Fully Convolutional Networks for Time Series Classification," in IEEE Access, vol. 6, pp. 1662-1669, 2018, doi: 10.1109/ACCESS.2017.2779939.
  • F. Karim, S. Majumdar and H. Darabi, "Insights Into LSTM Fully Convolutional Networks for Time Series Classification," in IEEE Access, vol. 7, pp. 67718-67725, 2019, doi: 10.1109/ACCESS.2019.2916828.