This repository is an official PyTorch implementation of CATS: Are Self-Attentions Effective for Time Series Forecasting?
CATS removes self-attention and retains only cross-attention in its transformer architecture. This design choice aims to better preserve temporal information in time series forecasting, addressing the potential loss of such information during the embedding process in traditional transformer models.
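The idea can be illustrated with a minimal, dependency-free sketch of single-head cross-attention, in which a small set of horizon queries attends to embedded input patches. This is not the repository's implementation: the function name `cross_attention`, the toy dimensions, and the omission of learned projections, multiple heads, and normalization are all simplifying assumptions.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Single-head scaled dot-product cross-attention (illustrative only).

    queries: one vector per forecast horizon step (hypothetical queries)
    keys, values: one vector per embedded input patch
    Returns the outputs (one vector per query) and the attention weights.
    """
    d = len(queries[0])
    outputs, weights = [], []
    for q in queries:
        # Score this query against every input patch.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        w = softmax(scores)
        # Weighted sum of the value vectors, one output channel at a time.
        out = [sum(wj * v[c] for wj, v in zip(w, values)) for c in range(len(values[0]))]
        outputs.append(out)
        weights.append(w)
    return outputs, weights


# Toy example: 2 horizon queries attend to 3 input patches of dimension 2.
queries = [[1.0, 0.0], [0.0, 1.0]]
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
outputs, weights = cross_attention(queries, keys, values)
print(len(outputs), len(weights[0]))
```

Note that the input patches never attend to each other here: each query reads the input directly, so there is one output vector per horizon query regardless of how many patches the input contains.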
CATS achieves improved time and memory efficiency compared to traditional self-attention-based transformers. While self-attention complexity grows quadratically with input length, cross-attention between a fixed set of horizon queries and the input grows only linearly with it.
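To make the scaling argument concrete, the following back-of-the-envelope sketch counts attention-score entries for both designs. It assumes that self-attention compares every input token with every other one, and that cross-attention compares a fixed number of horizon queries with the input tokens. The function names and the value of `num_queries` are illustrative assumptions, not measurements or settings from the paper.

```python
def self_attention_scores(num_inputs: int) -> int:
    """Entries in a self-attention score matrix: every input attends to every input."""
    return num_inputs * num_inputs


def cross_attention_scores(num_inputs: int, num_queries: int) -> int:
    """Entries in a cross-attention score matrix: each query attends to every input."""
    return num_queries * num_inputs


num_queries = 96  # hypothetical number of horizon queries
for num_inputs in (512, 2880):
    print(
        num_inputs,
        self_attention_scores(num_inputs),
        cross_attention_scores(num_inputs, num_queries),
    )
```

Under these assumptions, growing the input from 512 to 2880 tokens multiplies the cross-attention count by 5.625 but the self-attention count by 5.625 squared, roughly 31.6.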
CATS implements extensive parameter sharing across all layers and dimensions for each horizon-dependent query. This approach, including shared projection layers, significantly reduces parameter count and improves computational efficiency in both training and inference phases.
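As a rough illustration of why sharing helps, the sketch below counts the parameters of a single dense projection when it is shared across all horizon queries versus duplicated per query. The sizes (`d_model`, `out_dim`, `num_queries`) and function names are hypothetical and are not taken from the CATS configuration.

```python
def linear_params(in_features: int, out_features: int) -> int:
    """Weights plus biases of one dense projection layer."""
    return in_features * out_features + out_features


def projection_params(d_model: int, out_dim: int, num_queries: int, shared: bool) -> int:
    """Total projection parameters with one shared layer or one layer per query."""
    copies = 1 if shared else num_queries
    return copies * linear_params(d_model, out_dim)


# Hypothetical sizes, chosen only for illustration.
d_model, out_dim, num_queries = 256, 24, 30
shared_total = projection_params(d_model, out_dim, num_queries, shared=True)
separate_total = projection_params(d_model, out_dim, num_queries, shared=False)
print(shared_total, separate_total)
```

With a shared layer the parameter count is independent of the number of queries, whereas per-query layers scale linearly with it.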
We conducted extensive experiments to compare CATS with other state-of-the-art models for long input sequences. Our results demonstrate that CATS outperforms existing models in both efficiency and effectiveness.
CATS maintains robust performance as input length increases, unlike some complex models that suffer from increased computational burdens.
We pushed CATS further by testing it with significantly longer input sequences (2880 time steps) and comparing it to other models that use shorter inputs (512 time steps):
- CATS was more efficient in terms of parameters, running time, and memory usage, even though its input was more than 5 times longer.
- It achieved this while maintaining superior forecasting performance.
To better understand how CATS processes time series data, we visualized its cross-attention mechanisms. We used a simple time series composed of two independent signals with different periodicities (where
These maps reveal CATS' ability to capture both shocks and periodicities in the signal:
- The left score map shows higher attention scores for patches containing shocks in the same direction.
- The right score map clearly demonstrates the correlation over 24 steps, reflecting the model's capture of signal periodicity.
This visualization confirms CATS' effectiveness in leveraging periodic information for accurate predictions.
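The 24-step correlation mentioned above can be illustrated without a trained model. The sketch below is a stand-in diagnostic, not the attention visualization itself: it builds a hypothetical sine wave with a period of 24 steps and measures how strongly the signal correlates with a shifted copy of itself. The signal definition and the helper name `lagged_correlation` are assumptions made for this example.

```python
import math


def lagged_correlation(series, lag):
    """Pearson correlation between a series and a copy of itself shifted by `lag` steps."""
    x, y = series[:-lag], series[lag:]
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


period = 24  # matches the 24-step correlation described in the text
signal = [math.sin(2 * math.pi * t / period) for t in range(20 * period)]

corr_full_period = lagged_correlation(signal, 24)
corr_half_period = lagged_correlation(signal, 12)
print(round(corr_full_period, 3), round(corr_half_period, 3))
```

A shift of one full period leaves the signal almost perfectly correlated with itself, while a shift of half a period inverts it. This is the kind of structure a forecasting model can exploit when it attends to inputs one period back.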
CATS demonstrates superior performance across most datasets and forecasting horizons, often achieving the best or second-best scores in the time series forecasting tasks we evaluated.
To set up the environment, follow these steps:
- Install Python 3.9
- Install the required packages:
pip install -r requirements.txt
To replicate the experiments in our paper, follow these steps:
- Download the dataset from Autoformer.
- Create a folder named ./dataset in the root directory of this project.
- Place all downloaded files and folders within the ./dataset folder.
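The folder setup above can be checked with a small helper like the one below. It is a convenience sketch rather than part of the repository: the function name `list_dataset_files` is hypothetical, and it assumes the downloaded data files are CSV files.

```python
from pathlib import Path


def list_dataset_files(root="./dataset"):
    """Create the dataset folder if needed and return the CSV files found under it.

    Paths are returned relative to `root`, sorted, with forward slashes.
    Assumes the downloaded data files use the .csv extension.
    """
    root_path = Path(root)
    root_path.mkdir(parents=True, exist_ok=True)
    return sorted(p.relative_to(root_path).as_posix() for p in root_path.rglob("*.csv"))
```

Calling `list_dataset_files()` from the project root returns an empty list until the downloaded files have been placed in ./dataset, which makes it a quick check before launching any script.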
We provide various scripts for different datasets and input lengths. Here are a couple of examples:
- For the ETTm1 dataset with an input length of 512:
bash ./scripts/ETTm1_512_input.sh
- For the Traffic dataset with a large input length (2880):
bash ./scripts/Traffic_2880_Large_input.sh
You can find more scripts in the ./scripts folder for other datasets and input lengths.
If you find this repo useful for your research, please cite our paper:
@inproceedings{kim2024self,
title={Are Self-Attentions Effective for Time Series Forecasting?},
author={Kim, Dongbin and Park, Jinseong and Lee, Jaewook and Kim, Hoki},
booktitle={Advances in Neural Information Processing Systems},
volume={37},
year={2024}
}
We would like to express our appreciation for the following GitHub repositories, which provided valuable code bases and datasets:
If you have any questions or want to use the code, please contact dongbin413@snu.ac.kr.