A neural stock price forecasting system that combines fundamental and technical analysis to predict the trend of stocks in the S&P 500 index. The main contributions of this work are:
- A first approach built with PyTorch Lightning as the learning framework, employing attention and Recurrent Neural Networks (RNNs). For further insights, read the dedicated report or the related notebook.
- A distributed approach built with PyTorch, PySpark, and Petastorm, which leverages a cluster of nodes to parallelize the computation. It builds on the first approach and extends it with Spark SQL queries, enabling the system to scale to large amounts of data. For an overview of the system, see the slides or the related notebook.
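To make the "attention over recurrent outputs" idea concrete, here is a minimal sketch of dot-product attention pooling over a sequence of hidden states. It is written in plain Python rather than PyTorch so that it has no dependencies, and it is not the project's model: the scoring function, the `query` vector, and the names `softmax` and `attention_pool` are assumptions made for illustration. See the report and notebook for the architecture actually used.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden_states, query):
    """Collapse a sequence of RNN hidden states into one context vector.

    hidden_states: list of equally sized vectors, one per time step
                   (hypothetical stand-in for the RNN outputs).
    query:         vector of the same size used to score each time step
                   (assumed here; a learned parameter in a real model).
    """
    # Dot-product score between the query and each time step.
    scores = [sum(h_d * q_d for h_d, q_d in zip(h, query)) for h in hidden_states]
    weights = softmax(scores)
    dim = len(hidden_states[0])
    # Weighted sum of the hidden states, one dimension at a time.
    context = [
        sum(w * h[d] for w, h in zip(weights, hidden_states)) for d in range(dim)
    ]
    return context, weights
```

The returned `weights` sum to one, so they can be read as how much each time step contributes to the final prediction input.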
We use data from Kaggle's public challenges: a first dataset with financial reports of S&P 500 companies from 2003 to 2013, and a second dataset containing stock market data. By aligning the two datasets and removing outliers (refer to the notebooks to see how the alignment is performed), we obtain an enriched dataset that supports both fundamental and technical analysis.
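One common way to align low-frequency reports with daily prices is an "as-of" join: each trading day receives the most recent report published on or before that day, which avoids leaking future fundamentals into past samples. The sketch below illustrates that idea only. The function name `align_reports_to_prices`, the tuple layout, and the `eps` field are hypothetical, and the notebooks remain the reference for how the alignment is actually performed.

```python
from bisect import bisect_right
from datetime import date


def align_reports_to_prices(prices, reports):
    """Attach to each price row the latest report available on that day.

    prices:  list of (date, close) tuples (assumed layout).
    reports: list of (publication_date, dict_of_fundamentals) tuples
             (assumed layout).
    """
    reports = sorted(reports, key=lambda r: r[0])
    report_dates = [published for published, _ in reports]
    aligned = []
    for day, close in sorted(prices, key=lambda p: p[0]):
        # Index of the last report published on or before `day`.
        idx = bisect_right(report_dates, day) - 1
        if idx < 0:
            continue  # no report has been published yet, skip this day
        aligned.append({"date": day, "close": close, **reports[idx][1]})
    return aligned


# Toy data, invented for the example.
prices = [
    (date(2010, 1, 4), 10.0),
    (date(2010, 3, 1), 11.0),
    (date(2010, 6, 1), 12.0),
]
reports = [
    (date(2010, 2, 15), {"eps": 1.2}),
    (date(2010, 5, 15), {"eps": 1.5}),
]
rows = align_reports_to_prices(prices, reports)
```

In this toy run the January price is dropped because no report precedes it, while the March and June prices pick up the February and May reports respectively.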
The following benchmark shows the performance of our trading strategy algorithm (details in the slides, pages 14-16).
| Model | MSE | R2 | Adjusted R2 | Operation accuracy | Profit |
| --- | --- | --- | --- | --- | --- |
| DecisionTreeRegressor | 0.078 | 0.852 | - | 55.45% | 35.97% |
| RandomForestRegressor | 0.104 | 0.803 | - | 57.01% | 51.61% |
| LSTM | 0.021 | 0.939 | 0.897 | 56.52% | 58.35% |
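For readers unfamiliar with the regression columns, the sketch below computes MSE, R2, and Adjusted R2 using their standard textbook definitions. It does not reproduce the project's evaluation code; the function names and the toy inputs are assumptions, and the number of features behind the reported Adjusted R2 is not stated here. Operation accuracy and Profit are specific to the trading strategy and are defined in the slides.

```python
def mse(y_true, y_pred):
    """Mean squared error between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def adjusted_r2(r2, n_samples, n_features):
    """R2 penalized by the number of input features."""
    return 1.0 - (1.0 - r2) * (n_samples - 1) / (n_samples - n_features - 1)


# Toy values, invented for the example.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.0, 2.0, 3.0, 5.0]
r2 = r2_score(y_true, y_pred)
print(mse(y_true, y_pred), r2, adjusted_r2(r2, n_samples=4, n_features=1))
```

Adjusted R2 is never larger than R2 when at least one feature is used, which matches the LSTM row where it is the lower of the two values.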
To install and configure PySpark on your local machine, follow the instructions described here. Otherwise, you can clone the notebook and import it into Databricks as described here.
For a quick, ready-to-use test, run the test/evaluate.py script, which uses the distributed system with pre-trained weights for the LSTM model. Otherwise, you can retrain the system with a model of your choice and use the new weights to perform the evaluation.
.
├── data/ # Stock prices and fundamental data
├── report/
│ ├── main.pdf # Project report for the dlai-2021 course
│ ├── main.tex
│ └── ...
├── test/
│ ├── data/ # Model weights and test data
│ ├── evaluate.py # Evaluation script
│ └── ...
├── dist_forecasting.ipynb # PySpark distributed stock prediction system
├── forecasting.ipynb # Stock prediction system
├── environment.yml # Training environment
└── ...