This project implements a multivariate time series analysis using an RNN/LSTM for stock price prediction. A deep RNN model was created and trained on five years of historical Google stock price data to forecast stock performance over a two-month period.
Data Set (Google Stock Price)
The dataset comprises historical records of the stock price of Alphabet Inc. (GOOG), captured on a daily basis.
The dataset is sourced from Yahoo Finance and contains the following fields: Opening price, Highest price, Lowest price, Closing price, Adjusted closing price, and Trading volume.
The entire dataset was explored first, and a specific time period was then selected for training, validation, and prediction as follows:
- Training data: from Jan 2019 till June 2023.
- Validation data: from July 2023 till Dec 2023.
- Testing data: first two months of 2024.
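The date ranges above can be expressed as a chronological split. The following is a minimal sketch, assuming the data is held in a pandas DataFrame with a `DatetimeIndex`; the function name `split_by_period` and the synthetic `Close` values are illustrative and not taken from the project notebooks.

```python
import numpy as np
import pandas as pd


def split_by_period(df):
    """Split a date-indexed frame into the three periods listed above."""
    df = df.sort_index()  # label-based date slicing expects a sorted index
    train = df.loc["2019-01":"2023-06"]  # Jan 2019 to June 2023
    val = df.loc["2023-07":"2023-12"]    # July 2023 to Dec 2023
    test = df.loc["2024-01":"2024-02"]   # first two months of 2024
    return train, val, test


# Synthetic business-day series used only to exercise the split (not real prices).
dates = pd.bdate_range("2018-06-01", "2024-06-28")
prices = pd.DataFrame({"Close": np.linspace(50.0, 150.0, len(dates))}, index=dates)

train, val, test = split_by_period(prices)
print(len(train), len(val), len(test))
```

Splitting by date rather than by random sampling keeps the validation and test periods strictly after the training period, which is what a forecasting setup needs.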
The raw, interim, and preprocessed datasets are stored in their corresponding subfolders of the main data directory.
The project is implemented in three consecutive phases simulating the essential data processing and analysis steps.
- Each phase is represented in a corresponding notebook inside the notebooks directory.
- Intermediary data files are stored inside the data directory.
- Auxiliary and final models are stored inside the models directory.
Corresponding notebook: data-explanatory-analysis.ipynb
Implemented data exploration tasks:
- Download and load the raw dataset file.
- Explore dataset summary and statistics.
- Perform initial data cleaning and type validation.
- Analyze the stock performance data over time.
- Select a specific period for analysis and filter data accordingly.
- Store filtered dataset file to a local folder.
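The exploration steps above might look roughly like the sketch below. It assumes the Yahoo Finance CSV column layout (`Date, Open, High, Low, Close, Adj Close, Volume`); the helper name `load_and_filter` and the four sample rows are made up for illustration, and an in-memory string stands in for the downloaded file.

```python
import io

import pandas as pd

# Made-up rows in the Yahoo Finance CSV layout; the values are NOT real quotes.
RAW_CSV = """Date,Open,High,Low,Close,Adj Close,Volume
2018-12-31,52.0,52.6,51.2,51.8,51.8,1500000
2019-01-02,50.8,52.6,50.7,52.3,52.3,1600000
2019-01-03,52.0,52.8,50.7,50.8,50.8,1800000
2024-03-01,139.5,140.0,137.9,138.1,138.1,2800000
"""


def load_and_filter(source, start, end):
    """Load the raw file, validate types, and keep rows within [start, end]."""
    df = pd.read_csv(source, parse_dates=["Date"])
    df = df.sort_values("Date").set_index("Date")
    # Coerce every column to numeric and drop rows that fail validation.
    df = df.apply(pd.to_numeric, errors="coerce").dropna()
    return df.loc[start:end]


filtered = load_and_filter(io.StringIO(RAW_CSV), "2019-01-01", "2024-02-29")
print(filtered.describe())  # summary statistics for the selected period
# filtered.to_csv(<path>) would then store the result; the path is project-specific.
```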
Corresponding notebook: data-preprocessing.ipynb
Implemented data processing and transformation tasks:
- Load the filtered dataset file.
- Validate and correct data types.
- Select independent and target features.
- Create training, validation, and testing splits.
- Scale datasets to a [0,1] range using MinMaxScaler.
- Store processed data files (train, validate, test) to a local folder.
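The scaling step can be sketched as follows. This example assumes the scaler is fitted on the training split only and then reused for the other splits, which is a common way to avoid leaking information from later periods; the notebook may do this differently. The arrays are random placeholders, not stock data.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Random placeholder arrays with 5 feature columns (not real stock data).
rng = np.random.default_rng(0)
train = rng.uniform(50, 100, size=(100, 5))
val = rng.uniform(90, 120, size=(20, 5))

scaler = MinMaxScaler(feature_range=(0, 1))
train_scaled = scaler.fit_transform(train)  # learns per-column min and max
val_scaled = scaler.transform(val)          # reuses the training min and max

# The same fitted scaler maps scaled values back to the original units.
restored = scaler.inverse_transform(train_scaled)
```

With this choice, validation or test values outside the training range fall outside [0, 1] after scaling; only the training split is guaranteed to span exactly that range.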
Corresponding notebook: model-training.ipynb
Implemented training and prediction tasks:
- Load preprocessed dataset files (train, validate, test).
- Construct data structures by creating input sequences.
- Build LSTM model using TensorFlow Sequential.
- Compile LSTM model:
  - Optimizer: Adam
  - Loss: Mean Squared Error
- Train LSTM model:
  - Epochs: 200
  - Batch size: 64
- Evaluate model performance: training and validation losses.
- Predict future stock prices across training, validation, and testing periods.
- Inverse scale predictions to their original distribution.
- Visualize and analyze predictions.
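The sequence construction and model setup listed above can be sketched as follows. The optimizer, loss, epochs, and batch size come from the list; everything else is an assumption made for illustration: the helper names `make_sequences` and `build_lstm_model`, the window length, the number of LSTM layers, the unit count, and the dropout rate.

```python
import numpy as np


def make_sequences(features, target, window):
    """Slide a fixed window over the rows; each sample predicts the next target."""
    X, y = [], []
    for end in range(window, len(features)):
        X.append(features[end - window:end])  # the `window` rows before `end`
        y.append(target[end])                 # the value to forecast
    return np.asarray(X), np.asarray(y)


def build_lstm_model(window, n_features, units=64, dropout=0.2):
    """Stacked LSTM; layer count, units, and dropout are assumed values."""
    # Imported lazily so make_sequences can be used without TensorFlow installed.
    from tensorflow import keras

    model = keras.Sequential([
        keras.layers.Input(shape=(window, n_features)),
        keras.layers.LSTM(units, return_sequences=True),
        keras.layers.Dropout(dropout),
        keras.layers.LSTM(units),
        keras.layers.Dropout(dropout),
        keras.layers.Dense(1),  # single regression output (the target price)
    ])
    model.compile(optimizer="adam", loss="mean_squared_error")
    return model


# Toy input: 20 time steps, 2 features; the first feature doubles as the target.
features = np.arange(40, dtype="float32").reshape(20, 2)
target = features[:, 0]
X, y = make_sequences(features, target, window=5)
print(X.shape, y.shape)

# Training with the settings listed above would then look like:
# model = build_lstm_model(window=5, n_features=2)
# model.fit(X, y, epochs=200, batch_size=64, validation_data=(X_val, y_val))
```

Each input sample has shape `(window, n_features)`, which is the 3D layout Keras LSTM layers expect once samples are stacked along the first axis.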
Google stock price predictions with LSTM:
Google stock price predictions with LSTM (last 50 financial days):