This project shows how to predict time series data with machine learning models.
The data is the Historical Hourly Weather Data 2012-2017 dataset from Kaggle.
Two TensorFlow regression models are used in this project, `DNNRegressor` and `LinearRegressor`.
Both models are used for one-step-ahead prediction.
Three methods are compared to evaluate performance:
- The `Baseline` method uses the value of the last step as the predicted value of the next step;
- The `DNN` model is a simple shallow neural network with three hidden layers: 32 x 16 x 16;
- The `Linear` model has the same feature set as the `DNN` model.
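
The `Baseline` method and the RMSE metric can be sketched in a few lines of plain Python. This is only an illustration, not the project's code: the function names `baseline_forecast` and `rmse` are hypothetical, and the temperature values are made up.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted):
        raise ValueError("sequences must have the same length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def baseline_forecast(series):
    """Predict each step as the value observed one step earlier."""
    # The first observation has no predecessor, so the predictions
    # line up with series[1:].
    return series[:-1]


# Made-up hourly temperatures, for illustration only.
temps = [280.0, 281.0, 283.0, 282.0]
predicted = baseline_forecast(temps)
actual = temps[1:]
score = rmse(actual, predicted)
```

Because the baseline has no parameters to fit, it has no training error to report, only a test error.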
The following table gives the initial RMSE of each method without much tuning:
method | train-RMSE | test-RMSE |
---|---|---|
Baseline | - | 1.6046 |
DNN | 1.3849 | 1.1665 |
Linear | 1.3946 | 1.1378 |
Holt-Winter-D | 1.9 | 1.6045 |
Holt-Winter | 1.9 | 1.6063 |
Download `temperature.csv` from the Kaggle project and put it at `./data/temperature.csv`.
python extract.py
By default, it extracts the data for Denver and writes the output to `./data/denver.csv`.
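
As a rough sketch of what such an extraction step might look like, the snippet below copies one city column out of a wide CSV using only the standard library. It is not the contents of `extract.py`. It assumes the input has a `datetime` column plus one column per city, and that rows with a missing reading are skipped; the function name `extract_city` and the sample values are hypothetical.

```python
import csv
import io


def extract_city(src, dst, city="Denver"):
    """Copy the datetime column and one city column from src to dst.

    Returns the number of data rows written.
    """
    reader = csv.DictReader(src)
    writer = csv.writer(dst, lineterminator="\n")
    writer.writerow(["datetime", city])
    kept = 0
    for row in reader:
        value = row[city]
        if value == "":
            # Assumption: rows without a reading for this city are dropped.
            continue
        writer.writerow([row["datetime"], value])
        kept += 1
    return kept


# Tiny in-memory stand-in for temperature.csv (values are made up).
sample = io.StringIO(
    "datetime,Vancouver,Denver\n"
    "2012-10-01 12:00:00,,\n"
    "2012-10-01 13:00:00,284.63,284.61\n"
    "2012-10-01 14:00:00,284.62,284.60\n"
)
out = io.StringIO()
rows_kept = extract_city(sample, out)
```

With real files, `src` and `dst` would be file objects opened with `newline=""`, as the `csv` module recommends.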
python gen_data.py
This will generate features for the training data and test data. Currently, the features are:
- the values of the last 10 time steps;
- the deltas of the last 10 time steps;
- the 10 values of the previous days around the current time;
- the deltas of the previous days' values;
- the month of the data;
- the day of the data;
- the hour of the data.
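
The feature list above can be illustrated with a small sketch that builds one feature row. It is not the code in `gen_data.py`. In particular, it reads "values of previous days around current time" as the value at the same hour on each of the previous 10 days, which is one possible interpretation; the names `make_features`, `N_LAGS`, and `MIN_INDEX` are hypothetical.

```python
from datetime import datetime, timedelta

N_LAGS = 10          # window size from the feature list
HOURS_PER_DAY = 24   # the data is hourly
# Earliest index with enough history for every feature below.
MIN_INDEX = (N_LAGS + 1) * HOURS_PER_DAY


def make_features(timestamps, values, t):
    """Build one feature row for predicting values[t] from earlier samples."""
    if t < MIN_INDEX:
        raise ValueError("not enough history before index %d" % t)
    # Values of the last N_LAGS time steps, most recent first.
    lags = [values[t - k] for k in range(1, N_LAGS + 1)]
    # Change between each lagged step and the step before it.
    lag_deltas = [values[t - k] - values[t - k - 1]
                  for k in range(1, N_LAGS + 1)]
    # Assumption: same hour on each of the previous N_LAGS days.
    daily = [values[t - d * HOURS_PER_DAY] for d in range(1, N_LAGS + 1)]
    # Day-over-day change of those same-hour values.
    daily_deltas = [
        values[t - d * HOURS_PER_DAY] - values[t - (d + 1) * HOURS_PER_DAY]
        for d in range(1, N_LAGS + 1)
    ]
    ts = timestamps[t]
    return lags + lag_deltas + daily + daily_deltas + [ts.month, ts.day, ts.hour]


# Synthetic hourly series, for illustration only.
start = datetime(2012, 10, 1)
timestamps = [start + timedelta(hours=i) for i in range(300)]
values = [float(i) for i in range(300)]
row = make_features(timestamps, values, 280)
```

Under this interpretation each row has 43 features: four groups of 10 plus the month, day, and hour.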
By default, the output is written to `./data/denver-features-test.csv` and `./data/denver-features-train.csv`.
python main.py
`DNNRegressor` or `LinearRegressor` can be selected manually in `main.py` before training.
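
A minimal sketch of how such a manual switch might look, assuming the models are built through the `tf.estimator` API (newer TensorFlow releases no longer ship `tf.estimator`, so this needs a version that still includes it). The names `MODEL_TYPE`, `HIDDEN_UNITS`, and `build_estimator` are hypothetical and not taken from `main.py`; only the 32 x 16 x 16 layer sizes come from the description above.

```python
MODEL_TYPE = "DNN"           # hypothetical switch: "DNN" or "Linear"
HIDDEN_UNITS = [32, 16, 16]  # three hidden layers, as described above


def build_estimator(model_type, feature_columns):
    """Return a TensorFlow regressor for the requested model type."""
    if model_type not in ("DNN", "Linear"):
        raise ValueError("unknown model type: %r" % (model_type,))
    # Imported here so the check above works without TensorFlow installed.
    import tensorflow as tf

    if model_type == "DNN":
        return tf.estimator.DNNRegressor(
            hidden_units=HIDDEN_UNITS,
            feature_columns=feature_columns,
        )
    # Same feature columns, but no hidden layers.
    return tf.estimator.LinearRegressor(feature_columns=feature_columns)
```

Passing the same `feature_columns` to both branches keeps the two models on an identical feature set, so the comparison isolates the effect of the hidden layers.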
python compare.py