Machine Learning for Time Series Prediction

This project shows how to predict time series data with machine learning models. The data set is Historical Hourly Weather Data 2012-2017 from Kaggle.

Two TensorFlow regression models are used in this project:

  • DNNRegressor
  • LinearRegressor

Both models are used for one-step-ahead prediction.

Prediction performance

Three models are compared:

  • The Baseline method uses the value of the last step as the predicted value of the next step;
  • The DNN model is a simple shallow neural network with three hidden layers: 32 x 16 x 16;
  • The Linear model uses the same feature set as the DNN model.

The following table gives the initial RMSE of the models without much tuning:

method         train-RMSE  test-RMSE
Baseline       -           1.6046
DNN            1.3849      1.1665
Linear         1.3946      1.1378
Holt-Winter-D  1.9         1.6045
Holt-Winter    1.9         1.6063
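
The Baseline method described above is a persistence forecast scored by RMSE. The following is a minimal sketch of that idea, not the code used in this repository; the function names `rmse` and `persistence_forecast` and the sample temperatures are hypothetical and chosen only for illustration.

```python
import math


def rmse(actual, predicted):
    """Root-mean-square error between two equal-length sequences."""
    if len(actual) != len(predicted):
        raise ValueError("sequences must have the same length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def persistence_forecast(series):
    """Predict each step as the value of the previous step.

    Returns (targets, predictions). The first observation has no
    predecessor, so it is dropped from the targets.
    """
    return series[1:], series[:-1]


# Hypothetical hourly temperatures, for illustration only.
temps = [280.0, 281.0, 283.0, 282.0, 280.0]
targets, preds = persistence_forecast(temps)
baseline_rmse = rmse(targets, preds)
print(f"baseline RMSE: {baseline_rmse:.4f}")
```

Because the baseline needs no training, it has only a test-RMSE entry in the table.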

Figure-1 DNN model for Hourly temperature prediction.

Figure-2 Holt-Winter-D model for Hourly temperature prediction.

Train the models

0. Download the dataset

Download temperature.csv from the Kaggle project and put it at ./data/temperature.csv

1. Extract the data for one city

python extract.py

By default, it extracts the data for Denver and writes the output to ./data/denver.csv
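
The extraction step can be pictured roughly as follows. This is a sketch under assumptions, not the contents of extract.py: it assumes temperature.csv has a `datetime` column plus one column per city, and the function `extract_city` and the sample rows are hypothetical.

```python
import csv
import io


def extract_city(src, dst, city="Denver"):
    """Copy the datetime column and one city's column from src to dst.

    src and dst are file-like objects. Rows with no value for the
    city are skipped. Returns the number of rows written.
    """
    reader = csv.DictReader(src)
    writer = csv.writer(dst, lineterminator="\n")
    writer.writerow(["datetime", city])
    kept = 0
    for row in reader:
        value = row.get(city)
        if not value:  # empty or missing reading
            continue
        writer.writerow([row["datetime"], value])
        kept += 1
    return kept


# Hypothetical sample in the assumed layout, for illustration only.
sample = io.StringIO(
    "datetime,Vancouver,Denver\n"
    "2012-10-01 12:00:00,,\n"
    "2012-10-01 13:00:00,284.63,284.61\n"
    "2012-10-01 14:00:00,284.62,284.60\n"
)
out = io.StringIO()
n_rows = extract_city(sample, out)
print(out.getvalue())
```

With real files, the same function could be called with `open("./data/temperature.csv", newline="")` as `src` and a writable file as `dst`.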

2. Generate training/test data sets

python gen_data.py

This will generate features for the training data and test data. Currently, the features are:

  • the values of the last 10 time steps;
  • the deltas of the last 10 time steps;
  • the 10 values from previous days around the current time;
  • the deltas of the previous days' values;
  • the month of the data;
  • the day of the data;
  • the hour of the data.

By default, output will be ./data/denver-features-test.csv and ./data/denver-features-train.csv
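
One possible reading of the feature list above is sketched below. It is not the code in gen_data.py. It assumes an hourly series without gaps, takes "deltas" to mean differences between consecutive values, and takes the previous-day values to be the readings at the same hour on each of the preceding days. The names `make_row` and `make_dataset` are hypothetical.

```python
from datetime import datetime, timedelta


def diffs(xs):
    """Differences between consecutive elements of xs."""
    return [b - a for a, b in zip(xs, xs[1:])]


def make_row(times, values, t, n_lags=10, n_days=10, period=24):
    """Build (features, target) for predicting values[t]."""
    lags = values[t - n_lags:t]  # last n_lags steps
    # Same hour on each of the previous n_days days, oldest first.
    daily = [values[t - period * d] for d in range(n_days, 0, -1)]
    stamp = times[t]
    features = (lags + diffs(lags) + daily + diffs(daily)
                + [stamp.month, stamp.day, stamp.hour])
    return features, values[t]


def make_dataset(times, values, n_lags=10, n_days=10, period=24):
    """Build rows for every step that has enough history behind it."""
    start = max(n_lags, n_days * period)
    return [make_row(times, values, t, n_lags, n_days, period)
            for t in range(start, len(values))]


# Synthetic series for illustration: value equals hours since start.
t0 = datetime(2012, 10, 1)
times = [t0 + timedelta(hours=h) for h in range(72)]
values = [float(h) for h in range(72)]
rows = make_dataset(times, values, n_lags=3, n_days=2)
first_features, first_target = rows[0]
```

The example uses small `n_lags` and `n_days` so the rows are easy to inspect; the defaults mirror the 10-step and 10-value settings in the list.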

3. Train the model

python main.py

Select DNNRegressor or LinearRegressor manually in main.py before training.

4. Calculate the RMSE and plot the data

python compare.py