pyteller

An open source project from Data to AI Lab at MIT.

Time series forecasting using MLPrimitives

Documentation: https://signals-dev.github.io/pyteller
Homepage: https://github.com/signals-dev/pyteller

Overview

pyteller is a time series forecasting library built with the end user in mind.

Data Format

Input

The expected input to pyteller pipelines is a .csv file with data in one of the following formats:

Targets Table

Option 1: Single Entity (Academic Form)

The user must specify the following:

timestamp_col: the string denoting which column contains the pandas timestamp objects or python datetime objects corresponding to the time at which the observation is made
target_signal: an integer or float column with the observed target values at the indicated timestamps

This is an example of such a table, where the timestamp_col is 'timestamp' and the target_signal is 'value':

timestamp	value
7/1/14 1:00	6210
7/1/14 1:30	4656
7/1/14 2:00	3820
7/1/14 1:30	4656
7/1/14 2:00	3820
7/1/14 2:30	2873
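As an illustration of what this format implies, the sketch below parses a few rows like the ones above with pandas and converts the timestamp column into pandas timestamp objects. It does not use pyteller itself; the in-memory CSV and the explicit date format string are assumptions made for the example.

```python
import io

import pandas as pd

# Hypothetical in-memory CSV with a few rows shaped like the example table.
csv_text = """timestamp,value
7/1/14 1:00,6210
7/1/14 1:30,4656
7/1/14 2:00,3820
7/1/14 2:30,2873
"""

timestamp_col = "timestamp"
target_signal = "value"

df = pd.read_csv(io.StringIO(csv_text))
# Convert the month/day/year strings into pandas Timestamp objects.
df[timestamp_col] = pd.to_datetime(df[timestamp_col], format="%m/%d/%y %H:%M")

print(df.dtypes)
```

Passing an explicit format avoids ambiguity between month-first and day-first dates.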

Option 2: Multiple Entity (Flat Form)

The user must specify the following:

timestamp_col: the string denoting which column contains the pandas timestamp objects or python datetime objects corresponding to the time at which the observation is made
entities: the list denoting the columns the user wants to make forecasts for

This is an example of such a table, where the timestamp_col is 'timestamp' and the entities can be ['taxi 1','taxi 3']:

timestamp	taxi 1	taxi 2	taxi 3
7/1/14 1:00	6210	510	6230
7/1/14 1:30	4656	5666	656
7/1/14 2:00	3820	2420	3650
7/1/14 1:30	4656	4664	380
7/1/14 2:00	3820	3520	320
7/1/14 2:30	2873	1373	3640
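To make the role of the entities list concrete, here is a minimal pandas sketch that keeps only the timestamp column and the chosen entity columns of a flat table. It is only an illustration of the data layout, not pyteller's own code, and the small dataframe is built by hand from the first rows above.

```python
import pandas as pd

# A few rows of the flat-form example, built by hand for illustration.
flat_df = pd.DataFrame({
    "timestamp": pd.to_datetime(
        ["7/1/14 1:00", "7/1/14 1:30", "7/1/14 2:00"],
        format="%m/%d/%y %H:%M",
    ),
    "taxi 1": [6210, 4656, 3820],
    "taxi 2": [510, 5666, 2420],
    "taxi 3": [6230, 656, 3650],
})

timestamp_col = "timestamp"
entities = ["taxi 1", "taxi 3"]

# Keep the timestamp plus one column per entity to forecast.
selected = flat_df[[timestamp_col] + entities]
print(selected)
```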

Option 3: Multiple Entity (Long Form)

The user must specify the following:

timestamp_col: the string denoting which column contains the pandas timestamp objects or python datetime objects corresponding to the time at which the observation is made
entity_col: the string denoting which column contains the entities you will separately make forecasts for
target_signal: the string denoting which column contains the observed target values that you want to forecast

This is an example of such a table, where the timestamp_col is 'timestamp', the entity_col is 'region', and the target_signal is 'demand':

timestamp	region	demand	Temp	Rain
9/27/20 21:20	DAYTON	1841.6	65.78	0
9/27/20 21:20	DEOK	2892.5	75.92	0
9/27/20 21:20	DOM	11276	55.29	0
9/27/20 21:20	DPL	2113.7	75.02	0.06
9/27/20 21:25	DAYTON	1834.1	65.72	0
9/27/20 21:25	DEOK	2880.2	75.92	0
9/27/20 21:25	DOM	11211.7	55.54	0
9/27/20 21:25	DPL	2086.6	75.02	0.06
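The long form and the flat form carry the same information for a single target signal. The sketch below shows one way to see this with pandas, pivoting a hand-built subset of the rows above so that each region becomes its own column. This is an illustration only and makes no claim about how pyteller handles the conversion internally.

```python
import pandas as pd

# Hand-built subset of the long-form example (two regions, two timestamps).
long_df = pd.DataFrame({
    "timestamp": ["9/27/20 21:20", "9/27/20 21:20", "9/27/20 21:25", "9/27/20 21:25"],
    "region": ["DAYTON", "DEOK", "DAYTON", "DEOK"],
    "demand": [1841.6, 2892.5, 1834.1, 2880.2],
})
long_df["timestamp"] = pd.to_datetime(long_df["timestamp"], format="%m/%d/%y %H:%M")

# One row per timestamp, one column per entity, values taken from the target signal.
flat_df = long_df.pivot(index="timestamp", columns="region", values="demand")
print(flat_df)
```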

Output

The output of a pyteller pipeline is another table that contains the timestamps and the forecast value(s), matching the format of the input targets table.

Datasets in the library

For development and evaluation of pipelines, we include the following datasets:

NYC taxi data

Found on the NYC website, or as the processed version maintained by Numenta.
No modifications were made from the Numenta version.

Wind data

Found on Kaggle.
After downloading the FasTrak 5-Minute .txt files, the .txt files for each day from 1/1/13 to 1/8/18 were compiled into one .csv file.
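A compilation step of this kind could look roughly like the sketch below. It assumes, purely for illustration, that every daily file is comma-delimited and starts with the same header row; the actual FasTrak file layout is not described here, and the function name, file names and sample values are all made up.

```python
import csv
import tempfile
from pathlib import Path


def compile_daily_files(txt_paths, out_path):
    """Concatenate delimited text files that share one header into a single CSV."""
    header_written = False
    with open(out_path, "w", newline="") as out_file:
        writer = csv.writer(out_file)
        for path in sorted(txt_paths):
            with open(path, newline="") as in_file:
                reader = csv.reader(in_file)
                header = next(reader, None)
                if header is None:
                    continue  # skip empty files
                if not header_written:
                    writer.writerow(header)  # write the header only once
                    header_written = True
                writer.writerows(reader)


# Demo with two tiny, made-up daily files (not real FasTrak data).
with tempfile.TemporaryDirectory() as tmp:
    tmp_dir = Path(tmp)
    (tmp_dir / "day_1.txt").write_text("time,speed\n00:00,5.1\n00:05,5.4\n")
    (tmp_dir / "day_2.txt").write_text("time,speed\n00:00,6.0\n")
    out_path = tmp_dir / "wind.csv"
    compile_daily_files(tmp_dir.glob("*.txt"), out_path)
    compiled = out_path.read_text().splitlines()

print(compiled)
```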

Weather data

Maintained by Iowa State University's IEM
The downloaded data was from the selected network of 8A0 Albertville and the selected date range was 1/1/16 0:15 - 2/16/16 0:55

Traffic data

Found on Caltrans PeMS
No modifications were made from the Numenta version

Energy data

Found on Kaggle.
No modifications were made after downloading pjm_hourly_est.csv. We also use PJM electricity demand data.

Current Available Pipelines

The pipelines are included as JSON files, which can be found in the subdirectories inside the pyteller/pipelines folder.

This is the list of pipelines available so far, which will grow over time:

name	location	description
Persistence	pyteller/pipelines/sandbox/persistence	uses the latest input to the model as the next output
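The idea behind a persistence forecast can be sketched in a few lines of plain Python. The function below is a hypothetical illustration of the description in the table (repeat the latest observation as the prediction), not the code of the pyteller pipeline; the name persistence_forecast and its signature are assumptions.

```python
def persistence_forecast(history, pred_length):
    """Repeat the most recent observation pred_length times."""
    if not history:
        raise ValueError("history must contain at least one observation")
    if pred_length < 1:
        raise ValueError("pred_length must be at least 1")
    return [history[-1]] * pred_length


observed = [6210, 4656, 3820, 2873]
print(persistence_forecast(observed, pred_length=3))  # [2873, 2873, 2873]
```

Persistence is often used as a baseline: a more elaborate pipeline is only worth its complexity if it beats this forecast.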

Install

Requirements

pyteller has been developed and tested on Python 3.5, 3.6, 3.7 and 3.8.

Although it is not strictly required, using a virtualenv is highly recommended to avoid interfering with other software installed on the system where pyteller runs.

These are the minimum commands needed to create a virtualenv using python3.6 for pyteller:

pip install virtualenv
virtualenv -p $(which python3.6) pyteller-venv

Afterwards, you have to execute this command to activate the virtualenv:

source pyteller-venv/bin/activate

Remember to execute it every time you start a new console to work on pyteller!

Install from source

With your virtualenv activated, you can clone the repository and install it from source by running make install on the stable branch:

git clone git@github.com:signals-dev/pyteller.git
cd pyteller
git checkout stable
make install

Install for Development

If you want to contribute to the project, a few more steps are required to make the project ready for development.

Please head to the Contributing Guide for more details about this process.

Quick Start

In this short tutorial we will guide you through a series of steps that will help you get started with pyteller.

1. Load the data

In the first step we will load the Alabama Weather data into a dataframe from the demo datasets in the data folder. This represents all of the data available to date, which will be used to train the model.

from pyteller.data import load_data

# Load the training data from the demo datasets
current_data = load_data('../pyteller/data/AL_Weather_current.csv')

The output is a dataframe:

    station     valid       tmpf        dwpf        relh        drct        sknt        p01i        alti      vsby        feel
0     8A0    1/1/16 0:15   41.000     39.200       93.240      350.000      6.000      0.000       30.250    10.000      36.670
1     4A6    1/1/16 0:15   41.000     39.000       70.080      360.000      5.000      0.000       30.300    10.000      37.080
2     8A0    1/1/16 0:35   39.200     37.400       93.190      360.000      6.000      0.000       30.250    10.000      34.200
3     4A6    1/1/16 0:35   41.000     32.000       70.080      360.000      5.000      0.000       30.290    10.000      37.080
4     8A0    1/1/16 0:55   37.400     37.400       100.000     360.000      8.000      0.000       30.250    10.000      30.760

Once we have the data, create an instance of the Pyteller class, where the input arguments are the forecast settings.

from pyteller.core import Pyteller

pyteller = Pyteller(
    pipeline='persistence',
    pred_length=3,
    offset=5,
)

2. Fit the data

The user now calls the pyteller.fit method to fit the pipeline to the data. The inputs are the loaded data and the column names. Here the user also specifies which signal or entities they want to forecast.

pyteller.fit(
    data=current_data,
    timestamp_col='valid',
    target_signal='tmpf',
    entity_col='station')

3. Save the trained model

At this point, the user has a trained model that can be pickled by calling the pyteller.save method, inputting the desired output path:

pyteller.save('../fit_models/persistence')
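For readers unfamiliar with pickling, the sketch below shows the general save and restore round trip with Python's standard pickle module. The dictionary stands in for a fitted model; its keys and values are hypothetical and say nothing about what pyteller.save actually stores.

```python
import pickle
import tempfile
from pathlib import Path

# Stand-in for a fitted model's state (hypothetical; not pyteller's real internals).
fitted_state = {
    "pipeline": "persistence",
    "pred_length": 3,
    "last_values": {"8A0": 42.8, "4A6": 44.8},  # illustrative numbers only
}

with tempfile.TemporaryDirectory() as tmp:
    model_path = Path(tmp) / "persistence.pkl"

    # Serialize the object to disk.
    with open(model_path, "wb") as f:
        pickle.dump(fitted_state, f)

    # Later, possibly in another session, restore it.
    with open(model_path, "rb") as f:
        restored = pickle.load(f)

print(restored == fitted_state)  # True
```

Only unpickle files from sources you trust, since loading a pickle can execute arbitrary code.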

4. Load the new data

Once the user gets new data that they want to use to make a prediction, they can load it in the same way they loaded the training data.

input_data = load_data('../pyteller/data/AL_Weather_input.csv')

5. Forecast

To make a forecast, the user calls the pyteller.forecast method, which will output the forecasts for all signals and all entities.

forecast = pyteller.forecast(input_data)

The output is a dataframe of all the predictions:

 timestamp        8A0            4A6
 2/4/16 18:15    42.800        44.800
 2/4/16 18:35    42.800        42.600
 2/4/16 18:55    44.800        43.000






What's next?

For more details about pyteller and all its possibilities and features, please check the documentation site: https://signals-dev.github.io/pyteller/
