Structural is a Python library for structural time series modeling and forecasting of daily univariate sequential data.
This code is released under the BSD 3-Clause license, which I've included in this repo under /LICENSE.
Clone the repo if you want the source code:
git clone https://github.com/kyleclo/structural.git
cd structural
pip install -r requirements.txt
Or install the module directly with pip (GitHub no longer serves the unauthenticated git:// protocol, so use HTTPS):
pip install git+https://github.com/kyleclo/structural.git#egg=structural
Typical usage looks like this:
import pandas as pd
from structural import LinearTrend
df = pd.read_csv(DATA_FILEPATH)
model = LinearTrend(STAN_MODEL_FILEPATH)
model.fit(df)
yhat_fitted = model.predict(df)
new_df = pd.DataFrame({'ds': model.make_forecast_dates(h=100)})
yhat_forecasted = model.predict(new_df)
See /example.py for a sample script. I recommend using it as starter code if you're looking to deploy an automated forecasting service.
I'm currently working on (in no particular order):
- Adding in MCMC sampling + variability estimation
- Adding in manual changepoint selection
- Adding in manual holiday indicators
- Adding a prior over changepoint locations
- Adding in ARMA errors
- Adding a testing suite
- Creating a PyPI distribution
- Generalizing LinearTrend to allow for link functions
- Refactoring the code structure so we're "building" each structural component into the model. (This probably requires piecing together Stan code snippets at runtime.)
This project was inspired by Prophet, an open-source library released by the good folks over at Facebook. Check out their repo here: https://github.com/facebookincubator/prophet.
I developed this library while working at CDK Global on some projects involving large-scale time series forecasting. I tried using Prophet at first, but their library seems more geared toward use by analysts who will be tuning models manually. I needed a more general-purpose, automated procedure for my project, so I ended up writing this library instead. I've documented the differences below, but overall I've maintained a similar API.
Major differences include:
- In Structural, the Structural class is actually an abstract base class. Users implement subclasses of Structural, and instantiation is handled by Structural.create() at runtime. I found this more extensible than Prophet's Prophet class, which uses if-statements in its methods to switch between "linear" and "logistic" growth models.
- In Structural, Stan models are compiled and imported using Structural.compile_stan_model() and Structural.import_stan_model(), as opposed to Prophet, which compiles Stan models during package installation. I made this change because I wanted the flexibility to add / modify / select different Stan models at runtime, without having to re-run the package installation.
- In Structural, automated changepoint generation places changepoints at the first of each month (excluding the first and last) in the training data, as opposed to Prophet, which generates changepoints using np.linspace over the first 80% of training set dates. I simply felt this was a more intuitive default choice when no changepoints are specified.
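
To make the changepoint difference concrete, here is a minimal sketch of the two default strategies described above. It is not the code from either library: the function names are hypothetical, "excluding the first and last" is read as dropping the first and last month-start dates, and the 25-changepoint default in the linspace variant is an assumption.

```python
import numpy as np
import pandas as pd


def month_start_changepoints(dates):
    """Changepoints at the first of each month inside the training range,
    dropping the first and last month-start dates (one reading of the rule)."""
    dates = pd.DatetimeIndex(dates)
    month_starts = pd.date_range(dates.min(), dates.max(), freq="MS")
    return month_starts[1:-1]


def linspace_changepoints(dates, n_changepoints=25, fraction=0.8):
    """Evenly spaced changepoints over the first `fraction` of the dates.
    n_changepoints=25 is an assumed default, not taken from this README."""
    dates = pd.DatetimeIndex(dates).sort_values()
    last_index = int(np.floor(len(dates) * fraction))
    indices = np.linspace(0, last_index, n_changepoints + 1).round().astype(int)
    # Skip index 0 so the series start itself is not a changepoint.
    return dates[indices[1:]]


if __name__ == "__main__":
    train_dates = pd.date_range("2016-01-01", "2016-06-30", freq="D")
    print(month_start_changepoints(train_dates))
    print(linspace_changepoints(train_dates, n_changepoints=5))
```

The month-start variant ties changepoints to calendar boundaries, so their locations do not shift when the length of the training window changes by a few days.
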
Other differences are more for style / personal preference:
- Stylistically, I've written Structural such that the only methods that can set members are the constructor and Structural.fit(), while all other methods return a result. Hence, I've removed or rewritten methods with side effects like Prophet.setup_dataframe() and Prophet.set_changepoints().
- The user can now specify yearly_order and monthly_order for the Fourier expansion when instantiating a Structural object. In Prophet, these values are hard-coded to 10 and 3 within the Prophet.make_all_seasonality_features() method.
- I've rewritten Structural.make_seasonality_df() and Structural.make_changepoint_df() to have a consistent style: they both create the zeros feature vector only when no seasonality or no changepoints are specified, respectively.
- In Structural, Structural.make_changepoint_df() returns a feature vector of zeros (instead of ones) in the no-changepoint case, so the fitted delta will be zero. In Prophet, when specifying no changepoints, Prophet.get_changepoint_matrix() will still return a feature vector of ones. This results in fitting a non-zero delta term, which requires an additional correction step in Prophet.fit() by setting k = k + delta and delta = 0.
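
As a rough illustration of the configurable Fourier orders and the zeros fallback mentioned above, the sketch below builds seasonality features for a set of dates. It is not the body of Structural.make_seasonality_df(): the function names, the column naming, and the period lengths (365.25 and 30.5 days) are all assumptions made for the example.

```python
import numpy as np
import pandas as pd


def fourier_features(dates, period, order, prefix):
    """Sine/cosine pairs for harmonics 1..order of a given period (in days)."""
    dates = pd.DatetimeIndex(dates)
    # Days since a fixed epoch, so features line up across train and forecast dates.
    t = np.asarray((dates - pd.Timestamp("1970-01-01")).days, dtype=float)
    columns = {}
    for k in range(1, order + 1):
        angle = 2.0 * np.pi * k * t / period
        columns[f"{prefix}_sin{k}"] = np.sin(angle)
        columns[f"{prefix}_cos{k}"] = np.cos(angle)
    return pd.DataFrame(columns, index=dates)


def sketch_seasonality_df(dates, yearly_order=10, monthly_order=3):
    """Hypothetical stand-in: an order of 0 disables that seasonal component."""
    dates = pd.DatetimeIndex(dates)
    parts = []
    if yearly_order > 0:
        parts.append(fourier_features(dates, 365.25, yearly_order, "yearly"))
    if monthly_order > 0:
        parts.append(fourier_features(dates, 30.5, monthly_order, "monthly"))
    if not parts:
        # No seasonality requested: a single all-zeros column keeps the
        # design matrix shape valid while contributing nothing to the fit.
        return pd.DataFrame({"zeros": np.zeros(len(dates))}, index=dates)
    return pd.concat(parts, axis=1)
```

Each harmonic contributes two columns, so the orders 10 and 3 yield 26 seasonal features, while setting both orders to 0 yields only the zeros column.
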
Structural currently doesn't support the following, which Prophet provides:
- MCMC sampling
- Variability estimation (which depends on MCMC sampling)
- Plotting
Note: I'm making these comparisons based on what I saw in Prophet v0.0.post1, which may have changed by now.