
Optimisation of historical forecast for regression models #1885

Merged 50 commits on Aug 1, 2023

Commits (50)
b7a2040
added optimized_historical_forecasts method to forecasting models
dennisbader May 30, 2023
9b43b51
reduced historical forecastable index generation
dennisbader May 31, 2023
c37193e
feat: historical forecasts optimization for simple regression model
madtoinou Jul 4, 2023
d47ff70
fix: bug in reduction to boundaries
madtoinou Jul 5, 2023
4ed475a
fix: reduce intermediary step
madtoinou Jul 5, 2023
7231bab
fix: improved support for stride > 1 and forecast_horizon > 1, still s…
madtoinou Jul 5, 2023
b261cde
fix: improved generalization, support stride and forecast_horizon > 1
madtoinou Jul 6, 2023
3b24330
fix: improved comments and error messages
madtoinou Jul 6, 2023
a319d99
fix: bug for stride and forecast horizon > 1
madtoinou Jul 7, 2023
76263f2
modularizing the code, debugged last_points_only = False for all scen…
madtoinou Jul 7, 2023
c4896fa
fix: last_points_only with model.output_chunk_length != forecast_horizon
madtoinou Jul 7, 2023
e64e962
fix: improved arguments names and type hinting
madtoinou Jul 7, 2023
147e1f8
Merge branch 'master' into refactor/hist_fc_regression
madtoinou Jul 7, 2023
307c0e9
fix: simplified if/else
madtoinou Jul 10, 2023
b8a1e94
feat: support for num_samples > 1 when last_points_only = True
madtoinou Jul 10, 2023
6d8ea99
feat: support num_samples > 1 for last_points_only=False
madtoinou Jul 10, 2023
2fd1971
fix: improved type hinting
madtoinou Jul 10, 2023
9af8539
fix: support for RangeIndex with start > 0 and static covariates
madtoinou Jul 11, 2023
9a372bd
fix: bug for RangeIndex with step > 1
madtoinou Jul 11, 2023
9cd0a8f
fix: show_warning argument is correctly propagated, simplified some i…
madtoinou Jul 11, 2023
b2a5185
fix: bug in forecastable index, pos/neg aspect of future lags was not…
madtoinou Jul 13, 2023
30d5b7b
fix: forecastable index must be further shifted in not multi_models
madtoinou Jul 13, 2023
685417d
fix: remove duplicated code
madtoinou Jul 13, 2023
c309a44
fix: stored covariates are retrieved when necessary in the optimised …
madtoinou Jul 13, 2023
c5f0236
Merge branch 'master' into refactor/hist_fc_regression
madtoinou Jul 13, 2023
a544128
Merge branch 'master' into refactor/hist_fc_regression
dennisbader Jul 13, 2023
9e479b6
doc: added entry in the changelog
madtoinou Jul 17, 2023
935a426
fix: revert changes to get_forecastable_time_index
madtoinou Jul 17, 2023
c8e74e4
feat: created a utils module dedicated to historical forecast, addres…
madtoinou Jul 17, 2023
f5d936e
fix: slicing the covariates as much as possible
madtoinou Jul 17, 2023
76534a1
fix: _optimised_historical_forecasts returns an empty list
madtoinou Jul 25, 2023
dff4652
Merge branch 'master' into refactor/hist_fc_regression
madtoinou Jul 27, 2023
84748ad
feat: improved modularity, light performance gain for un-optimised hi…
madtoinou Jul 27, 2023
97aa91e
fix: further modularization, added typing
madtoinou Jul 27, 2023
78b97da
doc: improved typing and added some docstrings
madtoinou Jul 27, 2023
09959a5
fix: clearly separated the retrain=True/False for historical_forecast…
madtoinou Jul 27, 2023
be7dd88
merge with master
madtoinou Jul 27, 2023
2a45588
fix: predict_likelihood_parameters in optimized historical fct
madtoinou Jul 27, 2023
7f4952c
Merge branch 'master' into refactor/hist_fc_regression
dennisbader Jul 28, 2023
b4b21a0
Merge branch 'master' into refactor/hist_fc_regression
madtoinou Jul 31, 2023
ce3b00f
fix: addressing review comments
madtoinou Jul 31, 2023
29e7211
fix: renamed optimised to optimized (US spelling)
madtoinou Jul 31, 2023
2ddb537
fix: renamed optimised to optimized (US spelling)
madtoinou Jul 31, 2023
cd0078d
Merge branch 'refactor/hist_fc_regression' of https://github.com/unit…
madtoinou Jul 31, 2023
e277d80
feat: added tests for optimized historical forecast
madtoinou Jul 31, 2023
572721b
fix: properly copying dict in tests
madtoinou Jul 31, 2023
0edd779
fix: support for model with encoders
madtoinou Jul 31, 2023
8c98d48
fix: improved encoders support for opti hist fct
madtoinou Jul 31, 2023
102178d
include index gap between train and inference index for covariates in…
dennisbader Jul 31, 2023
ce508ea
reduce testing time
dennisbader Aug 1, 2023
Files changed
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -14,6 +14,7 @@ but cannot always guarantee backwards compatibility. Changes that may **break co
- General model improvements:
- Added support for `PathLike` to the `save()` and `load()` functions of all non-deep learning based models. [#1754](https://github.com/unit8co/darts/pull/1754) by [Simon Sudrich](https://github.com/sudrich).
- Improved efficiency of `historical_forecasts()` and `backtest()` for all models giving significant process time reduction for larger number of predict iterations and series. [#1801](https://github.com/unit8co/darts/pull/1801) by [Dennis Bader](https://github.com/dennisbader).
- Optimized `historical_forecasts()` for `RegressionModel` when `retrain=False` and `forecast_horizon <= output_chunk_length` by vectorizing the prediction. [#1885](https://github.com/unit8co/darts/pull/1885) by [Antoine Madrona](https://github.com/madtoinou).
- Added model property `ForecastingModel.supports_multivariate` to indicate whether the model supports multivariate forecasting. [#1848](https://github.com/unit8co/darts/pull/1848) by [Felix Divo](https://github.com/felixdivo).
- `Prophet` now supports conditional seasonalities, and properly handles all parameters passed to `Prophet.add_seasonality()` and model creation parameter `add_seasonalities` [#1829](https://github.com/unit8co/darts/pull/1829) by [Idan Shilon](https://github.com/id5h).
- Added support for direct prediction of the likelihood parameters to probabilistic models using a likelihood (regression and torch models). Set `predict_likelihood_parameters=True` when calling `predict()`. [#1811](https://github.com/unit8co/darts/pull/1811) by [Antoine Madrona](https://github.com/madtoinou).
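To make the changelog entry above concrete, here is a minimal usage sketch of the optimized path (not part of the diff; it assumes the public `LinearRegressionModel` and `historical_forecasts()` API, plus the `enable_optimization` flag referenced in the tests below, which is presumably enabled by default):

```python
import darts.utils.timeseries_generation as tg
from darts.models import LinearRegressionModel

series = tg.linear_timeseries(start_value=1, end_value=100, length=100)

# retrain=False and forecast_horizon <= output_chunk_length -> vectorized route
model = LinearRegressionModel(lags=12, output_chunk_length=6)
model.fit(series[:80])

hist_fc = model.historical_forecasts(
    series=series,
    start=0.8,                 # begin backtesting at 80% of the series
    forecast_horizon=6,        # <= output_chunk_length, so no auto-regression
    retrain=False,             # required for the optimized path
    enable_optimization=True,  # assumed default; shown explicitly for clarity
    last_points_only=True,
)
print(len(hist_fc))  # number of backtest points produced
```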
9 changes: 8 additions & 1 deletion darts/dataprocessing/encoders/encoder_base.py
@@ -178,7 +178,14 @@ def generate_train_inference_idx(
inference_idx, _ = self.generate_inference_idx(
n=n, target=target, covariates=covariates
)
return train_idx.__class__.union(train_idx, inference_idx), target_end
# generate index end is inclusive, should not be a problem when taking union
gap = generate_index(
start=train_idx[-1], end=inference_idx[0] - target.freq, freq=target.freq
)
return (
train_idx.__class__.union(train_idx, gap).union(inference_idx),
target_end,
)

@property
@abstractmethod
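The encoder change above fills the hole between the training index and the inference index before taking the union. A standalone illustration of that idea, using plain pandas (with `pd.date_range` standing in for darts' internal `generate_index` helper, and made-up dates):

```python
import pandas as pd

freq = pd.Timedelta(days=1)

# training index ends before the inference index starts, leaving a hole
train_idx = pd.date_range("2000-01-01", "2000-01-10", freq="D")
inference_idx = pd.date_range("2000-01-15", "2000-01-20", freq="D")

# fill the hole so the combined index is contiguous; the end bound is inclusive
# (see the comment in the diff) and the overlap at train_idx[-1] is harmless
# because union() de-duplicates
gap = pd.date_range(start=train_idx[-1], end=inference_idx[0] - freq, freq=freq)

combined = train_idx.union(gap).union(inference_idx)
assert combined.equals(pd.date_range("2000-01-01", "2000-01-20", freq="D"))
```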
410 changes: 128 additions & 282 deletions darts/models/forecasting/forecasting_model.py

Large diffs are not rendered by default.

122 changes: 121 additions & 1 deletion darts/models/forecasting/regression_model.py
@@ -27,9 +27,10 @@
if their static covariates do not have the same size, the shorter ones are padded with 0 valued features.
"""
from collections import OrderedDict
from typing import Any, Dict, List, Optional, Sequence, Tuple, Union
from typing import Any, Callable, Dict, List, Optional, Sequence, Tuple, Union

import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

from darts.logging import get_logger, raise_if, raise_if_not, raise_log
@@ -40,6 +41,10 @@
create_lagged_component_names,
create_lagged_training_data,
)
from darts.utils.historical_forecasts import (
_optimized_historical_forecasts_regression_all_points,
_optimized_historical_forecasts_regression_last_points_only,
)
from darts.utils.multioutput import MultiOutputRegressor
from darts.utils.utils import (
_check_quantiles,
@@ -860,6 +865,121 @@ def supports_future_covariates(self) -> bool:
def supports_static_covariates(self) -> bool:
return True

@property
def supports_optimized_historical_forecasts(self) -> bool:
return True

def _check_optimizable_historical_forecasts(
self,
forecast_horizon: int,
retrain: Union[bool, int, Callable[..., bool]],
show_warnings: bool = True,
) -> bool:
"""
Historical forecast can be optimized only if `retrain=False` and `forecast_horizon <= self.output_chunk_length`
(no auto-regression required).
"""

supported_retrain = (retrain is False) or (retrain == 0)
supported_forecast_horizon = forecast_horizon <= self.output_chunk_length
if supported_retrain and supported_forecast_horizon:
return True

if show_warnings:
if not supported_retrain:
logger.warning(
"`enable_optimization=True` is ignored because `retrain` is not `False`"
"To hide this warning, set `show_warnings=False` or `enable_optimization=False`."
)
if not supported_forecast_horizon:
logger.warning(
"`enable_optimization=True` is ignored because "
"`forecast_horizon > self.output_chunk_length`."
"To hide this warning, set `show_warnings=False` or `enable_optimization=False`."
)

return False

def _optimized_historical_forecasts(
self,
series: Optional[Sequence[TimeSeries]],
past_covariates: Optional[Sequence[TimeSeries]] = None,
future_covariates: Optional[Sequence[TimeSeries]] = None,
num_samples: int = 1,
start: Optional[Union[pd.Timestamp, float, int]] = None,
forecast_horizon: int = 1,
stride: int = 1,
overlap_end: bool = False,
last_points_only: bool = True,
verbose: bool = False,
show_warnings: bool = True,
predict_likelihood_parameters: bool = False,
) -> Union[
TimeSeries, List[TimeSeries], Sequence[TimeSeries], Sequence[List[TimeSeries]]
]:
"""
TODO: support forecast_horizon > output_chunk_length (auto-regression)
"""
if not self._fit_called:
raise_log(
ValueError("Model has not been fit yet."),
logger,
)
if forecast_horizon > self.output_chunk_length:
raise_log(
ValueError(
"`forecast_horizon > model.output_chunk_length` requires auto-regression which is not "
"supported in this optimized routine."
),
logger,
)

# manage covariates, usually handled by RegressionModel.predict()
if past_covariates is None and self.past_covariate_series is not None:
dennisbader (Collaborator) commented on Jul 29, 2023:

In the current implementation, models using encoders are not yet optimizable. Do you think we could add support for this?

Edit: I'm thinking about adding something like a generate_fit_predict_encodings which would make this a bit easier. Maybe we could drop optimization support for encoders until then.

Edit 2: I added the generate_fit_predict_encodings in #1925. Would be cool to merge that one and then add the support for optimization with encodings here as well :)

madtoinou (Collaborator, Author) replied:

Good catch, I tend to forget about the encoders... Thank you for implementing this, I will adjust this PR as soon as the other one is merged.

past_covariates = [self.past_covariate_series] * len(series)
if future_covariates is None and self.future_covariate_series is not None:
future_covariates = [self.future_covariate_series] * len(series)

self._verify_static_covariates(series[0].static_covariates)

if self.encoders.encoding_available:
past_covariates, future_covariates = self.generate_fit_predict_encodings(
n=forecast_horizon,
series=series,
past_covariates=past_covariates,
future_covariates=future_covariates,
)

# TODO: move the loop here instead of duplicated code in each sub-routine?
if last_points_only:
return _optimized_historical_forecasts_regression_last_points_only(
model=self,
series=series,
past_covariates=past_covariates,
future_covariates=future_covariates,
num_samples=num_samples,
start=start,
forecast_horizon=forecast_horizon,
stride=stride,
overlap_end=overlap_end,
show_warnings=show_warnings,
predict_likelihood_parameters=predict_likelihood_parameters,
)
else:
return _optimized_historical_forecasts_regression_all_points(
model=self,
series=series,
past_covariates=past_covariates,
future_covariates=future_covariates,
num_samples=num_samples,
start=start,
forecast_horizon=forecast_horizon,
stride=stride,
overlap_end=overlap_end,
show_warnings=show_warnings,
predict_likelihood_parameters=predict_likelihood_parameters,
)


class _LikelihoodMixin:
"""
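For readers skimming this file's diff, the decision rule in `_check_optimizable_historical_forecasts` boils down to the following standalone restatement (the function name `can_use_optimized_path` is hypothetical and only used for illustration):

```python
from typing import Callable, Union


def can_use_optimized_path(
    retrain: Union[bool, int, Callable[..., bool]],
    forecast_horizon: int,
    output_chunk_length: int,
) -> bool:
    """Mirrors the check above: the vectorized route applies only when the model
    is never retrained and the horizon needs no auto-regression."""
    supported_retrain = (retrain is False) or (retrain == 0)
    return supported_retrain and forecast_horizon <= output_chunk_length


assert can_use_optimized_path(False, 3, 5)      # optimized, vectorized predict
assert not can_use_optimized_path(True, 3, 5)   # retraining -> standard loop
assert not can_use_optimized_path(False, 7, 5)  # auto-regression -> standard loop
```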
161 changes: 158 additions & 3 deletions darts/tests/models/forecasting/test_historical_forecasts.py
@@ -1,4 +1,5 @@
import unittest
from typing import Union

import numpy as np
import pandas as pd
@@ -625,6 +626,156 @@ def test_regression_auto_start_multiple_no_cov(self):
f"Expected {theorical_forecast_length}, got {len(forecasts[0])} and {len(forecasts[1])}",
)

@pytest.mark.slow
def test_optimized_historical_forecasts_regression(self):
start_ts = pd.Timestamp("2000-01-01")
ts_univariate = tg.linear_timeseries(
start_value=1, end_value=100, length=20, start=start_ts
)
ts_multivariate = ts_univariate.stack(
tg.sine_timeseries(length=20, start=start_ts)
)
# slightly longer to not affect the last predictable timestamp
ts_covs = tg.gaussian_timeseries(length=30, start=start_ts)
start = 14
model_cls = LinearRegressionModel
for ts in [ts_univariate, ts_multivariate]:
# cover several covariates combinations and several regression models
for _, model_kwargs, _ in (
models_reg_no_cov_cls_kwargs + models_reg_cov_cls_kwargs
):
for multi_models in [True, False]:
for forecast_horizon in [1, 5]:
# ocl == forecast horizon
model_kwargs_same = model_kwargs.copy()
model_kwargs_same["output_chunk_length"] = forecast_horizon
model_kwargs_same["multi_models"] = multi_models
model_same = model_cls(**model_kwargs_same)
model_same.fit(
series=ts[:start],
past_covariates=ts_covs
if model_same.supports_past_covariates
else None,
future_covariates=ts_covs
if model_same.supports_future_covariates
else None,
)
# ocl >= forecast horizon
model_kwargs_diff = model_kwargs.copy()
model_kwargs_diff["output_chunk_length"] = 5
model_kwargs_diff["multi_models"] = multi_models
model_diff = model_cls(**model_kwargs_diff)
model_diff.fit(
series=ts[:start],
past_covariates=ts_covs
if model_diff.supports_past_covariates
else None,
future_covariates=ts_covs
if model_diff.supports_future_covariates
else None,
)
for model in [model_same, model_diff]:
for last_points_only in [True, False]:
for stride in [1, 2]:
hist_fct = model.historical_forecasts(
series=ts,
past_covariates=ts_covs
if model.supports_past_covariates
else None,
future_covariates=ts_covs
if model.supports_future_covariates
else None,
start=start,
retrain=False,
last_points_only=last_points_only,
stride=stride,
forecast_horizon=forecast_horizon,
enable_optimization=False,
)

# manually packing the series in list to match expected inputs
opti_hist_fct = (
model._optimized_historical_forecasts(
series=[ts],
past_covariates=[ts_covs]
if model.supports_past_covariates
else None,
future_covariates=[ts_covs]
if model.supports_future_covariates
else None,
start=start,
last_points_only=last_points_only,
stride=stride,
forecast_horizon=forecast_horizon,
)
)
# pack the output to generalize the tests
if last_points_only:
hist_fct = [hist_fct]
opti_hist_fct = [opti_hist_fct]

for fct, opti_fct in zip(hist_fct, opti_hist_fct):
self.assertTrue(
(
fct.time_index == opti_fct.time_index
).all()
)
np.testing.assert_array_almost_equal(
fct.all_values(), opti_fct.all_values()
)

def test_optimized_historical_forecasts_regression_with_encoders(self):
for use_covs in [False, True]:
series_train, series_val = self.ts_pass_train, self.ts_pass_val
model = LinearRegressionModel(
lags=3,
lags_past_covariates=2,
lags_future_covariates=[2, 3],
add_encoders={
"cyclic": {"future": ["month"]},
"datetime_attribute": {"past": ["dayofweek"]},
},
output_chunk_length=5,
)
if use_covs:
pc = tg.gaussian_timeseries(
start=series_train.start_time() - 2 * series_train.freq,
end=series_val.end_time(),
freq=series_train.freq,
)
fc = tg.gaussian_timeseries(
start=series_train.start_time() + 3 * series_train.freq,
end=series_val.end_time() + 4 * series_train.freq,
freq=series_train.freq,
)
else:
pc, fc = None, None

model.fit(self.ts_pass_train, past_covariates=pc, future_covariates=fc)

hist_fct = model.historical_forecasts(
series=self.ts_pass_val,
past_covariates=pc,
future_covariates=fc,
retrain=False,
last_points_only=True,
forecast_horizon=5,
enable_optimization=False,
)

opti_hist_fct = model._optimized_historical_forecasts(
series=[self.ts_pass_val],
past_covariates=[pc],
future_covariates=[fc],
last_points_only=True,
forecast_horizon=5,
)

self.assertTrue((hist_fct.time_index == opti_hist_fct.time_index).all())
np.testing.assert_array_almost_equal(
hist_fct.all_values(), opti_hist_fct.all_values()
)

@pytest.mark.slow
@unittest.skipUnless(
TORCH_AVAILABLE,
@@ -1221,11 +1372,13 @@ def retrain_f_delayed_true(

# test int
helper_hist_forecasts(10, 0.9)
expected_msg = "Model has not been fit before the first predict iteration at prediction point (in time)"
expected_msg = "Model has not been fit yet."
# `retrain=0` with not-trained model, encountering directly a predictable time index
with pytest.raises(ValueError) as error_msg:
helper_hist_forecasts(0, 0.9)
self.assertTrue(str(error_msg.value).startswith(expected_msg))
self.assertTrue(
str(error_msg.value).startswith(expected_msg), str(error_msg.value)
)

# test bool
helper_hist_forecasts(True, 0.9)
@@ -1251,7 +1404,9 @@ def test_predict_likelihood_parameters(self):
"""standard checks that historical forecasts work with direct likelihood parameter predictions
with regression and torch models."""

def create_model(ocl, use_ll=True, model_type="regression"):
def create_model(
ocl, use_ll=True, model_type="regression"
) -> Union[LinearRegressionModel, NLinearModel]:
if model_type == "regression":
return LinearRegressionModel(
lags=3,
1 change: 0 additions & 1 deletion darts/utils/__init__.py
@@ -4,7 +4,6 @@
"""
from .utils import (
_build_tqdm_iterator,
_historical_forecasts_general_checks,
_parallel_apply,
_with_sanity_checks,
retain_period_common_to_all,
9 changes: 9 additions & 0 deletions darts/utils/historical_forecasts/__init__.py
@@ -0,0 +1,9 @@
from .optimized_historical_forecasts import (
_optimized_historical_forecasts_regression_all_points,
_optimized_historical_forecasts_regression_last_points_only,
)
from .utils import (
_get_historical_forecast_boundaries,
_historical_forecasts_general_checks,
_historical_forecasts_start_warnings,
)