[BUG] historical_forecasts() can't handle timeseries with rangeindex that doesn't start at 0 #974

Closed
TamerAbdelmigid opened this issue May 26, 2022 · 6 comments
Assignees
Labels
bug Something isn't working

Comments

@TamerAbdelmigid

Describe the bug
There is a problem with RegressionModel and time series whose RangeIndex does not start at zero.
When I call any of the methods (predict, historical_forecasts, backtest), I get errors. The errors go away when I build the series with a pandas date range index instead.

To Reproduce

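import numpy as np
from darts import TimeSeries
from darts.dataprocessing.transformers import Scaler
from darts.metrics import r2_score
from darts.models import LinearRegressionModel

# df0, df1, df2 are DataFrames that are not shown in this report
series1 = TimeSeries.from_dataframe(df1)
series2 = TimeSeries.from_dataframe(df2)
series0 = TimeSeries.from_dataframe(df0)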
future_cov = series1.concatenate(series2, axis=1)

scaler1 = Scaler()

idx = int(len(series0) * 0.9)
train, val = series0[:idx], series0[idx:]
train_scaled = scaler1.fit_transform(train).astype(np.float32)
val_scaled = scaler1.transform(val).astype(np.float32)
series_scaled = train_scaled.concatenate(val_scaled)

fcov_train, fcov_val = future_cov[:idx], future_cov[idx:]
fcov_train_scaled = scaler1.fit_transform(fcov_train).astype(np.float32)
fcov_val_scaled = scaler1.transform(fcov_val).astype(np.float32)
fcov_series_scaled = fcov_train_scaled.concatenate(fcov_val_scaled)

ensemble_model = LinearRegressionModel(
    lags=15, lags_future_covariates=(15, 15), output_chunk_length=1,
)

ensemble_model.fit(series=train_scaled, future_covariates=fcov_series_scaled)

ensemble_model.backtest(
    series=series_scaled,
    future_covariates=fcov_series_scaled,
    start=0.9,
    forecast_horizon=15,
    stride=15,
    retrain=False,
    verbose=True,
    metric=r2_score,
)

pred_series = ensemble_model.historical_forecasts(
    series=series_scaled,
    future_covariates=fcov_series_scaled,
    start=idx,
    forecast_horizon=15,
    stride=15,
    retrain=False,
    verbose=True,
    overlap_end=False,
    last_points_only=False,
)

ensemble_model.predict(n=15, series=train_scaled, future_covariates=fcov_series_scaled)

Expected behavior
All three calls should run without raising errors.

System (please complete the following information):

  • Python version: [e.g. 3.8]
  • darts version [e.g. 0.19.0]

Additional context
series0, series1, and series2 each have shape (360090, 1).

Error in case of backtest and historical_forecasts:

[2022-05-26 13:47:53,106] ERROR | darts.timeseries | ValueError: point (int) should be a valid index in series
2022-05-26 13:47:53 darts.timeseries ERROR: ValueError: point (int) should be a valid index in series
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_13184/3025966321.py in <module>
----> 1 pred_series = ensemble_model.historical_forecasts(
      2     series=series_scaled,
      3     future_covariates=fcov_series_scaled,
      4     start=idx,
      5     forecast_horizon=15,

~\anaconda3\envs\GPU\lib\site-packages\darts\utils\utils.py in sanitized_method(self, *args, **kwargs)
    170 
    171                 getattr(self, sanity_check_method)(*only_args.values(), **only_kwargs)
--> 172             return method_to_sanitize(self, *only_args.values(), **only_kwargs)
    173 
    174         return sanitized_method

~\anaconda3\envs\GPU\lib\site-packages\darts\models\forecasting\forecasting_model.py in historical_forecasts(self, series, past_covariates, future_covariates, num_samples, train_length, start, forecast_horizon, stride, retrain, overlap_end, last_points_only, verbose)
    430         for pred_time in iterator:
    431             # build the training series
--> 432             train = series.drop_after(pred_time)
    433             if train_length and len(train) > train_length:
    434                 train = train[-train_length:]

~\anaconda3\envs\GPU\lib\site-packages\darts\timeseries.py in drop_after(self, split_point)
   1541             A new TimeSeries, after `ts`.
   1542         """
-> 1543         return self.split_before(split_point)[0]
   1544 
   1545     def drop_before(self, split_point: Union[pd.Timestamp, float, int]):

~\anaconda3\envs\GPU\lib\site-packages\darts\timeseries.py in split_before(self, split_point)
   1524             and the second contains the remaining ones.
   1525         """
-> 1526         return self._split_at(split_point, after=False)
   1527 
   1528     def drop_after(self, split_point: Union[pd.Timestamp, float, int]):

~\anaconda3\envs\GPU\lib\site-packages\darts\timeseries.py in _split_at(self, split_point, after)
   1474     ) -> Tuple["TimeSeries", "TimeSeries"]:
   1475 
-> 1476         point_index = self.get_index_at_point(split_point, after)
   1477         return (
   1478             self[: point_index + (1 if after else 0)],

~\anaconda3\envs\GPU\lib\site-packages\darts\timeseries.py in get_index_at_point(self, point, after)
   1420             point_index = int((len(self) - 1) * point)
   1421         elif isinstance(point, (int, np.int64)):
-> 1422             raise_if(
   1423                 point not in range(len(self)),
   1424                 "point (int) should be a valid index in series",

~\anaconda3\envs\GPU\lib\site-packages\darts\logging.py in raise_if(condition, message, logger)
    108         if `condition` is satisfied
    109     """
--> 110     raise_if_not(not condition, message, logger)
    111 
    112 

~\anaconda3\envs\GPU\lib\site-packages\darts\logging.py in raise_if_not(condition, message, logger)
     82     if not condition:
     83         logger.error("ValueError: " + message)
---> 84         raise ValueError(message)
     85 
     86 

ValueError: point (int) should be a valid index in series

Error in case of predict:

[2022-05-26 13:49:26,639] ERROR | darts.timeseries | ValueError: The time series array must not be empty.
2022-05-26 13:49:26 darts.timeseries ERROR: ValueError: The time series array must not be empty.
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_13184/393503889.py in <module>
----> 1 ensemble_model.predict(n=15, series=train_scaled, future_covariates=fcov_series_scaled)

~\anaconda3\envs\GPU\lib\site-packages\darts\models\forecasting\regression_model.py in predict(self, n, series, past_covariates, future_covariates, num_samples, **kwargs)
    555                         # include last_req_ts when slicing series with integer indices
    556                         covariate_matrices[cov_type].append(
--> 557                             cov[first_req_ts : last_req_ts + 1].values()
    558                         )
    559 

~\anaconda3\envs\GPU\lib\site-packages\darts\timeseries.py in __getitem__(self, key)
   3282                     xa_
   3283                 )  # indexing may discard the freq so we restore it...
-> 3284                 return self.__class__(xa_)
   3285             elif isinstance(key.start, pd.Timestamp) or isinstance(
   3286                 key.stop, pd.Timestamp

~\anaconda3\envs\GPU\lib\site-packages\darts\timeseries.py in __init__(self, xa)
     75             logger,
     76         )
---> 77         raise_if_not(xa.size > 0, "The time series array must not be empty.", logger)
     78         raise_if_not(
     79             len(xa.shape) == 3,

~\anaconda3\envs\GPU\lib\site-packages\darts\logging.py in raise_if_not(condition, message, logger)
     82     if not condition:
     83         logger.error("ValueError: " + message)
---> 84         raise ValueError(message)
     85 
     86 

ValueError: The time series array must not be empty.
@TamerAbdelmigid TamerAbdelmigid added bug Something isn't working triage Issue waiting for triaging labels May 26, 2022
@hrzn
Contributor

hrzn commented May 27, 2022

Could you try not splitting your future covariates in training and validation? Keep the full-length series and try again. Darts will slice it for you. I suspect this error might be due to a covariate series that's too short.

@TamerAbdelmigid TamerAbdelmigid changed the title [BUG] RegressionModel can't handle timeseries with rangeindex that doesn't start at 0 [BUG] historical_forecasts() can't handle timeseries with rangeindex that doesn't start at 0 Aug 30, 2022
@TamerAbdelmigid
Author

TamerAbdelmigid commented Aug 30, 2022

The error persists in Darts version 0.21.

The problem is in historical_forecasts(), which raises the error
point (int) should be a valid index in series
The problem occurs regardless of the model used. It happens when the series to be backtested has a RangeIndex of type int64 that starts at any number other than 0.

I worked around it by reindexing the series to start at 0, but this negates the reason for using the index in the first place. My current model is TFT, which automatically encodes the index and uses it. Wouldn't re-indexing distort the results?

P.S. last time I fixed it by re-indexing from 0.
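
For reference, the re-indexing workaround can be sketched with pandas alone. This is only a bookkeeping sketch under assumptions: the helper `restore_index` and the variable names are hypothetical, the data mirrors a random series with a RangeIndex starting at 10, and nothing here calls Darts. It keeps the original offset so that results indexed from 0 can be mapped back afterwards. It does not settle whether index-derived encodings in a model such as TFT are affected by the shift.

```python
import numpy as np
import pandas as pd

# Example data: a RangeIndex that starts at 10 instead of 0 (assumed values).
start = 10
data = np.random.random(1000).astype(np.float32)
df = pd.DataFrame(
    {"random series": data},
    index=pd.RangeIndex(start=start, stop=start + len(data), step=1),
)

# Remember the original offset, then shift the index so it starts at 0.
offset = df.index[0]
df_zero = df.copy()
df_zero.index = pd.RangeIndex(start=0, stop=len(df), step=1)


def restore_index(frame: pd.DataFrame, offset: int) -> pd.DataFrame:
    """Hypothetical helper: shift a zero-based integer index back by `offset`."""
    out = frame.copy()
    out.index = out.index + offset
    return out


# Anything computed on the zero-based frame can be mapped back to the
# original labels afterwards.
restored = restore_index(df_zero, offset)
```

The zero-based frame is what would be passed to `TimeSeries.from_dataframe`, and `restore_index` would be applied to the forecasts once they are converted back to a DataFrame.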

@hrzn
Contributor

hrzn commented Aug 31, 2022

Hi, I suspect that your series starting at 1 does not contain the necessary time steps (its last time step is too early).
Could you share a minimal code snippet to reproduce the issue?

@TamerAbdelmigid
Author

Hey @hrzn, sadly that's not the case here.
Here is a minimal example:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from darts import TimeSeries
from darts.metrics import mape, smape
from darts.dataprocessing.transformers import Scaler
from darts.models import TCNModel

start = 10
data = np.random.random(1000)
index = pd.RangeIndex(start=start, stop=len(data)+start, step=1)
columns = ['random series']
df = pd.DataFrame(data=data, index=index, columns=columns, dtype=np.float32)


series = TimeSeries.from_dataframe(df)
model = TCNModel(input_chunk_length=12, output_chunk_length=6)
model.fit(series, verbose=True, epochs=10)

model.historical_forecasts(
    series=series,
    start=0.5,
    forecast_horizon=model.output_chunk_length,
    stride=model.output_chunk_length,
    retrain=False,
    verbose=True,
)

Running this code raises an error:
2022-08-31 20:43:01 darts.timeseries ERROR: ValueError: point (int) should be a valid index in series

But if you change the value of the variable start to 0, everything runs smoothly.
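
To illustrate the likely mechanism, here is a minimal sketch that uses a plain Python range as a stand-in for the pd.RangeIndex. It assumes that the check quoted in the first traceback (`point not in range(len(self))`) ends up being applied to integer index labels rather than to positions. The helper `passes_position_check` is hypothetical and only mirrors that condition; it is not Darts code.

```python
# Stand-in for pd.RangeIndex(start=10, stop=1010, step=1) from the example.
start, n_points = 10, 1000
labels = range(start, start + n_points)


def passes_position_check(point: int, length: int) -> bool:
    # Hypothetical helper mirroring the condition in the traceback:
    # an int is accepted only if it lies in range(len(series)).
    return point in range(length)


# Labels and positions differ by the offset: label 509 sits at position 499.
position_of_509 = labels.index(509)

# Labels at or beyond len(series) fail the check even though they are
# perfectly valid labels of this index.
rejected = [
    label for label in labels if not passes_position_check(label, n_points)
]
print(len(rejected), rejected[0], rejected[-1])  # prints: 10 1000 1009

# With start = 0, labels and positions coincide and every label passes.
zero_based_ok = all(
    passes_position_check(label, n_points) for label in range(0, n_points)
)
```

Under that assumption, only a zero-based index has labels that all pass the check, which would be consistent with the error disappearing when start is 0.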

==============================================================================

Additionally, I noticed some odd behavior:

1 - When I set verbose=False in the model fit and start=0, I get an error:
ValueError: conflicting sizes for dimension 'time': length 83 on the data but length 493 on coordinate 'time'
This does not happen when verbose=True in the model fit.

2 - When I set verbose=True in the model fit and verbose=False in historical_forecasts, progress bars are shown anyway.

Should I open separate tickets for these errors, using the same code?

@hrzn
Contributor

hrzn commented Sep 1, 2022

Many thanks! That's an issue, I'll fix it.

@TamerAbdelmigid
Author

@hrzn Thanks a lot for your efforts.

@hrzn hrzn reopened this Sep 2, 2022
@hrzn hrzn removed the triage Issue waiting for triaging label Sep 2, 2022
@hrzn hrzn self-assigned this Sep 2, 2022
@hrzn hrzn closed this as completed Sep 27, 2022