[BUG] historical_forecasts() can't handle timeseries with rangeindex that doesn't start at 0 #974

TamerAbdelmigid · 2022-05-26T12:07:05Z

Describe the bug
There is a problem with RegressionModel and TimeSeries that have a RangeIndex that does not start at 0.
When I try to use any method (predict, historical_forecasts, backtest), I get errors, which go away when I use a pandas date range index instead.

To Reproduce

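import numpy as np
from darts import TimeSeries
from darts.dataprocessing.transformers import Scaler
from darts.metrics import r2_score
from darts.models import LinearRegressionModel

# df0 (target), df1 and df2 (covariates) are the reporter's DataFrames (not shown)
series1 = TimeSeries.from_dataframe(df1)
series2 = TimeSeries.from_dataframe(df2)
series0 = TimeSeries.from_dataframe(df0)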
future_cov = series1.concatenate(series2, axis=1)

scaler1 = Scaler()

idx = int(len(series0) * 0.9)
train, val = series0[:idx], series0[idx:]
train_scaled = scaler1.fit_transform(train).astype(np.float32)
val_scaled = scaler1.transform(val).astype(np.float32)
series_scaled = train_scaled.concatenate(val_scaled)

# use a separate scaler for the covariates so the target scaler is not overwritten
scaler2 = Scaler()
fcov_train, fcov_val = future_cov[:idx], future_cov[idx:]
fcov_train_scaled = scaler2.fit_transform(fcov_train).astype(np.float32)
fcov_val_scaled = scaler2.transform(fcov_val).astype(np.float32)
fcov_series_scaled = fcov_train_scaled.concatenate(fcov_val_scaled)

ensemble_model = LinearRegressionModel(
    lags=15, lags_future_covariates=(15, 15), output_chunk_length=1,
)

ensemble_model.fit(series=train_scaled, future_covariates=fcov_series_scaled)

ensemble_model.backtest(
    series=series_scaled,
    future_covariates=fcov_series_scaled,
    start=0.9,
    forecast_horizon=15,
    stride=15,
    retrain=False,
    verbose=True,
    metric=r2_score,
)

pred_series = ensemble_model.historical_forecasts(
    series=series_scaled,
    future_covariates=fcov_series_scaled,
    start=idx,
    forecast_horizon=15,
    stride=15,
    retrain=False,
    verbose=True,
    overlap_end=False,
    last_points_only=False,
)

ensemble_model.predict(n=15, series=train_scaled, future_covariates=fcov_series_scaled)

Expected behavior
All three calls (backtest, historical_forecasts, predict) should run without errors.

System (please complete the following information):

Python version: [e.g. 3.8]
darts version [e.g. 0.19.0]

Additional context
series0, series1 and series2 each have shape (360090, 1).

Error raised by backtest and historical_forecasts:

[2022-05-26 13:47:53,106] ERROR | darts.timeseries | ValueError: point (int) should be a valid index in series
2022-05-26 13:47:53 darts.timeseries ERROR: ValueError: point (int) should be a valid index in series
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_13184/3025966321.py in <module>
----> 1 pred_series = ensemble_model.historical_forecasts(
      2     series=series_scaled,
      3     future_covariates=fcov_series_scaled,
      4     start=idx,
      5     forecast_horizon=15,

~\anaconda3\envs\GPU\lib\site-packages\darts\utils\utils.py in sanitized_method(self, *args, **kwargs)
    170 
    171                 getattr(self, sanity_check_method)(*only_args.values(), **only_kwargs)
--> 172             return method_to_sanitize(self, *only_args.values(), **only_kwargs)
    173 
    174         return sanitized_method

~\anaconda3\envs\GPU\lib\site-packages\darts\models\forecasting\forecasting_model.py in historical_forecasts(self, series, past_covariates, future_covariates, num_samples, train_length, start, forecast_horizon, stride, retrain, overlap_end, last_points_only, verbose)
    430         for pred_time in iterator:
    431             # build the training series
--> 432             train = series.drop_after(pred_time)
    433             if train_length and len(train) > train_length:
    434                 train = train[-train_length:]

~\anaconda3\envs\GPU\lib\site-packages\darts\timeseries.py in drop_after(self, split_point)
   1541             A new TimeSeries, after `ts`.
   1542         """
-> 1543         return self.split_before(split_point)[0]
   1544 
   1545     def drop_before(self, split_point: Union[pd.Timestamp, float, int]):

~\anaconda3\envs\GPU\lib\site-packages\darts\timeseries.py in split_before(self, split_point)
   1524             and the second contains the remaining ones.
   1525         """
-> 1526         return self._split_at(split_point, after=False)
   1527 
   1528     def drop_after(self, split_point: Union[pd.Timestamp, float, int]):

~\anaconda3\envs\GPU\lib\site-packages\darts\timeseries.py in _split_at(self, split_point, after)
   1474     ) -> Tuple["TimeSeries", "TimeSeries"]:
   1475 
-> 1476         point_index = self.get_index_at_point(split_point, after)
   1477         return (
   1478             self[: point_index + (1 if after else 0)],

~\anaconda3\envs\GPU\lib\site-packages\darts\timeseries.py in get_index_at_point(self, point, after)
   1420             point_index = int((len(self) - 1) * point)
   1421         elif isinstance(point, (int, np.int64)):
-> 1422             raise_if(
   1423                 point not in range(len(self)),
   1424                 "point (int) should be a valid index in series",

~\anaconda3\envs\GPU\lib\site-packages\darts\logging.py in raise_if(condition, message, logger)
    108         if `condition` is satisfied
    109     """
--> 110     raise_if_not(not condition, message, logger)
    111 
    112 

~\anaconda3\envs\GPU\lib\site-packages\darts\logging.py in raise_if_not(condition, message, logger)
     82     if not condition:
     83         logger.error("ValueError: " + message)
---> 84         raise ValueError(message)
     85 
     86 

ValueError: point (int) should be a valid index in series
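The failing check in this traceback, `point not in range(len(self))`, treats an integer as a position (0 to len - 1), while `historical_forecasts()` passes `pred_time`, which (judging from the traceback) is a value taken from the series' own time index. The pandas-only sketch below (no darts needed; `is_valid_position` is a hypothetical helper that mirrors the check) illustrates why labels and positions disagree as soon as the RangeIndex does not start at 0:

```python
import pandas as pd

# 1000 time steps whose labels run from 10 to 1009, as in the minimal example
# posted later in this thread.
index = pd.RangeIndex(start=10, stop=1010, step=1)


def is_valid_position(point: int, length: int) -> bool:
    # Hypothetical helper mirroring the check in the traceback:
    # an int is accepted only if it is a valid *position* (0 .. length - 1).
    return point in range(length)


first_label, last_label = index[0], index[-1]
print(first_label, last_label)                     # 10 1009
print(is_valid_position(last_label, len(index)))   # False: 1009 is not a position
# Labels that pass the check are silently misread: position 10 holds label 20.
print(index[first_label])                          # 20

n_rejected = sum(not is_valid_position(lbl, len(index)) for lbl in index)
print(n_rejected)                                  # 10 (labels 1000..1009)
```

With a 0-based index, labels and positions coincide, which is consistent with the report that re-indexing from 0 makes the error disappear.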

Error raised by predict:

[2022-05-26 13:49:26,639] ERROR | darts.timeseries | ValueError: The time series array must not be empty.
2022-05-26 13:49:26 darts.timeseries ERROR: ValueError: The time series array must not be empty.
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_13184/393503889.py in <module>
----> 1 ensemble_model.predict(n=15, series=train_scaled, future_covariates=fcov_series_scaled)

~\anaconda3\envs\GPU\lib\site-packages\darts\models\forecasting\regression_model.py in predict(self, n, series, past_covariates, future_covariates, num_samples, **kwargs)
    555                         # include last_req_ts when slicing series with integer indices
    556                         covariate_matrices[cov_type].append(
--> 557                             cov[first_req_ts : last_req_ts + 1].values()
    558                         )
    559 

~\anaconda3\envs\GPU\lib\site-packages\darts\timeseries.py in __getitem__(self, key)
   3282                     xa_
   3283                 )  # indexing may discard the freq so we restore it...
-> 3284                 return self.__class__(xa_)
   3285             elif isinstance(key.start, pd.Timestamp) or isinstance(
   3286                 key.stop, pd.Timestamp

~\anaconda3\envs\GPU\lib\site-packages\darts\timeseries.py in __init__(self, xa)
     75             logger,
     76         )
---> 77         raise_if_not(xa.size > 0, "The time series array must not be empty.", logger)
     78         raise_if_not(
     79             len(xa.shape) == 3,

~\anaconda3\envs\GPU\lib\site-packages\darts\logging.py in raise_if_not(condition, message, logger)
     82     if not condition:
     83         logger.error("ValueError: " + message)
---> 84         raise ValueError(message)
     85 
     86 

ValueError: The time series array must not be empty.
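The predict traceback fails on `cov[first_req_ts : last_req_ts + 1]`, and the comment on that line suggests the bounds are time-index values. If such bounds are then applied positionally to a series whose index does not start at 0, they can point past the end and select nothing, which would match the "must not be empty" error. A pandas-only sketch of that effect; the covariate index starting at 100 and the window 130..144 are made-up values for illustration:

```python
import numpy as np
import pandas as pd

# 50 covariate values on a RangeIndex whose labels run from 100 to 149 (illustrative).
cov = pd.Series(np.arange(50, dtype=np.float32), index=pd.RangeIndex(100, 150))

# Hypothetical required window, expressed as index *labels*.
first_req_ts, last_req_ts = 130, 144

by_label = cov.loc[first_req_ts:last_req_ts]            # label-based, end inclusive
by_position = cov.iloc[first_req_ts : last_req_ts + 1]  # positional: 130..144 >= len(cov)

print(len(by_label))     # 15
print(len(by_position))  # 0, an empty selection
```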


hrzn · 2022-05-27T07:06:50Z

Could you try not splitting your future covariates into training and validation? Keep the full-length series and try again; Darts will slice it for you. I suspect this error might be due to a covariate series that's too short.
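The suggestion above boils down to checking that the future covariates span every time step the model reads. A rough pandas-only sanity check, under the simplifying assumption that forecasting `n` steps needs covariates from the start of the target up to `n` steps past its end (the exact window in darts also depends on `lags_future_covariates`, so treat this as a lower bound). `covers_forecast_horizon` is a hypothetical helper, not a darts function:

```python
import pandas as pd


def covers_forecast_horizon(target_index: pd.RangeIndex,
                            cov_index: pd.RangeIndex,
                            n: int) -> bool:
    """Hypothetical check: do the covariates span the target plus n future steps?"""
    needed_end = target_index[-1] + n * target_index.step  # last step the forecast touches
    return cov_index[0] <= target_index[0] and cov_index[-1] >= needed_end


target = pd.RangeIndex(0, 900)            # e.g. the training part of the target
full_cov = pd.RangeIndex(0, 1000)         # full-length covariates
val_cov_only = pd.RangeIndex(900, 1000)   # covariates kept only for validation

print(covers_forecast_horizon(target, full_cov, n=15))      # True
print(covers_forecast_horizon(target, val_cov_only, n=15))  # False: starts too late
```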

TamerAbdelmigid · 2022-08-30T17:45:38Z

The error persists in Darts version 0.21.

The problem is in historical_forecasts(): it raises the error
point (int) should be a valid index in series
This happens regardless of the model used, whenever the series to be backtested has an int64 range index that starts at any number other than 0.

I worked around it by re-indexing the series to start at 0, but this defeats the purpose of using the index in the first place. My current model is TFT, which encodes the index automatically and uses it; wouldn't re-indexing distort the results?

P.S. Last time I also fixed it by re-indexing from 0.
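The re-indexing workaround can at least be made reversible: remember the original start, shift the RangeIndex to 0 before building the TimeSeries, and shift the forecast index back afterwards. A pandas-only sketch, assuming a step of 1; `to_zero_based` and `restore_offset` are hypothetical helpers. Only a constant offset is removed, so the spacing between time steps is unchanged; whether an index-based encoder is sensitive to the absolute values is a separate question this sketch does not answer.

```python
from typing import Tuple

import numpy as np
import pandas as pd


def to_zero_based(df: pd.DataFrame) -> Tuple[pd.DataFrame, int]:
    """Hypothetical helper: move a step-1 RangeIndex to start at 0, returning the offset."""
    offset = int(df.index[0])
    shifted = df.copy()
    shifted.index = pd.RangeIndex(0, len(df))
    return shifted, offset


def restore_offset(df: pd.DataFrame, offset: int) -> pd.DataFrame:
    """Hypothetical helper: add the original offset back to a (forecast) index."""
    restored = df.copy()
    restored.index = df.index + offset
    return restored


start = 10
rng = np.random.default_rng(0)
df = pd.DataFrame(
    {"random series": rng.random(1000)},
    index=pd.RangeIndex(start, 1000 + start),
)

df_zero, offset = to_zero_based(df)
# ... build the TimeSeries from df_zero, fit and run historical_forecasts() ...
# Stand-in for a 6-step forecast produced on the 0-based index:
forecast_zero = df_zero.iloc[500:506]
forecast = restore_offset(forecast_zero, offset)
print(forecast.index[0], forecast.index[-1])  # 510 515
```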

hrzn · 2022-08-31T13:21:38Z

Hi, I suspect that your series starting at 1 does not contain the necessary time steps (its last time step is too early).
Could you share a minimal code snippet to reproduce the issue?

TamerAbdelmigid · 2022-08-31T18:58:05Z

Hey @hrzn, sadly that's not the case here.
Here is a minimal example:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from darts import TimeSeries
from darts.metrics import mape, smape
from darts.dataprocessing.transformers import Scaler
from darts.models import TCNModel

start = 10
data = np.random.random(1000)
index = pd.RangeIndex(start=start, stop=len(data)+start, step=1)
columns = ['random series']
df = pd.DataFrame(data=data, index=index, columns=columns, dtype=np.float32)


series = TimeSeries.from_dataframe(df)
model = TCNModel(input_chunk_length=12, output_chunk_length=6)
model.fit(series, verbose=True, epochs=10)

model.historical_forecasts(
    series=series,
    start=0.5,
    forecast_horizon=model.output_chunk_length,
    stride=model.output_chunk_length,
    retrain=False,
    verbose=True,
)

Running this code will give you an error:
2022-08-31 20:43:01 darts.timeseries ERROR: ValueError: point (int) should be a valid index in series

but if you change the variable start to 0, everything runs smoothly.
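One way to remove the label/position ambiguity is to translate each time-index label into a position before any positional split. The pandas-level sketch below uses `RangeIndex.get_loc` for that; it is an illustration only, not necessarily how the fix in darts was implemented, and `label_to_position` is a hypothetical helper (`get_loc` raises `KeyError` for labels not in the index):

```python
import pandas as pd

index = pd.RangeIndex(start=10, stop=1010)   # same shape as the minimal example above


def label_to_position(index: pd.RangeIndex, label: int) -> int:
    """Hypothetical helper: map a time-index label to the position slicing expects."""
    return index.get_loc(label)


series_len = len(index)
for label in (10, 509, 1009):
    pos = label_to_position(index, label)
    # After translation, every position passes a positional range check.
    assert pos in range(series_len)
    print(label, "->", pos)
# 10 -> 0
# 509 -> 499
# 1009 -> 999
```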

==============================================================================

Additionally, I noticed some odd behavior:

1 - When I set verbose=False in model.fit() and start=0, I get an error:
ValueError: conflicting sizes for dimension 'time': length 83 on the data but length 493 on coordinate 'time'
This does not happen when verbose=True in model.fit().

2 - When I set verbose=True in model.fit() and verbose=False in historical_forecasts(), progress bars are shown anyway.

Shall I open separate tickets for these issues, using the same code?

hrzn · 2022-09-01T09:18:03Z

Many thanks! That's an issue, I'll fix it.

TamerAbdelmigid · 2022-09-01T11:17:32Z

@hrzn Thanks a lot for your efforts.

TamerAbdelmigid added the bug (Something isn't working) and triage (Issue waiting for triaging) labels May 26, 2022

TamerAbdelmigid closed this as completed Jun 2, 2022

TamerAbdelmigid reopened this Aug 30, 2022

TamerAbdelmigid changed the title from "[BUG] RegressionModel can't handle timeseries with rangeindex that doesn't start at 0" to "[BUG] historical_forecasts() can't handle timeseries with rangeindex that doesn't start at 0" Aug 30, 2022

TamerAbdelmigid closed this as completed Sep 1, 2022

hrzn reopened this Sep 2, 2022

hrzn removed the triage (Issue waiting for triaging) label Sep 2, 2022

hrzn self-assigned this Sep 2, 2022

hrzn mentioned this issue Sep 2, 2022 in Fix/issue with timestamp at point #1180 (closed)

hrzn mentioned this issue Sep 6, 2022 in Feat/improve integer indexes #1191 (merged)

hrzn closed this as completed Sep 27, 2022
