Should forecast(*, newdata) produce forecasts outside the range of newdata? #202

hongooi73 · 2020-05-22T01:10:55Z

Something I just came across:

library(dplyr)      # filter() and %>%
library(tsibble)    # yearmonth()
library(fable)
library(fabletools)
library(tsibbledata)

aus_retail_2013_tr <- aus_retail %>%
    filter(Month <= yearmonth("2013 Dec"))
aus_retail_2013_vl <- aus_retail %>%
    filter(Month > yearmonth("2013 Dec"))

mods_2013 <- model(aus_retail_2013_tr,
    sdrift=SNAIVE(log(Turnover) ~ drift())
)

qll <- filter(mods_2013, State == "Queensland", Industry == "Liquor retailing")
qll_fcasts_2013 <- forecast(qll, new_data=aus_retail_2013_vl)
range(qll_fcasts_2013$Month)

[1] "2010 Mar" "2012 Feb"

Why is the forecast only for 2010 to 2012, when the dataset provided is from 2013 to 2018? Is this intended?


hongooi73 · 2020-05-22T01:21:13Z

Session info:

> sessionInfo()
R version 3.6.3 (2020-02-29)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.4 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1

locale:
 [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8       
 [4] LC_COLLATE=C.UTF-8     LC_MONETARY=C.UTF-8    LC_MESSAGES=C.UTF-8   
 [7] LC_PAPER=C.UTF-8       LC_NAME=C              LC_ADDRESS=C          
[10] LC_TELEPHONE=C         LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C   

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] ggplot2_3.3.0     future_1.17.0     fable_0.2.0       feasts_0.1.3     
[5] fabletools_0.1.3  tsibble_0.8.6     tsibbledata_0.1.0 dplyr_0.8.5      

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.4.6       pillar_1.4.4       compiler_3.6.3     base64enc_0.1-3   
 [5] tools_3.6.3        digest_0.6.25      jsonlite_1.6.1     lubridate_1.7.8   
 [9] evaluate_0.14      lifecycle_0.2.0    tibble_3.0.1       gtable_0.3.0      
[13] anytime_0.3.7      pkgconfig_2.0.3    rlang_0.4.6        cli_2.0.2         
[17] parallel_3.6.3     yaml_2.2.1         xfun_0.14          withr_2.2.0       
[21] stringr_1.4.0      knitr_1.28         globals_0.12.5     generics_0.0.2    
[25] vctrs_0.3.0        grid_3.6.3         tidyselect_1.1.0   glue_1.4.1        
[29] listenv_0.8.0      R6_2.4.1           fansi_0.4.1        future.apply_1.5.0
[33] rmarkdown_2.1      farver_2.0.3       purrr_0.3.4        tidyr_1.0.3       
[37] magrittr_1.5       codetools_0.2-16   scales_1.1.1       ellipsis_0.3.1    
[41] htmltools_0.4.0    assertthat_0.2.1   colorspace_1.4-1   labeling_0.3      
[45] utf8_1.1.4         stringi_1.4.6      munsell_0.5.0      crayon_1.3.4

mitchelloharawild · 2020-05-22T01:28:47Z

This is a bug. The correct result is an empty fable, but new_data is being passed on as NULL instead of as an empty tsibble, so forecast() falls back to the default horizon of h=2*m.

This happens because the series State == "Queensland", Industry == "Liquor retailing" ends in 2010 Feb. The new_data that you have provided includes no values of this series, so in effect no future values should be forecast.

As the series in aus_retail have different lengths and end times, the training dataset needs to be created with more care. One option is to use slice() with group_by_key() to select all but the last few observations of each series. Alternatively, you could drop the series that have been discontinued.
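A minimal sketch of the slice() and group_by_key() approach, assuming a 24-month hold-out (the holdout value and the object names are illustrative, not from the thread). Each series gives up its own last 24 observations, so the split no longer depends on a single calendar date:

```r
library(dplyr)
library(tsibble)
library(tsibbledata)

holdout <- 24  # assumed hold-out length in months; adjust to suit

# Training set: everything except the last `holdout` rows of each series.
# seq_len() errors on a negative length, which flags any series that is
# too short to split this way.
aus_retail_tr <- aus_retail %>%
  group_by_key() %>%
  slice(seq_len(n() - holdout)) %>%
  ungroup()

# Validation set: the last `holdout` rows of each series.
aus_retail_vl <- aus_retail %>%
  group_by_key() %>%
  slice(n() - (holdout - 1):0) %>%
  ungroup()
```

With this split, the validation rows for every key follow directly on from that key's training rows, whatever month the series happens to end in.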

Additionally, if you want to forecast a specific period (say 2013 to 2018), you can pass it in via new_data as you have done, although you'll probably need to create this future tsibble yourself. However, as forecast.ETS does not yet support discontiguous forecast ranges, series that end early should probably raise an error.
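For reference, a future tsibble for a fixed window can be built by hand. This is only a sketch, assuming the 2013 to 2018 window mentioned above and the key values from the original report; the name future_qll is illustrative:

```r
library(tsibble)

# 72 monthly rows, 2013 Jan through 2018 Dec, for a single key.
# Adding integers to a yearmonth steps forward by whole months.
future_qll <- tsibble(
  State = "Queensland",
  Industry = "Liquor retailing",
  Month = yearmonth("2013 Jan") + 0:71,
  key = c(State, Industry),
  index = Month
)

range(future_qll$Month)
```

This would then be passed as forecast(qll, new_data = future_qll). When the forecast period should simply continue from the end of each series, tsibble's new_data(x, n = 24) generates those rows for every key instead.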

hongooi73 · 2020-05-22T01:32:15Z

Thanks for the info @mitchelloharawild. Instead of failing on a discontiguous forecast range, I'd suggest that forecast should extend the forecast values to cover the gap (and then clamp the actual output to only the desired range).
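Until something like this is built in, the same effect can be approximated on the user side. A rough sketch, assuming a target window of 2014 Jan to 2018 Dec (the window and the names qll_tr, fit, target_start, target_end and fc are illustrative): forecast from the end of the series with a horizon long enough to reach the end of the window, then keep only the months inside it.

```r
library(dplyr)
library(fable)
library(tsibble)
library(tsibbledata)

qll_tr <- aus_retail %>%
  filter(State == "Queensland", Industry == "Liquor retailing",
         Month <= yearmonth("2013 Dec"))
fit <- model(qll_tr, sdrift = SNAIVE(log(Turnover) ~ drift()))

# Assumed target window for the forecasts.
target_start <- yearmonth("2014 Jan")
target_end <- yearmonth("2018 Dec")

# Number of monthly steps from the last observation to the end of the window.
last_obs <- max(qll_tr$Month)
h <- length(seq(as.Date(last_obs), as.Date(target_end), by = "month")) - 1

# Forecast across any gap, then clamp the output to the window.
fc <- forecast(fit, h = h) %>%
  filter(Month >= target_start)
```

The forecasts across the gap are still computed, so this costs more than a method that can predict a future value directly.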

mitchelloharawild · 2020-05-22T01:42:15Z

Yes, ideally all forecast() methods would return forecasts for the desired range. However, I think this needs to be implemented by each forecast method. As it is not yet implemented by forecast.ETS, it should error.

This is because some forecast methods will not need to forecast the gap, and instead can directly make predictions about any future value.

hongooi73 · 2020-05-22T01:52:04Z

Out of interest, do you know WHY this particular time series ends in 2010? You can still buy booze in Qld....

mitchelloharawild · 2020-05-22T02:06:38Z

As per the docs, the source of aus_retail is:

Australian Bureau of Statistics, catalogue number 8501.0, table 11.

At the time of download, some series were incomplete. A copy of the data with code to create aus_retail can be found here: https://github.com/tidyverts/tsibbledata/tree/master/data-raw/aus_retail
As for why it is incomplete, 🤷. However it looks like the latest release of this catalogue has more complete data, so I'll update the dataset in the next release of tsibbledata.

mitchelloharawild · 2020-05-22T02:10:10Z

Actually, looks like the updated catalogue doesn't introduce more data for this.

mitchelloharawild closed this as completed in 01e9c9e May 22, 2020
