
Should forecast(*, newdata) produce forecasts outside the range of newdata? #202

Closed
hongooi73 opened this issue May 22, 2020 · 7 comments


@hongooi73
Contributor

Something I just came across:

library(dplyr)
library(tsibble)
library(fable)
library(fabletools)
library(tsibbledata)

aus_retail_2013_tr <- aus_retail %>%
    filter(Month <= yearmonth("2013 Dec"))
aus_retail_2013_vl <- aus_retail %>%
    filter(Month > yearmonth("2013 Dec"))

mods_2013 <- model(aus_retail_2013_tr,
    sdrift=SNAIVE(log(Turnover) ~ drift())
)

qll <- filter(mods_2013, State == "Queensland", Industry == "Liquor retailing")
qll_fcasts_2013 <- forecast(qll, new_data=aus_retail_2013_vl)
range(qll_fcasts_2013$Month)
#> [1] "2010 Mar" "2012 Feb"

Why does the forecast only cover 2010 to 2012, when the new_data provided runs from 2014 to 2018? Is this intended?

@hongooi73
Contributor Author

Session info:

> sessionInfo()
R version 3.6.3 (2020-02-29)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.4 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1

locale:
 [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8       
 [4] LC_COLLATE=C.UTF-8     LC_MONETARY=C.UTF-8    LC_MESSAGES=C.UTF-8   
 [7] LC_PAPER=C.UTF-8       LC_NAME=C              LC_ADDRESS=C          
[10] LC_TELEPHONE=C         LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C   

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] ggplot2_3.3.0     future_1.17.0     fable_0.2.0       feasts_0.1.3     
[5] fabletools_0.1.3  tsibble_0.8.6     tsibbledata_0.1.0 dplyr_0.8.5      

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.4.6       pillar_1.4.4       compiler_3.6.3     base64enc_0.1-3   
 [5] tools_3.6.3        digest_0.6.25      jsonlite_1.6.1     lubridate_1.7.8   
 [9] evaluate_0.14      lifecycle_0.2.0    tibble_3.0.1       gtable_0.3.0      
[13] anytime_0.3.7      pkgconfig_2.0.3    rlang_0.4.6        cli_2.0.2         
[17] parallel_3.6.3     yaml_2.2.1         xfun_0.14          withr_2.2.0       
[21] stringr_1.4.0      knitr_1.28         globals_0.12.5     generics_0.0.2    
[25] vctrs_0.3.0        grid_3.6.3         tidyselect_1.1.0   glue_1.4.1        
[29] listenv_0.8.0      R6_2.4.1           fansi_0.4.1        future.apply_1.5.0
[33] rmarkdown_2.1      farver_2.0.3       purrr_0.3.4        tidyr_1.0.3       
[37] magrittr_1.5       codetools_0.2-16   scales_1.1.1       ellipsis_0.3.1    
[41] htmltools_0.4.0    assertthat_0.2.1   colorspace_1.4-1   labeling_0.3      
[45] utf8_1.1.4         stringi_1.4.6      munsell_0.5.0      crayon_1.3.4   

@mitchelloharawild
Member

This is a bug, but the correct result should be an empty fable. Currently new_data is being passed on as NULL instead of as an empty tsibble, so forecast() falls back to its default horizon of h = 2*m (two seasonal periods, i.e. 24 months for monthly data).

This is because the State == "Queensland", Industry == "Liquor retailing" series ends in 2010 Feb. The new_data that you provided contains no observations of this series, so in effect no future values should be forecast.
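A quick way to confirm this is to look up the last observed month of the series and count how many rows of the validation set belong to it. This is a minimal sketch that reuses the split from the original report; the names qll_end and qll_vl_rows are illustrative, not part of any package.

```r
library(dplyr)
library(tsibble)
library(tsibbledata)

# Same validation split as in the original report
aus_retail_2013_vl <- aus_retail %>%
  filter(Month > yearmonth("2013 Dec"))

# Last observed month of the Queensland liquor retailing series
qll_end <- aus_retail %>%
  filter(State == "Queensland", Industry == "Liquor retailing") %>%
  pull(Month) %>%
  max()
qll_end

# Rows of new_data belonging to this series (0 means there is nothing to forecast)
qll_vl_rows <- aus_retail_2013_vl %>%
  filter(State == "Queensland", Industry == "Liquor retailing") %>%
  nrow()
qll_vl_rows
```

If the series ends before 2014 Jan, qll_vl_rows is 0, which is the situation where the expected result is an empty fable.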

As the series in aus_retail have different lengths and end times, the training dataset needs to be created with more care. One option is to use group_by_key() with slice() to select all but the last few observations of each series. Alternatively, you could drop the series that have been discontinued.
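Both options can be sketched with grouped dplyr verbs on the tsibble. Two assumptions to flag: the hold-out length h = 24 is purely illustrative, and instead of slice() this sketch uses filter() with a per-series rank of Month, so the split does not depend on the row order of the data (an equivalent choice, not the one named in the comment). The object names are hypothetical.

```r
library(dplyr)
library(tsibble)
library(tsibbledata)

# Illustrative hold-out length: the last 24 months of *each* series
h <- 24

# Training set: everything except each series' h most recent months,
# regardless of when that series ends (rank 1 = most recent month)
aus_retail_tr <- aus_retail %>%
  group_by_key() %>%
  filter(min_rank(desc(Month)) > h) %>%
  ungroup()

# Validation set: the h most recent months of each series
aus_retail_vl <- aus_retail %>%
  group_by_key() %>%
  filter(min_rank(desc(Month)) <= h) %>%
  ungroup()

# Alternative: keep only series that are still observed at the end of the data
last_month <- max(aus_retail$Month)
aus_retail_current <- aus_retail %>%
  group_by_key() %>%
  filter(max(Month) == last_month) %>%
  ungroup()
```

With the first split, every series has its own validation window, so forecast(..., new_data = aus_retail_vl) always has rows to forecast for each model.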

Additionally, if you want to forecast a specific period (say 2014 to 2018), you can pass it in via new_data as you have done, though you'll probably need to create this future tsibble yourself. However, as forecast.ETS does not yet support discontiguous forecast ranges, series that end early should probably produce an error.
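One way to build such a future tsibble by hand is to cross every existing series key with the target months. This is a sketch under the assumption that the target window is 2014 Jan to 2018 Dec (60 months, matching the original validation split); future_months, keys and future_2014_2018 are illustrative names.

```r
library(dplyr)
library(tsibble)
library(tsibbledata)

aus_retail_2013_tr <- aus_retail %>%
  filter(Month <= yearmonth("2013 Dec"))

# Target window: 2014 Jan to 2018 Dec (60 consecutive months)
future_months <- yearmonth("2014 Jan") + 0:59

# One row per existing series (State/Industry key combination)
keys <- aus_retail_2013_tr %>%
  as_tibble() %>%
  distinct(State, Industry)

# Cross every series with every target month using index vectors
idx_key   <- rep(seq_len(nrow(keys)), each = length(future_months))
idx_month <- rep(seq_along(future_months), times = nrow(keys))

future_2014_2018 <- keys[idx_key, ] %>%
  mutate(Month = future_months[idx_month]) %>%
  as_tsibble(key = c(State, Industry), index = Month)
```

This could then be passed as forecast(mods_2013, new_data = future_2014_2018). For series that end before 2014, the window is discontiguous with the training data, which is the case the comment says should error until the forecast method supports it.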

@hongooi73
Copy link
Contributor Author

Thanks for the info @mitchelloharawild. Instead of failing on a discontiguous forecast range, I'd suggest that forecast() extend the forecasts to cover the gap (and then clamp the actual output to only the desired range).
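Until a method does this itself, the same idea can be approximated on the user side: forecast with a horizon long enough to bridge the gap, then filter the resulting fable to the requested window. This is a workaround sketch, not fable's behaviour; the 9-year horizon is an assumption sized for a series ending in 2010 Feb and a window ending in 2018 Dec (106 months), and the object names are hypothetical.

```r
library(dplyr)
library(tsibble)
library(fable)
library(tsibbledata)

aus_retail_2013_tr <- aus_retail %>%
  filter(Month <= yearmonth("2013 Dec"))

qll_fit <- aus_retail_2013_tr %>%
  filter(State == "Queensland", Industry == "Liquor retailing") %>%
  model(sdrift = SNAIVE(log(Turnover) ~ drift()))

# Forecast far enough ahead to bridge the gap between the series'
# last observation and the end of the target window...
qll_fc_long <- forecast(qll_fit, h = "9 years")

# ...then clamp the output to the window that was actually requested
qll_fc_2014_2018 <- qll_fc_long %>%
  filter(Month >= yearmonth("2014 Jan"), Month <= yearmonth("2018 Dec"))
```

The cost of this approach is that the gap is forecast and then discarded, which is exactly the work that methods able to predict any future time point directly could skip.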

@mitchelloharawild
Member

Yes, ideally all forecast() methods would return forecasts for the desired range. However, I think this needs to be implemented by each forecast method individually. As it is not yet implemented by forecast.ETS, it should error.

This is because some forecast methods will not need to forecast the gap, and instead can directly make predictions about any future value.

@hongooi73
Contributor Author

Out of interest, do you know WHY this particular time series ends in 2010? You can still buy booze in Qld....

@mitchelloharawild
Member

As per the docs, the source of aus_retail is:

Australian Bureau of Statistics, catalogue number 8501.0, table 11.

At the time of download, some series were incomplete. A copy of the data with code to create aus_retail can be found here: https://github.com/tidyverts/tsibbledata/tree/master/data-raw/aus_retail
As for why it is incomplete, 🤷. However, it looks like the latest release of this catalogue has more complete data, so I'll update the dataset in the next release of tsibbledata.

@mitchelloharawild
Copy link
Member

Actually, looks like the updated catalogue doesn't introduce more data for this.
