Hi, dear authors, I appreciate your thorough work on exploring the design choices of time series foundation models. However, given the significant improvement over the baseline models, I am concerned that the TS prompt setting might compromise the validity of the experimental results.
Time series forecasting models usually normalize inputs with statistics (means and standard deviations) computed on the training set; using statistics from the validation or test set is considered an information leak. A recent NeurIPS 2024 work made a similar mistake, as discussed in ForestsKing/GLAFF#5 and other posts in that repo's issues.
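For context, a leakage-free setup looks roughly like the sketch below: the normalization statistics are fit on the training split only and then reused unchanged on the val/test splits. The function and variable names here are illustrative, not taken from this repository.

```python
# Minimal sketch of leakage-free normalization, assuming a 1-D series `data`
# split chronologically into train/val/test (names are illustrative).
import numpy as np

def split_and_normalize(data: np.ndarray, train_ratio=0.7, val_ratio=0.1):
    n = len(data)
    n_train = int(n * train_ratio)
    n_val = int(n * val_ratio)

    train = data[:n_train]
    val = data[n_train:n_train + n_val]
    test = data[n_train + n_val:]

    # Statistics come from the training split only ...
    mean, std = train.mean(), train.std()
    std = std if std > 0 else 1.0  # guard against constant series

    # ... and are reused verbatim on val/test, so no future information leaks in.
    normalize = lambda x: (x - mean) / std
    return normalize(train), normalize(val), normalize(test)
```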
In this work, the TS prompts (manually extracted features) are computed over the entire validation and test sets, which gives the model information about the whole val/test split at evaluation time (see promt_generation() in ltsm/prompt_bank/stat-prompt/prompt_generate_split.py).
It would be much appreciated if the authors could provide experimental results to address this concern, e.g. computing the TS prompts over a past window instead of the whole val/test set.
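Concretely, what I have in mind is something like the sketch below, where the prompt features for each evaluation sample are computed only from its lookback window; extract_stat_features and build_prompt_for_sample are hypothetical helpers for illustration, not the repo's promt_generation().

```python
# Hypothetical sketch of the suggested alternative: derive prompt features
# from the lookback window of each evaluation sample rather than from the
# whole val/test split.
import numpy as np

def extract_stat_features(window: np.ndarray) -> np.ndarray:
    # Simple hand-crafted statistics, computed on past values only.
    return np.array([
        window.mean(),
        window.std(),
        window.min(),
        window.max(),
        np.median(window),
    ])

def build_prompt_for_sample(series: np.ndarray, t: int, lookback: int) -> np.ndarray:
    # The prompt for a forecast issued at time t sees only series[t-lookback:t],
    # never the future values being predicted.
    past = series[max(0, t - lookback):t]
    return extract_stat_features(past)
```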