Some questions about the training process #23
Thank you for your attention to our work. We publish the threshold because we used to conduct rolling tests (train on 10 quarters, validate on 1 quarter, then test on the next quarter, repeatedly), and we referred to IC to help decide the threshold, which finally came to an empirical sweet spot that brings profit. In the industry, some people believe such rolling tests are more practical and faithful because we validate over a prolonged period. If you don't buy into any empirical threshold, you can train and compare MASTER with other models in whatever way you like, although it may not bring out the best of our model.

It is worth mentioning that stock data is nothing like other ML domains such as CV, and at present empirical tuning is very important for stock data. For most ML stock price forecasting models, metrics like IC, AR, and MSE loss do not move in parallel, and performance over different test spans changes dramatically, which is quite different from domains with smooth learning curves and clear convergence stages. The inherent reason is that the task is very difficult. In many other domains, only IC > 0.7 would be considered a strong model, but for stocks, even IC = 0.1 may already make you very rich :>. People attempt to gain a margin of performance rather than to reveal the mechanism of finance. Given that MSE = 0 corresponds to an oracle model that somehow simulates the entire stock market over decades, an MSE around 0.95 is not that irrational.

How to train and evaluate ML models fairly on stock data is a sophisticated research question in its own right. As is often the case, a single chronologically split validation set is not an ideal reflection of the model's performance on the test set because of inherent differences in data distribution. I would also raise an even more interesting question: a model may "expire" in the stock domain, because as more and more people use it for investment, they will compete with each other and eventually affect the market itself.
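To make the rolling scheme above concrete (train on 10 quarters, validate on 1, test on the next, then slide the window forward), here is a minimal Python sketch; the function name, the one-quarter step size, and the quarter labels are my own illustrative assumptions, not the authors' actual code:

```python
def rolling_splits(quarters, n_train=10, n_valid=1):
    """Yield (train, valid, test) quarter lists for a rolling test.

    Each round trains on `n_train` consecutive quarters, validates on
    the next `n_valid`, tests on the single quarter after that, then
    slides the whole window forward by one quarter.
    """
    window = n_train + n_valid + 1  # +1 for the single test quarter
    for start in range(len(quarters) - window + 1):
        train = quarters[start : start + n_train]
        valid = quarters[start + n_train : start + n_train + n_valid]
        test = quarters[start + n_train + n_valid : start + window]
        yield train, valid, test

# e.g. 13 quarters of data give two rolling rounds:
qs = [f"{y}Q{q}" for y in (2020, 2021, 2022) for q in (1, 2, 3, 4)] + ["2023Q1"]
for train, valid, test in rolling_splits(qs):
    print(train[0], "...", train[-1], "| valid:", valid, "| test:", test)
```

Under this scheme a per-round threshold can be chosen on the validation quarter (e.g. via IC) without ever touching the test quarter of that round.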
So, how do we fairly train and evaluate a stock price forecasting model, and are the latest profits the golden metric? I think many such questions remain to be discussed in this emerging domain of quantitative investment, but that is really beyond the scope of our paper.
Thank you for your answer. Regarding rolling training, it is undoubtedly a reasonable choice in the industry today. However, although this threshold can be derived empirically through rolling training, the same selection could equally be made by picking the epoch with the best test-set performance, so I still do not think it is a sound model-selection method. Even though out-of-sample error exists in the stock domain and validation-set loss is less effective there, I still believe using the validation set is the fairer practice. Moreover, this problem is usually not only one of distribution shift: the optimization directions of the training metric (MSE) and the main evaluation metric (IC) are not fully aligned, which may also cause model-selection problems. In addition, for the StockMixer paper in the SJTU-DMTai repository, the RIC implemented in its code is actually IR; this looks like a clear error, and if you know the authors, you might remind them to fix it.
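Since the RIC-vs-IR confusion comes up here, a minimal NumPy sketch of how these metrics are commonly defined in the quant literature may make the distinction concrete. The function names are mine, and ties in ranks are ignored for simplicity; this is an illustration of the standard definitions, not StockMixer's code:

```python
import numpy as np

def ic(preds, labels):
    """Daily IC: cross-sectional Pearson correlation between
    predicted scores and realized returns on one day."""
    return np.corrcoef(preds, labels)[0, 1]

def rank_ic(preds, labels):
    """Daily RankIC (RIC): Spearman correlation, i.e. Pearson
    correlation computed on the ranks (assuming no ties)."""
    to_rank = lambda x: np.argsort(np.argsort(x))
    return np.corrcoef(to_rank(preds), to_rank(labels))[0, 1]

def icir(daily_ic_series):
    """IR (a.k.a. ICIR): mean of the *daily IC series* divided by its
    standard deviation -- a stability measure of IC over time, which
    is a different quantity from RankIC."""
    s = np.asarray(daily_ic_series, dtype=float)
    return s.mean() / s.std()
```

The key point is that RankIC is a per-day cross-sectional correlation, while IR summarizes the whole time series of daily ICs, so reporting one under the other's name is a substantive error.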
I would be very excited if other people like MASTER and want to test it under other settings. The experimental part of our paper only provides one set of measurements under controlled and limited conditions. MASTER focuses on designing new architectures that model (1) realistic stock correlation and (2) market influence, which previous models cannot. Our experiments (especially those beyond the Overall Comparison) were designed to examine the effectiveness of the designs for those two aspects. I think all the other questions about data selection, potential temporal distribution drift, etc., are very important and need to be carefully studied, but personally I did not dive into them, and I no longer work with stock data. Therefore, please understand that I am not confident in answering them.
Thank you very much for sharing your paper and code; it is precisely because I am very interested in your work that I took care in reproducing your code. The questions above may be misunderstandings on my part due to limited time; if so, I would appreciate corrections. In my simple comparison, your model clearly outperforms GRU on CSI300, which is an exciting result. I hope we can all adopt fairer training and evaluation protocols and push this field toward high-quality development.