Some questions about the training process #23

nnstake · 2024-12-13T13:43:55Z

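Validation-set check. Normally a neural network is trained with a three-way split into training, validation, and test sets: the model is fit on the training set, the parameters with the best validation metric are saved, and the test set is used only for the final evaluation. In this code, however, the validation set plays no role at all.
The loss threshold in the code. This does not look like a normal stopping criterion. The threshold hyperparameter was most likely chosen from test-set performance, which would most likely amount to data leakage. Hyperparameter tuning often has this problem to some degree, but stopping directly at a fixed threshold seems to me a more direct form of it, and I think that is bad practice.
The choice of threshold. The code sets the threshold to 0.95. For data that fluctuates almost entirely between -0.1 and 0.1, stopping training at an MSE of 0.95 hardly looks like a properly converged model. An MSE of 0.95 is far too large for values this small, and it is hard to believe the model has learned correctly.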
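A quick sanity check of the scale argument in the third point, as a minimal sketch. It assumes the labels really are raw values bounded in [-0.1, 0.1], as described above; the labels here are randomly generated and hypothetical. If the labels are normalized before training (not verified here), the baseline would be different and the comparison would not apply.

```python
import random

def mse(preds, labels):
    """Mean squared error between two equal-length sequences."""
    return sum((p - y) ** 2 for p, y in zip(preds, labels)) / len(labels)

random.seed(0)
# Hypothetical labels: values bounded in [-0.1, 0.1] (assumption from the text).
labels = [random.uniform(-0.1, 0.1) for _ in range(10_000)]

# Trivial baseline that always predicts 0.
zero_preds = [0.0] * len(labels)
baseline = mse(zero_preds, labels)

# Each squared error is at most 0.1 ** 2, so the baseline cannot exceed about 0.01.
print(f"MSE of always predicting 0: {baseline:.5f}")
```

Under this assumption, even a constant prediction of 0 sits far below a threshold of 0.95.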

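Thank you very much for sharing your paper and code. I am reproducing your code carefully precisely because I find your work so interesting. The points above may be misunderstandings on my part due to limited time; if so, I would appreciate a correction. In my simple comparison, your model clearly outperforms GRU on csi300, which is exciting. I hope we can all adopt fairer training and evaluation protocols and push the field toward high-quality work.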

LITONG99 · 2024-12-14T08:23:24Z

Thank you for your attention to our work.

We published the threshold because we used to run rolling tests (train on 10 quarters, validate on 1 quarter, test on the next quarter, and repeat), and we referred to IC to help decide the threshold, which eventually settled at an empirical sweet spot that brought profit. In industry, some people believe such rolling tests are more practical and faithful because the validation covers a prolonged period. If you do not accept any empirical threshold, you can train MASTER and compare it with other models in whatever way you like, although that may not bring out the best of our model.
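The rolling scheme described above (10 quarters to train, 1 to validate, the next 1 to test, repeated) can be sketched roughly as follows. This is only an illustration: the function name `rolling_splits`, the one-quarter step between windows, and the quarter labels are assumptions, not taken from the MASTER code.

```python
def rolling_splits(quarters, n_train=10, n_valid=1, n_test=1):
    """Yield (train, valid, test) windows that slide forward one quarter at a time.

    The one-quarter step is an assumption; the thread does not state the step size.
    """
    window = n_train + n_valid + n_test
    for start in range(len(quarters) - window + 1):
        train = quarters[start:start + n_train]
        valid = quarters[start + n_train:start + n_train + n_valid]
        test = quarters[start + n_train + n_valid:start + window]
        yield train, valid, test

# Hypothetical calendar: 16 consecutive quarters.
quarters = [f"{year}Q{q}" for year in range(2019, 2023) for q in range(1, 5)]

splits = list(rolling_splits(quarters))
for train, valid, test in splits:
    print(f"train {train[0]}..{train[-1]} | valid {valid[0]} | test {test[0]}")
```

Each window keeps the test quarter strictly after the validation quarter, so a threshold tuned on validation IC in one window never sees that window's test data.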

It is worth mentioning that stock data is nothing like other ML domains such as CV, and empirical tuning is currently very important for stock data. For most ML stock price forecasting models, metrics like IC, AR, and MSE loss do not move in parallel, and performance changes dramatically across different test time spans. This is quite different from domains that have smooth learning curves and clear convergence stages. The inherent reason is that the task is very difficult. In many other domains, only IC>0.7 can be considered a strong model, but for stocks, maybe IC=0.1 will already make you very rich :>. People attempt to gain a margin of performance rather than reveal the mechanisms of finance. Given that MSE=0 corresponds to an oracle model that somehow simulates the entire stock market over decades, an MSE around 0.95 is not that irrational.
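A small made-up example of how MSE and IC can disagree. It assumes IC means the Pearson correlation between predictions and labels over one cross-section of stocks; all numbers are hypothetical and chosen only to show the effect.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def mse(preds, labels):
    """Mean squared error between two equal-length sequences."""
    return sum((p - y) ** 2 for p, y in zip(preds, labels)) / len(labels)

# Hypothetical labels for five stocks on one day.
labels = [-1.2, -0.4, 0.1, 0.5, 1.0]

# Model A ranks every stock correctly but is badly over-scaled.
model_a = [3 * y for y in labels]
# Model B stays close to zero but carries almost no ordering information.
model_b = [0.1, -0.1, 0.1, -0.1, 0.1]

ic_a, mse_a = pearson(model_a, labels), mse(model_a, labels)
ic_b, mse_b = pearson(model_b, labels), mse(model_b, labels)

print(f"model A: IC={ic_a:.3f}  MSE={mse_a:.3f}")
print(f"model B: IC={ic_b:.3f}  MSE={mse_b:.3f}")
```

Model B has the lower MSE while model A has the higher IC, so a rule that looks only at MSE and a rule that looks only at IC would pick different models here.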

How to train and evaluate ML models fairly on stock data is a sophisticated research question. Often, a single chronologically split validation set is not an ideal reflection of the model's test performance because of inherent differences in data distribution. I would like to raise an even more interesting question: a model may "expire" in the stock domain, because as more and more people use it for investment, they compete and eventually affect the market. So how do we fairly train and evaluate a stock price forecasting model, and are the latest profits the gold-standard metric? I think many questions remain to be discussed in this emerging domain of quantitative investment, but that is really beyond the scope of our paper.

nnstake · 2024-12-15T11:40:11Z

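Thank you for your answer. Rolling training is undoubtedly a reasonable choice for the industry today. Although the threshold can be derived empirically through rolling training, the same kind of model selection could equally have been made from per-epoch test-set performance, so I still do not consider it a sound selection method. Even though out-of-sample error in the stock domain makes validation loss less effective, I still think using it is the fairer approach.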
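The two selection rules being contrasted here can be sketched side by side. This is a hypothetical illustration, not the MASTER training loop: it assumes the threshold is applied to the training loss, uses a higher-is-better validation metric with a patience of 2 epochs, and the per-epoch numbers are made up.

```python
def stop_at_threshold(train_losses, threshold):
    """Return the first epoch whose training loss is <= threshold (last epoch if none)."""
    for epoch, loss in enumerate(train_losses):
        if loss <= threshold:
            return epoch
    return len(train_losses) - 1

def best_by_validation(valid_scores, patience=2):
    """Early stopping: return the epoch with the best validation score seen
    before `patience` epochs pass without improvement (higher is better)."""
    best_epoch, best_score = 0, valid_scores[0]
    for epoch, score in enumerate(valid_scores[1:], start=1):
        if score > best_score:
            best_epoch, best_score = epoch, score
        elif epoch - best_epoch >= patience:
            break
    return best_epoch

# Hypothetical per-epoch logs.
train_losses = [0.99, 0.97, 0.96, 0.95, 0.94, 0.93]
valid_ic = [0.02, 0.05, 0.04, 0.03, 0.03, 0.02]

threshold_epoch = stop_at_threshold(train_losses, threshold=0.95)
validation_epoch = best_by_validation(valid_ic, patience=2)

print("threshold rule keeps epoch", threshold_epoch)
print("validation rule keeps epoch", validation_epoch)
```

The validation rule needs nothing beyond the validation split, whereas the threshold rule depends on how the threshold itself was chosen.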

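In addition, this kind of problem usually comes not only from distribution shift: the training objective, MSE, and the main evaluation metric, IC, are not optimized in exactly the same direction, which may also cause problems for model selection.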

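Also, regarding the StockMixer paper under SJTU-DMTai: the RIC implemented in its code is actually IR, which looks like a significant error. If you know the authors, you could remind them to fix it.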
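For reference, the difference between the two quantities can be sketched under their commonly used definitions. These are assumptions: RIC is taken to be the mean daily Spearman rank correlation, and IR the mean of a daily IC series divided by its sample standard deviation. This does not reproduce StockMixer's code, and the data is made up.

```python
import math
import statistics

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def ranks(xs):
    """Rank positions (0 = smallest); assumes no tied values."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    result = [0] * len(xs)
    for rank, i in enumerate(order):
        result[i] = rank
    return result

def rank_ic(preds, labels):
    """Spearman rank correlation for one day's cross-section."""
    return pearson(ranks(preds), ranks(labels))

def information_ratio(daily_ics):
    """Mean of the daily IC series divided by its sample standard deviation."""
    return statistics.mean(daily_ics) / statistics.stdev(daily_ics)

# Hypothetical (predictions, labels) for three stocks on three days.
days = [
    ([0.3, 0.1, 0.2], [0.03, 0.01, 0.02]),
    ([0.1, 0.2, 0.3], [0.02, 0.01, 0.03]),
    ([0.1, 0.2, 0.3], [0.03, 0.02, 0.01]),
]

daily_rank_ics = [rank_ic(p, y) for p, y in days]
mean_rank_ic = statistics.mean(daily_rank_ics)   # what "RIC" usually reports
rank_icir = information_ratio(daily_rank_ics)    # a ratio, on a different scale

print("daily RankIC:", [round(v, 3) for v in daily_rank_ics])
print(f"mean RankIC = {mean_rank_ic:.4f}, IR of RankIC = {rank_icir:.4f}")
```

The mean correlation stays within [-1, 1], while the ratio depends on how volatile the daily series is, so the two numbers are not interchangeable.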

nnstake · 2024-12-15T12:34:36Z

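I have one more question. We all know that the distribution of financial data varies enormously, and in practice rolling training is used. But in papers the train:validation:test split is usually 6:2:2, and the validation set alone spans a long period. Doesn't that make it easy for the model to fail in the future market? In addition, I noticed that MASTER uses even more data, with 12 years of training data and 2 years of test data. Wouldn't that make model failure very likely, and why choose such a large dataset? I look forward to your answer. Thank you.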

LITONG99 · 2024-12-17T14:40:37Z

For loss vs. IC in validation, we use IC.
Why does MASTER use 12 years of data for training? We simply used all the data we had collected. As we argued in the paper, the domain suffers from data shortage, so we did not discard any collected data. In fact, most of the experiments were done during 2023/3-2023/7, so the two years of test data were almost the latest stock data available at the time.

I would be very excited if other people like MASTER and want to test it under other settings.

The experiment section of our paper only provides a set of measurements under controlled and limited conditions. MASTER focuses on designing new architectures that model 1) realistic correlation and 2) market influence, which previous models cannot. Our experiments (especially those other than the Overall Comparison) were aimed at examining the effectiveness of the designs for those two aspects. I think all the other questions about data selection, potential temporal distribution drift, etc., are very important and need to be studied carefully, but personally I did not dive into them, and I no longer work with stock data. Therefore, please understand that I am not confident in answering them.
