Question about the use of ConcatDataset in data_process.py #3

Open

jt-dcw opened this issue Sep 24, 2024 · 1 comment

jt-dcw commented Sep 24, 2024

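First of all, thank you to the team for the excellent work. While reproducing the results, I ran into one thing I don't quite understand:
Suppose the pre-training data is ['CAD4-1', 'NYC_TAXI'], and these two datasets have 621 and 263 features respectively. How can ConcatDataset concatenate them when their feature counts differ?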

Collaborator

LZH-YS1998 commented Oct 15, 2024

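Thank you for your interest! ConcatDataset can be used here because it keeps an internal index and chains the datasets passed to it in order. This does not depend on the content or the size of the samples in each dataset.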

You can refer to the implementation in data_process.py, as well as the official PyTorch documentation for ConcatDataset.
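To illustrate the index bookkeeping described above, here is a minimal pure-Python sketch. `TinyConcat` is a hypothetical stand-in written for this reply, not PyTorch's class and not the code in data_process.py; it only mimics the idea of cumulative sizes plus a lookup. The toy lists `cad_like` and `taxi_like` are made-up data that borrow the feature counts 621 and 263 from the question.

```python
import bisect
from itertools import accumulate


class TinyConcat:
    """Hypothetical stand-in for ConcatDataset, for illustration only."""

    def __init__(self, datasets):
        self.datasets = list(datasets)
        # Running totals of dataset lengths, e.g. lengths [3, 2] -> [3, 5].
        self.cumulative_sizes = list(accumulate(len(d) for d in self.datasets))

    def __len__(self):
        return self.cumulative_sizes[-1] if self.cumulative_sizes else 0

    def __getitem__(self, idx):
        if idx < 0 or idx >= len(self):
            raise IndexError(idx)
        # Find which underlying dataset the global index falls into.
        dataset_idx = bisect.bisect_right(self.cumulative_sizes, idx)
        # Convert the global index into an index local to that dataset.
        start = 0 if dataset_idx == 0 else self.cumulative_sizes[dataset_idx - 1]
        return self.datasets[dataset_idx][idx - start]


# Toy data: 3 samples with 621 features and 2 samples with 263 features.
cad_like = [[0.0] * 621 for _ in range(3)]
taxi_like = [[1.0] * 263 for _ in range(2)]

combined = TinyConcat([cad_like, taxi_like])
# Each sample is returned unchanged, so its feature count is never inspected.
feature_counts = [len(combined[i]) for i in range(len(combined))]
print(len(combined), feature_counts)
```

The concatenation is along the sample axis only: global indices 0 to 2 map to the first dataset and 3 to 4 to the second, and the wrapper never looks at the shape of a sample. Note that this covers indexing only. If a single batch mixes samples of different shapes, stacking them into one tensor would typically fail, so batching has to keep each batch within one dataset or use a custom collate function; how the repository handles this is best checked in data_process.py.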
