The original data is only 10000, which is not enough to generate 5 H5 files #7

Open · gqsmmz opened this issue Jul 26, 2024 · 4 comments
gqsmmz commented Jul 26, 2024

When running python generate_pre_data.py, I found that the training data bc_train_check.json only had 10,000 entries and the validation dataset only 2,953, which is not enough to generate 5 h5 files from the split boundaries [5999, 11999, 17999, 23999, 25616] as in the code shown below.

(screenshot of the code computing the h5 split boundaries)

Is this because only a portion of the original data was uploaded?
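
For reference, here is a minimal sketch (not the repository's code) of how cumulative end indices like these would partition the loaded trajectories into 5 chunks, one per h5 file; the stand-in data length mirrors the ~10,000 entries reported above:

# Hypothetical split logic based on the boundary list quoted in this issue.
split_ends = [5999, 11999, 17999, 23999, 25616]
data = list(range(10000))               # stand-in for the ~10,000 loaded JSON entries

start = 0
for i, end in enumerate(split_ends):
    chunk = data[start:end + 1]         # entries that would fill the i-th h5 file
    print(f"chunk {i}: {len(chunk)} entries")
    start = end + 1

# Output: 6000, 4000, 0, 0, 0 -- the last three chunks come out empty, which is
# why 5 h5 files cannot be generated from a 10,000-entry split.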

@whcpumpkin (Owner)

Hi,
Did you download raw_trajectory_dataset.zip from the Materials Download in OneDrive?
Unzip it and run the following code:

import json
with open('bc_train_check.json', 'r') as f:
    data = json.load(f)
print("total number of bc_train_check.json: ", len(data))

The output is: total number of bc_train_check.json: 25617

I am not sure whether the zip file differs between OneDrive and Google Drive, even though the two are the same size (6.2 GB).
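
One way to check whether the two mirrors ship the same archive is to compare SHA-256 digests rather than rely on the reported size; a minimal sketch, with hypothetical local file names:

import hashlib

def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, 'rb') as f:
        for block in iter(lambda: f.read(chunk_size), b''):
            h.update(block)
    return h.hexdigest()

# Hypothetical names for the two downloads.
print(sha256_of('raw_trajectory_dataset_onedrive.zip'))
print(sha256_of('raw_trajectory_dataset_googledrive.zip'))
# Matching digests mean the two archives are byte-identical.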

@whcpumpkin (Owner)

I also recommend downloading the processed data directly from OneDrive.

gqsmmz (Author) commented Jul 26, 2024

Thank you for your reply; there is no problem with this data!

@AlooTikkiii

I think your problem is with the code file generate_pre_data.py at line 81. The issue is with min(len(data), args.end), where args.end is 10k. Hope that solves your issue.
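
A minimal sketch of the capping behaviour described above; the flag name follows the comment, and the default of 10000 is an assumption rather than something verified against the repository:

import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--end', type=int, default=10000)
args = parser.parse_args([])            # simulate running with the default value

data = list(range(25617))               # stand-in for the 25,617 JSON entries
end = min(len(data), args.end)          # the line-81-style cap
print(end)                              # 10000 -> only part of the dataset is used

# Passing a larger value, e.g. `python generate_pre_data.py --end 25617`,
# lifts the cap so all entries are available when generating the 5 h5 files.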
