
Confirm element order in slices of datasets matches original order of unsliced datasets #895

Closed
ng390 opened this issue Nov 5, 2020 · 4 comments
ng390 commented Nov 5, 2020

Confirm that the element order in slices of datasets matches the original order of the unsliced datasets. An ordering issue was noted in the UCF101 test dataset; it may only apply to larger datasets.

@ng390 ng390 added the bug Something isn't working label Nov 5, 2020
@davidslater
@ng390 Can you provide the example you spoke of that doesn't match?

ng390 commented Nov 5, 2020

For the UCF101 scenario, if we run a scenario on the MARS model with "eval_split": "test[[332,333]]", we do not get a shape warning; however, if we run a scenario with the full test split, then the 333rd example does give a warning.

@davidslater davidslater added this to the Version 0.14 milestone Nov 6, 2020
@davidslater davidslater self-assigned this Nov 6, 2020
@davidslater

The slice operator in TFDS does not guarantee ordering between different split specifications; it only guarantees that the same split is repeatable. Generally, it's meant as an easy way to break a dataset into large train/validation/test sets.

The primary alternative we could use is a Dataset-level split operation, but this still requires processing every skipped element, so it's probably better to handle it in our data generator logic.
tensorflow/tensorflow#44008 (comment)
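A minimal sketch of the "handle it in our data generator logic" idea: iterate the full split (which TFDS does guarantee to be repeatable) and select the desired index range ourselves, e.g. with `itertools.islice`. The names here are illustrative stand-ins, not Armory's or TFDS's actual API; earlier elements are still produced, just discarded.

```python
from itertools import islice

def sliced_examples(dataset_iter, start, stop):
    """Yield examples with global index in [start, stop) from an iterator
    over the *full* split.

    Because we iterate the full split in its repeatable order, the slice
    is guaranteed to match the unsliced ordering -- unlike a TFDS split
    string, which makes no cross-split ordering guarantee.
    """
    return islice(dataset_iter, start, stop)

# Stand-in for a deterministic full test split of 1000 examples:
full_split = iter(range(1000))
print(list(sliced_examples(full_split, 332, 334)))  # -> [332, 333]
```

The trade-off is the same one noted above for a Dataset-level split: all skipped elements are still computed before being discarded.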

@davidslater davidslater modified the milestones: Version 0.13, March-25 Feb 26, 2021
@davidslater

The skip function does not prevent computation of dataset items; it just discards them. https://www.tensorflow.org/api_docs/python/tf/data/Dataset#skip

Therefore, I think it is much easier to handle this in our own code. The main challenge will be when the data spans multiple batches.
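The multi-batch case can be sketched in plain Python: track a global example index across a stream of batches, keep only the indices in the requested range, and re-batch the survivors. This is an illustrative sketch, not Armory's implementation; the function and parameter names are hypothetical.

```python
def slice_batched(batches, start, stop, batch_size):
    """Yield examples with global index in [start, stop) from a stream of
    batches, re-batched into groups of at most batch_size.

    Handles the case where the requested slice spans multiple input
    batches by maintaining a single global index across all of them.
    """
    out = []
    idx = 0
    for batch in batches:
        for example in batch:
            if start <= idx < stop:
                out.append(example)
                if len(out) == batch_size:
                    yield out
                    out = []
            idx += 1
    if out:  # flush a final partial batch
        yield out

batches = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
print(list(slice_batched(batches, 2, 7, 3)))  # -> [[2, 3, 4], [5, 6]]
```

Note that the slice [2, 7) crosses all three input batches, and the final output batch is partial, which downstream code would need to tolerate.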
