
Confirm element order in slices of datasets matches original order of unsliced datasets #895

Closed
ng390 opened this issue Nov 5, 2020 · 4 comments
ng390 commented Nov 5, 2020

Confirm that the element order in slices of datasets matches the original order of the unsliced datasets. An ordering issue was noted in the UCF101 test dataset; it may only apply to larger datasets.

@ng390 ng390 added the bug Something isn't working label Nov 5, 2020
@davidslater
@ng390 Can you provide the example you spoke of that doesn't match?

ng390 commented Nov 5, 2020

For the UCF101 scenario, if we run a scenario on the MARS model with "eval_split": "test[[332,333]]", we do not get a shape warning; however, if we run a scenario with the full test split, then the 333rd example does give a warning.

@davidslater davidslater added this to the Version 0.14 milestone Nov 6, 2020
@davidslater davidslater self-assigned this Nov 6, 2020
@davidslater

The slice operator in TFDS does not guarantee ordering between different split specifications; it only guarantees that the same split is repeatable. Generally, it's meant as an easy way to break a dataset into large train/validation/test sets.

The primary alternative we could use is a Dataset-level split operation, but this still requires processing every skipped element, so it's probably better to handle it in our data generator logic.
tensorflow/tensorflow#44008 (comment)
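A minimal sketch of the "handle it in our data generator logic" idea: iterate the full split (which TFDS does guarantee to be repeatable) and select the desired index range ourselves, e.g. with `itertools.islice`. The names here are illustrative stand-ins, not Armory's or TFDS's actual API; earlier elements are still produced, just discarded.

```python
from itertools import islice

def sliced_examples(dataset_iter, start, stop):
    """Yield examples with global index in [start, stop) from an iterator
    over the *full* split.

    Because we iterate the full split in its repeatable order, the slice
    is guaranteed to match the unsliced ordering -- unlike a TFDS split
    string, which makes no cross-split ordering guarantee.
    """
    return islice(dataset_iter, start, stop)

# Stand-in for a deterministic full test split of 1000 examples:
full_split = iter(range(1000))
print(list(sliced_examples(full_split, 332, 334)))  # -> [332, 333]
```

The trade-off is the same one noted above for a Dataset-level split: all skipped elements are still computed before being discarded.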

@davidslater davidslater modified the milestones: Version 0.13, March-25 Feb 26, 2021
@davidslater

The skip function does not prevent computation of dataset items; it just discards them. https://www.tensorflow.org/api_docs/python/tf/data/Dataset#skip

Therefore, I think it is much easier to handle this in our own code. The main challenge will be when the data spans multiple batches.
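The multi-batch case can be sketched in plain Python: track a global example index across a stream of batches, keep only the indices in the requested range, and re-batch the survivors. This is an illustrative sketch, not Armory's implementation; the function and parameter names are hypothetical.

```python
def slice_batched(batches, start, stop, batch_size):
    """Yield examples with global index in [start, stop) from a stream of
    batches, re-batched into groups of at most batch_size.

    Handles the case where the requested slice spans multiple input
    batches by maintaining a single global index across all of them.
    """
    out = []
    idx = 0
    for batch in batches:
        for example in batch:
            if start <= idx < stop:
                out.append(example)
                if len(out) == batch_size:
                    yield out
                    out = []
            idx += 1
    if out:  # flush a final partial batch
        yield out

batches = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
print(list(slice_batched(batches, 2, 7, 3)))  # -> [[2, 3, 4], [5, 6]]
```

Note that the slice [2, 7) crosses all three input batches, and the final output batch is partial, which downstream code would need to tolerate.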
