Unexpected behavior when loading an Ensemble with sort and sorted flags #412

Open
wilsonbb opened this issue Mar 28, 2024 · 0 comments
Labels
bug Something isn't working

Comments

@wilsonbb
Collaborator

When loading an Ensemble, you can currently set whether the data is already sorted and whether to sort it via the respective sorted and sort boolean flags.

However, loading an unsorted dataset with sort=False and sorted=True currently produces an unsorted index.

[Image: Screenshot 2024-03-28 at 10 17 39 AM]

Note that although each individual partition is unsorted, the divisions are still set (all values within a partition fall within a range that doesn't overlap with other partitions). Even so, we should either make sure the data is sorted or change the description of the flag.
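To make the distinction concrete, here is a minimal sketch in plain Python, with lists standing in for partitions. The helper names (`is_sorted`, `ranges_do_not_overlap`, `index_is_globally_sorted`) and the sample values are hypothetical and are not part of the Ensemble or Dask API. The sketch only illustrates how partition ranges can be disjoint while the index inside each partition is still unsorted.

```python
from typing import List


def is_sorted(values: List[int]) -> bool:
    """Return True if the values are in non-decreasing order."""
    return all(a <= b for a, b in zip(values, values[1:]))


def ranges_do_not_overlap(partitions: List[List[int]]) -> bool:
    """Return True if each partition's [min, max] range ends before the next begins."""
    # (min, max) for every non-empty partition, in partition order
    bounds = [(min(p), max(p)) for p in partitions if p]
    return all(
        prev_max <= next_min
        for (_, prev_max), (next_min, _) in zip(bounds, bounds[1:])
    )


def index_is_globally_sorted(partitions: List[List[int]]) -> bool:
    """Return True if the concatenated index values are in non-decreasing order."""
    flat = [value for p in partitions for value in p]
    return is_sorted(flat)


# Hypothetical index values: every partition covers a disjoint range,
# but the values inside each partition are out of order.
partitions = [[3, 1, 2], [6, 4, 5], [9, 7, 8]]

divisions_ok = ranges_do_not_overlap(partitions)
globally_sorted = index_is_globally_sorted(partitions)

# Sorting within each partition is enough to restore a globally sorted index
# when the ranges are already disjoint.
repaired = [sorted(p) for p in partitions]
repaired_sorted = index_is_globally_sorted(repaired)
```

In this toy model, the situation described above corresponds to `divisions_ok` being true while `globally_sorted` is false, which is why a per-partition sort would suffice if we choose to enforce sortedness.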

When we load with sort=True and sorted=False, we do get a sorted index, but now there are warnings that the divisions are not set.

Screenshot 2024-03-28 at 10 19 59 AM

@dougbrn noted that this seems related to dask-expr issue dask/dask-expr#975, where we lose divisions with only one partition after a reset_index call. This is mildly annoying to fix because our npartitions parameter only triggers the repartition call after we have reset the index, by which point the divisions are already lost.

@wilsonbb added the bug label Mar 28, 2024