
ytdata io: use data_file.start and .end index range #4595

Merged: 1 commit merged into yt-project:main on Jul 24, 2023

Conversation

chrishavlin (Contributor, author) commented:

OK, this one should actually close #4565.

The problem was that the ytdata IOHandlerYTDataContainerHDF5 was not using the data_file start and end index ranges when processing each chunk, so every iteration loaded the full index range. Given that _count_particles already uses the proper index range, it seemed better to correct _read_particle_coords and _read_particle_data_file to use the index range as well, rather than to disable chunking for particles here (a sketch of the pattern follows this comment).

Looking through the other ytdata IOHandlers, it does seem that those should be updated as well. I'm going to open another issue for that, though, as I'd like to get this one through on the faster side if possible.
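(Editor's note: a minimal sketch of the read pattern being fixed, for illustration only. Only data_file.start, data_file.end, and the chunked-read problem come from the PR itself; the helper's shape and the HDF5 group/dataset names are assumptions, not the exact code in yt/frontends/ytdata/io.py.)

import h5py

def read_positions_for_data_file(data_file, ptype):
    # The fix: slice every read to this data_file's index range instead of
    # reading the full dataset on each chunk iteration.
    si, ei = data_file.start, data_file.end
    with h5py.File(data_file.filename, mode="r") as f:
        # hypothetical dataset layout; h5py reads only [si:ei] from disk here
        x = f[ptype]["particle_position_x"][si:ei]
        y = f[ptype]["particle_position_y"][si:ei]
        z = f[ptype]["particle_position_z"][si:ei]
    return x, y, z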

matthewturk (Member) left a comment:

Awesome catch!!!

@@ -308,12 +316,14 @@ def _read_particle_fields(self, chunks, ptf, selector):
                 yield (ptype, field), data


-def _get_position_array(ptype, f, ax):
+def _get_position_array(ptype, f, ax, index_mask=None):
+    if index_mask is None:
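(The hunk is truncated by the review view. The remainder presumably resolves the sentinel and applies the mask, roughly as in this sketch; the dataset lookup is illustrative, not the verbatim yt code.)

def _get_position_array(ptype, f, ax, index_mask=None):
    if index_mask is None:
        # no mask supplied: read the whole dataset
        index_mask = slice(None)
    # slicing the h5py dataset reads only the masked index range from disk
    return f[ptype]["particle_position_" + ax][index_mask]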
Member:

Are slice objects mutable? I couldn't find a reference to support it (or its contrary).

chrishavlin (author):

I don't actually know either... If you try to modify attributes after instantiation, you get errors:

a = slice(0, 10)
a.start = 1

raises AttributeError: readonly attribute

So immutable? Maybe?

I could instead pass in the integer start/end indices here.

Member:

They aren't, but this follows the idiom we've used in the past.

>>> a = slice(None)
>>> a.start = 1
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: readonly attribute

Member:

If slices are immutable (which seems to be the case), I would prefer the default value to be a slice instead of None. One way to know for sure is to call hash(slice(None)) (hashability implies immutability).
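(Editor's note on that test: on the Python versions current at the time it reports unhashable, which does not by itself imply mutable; slice objects only became hashable in CPython 3.12, while their attributes are read-only in every version.)

>>> hash(slice(None))  # CPython 3.11 and earlier
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'slice'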

chrishavlin (author):

> I would prefer the default value to be a slice instead of None

I actually started with def _get_position_array(ptype, f, ax, index_mask=slice(None)): but one of the linters objected to calling a function within an argument default.

chrishavlin (author):

ruff didn't like it:

ruff.....................................................................Failed
- hook id: ruff
- exit code: 1

yt/frontends/ytdata/io.py:319:50: B008 Do not perform function call `slice` in argument defaults
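(For context: B008 is the flake8-bugbear rule, implemented by ruff, against calls in argument defaults, because a default is evaluated once, at function definition time. That is harmless for an immutable slice, but the rule is a blanket one since the same pattern is a real bug with mutable results. A classic illustration:)

def append_to(item, bucket=[]):  # one list, created at definition time
    bucket.append(item)
    return bucket

append_to(1)  # [1]
append_to(2)  # [1, 2], not [2] -- the default list is shared across calls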

Member:

Leave it. I'd prefer we not change the linter and violate idioms.

Member:

Fine by me.

chrishavlin (author):

Confirmed that the fix does indeed resolve the reported issue (with the full data, not just my reproducer). @neutrinoceros, did you have any other comments? If not, good to merge, I think!

neutrinoceros (Member):

No further questions, thanks for your work!

neutrinoceros merged commit 6a91640 into yt-project:main on Jul 24, 2023 (10 checks passed)
lumberbot-app (bot) commented on Jul 24, 2023:

Owee, I'm MrMeeseeks, Look at me.

There seems to be a conflict; please backport manually. Here are approximate instructions:

  1. Check out the backport branch and update it:
     git checkout yt-4.2.x
     git pull
  2. Cherry-pick this PR's merge commit on top of the older branch, using its first parent as the mainline:
     git cherry-pick -x -m1 6a9164031dba19cf816e5e6f70979930f9a67158
  3. You will likely have some merge/cherry-pick conflicts here; fix them and commit:
     git commit -am 'Backport PR #4595: ytdata io: use data_file.start and .end index range'
  4. Push to a named branch:
     git push YOURFORK yt-4.2.x:auto-backport-of-pr-4595-on-yt-4.2.x
  5. Create a PR against branch yt-4.2.x. I would have named this PR:

"Backport PR #4595 on branch yt-4.2.x (ytdata io: use data_file.start and .end index range)"

and applied the correct labels and milestones.

Congratulations, you did some good work! Hopefully your backport PR will be tested by the continuous integration and merged soon!

Remember to remove the Still Needs Manual Backport label once the PR gets merged.

If these instructions are inaccurate, feel free to suggest an improvement.

neutrinoceros (Member):

@chrishavlin I won't be able to perform the manual backport until tomorrow, but I'd happily do it then if it's still needed.

chrishavlin (author):

OK, tomorrow is totally fine, thanks!

neutrinoceros (Member):

The issue with backporting is that this PR is built on top of #4579, which isn't (and perhaps shouldn't be?) backported. Since I only followed this work from afar, I don't feel it's my place to decide what's best, but I see 3 options:

chrishavlin (author):

I personally don't feel strongly about backporting this, so I'm fine with option 2.

neutrinoceros removed this from the 4.2.2 milestone on Jul 25, 2023
neutrinoceros added this to the 4.3.0 milestone on Jul 25, 2023
Successfully merging this pull request may close: Long load times on saved dataset (#4565)