
ytdata io: use data_file.start and .end index range #4595

Merged: 1 commit merged into yt-project:main on Jul 24, 2023

Conversation

chrishavlin (Contributor, author) commented:

OK, this one should actually close #4565.

The problem was that the ytdata IOHandlerYTDataContainerHDF5 was not using the data_file start and end index ranges when processing each chunk, so every iteration loaded the full index range. Given that _count_particles already uses the proper index range, it seemed better to correct _read_particle_coords and _read_particle_data_file to use the index range as well, rather than to disable chunking for particles here (a sketch of the pattern follows this comment).

Looking through the other ytdata IOHandlers, it does seem that those should be updated as well. I'm going to open another issue for that, though, as I'd like to get this one through on the faster side if possible.
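(Editor's note: a minimal sketch of the read pattern being fixed, for illustration only. Only data_file.start, data_file.end, and the chunked-read problem come from the PR itself; the helper's shape and the HDF5 group/dataset names are assumptions, not the exact code in yt/frontends/ytdata/io.py.)

import h5py

def read_positions_for_data_file(data_file, ptype):
    # The fix: slice every read to this data_file's index range instead of
    # reading the full dataset on each chunk iteration.
    si, ei = data_file.start, data_file.end
    with h5py.File(data_file.filename, mode="r") as f:
        # hypothetical dataset layout; h5py reads only [si:ei] from disk here
        x = f[ptype]["particle_position_x"][si:ei]
        y = f[ptype]["particle_position_y"][si:ei]
        z = f[ptype]["particle_position_z"][si:ei]
    return x, y, z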

matthewturk (Member) left a comment:

Awesome catch!!!

@@ -308,12 +316,14 @@ def _read_particle_fields(self, chunks, ptf, selector):
                 yield (ptype, field), data


-def _get_position_array(ptype, f, ax):
+def _get_position_array(ptype, f, ax, index_mask=None):
+    if index_mask is None:
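(The hunk is truncated by the review view. The remainder presumably resolves the sentinel and applies the mask, roughly as in this sketch; the dataset lookup is illustrative, not the verbatim yt code.)

def _get_position_array(ptype, f, ax, index_mask=None):
    if index_mask is None:
        # no mask supplied: read the whole dataset
        index_mask = slice(None)
    # slicing the h5py dataset reads only the masked index range from disk
    return f[ptype]["particle_position_" + ax][index_mask]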
Member:

Are slice objects mutable? I couldn't find a reference to support it (or its contrary).

chrishavlin (author):

I don't actually know either... If you try to modify attributes after instantiation, you get errors:

a = slice(0, 10)
a.start = 1

raises AttributeError: readonly attribute

So immutable? Maybe?

I could instead pass in the integer start/end indices here.

Member:

They aren't, but this follows the idiom we've used in the past.

>>> a = slice(None)
>>> a.start = 1
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: readonly attribute

Member:

If slices are immutable (which seems to be the case), I would prefer the default value to be a slice instead of None. One way to know for sure is to call hash(slice(None)) (hashability implies immutability).
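(Editor's note on that test: on the Python versions current at the time it reports unhashable, which does not by itself imply mutable; slice objects only became hashable in CPython 3.12, while their attributes are read-only in every version.)

>>> hash(slice(None))  # CPython 3.11 and earlier
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'slice'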

chrishavlin (author):

> I would prefer the default value to be a slice instead of None

I actually started with def _get_position_array(ptype, f, ax, index_mask=slice(None)): but one of the linters objected to calling a function within an argument default.

chrishavlin (author):

ruff didn't like it:

ruff.....................................................................Failed
- hook id: ruff
- exit code: 1

yt/frontends/ytdata/io.py:319:50: B008 Do not perform function call `slice` in argument defaults
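(For context: B008 is the flake8-bugbear rule, implemented by ruff, against calls in argument defaults, because a default is evaluated once, at function definition time. That is harmless for an immutable slice, but the rule is a blanket one since the same pattern is a real bug with mutable results. A classic illustration:)

def append_to(item, bucket=[]):  # one list, created at definition time
    bucket.append(item)
    return bucket

append_to(1)  # [1]
append_to(2)  # [1, 2], not [2] -- the default list is shared across calls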

Member:

Leave it. I'd prefer we not change the linter and violate idioms.

Member:

Fine by me.

chrishavlin (author):

Confirmed that the fix does indeed resolve the reported issue (with the full data, not just my reproducer). @neutrinoceros, did you have any other comments? If not, good to merge, I think!

neutrinoceros (Member):

No further questions, thanks for your work!

neutrinoceros merged commit 6a91640 into yt-project:main on Jul 24, 2023 (10 checks passed)
lumberbot-app (bot) commented on Jul 24, 2023:

Owee, I'm MrMeeseeks, Look at me.

There seems to be a conflict; please backport manually. Here are approximate instructions:

  1. Check out the backport branch and update it:
     git checkout yt-4.2.x
     git pull
  2. Cherry-pick this PR's merge commit on top of the older branch, using its first parent as the mainline:
     git cherry-pick -x -m1 6a9164031dba19cf816e5e6f70979930f9a67158
  3. You will likely have some merge/cherry-pick conflicts here; fix them and commit:
     git commit -am 'Backport PR #4595: ytdata io: use data_file.start and .end index range'
  4. Push to a named branch:
     git push YOURFORK yt-4.2.x:auto-backport-of-pr-4595-on-yt-4.2.x
  5. Create a PR against branch yt-4.2.x. I would have named this PR:

"Backport PR #4595 on branch yt-4.2.x (ytdata io: use data_file.start and .end index range)"

and applied the correct labels and milestones.

Congratulations, you did some good work! Hopefully your backport PR will be tested by the continuous integration and merged soon!

Remember to remove the Still Needs Manual Backport label once the PR gets merged.

If these instructions are inaccurate, feel free to suggest an improvement.

neutrinoceros (Member):

@chrishavlin I won't be able to perform the manual backport until tomorrow, but I'd happily do it then if it's still needed.

chrishavlin (author):

OK, tomorrow is totally fine, thanks!

neutrinoceros (Member):

The issue with backporting is that this PR is built on top of #4579, which isn't (and perhaps shouldn't be?) backported. Since I only followed this work from afar, I don't feel it's my place to decide what's best, but I see 3 options:

chrishavlin (author):

I personally don't feel strongly about backporting this, so I'm fine with option 2.

neutrinoceros removed this from the 4.2.2 milestone on Jul 25, 2023
neutrinoceros added this to the 4.3.0 milestone on Jul 25, 2023
Successfully merging this pull request may close: Long load times on saved dataset (#4565)