Particle indices have to be re-generated every time for some datasets #3487

chummels · 2021-08-28T21:18:24Z

Bug report

Bug summary

As discussed in the yt Slack, some datasets must re-generate their particle indices every time they are loaded, instead of reusing the existing .ewah files to skip this step. This defeats the purpose of caching the indices in ewah files, and loading can take hours depending on the dataset.

The issue arises for datasets where yt changes the refined index order to be more efficient. In that case it writes an ewah file named after the new coarse and refined index orders (e.g., halo_59.hdf5.index6_4.ewah). But the next time yt loads this dataset, it searches for an ewah file named after the default coarse and refined index orders (e.g., halo_59.hdf5.index7_5.ewah), so it does not find the existing file and concludes that it must generate the index again.
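
The mismatch described above can be sketched in a few lines. This is only an illustration of the naming pattern seen in this report: `ewah_filename` is a hypothetical helper, not a yt function, and the order values are the ones from the example filenames.

```python
def ewah_filename(dataset_filename, order1, order2):
    # Hypothetical helper mirroring the naming pattern seen in this report;
    # it is not yt's actual implementation.
    return f"{dataset_filename}.index{order1}_{order2}.ewah"


# Name written after yt adjusted the orders (values from the example above).
written = ewah_filename("halo_59.hdf5", 6, 4)

# Name searched for on the next load, built from the default orders.
looked_up = ewah_filename("halo_59.hdf5", 7, 5)

# The two names differ, so the existing file is never found.
print(written, looked_up, written == looked_up)
```

Because the lookup name is built from the default orders rather than the orders that were actually used, the cached file is skipped on every load.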

Code for reproduction

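import yt

# First load: generates the coarse and refined particle indices
ds = yt.load_sample('TNGHalo')
ds.index

# Second load: should reuse the ewah file, but regenerates the indices
ds = yt.load_sample('TNGHalo')
ds.index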

Actual outcome

This code generates the coarse index and refined index both times it loads the dataset. If you re-run this script with a different dataset, like FIRE_M12i_ref11, it only generates the coarse index and refined index once; the second time, it just loads the existing particle index data ("Loading particle index").

chummels · 2021-08-28T21:18:43Z

This issue was discussed in #3198, and a solution was proposed:

I've now limited the heuristic, but I'm somewhat slightly concerned that what it now does is check for the old index_order in the filename, rather than the new, so it's entirely possible that if it does any modifications to the index_order, it will always always generate new ewah files.

One possible way around this would be to have the filename just have index_order1 in it, and if it's auto-generated, have it call it "auto" or something.

chummels · 2021-08-28T21:21:40Z

Alternatively, why do we need to list the indices in the ewah filename at all? Once an ewah file has been generated, does it matter at all what the coarse and refined indices are? Perhaps I'm being naive, but it seems like once it's generated, it'll just work for loading in the data. But perhaps one will see changes in efficiency depending on the future functions applied to that dataset, so maybe it does matter?

One solution would be to just use regular expressions to see if there is any ewah file (with the same filename stem) in the same directory as the dataset, and if so, try to use that. This would resolve the issue, I think, and be backwards compatible as well.
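
A minimal sketch of that regular-expression lookup, assuming the `<dataset filename>.index<order1>_<order2>.ewah` naming seen in the examples above. `find_existing_ewah` is a hypothetical name, not part of yt, and this is not necessarily how a fix would be implemented.

```python
import re
from pathlib import Path


def find_existing_ewah(dataset_path):
    """Return (order1, order2, path) for each ewah file found next to the dataset.

    Hypothetical helper: it assumes the sidecar files are named
    "<dataset filename>.index<order1>_<order2>.ewah".
    """
    dataset_path = Path(dataset_path)
    pattern = re.compile(
        re.escape(dataset_path.name) + r"\.index(\d+)_(\d+)\.ewah"
    )
    found = []
    for candidate in dataset_path.parent.iterdir():
        match = pattern.fullmatch(candidate.name)
        if match:
            found.append((int(match.group(1)), int(match.group(2)), candidate))
    # Sorted by (order1, order2) so the caller can pick a preferred file.
    return sorted(found)
```

A caller could try the returned files before falling back to regenerating the index. Files written under the existing naming scheme would still be matched, which is what would keep the approach backwards compatible.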

matthewturk · 2021-08-28T21:29:52Z

That might be a fine solution. I'd be open to an implementation that does that. I think we need to double check that the orders are stored in the file itself, but I believe they are.

neutrinoceros · 2022-11-18T21:55:15Z

closed via #4198

jzuhone mentioned this issue Nov 7, 2022: Automatically find EWAH files with increased index_order2 #4198 (Merged)

neutrinoceros closed this as completed Nov 18, 2022
