Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: Directly open compressed files #3443

Merged
merged 56 commits into from
Jan 11, 2022

Conversation

cphyc
Copy link
Member

@cphyc cphyc commented Jul 19, 2021

PR Summary

This PR allows the user to directly load a dataset stored in a .tar.gz file (or any foramt supported by ratarmount).
It works by first mounting the .tar file as a read-only filesystem, which can then be used directly in yt.

PR Checklist

  • New features are documented, with docstrings and narrative docs
  • Adds a test for any bugs fixed. Adds tests for new features.

Example

# Create .tar file
tar czf output_00080.tar.gz output_00080

Now, one can load the dataset in the archive directly using

import yt
ds = yt.load_archive("output_00080.tar.gz", "output_00080")

@cphyc cphyc added enhancement Making something better new feature Something fun and new! labels Jul 19, 2021
@cphyc cphyc marked this pull request as draft July 19, 2021 14:46
yt/loaders.py Outdated Show resolved Hide resolved
yt/loaders.py Outdated Show resolved Hide resolved
@cphyc cphyc marked this pull request as ready for review July 20, 2021 10:26
Copy link
Member

@neutrinoceros neutrinoceros left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here is a first batch of comments

doc/source/examining/loading_data.rst Outdated Show resolved Hide resolved
yt/loaders.py Outdated Show resolved Hide resolved
yt/loaders.py Outdated Show resolved Hide resolved
yt/loaders.py Outdated Show resolved Hide resolved
@cphyc cphyc force-pushed the open-compressed-files branch 2 times, most recently from 910f75c to 0abb298 Compare July 22, 2021 08:56
@cphyc cphyc force-pushed the open-compressed-files branch 2 times, most recently from feeea19 to 2d7e733 Compare July 23, 2021 15:33
@cphyc
Copy link
Member Author

cphyc commented Jul 23, 2021

For reference, here is a simplistic benchmark:

Access Time (s)
Direct 1.75
Through .tar.gz 5.57
Through .tar 2.03

This uses the following code:

>>>  %%timeit
... ds = yt.load("output_00080")
... ad = ds.all_data()
... ad["density"]
>>> %%timeit
... ds = yt.load_archive("output_00080.tar.gz", "output_00080")
... ad = ds.all_data()
... ad["density"]
>>> %%timeit
... ds = yt.load_archive("output_00080.tar", "output_00080")
... ad = ds.all_data()
... ad["density"]

Note that the first access is much slower as the entire tar file has to be read from disk.

@cphyc cphyc force-pushed the open-compressed-files branch 2 times, most recently from 4f60c85 to 22a00f2 Compare July 28, 2021 16:24
@cphyc cphyc force-pushed the open-compressed-files branch from 2542744 to d2210b9 Compare September 23, 2021 15:06
@matthewturk
Copy link
Member

@yt-fido test this please

@matthewturk matthewturk changed the title Directly open compressed files ENH: Directly open compressed files Oct 9, 2021
@cphyc
Copy link
Member Author

cphyc commented Oct 11, 2021

@yt-fido test this please

1 similar comment
@neutrinoceros
Copy link
Member

@yt-fido test this please

@cphyc
Copy link
Member Author

cphyc commented Oct 11, 2021

pre-commit.ci autofix

@cphyc
Copy link
Member Author

cphyc commented Oct 19, 2021

I don't really understand why it fails on Jenkins but it's really hard to find out without having direct access to it (it just doesn't seem to mount properly). I'd be happy to somehow mark to feature as experimental for the time being and see what happens with users on real cases. Wdyt @munkm @matthewturk @neutrinoceros?

@neutrinoceros
Copy link
Member

I'd be happy to somehow mark to feature as experimental for the time being and see what happens with users on real cases. Wdyt @munkm @matthewturk @neutrinoceros?

I'd get behind such a plan, as long as it is impossible to use the feature without being warned that it's still experimental. It would also need to be clearly stated in docs and release notes, but this can be done later AFAIC.

@matthewturk
Copy link
Member

That plan sounds good to me.

@cphyc cphyc force-pushed the open-compressed-files branch from f177e60 to 831c034 Compare October 20, 2021 08:32
@cphyc
Copy link
Member Author

cphyc commented Oct 20, 2021

@matthewturk and @neutrinoceros, I have added a warning and I am skipping the tests on Jenkins (at least, that's what I am hoping) as they are failing with a timeout and I cannot understand why (there must be some permission error, ...)

@cphyc cphyc force-pushed the open-compressed-files branch from aa88a81 to 54fe166 Compare October 20, 2021 16:10
neutrinoceros
neutrinoceros previously approved these changes Nov 2, 2021
@neutrinoceros
Copy link
Member

I see a minor flake8 issue so disabling auto-merge while I fix that.

neutrinoceros
neutrinoceros previously approved these changes Jan 11, 2022
@neutrinoceros
Copy link
Member

now mypy is unhappy (I think this branch is older than type-checking so it's not too surprising). @cphyc, do you want to tackle these nits yourself ? Otherwise I'm happy to step in.

@neutrinoceros neutrinoceros merged commit f5cbfb3 into yt-project:main Jan 11, 2022
@cphyc cphyc deleted the open-compressed-files branch January 11, 2022 17:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Making something better new feature Something fun and new!
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants