Remove packaging of test data and download them on demand #123
Codecov Report
@@            Coverage Diff            @@
##             main     #123    +/-   ##
=======================================
  Coverage   85.24%   85.24%
=======================================
  Files          73       73
  Lines        9034     9042     +8
  Branches     2045     2046     +1
=======================================
+ Hits         7701     7708     +7
  Misses        872      872
- Partials      461      462     +1
MY_PATH = os.path.dirname(__file__)
ZIPF = os.path.join(MY_PATH, "edax_files.zip")
TEST_DATA_PATH = Path(__file__).parent / "data" / "edax"
ZIPF = TEST_DATA_PATH / "edax_files.zip"
TMP_DIR = tempfile.TemporaryDirectory()
Use tmp_path here, as for the other plugins?
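A minimal sketch of what the tmp_path suggestion could look like, assuming the test data stays zipped: pytest's built-in tmp_path fixture hands each test a fresh directory, so a module-level TMP_DIR is not required. The helper name extract_test_archive and the test itself are hypothetical and are not taken from the rsciio code base.

```python
import zipfile
from pathlib import Path


def extract_test_archive(zip_path, dest):
    """Unpack zip_path into dest and return the extracted file paths."""
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(dest)
    return sorted(p for p in Path(dest).rglob("*") if p.is_file())


def test_archive_is_extracted(tmp_path):
    # Build a tiny stand-in archive (hypothetical; a real test would
    # point at the packaged edax zip file instead).
    archive = tmp_path / "files.zip"
    with zipfile.ZipFile(archive, "w") as zf:
        zf.writestr("a.txt", "hello")

    # Extract into a sub-folder of the per-test temporary directory.
    files = extract_test_archive(archive, tmp_path / "unpacked")

    assert [f.name for f in files] == ["a.txt"]
    assert files[0].read_text() == "hello"
```

Because the directory comes from the fixture, the test needs no explicit cleanup code.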
Maybe this is a good chance to get rid of zip and other packed data (that is, leave the files loose, not packed within zip or gz)?
Why:
- Saving bandwidth is redundant, as git compresses the data behind the scenes anyway.
- Size is a moot point if the data is going to be downloaded on demand.
- Updating an archive with new files produces huge binary differences; without zip/gz collections, only the new file is added to the git history.
- Finally, these tmp_path would not be needed (and if the tests were rewritten to use rsciio to read the data explicitly, with the retrieved list/dictionary fed to hyperspy to generate the signal for testing, some formats could be loaded straight from the zip file, without any temporary folders).
Yes, we should try to remove the zip files. Is it OK if we do this in a follow-up PR? It would be good to test updating the registry, etc.
Just to avoid any misunderstanding, the expected usage is to download the files on demand when running from a non-development installation. For developers, there will be no change in workflow (well, you can remove the data folder and it will be downloaded again).
The main aim is to be able to run the test suite from the conda-forge feedstock or the hyperspy-bundle.
> - Finally, these tmp_path would not be needed (and if the tests were rewritten to use rsciio to read the data explicitly, with the retrieved list/dictionary fed to hyperspy to generate the signal for testing, some formats could be loaded straight from the zip file, without any temporary folders).
I am not sure about that. What would be the benefit? For development purposes I usually prefer to open the file directly, sometimes using a different approach, etc.
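To make the "download on demand" behaviour described above concrete, here is a standard-library sketch, not the implementation used in this PR. The function names, the registry layout (a mapping from relative file name to SHA-256 hash) and the base URL are all assumptions made for illustration.

```python
import hashlib
import urllib.request
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def fetch(name, registry, base_url, cache_dir,
          download=urllib.request.urlretrieve):
    """Return a local path for `name`, downloading it only if missing.

    `registry` maps relative file names to expected SHA-256 hashes
    (hypothetical layout). `download` is injectable so the logic can
    be exercised without network access.
    """
    target = Path(cache_dir) / name
    if not target.exists():
        # First use: create the folder and download the file.
        target.parent.mkdir(parents=True, exist_ok=True)
        download(f"{base_url}/{name}", target)
    # Always verify the local copy against the registry.
    if sha256_of(target) != registry[name]:
        raise ValueError(f"hash mismatch for {name}")
    return target
```

With this shape, deleting the local data folder simply triggers a fresh download on the next call, which matches the developer workflow described in the comment.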
If the zip files are removed (replaced with their unpacked content), then no tmp_path is needed, as all files can simply be opened directly (using Hyperspy load). That is the preferred way.
Sorry for the convoluted message. What I meant is that some of the readers (undocumented behavior) are able to open not only physical files, but can also accept an already opened file or another BytesIO-like object, i.e. a file from within a zip archive. No temporary folder is needed then, as the byte stream is read and parsed directly from the zip. I know that this works with tiffs. Of course, for test files that feature is not a good fit, for the three reasons given before. However, in the future we should consider documenting this behavior, as gz and zip are among the most common, well-documented, tested and widely supported containers. For example, it is nice to be able to keep a whole SEM session for one sample within a single file.
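The idea of reading straight from an archive can be sketched with the standard library alone. The helper name open_member is hypothetical, and whether a given reader accepts the resulting object is exactly the undocumented behavior mentioned above, so this only shows how such a BytesIO-like object is obtained without a temporary folder.

```python
import io
import zipfile


def open_member(archive, member):
    """Return a seekable in-memory file object for one archive member."""
    with zipfile.ZipFile(archive) as zf:
        return io.BytesIO(zf.read(member))


# Build a small archive in memory to stand in for a packed session.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    # Placeholder bytes only; this is not a complete TIFF file.
    zf.writestr("session/image.tif", b"II*\x00")

# No file is written to disk at any point.
stream = open_member(buffer, "session/image.tif")
header = stream.read(4)
```

Copying the member into a BytesIO keeps the object seekable after the archive is closed, which parsers that jump around in the file generally need.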
Ignoring the zip file business, for the scenario where tmp_path is used for testing file saving, we should keep this pattern, because we don't need to worry about removing the files, the user having write access to the folder, etc. What do you think?
This makes me think that, to run the test suite, the user will need write access to the tests folder, which is not guaranteed in a multi-user installation. I have updated the summary of this PR.
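The saving scenario could look like the following sketch. The save_metadata and load_metadata functions are hypothetical stand-ins for a file writer and reader under test; only the tmp_path fixture itself is real pytest API.

```python
import json


def save_metadata(metadata, filename):
    """Hypothetical stand-in for a file writer under test."""
    with open(filename, "w") as f:
        json.dump(metadata, f)


def load_metadata(filename):
    """Hypothetical stand-in for the matching file reader."""
    with open(filename) as f:
        return json.load(f)


def test_save_roundtrip(tmp_path):
    # tmp_path is created and managed by pytest outside the tests
    # folder, so no write access to the installed package is needed.
    fname = tmp_path / "out.json"
    original = {"beam_energy": 20.0}
    save_metadata(original, fname)
    assert load_metadata(fname) == original
```

Nothing is written next to the test data, so the same test can run from a read-only installation.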
> use tmp_path here as for other plugins?
Same as above.
Of course, for saving tests the situation is different, and it cannot really be avoided.
@ericpre Very cool! I'll probably try to check this out tomorrow to see if I can get it to work. The
Yes, this is a good idea. Even if not everyone uses pre-commit, it will be useful to request pre-commit to update it from a comment in a PR. Can we leave this for another PR? :)
Sounds good to me. Otherwise I think it looks good.
TODO before merging: Edit
Everyone is happy with it, merging to continue on the other PRs.
Follow up of #123 (Remove packaging of test data)
I had a go at removing the packaging of the test files and downloading them on demand when running the test suite.
Summary of the approach:
- registry.txt by calling the rsciio.tests.registry_utils.update_registry function
- pytest-xdist or pytest
Example of downloading all the test data before running the test suite: https://github.com/hyperspy/rosettasciio/actions/runs/5071177296/jobs/9107256002?pr=123
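As an illustration of what regenerating a hash registry such as registry.txt might involve, here is a standard-library sketch. It is not the rsciio.tests.registry_utils.update_registry function: the function names and the "relative/path hash" line layout are assumptions, since the actual file format is not shown in this thread.

```python
import hashlib
from pathlib import Path


def build_registry(data_dir):
    """Map each file's POSIX-style relative path to its SHA-256 digest."""
    data_dir = Path(data_dir)
    return {
        p.relative_to(data_dir).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(data_dir.rglob("*"))
        if p.is_file()
    }


def write_registry(data_dir, registry_file):
    """Write one 'relative/path hash' line per file (assumed layout)."""
    registry = build_registry(data_dir)
    lines = [f"{name} {digest}" for name, digest in registry.items()]
    Path(registry_file).write_text("\n".join(lines) + "\n")
```

Sorting the paths keeps the output stable between runs, so adding one test file changes only one line of the registry.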
Drawback
Progress of the PR
- rsciio/tests/data folder,
- CONTRIBUTING.rst once we are happy with the approach
- upcoming_changes folder (see upcoming_changes/README.rst),
- readthedocs doc build of this PR (link in github checks)
After the PR is merged