Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix rasterio chunking with s3 datasets #1817

Merged
merged 3 commits into from
Jan 23, 2018

Conversation

rabernat
Copy link
Contributor

@rabernat rabernat commented Jan 10, 2018

  • Closes rasterio chunks argument causes loading from s3 to fail #1816 (remove if there is no corresponding issue, which should only be the case for minor changes)
  • Tests added (for all bug fixes or enhancements)
  • Tests passed (for all non-documentation changes)
  • Passes git diff upstream/master **/*py | flake8 --diff (remove if you did not edit any Python files)
  • Fully documented, including whats-new.rst for all changes and api.rst for new API (remove if this change should not be visible to users, e.g., if it is an internal clean-up, or if this is part of a larger project that will be documented later)

This is a simple fix for token generation of non-filename targets for rasterio.

The problem is that I have no idea how to test it without actually hitting s3 (which requires boto and aws credentials).

@shoyer
Copy link
Member

shoyer commented Jan 11, 2018

You can test this by mocking os.path.getmtime, e.g.,

from unittest import mock  # but note that we use a special import for mock in xarray

with mock.patch('os.path.getmtime', side_effect=RuntimeError):
    os.path.getmtime('fooasdf.txt')

try:
mtime = os.path.getmtime(filename)
except FileNotFoundError:
# the filename is probably and s3 bucket rather than a regular file
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and -> an

mtime = os.path.getmtime(filename)
try:
mtime = os.path.getmtime(filename)
except FileNotFoundError:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Python 2 only has the super-class OSError, so that should probably be used instead here.

@rabernat
Copy link
Contributor Author

I also refactored the rasterio test suite to eliminate a lot of boilerplate dataset creation and added tests for network urls. I'm not a geotiff expert, so please let me know if I have done anything silly.

@rabernat rabernat requested review from jhamman and fmaussion January 13, 2018 17:09
Copy link
Member

@fmaussion fmaussion left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @rabernat for the long overdue refactoring, althought this will mess with #1712 and #1740 quite a bit. Shouldv'e updated my PR earlier, d'oh!

def test_http_url(self):
# more examples urls here
# http://download.osgeo.org/geotiff/samples/
url = 'http://download.osgeo.org/geotiff/samples/made_up/ntf_nord.tif'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I forgot about the @network decorator, this can be quite useful for other tests as well...

@rabernat
Copy link
Contributor Author

this will mess with #1712 and #1740 quite a bit

I probably should have paid more attention to those PR's. My apologies.

Here is how I ended up refactoring the tests. I had to write a new test to catch the bug I discovered. I looked at the existing tests for guidance and saw lots of boilerplate. So rather than copying the boilerplate, I refactored it.

Let me know how I can help resolve this conflicts this may cause with the other open PRs.

@fmaussion
Copy link
Member

#1740 is OK since the tests are still missing. My PR is a bit more problematic, but I can handle it. I would favor pushing yours forward now and I'll rebase my PR on top of it.

Copy link
Member

@shoyer shoyer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great, thank you!

@shoyer
Copy link
Member

shoyer commented Jan 18, 2018

@rabernat this now has some conflicts to resolve. Can you fix those and merge this? Thanks!

@rabernat
Copy link
Contributor Author

I resolved conflicts. LGTM before it gets stale again!

Is it bad form to merge my own PR?

@fmaussion
Copy link
Member

test_nodata and test_nodata_missing are tests which were added in #1740, they shouldn't be deleted by your PR. It's fine if you don't refactorize them (I'll need to have a look at all the tests anyway once all conflicting PRs are merged, see #1843)

@shoyer
Copy link
Member

shoyer commented Jan 23, 2018

@rabernat @fmaussion it's fine to merge your own PR once it's been reviewed & approved by another core developer.

@shoyer shoyer merged commit 3cd2337 into pydata:master Jan 23, 2018
@shoyer
Copy link
Member

shoyer commented Jan 23, 2018

This looks good so let's get it in!

@rabernat
Copy link
Contributor Author

@fmaussion - so sorry I accidentally deleted your tests. I was just rushing to resolve the conflicts so as not to hold up the other PRs. I hope this does not cause too much inconvenience.

@fmaussion
Copy link
Member

@rabernat no problem, will go back the rasterio tests soon

@fmaussion fmaussion mentioned this pull request Feb 6, 2018
4 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

rasterio chunks argument causes loading from s3 to fail
4 participants