Use windowed writing in to_raster #9
Comments
Currently dask is not a dependency to If I understand correctly, it seems this is how you are achieving this in With the use of I am wondering if this could be an option to toggle on and will only work if you have dask installed. |
Yes. Dask's There is also the complication of when to open/close the file. This is one of the uglier parts of how Satpy/trollimage does things. Dask has no way to express a task that opens a file, runs a series of tasks (saving array chunks), and then closes the file when they are all done. There are related issues on Dask's GitHub, but I don't have time to find them right now. Since trollimage requires dask (because Satpy uses dask), we haven't had to deal with the difficulty of handling both numpy and dask arrays in the most efficient way possible. |
Interesting. I will have to dig into it later. |
Reading on the subject leads me to think that using the
|
In point 1, doesn't blockxsize and blockysize assume tiling? What if I wanted a striped geotiff but was using dask arrays with 2D chunks? For point 2, what "chunk" method are you referring to? |
For point 1, it does assume tiling. With my current understanding I was thinking about how to optimize, but I am not entirely sure at the moment if it is a good idea or not. How often do you use a striped raster? For point 2, when writing with dask chunks and you didn't start with a dask array: http://xarray.pydata.org/en/stable/generated/xarray.DataArray.chunk.html |
Also, these could be defaults that could be overridden. |
Point 1, ok. Since the default for gdal/rasterio is striped rasters that's what I typically do but I give users the option to switch to tiled if they'd like. I'd be ok with more libraries changing the default to tiled but I'd be curious about gdal's reasoning for not doing tiling by default. This would likely remove the issues I mentioned with GDAL's internal caching. The only thing I don't 100% agree with here is that the chunk size be used directly for block size. I think it could be used, but in most cases (in my experience) the dask array chunk size is much larger than "usual" tiled geotiffs (256x256, 512x512, etc); especially if you consider cloud optimized geotiffs as the "preferred" version of a geotiff. Point 2, ah cool. I didn't know that existed. |
That is definitely a valid concern. In these cases, we would have to have a check for that. The block sizes for a geotiff for overviews are a So, it seems you can get away with larger block sizes as long as they are a power of 2. But this does limit the allowed dask chunk sizes when writing to a file. Ideally, the dask chunk size and the block size would be the same. Thoughts? |
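A minimal sketch of the kind of check mentioned above, in plain Python. The function names and the `max_block` cap of 512 are hypothetical, chosen only for illustration and not taken from any library. It picks the largest power-of-two block size that fits inside a given dask chunk size, and rejects anything below 16, since TIFF tile dimensions must be multiples of 16.

```python
def is_power_of_two(n: int) -> bool:
    """Return True if n is a positive power of two (1, 2, 4, 8, ...)."""
    return n > 0 and (n & (n - 1)) == 0


def power_of_two_block(chunk_size: int, max_block: int = 512) -> int:
    """Largest power of two that is <= min(chunk_size, max_block).

    Hypothetical helper: `max_block` is an assumed cap, not a GDAL setting.
    """
    limit = min(chunk_size, max_block)
    if limit < 16:
        # TIFF tile width/length must be multiples of 16, so anything
        # smaller cannot be used as a tile size.
        raise ValueError("chunk size too small for a tiled block size")
    # bit_length() - 1 is the exponent of the highest set bit.
    return 1 << (limit.bit_length() - 1)
```

For example, a chunk size of 1000 would map to a block size of 512 with the default cap, while a chunk size that is already a power of two (such as 256) is returned unchanged.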
Another option for optimizing writing would be to call the |
Agree with everything you said. Perhaps if this method is defaulting to tiled geotiffs then it could have default block sizes of 256x256 (like GDAL does) and it could rechunk the data before writing it. If you Brain dump/side note: I wonder if rechunking for a striped geotiff to stripe size chunks would improve geotiff writing performance and final geotiff size. I'll have to try that in satpy some time. |
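To make the 256x256 idea concrete, here is a rough sketch in plain Python of the block grid that rechunking to the default block size would produce. `block_windows` is a hypothetical helper, not part of rasterio or dask. Each tuple describes one window that a windowed write would cover, with edge windows clipped to the raster extent.

```python
from typing import Iterator, Tuple

Window = Tuple[int, int, int, int]  # (row_off, col_off, height, width)


def block_windows(height: int, width: int, block: int = 256) -> Iterator[Window]:
    """Yield block-aligned windows covering a (height, width) raster.

    Hypothetical helper for illustration; 256 mirrors the default block
    size discussed above.
    """
    for row_off in range(0, height, block):
        for col_off in range(0, width, block):
            # Clip the last row/column of blocks to the raster edge.
            yield (
                row_off,
                col_off,
                min(block, height - row_off),
                min(block, width - col_off),
            )


# A 512 x 600 raster needs 2 rows and 3 columns of blocks.
windows = list(block_windows(512, 600))
```

If the data were a 2D dask array, rechunking it to `(256, 256)` should give chunks on this same grid, so each chunk could be written as exactly one block. That mapping is an assumption of this sketch, not something tested here.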
Sounds good. Looks like we have enough here to tinker with some ideas later. It will be interesting to hear back about the stripsize tests as well. |
This looks interesting: https://github.com/dymaxionlabs/dask-rasterio. Hasn't been updated in a while and only works with poetry. But, seems like a good additional reference. |
Looks like they do a lot of the same stuff we did in the trollimage package. The main difference is that we allow you to not compute the result right away. This lets you delay the data writing until later, which is needed by some of the stuff we do in Satpy. I should probably look at how xarray handles open file objects nowadays (and, more importantly, how it serializes them). |
One small difference I saw is the use of Reference: |
Was planning to tinker around with this and noticed the GPL license in |
From: #8
ping @djhoese