Read images from cloud storage #84
Sure. So this package provides a suite of image processing tools in the spirit of …; to be more specific, dask-image includes a large variety of filters (e.g. smoothing, denoising, edge detection, Fourier), morphological operations, and some operations on label images. There is certainly room for this package to grow in these areas based on the needs of the community.
The functionality here is designed specifically to handle the fact that not all the data needed for an operation may be in the same chunk. So it adds overlaps with neighboring chunks for filters, or pulls out the relevant pieces of different chunks when working with label images. Hopefully that clarifies what dask-image is trying to solve and how it differs from the functionality in Dask.
Loading image data is generally a hard problem (even outside of Dask) due to the large variety of formats, format extensions or specializations for specific fields, the requirements of different imaging modalities, file constraints (size, dimensionality, ordering, etc.), compression, access patterns, encryption, and so on. As a consequence, there are quite a few libraries that can load image data, with tradeoffs ranging from matching the loaded data closely to the format, to smoothing out the differences between many formats by loading generic array data. For …
I would be happy to discuss the particular problem you are working with, but would need a bit more detail: namely, which image formats are involved, how the data is split up (if at all), whether some kind of authentication is needed, etc. There could be short-term solutions using things like …
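To illustrate the point about chunk overlaps, here is a small sketch using plain dask.array and scipy.ndimage rather than dask-image's own code. Filtering each chunk independently gives wrong values near internal chunk edges, while map_overlap first copies a halo of neighboring pixels into each chunk. The image size, chunk size and sigma are arbitrary choices for illustration; as far as I know, dask_image.ndfilters.gaussian_filter packages up essentially this pattern for you.

```python
import numpy as np
import dask.array as da
from scipy import ndimage

# Illustrative 64 x 64 random "image", split into four 32 x 32 chunks.
rng = np.random.default_rng(0)
image = rng.random((64, 64))
darr = da.from_array(image, chunks=(32, 32))

sigma = 1.0
# scipy.ndimage.gaussian_filter uses a kernel radius of int(truncate * sigma + 0.5),
# which is 4 pixels here with the default truncate=4.0.
radius = int(4.0 * sigma + 0.5)

# Reference: filter the whole in-memory array at once.
expected = ndimage.gaussian_filter(image, sigma=sigma)

# Naive: filter each chunk on its own. Pixels near internal chunk edges
# never see their neighbors in the adjacent chunk, so they come out wrong.
naive = darr.map_blocks(ndimage.gaussian_filter, sigma=sigma).compute()

# Overlap-aware: copy `radius` pixels from each neighboring chunk into a halo,
# filter, then trim the halo off again. "reflect" padding at the outer edges
# mirrors scipy's default mode="reflect".
overlapped = darr.map_overlap(
    ndimage.gaussian_filter, depth=radius, boundary="reflect", sigma=sigma
).compute()

print("naive matches reference:", np.allclose(naive, expected))
print("overlap matches reference:", np.allclose(overlapped, expected))
```

The halo only needs to be as wide as the filter's footprint, which is why the depth is derived from sigma here; a wider halo costs extra memory and data transfer without changing the result.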
Does that help? Any other questions or comments here?
I have a related question. What's the recommended way to use Dask to read .tiff images stored at …?
Thanks for the suggestions. I'll circle back after I explore.
On Wed, Oct 16, 2019 at 6:36 PM Genevieve Buckley wrote:
So I don't really use Google Cloud Storage (GCS), but here's where I'd start:
1. Use gcsfs to get the remote filenames (see https://gcsfs.readthedocs.io/en/latest/). Check that this works with the simple text-file reading example they have.
2. Try to read a single remote TIFF file with a non-Dask image library (maybe imageio, PIL, skimage, pims, whatever you use most often; for TIFF images I think most or all of these use Christoph Gohlke's tifffile to do the actual reading). Does this bit work? If so, great, and I'd try it with Dask next.
3. Try loading the GCS images with Dask. John has written a blog post that might be helpful here: https://blog.dask.org/2019/06/20/load-image-data
   You can try:
   >>> import dask_image.imread
   >>> x = dask_image.imread.imread('path/to/remote/location/*.tif')
   which uses the pims library (https://soft-matter.github.io/pims/v0.4.1/index.html) to read in images. I'm not really sure how well this plays with GCS, but if the first two steps above work, this should work too.
   OR
   If what you're doing is a little more custom, you can take the same approach John does in the section "Lazily load images with Dask Array", which uses imageio.imread with dask.delayed to read the data (and optionally joins the pieces together with dask.array.block or dask.array.stack).
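The three steps above can be combined into one small loader. This is a sketch under several assumptions, not an API of dask-image or Dask: it goes through fsspec, the generic filesystem layer that gcsfs plugs into (so gs:// URLs need gcsfs installed, and s3:// URLs need s3fs); it swaps in tifffile.imread for the imageio.imread used in the blog post; and it assumes every file holds an image with the same shape and dtype. The names lazy_imread and _read_tiff are hypothetical.

```python
import dask
import dask.array as da
import fsspec
import tifffile


def _read_tiff(open_file):
    """Read one TIFF from an fsspec OpenFile (local, gs://, s3://, memory://, ...)."""
    # Entering the OpenFile yields a seekable binary file object that tifffile can read.
    with open_file as f:
        return tifffile.imread(f)


def lazy_imread(urlpath, **storage_options):
    """Hypothetical helper: lazily stack every TIFF matching a glob pattern.

    storage_options are forwarded to the fsspec filesystem (e.g. gcsfs credentials).
    Assumes all files share one shape and dtype.
    """
    # fsspec expands the glob on the (possibly remote) filesystem;
    # sort by path so the frame order is deterministic.
    files = sorted(
        fsspec.open_files(urlpath, mode="rb", **storage_options),
        key=lambda of: of.path,
    )
    if not files:
        raise FileNotFoundError(f"no files match {urlpath!r}")

    # Read the first file eagerly, only to learn the shape and dtype of every frame.
    sample = _read_tiff(files[0])

    # Wrap each read in dask.delayed so no other file is downloaded yet.
    frames = [
        da.from_delayed(dask.delayed(_read_tiff)(of), shape=sample.shape, dtype=sample.dtype)
        for of in files
    ]
    # One chunk per file, stacked along a new leading axis.
    return da.stack(frames, axis=0)
```

For example, lazy_imread("gs://my-bucket/images/*.tif", token="anon") would lazily stack the TIFFs in a hypothetical public bucket; token is gcsfs's credentials option and fsspec passes it through. Apart from listing the bucket and reading the first file, nothing is fetched until you call .compute() or .persist(). Reading one sample file up front mirrors what dask.array.image.imread does for local files.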
Finally, will you let me know how this goes? It's something that would be very helpful for other people to know too, so I'd like to add an example for this to the docs (or maybe see a post on it in the dask blog, https://github.com/dask/dask-blog).
No worries. Thanks.
On Thu, May 7, 2020 at 1:28 AM Genevieve Buckley wrote:
I'm sorry, I completely missed your last message, @skeller88.
My best guess is that you just don't have enough RAM available to persist the whole array in memory (see https://examples.dask.org/array.html#Persist-data-in-memory).
There's also a memory leak issue with persist and dask.distributed; you can keep an eye on that conversation over at dask/dask#2625.
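A quick sanity check before calling persist() is to compare the array's total size, which Dask can report from the shape and dtype without reading any data, with the memory your machine or cluster actually has. A minimal sketch using a hypothetical stack (the shape and dtype below are made up for illustration, not taken from this thread):

```python
import dask.array as da

# Hypothetical stack: 1000 frames of 2048 x 2048 pixels stored as uint16,
# one frame per chunk. da.zeros is lazy, so nothing is allocated here.
stack = da.zeros((1000, 2048, 2048), dtype="uint16", chunks=(1, 2048, 2048))

# nbytes comes from shape and dtype alone: 1000 * 2048 * 2048 * 2 bytes.
print(f"{stack.nbytes / 2**30:.4f} GiB")  # prints 7.8125 GiB
```

If that figure is larger than the memory available to the workers, persisting the whole array is unlikely to fit; persisting only a subset (for example stack[:100].persist()) or computing the final result directly without persisting are the usual alternatives.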
I was wondering what the relationship is between this package and the function dask.array.image.imread that's already part of Dask. Especially since I noticed that dask.array.image.imread doesn't actually handle remote data, so I couldn't give it an s3:// path.
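For comparison, here is a sketch of dask.array.image.imread on local files. As far as I can tell from Dask's source, it expands the pattern with Python's glob module on the local filesystem and reads one file to learn the shape and dtype, which would explain why an s3:// URL doesn't work. The temporary frame files and the choice of tifffile.imread as the reader (passed through the function's imread= parameter) are illustrative assumptions.

```python
import os
import tempfile

import numpy as np
import tifffile
from dask.array.image import imread as dask_imread

# Write three small illustrative TIFF frames to a local temporary directory.
tmpdir = tempfile.mkdtemp()
for i in range(3):
    frame = np.full((4, 5), i, dtype=np.uint8)
    tifffile.imwrite(os.path.join(tmpdir, f"frame_{i}.tif"), frame)

# Local glob pattern: matched files are sorted and stacked lazily, one chunk per file.
stack = dask_imread(os.path.join(tmpdir, "*.tif"), imread=tifffile.imread)
print(stack.shape)

# A remote URL is globbed as if it were a local path, so nothing matches
# and dask raises ValueError ("No files found ...").
try:
    dask_imread("s3://no-such-bucket/*.tif", imread=tifffile.imread)
except ValueError as err:
    print("remote pattern failed:", err)
```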