One `imread` to rule them all #229

GenevieveBuckley · 2021-05-17T09:23:45Z

A lot of people have put a lot of effort into imread lately. This is great, and it's really helped. However, we've still got a way to go.

These are the four major areas where I see problems pop up:

1. Read image data into Dask arrays accurately. We need more simple test cases here. Bug report: dask_image.imread.imread regression #220
2. Reduce confusion. Currently, there are multiple implementations of a dask imread function. The two most easily confused are dask_image.imread.imread() and dask.array.image.imread(). We need to figure out which is best, and only use that one.
3. Read data in fast. For that, we'll need to have some proper benchmarks, and run them routinely as part of the CI. This will help us decide (2) above. Previous discussion:
   - dask_image imread performance issue #181
   - Getting movie files into dask efficiently #134
4. Process the image data fast, too. For that to happen, we need smart default choices for how we chunk image data in dask arrays. Jackson Maxfield Brown describes the problem well in this short video here


GenevieveBuckley · 2021-05-17T09:27:07Z

The first step is getting a benchmark script going.

We need to:

- Report benchmark results for dask_image.imread.imread() and dask.array.image.imread() (for an apples to apples comparison, you might need to explicitly pass pims.open as a keyword argument to dask.array.image.imread())
- Add benchmarking to run on our CI
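
As a starting point, here is a minimal sketch of what such a benchmark script might look like. It is not an existing script from this repo: `time_reader`, `available_readers`, and `run_benchmarks` are hypothetical helper names. Because dask arrays are lazy, the sketch times graph construction and compute separately.

```python
import time


def time_reader(reader, pattern, repeats=3):
    """Time graph construction and compute separately for one reader.

    `reader` is any callable that takes a filename or glob pattern and
    returns an array-like object (lazy or not).
    """
    build, compute = [], []
    for _ in range(repeats):
        t0 = time.perf_counter()
        arr = reader(pattern)        # for dask readers: builds the task graph
        t1 = time.perf_counter()
        if hasattr(arr, "compute"):  # dask arrays read nothing until computed
            arr.compute()
        t2 = time.perf_counter()
        build.append(t1 - t0)
        compute.append(t2 - t1)
    # Report the best run, which is least affected by background noise.
    return {"build_s": min(build), "compute_s": min(compute)}


def available_readers():
    """Collect whichever of the two imread functions can be imported."""
    readers = {}
    try:
        from dask_image.imread import imread as dask_image_imread
        readers["dask_image.imread.imread"] = dask_image_imread
    except ImportError:
        pass
    try:
        from dask.array.image import imread as dask_array_imread
        readers["dask.array.image.imread"] = dask_array_imread
    except ImportError:
        pass
    return readers


def run_benchmarks(pattern, repeats=3):
    """Run every available reader against the same filename or glob pattern."""
    return {name: time_reader(reader, pattern, repeats)
            for name, reader in available_readers().items()}
```

Something like `run_benchmarks("frames/*.tif")` would then give one timing dict per reader (the glob pattern is a placeholder). Note that with default arguments the two functions use different underlying readers, so this alone is not yet the apples to apples comparison.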

evamaxfield · 2021-05-31T00:32:45Z

> Read data in fast. For that, we'll need to have some proper benchmarks, and run them routinely as part of the CI. This will help us decide (2) above. Previous discussion:
>
> - dask_image imread performance issue #181
> - Getting movie files into dask efficiently #134

Highly recommend using asv. We use it on aicsimageio to get our pure reading benchmarks and after working on the Dask Summit presentation I also just added the example from my slides as a benchmark suite and a "LibCompareSuite" to monitor aicsimageio and dask-image performance.

- our benchmark code for general IO suites
- our benchmark code for lib comparison suites
- the CI setup for our benchmarks (we run it as part of our doc building on push to main)
- the produced benchmarks webpage

Note that because I just changed the benchmark parameters it reset a lot of the visualizations but it does have the benchmarks for the most recent commit as scatter plots basically. As more commits are added with the same benchmark configuration, it will show as a timeseries.
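
For anyone who hasn't used asv before, a suite is just a plain Python class: asv times every method whose name starts with `time_` and runs it once per entry in `params`. Below is a rough sketch of what a comparison suite for the two imread functions could look like. It is not the aicsimageio suite; the class name and the `IMREAD_BENCH_PATTERN` environment variable are made up for illustration.

```python
import importlib
import os


class ImreadSuite:
    """asv-style suite comparing the two dask imread implementations."""

    # asv runs each benchmark once per parameter value.
    params = ["dask_image.imread", "dask.array.image"]
    param_names = ["module"]

    def setup(self, module):
        # Hypothetical environment variable pointing at a test image or glob.
        self.pattern = os.environ.get("IMREAD_BENCH_PATTERN")
        if not self.pattern:
            # asv treats NotImplementedError raised in setup as "skip".
            raise NotImplementedError("IMREAD_BENCH_PATTERN is not set")
        try:
            self.imread = importlib.import_module(module).imread
        except ImportError as exc:
            raise NotImplementedError(f"{module} is not installed") from exc

    def time_read(self, module):
        # Build the lazy array and force the actual file reads.
        self.imread(self.pattern).compute()
```

Keeping the library name as a parameter means both implementations show up on the same plot in the generated benchmark pages.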

> Report benchmark results for dask_image.imread.imread() and dask.array.image.imread() (for an apples to apples comparison, you might need to explicitly pass pims.open as a keyword argument to dask.array.image.imread())

I tried doing the above during my benchmark setup on aicsimageio and I couldn't get the pims.open option working.

For the default case I felt it was an unfair comparison. dask.array.image.imread reads the whole file per chunk and from my quick look is meant for glob reading of files. (i.e. using the dask.array.image.imread will result in each chunk of the dask array being a whole file read of a file in the glob) dask-image can do glob reading as well but I still feel like the most common API interaction is reading a massive single file. (Probably just my usage bias though).

Happy to help and PR into dask-image where I can. At the very least, my talk is now basically built into the library on every commit 😄

GenevieveBuckley · 2021-05-31T08:22:31Z

That's a strong recommendation for asv! Very helpful to have those links and implementation details.

> For the default case I felt it was an unfair comparison. dask.array.image.imread reads the whole file per chunk and from my quick look is meant for glob reading of files. (i.e. using the dask.array.image.imread will result in each chunk of the dask array being a whole file read of a file in the glob) dask-image can do glob reading as well but I still feel like the most common API interaction is reading a massive single file. (Probably just my usage bias though).

I'm not sure whether it's the most common, but it's definitely common enough that we need good performance.

m-albert · 2021-05-31T09:48:26Z

> Report benchmark results for dask_image.imread.imread() and dask.array.image.imread() (for an apples to apples comparison, you might need to explicitly pass pims.open as a keyword argument to dask.array.image.imread())

Wanted to link here a quick performance comparison we had done in the past: #194 (comment). The conclusion had been that dask.array.image is significantly faster than dask_image.imread both when using skimage.io and pims. The only advantage for the latter is dask graph creation time when input image files are large (as dask.array.image reads in a file to determine image shape, while dask_image.imread uses pims for this).

GenevieveBuckley · 2021-06-01T01:30:02Z

> The only advantage for the latter is dask graph creation time when input image files are large (as dask.array.image reads in a file to determine image shape, while dask_image.imread uses pims for this).

Presumably we could add this behaviour to dask.array.image if that's useful.

m-albert · 2021-06-01T12:56:37Z

> Presumably we could add this behaviour to dask.array.image if that's useful.

Definitely. Wouldn't currently think it's too critical though.

GenevieveBuckley · 2021-06-08T03:06:07Z

@jni says that scikit-image also has a good guide to asv. I think this is it here: https://scikit-image.org/docs/dev/contribute.html#benchmarks

GenevieveBuckley · 2022-05-13T02:21:56Z

One big disadvantage for dask.array.image.imread is poor chunking behaviour. It looks like it makes a single chunk for every filename on disk. This is not great for movie files or multislice tiffs, etc. where you probably don't want to load the whole movie file into RAM.

See #262 (comment)
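
To put rough numbers on the chunking concern, here is a back-of-the-envelope calculation. The movie dimensions are invented for illustration, and `chunk_nbytes` is a hypothetical helper, not part of dask or dask-image.

```python
from math import prod


def chunk_nbytes(frames_per_chunk, frame_shape, itemsize):
    """Bytes needed to hold one chunk of `frames_per_chunk` frames in memory."""
    return frames_per_chunk * prod(frame_shape) * itemsize


# Hypothetical movie: 1000 frames of 2048 x 2048 pixels, 16-bit (2 bytes each).
n_frames, frame_shape, itemsize = 1000, (2048, 2048), 2

whole_file = chunk_nbytes(n_frames, frame_shape, itemsize)  # one chunk per file
per_frame = chunk_nbytes(1, frame_shape, itemsize)          # one chunk per frame

print(f"one chunk per file:  {whole_file / 2**30:.2f} GiB")  # 7.81 GiB
print(f"one chunk per frame: {per_frame / 2**20:.2f} MiB")   # 8.00 MiB
```

With one chunk per file, any task that touches a single frame has to hold the whole file in memory, which for a movie of this size is several GiB per worker task.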

jakirkham · 2022-05-13T05:27:15Z

Yeah this comes up with large multipage TIFFs. They can be kind of movie-like

Wonder if we should just make the move to using ImageIO here with PR ( imageio/imageio#739 ) in? It's hard supporting all of the different file formats/use cases out there. Maybe a better separation of concerns would improve the user experience.

Edit: Also broadly related ( dask/dask#9049 )

evamaxfield mentioned this issue May 30, 2021

admin/4.0-release-prep-and-benchmark-upgrades AllenCellModeling/aicsimageio#244

Merged

4 tasks

habi mentioned this issue Jun 22, 2021

Issues with aray shape when using dask_image.imread (and dask.array.image.imread) vs. imageio.imread #239

Closed

GenevieveBuckley mentioned this issue May 10, 2022

For some schedulers, setting PIMS image reader's .class_priority is ineffective in controlling dask-image.imread() #262

Open

GenevieveBuckley mentioned this issue Jul 25, 2022

Use natural sorting in imread(...) when globbing multiple files #265

Merged

GenevieveBuckley mentioned this issue Aug 12, 2022

dask_image.imread.imread: differences between using for local file and hosted file #268

Closed

This was referenced May 8, 2024

Multi-series and multi-channel nd2 files are loaded incompletely without error/warning #364

Open

jp2 slicing #359

Open
