One imread to rule them all #229
Comments
The first step is getting a benchmark script going. We need to:
|
Highly recommend using asv. We use it on
Note that because I just changed the benchmark parameters it reset a lot of the visualizations but it does have the benchmarks for the most recent commit as scatter plots basically. As more commits are added with the same benchmark configuration, it will show as a timeseries.
I tried doing the above during my benchmark setup on aicsimageio and I couldn't get the For the default case I felt it was an unfair comparison. Happy to help and PR into dask-image where I can. At the very least, my talk is now basically built into the library on every commit 😄 |
That's a strong recommendation for
I'm not sure whether it's the most common, but it's definitely common enough that we need good performance. |
Wanted to link here a quick performance comparison we had done in the past: #194 (comment). The conclusion had been that |
Presumably we could add this behaviour to |
Definitely. Wouldn't currently think it's too critical though. |
@jni says that scikit-image also has a good guide to asv. I think this is it here: https://scikit-image.org/docs/dev/contribute.html#benchmarks |
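To make the asv suggestion concrete, here is a minimal sketch of what a benchmark file could look like. It only relies on asv's documented conventions (a class with params, param_names, setup, teardown, and time_* methods). The file name, the test data size, and the choice to hand tifffile.imread to dask.array.image.imread are assumptions for illustration, not anything decided in this thread.

```python
# benchmarks/benchmark_imread.py  (hypothetical file name)
import os
import shutil
import tempfile

import numpy as np


class ImreadSuite:
    """asv collects classes like this one and times every ``time_*`` method."""

    # asv runs each benchmark once per parameter value.
    params = ["dask_image", "dask_array"]
    param_names = ["reader"]

    def setup(self, reader):
        # Heavy imports live here so that collecting the suite stays cheap.
        import tifffile

        self.tmpdir = tempfile.mkdtemp()
        self.path = os.path.join(self.tmpdir, "stack.tif")
        # Assumed test data: a small 20-frame multipage TIFF.
        frames = np.random.randint(0, 255, size=(20, 256, 256), dtype=np.uint8)
        tifffile.imwrite(self.path, frames)

        if reader == "dask_image":
            from dask_image.imread import imread

            self.read = imread
        else:
            import dask.array.image

            # Assumption: pass the reader explicitly instead of relying on
            # the default one, so both cases decode with the same library.
            self.read = lambda path: dask.array.image.imread(
                path, imread=tifffile.imread
            )

    def teardown(self, reader):
        shutil.rmtree(self.tmpdir)

    def time_build_graph(self, reader):
        # Cost of creating the lazy dask array only.
        self.read(self.path)

    def time_read_all(self, reader):
        # Cost of actually loading every frame into memory.
        self.read(self.path).compute()
```

Timing graph construction separately from the full read is a design choice of this sketch: it keeps "how long until I have a dask array" apart from "how long until I have the pixels", which may matter when comparing the two implementations.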
One big disadvantage for See #262 (comment) |
Yeah, this comes up with large multipage TIFFs. They can be kind of movie-like. Wonder if we should just make the move to using ImageIO here once PR imageio/imageio#739 is in? It's hard supporting all of the different file formats/use cases out there. Maybe a better separation of concerns would improve the user experience. Edit: Also broadly related: dask/dask#9049 |
A lot of people have put a lot of effort into imread lately. This is great, and it's really helped. However, we've still got a way to go. These are the four major areas where I see problems pop up:
1. Read image data into Dask arrays accurately. We need more simple test cases here. Bug report: dask_image.imread.imread regression #220
2. Reduce confusion. Currently, there are multiple implementations of a dask imread function. The two most easily confused are dask_image.imread.imread() and dask.array.image.imread(). We need to figure out which is best, and only use that one.
3. Read data in fast. For that, we'll need to have some proper benchmarks, and run them routinely as part of the CI. This will help us decide (2) above. Previous discussion:
4. Process the image data fast, too. For that to happen, we need smart default choices for how we chunk image data in dask arrays. Jackson Maxfield Brown describes the problem well in this short video here
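As a rough illustration of what a "smart default" for chunking might mean, the sketch below picks how many frames to group into one chunk so that each chunk lands near a target size in bytes. The function name frames_per_chunk and the 128 MiB target are hypothetical choices made for this example; nothing here is current dask-image behaviour.

```python
import math


def frames_per_chunk(frame_shape, itemsize, n_frames, target_bytes=128 * 2**20):
    """Return how many whole frames fit in one chunk of about ``target_bytes``.

    frame_shape : shape of a single frame, e.g. (512, 512)
    itemsize    : bytes per pixel, e.g. 2 for uint16
    n_frames    : total number of frames in the file
    target_bytes: assumed target chunk size (128 MiB here, an arbitrary pick)
    """
    frame_bytes = math.prod(frame_shape) * itemsize
    # Always keep at least one frame per chunk, even if a single frame
    # is already larger than the target.
    n = max(1, target_bytes // frame_bytes)
    # Never ask for more frames than the file contains.
    return min(n, n_frames)


# Small frames get grouped; large frames stay nearly one per chunk.
small = frames_per_chunk((512, 512), itemsize=2, n_frames=1000)
large = frames_per_chunk((4096, 4096), itemsize=2, n_frames=10)
```

A value computed this way could, for instance, be passed as the nframes argument of dask_image.imread.imread(), which controls how many frames go into each chunk. Whether a byte-size target is the right heuristic at all is exactly the open question in item 4.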