Add function to remove near objects in nd-image #4165
Conversation
Force-pushed from 0ea6635 to b2bb144
Hello @lagru! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:
Comment last updated at 2022-04-10 15:22:21 UTC
@lagru thanks for this! Sorry that I don't have time for a full review just yet, but this is something I've wanted for a long time! =) My three comments from a brief overview:
Will do a proper review soon!
I think it might be possible to use `_flood_fill_equal`. However, there are two disadvantages compared to the current solution:
But I'll try it out and see what the benchmarks say so that we can make a more informed decision.
Good idea! Using the object size sounds more intuitive. Currently the priority actually corresponds to the label ID, which is assigned in C-order. I'll see what I can do.
That's something I'd like to know as well and a weak point of the current implementation.
However, because a KDTree is such a nice data structure for this problem 😍 the performance is already on par with the "brute force" solution in #4024 and the code is way shorter!
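For readers skimming this thread, here is a rough sketch of the KDTree idea being discussed. The helper name and the simplistic "first visited object wins" rule are inventions for illustration only; the PR's actual implementation is in Cython and handles removal priority properly.

```python
import numpy as np
from scipy import ndimage as ndi
from scipy.spatial import cKDTree

def remove_near_objects_sketch(binary, min_distance):
    # Label connected pixels and put every object pixel into one KD-tree.
    labels, _ = ndi.label(binary)
    coords = np.transpose(np.nonzero(labels))
    tree = cKDTree(coords)
    ids = labels[tuple(coords.T)]

    removed = set()
    for i in range(tree.n):
        # Skip pixels whose object was already removed.
        if ids[i] in removed:
            continue
        # Any other object's pixel within `min_distance` marks that
        # object for removal (here, simply "first visited wins").
        for j in tree.query_ball_point(tree.data[i], r=min_distance):
            if ids[j] != ids[i] and ids[j] not in removed:
                removed.add(ids[j])

    out = labels.copy()
    out[np.isin(out, list(removed))] = 0
    return out
```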
Force-pushed from 7adbb50 to a1a8de2
Note to self: Find out if there is an efficient way to find the surface of objects. That should make significant performance gains possible because only the surface between two objects needs to be evaluated to find their distance to each other (the inside of objects can be ignored)!
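For what it's worth, a minimal sketch of one way to get such surface pixels, assuming an erosion-based definition of "surface" (the PR later derives borders from maximum/minimum filters on the label image instead, see below):

```python
import numpy as np
from scipy import ndimage as ndi

def object_surface(binary):
    # A pixel lies on an object's surface if it disappears under erosion.
    binary = np.asarray(binary, dtype=bool)
    eroded = ndi.binary_erosion(binary)
    return binary & ~eroded
```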
This function iterates over all objects (connected pixels that are True) inside an image and removes neighboring objects until all remaining ones are at least a minimal Euclidean distance from each other.
_extrema_cy.pyx was cythonized twice for no reason.
Simplify construction of indices, use consistent name for structuring element (selem), some reformatting and improve documentation.
This trick significantly improves the performance by reducing the number of points inside the KDTree whose neighborhood needs to be evaluated for objects that are too close. The greater the checked object's size-to-surface ratio, the greater the improvement.
numpy.bincount seems to cast its input to dtype('intp') according to the rule 'safe'. Therefore, passing a dtype such as uint32 or int64 that can hold a larger number fails on 32-bit platforms.
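A small sketch of the failure mode and the work-around on a platform where `np.intp` is 32-bit (the exact error text may vary between NumPy versions):

```python
import numpy as np

labels = np.zeros(10, dtype=np.int64)

# On 32-bit platforms this raises, roughly:
#   TypeError: Cannot cast array data from dtype('int64') to dtype('int32')
#   according to the rule 'safe'
# counts = np.bincount(labels)

# Explicitly down-casting to intp side-steps the safe-cast check:
counts = np.bincount(labels.astype(np.intp, copy=False))
```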
Force-pushed from 15a2195 to b44c417
With recent changes the evaluation order of objects changed.
The latter was added in NumPy 1.15, which is not yet the minimum requirement for scikit-image.
for unity spacing and if p-norm is 1 or 2. Also refactor code structure a little bit.
```python
    continue
neighborhood = kdtree.query_ball_point(
    kdtree.data[i_indices, ...],
```
Just cross-reviewing from SciPy since there were some concerns expressed about the `list` return type here (which we need because of the ragged data structures that are possible) for performance reasons.
A few questions:
- Do you have any sample timings/examples demonstrating the bottleneck/% time spent on this line for a typical use case?
- Any chance you could use the `workers` argument to speed up via concurrency? This might be particularly viable if you could accumulate the query points externally first.
- How many points are you querying against in a typical use case? The reason I ask is that you seem to be querying the tree against itself, which may benefit from using `query_ball_tree` instead.
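To make the second and third suggestions concrete, a toy illustration against SciPy's `cKDTree` API (data and radius are arbitrary):

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
points = rng.random((10_000, 2))
tree = cKDTree(points)

# Batch all query points into one call and spread it over all cores:
neighborhoods = tree.query_ball_point(points, r=0.01, workers=-1)

# Or query the tree against itself in a single call:
neighborhoods = tree.query_ball_tree(tree, r=0.01)
```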
Thanks for reviewing!
> Do you have any sample timings/examples demonstrating the bottleneck/% time spent on this line for a typical use case?
Honestly, I haven't. Since the transition to Meson, I'm not sure how to profile, debug, or linetrace Cython files. Instead, I tried to linetrace via a notebook, without success. The approach suggested by Cython's docs
Details
```python
import pstats, cProfile
import pyximport

pyximport.install()

import matplotlib.pyplot as plt
import skimage as ski

# Extract foreground by thresholding an image taken by the Hubble Telescope
image = ski.color.rgb2gray(ski.data.hubble_deep_field())
foreground = image > ski.filters.threshold_li(image)
objects = ski.measure.label(foreground)

cProfile.runctx(
    "ski.morphology.remove_near_objects(objects, min_distance=5)",
    globals(),
    locals(),
    "Profile.prof",
)

s = pstats.Stats("Profile.prof")
s.strip_dirs().sort_stats("time").print_stats()
```
also doesn't work; it doesn't detect the `_remove_near_objects` stuff. Do you have a solution to profile / debug Meson + Cython on SciPy's side? 🙏 I took a quick look at your repo but didn't find anything.
> Any chance you could use the `workers` argument to speed up via concurrency? This might be particularly viable if you could accumulate the query points externally first.
Hmm, I thought about it, but with the current logic this is tricky to parallelize. It may work if the queries for a single object are combined. I'll look into it!
> How many points are you querying against in a typical use case? The reason I ask is that you seem to be querying the tree against itself, which may benefit from using `query_ball_tree` instead.
The current implementation has to query once for every pixel that is on an object's boundary / surface. Objects that were already removed in earlier iterations are skipped. For the case demonstrated in the new gallery example of this PR, I call `query_ball_point` 2,636 times (out of 20,505 samples, ~13%). This heavily depends on the `min_distance` used and on the objects' surface-to-total-size ratio.
I tried `query_ball_tree` and `sparse_distance_matrix`, but this approach quickly runs into memory issues.
For the following case

```python
import numpy as np
import skimage as ski

image = ski.color.rgb2gray(ski.data.hubble_deep_field())
foreground1 = image > ski.filters.threshold_li(image)  # many small objects
foreground2 = np.ones_like(foreground1)
foreground2[400, :] = 0  # two very large objects
foreground = np.concatenate([foreground1, foreground2])
objects = ski.measure.label(foreground)

%timeit ski.morphology.remove_near_objects(objects, min_distance=5)
%timeit ski.morphology.remove_near_objects(objects, min_distance=50)
%timeit ski.morphology.remove_near_objects(objects, min_distance=100)
%timeit ski.morphology.remove_near_objects(objects, min_distance=300)
```
querying an entire object's indices at once with `kdtree.query_ball_point(kdtree.data[start:stop], ...)`,
Patch
```diff
Subject: [PATCH] Query indices with the same object ID at once
---
Index: skimage/morphology/_misc_cy.pyx
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
diff --git a/skimage/morphology/_misc_cy.pyx b/skimage/morphology/_misc_cy.pyx
--- a/skimage/morphology/_misc_cy.pyx	(revision c8756cae67078c1cfb05763a503792e05e842aef)
+++ b/skimage/morphology/_misc_cy.pyx	(date 1713872591799)
@@ -46,34 +46,45 @@
         The shape of the unraveled `image`.
     """
     cdef:
-        Py_ssize_t i_indices, j_indices  # Loop variables to index `indices`
+        Py_ssize_t i_indices, j_indices, start, stop  # Loop variables to index `indices`
         Py_ssize_t i_out  # Loop variable to index `out`
-        np_anyint object_id, other_id
-        list neighborhood
+        np_anyint start_id, stop_id
         set remembered_ids
+        list n

     remembered_ids = set()
-    for i_indices in range(border_indices.shape[0]):
-        i_out = border_indices[i_indices]
-        object_id = out[i_out]
-        # Skip if sample is part of a removed object
-        if object_id == 0:
+    start = 0
+    start_id = out[border_indices[start]]
+    for stop in range(border_indices.shape[0]):
+        i_out = border_indices[stop]
+        stop_id = out[i_out]
+
+        if start_id == stop_id:
+            continue
+        elif start_id == 0:
+            start = stop
+            start_id = stop_id
             continue
         neighborhood = kdtree.query_ball_point(
-            kdtree.data[i_indices, ...],
+            kdtree.data[start:stop, ...],
             r=min_distance,
             p=p_norm,
         )
-        for j_indices in neighborhood:
-            # Check object IDs in neighborhood
-            other_id = out[border_indices[j_indices]]
-            if other_id != 0 and other_id != object_id:
-                # If neighbor ID wasn't already removed or is the current one
-                # remove the boundary and remember the ID
-                _remove_object(out, border_indices, j_indices)
-                remembered_ids.add(other_id)
+
+        for n in neighborhood:
+            for j_indices in n:
+                # Check object IDs in neighborhood
+                other_id = out[border_indices[j_indices]]
+                if other_id != 0 and other_id != start_id:
+                    # If neighbor ID wasn't already removed or is the current one
+                    # remove the boundary and remember the ID
+                    _remove_object(out, border_indices, j_indices)
+                    remembered_ids.add(other_id)
+        start = stop
+        start_id = out[border_indices[start]]
+
     # Delete inner parts of remembered objects
     for j_indices in range(inner_indices.shape[0]):
         object_id = out[inner_indices[j_indices]]
```
is faster for `min_distance=5` and `50`, but a lot slower once `min_distance` gets larger. Without profiling I'm not really sure about the reason. The implementation that queries single points only still seems to be the fastest overall.
Using `workers` seems to introduce an overhead that makes it slower in most cases, regardless of the specific implementation.
Nice implementation!
I left mostly comments around documentation. And a general question: would it make sense to combine object removal by size into a more general `filter_objects`?
Emscripten failure: https://github.com/scikit-image/scikit-image/actions/runs/9453937550/job/26040373784?pr=4165#step:8:548 That is NumPy 1.26, which I think we also test on in other targets 🤔
Co-authored-by: Stefan van der Walt <sjvdwalt@gmail.com>
Co-authored-by: Stefan van der Walt <sjvdwalt@gmail.com>
@stefanv, I think I addressed most of your concerns. Sadly I don't have a clue what the Pyodide-specific failures are about. I also don't know an easy way to debug this locally yet.
Hey @agriyakhetarpal, could I perhaps summon you in case you have an idea at first glance? I/we can't figure out why NumPy is acting up for the Emscripten job but nowhere else.
Hi, @lagru – thanks for mentioning me, and please feel free to mention me on further issues that arise; I fully intend to continue maintenance and help out after establishing this workflow. :) I had a cursory look and the issue seems to be something that I have encountered before. A stopgap-like fix is to convert the data to …
@seberg I believe this is a bug in NumPy on 32-bit platforms? In the meantime, we can work around it with a special …
Not sure it is a bug, but …
Thanks for looking at this, Sebastian. The fix isn't hard, but NumPy does not document the `intp` requirement as far as I can tell, nor does it throw a helpful error message.
Tried to handle this in a more general sense. Not sure if this is the best option though.
skimage/morphology/misc.py
Outdated
```python
if out_raveled.dtype.itemsize > np.intp().itemsize:
    # bincount expects intp (e.g. 32-bit on WASM), so down-cast to that
    priority = np.bincount(out_raveled.astype(np.intp, copy=False))
else:
    priority = np.bincount(out_raveled)
```
Okay, I tried to address the WASM bincount failure in e6446d6. Not sure if this is the best approach though. I wanted to avoid unnecessary up-casting of int8 etc.
Thanks for chiming in @agriyakhetarpal & @seberg.
That didn't work. d749dd3 works though.
Out of curiosity, what if the values are too large to be cast down to int32?
I'm confused — aren't there non-wasm platforms where this would be an issue? Or is it because we no longer support 32-bit win or linux that we can get away with just is_wasm?
Yeah, this just force-casts, which is unfortunately something that indexing-like operations do in NumPy (they use same-kind), so if you manage to end up with a too-large value, that could silently do the wrong thing (although I guess it's likely that even after the force cast you have huge values that cause an out-of-memory error).
If floats might come around, using `casting="same_kind"` would be better; you can do this unconditionally really, that is what the `copy=False` is for.
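If I read that correctly, the suggestion would look roughly like this (a sketch, not the committed fix):

```python
import numpy as np

out_raveled = np.arange(10, dtype=np.int64)
# "same_kind" still allows the integer down-cast int64 -> int32, but would
# reject e.g. float64 -> intp, so accidental float input fails loudly:
priority = np.bincount(
    out_raveled.astype(np.intp, casting="same_kind", copy=False)
)
```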
```python
if (spacing is None or np.all(spacing[0] == spacing)) and p_norm <= 2:
    # For unity spacing we can make the borders more sparse by using a
    # lower connectivity
    footprint = ndi.generate_binary_structure(out.ndim, 1)
else:
    footprint = ndi.generate_binary_structure(out.ndim, out.ndim)
border = (
    ndi.maximum_filter(out, footprint=footprint)
    != ndi.minimum_filter(out, footprint=footprint)
).ravel()[indices]
```
Can you explain why the connectivity is determined by the spacing and p-norm? I actually would have thought that the connectivity should be a user parameter and should default to 1 regardless of spacing...
I can't yet. I wasn't sure if there are edge cases for `spacing` and `p_norm` where using a higher-connected footprint would return the wrong results. So I opted to only use the lower connectivity for cases where I was sure that it was correct.
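For context, this is what the two footprints look like in 2D:

```python
from scipy import ndimage as ndi

ndi.generate_binary_structure(2, 1)   # connectivity 1, "cross"
# array([[False,  True, False],
#        [ True,  True,  True],
#        [False,  True, False]])

ndi.generate_binary_structure(2, 2)   # full connectivity, "square"
# array([[ True,  True,  True],
#        [ True,  True,  True],
#        [ True,  True,  True]])
```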
```python
if object_id != 0 and object_id in remembered_ids:
    _remove_object(out, inner_indices, j_indices)
```
Is this fast?
I have an alternate implementation for this section that may be worth exploring:
(1) build a coo matrix -> csr matrix mapping object ids (rows) to raveled indices (columns). This can be done with:

```python
sz = label_image.size
coo = sparse.coo_matrix(
    (np.ones(sz), (label_image.ravel(), np.arange(sz))),
    shape=(np.max(label_image), sz + 1),
)
csr = coo.tocsr()
```

(Bonus: `csr.sum(axis=1)` is fast and gives you the bincount)

(2) Finding all the indices belonging to label i is then O(1) and they are adjacent in memory: `idxs = csr.indices[csr.indptr[i]:csr.indptr[i+1]]`
(Indeed I expect you should be able to iterate over removed indices in Python rather than Cython without much performance cost, since typically the size of the indices is much greater than the number of labels to be removed.)
Indeed, re-reading the code, it seems you are iterating over all indices and checking whether they are in the remembered objects; even with a set and O(1) lookup, this seems less efficient than only iterating over remembered object IDs and instantly recalling the relevant indices. I think?
Sounds interesting! Let me explore this once I find the time!
Hmm, is there a typo in the code you are suggesting? Running
Details
```python
import numpy as np
import scipy as sp

label_image = np.array(
    [[8, 0, 0, 0, 0, 0, 0, 0, 0, 9, 9],
     [8, 8, 8, 0, 0, 0, 0, 0, 0, 9, 9],
     [0, 0, 0, 0, 0, 0, 0, 0, 9, 0, 0],
     [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
     [0, 0, 3, 0, 0, 0, 1, 0, 0, 0, 0],
     [2, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0],
     [0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0],
     [0, 0, 0, 0, 0, 0, 0, 0, 0, 7, 7]]
)
sz = label_image.size
coo = sp.sparse.coo_matrix(
    (np.ones(sz), (label_image.ravel(), np.arange(sz))),
    shape=(np.max(label_image), sz + 1),
)
csr = coo.tocsr()
```
gives

```
ValueError: axis 0 index 9 exceeds matrix dimension 9
```

when calling `coo_matrix`...?
Thanks, I managed to get it working with
Details
```diff
Subject: [PATCH] Use sparse matrix to remove objects
---
Index: skimage/morphology/_misc_cy.pyx
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
diff --git a/skimage/morphology/_misc_cy.pyx b/skimage/morphology/_misc_cy.pyx
--- a/skimage/morphology/_misc_cy.pyx	(revision d749dd316d40fd7d1ca71c74d170fd16b2c5143d)
+++ b/skimage/morphology/_misc_cy.pyx	(date 1718287031000)
@@ -18,6 +18,7 @@
     cnp.float64_t p_norm,
     cnp.float64_t min_distance,
     tuple shape,
+    csr,
 ):
     """Remove objects, in specified order, until remaining are a minimum distance apart.
@@ -70,15 +71,9 @@
             other_id = out[border_indices[j_indices]]
             if other_id != 0 and other_id != object_id:
                 # If neighbor ID wasn't already removed or is the current one
-                # remove the boundary and remember the ID
-                _remove_object(out, border_indices, j_indices)
-                remembered_ids.add(other_id)
+                # remove the boundary
+                out.base[csr.indices[csr.indptr[other_id]:csr.indptr[other_id + 1]]] = 0

-    # Delete inner parts of remembered objects
-    for j_indices in range(inner_indices.shape[0]):
-        object_id = out[inner_indices[j_indices]]
-        if object_id != 0 and object_id in remembered_ids:
-            _remove_object(out, inner_indices, j_indices)

 cdef inline _remove_object(
Index: skimage/morphology/misc.py
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
diff --git a/skimage/morphology/misc.py b/skimage/morphology/misc.py
--- a/skimage/morphology/misc.py	(revision d749dd316d40fd7d1ca71c74d170fd16b2c5143d)
+++ b/skimage/morphology/misc.py	(date 1718287524197)
@@ -2,6 +2,7 @@

 import numpy as np
 import functools
+import scipy as sp
 from scipy import ndimage as ndi
 from scipy.spatial import cKDTree
@@ -401,18 +402,22 @@
     border_indices = border_indices[np.argsort(out_raveled[border_indices])]
     inner_indices = inner_indices[np.argsort(out_raveled[inner_indices])]

-    if priority is None:
-        if is_wasm:
-            # bincount expects intp (32-bit) on WASM, so down-cast to that
-            priority = np.bincount(out_raveled.astype(np.intp, copy=False))
-        else:
-            priority = np.bincount(out_raveled)
     # `priority` can only be indexed by positive object IDs,
     # `border_indices` contains all unique sorted IDs so check the lowest / first
     smallest_id = out_raveled[border_indices[0]]
     if smallest_id < 0:
         raise ValueError(f"found object with negative ID {smallest_id!r}")
+    largest_id = out_raveled[border_indices[-1]]
+    coo = sp.sparse.coo_matrix(
+        (np.ones(out.size), (out_raveled, np.arange(out.size))),
+        shape=(largest_id + 1, out.size)
+    )
+    csr = coo.tocsr()
+
+    if priority is None:
+        priority = csr.sum(axis=1).base.ravel()
+
     try:
         # Sort by priority second using a stable sort to preserve the contiguous
         # sorting of objects. Because each pixel in an object has the same
@@ -447,6 +452,7 @@
         min_distance=min_distance,
         p_norm=p_norm,
         shape=label_image.shape,
+        csr=csr,
     )
     if out_raveled.base is not out:
```
However, benchmarking with

```python
import skimage as ski
import scipy as sp

image = ski.data.hubble_deep_field()
image = ski.color.rgb2gray(image)
objects = image > 0.18
label_image, _ = sp.ndimage.label(objects)

%timeit ski.morphology.remove_objects_by_distance(label_image, 1)
%timeit ski.morphology.remove_objects_by_distance(label_image, 5)
%timeit ski.morphology.remove_objects_by_distance(label_image, 100)
%timeit ski.morphology.remove_objects_by_distance(label_image, 500)
```
shows that the current Cython-loop-based approach is still faster:

- Current Cython loop
  - 364 ms ± 12.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
  - 346 ms ± 5.39 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
  - 198 ms ± 2.32 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
  - 236 ms ± 2.83 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
- Sparse-matrix approach
  - 372 ms ± 2.24 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
  - 351 ms ± 2.46 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
  - 215 ms ± 3.62 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
  - 245 ms ± 2.47 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

🤔
Might still be worth the switch though, as the logic should be way easier to follow...?
> the logic should be way easier to follow...?
Yes, and you're currently doing redundant work, because the csr approach is sorting the indices for you. So you should be able to get rid of the argsort calls. (But it might be a bit tricky to deal with border/inner. Should be doable, e.g. with a mask, I think.)
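A toy check of that property, using the corrected `shape` from the patch above (labels chosen arbitrarily):

```python
import numpy as np
from scipy import sparse

label_image = np.array([[0, 1, 1],
                        [2, 0, 2]])
sz = label_image.size
coo = sparse.coo_matrix(
    (np.ones(sz), (label_image.ravel(), np.arange(sz))),
    shape=(label_image.max() + 1, sz),
)
csr = coo.tocsr()

# Raveled indices of label 2, already grouped and sorted:
idxs = csr.indices[csr.indptr[2]:csr.indptr[3]]
print(idxs)  # [3 5]
```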
Okay, this may take me some time to figure out. Do you mind if we add that optimization in another PR? That way we can get this into the next release, which we want to do soon. This PR also has kind of a bad track record of getting stalled when attention shifts. 😅
Sure! Might be worth adding a `# TODO` comment to the code pointing to this discussion, but whatever works for you. 😊
Co-authored-by: Juan Nunez-Iglesias <jni@fastmail.com>
I am merging this and making a 0.24rc0 release. Thanks!
Description
Replaces #4024 with an implementation based on SciPy's cKDTree.
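For anyone landing here from the release notes, a minimal usage sketch of the merged function (mirroring the benchmark snippets above; the keyword name follows the calls shown there):

```python
import scipy as sp
import skimage as ski

image = ski.color.rgb2gray(ski.data.hubble_deep_field())
foreground = image > ski.filters.threshold_li(image)
label_image, _ = sp.ndimage.label(foreground)

# Remove the lower-priority (by default: smaller) of any two objects that
# lie closer than 10 pixels to each other.
result = ski.morphology.remove_objects_by_distance(label_image, min_distance=10)
```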
I propose to add a function that iterates over objects (connected pixels that are not zero) inside a ~~binary~~ image and removes neighboring objects until all remaining ones are more than a minimal Euclidean distance from each other.

~~Furthermore it moves the functions `_resolve_neighborhood`, `_set_edge_values_inplace` and `_fast_pad` into `morphology._util`. They are used for several nd-algorithms throughout the submodule so that seems like a sensible place to keep them.~~ Done in #4209

This function might be used to implement a distance "filtering" for functions like `local_maxima` as well (closes #3816).

Checklist
- `./doc/examples` for new features

Release note
For maintainers and optionally contributors, please refer to the instructions on how to document this PR for the release notes.