
Find objects bounding boxes #240

Merged: 24 commits, Dec 17, 2021

Conversation
Conversation

@GenevieveBuckley (Collaborator) commented Jul 22, 2021

Here's an alternative approach for finding bounding boxes for labelled objects within an image. Closes #96

This particular implementation would mean you don't need to know the maximum label number ahead of time (an advantage over #97).

The approach is roughly as described by @jni here:

  1. For each image chunk, find the bounding boxes for each integer label (except the zero background)
  2. Store this information in a dictionary or dataframe, so each integer label is matched to the bounding box.
  3. Merge all these together, accounting for the fact that some objects will span multiple image chunks.

@GenevieveBuckley (Collaborator, Author) commented Jul 22, 2021

There are still some flaws I can't quite work out. I'd appreciate thoughts here.

  • Problem 1: Right now I need to call compute twice at the end to see the result, so I think I've done something wrong. I've been careful to delay only a single function, so I'm not sure why this happens. Adding pdb statements isn't helping me drop into the problem areas to check.
  • Problem 2: I think we should be able to use Dask dataframes instead of pandas dataframes. But when I change _find_bounding_boxes to return a Dask dataframe with one partition, the later functions complain about not knowing what meta should be, and my attempts to add it have not been successful. Expand the details below to see the error message.
Details:
distributed.worker - WARNING - Compute Failed
Function:  combine
args:      (Dask DataFrame Structure:
                    0       1
npartitions=1                
111            object  object
222               ...     ...
Dask Name: from_pandas, 1 tasks, Dask DataFrame Structure:
                    0       1
npartitions=1                
222            object  object
333               ...     ...
Dask Name: from_pandas, 1 tasks, <function _combine_dataframes at 0x7fb7c8688e50>)
kwargs:    {}
Exception: ValueError('Metadata inference failed in `combine`.\n\nYou have supplied a custom function and Dask is unable to \ndetermine the type of output that that function returns. \n\nTo resolve this please provide a meta= keyword.\nThe docstring of the Dask function you ran should have more information.\n\nOriginal error is below:\n------------------------\nAttributeError("\'str\' object has no attribute \'start\'")\n\nTraceback:\n---------\n  File "/home/genevieve/anaconda3/envs/dask-image-dev-moderndask/lib/python3.8/site-packages/dask/dataframe/utils.py", line 176, in raise_on_meta_error\n    yield\n  File "/home/genevieve/anaconda3/envs/dask-image-dev-moderndask/lib/python3.8/site-packages/dask/dataframe/core.py", line 5612, in _emulate\n    return func(*_extract_meta(args, True), **_extract_meta(kwargs, True))\n  File "/home/genevieve/anaconda3/envs/dask-image-dev-moderndask/lib/python3.8/site-packages/dask/utils.py", line 963, in __call__\n    return getattr(obj, self.method)(*args, **kwargs)\n  File "/home/genevieve/anaconda3/envs/dask-image-dev-moderndask/lib/python3.8/site-packages/pandas/core/frame.py", line 6383, in combine\n    arr = func(series, otherSeries)\n  File "/tmp/ipykernel_283799/2299891482.py", line 54, in _combine_dataframes\n  File "/home/genevieve/anaconda3/envs/dask-image-dev-moderndask/lib/python3.8/site-packages/pandas/core/series.py", line 2929, in combine\n    new_values.append(func(lv, rv))\n  File "/tmp/ipykernel_283799/2299891482.py", line 48, in _combine_series\n')

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
~/anaconda3/envs/dask-image-dev-moderndask/lib/python3.8/site-packages/dask/dataframe/utils.py in raise_on_meta_error()
    175     try:
--> 176         yield
    177     except Exception as e:

~/anaconda3/envs/dask-image-dev-moderndask/lib/python3.8/site-packages/dask/dataframe/core.py in _emulate()
   5611     with raise_on_meta_error(funcname(func), udf=kwargs.pop("udf", False)):
-> 5612         return func(*_extract_meta(args, True), **_extract_meta(kwargs, True))
   5613 

~/anaconda3/envs/dask-image-dev-moderndask/lib/python3.8/site-packages/dask/utils.py in __call__()
    962     def __call__(self, obj, *args, **kwargs):
--> 963         return getattr(obj, self.method)(*args, **kwargs)
    964 

~/anaconda3/envs/dask-image-dev-moderndask/lib/python3.8/site-packages/pandas/core/frame.py in combine()
   6382 
-> 6383             arr = func(series, otherSeries)
   6384             arr = maybe_downcast_to_dtype(arr, new_dtype)

/tmp/ipykernel_283799/2299891482.py in _combine_dataframes()
     53 def _combine_dataframes(s1, s2):
---> 54     combined = s1.combine(s2, _combine_series)
     55     return combined

~/anaconda3/envs/dask-image-dev-moderndask/lib/python3.8/site-packages/pandas/core/series.py in combine()
   2928                 with np.errstate(all="ignore"):
-> 2929                     new_values.append(func(lv, rv))
   2930         else:

/tmp/ipykernel_283799/2299891482.py in _combine_series()
     47     else:
---> 48         start = min(a.start, b.start)
     49         stop = max(a.stop, b.stop)

AttributeError: 'str' object has no attribute 'start'

The above exception was the direct cause of the following exception:

ValueError                                Traceback (most recent call last)
/tmp/ipykernel_283799/2906121027.py in <module>
----> 1 result.compute().compute()

~/anaconda3/envs/dask-image-dev-moderndask/lib/python3.8/site-packages/dask/base.py in compute(self, **kwargs)
    283         dask.base.compute
    284         """
--> 285         (result,) = compute(self, traverse=False, **kwargs)
    286         return result
    287 

~/anaconda3/envs/dask-image-dev-moderndask/lib/python3.8/site-packages/dask/base.py in compute(*args, **kwargs)
    565         postcomputes.append(x.__dask_postcompute__())
    566 
--> 567     results = schedule(dsk, keys, **kwargs)
    568     return repack([f(r, *a) for r, (f, a) in zip(results, postcomputes)])
    569 

~/anaconda3/envs/dask-image-dev-moderndask/lib/python3.8/site-packages/distributed/client.py in get(self, dsk, keys, workers, allow_other_workers, resources, sync, asynchronous, direct, retries, priority, fifo_timeout, actors, **kwargs)
   2705                     should_rejoin = False
   2706             try:
-> 2707                 results = self.gather(packed, asynchronous=asynchronous, direct=direct)
   2708             finally:
   2709                 for f in futures.values():

~/anaconda3/envs/dask-image-dev-moderndask/lib/python3.8/site-packages/distributed/client.py in gather(self, futures, errors, direct, asynchronous)
   2019             else:
   2020                 local_worker = None
-> 2021             return self.sync(
   2022                 self._gather,
   2023                 futures,

~/anaconda3/envs/dask-image-dev-moderndask/lib/python3.8/site-packages/distributed/client.py in sync(self, func, asynchronous, callback_timeout, *args, **kwargs)
    860             return future
    861         else:
--> 862             return sync(
    863                 self.loop, func, *args, callback_timeout=callback_timeout, **kwargs
    864             )

~/anaconda3/envs/dask-image-dev-moderndask/lib/python3.8/site-packages/distributed/utils.py in sync(loop, func, callback_timeout, *args, **kwargs)
    336     if error[0]:
    337         typ, exc, tb = error[0]
--> 338         raise exc.with_traceback(tb)
    339     else:
    340         return result[0]

~/anaconda3/envs/dask-image-dev-moderndask/lib/python3.8/site-packages/distributed/utils.py in f()
    319             if callback_timeout is not None:
    320                 future = asyncio.wait_for(future, callback_timeout)
--> 321             result[0] = yield future
    322         except Exception:
    323             error[0] = sys.exc_info()

~/anaconda3/envs/dask-image-dev-moderndask/lib/python3.8/site-packages/tornado/gen.py in run(self)
    760 
    761                     try:
--> 762                         value = future.result()
    763                     except Exception:
    764                         exc_info = sys.exc_info()

~/anaconda3/envs/dask-image-dev-moderndask/lib/python3.8/site-packages/distributed/client.py in _gather(self, futures, errors, direct, local_worker)
   1884                             exc = CancelledError(key)
   1885                         else:
-> 1886                             raise exception.with_traceback(traceback)
   1887                         raise exc
   1888                     if errors == "skip":

~/anaconda3/envs/dask-image-dev-moderndask/lib/python3.8/site-packages/dask/utils.py in __call__()
    961 
    962     def __call__(self, obj, *args, **kwargs):
--> 963         return getattr(obj, self.method)(*args, **kwargs)
    964 
    965     def __reduce__(self):

~/anaconda3/envs/dask-image-dev-moderndask/lib/python3.8/site-packages/dask/dataframe/core.py in combine()
   2786     @derived_from(pd.DataFrame)
   2787     def combine(self, other, func, fill_value=None, overwrite=True):
-> 2788         return self.map_partitions(
   2789             M.combine, other, func, fill_value=fill_value, overwrite=overwrite
   2790         )

~/anaconda3/envs/dask-image-dev-moderndask/lib/python3.8/site-packages/dask/dataframe/core.py in map_partitions()
    689         None as the division.
    690         """
--> 691         return map_partitions(func, self, *args, **kwargs)
    692 
    693     @insert_meta_param_description(pad=12)

~/anaconda3/envs/dask-image-dev-moderndask/lib/python3.8/site-packages/dask/dataframe/core.py in map_partitions()
   5666         # Use non-normalized kwargs here, as we want the real values (not
   5667         # delayed values)
-> 5668         meta = _emulate(func, *args, udf=True, **kwargs)
   5669     else:
   5670         meta = make_meta(meta, index=meta_index, parent_meta=parent_meta)

~/anaconda3/envs/dask-image-dev-moderndask/lib/python3.8/site-packages/dask/dataframe/core.py in _emulate()
   5610     """
   5611     with raise_on_meta_error(funcname(func), udf=kwargs.pop("udf", False)):
-> 5612         return func(*_extract_meta(args, True), **_extract_meta(kwargs, True))
   5613 
   5614 

~/anaconda3/envs/dask-image-dev-moderndask/lib/python3.8/contextlib.py in __exit__()
    129                 value = type()
    130             try:
--> 131                 self.gen.throw(type, value, traceback)
    132             except StopIteration as exc:
    133                 # Suppress StopIteration *unless* it's the same exception that

~/anaconda3/envs/dask-image-dev-moderndask/lib/python3.8/site-packages/dask/dataframe/utils.py in raise_on_meta_error()
    195         )
    196         msg = msg.format(" in `{0}`".format(funcname) if funcname else "", repr(e), tb)
--> 197         raise ValueError(msg) from e
    198 
    199 

ValueError: Metadata inference failed in `combine`.

You have supplied a custom function and Dask is unable to 
determine the type of output that that function returns. 

To resolve this please provide a meta= keyword.
The docstring of the Dask function you ran should have more information.

Original error is below:
------------------------
AttributeError("'str' object has no attribute 'start'")

Traceback:
---------
  File "/home/genevieve/anaconda3/envs/dask-image-dev-moderndask/lib/python3.8/site-packages/dask/dataframe/utils.py", line 176, in raise_on_meta_error
    yield
  File "/home/genevieve/anaconda3/envs/dask-image-dev-moderndask/lib/python3.8/site-packages/dask/dataframe/core.py", line 5612, in _emulate
    return func(*_extract_meta(args, True), **_extract_meta(kwargs, True))
  File "/home/genevieve/anaconda3/envs/dask-image-dev-moderndask/lib/python3.8/site-packages/dask/utils.py", line 963, in __call__
    return getattr(obj, self.method)(*args, **kwargs)
  File "/home/genevieve/anaconda3/envs/dask-image-dev-moderndask/lib/python3.8/site-packages/pandas/core/frame.py", line 6383, in combine
    arr = func(series, otherSeries)
  File "/tmp/ipykernel_283799/2299891482.py", line 54, in _combine_dataframes
  File "/home/genevieve/anaconda3/envs/dask-image-dev-moderndask/lib/python3.8/site-packages/pandas/core/series.py", line 2929, in combine
    new_values.append(func(lv, rv))
  File "/tmp/ipykernel_283799/2299891482.py", line 48, in _combine_series

EDIT (re: problem 2):
Ah, now that I look closer at the Dask dataframe API, it seems the combine function isn't available in Dask. I'm going to move back to merge, which is what I'd been fiddling with before. An outer join is a good first step; then it's a question of applying a function to each row.
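The outer-join-then-row-wise idea can be sketched in plain pandas (column and function names here are hypothetical, not the PR's):

```python
import pandas as pd

def _merge_slices(a, b):
    # Union of two 1-d slices; either side may be missing (NaN) after
    # an outer join, in which case the other side wins.
    if not isinstance(a, slice):
        return b
    if not isinstance(b, slice):
        return a
    return slice(min(a.start, b.start), max(a.stop, b.stop))

# Per-chunk results: rows are integer labels, columns are axes.
left = pd.DataFrame({"dim0": [slice(0, 2)],
                     "dim1": [slice(0, 3)]}, index=[1])
right = pd.DataFrame({"dim0": [slice(1, 4), slice(2, 3)],
                      "dim1": [slice(1, 2), slice(0, 1)]}, index=[1, 2])

# Outer join aligns labels from both chunks, then a row-wise reduction
# takes the union of the two bounding slices along each axis.
joined = left.join(right, how="outer", lsuffix="_l", rsuffix="_r")
merged = pd.DataFrame({
    col: joined.apply(
        lambda row: _merge_slices(row[col + "_l"], row[col + "_r"]), axis=1
    )
    for col in ("dim0", "dim1")
})
```

Labels that appear in only one chunk survive the outer join unchanged, while labels spanning both chunks get the union box.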

@GenevieveBuckley commented:
Some other points (not problems, just things to note):

I think it's unavoidable that we'll have to use delayed.

We can't use the block_info keyword argument directly if map_blocks returns a non-array type of output, and we want to return a dictionary or dataframe. See details at dask/dask#7921. Even if that issue were fixed, we'd need a more modern version of dask for map_blocks to return dataframe output, and upgrading dask (e.g. to 2021.6.2) causes several other tests in ndmeasure to fail.
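A sketch of the delayed-based alternative: instead of relying on block_info, each block's offset can be computed from array.chunks and passed alongside the block (all helper names here are hypothetical):

```python
import numpy as np
import dask
import dask.array as da

@dask.delayed
def boxes_for_block(block, offset):
    # Bounding box, as global slices, for each nonzero label in one block.
    out = {}
    for label in np.unique(block):
        if label == 0:
            continue
        idx = np.nonzero(block == label)
        out[int(label)] = tuple(
            slice(int(i.min()) + o, int(i.max()) + 1 + o)
            for i, o in zip(idx, offset)
        )
    return out

image = da.from_array(np.arange(16).reshape(4, 4) % 3, chunks=2)

# Each block's start along an axis is the cumulative sum of the
# preceding chunk sizes along that axis.
starts = [np.cumsum((0,) + chunks[:-1]) for chunks in image.chunks]
tasks = []
for inds in np.ndindex(*image.numblocks):
    offset = tuple(int(starts[ax][i]) for ax, i in enumerate(inds))
    tasks.append(boxes_for_block(image.blocks[inds], offset))
results = dask.compute(*tasks)  # one label -> box dict per block
```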

Finally, I think it's better to avoid using the scipy.ndimage.find_objects function directly. If an image chunk contains just one object with a very high integer label n, the scipy find_objects result will contain n - 1 None values before the single meaningful result. That seems bad for parallelized applications, so I think looping through only the unique integer values present in a given image chunk is a better way to go.
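A small demonstration of the difference (illustrative only, not the PR's code):

```python
import numpy as np
import scipy.ndimage as ndi

chunk = np.zeros((4, 4), dtype=int)
chunk[1:3, 1:3] = 500  # one object with a high integer label

# scipy pads its result with None for every absent label below 500.
padded = ndi.find_objects(chunk)

# Looping over only the labels actually present avoids that padding.
boxes = {}
for label in np.unique(chunk):
    if label == 0:
        continue
    rows, cols = np.nonzero(chunk == label)
    boxes[int(label)] = (
        slice(int(rows.min()), int(rows.max()) + 1),
        slice(int(cols.min()), int(cols.max()) + 1),
    )
```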

@GenevieveBuckley commented:
Here's the task graph for a small example (not the one in the tests, but a small image of blobs with 18 objects spread over four image chunks).

[image: task graph]

@GenevieveBuckley commented:
And after the first compute call (why do I need to call compute twice? what am I doing wrong?), the task graph for the same example looks like this:

[image: task graph]

@m-albert (Collaborator) left a comment:

Hey Genevieve, I think this is great functionality for dask-image. I left you some comments in review boxes :)

Review comments were left on:
  • dask_image/ndmeasure/_utils/_find_objects.py
  • tests/test_dask_image/test_ndmeasure/test_find_objects.py
@GenevieveBuckley commented:

I'm hoping dask/dask#7851 isn't going to be a problem here (it might not be, but it's a good idea to try this on something of a decent size).

@GenevieveBuckley commented:

I'd forgotten we'd left this one hanging. Reviewing the thread, it seems like this is the blocker.

@GenevieveBuckley commented:

Found and fixed the bug holding us up. Will merge today.

@GenevieveBuckley commented:

Final note: this new function is not compatible with CuPy. That'll be a thing to work on later; see #253.

Successfully merging this pull request may close these issues:

Add find_objects