Find objects bounding boxes #240
There are still some flaws I can't quite work out. I'd appreciate thoughts here.
Details:
distributed.worker - WARNING - Compute Failed
Function: combine
args: (Dask DataFrame Structure:
0 1
npartitions=1
111 object object
222 ... ...
Dask Name: from_pandas, 1 tasks, Dask DataFrame Structure:
0 1
npartitions=1
222 object object
333 ... ...
Dask Name: from_pandas, 1 tasks, <function _combine_dataframes at 0x7fb7c8688e50>)
kwargs: {}
Exception: ValueError('Metadata inference failed in `combine`.\n\nYou have supplied a custom function and Dask is unable to \ndetermine the type of output that that function returns. \n\nTo resolve this please provide a meta= keyword.\nThe docstring of the Dask function you ran should have more information.\n\nOriginal error is below:\n------------------------\nAttributeError("\'str\' object has no attribute \'start\'")\n\nTraceback:\n---------\n File "/home/genevieve/anaconda3/envs/dask-image-dev-moderndask/lib/python3.8/site-packages/dask/dataframe/utils.py", line 176, in raise_on_meta_error\n yield\n File "/home/genevieve/anaconda3/envs/dask-image-dev-moderndask/lib/python3.8/site-packages/dask/dataframe/core.py", line 5612, in _emulate\n return func(*_extract_meta(args, True), **_extract_meta(kwargs, True))\n File "/home/genevieve/anaconda3/envs/dask-image-dev-moderndask/lib/python3.8/site-packages/dask/utils.py", line 963, in __call__\n return getattr(obj, self.method)(*args, **kwargs)\n File "/home/genevieve/anaconda3/envs/dask-image-dev-moderndask/lib/python3.8/site-packages/pandas/core/frame.py", line 6383, in combine\n arr = func(series, otherSeries)\n File "/tmp/ipykernel_283799/2299891482.py", line 54, in _combine_dataframes\n File "/home/genevieve/anaconda3/envs/dask-image-dev-moderndask/lib/python3.8/site-packages/pandas/core/series.py", line 2929, in combine\n new_values.append(func(lv, rv))\n File "/tmp/ipykernel_283799/2299891482.py", line 48, in _combine_series\n')
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
~/anaconda3/envs/dask-image-dev-moderndask/lib/python3.8/site-packages/dask/dataframe/utils.py in raise_on_meta_error()
175 try:
--> 176 yield
177 except Exception as e:
~/anaconda3/envs/dask-image-dev-moderndask/lib/python3.8/site-packages/dask/dataframe/core.py in _emulate()
5611 with raise_on_meta_error(funcname(func), udf=kwargs.pop("udf", False)):
-> 5612 return func(*_extract_meta(args, True), **_extract_meta(kwargs, True))
5613
~/anaconda3/envs/dask-image-dev-moderndask/lib/python3.8/site-packages/dask/utils.py in __call__()
962 def __call__(self, obj, *args, **kwargs):
--> 963 return getattr(obj, self.method)(*args, **kwargs)
964
~/anaconda3/envs/dask-image-dev-moderndask/lib/python3.8/site-packages/pandas/core/frame.py in combine()
6382
-> 6383 arr = func(series, otherSeries)
6384 arr = maybe_downcast_to_dtype(arr, new_dtype)
/tmp/ipykernel_283799/2299891482.py in _combine_dataframes()
53 def _combine_dataframes(s1, s2):
---> 54 combined = s1.combine(s2, _combine_series)
55 return combined
~/anaconda3/envs/dask-image-dev-moderndask/lib/python3.8/site-packages/pandas/core/series.py in combine()
2928 with np.errstate(all="ignore"):
-> 2929 new_values.append(func(lv, rv))
2930 else:
/tmp/ipykernel_283799/2299891482.py in _combine_series()
47 else:
---> 48 start = min(a.start, b.start)
49 stop = max(a.stop, b.stop)
AttributeError: 'str' object has no attribute 'start'
The above exception was the direct cause of the following exception:
ValueError Traceback (most recent call last)
/tmp/ipykernel_283799/2906121027.py in <module>
----> 1 result.compute().compute()
~/anaconda3/envs/dask-image-dev-moderndask/lib/python3.8/site-packages/dask/base.py in compute(self, **kwargs)
283 dask.base.compute
284 """
--> 285 (result,) = compute(self, traverse=False, **kwargs)
286 return result
287
~/anaconda3/envs/dask-image-dev-moderndask/lib/python3.8/site-packages/dask/base.py in compute(*args, **kwargs)
565 postcomputes.append(x.__dask_postcompute__())
566
--> 567 results = schedule(dsk, keys, **kwargs)
568 return repack([f(r, *a) for r, (f, a) in zip(results, postcomputes)])
569
~/anaconda3/envs/dask-image-dev-moderndask/lib/python3.8/site-packages/distributed/client.py in get(self, dsk, keys, workers, allow_other_workers, resources, sync, asynchronous, direct, retries, priority, fifo_timeout, actors, **kwargs)
2705 should_rejoin = False
2706 try:
-> 2707 results = self.gather(packed, asynchronous=asynchronous, direct=direct)
2708 finally:
2709 for f in futures.values():
~/anaconda3/envs/dask-image-dev-moderndask/lib/python3.8/site-packages/distributed/client.py in gather(self, futures, errors, direct, asynchronous)
2019 else:
2020 local_worker = None
-> 2021 return self.sync(
2022 self._gather,
2023 futures,
~/anaconda3/envs/dask-image-dev-moderndask/lib/python3.8/site-packages/distributed/client.py in sync(self, func, asynchronous, callback_timeout, *args, **kwargs)
860 return future
861 else:
--> 862 return sync(
863 self.loop, func, *args, callback_timeout=callback_timeout, **kwargs
864 )
~/anaconda3/envs/dask-image-dev-moderndask/lib/python3.8/site-packages/distributed/utils.py in sync(loop, func, callback_timeout, *args, **kwargs)
336 if error[0]:
337 typ, exc, tb = error[0]
--> 338 raise exc.with_traceback(tb)
339 else:
340 return result[0]
~/anaconda3/envs/dask-image-dev-moderndask/lib/python3.8/site-packages/distributed/utils.py in f()
319 if callback_timeout is not None:
320 future = asyncio.wait_for(future, callback_timeout)
--> 321 result[0] = yield future
322 except Exception:
323 error[0] = sys.exc_info()
~/anaconda3/envs/dask-image-dev-moderndask/lib/python3.8/site-packages/tornado/gen.py in run(self)
760
761 try:
--> 762 value = future.result()
763 except Exception:
764 exc_info = sys.exc_info()
~/anaconda3/envs/dask-image-dev-moderndask/lib/python3.8/site-packages/distributed/client.py in _gather(self, futures, errors, direct, local_worker)
1884 exc = CancelledError(key)
1885 else:
-> 1886 raise exception.with_traceback(traceback)
1887 raise exc
1888 if errors == "skip":
~/anaconda3/envs/dask-image-dev-moderndask/lib/python3.8/site-packages/dask/utils.py in __call__()
961
962 def __call__(self, obj, *args, **kwargs):
--> 963 return getattr(obj, self.method)(*args, **kwargs)
964
965 def __reduce__(self):
~/anaconda3/envs/dask-image-dev-moderndask/lib/python3.8/site-packages/dask/dataframe/core.py in combine()
2786 @derived_from(pd.DataFrame)
2787 def combine(self, other, func, fill_value=None, overwrite=True):
-> 2788 return self.map_partitions(
2789 M.combine, other, func, fill_value=fill_value, overwrite=overwrite
2790 )
~/anaconda3/envs/dask-image-dev-moderndask/lib/python3.8/site-packages/dask/dataframe/core.py in map_partitions()
689 None as the division.
690 """
--> 691 return map_partitions(func, self, *args, **kwargs)
692
693 @insert_meta_param_description(pad=12)
~/anaconda3/envs/dask-image-dev-moderndask/lib/python3.8/site-packages/dask/dataframe/core.py in map_partitions()
5666 # Use non-normalized kwargs here, as we want the real values (not
5667 # delayed values)
-> 5668 meta = _emulate(func, *args, udf=True, **kwargs)
5669 else:
5670 meta = make_meta(meta, index=meta_index, parent_meta=parent_meta)
~/anaconda3/envs/dask-image-dev-moderndask/lib/python3.8/site-packages/dask/dataframe/core.py in _emulate()
5610 """
5611 with raise_on_meta_error(funcname(func), udf=kwargs.pop("udf", False)):
-> 5612 return func(*_extract_meta(args, True), **_extract_meta(kwargs, True))
5613
5614
~/anaconda3/envs/dask-image-dev-moderndask/lib/python3.8/contextlib.py in __exit__()
129 value = type()
130 try:
--> 131 self.gen.throw(type, value, traceback)
132 except StopIteration as exc:
133 # Suppress StopIteration *unless* it's the same exception that
~/anaconda3/envs/dask-image-dev-moderndask/lib/python3.8/site-packages/dask/dataframe/utils.py in raise_on_meta_error()
195 )
196 msg = msg.format(" in `{0}`".format(funcname) if funcname else "", repr(e), tb)
--> 197 raise ValueError(msg) from e
198
199
ValueError: Metadata inference failed in `combine`.
You have supplied a custom function and Dask is unable to
determine the type of output that that function returns.
To resolve this please provide a meta= keyword.
The docstring of the Dask function you ran should have more information.
Original error is below:
------------------------
AttributeError("'str' object has no attribute 'start'")
Traceback:
---------
File "/home/genevieve/anaconda3/envs/dask-image-dev-moderndask/lib/python3.8/site-packages/dask/dataframe/utils.py", line 176, in raise_on_meta_error
yield
File "/home/genevieve/anaconda3/envs/dask-image-dev-moderndask/lib/python3.8/site-packages/dask/dataframe/core.py", line 5612, in _emulate
return func(*_extract_meta(args, True), **_extract_meta(kwargs, True))
File "/home/genevieve/anaconda3/envs/dask-image-dev-moderndask/lib/python3.8/site-packages/dask/utils.py", line 963, in __call__
return getattr(obj, self.method)(*args, **kwargs)
File "/home/genevieve/anaconda3/envs/dask-image-dev-moderndask/lib/python3.8/site-packages/pandas/core/frame.py", line 6383, in combine
arr = func(series, otherSeries)
File "/tmp/ipykernel_283799/2299891482.py", line 54, in _combine_dataframes
File "/home/genevieve/anaconda3/envs/dask-image-dev-moderndask/lib/python3.8/site-packages/pandas/core/series.py", line 2929, in combine
new_values.append(func(lv, rv))
File "/tmp/ipykernel_283799/2299891482.py", line 48, in _combine_series
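A note on reading the traceback above: the `'str' object has no attribute 'start'` error looks consistent with Dask's metadata inference calling the combining function on placeholder data, where object columns hold dummy strings rather than `slice` objects. The sketch below is plain Python, not the code from this PR. It reproduces only the two lines visible in the traceback (`min` of the starts, `max` of the stops); the names `combine_slices` and `combine_bboxes` and the `None` handling are assumptions for illustration.

```python
def combine_slices(a, b):
    """Return the smallest slice that covers both a and b.

    Treating None as "no bounding box in this block" is an assumption;
    the original branch condition is not visible in the traceback.
    """
    if a is None:
        return b
    if b is None:
        return a
    start = min(a.start, b.start)
    stop = max(a.stop, b.stop)
    return slice(start, stop)


def combine_bboxes(box_a, box_b):
    """Combine two bounding boxes given as tuples of slices, one per axis."""
    return tuple(combine_slices(a, b) for a, b in zip(box_a, box_b))


# Two partial boxes for the same label, found in neighbouring blocks.
merged = combine_bboxes(
    (slice(0, 5), slice(2, 4)),
    (slice(3, 9), slice(0, 3)),
)

# A string placeholder (as used when inferring metadata for object
# columns) fails in the same way as the traceback.
try:
    combine_slices("foo", "foo")
except AttributeError as err:
    placeholder_error = str(err)
```

If this reading is right, passing an explicit `meta=` (as the error message suggests) would avoid running the function on placeholder values at all.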
EDIT: Re problem 2
Some other points (not problems, just things to note):

I think it's unavoidable that we'll have to use delayed. We can't use the block_info keyword argument directly if map_blocks returns a non-array type of output, and we want to return a dictionary or dataframe. See details at dask/dask#7921. Additionally, even if we do fix this issue, we'd need a more modern version of dask to get map_blocks to return dataframe output. But upgrading the version of dask (e.g. to 2021.6.2) causes several other tests in ndmeasure to fail.

Second, I think it's better to avoid using the
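To illustrate the point about doing without block_info: when each block is processed through delayed, the block's position in the full array has to be worked out by hand. A minimal sketch of that bookkeeping follows, assuming chunk sizes in the same tuple-of-tuples layout as a dask array's `.chunks`. The helper names `block_offsets` and `shift_bbox` are hypothetical and not part of dask or dask-image.

```python
from itertools import accumulate, product


def block_offsets(chunks):
    """Yield (block_index, offset) for every block.

    `chunks` is a tuple of per-axis chunk sizes, e.g. ((2, 3), (4,)).
    The offset is the position of the block's first element in the
    full array.
    """
    # Start position of each chunk along each axis: 0, c0, c0 + c1, ...
    starts_per_axis = [[0] + list(accumulate(c))[:-1] for c in chunks]
    index_ranges = [range(len(c)) for c in chunks]
    for block_index in product(*index_ranges):
        offset = tuple(
            starts[i] for starts, i in zip(starts_per_axis, block_index)
        )
        yield block_index, offset


def shift_bbox(bbox, offset):
    """Translate a block-local bounding box into whole-array coordinates."""
    return tuple(
        slice(s.start + o, s.stop + o) for s, o in zip(bbox, offset)
    )


offsets = list(block_offsets(((2, 3), (4,))))
global_bbox = shift_bbox((slice(0, 2), slice(1, 3)), (2, 0))
```

Each delayed task could then be handed its offset alongside its block, so the per-block results come back already in global coordinates.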
Hey Genevieve, I think this is great functionality for dask-image. I left you some comments in review boxes :)
I'm hoping dask/dask#7851 isn't going to be a problem here (it might not be, but it's a good idea to try this on something of a decent size).
I'd forgotten we'd left this one hanging. Reviewing the thread, it seems like this is the blocker.
Found and fixed the bug holding us up. Will merge today.
Final note: this new function is not compatible with cupy. That'll be a thing to work on later; see #253.
Here's an alternative approach for finding bounding boxes for labelled objects within an image. Closes #96
This particular implementation would mean you don't need to know the maximum label number ahead of time (an advantage over #97).
The approach is roughly as described by @jni here: