Dask backend execution #2557
Conversation
Looks like there's a join test that's flaky and some tests in ibis/test/all that are having some trouble with dask DataFrames - will take a look.
So as I look through, I am seeing a lot of code copying. So, 2 choices here:
you can simply import the actual pandas implementation that is generic (e.g. literally import the function) and then just call it within the dask-registered function
OR
move the implementation to a common location & register both pandas and dask.
Both of these solutions are fine. What I would do is basically merge all the tests, but xfail them, in a precursor PR. Then as ops are defined, the xfails can be removed.
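A minimal sketch of the first option (delegation), using hypothetical function names rather than ibis's real module layout: the dask-registered function simply imports and calls the generic pandas implementation, which only uses Series-like methods and so works unchanged on a dd.Series.

```python
import pandas as pd

# Hypothetical generic implementation living in the pandas backend;
# it only touches Series-like methods, so a dd.Series works too.
def execute_string_length_pandas(op, data, **kwargs):
    return data.str.len()

# The dask backend registers its own function that just delegates
# (the @execute_node.register decorator is omitted in this sketch).
def execute_string_length_dask(op, data, **kwargs):
    return execute_string_length_pandas(op, data, **kwargs)

result = execute_string_length_dask(None, pd.Series(["ab", "c"]))
print(list(result))  # [2, 1]
```

The second option is the same idea with the shared function moved to a common module and registered once per backend type.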
ibis/backends/dask/execution/maps.py (outdated)
    return data.map(get)

    # Note - to avoid dispatch ambiguities we must unregister pandas
really? why? these are dd.Series, so that should override here? (now a way around this is to, in a common location, simply register with both types, e.g. in a dask/pandas ops section). we can do that as a followup, though maybe makes sense to split it out now (before this PR).
If you don't unregister these, https://github.com/ibis-project/ibis/blob/master/ibis/backends/pandas/tests/test_core.py#L44 will fail. If I understand the issue correctly, it is that multipledispatch sees

    (ops.MapValueOrDefaultForKey, collections.abc.Mapping, pandas.Series, object,)
    (ops.MapValueOrDefaultForKey, collections.abc.Mapping, object, pandas.Series,)
    (ops.MapValueOrDefaultForKey, collections.abc.Mapping, dd.Series, object,)
    (ops.MapValueOrDefaultForKey, collections.abc.Mapping, object, dd.Series,)

as an ambiguous set of registrations, via https://multiple-dispatch.readthedocs.io/en/latest/resolution.html#ambiguities.
hmm, this looks like we should actually just register both (e.g. do this as a common set).
though why are these ambiguous?
Good question - that's the source of the failing pandas test as well. I think what is happening is multipledispatch sees two ambiguities:

    (ops.MapValueOrDefaultForKey, collections.abc.Mapping, object, pandas.Series,)
    (ops.MapValueOrDefaultForKey, collections.abc.Mapping, dd.Series, object,)

and separately

    (ops.MapValueOrDefaultForKey, collections.abc.Mapping, pandas.Series, object,)
    (ops.MapValueOrDefaultForKey, collections.abc.Mapping, object, dd.Series,)

multipledispatch sees this as ambiguous because it won't know what to do if you call this with a (pandas.Series, dd.Series) pair of arguments. I've changed the registration around in c125fe7 to both avoid the ambiguity and keep dispatch working correctly in pandas (even if you have also loaded the dask backend at the same time). I tried to clarify in the code why we were doing it this way.
Let me know if this was a reasonable approach and the commentary is clear, or you want this done a different way.
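The ambiguity can be sketched in plain Python with stand-in classes (hypothetical, not the real pandas/dask types): both registered signatures match a (pandas.Series, dd.Series) call, but neither is strictly more specific than the other, so the resolver cannot pick one.

```python
class PandasSeries:   # stand-in for pandas.Series
    pass

class DaskSeries:     # stand-in for dd.Series
    pass

def matches(call_sig, registered_sig):
    # a call matches a registration if each argument type is a subclass
    return all(issubclass(c, r) for c, r in zip(call_sig, registered_sig))

def more_specific(sig_a, sig_b):
    # sig_a dominates sig_b only if every slot of sig_a is a subclass
    # of the corresponding slot of sig_b
    return all(issubclass(a, b) for a, b in zip(sig_a, sig_b))

sig1 = (PandasSeries, object)    # like (pandas.Series, object)
sig2 = (object, DaskSeries)      # like (object, dd.Series)
call = (PandasSeries, DaskSeries)

assert matches(call, sig1) and matches(call, sig2)  # both apply...
assert not more_specific(sig1, sig2)                # ...but neither
assert not more_specific(sig2, sig1)                # dominates: ambiguous
```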
RE the xfail at: https://github.com/gerrymanoim/ibis/blob/ibis-dask-execution/ibis/backends/dask/tests/test_datatypes.py#L89, the issue here is https://github.com/ibis-project/ibis/blob/master/ibis/expr/types.py#L1256 imports pandas directly, detects …
just create an issue for now. need to re-write this with a single dispatch rule to handle this instead of the way it's done.
Discovered an issue with how generic ibis tests were being run on the dask backend. A lot of these newer test failures have to do with how the tests themselves are written. For example, https://github.com/gerrymanoim/ibis/blob/ibis-dask-execution/ibis/tests/all/test_string.py#L17 has the line df.string_col.map(is_text_type).all(), which is not a valid dask expression (the valid one would be …). Not sure what the best option is for these tests.
the …
Running the code there gives you … for example. Another example: several tests use the fixture …
Originally I was going to break up this PR, but a difficulty is that tests depend on lots of different pieces from execution, so I ended up coding up most of it just to get … I think my next steps here are: …
Does that sound like a reasonable plan?
plan sounds good.
    replacement
    if isnull(value)
    else dd.from_pandas(
        pd.Series(value, index=replacement.index), npartitions=1
How does this work? Isn't replacement.index a distributed index? Will this trigger any data materialization?
This is a good question - I think this is done lazily:

    In [34]: dask_base
    Out[34]:
    Dask Series Structure:
    npartitions=1
    0    datetime64[ns]
    2               ...
    dtype: datetime64[ns]
    Dask Name: from_pandas, 1 tasks

    In [32]: dd.from_pandas(pd.Series(5, index=dask_base.index), npartitions=1)
    Out[32]:
    Dask Series Structure:
    npartitions=1
    0    int64
    2      ...
    dtype: int64
    Dask Name: from_pandas, 1 tasks

    In [33]: dd.from_pandas(pd.Series(5, index=dask_base.index), npartitions=1).compute()
    Out[33]:
    0    5
    1    5
    2    5
    dtype: int64
a lot of operations like this will trigger materialization at times. this is ok. it's on the worker in a partition. it's usually much easier to just do it this way.
What is the type of the value and replacement here? Also, why do we return a series of a single partition? Should this method return the same partitions as the input dd.Series?
I should express the partitions in chunks rather than having a static number - fixing.
        raw_1d.astype(constants.IBIS_TYPE_TO_PANDAS_TYPE[expr.type()])
    )
    # TODO - we force computation here
    if isinstance(expr, ir.ScalarExpr) and result.size.compute() == 1:
Why are we forcing computation here?
We need to actually know the result.size to do the boolean test. I tried to be explicit about places where we're forcing compute so we can come back and adjust them if we find better ways of lazily doing these checks.
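As a toy illustration (a pure-Python stand-in, not dask) of why the explicit compute() is needed: a lazy wrapper can't participate in the boolean test until it is evaluated.

```python
class LazyScalar:
    """Toy stand-in for a lazy result such as dd.Series.size."""
    def __init__(self, value):
        self._value = value

    def compute(self):
        # in dask this would run the task graph; here it just returns
        return self._value

result_size = LazyScalar(1)

# comparing the lazy wrapper directly compares objects, not values,
# so the branch would never be taken
assert (result_size == 1) is False

# forcing computation yields the concrete value the branch needs
assert result_size.compute() == 1
```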
kk sounds fine. We can look later. Should try to get rid of these computations in the compilation step as much as possible.
Agreed on the general point of aiming to not materialize as much as possible.
Quick update:
Once I rebase and get CI tests green I'll mark this ready to review and ping you again.
@gerrymanoim sounds great, ping when ready; pinning ok for now.
(force-pushed from 1bce6fe to bf11966)
Ah - looks like pandas got bumped in the CI to 1.1, which was breaking a bunch of my tests (pandas 1.1 and dask 2.30 do not get along). After bumping dask as well, everything looks green.
ok a few comments. this looks reasonable. ok to do the removal of generic.py in a follow-on. will look once again, but then prob ok to merge. like to do some cleaning in small pieces rather than big changes first.
    if isinstance(from_type, (dt.Timestamp, dt.Date)):
        return data.astype(
            'M8[ns]' if tz is None else DatetimeTZDtype('ns', tz)
this is likely a bug, mark the test xfail
    return result

    @execute_node.register(ops.SimpleCase, dd.Series, list, list, object)
can you either organize this file into sections, with comments indicating the type of ops or split to separate files.
ok i see you did split out a lot of things. I guess just need to split the rest and remove the need for generic.py.
If done as a follow on - should pandas and dask be updated at the same time?
let's do separate PRs for these (can be done any order and all should be orthogonal).
ibis/backends/tests/test_numeric.py (outdated)
    @@ -101,7 +101,10 @@ def test_isnan_isinf(
        expected = backend.default_series_rename(expected)
        backend.assert_series_equal(result, expected)
    else:
        assert result == expected
        try:
why this change? can you simply change the expected or make a dask specific test? catching an exception in a test is generally a no-no as it hides things
This touches on a general issue I had with tests where many of them implicitly assume you're running pandas in the test (see anything with pandas in #2553 (comment)). Here - I wanted to support equality for dd.Series objects without adding an explicit dependency on dask for running the general tests. I could change this to something like `if 'dd' in locals() and isinstance(expected, dd.Series):`
However - it seems that ibis/backends/tests now assumes you have requirements for all envs installed. Is that intended?
I think no, this is something that @datapythonista removed, are you rebased on master (which reverted the last change).
Rebased now. So given that I don't necessarily want a dask dependency to run these tests, maybe these checks would be better than the exception?

`"dask" in str(type(df))`

or

`hasattr(df, "compute")`
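A sketch of how the duck-typed check might look in a shared test helper (the names here are hypothetical, not ibis's actual test utilities), avoiding any dask import:

```python
def is_dask_like(obj):
    # dask collections expose a callable .compute(); pandas objects don't
    return callable(getattr(obj, "compute", None))

class FakeDaskSeries:        # stand-in for dd.Series
    def compute(self):
        return [1.0, 2.0]

class FakePandasSeries:      # stand-in for pd.Series
    pass

def maybe_compute(obj):
    # shared tests can normalize lazy results before asserting equality
    return obj.compute() if is_dask_like(obj) else obj

assert maybe_compute(FakeDaskSeries()) == [1.0, 2.0]
assert isinstance(maybe_compute(FakePandasSeries()), FakePandasSeries)
```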
@icexelloss @jreback - I'm rebasing now after which tests should be green (I'll confirm) and I'll be ready for more comments/re-review. I'm also addressing some of Jeff's comments above.
(force-pushed from fd1ae80 to 30f8bee)
The failure is from azure pipelines taking longer than 1 hour to build the docs.
https://dev.azure.com/ibis-project/ibis/_build/results?buildId=3987&view=logs&j=8f09edc2-e3b7-52de-126a-0225c4f3efa1&t=8f09edc2-e3b7-52de-126a-0225c4f3efa1 worked fine without me changing anything. Maybe some issue with the conda solve last night.
> https://dev.azure.com/ibis-project/ibis/_build/results?buildId=3987&view=logs&j=8f09edc2-e3b7-52de-126a-0225c4f3efa1&t=8f09edc2-e3b7-52de-126a-0225c4f3efa1 worked fine without me changing anything. Maybe some issue with the conda solve last night.

yeah ok
    if isinstance(from_type, (dt.Timestamp, dt.Date)):
        return data.astype(
            'M8[ns]' if tz is None else DatetimeTZDtype('ns', tz)
hmm what does the pandas backend do here? (note not really concerned about this right now, just add a TODO and can follow up later)
    timestamps = data.map_partitions(
        to_datetime,
        infer_datetime_format=True,
        meta=(data.name, 'datetime64[ns]'),
yeah this is prob ok
        meta=(data.name, 'datetime64[ns]'),
    )
    # TODO - is there a better way to do this
    timestamps = timestamps.astype(timestamps.head(1).dtype)
yeah this is prob ok, but just add a todo and can look at later
    def vectorize_object(op, arg, *args, **kwargs):
        # TODO - this works for now, but I think we can do something much better
yeah this almost doesn't even matter. it's doing the op on object dtype with numeric data; this is for numpy compat and likely never hit in any important case.
looks good. plenty of followups! asked @icexelloss and @emilyreff7 to have a look.
    @execute_node.register(ops.NullIf, simple_types, dd.Series)
    def execute_node_nullif_scalar_series(op, value, series, **kwargs):
        # TODO - not preserving the index
I am curious why not preserving the index here? Does the input/output Series have the same index?
Since we are doing a `da.where` on the `dd.Series.values`, that will not carry the index information with it.
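The index loss can be reproduced with plain numpy/pandas - a sketch of the same effect `da.where` has when operating on `.values`:

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, 3], index=[10, 20, 30])

# np.where on .values returns a bare ndarray: the index is gone
masked = np.where(s.values == 2, np.nan, s.values)
result = pd.Series(masked)

assert list(result.index) == [0, 1, 2]   # original [10, 20, 30] is lost
assert list(s.index) == [10, 20, 30]     # the input itself is untouched
```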
    # TODO - execute_materialized_join - #2553
    @execute_node.register(ops.Join, dd.DataFrame, dd.DataFrame)
    def execute_materialized_join(op, left, right, **kwargs):
What is a materialized join?
I believe this is the join handler for joins on columns we need to materialize, i.e. the following from `test_join_with_non_trivial_key`:

    join = left.join(right, right.key.length() == left.key.length(), how=how)
    expr = join[left.key, left.value, right.other_value]
    result = expr.execute()
    expected = (
        dd.merge(
            df1.assign(key_len=df1.key.str.len()),
            df2.assign(key_len=df2.key.str.len()),
            on='key_len',
            how=how,
        )
        .drop(['key_len', 'key_y', 'key2', 'key3'], axis=1)
        .rename(columns={'key_x': 'key'})
    )

(code is basically https://github.com/ibis-project/ibis/blob/master/ibis/backends/pandas/execution/join.py#L58)
    return call_numpy_ufunc(func, op, data, **kwargs).astype(return_type)

    def vectorize_object(op, arg, *args, **kwargs):
What does this method do?
Also a bit weird that this is only used for log and round operations. Curious why this is not used in all numeric methods?
        raise KeyError(name)
    (root_table,) = op.root_tables()
    left_root, right_root = ops.distinct_roots(
        parent_table_op.left, parent_table_op.right
What is the type of `parent_table_op` here?
Inside this branch of the function, parent_table_op is a join operation of some type.
    result = execute(expr, scope=scope, timecontext=timecontext, **kwargs)
    assert result_name is not None, 'Column selection name is None'
    if np.isscalar(result):
        series = dd.from_array(np.repeat(result, len(data.index)))
This will force computation, right? (Also might blow memory, since you can materialize into a np array on the driver.)
This is still lazy:

    In [5]: data = dd.from_pandas(pd.Series([1,2,3]), npartitions=1)

    In [6]: series = dd.from_array(np.repeat(5, len(data.index)))

    In [7]: series
    Out[7]:
    Dask Series Structure:
    npartitions=1
    0    int64
    2      ...
    dtype: int64
    Dask Name: from_array, 1 tasks

    In [8]: series.compute()
    Out[8]:
    0    5
    1    5
    2    5
    dtype: int64
        parent_table_op.left, parent_table_op.right
    )
    suffixes = {
        left_root: constants.LEFT_JOIN_SUFFIX,
Don't totally get this logic here. Why does "joining" appear in the column selection rule here?
The codepath here is distinct from execute_selection_dataframe (in either dask or pandas), which performs the basic selection operation. Here we're performing a selection on an ir.ColumnExpr. An example (from test_core) that hits this code path:

    left = ibis_table
    right = left.view()
    join = left.join(right, 'plain_strings')[left.plain_int64]

or similarly from test_join_with_project_right_duplicate_column:

    right = client.table('df3')
    join = left.join(right, ['key'], how=how)
    expr = join[left.key, right.key2, right.other_value]
    result = expr.execute()
    @execute_node.register(ops.StringAscii, dd.Series)
    def execute_string_ascii(op, data, **kwargs):
        output_meta = pandas.Series([], dtype=np.dtype('int32'), name=data.name)
Hmm.. what should the type of `meta` be in general? looks like sometimes it's a pd.Series, sometimes it's a dict?
https://docs.dask.org/en/latest/dataframe-design.html#metadata. A few different ways to apply meta are all equally valid. My read is you can:
- Provide an empty pandas object with appropriate dtypes and names.
- Provide a descriptive meta, which can be:
  - A dict of {name: dtype} or an iterable of (name, dtype), which specifies a DataFrame. Note that order is important: the order of the names in meta should match the order of the columns.
  - A tuple of (name, dtype), which specifies a Series.
  - A dtype object or string (e.g. 'f8'), which specifies a scalar.
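As a small illustration, the empty-pandas-object form and the descriptive (name, dtype) tuple describe the same Series meta (a sketch using plain pandas/numpy, no dask required):

```python
import numpy as np
import pandas as pd

# 1. an empty pandas object with the right name and dtype
empty_meta = pd.Series([], dtype=np.dtype("int32"), name="string_col")

# 2. the descriptive (name, dtype) tuple for the same Series
tuple_meta = ("string_col", "int32")

# both forms carry the same name and dtype information
assert empty_meta.name == tuple_meta[0]
assert empty_meta.dtype == np.dtype(tuple_meta[1])
```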
        & (data.dt.time.astype(str) <= upper)
    ).to_dask_array(True)

    result = da.zeros(len(data), dtype=np.bool_)
`len(data)` seems dangerous here since it will materialize data? Can we create a distributed array here by calling "apply" on the original dd.Series?
Not sure. I'll add this to a "materialization to investigate" follow up issue (along with items you mentioned above).
Finished one round of review.
@jreback @icexelloss - I think I've hit all the feedback, let me know if there's anything else that needs to be done.
few more comments, main thing is to lower the required dask version.
    # TODO - aggregations - #2553
    # Not all code paths work cleanly here
    @execute_node.register(ops.Aggregation, dd.DataFrame)
add to the list of things that we can register for pandas/dask
Updated for review comments.
thanks @gerrymanoim lgtm.
Part 3 of #2537.
See #2553 for (much improved) tracking of anything that is xfailed in dask but not in pandas. Functions/tests expected to fail are marked with `TODO - <reason keyword> #2553`.
Notes/caveats: