[BUG] apply with a UDF that references the pandas module and another module fails to find __import__ with cudf.pandas #15548
Comments
@mroeschke Would you have insight here? I know you’ve looked at datetimes and as_column refactoring lately. It looks like we’re hitting this: cudf/python/cudf/cudf/core/column/column.py Line 1921 in 9192d25
Thanks for the report. Related to @bdice's comment about a [...]. However, once that is fixed, I think we're hitting an actual bug when a UDF references the pandas module and another module from the global namespace (xref #14482, maybe):
In [1]: %load_ext cudf.pandas
   ...: import pandas as pd
   ...: from datetime import datetime
   ...:
   ...: def my_apply(df, bias: int):
   ...:     datetime.strptime(df['Minute'], '%H:%M:%S')
   ...:     return pd.to_numeric(1)
   ...:
   ...: my_df = pd.DataFrame({'Minute': ['09:00:00']})
   ...: my_df.apply(my_apply, axis=1, bias=1)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
File ~/python/cudf/cudf/pandas/fast_slow_proxy.py:889, in _fast_slow_function_call(func, *args, **kwargs)
888 fast_args, fast_kwargs = _fast_arg(args), _fast_arg(kwargs)
--> 889 result = func(*fast_args, **fast_kwargs)
890 if result is NotImplemented:
891 # try slow path
File ~/python/cudf/cudf/pandas/fast_slow_proxy.py:30, in call_operator(fn, args, kwargs)
29 def call_operator(fn, args, kwargs):
---> 30 return fn(*args, **kwargs)
File ~/miniforge3/envs/cudf-dev/lib/python3.11/site-packages/nvtx/nvtx.py:116, in annotate.__call__.<locals>.inner(*args, **kwargs)
115 libnvtx_push_range(self.attributes, self.domain.handle)
--> 116 result = func(*args, **kwargs)
117 libnvtx_pop_range(self.domain.handle)
File ~/python/cudf/cudf/core/dataframe.py:4603, in DataFrame.apply(self, func, axis, raw, result_type, args, **kwargs)
4601 raise ValueError("The `result_type` kwarg is not yet supported.")
-> 4603 return self._apply(func, _get_row_kernel, *args, **kwargs)
File ~/miniforge3/envs/cudf-dev/lib/python3.11/contextlib.py:81, in ContextDecorator.__call__.<locals>.inner(*args, **kwds)
80 with self._recreate_cm():
---> 81 return func(*args, **kwds)
File ~/miniforge3/envs/cudf-dev/lib/python3.11/site-packages/nvtx/nvtx.py:116, in annotate.__call__.<locals>.inner(*args, **kwargs)
115 libnvtx_push_range(self.attributes, self.domain.handle)
--> 116 result = func(*args, **kwargs)
117 libnvtx_pop_range(self.domain.handle)
File ~/python/cudf/cudf/core/indexed_frame.py:3446, in IndexedFrame._apply(self, func, kernel_getter, *args, **kwargs)
3445 if kwargs:
-> 3446 raise ValueError("UDFs using **kwargs are not yet supported.")
3447 try:
ValueError: UDFs using **kwargs are not yet supported.
During handling of the above exception, another exception occurred:
KeyError Traceback (most recent call last)
Cell In[1], line 10
7 return pd.to_numeric(1)
9 my_df = pd.DataFrame({'Minute': ['09:00:00']})
---> 10 my_df.apply(my_apply,axis=1, bias=1)
File ~/python/cudf/cudf/pandas/fast_slow_proxy.py:837, in _CallableProxyMixin.__call__(self, *args, **kwargs)
836 def __call__(self, *args, **kwargs) -> Any:
--> 837 result, _ = _fast_slow_function_call(
838 # We cannot directly call self here because we need it to be
839 # converted into either the fast or slow object (by
840 # _fast_slow_function_call) to avoid infinite recursion.
841 # TODO: When Python 3.11 is the minimum supported Python version
842 # this can use operator.call
843 call_operator,
844 self,
845 args,
846 kwargs,
847 )
848 return result
File ~/python/cudf/cudf/pandas/fast_slow_proxy.py:902, in _fast_slow_function_call(func, *args, **kwargs)
900 slow_args, slow_kwargs = _slow_arg(args), _slow_arg(kwargs)
901 with disable_module_accelerator():
--> 902 result = func(*slow_args, **slow_kwargs)
903 return _maybe_wrap_result(result, func, *args, **kwargs), fast
File ~/python/cudf/cudf/pandas/fast_slow_proxy.py:30, in call_operator(fn, args, kwargs)
29 def call_operator(fn, args, kwargs):
---> 30 return fn(*args, **kwargs)
File ~/miniforge3/envs/cudf-dev/lib/python3.11/site-packages/pandas/core/frame.py:10361, in DataFrame.apply(self, func, axis, raw, result_type, args, by_row, engine, engine_kwargs, **kwargs)
10347 from pandas.core.apply import frame_apply
10349 op = frame_apply(
10350 self,
10351 func=func,
(...)
10359 kwargs=kwargs,
10360 )
> 10361 return op.apply().__finalize__(self, method="apply")
File ~/miniforge3/envs/cudf-dev/lib/python3.11/site-packages/pandas/core/apply.py:916, in FrameApply.apply(self)
913 elif self.raw:
914 return self.apply_raw(engine=self.engine, engine_kwargs=self.engine_kwargs)
--> 916 return self.apply_standard()
File ~/miniforge3/envs/cudf-dev/lib/python3.11/site-packages/pandas/core/apply.py:1063, in FrameApply.apply_standard(self)
1061 def apply_standard(self):
1062 if self.engine == "python":
-> 1063 results, res_index = self.apply_series_generator()
1064 else:
1065 results, res_index = self.apply_series_numba()
File ~/miniforge3/envs/cudf-dev/lib/python3.11/site-packages/pandas/core/apply.py:1081, in FrameApply.apply_series_generator(self)
1078 with option_context("mode.chained_assignment", None):
1079 for i, v in enumerate(series_gen):
1080 # ignore SettingWithCopy here in case the user mutates
-> 1081 results[i] = self.func(v, *self.args, **self.kwargs)
1082 if isinstance(results[i], ABCSeries):
1083 # If we have a view on v, we need to make a copy because
1084 # series_generator will swap out the underlying data
1085 results[i] = results[i].copy(deep=False)
Cell In[1], line 6, in my_apply(df, bias)
5 def my_apply(df, bias: int):
----> 6 datetime.strptime(df['Minute'], '%H:%M:%S')
7 return pd.to_numeric(1)
KeyError: '__import__'
versus the same UDF without the reference to pd, which succeeds:
In [1]: %load_ext cudf.pandas
   ...: import pandas as pd
   ...: from datetime import datetime
   ...:
   ...: def my_apply(df, bias: int):
   ...:     datetime.strptime(df['Minute'], '%H:%M:%S')
   ...:     return 1
   ...:
   ...: my_df = pd.DataFrame({'Minute': ['09:00:00']})
   ...: my_df.apply(my_apply, axis=1, bias=1)
Out[1]:
0    1
dtype: int64
closes #15548

`_replace_closurevars` creates a new function by replacing objects with their fast versions. When creating the new function, it populates `globals` from the result of `inspect.getclosurevars`, but I don't think that call comprehensively returns _all_ the globals accessible to the function (`function.__globals__`).

To minimize the change, the "fast globals" are still sourced from `inspect.getclosurevars`, and those update the `old_function.__globals__` when creating a new function.

Authors:
- Matthew Roeschke (https://github.com/mroeschke)

Approvers:
- GALI PREM SAGAR (https://github.com/galipremsagar)

URL: #15569
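For anyone following along, here is a minimal standard-library sketch of the difference described above. It is not the cudf implementation: `udf` and `rebuild_with_merged_globals` are hypothetical names made up for illustration, and `math` merely stands in for "another module". It only shows that `inspect.getclosurevars(...).globals` covers the names the function body references, while `function.__globals__` is the full module namespace, and that layering the former over the latter keeps everything available to the rebuilt function.

```python
import inspect
import math
import types
from datetime import datetime


def udf(value):
    # References two module-level names: `datetime` and `math`.
    parsed = datetime.strptime(value, "%H:%M:%S")
    return math.floor(parsed.minute)


# Only the names looked up by the function body are reported here,
# not the whole namespace the function was defined in.
closure_globals = dict(inspect.getclosurevars(udf).globals)


def rebuild_with_merged_globals(func, replacements):
    """Hypothetical helper: copy `func`, layering `replacements`
    on top of the function's complete `__globals__`."""
    merged = {**func.__globals__, **replacements}
    new_func = types.FunctionType(
        func.__code__,
        merged,
        name=func.__name__,
        argdefs=func.__defaults__,
        closure=func.__closure__,
    )
    new_func.__kwdefaults__ = func.__kwdefaults__
    return new_func


rebuilt = rebuild_with_merged_globals(udf, closure_globals)
print(rebuilt("09:05:00"))  # 5
```

In the real fix the replacements would be the "fast" versions of the referenced objects; here they are the unchanged originals, so the rebuilt function behaves like the original one.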
My code runs correctly without cudf. When I install cudf, it reports a 'NotImplementedError'. Which part of the code caused the problem? Is there a roadmap to implement it?
The error:
And here is my code:
I'm using cudf-cu12==24.4.0 and pandas==2.2.1