BUG: GeoDataFrame attribute dropped by pandas when using groupby.apply #45314

Closed
2 of 3 tasks
kdpenner opened this issue Jan 11, 2022 · 2 comments · Fixed by #45363
Labels: Groupby, Regression (Functionality that used to work in a prior pandas version)

@kdpenner

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the master branch of pandas.

Reproducible Example

import geopandas as gpd
from geopandas import sjoin
from shapely.geometry import MultiPoint


pts = MultiPoint([[0, 0], [1, 1]])
polys = [_.buffer(0.1) for _ in pts.geoms]
gdf = gpd.GeoDataFrame(["a", "b"], geometry=polys)

df_close = gdf.groupby(0).apply(lambda x: sjoin(x, x, how="inner",
                                                predicate="overlaps"))

Issue Description

Probably a bug in pandas; see geopandas/geopandas#2294.

With pandas 1.2.5, the MWE above works fine. With pandas 1.3.5, it fails; here's the traceback:

Traceback (most recent call last):
  File "/Users/kylepenner/Desktop/bug.py", line 10, in <module>
    df_close = gdf.groupby(0).apply(lambda x: sjoin(x, x, how="inner",
  File "/Users/kylepenner/miniconda3/envs/test2/lib/python3.10/site-packages/pandas/core/groupby/groupby.py", line 1275, in apply
    result = self._python_apply_general(f, self._selected_obj)
  File "/Users/kylepenner/miniconda3/envs/test2/lib/python3.10/site-packages/pandas/core/groupby/groupby.py", line 1309, in _python_apply_general
    keys, values, mutated = self.grouper.apply(f, data, self.axis)
  File "/Users/kylepenner/miniconda3/envs/test2/lib/python3.10/site-packages/pandas/core/groupby/ops.py", line 852, in apply
    res = f(group)
  File "/Users/kylepenner/Desktop/bug.py", line 10, in <lambda>
    df_close = gdf.groupby(0).apply(lambda x: sjoin(x, x, how="inner",
  File "/Users/kylepenner/miniconda3/envs/test2/lib/python3.10/site-packages/geopandas/tools/sjoin.py", line 122, in sjoin
    _basic_checks(left_df, right_df, how, lsuffix, rsuffix)
  File "/Users/kylepenner/miniconda3/envs/test2/lib/python3.10/site-packages/geopandas/tools/sjoin.py", line 165, in _basic_checks
    if not _check_crs(left_df, right_df):
  File "/Users/kylepenner/miniconda3/envs/test2/lib/python3.10/site-packages/geopandas/array.py", line 68, in _check_crs
    if not left.crs == right.crs:
  File "/Users/kylepenner/miniconda3/envs/test2/lib/python3.10/site-packages/pandas/core/generic.py", line 5487, in __getattr__
    return object.__getattribute__(self, name)
  File "/Users/kylepenner/miniconda3/envs/test2/lib/python3.10/site-packages/geopandas/geodataframe.py", line 408, in crs
    return self._crs
  File "/Users/kylepenner/miniconda3/envs/test2/lib/python3.10/site-packages/pandas/core/generic.py", line 5487, in __getattr__
    return object.__getattribute__(self, name)
AttributeError: 'GeoDataFrame' object has no attribute '_crs'

Expected Behavior

A GeoDataFrame.

Installed Versions

Calling show_versions() throws an error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/kylepenner/miniconda3/envs/test2/lib/python3.10/site-packages/pandas/util/_print_versions.py", line 109, in show_versions
    deps = _get_dependency_info()
  File "/Users/kylepenner/miniconda3/envs/test2/lib/python3.10/site-packages/pandas/util/_print_versions.py", line 88, in _get_dependency_info
    mod = import_optional_dependency(modname, errors="ignore")
  File "/Users/kylepenner/miniconda3/envs/test2/lib/python3.10/site-packages/pandas/compat/_optional.py", line 115, in import_optional_dependency
    module = importlib.import_module(name)
  File "/Users/kylepenner/miniconda3/envs/test2/lib/python3.10/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 883, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "/Users/kylepenner/miniconda3/envs/test2/lib/python3.10/site-packages/setuptools/__init__.py", line 8, in <module>
    import _distutils_hack.override  # noqa: F401
  File "/Users/kylepenner/miniconda3/envs/test2/lib/python3.10/site-packages/_distutils_hack/override.py", line 1, in <module>
    __import__('_distutils_hack').do_override()
  File "/Users/kylepenner/miniconda3/envs/test2/lib/python3.10/site-packages/_distutils_hack/__init__.py", line 71, in do_override
    ensure_local_distutils()
  File "/Users/kylepenner/miniconda3/envs/test2/lib/python3.10/site-packages/_distutils_hack/__init__.py", line 59, in ensure_local_distutils
    assert '_distutils' in core.__file__, core.__file__
AssertionError: /Users/kylepenner/miniconda3/envs/test2/lib/python3.10/distutils/core.py
@kdpenner added the Bug and Needs Triage labels on Jan 11, 2022
@jorisvandenbossche added the Regression and Groupby labels and removed the Bug and Needs Triage labels on Jan 13, 2022
@jorisvandenbossche added this to the 1.4 milestone on Jan 13, 2022
@jorisvandenbossche
Member

This regression is not new in 1.4 (it already exists in 1.3), but since it is a regression it would still be nice to fix it as soon as possible (rather than in 1.5), so for now I have labeled it 1.4.

I will take a closer look, but I think this is essentially due to not "properly" calling _constructor at:

# fastpath equivalent to `return sdata._constructor(mgr)`
obj = type(sdata)._from_mgr(mgr)
object.__setattr__(obj, "_flags", sdata._flags)

(from #40236)
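To make the suspected mechanism concrete, here is a toy model in plain Python; it is not pandas or geopandas code. Frame, GeoFrame, _mgr and crs are hypothetical stand-ins, and only the names _constructor and _from_mgr are borrowed from the snippet above. The assumption is that a subclass creates some of its state (here _crs) in __init__, so a fast path that allocates the object without running __init__ yields an instance lacking that attribute, which matches the AttributeError in the traceback.

```python
class Frame:
    """Hypothetical stand-in for a pandas-like container wrapping a manager."""

    def __init__(self, mgr):
        self._mgr = mgr

    @property
    def _constructor(self):
        # Regular path: the result is built by calling the class, so __init__ runs.
        return type(self)

    @classmethod
    def _from_mgr(cls, mgr):
        # Fast path: allocate the instance without calling __init__ and
        # attach only the manager, skipping any subclass-specific setup.
        obj = cls.__new__(cls)
        obj._mgr = mgr
        return obj


class GeoFrame(Frame):
    """Hypothetical subclass that creates extra state in __init__."""

    def __init__(self, mgr, crs=None):
        super().__init__(mgr)
        self._crs = crs

    @property
    def crs(self):
        # Like the crs property in the traceback, this just reads self._crs.
        return self._crs


sdata = GeoFrame(mgr="block-manager", crs="EPSG:4326")

# Constructor path: __init__ ran, so _crs exists (this toy does not copy its value).
print(sdata._constructor(sdata._mgr).crs)  # None

# Fast path: __init__ was skipped, so _crs was never set.
fast = type(sdata)._from_mgr(sdata._mgr)
try:
    fast.crs
except AttributeError as exc:
    print(exc)  # 'GeoFrame' object has no attribute '_crs'
```

The toy deliberately ignores how the attribute's value would be propagated; it only shows why the attribute can be missing entirely when construction is bypassed.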

@jorisvandenbossche
Member

A pandas-only reproducer I tried to construct actually shows a regression relative to 1.3 as well:

import pandas._testing as tm

custom_df = tm.SubclassedDataFrame({"a": [1, 2, 3], "b": [1, 1, 2], "c": [7, 8, 9]})
custom_df.testattr = "hello"
custom_df.groupby("c").apply(lambda df: df.testattr)

worked in 1.3.5, but is failing on main / 1.4rc.
I think in 1.3.5 this was working because it took the "fast_apply" path through libreduction.
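The second reproducer depends on attribute propagation rather than on __init__. As a rough, pandas-free model, assuming propagation happens through a __finalize__-style step driven by a _metadata list (loosely mirroring the hooks pandas subclasses use), the sketch below shows that an apply path which builds group slices without that step loses testattr. Frame, SubclassedFrame and group_apply are hypothetical, and this does not reproduce how the libreduction fast_apply path actually worked.

```python
class Frame:
    """Hypothetical stand-in for a DataFrame-like container of row dicts."""

    _metadata = []  # names of attributes that should follow the data around

    def __init__(self, rows):
        self.rows = list(rows)

    def __finalize__(self, other):
        # Copy every attribute listed in _metadata from the source object.
        for name in self._metadata:
            if hasattr(other, name):
                object.__setattr__(self, name, getattr(other, name))
        return self


class SubclassedFrame(Frame):
    _metadata = ["testattr"]


def group_apply(frame, key, func, finalize=True):
    """Hypothetical groupby-apply: split rows by key, call func on each group."""
    groups = {}
    for row in frame.rows:
        groups.setdefault(row[key], []).append(row)
    results = []
    for rows in groups.values():
        sub = type(frame)(rows)
        if finalize:
            # Propagate metadata from the parent to the group slice.
            sub = sub.__finalize__(frame)
        results.append(func(sub))
    return results


df = SubclassedFrame([
    {"a": 1, "b": 1, "c": 7},
    {"a": 2, "b": 1, "c": 8},
    {"a": 3, "b": 2, "c": 9},
])
df.testattr = "hello"
print(group_apply(df, "c", lambda g: g.testattr))  # ['hello', 'hello', 'hello']

# With finalize=False the group slices never receive testattr, so the lambda
# raises AttributeError, analogous to the failure reported on main / 1.4rc.
```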
