Backend / plugin system remove_duplicates raises AttributeError on discovering duplicates #5944

Closed
ashwinvis opened this issue Nov 6, 2021 · 11 comments · Fixed by #5959

@ashwinvis
Contributor

What happened:

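In one of my CI runs, the entrypoints somehow ended up being defined twice. I then discovered that the function remove_duplicates, which weeds out duplicate entrypoints, had not been updated for the importlib.metadata EntryPoint API: it still accesses the pkg_resources attribute module_name.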

What you expected to happen:

remove_duplicates should drop the duplicate entrypoints instead of raising an AttributeError.

Minimal Complete Verifiable Example:

from xarray.backends.plugins import remove_duplicates
from importlib.metadata import entry_points

eps = entry_points().get('xarray.backends', ())

remove_duplicates(eps)        # fine: no duplicate names
remove_duplicates(eps + eps)  # raises AttributeError
<ipython-input-12-22df5e55614a>:1: DeprecationWarning: EntryPoints list interface is deprecated. Cast to list if needed.
  remove_duplicates(eps + eps)
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-12-22df5e55614a> in <module>
----> 1 remove_duplicates(eps + eps)

~/.pyenv/versions/3.9.7/envs/pymech/lib/python3.9/site-packages/xarray/backends/plugins.py in remove_duplicates(entrypoints)
     27         matches_len = len(matches)
     28         if matches_len > 1:
---> 29             selected_module_name = matches[0].module_name
     30             all_module_names = [e.module_name for e in matches]
     31             warnings.warn(

AttributeError: 'EntryPoint' object has no attribute 'module_name'
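A minimal sketch of how remove_duplicates could be updated, assuming the entrypoints are importlib.metadata EntryPoint objects: the module name is derived from EntryPoint.value ("module:attr") instead of the pkg_resources-only module_name attribute. Silently dropping exact duplicates (as produced by eps + eps above) and warning only when one engine name points to different modules is a design choice of this sketch, not necessarily what xarray does.

```python
import itertools
import warnings
from importlib.metadata import EntryPoint


def remove_duplicates(entrypoints):
    """Keep one entrypoint per engine name, warning on conflicting modules."""
    # sort and group entrypoints by name (sorted() is stable, so input order
    # decides which entrypoint wins within a group)
    entrypoints = sorted(entrypoints, key=lambda ep: ep.name)
    unique_entrypoints = []
    for name, matches in itertools.groupby(entrypoints, key=lambda ep: ep.name):
        # drop exact duplicates (same name, value and group), preserving order
        matches = list(dict.fromkeys(matches))
        unique_entrypoints.append(matches[0])
        if len(matches) > 1:
            # "module:attr" -> "module"; works on every importlib.metadata version
            all_module_names = [ep.value.split(":")[0] for ep in matches]
            warnings.warn(
                f"Found {len(matches)} entrypoints for the engine name {name}: "
                f"{all_module_names}. The entrypoint {all_module_names[0]} will be used.",
                RuntimeWarning,
            )
    return unique_entrypoints


# Usage with the entrypoints from the session above, duplicated on purpose
eps = [
    EntryPoint(name="rasterio", value="rioxarray.xarray_plugin:RasterioBackend", group="xarray.backends"),
    EntryPoint(name="pymech", value="pymech.dataset:PymechXarrayBackend", group="xarray.backends"),
]
print([ep.name for ep in remove_duplicates(eps + eps)])
```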

Anything else we need to know?:

Since v0.20.0, the entrypoints have been discovered using importlib.metadata / importlib_metadata, but discovery was broken for third-party backends. In v0.20.1, #5931 fixed the backend detection, but it still needs to be tested thoroughly (#5934). This bug is probably rarer than #5930, so I would recommend adding some tests before making the next release.

Environment:

Output of xr.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.9.7 (default, Nov 3 2021, 09:51:04)
[GCC 11.1.0]
python-bits: 64
OS: Linux
OS-release: 5.10.75-1-lts
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: C
LOCALE: (None, None)
libhdf5: 1.12.1
libnetcdf: None

xarray: 0.20.1
pandas: 1.3.4
numpy: 1.21.3
scipy: None
netCDF4: None
pydap: None
h5netcdf: 0.11.0
h5py: 3.5.0
Nio: None
zarr: None
cftime: None
nc_time_axis: None
PseudoNetCDF: None
rasterio: 1.2.10
cfgrib: None
iris: None
bottleneck: None
dask: 2021.10.0
distributed: None
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
fsspec: 2021.10.1
cupy: None
pint: None
sparse: None
setuptools: 57.4.0
pip: 21.3.1
conda: None
pytest: 6.2.5
IPython: 7.29.0
sphinx: 4.2.0

@ashwinvis
Contributor Author

On a side note, the .get syntax is deprecated in the importlib_metadata package, and most likely also in the importlib.metadata stdlib module of Python 3.10:

In [16]: from importlib_metadata import entry_points

In [17]: entry_points().get('xarray.backends', ())
<ipython-input-17-5f3ea0df5c10>:1: DeprecationWarning: SelectableGroups dict interface is deprecated. Use select.
  entry_points().get('xarray.backends', ())
Out[17]: 
[EntryPoint(name='rasterio', value='rioxarray.xarray_plugin:RasterioBackend', group='xarray.backends'),
 EntryPoint(name='pymech', value='pymech.dataset:PymechXarrayBackend', group='xarray.backends')]

In [18]: entry_points().select(group='xarray.backends')
Out[18]: 
[EntryPoint(name='rasterio', value='rioxarray.xarray_plugin:RasterioBackend', group='xarray.backends'),
 EntryPoint(name='pymech', value='pymech.dataset:PymechXarrayBackend', group='xarray.backends')]
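A small compatibility sketch (the helper name backend_entrypoints is hypothetical) that uses select() whenever the object returned by entry_points() supports it, and falls back to the dict interface on the Python 3.8/3.9 stdlib, where entry_points() returns a plain dict of tuples:

```python
from importlib.metadata import entry_points


def backend_entrypoints(group="xarray.backends"):
    """Return the entrypoints registered under ``group`` as a list."""
    eps = entry_points()
    if hasattr(eps, "select"):
        # Python >= 3.10: the selection API, no DeprecationWarning
        return list(eps.select(group=group))
    # Python 3.8/3.9 stdlib: entry_points() is a dict mapping group -> tuple
    return list(eps.get(group, ()))
```

On Python 3.10+ the same selection can also be written as entry_points(group="xarray.backends").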

@weiji14
Contributor

weiji14 commented Nov 8, 2021

I'm getting a similar issue with xarray 0.20.1 when rioxarray and rasterio are both installed; it looks like #5931 didn't fully fix things. Here's the full traceback:

_________________ test_open_variable_filter[open_rasterio_engine] _________________

open_rasterio = <function open_rasterio_engine at 0x7fa99ca7ec10>

    def test_open_variable_filter(open_rasterio):
>       with open_rasterio(
            os.path.join(TEST_INPUT_DATA_DIR, "PLANET_SCOPE_3D.nc"), variable=["blue"]
        ) as rds:

test/integration/test_integration__io.py:185: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
test/conftest.py:103: in open_rasterio_engine
    return xr.open_dataset(file_name_or_object, engine="rasterio", **kwargs)
../../../miniconda3/envs/rioxarray/lib/python3.9/site-packages/xarray/backends/api.py:481: in open_dataset
    backend = plugins.get_backend(engine)
../../../miniconda3/envs/rioxarray/lib/python3.9/site-packages/xarray/backends/plugins.py:158: in get_backend
    engines = list_engines()
../../../miniconda3/envs/rioxarray/lib/python3.9/site-packages/xarray/backends/plugins.py:103: in list_engines
    return build_engines(entrypoints)
../../../miniconda3/envs/rioxarray/lib/python3.9/site-packages/xarray/backends/plugins.py:92: in build_engines
    entrypoints = remove_duplicates(entrypoints)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

entrypoints = [EntryPoint(name='rasterio', value='rioxarray.xarray_plugin:RasterioBackend', group='xarray.backends'), EntryPoint(nam...rray.backends'), EntryPoint(name='rasterio', value='rioxarray.xarray_plugin:RasterioBackend', group='xarray.backends')]

    def remove_duplicates(entrypoints):
        # sort and group entrypoints by name
        entrypoints = sorted(entrypoints, key=lambda ep: ep.name)
        entrypoints_grouped = itertools.groupby(entrypoints, key=lambda ep: ep.name)
        # check if there are multiple entrypoints for the same name
        unique_entrypoints = []
        for name, matches in entrypoints_grouped:
            matches = list(matches)
            unique_entrypoints.append(matches[0])
            matches_len = len(matches)
            if matches_len > 1:
>               selected_module_name = matches[0].module_name
E               AttributeError: 'EntryPoint' object has no attribute 'module_name'

../../../miniconda3/envs/rioxarray/lib/python3.9/site-packages/xarray/backends/plugins.py:29: AttributeError
================================ warnings summary =================================
test/integration/test_integration__io.py::test_open_variable_filter[open_rasterio]
  /home/username/projects/rioxarray/rioxarray/_io.py:366: DeprecationWarning: string or file could not be read to its end due to unmatched data; this will raise a ValueError in the future.
    new_val = np.fromstring(value.strip("{}"), dtype="float", sep=",")

-- Docs: https://docs.pytest.org/en/stable/warnings.html
============================= short test summary info =============================
FAILED test/integration/test_integration__io.py::test_open_variable_filter[open_rasterio_engine]
!!!!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!!

Output of xr.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.9.7 | packaged by conda-forge | (default, Sep 29 2021, 19:20:46) 
[GCC 9.4.0]
python-bits: 64
OS: Linux
OS-release: 5.10.0-8-amd64
machine: x86_64
processor: 
byteorder: little
LC_ALL: None
LANG: en_NZ.UTF-8
LOCALE: ('en_NZ', 'UTF-8')
libhdf5: 1.12.1
libnetcdf: 4.8.1

xarray: 0.20.1
pandas: 1.3.4
numpy: 1.21.4
scipy: 1.7.1
netCDF4: 1.5.8
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: 1.5.1.1
nc_time_axis: None
PseudoNetCDF: None
rasterio: 1.2.10
cfgrib: None
iris: None
bottleneck: None
dask: 2021.11.0
distributed: 2021.11.0
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
fsspec: 2021.11.0
cupy: None
pint: None
sparse: None
setuptools: 58.5.3
pip: 21.3.1
conda: None
pytest: 6.2.5
IPython: None
sphinx: 1.8.5

@kmuehlbauer
Contributor

@ashwinvis @weiji14 I've encountered the same problem. Besides the possible bug in remove_duplicates, there is another issue hidden deeper:

@functools.lru_cache(maxsize=1)
def list_engines():
    entrypoints = entry_points().get("xarray.backends", ())
    return build_engines(entrypoints)

Here entry_points() is called, and for some reason it returns the xarray.backends entrypoints twice (but only when invoked via pytest). I traced this down to distributions() in importlib/metadata.py, which finds the package under test twice. This is as far as I could get.
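To check whether distributions() really yields the same package twice, a diagnostic along these lines could be run inside the pytest session (for example from a test or a conftest.py). The helper duplicated_distributions is hypothetical: it groups the distributions importlib.metadata finds by name and reports every name found more than once, together with where each copy lives. One plausible cause, not confirmed here, is the same directory appearing twice on sys.path, e.g. once from an editable install and once inserted by pytest.

```python
from collections import defaultdict
from importlib.metadata import distributions


def duplicated_distributions():
    """Map each distribution name found more than once to its locations."""
    locations = defaultdict(list)
    for dist in distributions():
        meta = dist.metadata
        # guard against distributions with missing or broken metadata
        name = meta.get("Name") if meta is not None else None
        if name:
            # locate_file("") resolves relative to the metadata's parent directory
            locations[name.lower()].append(str(dist.locate_file("")))
    return {name: paths for name, paths in locations.items() if len(paths) > 1}
```

An empty dict means every installed distribution was discovered exactly once.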

Any help very much appreciated.

@ashwinvis
Contributor Author

@kmuehlbauer

@ashwinvis
Contributor Author

To use the select method, the following should be changed:

  • Swap the order of the imports in the existing code so that the newer importlib_metadata is imported first:
    try:
        from importlib.metadata import entry_points
    except ImportError:
        # if the fallback library is missing, we are doomed.
        from importlib_metadata import entry_points  # type: ignore[no-redef]
  • Require importlib_metadata for Python < 3.10; the current requirement only covers Python < 3.8:
    importlib-metadata; python_version < '3.8'

@kmuehlbauer
Contributor

@weiji14 I've solved my issue by invoking pytest with --import-mode="append" outside the package directory.

@snowman2
Contributor

snowman2 commented Nov 8, 2021

When a fix is added for this, feel free to ping me to test the fix on rioxarray.

@kmuehlbauer
Contributor

@snowman2 Do you get the same error when running the rioxarray test suite with xarray 0.20.1?

My question is: why are those duplicates there in the first place?

@snowman2
Contributor

snowman2 commented Nov 8, 2021

I see the same issue reported here: #5944 (comment)

See: https://github.com/corteva/rioxarray/runs/4140632105 (Note: updated CI build link)

@kmuehlbauer
Contributor

@snowman2 #5959 should fix the AttributeError: 'EntryPoint' object has no attribute 'module_name'.

@kmuehlbauer
Contributor

OK, here are my findings.

There are tests for this in https://github.com/pydata/xarray/blob/main/xarray/tests/test_plugins.py, but they still use pkg_resources, so they did not catch this issue even though the code path is covered.

Once I fixed the AttributeError, those tests finally failed, because they still used pkg_resources nomenclature.
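For tests that no longer depend on pkg_resources, entrypoints can be constructed directly as importlib.metadata.EntryPoint objects. A sketch, with a hypothetical parse_entrypoint helper standing in for pkg_resources.EntryPoint.parse; the module name that pkg_resources exposed as module_name is recovered from EntryPoint.value:

```python
from importlib.metadata import EntryPoint


def parse_entrypoint(spec, group="xarray.backends"):
    """Build an EntryPoint from a "name = module:attr" string (hypothetical helper)."""
    name, _, value = spec.partition("=")
    return EntryPoint(name=name.strip(), value=value.strip(), group=group)


ep = parse_entrypoint("rasterio = rioxarray.xarray_plugin:RasterioBackend")
# pkg_resources offered ep.module_name; with importlib.metadata, split ep.value
module_name = ep.value.split(":")[0]
```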

With latest #5959 everything should be in place now.

TimoRoth added a commit to OGGM/OGGM-Docker that referenced this issue Nov 9, 2021
Installing this in parallel with rasterio causes xarray to explode.
Remove it again until pydata/xarray#5944 is
resolved.

This reverts commit 86b3101.
kmuehlbauer added a commit to kmuehlbauer/wradlib that referenced this issue Nov 11, 2021
kmuehlbauer added a commit to kmuehlbauer/wradlib that referenced this issue Nov 11, 2021
kmuehlbauer added a commit to wradlib/wradlib that referenced this issue Nov 11, 2021