
fix(datasets): remove deprecation warnings #255

Merged: 11 commits, Jul 24, 2023


@astrojuanlu (Member) commented Jun 28, 2023

Description

Closes #264.

After the next Kedro release, importing certain kedro.io.core classes under their old names will emit a DeprecationWarning. We don't want users to see these warnings for code they can't control, such as the imports inside kedro-datasets.

This introduces a private kedro_datasets._io shim that imports the right class in a backwards- and forwards-compatible way. As a result, the kedro-datasets DeprecationWarnings no longer appear.
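A minimal sketch of how such a shim could pick the right name, assuming the new spelling (DatasetError) exists only in recent Kedro releases and the old one (DataSetError) is served there through a deprecated alias. The helper `import_renamed` and the `fake_old_core`/`fake_new_core` modules are hypothetical stand-ins so the example runs without Kedro installed; this is not the actual contents of kedro_datasets._io.

```python
# Hypothetical sketch of a rename-tolerant import helper; NOT the real kedro_datasets._io.
import importlib
import sys
import types
import warnings


def import_renamed(module_name: str, new_name: str, old_name: str):
    """Return `new_name` from `module_name`, falling back to `old_name`.

    The new name is looked up first, so on releases that keep the old name
    only as a deprecated alias, the alias (and its warning) is never touched.
    """
    module = importlib.import_module(module_name)
    try:
        return getattr(module, new_name)
    except AttributeError:
        # Older release: only the old spelling exists.
        return getattr(module, old_name)


# --- Stand-ins for an "old" and a "new" kedro.io.core (assumptions, demo only) ---
class OldDataSetError(Exception):
    """Plays the role of DataSetError in an older release."""


class NewDatasetError(Exception):
    """Plays the role of DatasetError in a newer release."""


old_core = types.ModuleType("fake_old_core")
old_core.DataSetError = OldDataSetError

new_core = types.ModuleType("fake_new_core")
new_core.DatasetError = NewDatasetError


def _deprecated_alias(name):
    # PEP 562 module-level __getattr__: only called for names missing from the module.
    if name == "DataSetError":
        warnings.warn(
            "'DataSetError' has been renamed to 'DatasetError'",
            DeprecationWarning,
            stacklevel=2,
        )
        return NewDatasetError
    raise AttributeError(name)


new_core.__getattr__ = _deprecated_alias
sys.modules["fake_old_core"] = old_core
sys.modules["fake_new_core"] = new_core

# What a dataset module would then do (no DeprecationWarning on either version):
DatasetError = import_renamed("fake_new_core", "DatasetError", "DataSetError")
```

Resolving the new name first is the key design choice: on a new release the deprecated alias is never looked up, so its warning never fires, while on an old release the fallback still finds the class.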

Before:

In [1]: from kedro_datasets.pandas import CSVDataSet
/Users/juan_cano/Projects/QuantumBlack Labs/kedro-plugins/kedro-datasets/kedro_datasets/pandas/csv_dataset.py:12: DeprecationWarning: 'DataSetError' has been renamed to 'DatasetError', and the alias will be removed in Kedro 0.19.0
  from kedro.io.core import (
/Users/juan_cano/Projects/QuantumBlack Labs/kedro-plugins/kedro-datasets/kedro_datasets/pandas/excel_dataset.py:12: DeprecationWarning: 'DataSetError' has been renamed to 'DatasetError', and the alias will be removed in Kedro 0.19.0
  from kedro.io.core import (
/Users/juan_cano/.micromamba/envs/kedro310-dev/lib/python3.10/site-packages/pkg_resources/__init__.py:121: DeprecationWarning: pkg_resources is deprecated as an API
  warnings.warn("pkg_resources is deprecated as an API", DeprecationWarning)
/Users/juan_cano/.micromamba/envs/kedro310-dev/lib/python3.10/site-packages/pkg_resources/__init__.py:2870: DeprecationWarning: Deprecated call to `pkg_resources.declare_namespace('google')`.
Implementing implicit namespace packages (as specified in PEP 420) is preferred to `pkg_resources.declare_namespace`. See https://setuptools.pypa.io/en/latest/references/keywords.html#keyword-namespace-packages
  declare_namespace(pkg)
/Users/juan_cano/.micromamba/envs/kedro310-dev/lib/python3.10/site-packages/pkg_resources/__init__.py:2870: DeprecationWarning: Deprecated call to `pkg_resources.declare_namespace('google.cloud')`.
Implementing implicit namespace packages (as specified in PEP 420) is preferred to `pkg_resources.declare_namespace`. See https://setuptools.pypa.io/en/latest/references/keywords.html#keyword-namespace-packages
  declare_namespace(pkg)
/Users/juan_cano/.micromamba/envs/kedro310-dev/lib/python3.10/site-packages/pkg_resources/__init__.py:2349: DeprecationWarning: Deprecated call to `pkg_resources.declare_namespace('google')`.
Implementing implicit namespace packages (as specified in PEP 420) is preferred to `pkg_resources.declare_namespace`. See https://setuptools.pypa.io/en/latest/references/keywords.html#keyword-namespace-packages
  declare_namespace(parent)
/Users/juan_cano/.micromamba/envs/kedro310-dev/lib/python3.10/site-packages/pkg_resources/__init__.py:2870: DeprecationWarning: Deprecated call to `pkg_resources.declare_namespace('google.logging')`.
Implementing implicit namespace packages (as specified in PEP 420) is preferred to `pkg_resources.declare_namespace`. See https://setuptools.pypa.io/en/latest/references/keywords.html#keyword-namespace-packages
  declare_namespace(pkg)
/Users/juan_cano/.micromamba/envs/kedro310-dev/lib/python3.10/site-packages/pkg_resources/__init__.py:2870: DeprecationWarning: Deprecated call to `pkg_resources.declare_namespace('mpl_toolkits')`.
Implementing implicit namespace packages (as specified in PEP 420) is preferred to `pkg_resources.declare_namespace`. See https://setuptools.pypa.io/en/latest/references/keywords.html#keyword-namespace-packages
  declare_namespace(pkg)
/Users/juan_cano/.micromamba/envs/kedro310-dev/lib/python3.10/site-packages/google/rpc/__init__.py:20: DeprecationWarning: Deprecated call to `pkg_resources.declare_namespace('google.rpc')`.
Implementing implicit namespace packages (as specified in PEP 420) is preferred to `pkg_resources.declare_namespace`. See https://setuptools.pypa.io/en/latest/references/keywords.html#keyword-namespace-packages
  pkg_resources.declare_namespace(__name__)
/Users/juan_cano/Projects/QuantumBlack Labs/kedro-plugins/kedro-datasets/kedro_datasets/pandas/gbq_dataset.py:14: DeprecationWarning: 'DataSetError' has been renamed to 'DatasetError', and the alias will be removed in Kedro 0.19.0
  from kedro.io.core import (
/Users/juan_cano/Projects/QuantumBlack Labs/kedro-plugins/kedro-datasets/kedro_datasets/pandas/hdf_dataset.py:11: DeprecationWarning: 'DataSetError' has been renamed to 'DatasetError', and the alias will be removed in Kedro 0.19.0
  from kedro.io.core import (
/Users/juan_cano/Projects/QuantumBlack Labs/kedro-plugins/kedro-datasets/kedro_datasets/pandas/json_dataset.py:12: DeprecationWarning: 'DataSetError' has been renamed to 'DatasetError', and the alias will be removed in Kedro 0.19.0
  from kedro.io.core import (
/Users/juan_cano/Projects/QuantumBlack Labs/kedro-plugins/kedro-datasets/kedro_datasets/pandas/parquet_dataset.py:12: DeprecationWarning: 'DataSetError' has been renamed to 'DatasetError', and the alias will be removed in Kedro 0.19.0
  from kedro.io.core import (
/Users/juan_cano/Projects/QuantumBlack Labs/kedro-plugins/kedro-datasets/kedro_datasets/pandas/sql_dataset.py:11: DeprecationWarning: 'DataSetError' has been renamed to 'DatasetError', and the alias will be removed in Kedro 0.19.0
  from kedro.io.core import (
/Users/juan_cano/Projects/QuantumBlack Labs/kedro-plugins/kedro-datasets/kedro_datasets/pandas/xml_dataset.py:12: DeprecationWarning: 'DataSetError' has been renamed to 'DatasetError', and the alias will be removed in Kedro 0.19.0
  from kedro.io.core import (
/Users/juan_cano/Projects/QuantumBlack Labs/kedro-plugins/kedro-datasets/kedro_datasets/pandas/generic_dataset.py:11: DeprecationWarning: 'DataSetError' has been renamed to 'DatasetError', and the alias will be removed in Kedro 0.19.0
  from kedro.io.core import (

After:

In [1]: from kedro_datasets.pandas import CSVDataSet
/Users/juan_cano/.micromamba/envs/kedro310-dev/lib/python3.10/site-packages/pkg_resources/__init__.py:121: DeprecationWarning: pkg_resources is deprecated as an API
  warnings.warn("pkg_resources is deprecated as an API", DeprecationWarning)
/Users/juan_cano/.micromamba/envs/kedro310-dev/lib/python3.10/site-packages/pkg_resources/__init__.py:2870: DeprecationWarning: Deprecated call to `pkg_resources.declare_namespace('google')`.
Implementing implicit namespace packages (as specified in PEP 420) is preferred to `pkg_resources.declare_namespace`. See https://setuptools.pypa.io/en/latest/references/keywords.html#keyword-namespace-packages
  declare_namespace(pkg)
/Users/juan_cano/.micromamba/envs/kedro310-dev/lib/python3.10/site-packages/pkg_resources/__init__.py:2870: DeprecationWarning: Deprecated call to `pkg_resources.declare_namespace('google.cloud')`.
Implementing implicit namespace packages (as specified in PEP 420) is preferred to `pkg_resources.declare_namespace`. See https://setuptools.pypa.io/en/latest/references/keywords.html#keyword-namespace-packages
  declare_namespace(pkg)
/Users/juan_cano/.micromamba/envs/kedro310-dev/lib/python3.10/site-packages/pkg_resources/__init__.py:2349: DeprecationWarning: Deprecated call to `pkg_resources.declare_namespace('google')`.
Implementing implicit namespace packages (as specified in PEP 420) is preferred to `pkg_resources.declare_namespace`. See https://setuptools.pypa.io/en/latest/references/keywords.html#keyword-namespace-packages
  declare_namespace(parent)
/Users/juan_cano/.micromamba/envs/kedro310-dev/lib/python3.10/site-packages/pkg_resources/__init__.py:2870: DeprecationWarning: Deprecated call to `pkg_resources.declare_namespace('google.logging')`.
Implementing implicit namespace packages (as specified in PEP 420) is preferred to `pkg_resources.declare_namespace`. See https://setuptools.pypa.io/en/latest/references/keywords.html#keyword-namespace-packages
  declare_namespace(pkg)
/Users/juan_cano/.micromamba/envs/kedro310-dev/lib/python3.10/site-packages/pkg_resources/__init__.py:2870: DeprecationWarning: Deprecated call to `pkg_resources.declare_namespace('mpl_toolkits')`.
Implementing implicit namespace packages (as specified in PEP 420) is preferred to `pkg_resources.declare_namespace`. See https://setuptools.pypa.io/en/latest/references/keywords.html#keyword-namespace-packages
  declare_namespace(pkg)
/Users/juan_cano/.micromamba/envs/kedro310-dev/lib/python3.10/site-packages/google/rpc/__init__.py:20: DeprecationWarning: Deprecated call to `pkg_resources.declare_namespace('google.rpc')`.
Implementing implicit namespace packages (as specified in PEP 420) is preferred to `pkg_resources.declare_namespace`. See https://setuptools.pypa.io/en/latest/references/keywords.html#keyword-namespace-packages
  pkg_resources.declare_namespace(__name__)

(Admittedly, that's still a mouthful, but see kedro-org/kedro#2744.)

Development notes

Alternatively, we could add a try ... except ImportError ... block to every dataset. The shim keeps the diff smaller, and it gives us a single place to handle AbstractDataset and AbstractVersionedDataset in the future if we decide to rename those too.

Checklist

  • Opened this PR as a 'Draft Pull Request' if it is work-in-progress
  • Updated the documentation to reflect the code changes
  • Added a description of this change in the relevant RELEASE.md file
  • Added tests to cover my changes

@astrojuanlu changed the title from "Remove deprecation warnings from kedro-datasets" to "fix: Remove deprecation warnings from kedro-datasets" on Jun 28, 2023
@astrojuanlu force-pushed the dataset-backwards-compatibility branch from ed371a1 to f4ffbd2 on June 28, 2023, 10:49
@astrojuanlu: this comment was marked as resolved.

kedro-datasets/kedro_datasets/_io.py (outdated, resolved)
@deepyaman changed the title from "fix: Remove deprecation warnings from kedro-datasets" to "fix(datasets): remove deprecation warnings" on Jul 7, 2023
@astrojuanlu force-pushed the dataset-backwards-compatibility branch 2 times, most recently from e560c3c to 0803bb4 on July 7, 2023, 12:20
@merelcht (Member) left a comment


LGTM 👍 Should this be added to the release notes for future reference?

@astrojuanlu (Member Author)

Thanks @merelcht! I'll add some release notes.

I'm struggling to understand why the TensorFlow tests show some numerical failures, and why linting is being triggered in files I didn't touch 😕 Could we get some help with this?

@merelcht (Member)


Do they fail locally as well?

@astrojuanlu (Member Author)

I can confirm the TensorFlow tests don't fail on Gitpod.

The linting failures do reproduce, though: mostly R0903 too-few-public-methods.

@merelcht (Member)


I'm not quite sure why it suddenly flags that, but I would just ignore it. The number of public methods hasn't changed. I wonder whether pylint somehow can't figure out that these classes inherit from AbstractDataSet.

@astrojuanlu force-pushed the dataset-backwards-compatibility branch from b923752 to 82b174b on July 19, 2023, 08:04
astrojuanlu and others added 7 commits July 19, 2023 17:49
Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>
Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>
Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>
Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>
Co-authored-by: Deepyaman Datta <deepyaman.datta@utexas.edu>
Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>
Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>
Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>
@astrojuanlu (Member Author)

Identified the pylint message as a possible bug: pylint-dev/pylint#8865

Proceeding to selectively ignore it.
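Selectively ignoring a pylint message usually means an inline pragma scoped to one definition rather than a project-wide disable. A minimal sketch with hypothetical class names (`BaseDataset`, `ExampleDataset`); the PR text does not show exactly where its pragmas were placed. If pylint cannot resolve the base class (the suspicion raised earlier in the thread), the inherited public `load`/`save` are not counted and R0903 fires on a class that only defines private hooks.

```python
class BaseDataset:
    """Hypothetical stand-in for a Kedro abstract dataset base class."""

    def load(self):
        return self._load()

    def save(self, data):
        self._save(data)

    def _load(self):
        raise NotImplementedError

    def _save(self, data):
        raise NotImplementedError


# A pragma on the class line disables R0903 for this class's scope only.
class ExampleDataset(BaseDataset):  # pylint: disable=too-few-public-methods
    """Hypothetical dataset that only implements the private hooks."""

    def __init__(self):
        self._data = None

    def _load(self):
        return self._data

    def _save(self, data):
        self._data = data
```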

See pylint-dev/pylint#8865

Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>
@astrojuanlu force-pushed the dataset-backwards-compatibility branch from 82b174b to 3f0b644 on July 20, 2023, 06:48
Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>
@astrojuanlu force-pushed the dataset-backwards-compatibility branch from 51bb35a to 5168a47 on July 20, 2023, 07:10
@astrojuanlu (Member Author)

Linting sorted out, TensorFlow tests still acting up.

@astrojuanlu (Member Author)

Couldn't reproduce the failures in a dedicated branch: https://github.com/kedro-org/kedro-plugins/actions/runs/5608121558/jobs/10260084023?pr=275

So it's either flakiness, or something introduced by this PR.

@astrojuanlu (Member Author)

Restarted the job, with exactly the same result. I'm ruling out flakiness, so something introduced by this PR is probably causing the problem.

deepyaman and others added 2 commits July 24, 2023 07:26
Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu>
This needs to be addressed later on, see TODO.

Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>
@astrojuanlu (Member Author) commented Jul 24, 2023

This change made the TF test failures go away:

diff --git a/kedro-datasets/kedro_datasets/tensorflow/tensorflow_model_dataset.py b/kedro-datasets/kedro_datasets/tensorflow/tensorflow_model_dataset.py
index ad9e844f..e13c2cb5 100644
--- a/kedro-datasets/kedro_datasets/tensorflow/tensorflow_model_dataset.py
+++ b/kedro-datasets/kedro_datasets/tensorflow/tensorflow_model_dataset.py
@@ -8,10 +8,17 @@
 
 import fsspec
 import tensorflow as tf
-from kedro.io.core import Version, get_filepath_str, get_protocol_and_path
 
-from .._io import AbstractVersionedDataset as AbstractVersionedDataSet
-from .._io import DatasetError as DataSetError
+# TODO: Replace these imports by the appropriate ones from kedro_datasets._io
+# to avoid deprecation warnings for users,
+# see https://github.com/kedro-org/kedro-plugins/pull/255
+from kedro.io.core import (
+    AbstractVersionedDataSet,
+    DataSetError,
+    Version,
+    get_filepath_str,
+    get_protocol_and_path,
+)
 
 TEMPORARY_H5_FILE = "tmp_tensorflow_model.h5"

I'm 100% sure something fishy is going on with the pytest fixtures or mocks, rather than with the dataset itself. However, investigating this would take time we don't have at the moment.

As a result of this commit, for now, importing the TensorFlow dataset will emit a DeprecationWarning that users cannot act upon (because they don't control the class hierarchy).
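Until that TODO is addressed, a user who finds the warning noisy could silence just that message with the standard library's warning filters. A sketch, assuming the message text shown in the "Before" log in the description; the function name `silence_dataset_rename_warnings` is hypothetical, and other renamed names would need their own patterns.

```python
import warnings

# Regex matched against the start of the warning text (case-insensitively),
# taken from the "Before" log in the PR description.
RENAME_WARNING = r"'DataSetError' has been renamed to 'DatasetError'"


def silence_dataset_rename_warnings() -> None:
    """Ignore only the DataSetError rename warning; other DeprecationWarnings still show."""
    warnings.filterwarnings(
        "ignore", message=RENAME_WARNING, category=DeprecationWarning
    )
```

The filter has to be installed before importing the dataset, since the warning is emitted at import time.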

Re-requesting approval from @merelcht before merging.

@astrojuanlu (Member Author)

xref kedro-org/kedro@18b1c4d

@merelcht (Member) left a comment


Thanks for digging so deep into the tensorflow issue. I think this is a good solution and don't think it hurts the user experience too much 👍

@astrojuanlu merged commit 9cb6a16 into main on Jul 24, 2023 (13 checks passed)
@astrojuanlu deleted the dataset-backwards-compatibility branch on July 24, 2023, 17:15
PtrBld pushed a commit to PtrBld/kedro-plugins that referenced this pull request Aug 27, 2023
* Unpin pytest-xdist to avoid deprecation error

Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>

* Remove deprecation warnings from kedro-datasets

Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>

* Adapt future deprecation of Abstract*DataSet

Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>

* Add tests for backwards and forwards compatibility

Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>

* Typo

Co-authored-by: Deepyaman Datta <deepyaman.datta@utexas.edu>
Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>

* Make _io tests restore old values

Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>

* Lint

Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>

* Ignore faulty pylint check

See pylint-dev/pylint#8865

Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>

* Disable a few more faulty pylint checks

Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>

* Remove shim from tensorflow datasets

This needs to be addressed later on, see TODO.

Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>

---------

Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>
Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu>
Co-authored-by: Deepyaman Datta <deepyaman.datta@utexas.edu>
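The "Add tests for backwards and forwards compatibility" and "Make _io tests restore old values" commits suggest tests that fake both Kedro versions and then clean up after themselves. A minimal sketch of that idea, not the PR's actual tests: `fake_kedro_core`, `load_dataset_error` and `make_core` are hypothetical, and `unittest.mock.patch.dict` is used so that sys.modules is restored automatically when each block exits.

```python
import sys
import types
from unittest import mock


def load_dataset_error():
    """Hypothetical shim logic: prefer the new name, fall back to the old one."""
    try:
        from fake_kedro_core import DatasetError
    except ImportError:
        from fake_kedro_core import DataSetError as DatasetError
    return DatasetError


def make_core(**attrs):
    """Build a stand-in for kedro.io.core exposing only the given names."""
    module = types.ModuleType("fake_kedro_core")
    for name, value in attrs.items():
        setattr(module, name, value)
    return module


def test_old_and_new_names_resolve_and_sys_modules_is_restored():
    class Old(Exception):
        """Old-style DataSetError."""

    class New(Exception):
        """New-style DatasetError."""

    before = sys.modules.get("fake_kedro_core")
    # Backwards compatibility: only the old name exists.
    with mock.patch.dict(sys.modules, {"fake_kedro_core": make_core(DataSetError=Old)}):
        assert load_dataset_error() is Old
    # Forwards compatibility: the new name exists.
    with mock.patch.dict(sys.modules, {"fake_kedro_core": make_core(DatasetError=New)}):
        assert load_dataset_error() is New
    # patch.dict puts sys.modules back exactly as it was.
    assert sys.modules.get("fake_kedro_core") is before
```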
Development

Successfully merging this pull request may close these issues.

Make kedro-datasets forwards and backwards compatible
3 participants