Add functions to compare Column objects with iterable references and to compare DataFrame objects with mapping references #66

anmyachev · 2024-01-05T01:33:11Z

The changes are aimed at removing the use of the interchange_to_pandas function so that the tests are implementation-independent.

So far the new functions have only been applied to the tests/column folder.

MarcoGorelli · 2024-01-05T06:46:09Z

nice idea! do we want to check the data type too?

anmyachev · 2024-01-06T23:57:11Z

pyproject.toml

@@ -75,6 +75,9 @@ ignore = [
 [tool.ruff.isort]
 force-single-line = true

+[tool.black]
+line-length = 90


To sync with pre-commit.

anmyachev · 2024-01-06T23:59:36Z

tests/utils.py

@@ -14,9 +14,8 @@
 import dataframe_api_compat.pandas_standard
 import dataframe_api_compat.polars_standard

-DType = TypeVar("DType")


Looks unused; I can restore it if needed.

anmyachev · 2024-01-07T00:44:17Z

tests/column/col_sorted_indices_test.py



 def test_column_sorted_indices_ascending(library: str) -> None:
-    df = integer_dataframe_6(library).persist()


I removed the .persist() call in several places because the new comparison functions make the same call. Calling it twice generates warnings, which the repository settings turn into errors. If this is incorrect, then we need a public way to check the ._is_persisted field, so that the method is not called more than once.
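A minimal sketch of the interaction described above, using only the standard library. `ToyFrame` and `compare_with_reference` are hypothetical stand-ins, not the library's classes or the functions added in this PR; the only assumptions taken from the comment are that a repeated persist() call warns and that warnings are treated as errors.

```python
from __future__ import annotations

import warnings


class ToyFrame:
    """Hypothetical stand-in for a dataframe whose persist() warns when repeated."""

    def __init__(self) -> None:
        self._is_persisted = False

    def persist(self) -> ToyFrame:
        if self._is_persisted:
            # Assumed behaviour: a second persist() call emits a warning.
            warnings.warn("persist() called more than once", UserWarning, stacklevel=2)
        self._is_persisted = True
        return self


def compare_with_reference(df: ToyFrame, expected: list[int]) -> None:
    """Hypothetical comparison helper that persists internally before comparing."""
    df.persist()
    # The actual comparison against `expected` is omitted in this sketch.


with warnings.catch_warnings():
    # Mimic a test configuration that turns every warning into an error.
    warnings.simplefilter("error")

    # The test persists explicitly, then the helper persists again.
    df = ToyFrame().persist()
    try:
        compare_with_reference(df, [1, 2, 3])
        raised = False
    except UserWarning:
        raised = True

    # Without the explicit persist() in the test, the helper's call is the only one.
    compare_with_reference(ToyFrame(), [1, 2, 3])
```

In this toy setup, dropping the explicit call from the test body is enough to avoid the error, which matches the change made in the test above.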

anmyachev · 2024-01-07T00:47:04Z

tests/column/pow_test.py

-    pd.testing.assert_frame_equal(result_pd, expected)
+    expected = {"a": [1, 2, 3], "b": [4, 5, 6], "result": [1.0, 32.0, 729.0]}
+    expected_dtype = {"a": ns.Int64, "b": ns.Int64, "result": ns.Float64}
+    compare_dataframe_with_reference(result, expected, expected_dtype)  # type: ignore[arg-type]


I don’t know exactly why mypy reports an error in some places; it is a false positive, so it has to be suppressed. My first guess is that the lists inside the dictionaries have different element types, for example int and float, so the values are not of one homogeneous type.
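One possible alternative to the ignore comment is to annotate the reference dictionary explicitly, so the type checker does not have to infer a value type from lists with mixed element types. This is only a sketch: `compare_stub` is a hypothetical stand-in, and whether an annotation like this silences the specific error depends on the real signature of compare_dataframe_with_reference, which is not shown here.

```python
from __future__ import annotations

from typing import Any, Mapping


def compare_stub(expected: Mapping[str, list[Any]]) -> list[str]:
    """Hypothetical stand-in that only returns the reference column names."""
    return sorted(expected)


# Explicit annotation: the value type is declared rather than inferred
# from a mix of int lists and float lists.
expected: dict[str, list[Any]] = {
    "a": [1, 2, 3],
    "b": [4, 5, 6],
    "result": [1.0, 32.0, 729.0],
}

columns = compare_stub(expected)
```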

anmyachev · 2024-01-07T00:49:20Z

dataframe_api_compat/pandas_standard/__init__.py

@@ -104,12 +104,15 @@ def map_pandas_dtype_to_standard_dtype(dtype: Any) -> DType:
        return Namespace.Float32()
    if dtype == "Float32":
        return Namespace.Float32()
-    if dtype == "bool":
+    if dtype in ("bool", "boolean"):


I discovered it by accident while experimenting. It is possible that this is no longer necessary for the current changes.

anmyachev · 2024-01-07T00:50:51Z

dataframe_api_compat/pandas_standard/column_object.py

@@ -35,6 +35,7 @@
    "UInt16": "uint16",
    "UInt8": "uint8",
    "boolean": "bool",
+    "Float64": "float64",


I also discovered this by accident: it seems that the float type was missing. If it was left out on purpose, I can try to redo it.

i probably just forgot it - let's add float32 too?

anmyachev · 2024-01-07T00:57:10Z

@MarcoGorelli ready for review :)

anmyachev · 2024-01-11T11:51:23Z

@MarcoGorelli friendly ping :)

A little context: once I manage to rewrite the tests in a backend-independent manner, I will try to integrate Modin into your repository. These preliminary changes are necessary to avoid code duplication.

MarcoGorelli

awesome!

sorry it took a while to get to

just got two minor comments, but this is great

MarcoGorelli · 2024-01-18T19:40:43Z

dataframe_api_compat/pandas_standard/column_object.py

@@ -35,6 +35,7 @@
    "UInt16": "uint16",
    "UInt8": "uint8",
    "boolean": "bool",
+    "Float64": "float64",


i probably just forgot it - let's add float32 too?

MarcoGorelli · 2024-01-18T19:48:29Z

dataframe_api_compat/pandas_standard/__init__.py

+    if not hasattr(dtype, "startswith"):
+        dtype = str(dtype)


is it possible to do this in a less hacky way?

We can try to use the dtype's name attribute if it exists.
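A minimal sketch of that idea, assuming the goal is simply to obtain a string before the startswith checks. `dtype_to_name` and `FakeDtype` are hypothetical names used for illustration; this is not the code that was merged.

```python
from typing import Any


def dtype_to_name(dtype: Any) -> str:
    """Hypothetical helper: return a string name for a dtype-like object."""
    if isinstance(dtype, str):
        return dtype
    # Prefer an explicit `name` attribute when the object provides one.
    name = getattr(dtype, "name", None)
    if isinstance(name, str):
        return name
    # Fall back to the string representation otherwise.
    return str(dtype)


class FakeDtype:
    """Toy object standing in for a dtype that exposes a `name` attribute."""

    name = "Float64"


from_string = dtype_to_name("bool")
from_object = dtype_to_name(FakeDtype())
```

The fallback to str() keeps the behaviour of the original snippet for objects that have no name attribute.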

anmyachev · 2024-01-19T14:10:23Z

@MarcoGorelli there are new deprecation warnings from the new polars release:

FAILED tests/groupby/aggregate_test.py::test_aggregate[polars-lazy] - DeprecationWarning: `pl.count()` is deprecated. Please use `pl.len()` instead.
FAILED tests/groupby/aggregate_test.py::test_aggregate_only_size[polars-lazy] - DeprecationWarning: `pl.count()` is deprecated. Please use `pl.len()` instead.
FAILED tests/groupby/size_test.py::test_group_by_size[polars-lazy] - DeprecationWarning: `count` is deprecated. It has been renamed to `len`.

What should I do in this case?

MarcoGorelli · 2024-01-19T16:05:22Z

easiest thing would be to address that in a separate PR, and to set the new polars release as the minimum version (polars is moving quite fast so backwards compatibility is less of a concern there)

anmyachev · 2024-01-23T10:40:28Z

~~Blocked by #68~~

Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>

anmyachev · 2024-01-23T12:02:49Z

tests/integration/scale_column_test.py

@@ -19,7 +21,7 @@ def test_scale_column_pandas() -> None:


 @pytest.mark.skipif(
-    tuple(int(v) for v in pl.__version__.split(".")) < (0, 19, 0),
+    parse(pl.__version__) < Version("0.19.0"),


This will help the tests work with release candidates, such as polars==0.20.6rc1
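A short sketch of why the tuple-based check breaks on a release candidate. It assumes `parse` and `Version` come from the `packaging.version` module (the diff does not show the import), and `naive_version_tuple` is a hypothetical name for the old expression.

```python
from __future__ import annotations

from packaging.version import Version, parse


def naive_version_tuple(version: str) -> tuple[int, ...]:
    """The old approach: split on dots and convert every part to int."""
    return tuple(int(v) for v in version.split("."))


rc = "0.20.6rc1"

try:
    naive_version_tuple(rc)
    naive_ok = True
except ValueError:
    # int("6rc1") cannot be parsed, so the old check fails on release candidates.
    naive_ok = False

# The version-aware comparison handles the pre-release suffix.
is_older_than_0_19 = parse(rc) < Version("0.19.0")
```

A version-aware comparison also orders a pre-release before its final release, so `0.20.6rc1` compares as lower than `0.20.6`.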

anmyachev · 2024-01-23T12:03:29Z

@MarcoGorelli ready for review

MarcoGorelli

thanks @anmyachev !

anmyachev · 2024-01-24T11:34:20Z

thanks for the review @MarcoGorelli!

anmyachev force-pushed the compare-columns branch 2 times, most recently from 00e34a6 to 91c4f31 Compare January 5, 2024 01:44

anmyachev force-pushed the compare-columns branch 8 times, most recently from d05595f to a88360c Compare January 6, 2024 02:09

anmyachev force-pushed the compare-columns branch 2 times, most recently from 4a20813 to 55944ce Compare January 7, 2024 00:20

anmyachev changed the title ~~Add function to compare Column objects with iterable references~~ Add functions to compare Column objects with iterable references and to compare DataFrame objects with mapping references Jan 7, 2024

anmyachev force-pushed the compare-columns branch from cef7e10 to 6360940 Compare January 7, 2024 00:53

anmyachev marked this pull request as ready for review January 7, 2024 00:56

MarcoGorelli reviewed Jan 18, 2024

anmyachev force-pushed the compare-columns branch from 54173c3 to f9aa10d Compare January 23, 2024 11:58

Add function to compare Column objects with iterable references

d18b9b1

Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>

anmyachev added 11 commits January 23, 2024 12:59

check Column dtype

7708bfb

Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>

another way to check Column dtype

05171fc

Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>

temp fix for mypy

f0005d8

Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>

add 'compare_dataframe_with_reference' func

65c1f66

Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>

use new functions for more files [part1]

efdf097

Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>

use new functions for more files [part2]

1ed8fbf

Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>

use new functions for more files [final part for column tests]

1658fa8

Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>

refactor

f64ee52

Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>

workarounds for mypy errors

90467c3

Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>

address review comments

01fa14d

Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>

use 'parse' and 'Version' to compare package versions

a4c4aee

Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>

anmyachev force-pushed the compare-columns branch from f9aa10d to a4c4aee Compare January 23, 2024 12:00

anmyachev mentioned this pull request Jan 23, 2024

Finally get rid of interchange_to_pandas #69

Merged

MarcoGorelli approved these changes Jan 24, 2024

MarcoGorelli merged commit 107969c into data-apis:main Jan 24, 2024
13 checks passed

anmyachev deleted the compare-columns branch January 24, 2024 11:34
