-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add functions to compare Column objects with iterable references and to compare DataFrame objects with mapping references #66
Conversation
00e34a6
to
91c4f31
Compare
nice idea! do we want to check the data type too? |
d05595f
to
a88360c
Compare
@@ -75,6 +75,9 @@ ignore = [ | |||
[tool.ruff.isort] | |||
force-single-line = true | |||
|
|||
[tool.black] | |||
line-length = 90 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To sync with pre-commit
.
@@ -14,9 +14,8 @@ | |||
import dataframe_api_compat.pandas_standard | |||
import dataframe_api_compat.polars_standard | |||
|
|||
DType = TypeVar("DType") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks unused, can return if needed.
4a20813
to
55944ce
Compare
|
||
|
||
def test_column_sorted_indices_ascending(library: str) -> None: | ||
df = integer_dataframe_6(library).persist() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I deleted .persist()
call in several places, since the same call occurs in new comparison functions, which generates warnings, but due to the repository settings - errors. If this is incorrect, then we need a public way to check the ._is_persisted
field, so as not to call the method several times.
pd.testing.assert_frame_equal(result_pd, expected) | ||
expected = {"a": [1, 2, 3], "b": [4, 5, 6], "result": [1.0, 32.0, 729.0]} | ||
expected_dtype = {"a": ns.Int64, "b": ns.Int64, "result": ns.Float64} | ||
compare_dataframe_with_reference(result, expected, expected_dtype) # type: ignore[arg-type] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don’t know exactly why in some places mypy
gives an error that has to be turned off, because it is a false positive. The first thing that catches my eye is that the lists inside the dictionaries have different types, for example int and float (not a homogeneous type).
@@ -104,12 +104,15 @@ def map_pandas_dtype_to_standard_dtype(dtype: Any) -> DType: | |||
return Namespace.Float32() | |||
if dtype == "Float32": | |||
return Namespace.Float32() | |||
if dtype == "bool": | |||
if dtype in ("bool", "boolean"): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I discovered it by accident while experimenting. It is possible that this is no longer necessary for the current changes.
@@ -35,6 +35,7 @@ | |||
"UInt16": "uint16", | |||
"UInt8": "uint8", | |||
"boolean": "bool", | |||
"Float64": "float64", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also discovered by accident, it seems that the float type was missing, but if it was done on purpose, I can try to redo it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i probably just forgot it - let's add float32 too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added
cef7e10
to
6360940
Compare
@MarcoGorelli ready for review :) |
@MarcoGorelli friendly ping :) A little information for context, after I manage to rewrite the tests in a backend-independent manner, I will try to integrate Modin into your repository. Such preliminary changes are necessary to avoid code duplication. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
awesome!
sorry it took a while to get to
just got two minor comments, but this is great
@@ -35,6 +35,7 @@ | |||
"UInt16": "uint16", | |||
"UInt8": "uint8", | |||
"boolean": "bool", | |||
"Float64": "float64", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i probably just forgot it - let's add float32 too?
if not hasattr(dtype, "startswith"): | ||
dtype = str(dtype) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it possible to do this in a less hacky way?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can try to use name
attribute if it exists.
@MarcoGorelli there are new deprecation warnings from new polars release: FAILED tests/groupby/aggregate_test.py::test_aggregate[polars-lazy] - DeprecationWarning: `pl.count()` is deprecated. Please use `pl.len()` instead.
FAILED tests/groupby/aggregate_test.py::test_aggregate_only_size[polars-lazy] - DeprecationWarning: `pl.count()` is deprecated. Please use `pl.len()` instead.
FAILED tests/groupby/size_test.py::test_group_by_size[polars-lazy] - DeprecationWarning: `count` is deprecated. It has been renamed to `len`. What should I do in this case? |
easiest thing would be to address that in a separate PR, and to set the new polars release as the minimum version (polars it moving quite fast so backwards compatibility is less of a concern there) |
|
54173c3
to
f9aa10d
Compare
Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>
Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>
Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>
Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>
Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>
Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>
Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>
Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>
Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>
Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>
Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>
f9aa10d
to
a4c4aee
Compare
@@ -19,7 +21,7 @@ def test_scale_column_pandas() -> None: | |||
|
|||
|
|||
@pytest.mark.skipif( | |||
tuple(int(v) for v in pl.__version__.split(".")) < (0, 19, 0), | |||
parse(pl.__version__) < Version("0.19.0"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will help the tests work with release candidates, such as polars==0.20.6rc1
@MarcoGorelli ready for review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks @anmyachev !
thanks for the review @MarcoGorelli! |
The changes are aimed at getting rid of the use of the
interchange_to_pandas
function, so that the tests were implementation independent.So far the new functions have only been applied to
tests\column
folder.