Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TEST-#6016: make sure eval_general doesn't expect exceptions by default #6954

Merged
merged 61 commits into from
Mar 14, 2024

Conversation

anmyachev
Copy link
Collaborator

@anmyachev anmyachev commented Feb 21, 2024

What do these changes do?

  • first commit message and PR title follow format outlined here

    NOTE: If you edit the PR title to match this format, you need to add another commit (even if it's empty) or amend your last commit for the CI job that checks the PR title to pick up the new PR title.

  • passes flake8 modin/ asv_bench/benchmarks scripts/doc_checker.py
  • passes black --check modin/ asv_bench/benchmarks scripts/doc_checker.py
  • signed commit with git commit -s
  • Resolves TEST: eval_general should assume by default that pandas does not throw an error #6016
  • tests added and passing
  • module layout described at docs/development/architecture.rst is up-to-date

@anmyachev anmyachev force-pushed the issue6016 branch 4 times, most recently from 50fa735 to 79b5ba0 Compare February 27, 2024 12:21
modin/pandas/test/test_series.py Fixed Show fixed Hide fixed
modin/pandas/test/test_series.py Fixed Show fixed Hide fixed
…ptions by default

Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>
Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>
Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>
Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>
Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>
Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>
Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>
Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>
Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>
Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>
Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>
Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>
Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>
Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>
Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>
Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>
Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>
Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>
Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>
Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>
Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>
Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>
Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>
@anmyachev
Copy link
Collaborator Author

@dchigarev @YarShev ready for review

Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>
Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>
@@ -1420,7 +1410,7 @@ def _csv_file_maker(
value=[char if (x + 2) == 0 else x for x in range(row_size)],
)

if thousands_separator:
if thousands_separator is not None:
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can be empty string.

Comment on lines -1043 to -1048
finally:
if os.path.exists(unique_filename):
try:
os.remove(unique_filename)
except PermissionError:
pass
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to cleanup, we use tmp_path fixture.

@@ -331,7 +331,7 @@
"sum of certain elements": lambda axis: (
axis.iloc[0] + axis.iloc[-1] if isinstance(axis, pandas.Series) else axis + axis
),
"should raise TypeError": 1,
"should raise AssertionError": 1,
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is more similar to the errors that I have seen. Can be called more neutrally (should raise Exception).

eval_general(
*create_test_dfs(test_data["float_nan_data"]),
lambda df: getattr(df, method)(axis=axis, skipna=skipna),
)


def test_rank_except():
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was convenient to combine them.

@@ -395,7 +395,7 @@ jobs:
- ubuntu
- windows
python-version: ["3.9"]
engine: ${{ fromJSON( github.event_name == 'push' && '["python", "ray", "dask"]' || needs.execution-filter.outputs.engines ) }}
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just for testing purpose. Need to revert.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reverted

@@ -279,7 +279,7 @@
test_string_list_data_values = list(test_string_list_data.values())
test_string_list_data_keys = list(test_string_list_data.keys())

string_seperators = {"empty sep": "", "comma sep": ",", "None sep": None}
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Invalid separators.

@anmyachev
Copy link
Collaborator Author

@YarShev ready for review

@@ -457,7 +457,10 @@ def _read_csv_check_support(
+ "'infer' header values",
)
if isinstance(parse_dates, list) and not set(parse_dates).issubset(names):
raise ValueError("Missing column provided to 'parse_dates'")
missed_columns = set(parse_dates) - set(names)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
missed_columns = set(parse_dates) - set(names)
missing_columns = set(parse_dates) - set(names)

@@ -319,7 +335,9 @@ def test_equals_with_nans():
@pytest.mark.parametrize(
"is_idx_aligned", [True, False], ids=["idx_aligned", "idx_not_aligned"]
)
def test_mismatched_row_partitions(is_idx_aligned, op_type, is_more_other_partitions):
def test_mismatched_row_partitions(
is_idx_aligned, op_type, is_more_other_partitions, request
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is request added?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed

raising_exceptions = None
if (
"bool-bool" in request.node.callspec.id
or "bool scalar-bool" in request.node.callspec.id
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we are going to proceed using request.* to identify the exact errors and how difficult it is for developers to find the exact cases for those errors using request.*?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we are going to proceed using request.* to identify the exact errors and how difficult it is for developers to find the exact cases for those errors using request.*?

Well, this is quite inconvenient, but this is the price we need to pay in order to be more confident that what is actually being tested is what is intended.

I would also like to note that to avoid this inconvenience, when writing a test, we need to separate test cases that should work correctly from those that should not.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would also like to note that to avoid this inconvenience, when writing a test, we need to separate test cases that should work correctly from those that should not.

Yes, separating tests would probably make sense here. Matching error-prone parameters with exceptions would help I think.

modin/pandas/test/dataframe/test_udf.py Show resolved Hide resolved
@@ -518,15 +513,6 @@ def test_median_skew_transposed(axis, method):
)


@pytest.mark.parametrize("numeric_only", [True, False, None])
@pytest.mark.parametrize("method", ["median", "skew", "std", "var", "rank", "sem"])
def test_median_skew_std_var_rank_sem_specific(numeric_only, method):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why removed?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These methods are already being tested in test_reduce.py IIRC.

@@ -924,6 +920,8 @@ def execute_callable(fn, inplace=False, md_kwargs={}, pd_kwargs={}):
assert (
pd_e.args == raising_exceptions.args
), f"not acceptable Pandas' exception: [{repr(pd_e)}]"
elif raising_exceptions is not False:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When can we get into this branch if we already have if raising_exceptions, and why do we need this branch?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the only way to disable exception checking, it only works when raising_exceptions=False.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The not-so-obvious value for the parameter should probably be reconsidered separately.

Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>
@YarShev YarShev merged commit c753436 into modin-project:master Mar 14, 2024
37 checks passed
@anmyachev anmyachev deleted the issue6016 branch March 14, 2024 17:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

TEST: eval_general should assume by default that pandas does not throw an error
3 participants