TST: Isin converted floats unnecessarily to int causing rounding issues #37770

Merged
merged 12 commits into from
Dec 29, 2020

Conversation

phofl
Member

@phofl phofl commented Nov 11, 2020

@phofl phofl added Dtype Conversions Unexpected or buggy dtype conversions isin isin method labels Nov 11, 2020
# Try finding a dtype which would not change our values
values, _ = maybe_upcast(values, dtype=dtype)
dtype = values.dtype
except (ValueError, TypeError):
Member

what cases raise?

Member Author

TypeError when values is an array with pd.NaT (test_isin_nan_common_float64 in pandas/tests/indexes/test_base.py)
ValueError if values contains strings, which can't be converted to int (test_isin_level_kwarg in same file for example)

Contributor

not a big fan of this, can we not be explicit here on the dtype checks (as we are elsewhere)?

Member Author

Could use `maybe_convert_numeric`, but would have to keep a `try`/`except` block too. Is there a function I am not aware of that checks whether we can convert from object to numeric without `try`/`except`?

@phofl
Member Author

phofl commented Nov 14, 2020

The problem I was facing is that we sometimes get numpy arrays of dtype object that actually contain integers/floats, which we would like to cast to integer to try isin. So we cannot really filter for integer dtypes on values. But sometimes they contain strings, which makes maybe_upcast raise, hence the try/except block. I am not particularly happy with this solution either, but I did not know a better way to convert to an integer dtype where possible and pass in all other cases.
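
To illustrate the fallback described above, here is a minimal plain-Python sketch, not the pandas implementation. The helper name `try_cast_to_int` is hypothetical and lists stand in for object-dtype arrays. It casts only when no value changes and otherwise returns the input untouched, which is roughly the role the `try`/`except` around `maybe_upcast` plays in the diff.

```python
def try_cast_to_int(values):
    """Hypothetical helper: return ints if every value converts without
    changing, otherwise return the input unchanged."""
    try:
        converted = [int(v) for v in values]
    except (ValueError, TypeError, OverflowError):
        # int("a") raises ValueError, int(None) raises TypeError,
        # int(float("inf")) raises OverflowError
        return list(values)
    # Reject casts that would change a value, such as 0.5 -> 0
    if any(c != v for c, v in zip(converted, values)):
        return list(values)
    return converted


print(try_cast_to_int([1.0, 2.0]))  # [1, 2]
print(try_cast_to_int([0.5, 1.0]))  # [0.5, 1.0]
print(try_cast_to_int(["a", 1]))    # ['a', 1]
```

The equality check after the cast is what prevents the rounding issue: a value like `0.5` converts without raising, so exception handling alone would not catch it.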

@phofl phofl changed the title [BUG]: Isin converted floats unnecessarily to int causing rounding issues BUG: Isin converted floats unnecessarily to int causing rounding issues Nov 15, 2020
@jreback
Contributor

jreback commented Nov 26, 2020

I think a recent PR from @jbrockmendel may have obviated the need for this, but merge master and let's see

# Conflicts:
#	doc/source/whatsnew/v1.2.0.rst
#	pandas/core/algorithms.py
#	pandas/tests/series/methods/test_isin.py
@phofl
Member Author

phofl commented Nov 26, 2020

No, ensure_data is still called without taking the dtype of values into account. Causes the same problems as before unfortunately

@jbrockmendel
Member

I've been looking at _ensure_data from another angle, and it basically needs to be split into two functions: one for everything besides isin, in which dtype is never passed, and another for isin, in which we need to pass both comps and values at the same time and use something like find_common_dtype
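
A rough plain-Python model of that idea, offered as a sketch under stated assumptions rather than as the pandas code: both function names below are hypothetical, and lists stand in for arrays. The first function forces `values` to the integer type of `comps`, which reproduces the rounding problem; the second looks at both inputs together and picks one comparison type before testing membership.

```python
def isin_cast_to_comps(comps, values):
    """Hypothetical buggy variant: force values to the int type of comps."""
    lookup = {int(v) for v in values}  # 0.5 is truncated to 0 here
    return [int(c) in lookup for c in comps]


def isin_common_type(comps, values):
    """Hypothetical fix: choose one comparison type from both inputs."""
    has_float = any(isinstance(x, float) for x in list(comps) + list(values))
    common = float if has_float else int
    lookup = {common(v) for v in values}
    return [common(c) in lookup for c in comps]


print(isin_cast_to_comps([1, 0], [0.5]))  # [False, True]
print(isin_common_type([1, 0], [0.5]))    # [False, False]
```

The false positive in the first call comes purely from the cast, since `0.5` becomes `0` before the lookup is built.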

@phofl
Member Author

phofl commented Nov 26, 2020

That would work, yes. Will look into this during the weekend

@jbrockmendel
Member

The one usage of _ensure_data that passed dtype has now been removed, so that can likely be simplified now.

# Conflicts:
#	doc/source/whatsnew/v1.2.0.rst
#	pandas/core/algorithms.py
@phofl
Member Author

phofl commented Dec 5, 2020

Your commit yesterday fixed the above cases, so I am only adding tests here.

@@ -801,6 +801,7 @@ Reshaping
- Bug in :func:`merge_ordered` returned wrong join result when length of ``left_by`` or ``right_by`` equals to the rows of ``left`` or ``right`` (:issue:`38166`)
- Bug in :func:`merge_ordered` didn't raise when elements in ``left_by`` or ``right_by`` not exist in ``left`` columns or ``right`` columns (:issue:`38167`)
- Bug in :func:`DataFrame.drop_duplicates` not validating bool dtype for ``ignore_index`` keyword (:issue:`38274`)
- Bug in :meth:`Series.isin` cast ``float`` unnecessarily to ``int`` when :class:`Series` to look in was from dtype ``int`` (:issue:`19356`)
Member

"cast float unnecessarily" -> "unnecessarily casting float dtypes"

Member

probably needs to move to 1.3.0

Member Author

I think it makes more sense removing the whatsnew entirely, since your fix went into 1.2. Would be confusing having this in 1.3

@phofl phofl changed the title BUG: Isin converted floats unnecessarily to int causing rounding issues TST: Isin converted floats unnecessarily to int causing rounding issues Dec 9, 2020
Member

@jbrockmendel jbrockmendel left a comment

LGTM

@jreback
Contributor

jreback commented Dec 29, 2020

this might be fixed on master now, can you revisit?

@phofl
Member Author

phofl commented Dec 29, 2020

Yes, it was fixed by @jbrockmendel, hence only adding tests here

@jreback jreback added this to the 1.3 milestone Dec 29, 2020
@jreback jreback merged commit 0976c4c into pandas-dev:master Dec 29, 2020
@jreback
Contributor

jreback commented Dec 29, 2020

excellent, thanks @phofl

@phofl phofl deleted the 19356 branch December 30, 2020 09:10
luckyvs1 pushed a commit to luckyvs1/pandas that referenced this pull request Jan 20, 2021
Labels
Dtype Conversions Unexpected or buggy dtype conversions isin isin method
Development

Successfully merging this pull request may close these issues.

BUG: unwanted casting in .isin
.isin implicitly converts data types
3 participants