BUG: drop_duplicates not raising KeyError on missing key #19730

Merged (5 commits) on Feb 21, 2018

NoahTheDuke commented Feb 16, 2018

The fix for #17879 introduced a regression by iterating over the columns in the dataframe, not the columns in the subset. As a result, passing in a column name missing from the dataframe no longer raised a `KeyError` as it had previously.

This fix checks the subset before pulling the necessary columns from the dataframe, and raises a `KeyError` when a given column doesn't exist.

Fixes #19726
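
The regression described above can be modelled without pandas. The following is a minimal sketch, not the actual `frame.py` code: both function names are hypothetical, and plain lists stand in for the dataframe's columns and the `subset` argument.

```python
def pick_columns_regressed(frame_columns, subset):
    """Hypothetical model of the regressed logic: loop over the frame's columns."""
    # A name in `subset` that is not a frame column never matches anything,
    # so it is silently ignored instead of raising.
    return [col for col in frame_columns if col in subset]


def pick_columns_checked(frame_columns, subset):
    """Hypothetical model of the fix: validate `subset` before selecting."""
    missing = [name for name in subset if name not in frame_columns]
    if missing:
        raise KeyError(missing)
    return [col for col in frame_columns if col in subset]


columns = ["A", "B", "C"]

# "Bee" is a misspelling of "B"; the regressed version does not notice it.
print(pick_columns_regressed(columns, ["A", "Bee"]))

try:
    pick_columns_checked(columns, ["A", "Bee"])
except KeyError as err:
    print("KeyError:", err)
```

Checking the subset up front means a misspelled name fails loudly rather than quietly changing which columns are compared.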

codecov bot commented Feb 16, 2018

Codecov Report

Merging #19730 into master will increase coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #19730      +/-   ##
==========================================
+ Coverage   91.58%   91.58%   +<.01%     
==========================================
  Files         150      150              
  Lines       48867    48890      +23     
==========================================
+ Hits        44755    44777      +22     
- Misses       4112     4113       +1

| Flag | Coverage Δ |
|---|---|
| #multiple | 89.96% <100%> (ø) ⬆️ |
| #single | 41.79% <0%> (+0.04%) ⬆️ |

| Impacted Files | Coverage Δ |
|---|---|
| pandas/core/frame.py | 97.23% <100%> (+0.07%) ⬆️ |
| pandas/core/series.py | 94.46% <0%> (-0.11%) ⬇️ |
| pandas/core/ops.py | 96.74% <0%> (-0.09%) ⬇️ |
| pandas/core/indexes/base.py | 96.45% <0%> (-0.02%) ⬇️ |
| pandas/plotting/_converter.py | 65.22% <0%> (ø) ⬆️ |
| pandas/core/panel.py | 97.3% <0%> (ø) ⬆️ |
| pandas/core/indexes/api.py | 98.78% <0%> (ø) ⬆️ |
| pandas/core/arrays/categorical.py | 94.9% <0%> (+0.01%) ⬆️ |
| pandas/core/indexes/category.py | 97.31% <0%> (+0.03%) ⬆️ |

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 2fdf1e2...61481a4.

@@ -3655,6 +3655,10 @@ def f(vals):
        isinstance(subset, tuple) and subset in self.columns):
    subset = subset,

for name in subset:

Can you add a comment here on what you are checking?

You can do this:

diff = pd.Index(subset).difference(self.columns)
if len(diff):
    raise KeyError(diff)
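
For illustration, the suggested check can be tried outside of `DataFrame` as a standalone function. This is a minimal sketch: `check_subset` is a hypothetical helper name, and the sample frame is made up for the example.

```python
import pandas as pd


def check_subset(columns, subset):
    """Raise KeyError if any label in `subset` is missing from `columns`.

    `check_subset` is a hypothetical helper used for illustration only.
    """
    # Index.difference returns the labels of `subset` not present in `columns`.
    diff = pd.Index(subset).difference(columns)
    if len(diff):
        raise KeyError(diff)


df = pd.DataFrame({"A": [1, 1, 2], "B": [3, 3, 4]})

check_subset(df.columns, ["A", "B"])  # every label exists, nothing is raised

try:
    check_subset(df.columns, ["A", "Z"])  # "Z" is not a column of df
except KeyError as err:
    print("missing labels:", err)
```

Using `Index.difference` reports every missing label at once, rather than stopping at the first one found in a loop.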

@jreback added the Indexing and Error Reporting labels Feb 18, 2018

jreback commented Feb 18, 2018

Can you also check `.duplicated()` (and add a test)?
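
A test along these lines might look like the sketch below. It is not the test that was added to the pandas suite; `raises_key_error` is a hypothetical helper, the frame is made up, and the expected behaviour assumes a pandas version that includes this fix (0.23.0 or later).

```python
import pandas as pd


def raises_key_error(func, **kwargs):
    """Return True if calling `func(**kwargs)` raises KeyError (hypothetical helper)."""
    try:
        func(**kwargs)
    except KeyError:
        return True
    return False


df = pd.DataFrame({"A": [0, 0, 1], "B": [0, 0, 1], "C": [0, 0, 1]})
subset = ["A", "B", "D"]  # "D" is not a column of df

# Both methods take the same `subset` argument, so both should reject it.
print(raises_key_error(df.drop_duplicates, subset=subset))
print(raises_key_error(df.duplicated, subset=subset))
```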

NoahTheDuke commented

Updated!

@jreback added this to the 0.23.0 milestone Feb 21, 2018
@jreback merged commit dbc601e into pandas-dev:master Feb 21, 2018

jreback commented Feb 21, 2018

thanks!

@NoahTheDuke deleted the bugfix-drop_duplicates-when-column-name-misspelled branch February 21, 2018 14:16
harisbal pushed a commit to harisbal/pandas that referenced this pull request Feb 28, 2018
Successfully merging this pull request may close these issues.

Pandas 0.22.0 does not raise KeyError for misspelled column with .drop_duplicates()