Allow to select index in drop_duplicates and duplicated #9708

flying-sheep · 2015-03-23T12:11:47Z

There’s no way to drop rows with a duplicated index using drop_duplicates.

We’d have to add a copy of the index as a column, or do this:

df[np.logical_not(df.index.duplicated(take_last=True).values)]
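For readers on later pandas releases, where the `take_last` argument was replaced by `keep`, the same mask can be written as below. This is a minimal sketch on made-up data (the frame and its labels are assumptions, not taken from the report):

```python
import pandas as pd

# Illustrative frame with a duplicated index (assumed data).
df = pd.DataFrame({"A": range(4)}, index=list("aabb"))

# Index.duplicated(keep="last") flags every occurrence of a label except
# the last one, so negating the mask keeps the last row per label.
last_rows = df[~df.index.duplicated(keep="last")]

# keep="first" keeps the first row per label instead.
first_rows = df[~df.index.duplicated(keep="first")]

print(last_rows)
print(first_rows)
```

The `~` operator on the boolean array does the same job as `np.logical_not` in the one-liner above.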


TomAugspurger · 2015-03-23T12:18:40Z

Typically I'll use df.groupby(level=0).last() (or, more often, .first()). It works fine, but a groupby isn't necessarily the first thought for deduplication.

I'm +0 on whether we should have a dedicated method for this.
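One caveat worth checking before relying on the groupby approach: by default, `GroupBy.first()` and `GroupBy.last()` return the first or last non-null value per column, not the first or last row. With missing data, the result can therefore differ from masking on `index.duplicated`. A small sketch on assumed data:

```python
import numpy as np
import pandas as pd

# Illustrative frame: the first row for label "a" holds a missing value.
df_nan = pd.DataFrame({"A": [np.nan, 1.0, 2.0]}, index=["a", "a", "b"])

# first() skips the NaN and picks the next non-null value for "a".
by_group = df_nan.groupby(level=0).first()

# The index mask keeps the literal first row for "a", NaN included.
by_mask = df_nan[~df_nan.index.duplicated(keep="first")]

print(by_group)
print(by_mask)
```

When there are no missing values in the frame, the two results hold the same rows.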

jreback · 2015-03-23T14:53:50Z

As @TomAugspurger indicates, the following are equivalent.

I suppose the drop_duplicates section could have this as an alternative example. If you would like to submit a pull request for a doc update, that would be ok.

In [6]: df = pd.DataFrame({'A' : range(4), 'B' : list('aabb')})

In [7]: df
Out[7]:
   A  B
0  0  a
1  1  a
2  2  b
3  3  b

In [9]: df2 = df.set_index('B')

In [10]: df2
Out[10]:
   A
B
a  0
a  1
b  2
b  3

In [13]: df2.groupby(level=0).first()
Out[13]:
   A
B
a  0
b  2

In [16]: df2.reset_index().drop_duplicates(subset='B',take_last=False).set_index('B')
Out[16]:
   A
B
a  0
b  2
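The session above uses `take_last`, which later pandas releases replaced with `keep`. A sketch of the same comparison, assuming a release that accepts `keep="first"`, could look like this:

```python
import pandas as pd

# Same data as in the session above.
df = pd.DataFrame({"A": range(4), "B": list("aabb")})
df2 = df.set_index("B")

# Approach 1: group on the index level and take the first row per label.
via_groupby = df2.groupby(level=0).first()

# Approach 2: move the index into a column, deduplicate on that column,
# then restore it as the index. keep="first" corresponds to take_last=False.
via_drop = (
    df2.reset_index()
    .drop_duplicates(subset="B", keep="first")
    .set_index("B")
)

print(via_groupby)
print(via_drop)
```

Both frames contain the rows `a -> 0` and `b -> 2` for this input.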

flying-sheep · 2015-03-23T20:32:10Z

sorry, i don’t get it. you mean i should add the second code block as an example to the docs?

jreback · 2015-03-23T20:43:16Z

I would add the groupby method as an alternative, as it is another common way of performing this task.

flying-sheep · 2015-03-23T21:57:40Z

to which file? indexing.rtf?

jreback · 2015-03-23T22:36:54Z

http://pandas.pydata.org/pandas-docs/stable/indexing.html#duplicate-data
(which is in indexing.rst)


zydariv · 2020-03-10T19:43:28Z

What is the problem with just adding this functionality?

df2.reset_index().drop_duplicates(subset='B',take_last=False).set_index('B')
does not look very clean to me.

df2.drop_duplicates(subset='index', take_last=False)
would look much cleaner, and we could move the reset_index() and set_index() calls into drop_duplicates().

cheers
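Until something like the proposed call exists, the pattern can be wrapped in a small user-side function. The sketch below is hypothetical: `drop_duplicate_index` is an illustrative name, not a pandas API, and it uses the index mask rather than the reset_index()/set_index() round trip.

```python
import pandas as pd


def drop_duplicate_index(frame: pd.DataFrame, keep: str = "first") -> pd.DataFrame:
    """Hypothetical helper (not part of pandas): keep one row per index label.

    `keep` is passed straight to Index.duplicated, so "first" keeps the
    first row per label and "last" keeps the last one.
    """
    return frame[~frame.index.duplicated(keep=keep)]


# Illustrative frame shaped like df2 from earlier in the thread.
df2 = pd.DataFrame({"A": range(4)}, index=pd.Index(list("aabb"), name="B"))

deduped_first = drop_duplicate_index(df2)
deduped_last = drop_duplicate_index(df2, keep="last")

print(deduped_first)
print(deduped_last)
```

Filtering with a mask leaves the index and its name in place, so no set_index() step is needed afterwards.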

jreback · 2020-03-11T11:36:24Z

something like this was already added: #30405

TomAugspurger closed this as completed Mar 23, 2015

TomAugspurger reopened this Mar 23, 2015

jreback added Docs Groupby labels Mar 23, 2015

jreback added this to the Next Major Release milestone Mar 23, 2015

flying-sheep added a commit to flying-sheep/pandas that referenced this issue Mar 24, 2015

Document how to drop duplicate indices

09cd583

fixes pandas-dev#9708

flying-sheep mentioned this issue Mar 24, 2015

Documentation how to drop duplicate indices #9717

Closed

jreback closed this as completed in f7c7ee0 Mar 25, 2015
