Add `all` keyword argument to `nonunique` #2238
Comments
Makes sense - I often missed this functionality.
The only thing is that now you can write:
To get exactly what you ask for (and immediately see the duplicate rows as separate data frames).
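One way to view each set of duplicate rows as its own data frame is to group on the key columns and keep only the groups with more than one row. The following is a minimal sketch, not necessarily the snippet this comment referred to; the example `df` and the choice of `:a` as the key column are assumptions made for illustration.

```julia
using DataFrames

# Hypothetical example data; rows 2 and 3 share the same value of :a.
df = DataFrame(a = [1, 2, 2, 3], b = [4, 5, 5, 6])

# Group on the key column and keep only the groups that contain duplicates.
# Each element is a SubDataFrame holding one set of duplicate rows.
dups = [sdf for sdf in groupby(df, :a) if nrow(sdf) > 1]
```

Because every element of `dups` keeps all columns, differences in the columns outside the key can be inspected directly.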
Yeah, I agree it makes sense to maintain the type of the output as a vector of Bools. What could be done is the following:
julia> df = DataFrame(a = [1, 2, 2], b = [3, 4, 4]);
julia> nonunique(df; all=true)
3-element Array{Bool,1}:
0
1
1
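The proposed behavior can be approximated without changing `nonunique`. The sketch below uses a hypothetical helper, `allnonunique` (not part of DataFrames.jl), that counts group sizes via `groupby` and `groupindices` and flags every row whose group has more than one member, first occurrence included. It assumes the default `skipmissing=false`, so every row belongs to a group.

```julia
using DataFrames

# Hypothetical helper (not a DataFrames.jl function): flag every row that has
# at least one duplicate with respect to `cols`, first occurrence included.
function allnonunique(df::AbstractDataFrame, cols)
    gdf = groupby(df, cols)
    idx = groupindices(gdf)          # group number of each row
    sizes = zeros(Int, length(gdf))  # number of rows in each group
    for g in idx
        sizes[g] += 1
    end
    return [sizes[g] > 1 for g in idx]
end

df = DataFrame(a = [1, 2, 2], b = [3, 4, 4])
mask = allnonunique(df, [:a, :b])
```

Indexing with `df[mask, :]` then selects both copies of the duplicated row rather than only the second one.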
Now that I think of it, I would rather do:
This should give you exactly what you want, right? (The result structure is different, but you get all the information you require.)
Well, your solution above that returns a grouped data frame actually worked well for me. :)
julia> df = DataFrame(a = [1, 2, 2, 3], b = [4, 5, 5, 6])
4×2 DataFrame
│ Row │ a │ b │
│ │ Int64 │ Int64 │
├─────┼───────┼───────┤
│ 1 │ 1 │ 4 │
│ 2 │ 2 │ 5 │
│ 3 │ 2 │ 5 │
│ 4 │ 3 │ 6 │
julia> groupindices(groupby(df, :a))
4-element Array{Union{Missing, Int64},1}:
1
2
2
3
If one were to use
julia> df = DataFrame(a = [1, 2, 2, 3], b = [4, 5, 5, 6]);
julia> df[nonunique(df; all=true), :]
2×2 DataFrame
│ Row │ a │ b │
│ │ Int64 │ Int64 │
├─────┼───────┼───────┤
│ 1 │ 2 │ 5 │
│ 2 │ 2 │ 5 │
We could define
example:
Do we think it is a useful addition?
Fixed in #3260
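For readers on a recent release: to my understanding, DataFrames.jl 1.5 and later expose this behavior through a `keep` keyword rather than `all`. The sketch below assumes that version and that `nonunique` accepts `keep=:noduplicates`; check the docstring of your installed version before relying on it.

```julia
using DataFrames

df = DataFrame(a = [1, 2, 2, 3], b = [4, 5, 5, 6])

# Assumes DataFrames.jl >= 1.5: flag every row that has a duplicate,
# including the first occurrence.
mask = nonunique(df; keep=:noduplicates)

# Select all rows that take part in a duplicate set.
dup_rows = df[mask, :]
```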
Often when you use `nonunique(df, cols)`, you want to be able to look at the rows that are non-unique according to `cols` to see if there are differences in the columns other than `cols`. It would be handy if there were an `all` keyword argument to `nonunique` that returns all the duplicates. (Right now the first occurrence of a row is not included in the output.) If `all` is true, it might actually make more sense to return a vector of vectors of indices, like this:
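The vector-of-index-vectors output described here can be sketched as follows. `duplicate_index_groups` is a hypothetical helper, not an existing DataFrames.jl function, and the example data is assumed for illustration.

```julia
using DataFrames

# Hypothetical helper: return one vector of row indices per set of duplicates.
function duplicate_index_groups(df::AbstractDataFrame, cols)
    gdf = groupby(df, cols)
    groups = [Int[] for _ in 1:length(gdf)]
    # Collect the row numbers that fall into each group.
    for (row, g) in enumerate(groupindices(gdf))
        push!(groups[g], row)
    end
    # Keep only the groups that actually contain duplicates.
    return filter(g -> length(g) > 1, groups)
end

df = DataFrame(a = [1, 2, 2, 3, 1], b = [4, 5, 5, 6, 7])
index_groups = duplicate_index_groups(df, :a)
```

For this data, `index_groups` holds the index vectors `[1, 5]` and `[2, 3]`, so `df[index_groups[1], :]` shows one set of rows that agree on `:a` but may differ in `:b`.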