Kwarg to choose missing values for unstack #2205

Closed
kescobo opened this issue Apr 23, 2020 · 6 comments

@kescobo
Contributor

kescobo commented Apr 23, 2020

Current behavior:

julia> df = DataFrame(key=[:a,:a,:b,:b], variable=["x","y","x","z"], value=rand(4));

julia> wide = unstack(df, :variable, :key, :value)
3×3 DataFrame
│ Row │ variable │ a        │ b        │
│     │ String   │ Float64⍰ │ Float64⍰ │
├─────┼──────────┼──────────┼──────────┤
│ 1   │ x        │ 0.464025 │ 0.672549 │
│ 2   │ y        │ 0.570984 │ missing  │
│ 3   │ z        │ missing  │ 0.400742 │

What I'd like to get out instead:

julia> for col in names(wide)[Not(1)]
           wide[!,col] = disallowmissing(replace(wide[!,col], missing=>0.))
       end; wide
3×3 DataFrame
│ Row │ variable │ a        │ b        │
│     │ String   │ Float64  │ Float64  │
├─────┼──────────┼──────────┼──────────┤
│ 1   │ x        │ 0.464025 │ 0.672549 │
│ 2   │ y        │ 0.570984 │ 0.0      │
│ 3   │ z        │ 0.0      │ 0.400742 │

Keyword suggestions: missing_values, fillmissing (I don't love either of these, but thought it would be good to get the ball rolling)

@pdeffebach
Contributor

This is a good idea. I ran into an issue recently where freqtables was slow (I can't reproduce right now), but a workflow using by and unstack was fast. But then you had to fill in 0 for missing values.
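For illustration, here is a rough sketch of that kind of frequency-table workflow. It makes a few assumptions: the data and column names (`group`, `outcome`, `n`) are made up, `combine(groupby(...))` stands in for the older `by`, and it relies on the `fill` keyword mentioned at the end of this thread, so it needs a DataFrames.jl release that has it.

```julia
using DataFrames

# Hypothetical observations; "g2" never has the outcome "no".
obs = DataFrame(group=["g1", "g1", "g2", "g2", "g2"],
                outcome=["yes", "no", "yes", "yes", "yes"])

# Count rows per (group, outcome) pair; this replaces the old `by` call.
counts = combine(groupby(obs, [:group, :outcome]), nrow => :n)

# Spread outcomes into columns; combinations that never occurred get 0
# instead of `missing`.
freq = unstack(counts, :group, :outcome, :n; fill=0)
```

Without `fill=0`, the `g2`/`no` cell would come out as `missing` and need a separate replacement pass.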

@bkamins bkamins added feature non-breaking The proposed change is not breaking labels Apr 23, 2020
@bkamins bkamins added this to the 1.x milestone Apr 23, 2020
@kescobo
Contributor Author

kescobo commented Apr 23, 2020

A consideration that came up on Slack: with my version (and the more concise version from @alejandromerchan: coalesce.(unstack(df, :variable, :key, :value), 0)), missings that were already in the data would also be replaced. If we only want to replace the ones that arise during the unstack, here's a different MWE to start:

julia> df = DataFrame(key=[:a,:a,:a,:b,:b], variable=["x","y","m","x","z"], value=[.1,.2,missing,.3,.4]);

And we'd want

julia> wide = unstack(df, :variable, :key, :value, missing_values=0.) # hypothetically
4×3 DataFrame
│ Row │ variable │ a        │ b        │
│     │ String   │ Float64⍰ │ Float64  │
├─────┼──────────┼──────────┼──────────┤
│ 1   │ m        │ missing  │ 0.0      │
│ 2   │ x        │ 0.1      │ 0.3      │
│ 3   │ y        │ 0.2      │ 0.0      │
│ 4   │ z        │ 0.0      │ 0.4      │

Though actually, I'm not sure what we'd want at wide[1, :b] in this case...

@bkamins
Member

bkamins commented Apr 23, 2020

The same pattern was already discussed in #1864, so we would be consistent between the functions (that PR is currently extremely hard to look at but I hope you will see the idea).

@kescobo
Contributor Author

kescobo commented Apr 23, 2020

Oh, awesome - if the pattern has been considered and consensus reached, as long as it's consistent I think that's great. I don't have a strong intuition about it.

@metanoid

metanoid commented Jul 1, 2020

I'd prefer something like filldefault or default instead of fillmissing because I think that makes it clearer that we're only inserting the default value where the row was omitted from the source df, not overwriting existing missings in the source df.

@bkamins
Member

bkamins commented Oct 15, 2022

This is now supported with the fill kwarg.
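As a minimal sketch of how this plays out on the second MWE from earlier in the thread (assuming a DataFrames.jl version that includes the `fill` keyword; the `cell` helper is hypothetical and only there to look up values by label), `fill` should only touch combinations that were absent from the source, while a blanket `coalesce.` also overwrites the missing that was already in the data:

```julia
using DataFrames

# Second MWE from the thread: the (:a, "m") value is already `missing` in the source.
df = DataFrame(key=[:a, :a, :a, :b, :b],
               variable=["x", "y", "m", "x", "z"],
               value=[0.1, 0.2, missing, 0.3, 0.4])

# `fill` is used only for key/variable combinations that have no row in `df`.
wide = unstack(df, :variable, :key, :value; fill=0.0)

# Blanket replacement after the fact: every `missing` becomes 0.0,
# including the one that was in the original data.
blanket = coalesce.(unstack(df, :variable, :key, :value), 0.0)

# Hypothetical helper: fetch the cell in column `col` for the row labelled `v`.
cell(t, v, col) = t[findfirst(==(v), t.variable), col]

cell(wide, "m", :a)     # still missing: it came from the source data
cell(wide, "m", :b)     # 0.0: no (:b, "m") row existed, so `fill` applies
cell(blanket, "m", :a)  # 0.0: coalesce overwrote the original missing
```

So the open question above about `wide[1, :b]` ends up as the fill value, since that combination never appeared in the long table.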

@bkamins bkamins closed this as completed Oct 15, 2022