Emulating Stata's rowtotal #2161
Comments
That's actually one situation where passing a named tuple is more useful, since it stores information about the column eltype:
    julia> sum(skipmissing(NamedTuple{(:a,:b), Tuple{Union{Int,Missing}, Union{Int,Missing}}}((missing, missing))))
    0
The plan is to allow this via … It's not very intuitive that you'd have to request names in order to compute the sum; maybe the name could be more general, to indicate that you get an object representing a row. It also has the drawback that the function needs to be recompiled for each set of names, even if you don't care about them.
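A minimal Base-Julia sketch of why the named tuple helps (no DataFrames API is involved, and the values are made up): a plain tuple's element type comes only from the values it holds, while a NamedTuple type can declare Union{Int,Missing} fields and keep that element type even when every value is missing.

```julia
# A plain tuple only knows the types of the values it actually holds.
t = (missing, missing)
@assert eltype(t) == Missing

# A NamedTuple type declares its field types up front, so the
# Union{Int,Missing} element type survives even if every value is missing.
NT = NamedTuple{(:a, :b), Tuple{Union{Int,Missing}, Union{Int,Missing}}}
nt = NT((missing, missing))
@assert eltype(nt) == Union{Int,Missing}

# skipmissing drops Missing from the element type, leaving Int, so the sum
# of the (empty) iterator falls back to zero(Int).
println(sum(skipmissing(nt)))  # prints 0

# The plain tuple has no such fallback type, so the same call throws.
tuple_sum_failed = try
    sum(skipmissing(t))
    false
catch
    true
end
println(tuple_sum_failed)  # prints true
```

The exact exception type for the failing case varies between Julia versions, which is why the sketch only checks that an error is raised.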
That's an interesting solution. It's a shame this doesn't work for normal …
I do feel this is going to come up a lot, and making someone build a named tuple is a bit esoteric for new data-oriented users.
I will also note that the behavior you describe doesn't work for a … You can't even get around it by …
… or (but this will be slower) …
In both cases I assume it is relatively clear what value …
Thanks for this response. I was not aware of the … Is the fact that Tuples can't have … I will file a separate issue for DataFrameRow, though.
That's due to the covariance of tuple types, and that would be pretty hard to change. I think it's been considered, but not before Julia 2.0 at the very least, and it would probably introduce lots of problems. What would a hypothetical …
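To make the covariance point concrete, here is a small Base-Julia check (the types are illustrative only): Tuple types are covariant, so an all-missing tuple already is a Tuple{Union{Int,Missing}, Union{Int,Missing}} and converting to that type changes nothing, whereas NamedTuple's type parameters are invariant, which is what lets it carry the declared Union.

```julia
U = Union{Int,Missing}

# Covariance: Tuple{Missing,Missing} is a subtype of Tuple{U,U} ...
@assert Tuple{Missing, Missing} <: Tuple{U, U}

# ... and Tuple{U,U} is not concrete, so no tuple value ever has that type.
@assert !isconcretetype(Tuple{U, U})

# Converting is therefore a no-op: the value keeps its concrete element types.
x = convert(Tuple{U, U}, (missing, missing))
@assert typeof(x) == Tuple{Missing, Missing}

# NamedTuple parameters are invariant, so a NamedTuple with declared U fields
# is its own concrete type and remembers U.
NT = NamedTuple{(:a, :b), Tuple{U, U}}
@assert !(NamedTuple{(:a, :b), Tuple{Missing, Missing}} <: NT)
@assert isconcretetype(NT)
@assert typeof(NT((missing, missing))) == NT
```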
I guess … where conversion to … or maybe even just: … would be good enough?
I'm not proposing to write a function called …
Here is a good idea that might solve this: …
EDIT: This would only work for …
I think @nalimilan's solution of wrapping it in a NamedTuple is the easiest for now. Was there any discussion of making it the default, and then having the user …? To which @bkamins replied: "No, I mean that named tuple …"
This was heavily discussed, and the conclusion was that splatting by default is better because: …
Still, we want to provide that option. We just need a good name for the wrapper.
I wonder if the naming should be done as an alternative to …
I think that would be a good solution, yes. I'm assuming that this would iterate over rows of …
It is just a question of whether we want a special function providing this functionality (or just assume that users would use the long form).
Yes, I see. I forgot that one might still want to pass a named tuple of columns, so that syntax wouldn't be specific to rows.
I agree this would be easy to do. However, I would really like to get a solution that works elegantly in the new … I will try to think of a better name for …
I had no idea that this works. This is the functionality of …
I'm seeing the value in … Additionally, the only function that you would both apply … I still think that maybe the following could work: …
Just to be clear: the conclusion is that if we add an option to pass … PS. My current proposal for the name of the wrapper is …
Closed with the new …
I have been playing around with the new select merged into master. Inspired by this Discourse thread, I have been trying to emulate Stata's rowtotal command.
Stata's rowtotal from egen looks like … where it skips missing values and, if all values are missing, sets the result to missing.
In Julia, we almost have this behavior with skipmissing, except that the sum would be 0 instead of missing, which is fine. It's more consistent anyway and can be achieved with some keyword arguments in egen.
In the new select function, I tried to write the following and got an error: … The correct syntax is … The error comes from the fact that the anonymous function written above only accepts one input argument.
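A Base-only sketch of the one-argument problem (DataFrames is not loaded here, and x1, x2, x3 are made-up columns): map over several vectors calls the function with one element from each, much like a row-wise transformation would, so a single-argument function fails, while a varargs function (x...) collects all the values into a tuple.

```julia
# Made-up columns standing in for data frame columns.
x1 = [1, missing, missing]
x2 = [2, 3, missing]
x3 = [missing, 4, 5]

# One positional argument: map passes three values, so this throws a MethodError.
onearg = x -> sum(skipmissing(x))
onearg_failed = try
    map(onearg, x1, x2, x3)
    false
catch err
    err isa MethodError
end

# Varargs: `x...` gathers all positional arguments into a tuple,
# which skipmissing can then filter before summing.
rowsum = (x...) -> sum(skipmissing(x))
totals = map(rowsum, x1, x2, x3)
println(totals)  # prints [3, 7, 5]
```

Note that none of these rows is entirely missing; an all-missing row hits the empty-collection problem raised in the EDIT further down.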
We can't write … because there is no way to pass skipmissings to this. The splatting is a bit awkward, and there could be a performance penalty to making a 100-length Tuple, but I don't know how the compiler handles these cases.
cc @juliohmcc @jmboehm
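One way to sidestep building a 100-element tuple per row is to loop over the columns for each row instead of splatting them. The sketch below is plain Julia; rowtotal and its missing_if_all keyword are hypothetical names (not a DataFrames or Stata API). By default an all-missing row gives zero, and with missing_if_all=true it gives missing, mimicking the Stata behavior described above. It has not been benchmarked against the splatted version.

```julia
"""
    rowtotal(cols; missing_if_all=false)

Hypothetical helper: row-wise sum over a vector of equally long columns,
skipping `missing`. A row whose values are all missing gives `zero(T)`,
or `missing` when `missing_if_all=true`.
"""
function rowtotal(cols::AbstractVector{<:AbstractVector}; missing_if_all::Bool=false)
    isempty(cols) && throw(ArgumentError("need at least one column"))
    n = length(first(cols))
    all(c -> length(c) == n, cols) ||
        throw(DimensionMismatch("all columns must have the same length"))

    # Accumulator type: the columns' non-missing element types, promoted.
    T = mapreduce(c -> nonmissingtype(eltype(c)), promote_type, cols)
    out = Vector{Union{T, Missing}}(undef, n)

    for i in 1:n
        acc = zero(T)
        seen = false          # did this row contain any non-missing value?
        for c in cols         # inner loop over columns: no per-row tuple
            v = c[i]
            if !ismissing(v)
                acc += v
                seen = true
            end
        end
        out[i] = (seen || !missing_if_all) ? acc : missing
    end
    return out
end

x1 = [1, missing, missing]
x2 = [2, 3, missing]
x3 = [missing, 4, missing]

println(rowtotal([x1, x2, x3]))                         # row 3 is all missing -> 0
println(rowtotal([x1, x2, x3]; missing_if_all = true))  # row 3 -> missing
```

Because the element type T is taken from the column eltypes rather than from the row values, an all-missing row still knows to return zero(T).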
EDIT: Any discussion also has to consider the merits of a more complicated transform over something like …
EDIT: There are more concerns when we have an empty collection. Note that … works because Base.mapreduce_empty has a method for Int. However, … will fail because the collection is empty and there is no "backup" eltype. The latter is what will be called with the working select method above. Something like … will also fail because collect(x) will return Missing[missing, missing], which has no "backup" eltype.
I can't think of any obvious solutions that don't rely on a lot of machinery to make a Vector{Union{T, Missing}} of a row and yet somehow also know what T is.
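Two possible workarounds for the empty-collection problem, sketched in Base Julia with made-up data (suggestions only, not something the thread settled on): give the reduction an explicit init, or build the row vector with an element type taken from the column eltypes, so that the "backup" T is known even when every value in the row is missing.

```julia
# An all-missing row held in a tuple: skipmissing leaves nothing to sum and
# no numeric element type to fall back on.
row = (missing, missing)

# Workaround 1: an explicit `init`, returned when the iterator is empty.
# (Used on tuples here; the starting value 0 fixes the empty result to Int.)
total_init = reduce(+, skipmissing(row); init = 0)
partial_init = reduce(+, skipmissing((1, missing, 2)); init = 0)

# Workaround 2: derive the row's element type from the columns, so that the
# row is a Vector{Union{T, Missing}} and T survives even if all are missing.
cols = (Union{Int,Missing}[missing, 1], Union{Int,Missing}[missing, 2])  # made-up
S = mapreduce(eltype, promote_type, cols)   # Union{Int, Missing} here
row1 = S[c[1] for c in cols]                # first row: both values missing
total_typed = sum(skipmissing(row1))        # element type Int, so the empty sum is 0

println((total_init, partial_init, total_typed))  # prints (0, 3, 0)
```

The trade-off: init hard-codes the type of the empty result (an Int 0 here even if the columns were Float64), while the typed row gets zero(T) from the columns but needs access to the column eltypes when the row is built.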