-
Notifications
You must be signed in to change notification settings - Fork 368
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Deprecate DataFrame(::AbstractMatrix)
#2433
Comments
This is breaking and also there are a lot of scenarios when this is used. However, I think we can do one of two things:
Now regarding AxisKeys.jl on master it fails to pass What would be your preference here? |
Yeah, I think (1) is just trading one special case for another which defeats the point. From a user perspective I'm largely indifferent with (2) and (3). If you prefer (3) then I'm fine with adding a
I don't think that's true. I think Tables.jl handles that case for you if you've already registered the type.
|
OK - I will then go for option 2, so you do not have to do anything 😄. Except, the problem I reported:
The reason is the following definition in Tables.jl:
which is not overriden in AxisKeys.jl. |
Alright, thank you 👍 Again, I'm really not bothered if you'd prefer (3) from a maintainability standpoint. Oh, sorry, I didn't realize there was a special case fallback for specifically |
I just think @quinnj added it. I am OK with going for option (2) so let us go this way. I will ping you in the PR |
It sounds a bit weird to do something completely different for some EDIT: to complement this, KeyedArrays' README says
Also see previous discussion on similar issue in NamedArrays: davidavdav/NamedArrays.jl#76. |
@nalimilan - I think there are two issues:
|
Yes, right. I think your PR is right on its own (as I guess we can consider that the fact that columns names are Now the problem regarding 2. is that AFAICT it's perfectly legitimate for AxisKeys to implement the Tables.jl interface as they do (reshaping), since Tables.jl doesn't treat |
we have Given your comment I would think that what I propose in #2435 is OK (i.e. by default do what we did in the past, but if some |
Sorry, but after thinking more about this I'm really not happy with the fact that Consistency and predictability are among our strengths, so it's worth trying to preserve these. So I think we need a way to make our API unambiguous about the meaning of the input argument. I'm not sure yet what's the best solution though. It could be using As I said, I think it would also be useful to have a way to insert a column containing the row names, and use the |
So in short you say: let us deprecate (note that also If there is a consensus we prefer this (CC @oxinabox @pdeffebach) then I will not object, but this will create a small inconvenience. For me - in general - it is natural that a matrix is converted to a data frame the way we do it currently. |
I don't think of a matrix as collection of columns (or as a collection of rows) but rather as its own thing,
|
Fair enough. If there will be no more comments on this I will deprecate both methods (with and without column names passed). In particular I agree that:
and
are explicit. |
Regarding the I don't have a strong opinion regarding |
We cannot widen the signature as
It is, but it produces different column names, therefore the deprecation message should be with |
I think
This comes from doing a bunch of calculations where everything is a matrix, for model fitting, and then turning the big matrices back to data-frames to report results. This seems like a common use-case, no? |
Yes, I was only thinking of allowing @pdeffebach Nobody is proposing to remove the ability to build a data frame from a matrix -- only to use a slightly more verbose syntax. Does that sounds OK to you? The dplyr example you give sounds quite verbose too. :-p |
The only problem is that using The alternative route would be to say that We have such a situation in other cases already, e.g.:
EDIT: the question is what happens more often: wanting matrix to be converted into |
If others are on board then I think this is okay. I'm wary of exposing a So I have a preference for |
We could change that in Tables: accepting strings sounds reasonable. But as long as we don't use it for the deprecation it's not the end of the world if it's not a perfect replacement. Anyway I agree that we should weigh the pros and cons of each solution, depending on which case is the most commonly needed. However, apart from that criterion, I think the most consistent approach is to treat as many inputs as possible as Tables.jl objects. This isn't possible in all cases as @bkamins showed, because of our basic constructors, but we can reduce to a minimum the number of exceptions, so that you know that when you have a Tables.jl object is will be treated as such, whatever its exact type. |
Agreed, but still we will have "some" exceptions and the question is if |
This are more or less fully distinct and clear I don't think we need to mention using Tables.jl on a Matrix in the first place, since that is just a special case of the ubiquitious table sink constructor and would be assumed to work by anyone who even knows what tables are. |
Thats a good point I hadn't thought of. We can just throw an error and people can be aware of what a table is and figure it out if they want a matrix. I'm on board.
Constructing names automatically is annoying, so its nice to provide a way to construct a data frame where you dont care about the names. |
Let me give you a full menagerie that we support. I divided it into three groups: Will stay
Can stay (rarely useful, but do not hurt)
Last time to decide to drop it (these are most disputable)
|
FWIW, that seems reasonable to me... though I mostly only use the pairs and tables constructors. Shouldn't this constructor be addressed by the tables interface?
Isn't that just a rowtable? |
Yes; that is indeed handled by Tables.jl interface; it's just needed to avoid dispatching to some of the other |
Thanks for the list. So in the "disputable" list we have a series of constructors that accept tuples, similar to constructors that accept vectors, right? Not sure how useful they are but they probably don't do much harm either. AFAICT, the only ones conflicting with the interpretation of the input as a Table are the |
I would deprecate all the ones in the "maybe" category except for |
So my thinking and understanding is:
So I feel that only 3. is disputable (and that is why I put it in the second group). Can you give some more feedback here given these considerations please? |
I would get rid of the eltype one also. I think Is clearest.
works (right?) and often is what they really wanted. |
works, we call it pseudo-broadcasting, https://bkamins.github.io/julialang/2020/09/13/pseudobroadcasting.html. OK - so I will wait for 1-2 weeks for more feedback. If nothing changes we also prune the |
See #2464 for a fix. |
I feel like it's rare that folks actually want to call this constructor without specifying column names. Also, it isn't that hard to do this with the tables interface via
DataFrame(Tables.table(X))
.The benefit with dropping this constructor is that it would allow more inputs to fall back to the tables constructor, including array types that implement the tables interface like AxisKeys.jl. Currently, you need to special case
DataFrame(A)
toDataFrame(Tables.columns(A))
only whenA
is 2 dimensional.The text was updated successfully, but these errors were encountered: