length(::DataFrame) returns number of columns #1200
Comments
cc: @ararslan @andyferris
Yeah this is really weird. We probably shouldn't define |
There was debate about this when it was first added, mostly between those coming from an R background (to whom, I think, the current definition made sense) and those coming from a Pandas background (where length is the number of rows). So what makes the most sense probably depends on what you've used before.
Having it be inconsistent between languages is another reason not to define it here, IMO. Then it confuses no one. 🙂
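A table can sidestep the R-versus-pandas ambiguity by not defining a generic length at all and exposing only explicitly named size queries. Below is a minimal Python sketch under that assumption. `ColumnTable`, `nrow`, and `ncol` are hypothetical illustration names, not the DataFrames.jl or pandas API.

```python
class ColumnTable:
    """Hypothetical column-oriented table: a dict mapping column name -> list of values."""

    def __init__(self, columns):
        lengths = {len(values) for values in columns.values()}
        if len(lengths) > 1:
            raise ValueError("all columns must have the same length")
        self._columns = dict(columns)

    def nrow(self):
        # Number of rows: the shared length of every column (0 if there are no columns).
        return len(next(iter(self._columns.values()), []))

    def ncol(self):
        # Number of columns: how many names are stored.
        return len(self._columns)

    # Deliberately no __len__: R users would expect ncol, pandas users nrow.


t = ColumnTable({"A": [1.0, 1.0, 1.0, 1.0], "B": [1.0, 1.0, 1.0, 1.0]})
print(t.nrow(), t.ncol())  # 4 2
```

Calling `len(t)` on this sketch raises a `TypeError`, so neither group of users gets a silently surprising answer.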
If we want to think of a dataframe in the relational algebra sense (as a collection of named tuples, i.e. rows), then iterating over rows and having There has been a lot of discussion about this surrounding Jeff's |
Given that more descriptive methods such as |
It goes with iteration, so if you can't iterate a |
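The argument that length "goes with iteration" can be illustrated with a hypothetical Python `RowTable` (not any library's API) that treats a table as a collection of rows, in the relational-algebra sense described above. The sketch assumes column names are valid Python identifiers so they can serve as namedtuple fields.

```python
from collections import namedtuple


class RowTable:
    """Hypothetical table treated as a collection of rows (relational-algebra view)."""

    def __init__(self, columns):
        self._columns = dict(columns)
        # Row type with one field per column name (names must be valid identifiers).
        self._Row = namedtuple("Row", list(self._columns))

    def __len__(self):
        # Length counts the same things iteration yields: rows.
        return len(next(iter(self._columns.values()), []))

    def __iter__(self):
        # zip(*columns) walks the columns in lockstep, yielding one tuple per row.
        for values in zip(*self._columns.values()):
            yield self._Row(*values)


t = RowTable({"a": [1, 2, 3], "b": ["x", "y", "z"]})
rows = list(t)
print(len(t), rows[0])  # 3 Row(a=1, b='x')
```

Because `__len__` and `__iter__` both count rows, `len(t) == len(list(t))` holds; defining length as the column count while iterating over rows would break that invariant.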
I'm not sure |
Yeah, I recall that confusing me the first time I used dataframes cause I figured |
Okay, so the plan as I understand it:
|
Actually I'm afraid removing the |
Yes, but pandas determines whether that is a row or a column based on what you give it.

```python
>>> import pandas
>>> df = pandas.DataFrame({'A': 1., 'B': pandas.Series(1, index=list(range(4)), dtype='float32')})
>>> df
     A    B
0  1.0  1.0
1  1.0  1.0
2  1.0  1.0
3  1.0  1.0
>>> df[:1]
     A    B
0  1.0  1.0
>>> df["A"]
0    1.0
1    1.0
2    1.0
3    1.0
Name: A, dtype: float64
```

If we restricted column names to |
By "linear indexing," I meant specifically with a number. It's not immediately obvious what |
Interesting. Honestly, I find Pandas' behavior really confusing: returning either a row or a column depending on the argument type is too clever for my taste. We could stop supporting OTOH we can deprecate |
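The pandas behavior shown above boils down to `__getitem__` dispatching on the key's type. A minimal sketch, assuming a hypothetical `CleverTable` class (not pandas' actual implementation, which handles many more key types), makes the two branches explicit:

```python
class CleverTable:
    """Hypothetical table mimicking pandas-style indexing that dispatches on key type."""

    def __init__(self, columns):
        self._columns = dict(columns)

    def __getitem__(self, key):
        if isinstance(key, str):
            # A string key selects a whole column.
            return list(self._columns[key])
        if isinstance(key, slice):
            # A slice selects rows, returned here as a new table.
            return CleverTable({name: values[key] for name, values in self._columns.items()})
        # Integers are rejected: would t[0] mean the first row or the first column?
        raise TypeError(f"unsupported key type: {type(key).__name__}")

    def columns(self):
        return dict(self._columns)


t = CleverTable({"A": [1.0, 2.0, 3.0, 4.0], "B": [5.0, 6.0, 7.0, 8.0]})
print(t["A"])           # [1.0, 2.0, 3.0, 4.0]  (a column)
print(t[:1].columns())  # {'A': [1.0], 'B': [5.0]}  (a subset of rows)
```

The same bracket syntax returns two different kinds of object depending on the argument type, which is the "too clever" aspect being criticized.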
Ok, PR up at #1224. Deprecates |
@nalimilan I love pandas for being that clever 😄 There is some stuff which just seems weird to me in DataFrames.jl |
The policy generally followed by Julia packages is to try to find a consistent design which makes sense for users once they are familiar with the package. We don't generally support features just because they sound "natural" to people used to other software (but of course we prefer being consistent when that doesn't hurt). Also, there are lots of people coming from other software (e.g. R/dplyr/data.table), and what they find "natural" is often mutually exclusive. I think the way forward here is that once field overloading is available in Base (JuliaLang/julia#24960), we deprecate |
First of all, I agree that overloading will make it easier and that the general policy is reasonable. I'm wondering whether it is necessary to not support |
Does assigning to a field work in
the way one can now do
? I know in Also, in a similar vein, note that the dot-field notation causes problems with spaces in column names that are easier to address with the current |
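Python's `__getattr__` is only a rough analogue of Julia's field overloading, but it shows the problem with column names that contain spaces. In this hypothetical `FieldTable` sketch (not a real library class), dot access works only for names that are valid identifiers, while bracket indexing accepts any string:

```python
class FieldTable:
    """Hypothetical table exposing columns as attributes and via indexing."""

    def __init__(self, columns):
        self._columns = dict(columns)

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails, so t.price falls through to here.
        columns = self.__dict__.get("_columns", {})
        if name in columns:
            return columns[name]
        raise AttributeError(name)

    def __getitem__(self, name):
        # Bracket indexing accepts any string key, including names with spaces.
        return self._columns[name]


t = FieldTable({"price": [1, 2], "unit price": [3, 4]})
print(t.price)                   # [1, 2]
print(t["unit price"])           # [3, 4]
print(getattr(t, "unit price"))  # [3, 4]
```

`t.unit price` is a syntax error, so the spaced column is reachable only through `t["unit price"]` or the clumsier `getattr(t, "unit price")`.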
Deprecating |
On Julia 0.7 you can use |
I feel like I rarely work with the symbols themselves. All of my cleaning is in |
IMHO I'm of a similar view as @pdeffebach. My view is that (a) pulling out one column is common enough we need a compact way to do it, and (b) I don't think the dot-field notation is a good substitute for the square-bracket-column-symbol notation. The problem, in my view, is that dot-field notation is fine for objects with stable field names (like So I think we should keep support for |
If we support |
OK -- I'm totally OK with using square brackets as "indexing into columns". I just meant I have stronger feelings about losing the ability to use symbols than about losing the ability to do numeric indexing into columns. @nalimilan You've sold me on not doing something pandas-like with sometimes-row-indexing. :) (EDITS: lots of sloppy typos)
I've been playing around with a With the way dataframes is set up, it's difficult to make this performant, since we will have to collect (maybe not with This is fine, because row-wise operations, while I think important enough to live in DataFrames, are relatively uncommon, and DataFrame's structure is well-optimized for column-oriented operations, which is the dominant use-case. I guess my point is that if people expect something that acts on rows to be as easy and fast as |
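The performance point about row-wise operations on a column-oriented layout can be sketched in plain Python. `ColumnStore`, `column`, and `rows` below are hypothetical names, not DataFrames.jl API; the sketch only shows that a column is handed back as stored, while each row has to be assembled from every column.

```python
class ColumnStore:
    """Hypothetical column-oriented store: one Python list per column."""

    def __init__(self, columns):
        self._columns = dict(columns)

    def column(self, name):
        # Column access hands back the stored list itself: no copying, O(1).
        return self._columns[name]

    def rows(self):
        # Row access must assemble each row from every column: O(ncol) work per row.
        names = list(self._columns)
        for values in zip(*self._columns.values()):
            yield dict(zip(names, values))


store = ColumnStore({"x": [1, 2, 3], "y": [10, 20, 30]})
print(sum(store.column("x")))                   # 6, reads a single stored list
print([r["x"] + r["y"] for r in store.rows()])  # [11, 22, 33], builds a dict per row
```

Whole-column operations stay cheap in this layout, while row-wise iteration pays to materialize each row, which is the tradeoff described above.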
Currently calling `length` on a `DataFrame` returns the number of columns. This is strange, as `length` usually returns the number of elements.