`isempty` checks number of columns, rather than number of rows #1230

spurll · 2017-09-11T18:34:32Z

Related to #1200.

Compare:

julia> arr = Array{Any}((0,2))
0×2 Array{Any,2}

julia> isempty(arr)
true

with

julia> df = DataFrame(col1=[], col2=[])
0×2 DataFrames.DataFrame


julia> isempty(df)
false

It seems clear to me that a DataFrame with zero rows should be considered empty.

FWIW, @quinnj's PR from a few days ago (#1224) fixes this as well, but this particular issue seems less controversial than how we define length.

quinnj · 2017-09-11T19:04:41Z

I think it's got to be a wholesale switch from viewing DataFrames as a columnar datastore to more of a "bag of tuples" definition (without needing to change the underlying actual representation obviously). The current view I think grew out of viewing a DataFrame as a "data-smart" Matrix (Array{T, 2}), which is column-oriented.

rofinn · 2017-09-11T19:16:25Z

I'd argue that the current behaviour is still broken even from the "data-smart" Matrix perspective though.

nalimilan · 2017-09-11T21:27:50Z

Yes, what justifies the current behavior is rather the definition of DataFrame as a vector of columns (which also justifies the behavior of df[i] and of length). That's clearly not very natural; the only issue with getting rid of this representation is that df[i] is quite convenient compared to df[:, i]. Maybe we could keep it even if we fix isempty and length.
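To make the "vector of columns" view concrete, here is a minimal sketch using a hypothetical toy type, `ColumnBag`. It is not the DataFrames.jl implementation; it only illustrates why, under that definition, a table with zero rows and two columns is not considered empty.

```julia
# Hypothetical toy type standing in for the "vector of columns" view.
# This is NOT how DataFrames.jl is implemented; it is only an illustration.
struct ColumnBag
    columns::Vector{Vector{Any}}
end

# Under this view the container's elements are its columns.
Base.length(t::ColumnBag) = length(t.columns)
Base.getindex(t::ColumnBag, i::Integer) = t.columns[i]  # t[i] is the i-th column
Base.isempty(t::ColumnBag) = length(t) == 0

t = ColumnBag([Any[], Any[]])  # zero rows, two columns

length(t)   # 2
isempty(t)  # false, because there are two (empty) columns
```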

rofinn · 2017-09-11T21:38:05Z

@nalimilan I don't want to derail this issue, but when (or how often) do you want to get a column based on its integer index? I don't think I've ever wanted to do that, but that could just be my use cases.

quinnj · 2017-09-11T21:43:46Z

FWIW, I do column integer-indexing all the time, but that's because in my workflows, I code towards integer indexing instead of symbol indexing; in my mind it's faster because I can avoid the extra indirection lookup of symbol=>integer, but that extra cost is probably negligible in production. Anyway, I'd reiterate, though, that I think we need to commit to either a column-oriented or row-oriented representation, regardless of the internal implementation. Currently things are (mostly) consistent for a column-orientation, but there's obviously some desire to switch that. For example, if we switch to a row-orientation, I would definitely expect df[1] to give me the first row instead of the first column.

ararslan · 2017-09-11T21:45:15Z

Either way, isn't it clearer to write df[:,i] and df[i,:] so it's immediately obvious what you're asking for?

nalimilan · 2017-09-11T21:52:46Z

> @nalimilan I don't want to derail this issue, but when (or how often) do you want to get a column based on its integer index? I don't think I've ever wanted to do that, but that could just be my use cases.

@rofinn `i` wasn't necessarily an integer index in my example; it could have been a symbol too. I think the problem is the same.

@ararslan I agree df[:, i] is clearer than df[i], but it's less convenient to type, which is annoying since that's something you need to type all the time. Maybe with things like Query it shouldn't be as common as in R, though. If we get field overloading, we could use df.i instead (which I think is the reason why column names are required to be valid identifiers).

nalimilan · 2017-09-11T22:01:22Z

> Anyway, I'd reiterate, though, that I think we need to commit to either a column-oriented or row-oriented representation, regardless of the internal implementation. Currently things are (mostly) consistent for a column-orientation, but there's obviously some desire to switch that. For example, if we switch to a row-orientation, I would definitely expect df[1] to give me the first row instead of the first column.

Seeing how people disagree on what's the most natural orientation, I'd rather make DataFrame orientation-agnostic and require people to be explicit about what they want, e.g. using for r in eachrow(df) / for c in eachcol(df) or df[i, :] / df[:, i].
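As a rough illustration of the explicit style suggested above, the following sketch assumes a recent DataFrames.jl release is installed. The column names `a` and `b` and the sample values are made up for the example.

```julia
using DataFrames  # assumes a recent DataFrames.jl release is installed

# Made-up sample data, purely for illustration.
df = DataFrame(a = [1, 2, 3], b = ["x", "y", "z"])

# Explicit iteration over rows.
for r in eachrow(df)
    println(r.a, " ", r.b)
end

# Explicit iteration over columns.
for c in eachcol(df)
    println(length(c))
end

# Explicit two-argument indexing: row first, column second.
first_row = df[1, :]  # the first row
first_col = df[:, 1]  # the first column, as a vector
```

With both indices spelled out, the reader does not need to know which orientation a bare `df[1]` would imply.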

ararslan · 2017-09-11T22:15:42Z

Discussion of notation aside, I think a 0-row DataFrame should be isempty regardless of whether DataFrames are column- or row-oriented. It seems a natural definition of emptiness, even if it doesn't correspond directly to length(df) == 0 (though when that's true we'd also necessarily have isempty).
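One way to express that definition is sketched below with a hypothetical helper, `table_isempty`, which treats anything with a zero-length dimension as empty. It is an illustration only, not the method that ended up in DataFrames.jl, and it uses plain arrays (with the Julia 1.x `undef` constructor syntax) so that it runs without any packages.

```julia
# Hypothetical helper, not the DataFrames.jl method: a table-like object
# counts as empty when either of its two dimensions is zero.
table_isempty(x) = size(x, 1) == 0 || size(x, 2) == 0

no_rows = Array{Any}(undef, 0, 2)  # 0×2
no_cols = Array{Any}(undef, 3, 0)  # 3×0
filled  = zeros(3, 2)              # 3×2

table_isempty(no_rows)  # true
table_isempty(no_cols)  # true
table_isempty(filled)   # false
```

Because the helper only relies on `size`, the same check should apply to any object that reports a row and column count that way.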

spurll · 2017-09-12T01:19:11Z

I could put together a separate PR to address this in the morning, if there's interest.

rofinn · 2017-09-12T01:33:35Z

Oops, I missed your comment @spurll.

spurll · 2017-09-12T01:34:54Z

Hey, I would have done it, but I'm chairing a board meeting right now.

rofinn added a commit to rofinn/DataFrames.jl that referenced this issue Sep 12, 2017

isempty(df) should return true if either dimension == 0.

50c7917

Fixes JuliaData#1230.

rofinn mentioned this issue Sep 12, 2017

isempty(df) should return true if either dimension == 0. #1231

Merged

nalimilan closed this as completed in #1231 Sep 13, 2017
