-
Notifications
You must be signed in to change notification settings - Fork 367
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
vcat container and eltype promotion, internal speedups #747
Conversation
Added an internal _names(df) function to replace names(df) where the implicit copy was unhelpful. Replaced almost every occurence, which would suggest swapping meanings and using copy(names(df)) in those couple places, except that that would make downstream code perhaps too subtle.
Changes Unknown when pulling ee7444a on garborg:vcat2 into * on JuliaStats:master*. |
@@ -35,6 +35,7 @@ Base.next(itr::Cols, st) = (itr.df[st], st + 1) | |||
columns{T <: AbstractDataFrame}(df::T) = Cols{T}(df) | |||
|
|||
Base.names(df::AbstractDataFrame) = names(index(df)) | |||
_names(df::AbstractDataFrame) = _names(index(df)) | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lots of great cleanups here. I don't follow this one. Why is _names
needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Less copying in general -- salient cases were eachcol()
and the DataFrameRow
iterator being O(ncol^2), copying the vector of colnames once per col, which amounted to a 4x slowdown by 20 cols, and I imagine blew up from there. I'd be up for making names
non-copying, because it's much rarer you want that and copy(names(df))
works in those less common cases -- I'm just guessing that would get pushback.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(as a side note, I guess I need to stop putting commentary in commit messages -- the compartmentalization is nice, but GitHub doesn't exactly show them off.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(as a side reply, please don't stop doing that :) It makes it certainly easier to understand a commit after a PR has been closed. Maybe just include a short statement in the PR message that the commentary is in the commit)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it. Thanks Sean! Know I know to look for longer commit messages. You're right that github doesn't make them stand out.
vcat container and eltype promotion, internal speedups
Supersedes and closes #733.