Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

vcat container and eltype promotion, internal speedups #747

Merged
merged 4 commits into from
Dec 23, 2014

Conversation

garborg
Copy link
Contributor

@garborg garborg commented Dec 23, 2014

Supersedes and closes #733.

Added an internal _names(df) function to replace names(df) where the implicit
copy was unhelpful. Replaced almost every occurence, which would suggest
swapping meanings and using copy(names(df)) in those couple places, except
that that would make downstream code perhaps too subtle.
@coveralls
Copy link

Coverage Status

Changes Unknown when pulling ee7444a on garborg:vcat2 into * on JuliaStats:master*.

@@ -35,6 +35,7 @@ Base.next(itr::Cols, st) = (itr.df[st], st + 1)
columns{T <: AbstractDataFrame}(df::T) = Cols{T}(df)

Base.names(df::AbstractDataFrame) = names(index(df))
_names(df::AbstractDataFrame) = _names(index(df))

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lots of great cleanups here. I don't follow this one. Why is _names needed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Less copying in general -- salient cases were eachcol() and the DataFrameRow iterator being O(ncol^2), copying the vector of colnames once per col, which amounted to a 4x slowdown by 20 cols, and I imagine blew up from there. I'd be up for making names non-copying, because it's much rarer you want that and copy(names(df)) works in those less common cases -- I'm just guessing that would get pushback.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(as a side note, I guess I need to stop putting commentary in commit messages -- the compartmentalization is nice, but GitHub doesn't exactly show them off.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(as a side reply, please don't stop doing that :) It makes it certainly easier to understand a commit after a PR has been closed. Maybe just include a short statement in the PR message that the commentary is in the commit)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vchuravy Much better solution -- will do.

@tshort Thanks for reviewing!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it. Thanks Sean! Know I know to look for longer commit messages. You're right that github doesn't make them stand out.

garborg added a commit that referenced this pull request Dec 23, 2014
vcat container and eltype promotion, internal speedups
@garborg garborg merged commit a0eb194 into JuliaData:master Dec 23, 2014
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants