Update join #536

Merged
merged 2 commits into from
Feb 9, 2014

Conversation

garborg
Contributor

@garborg garborg commented Feb 8, 2014

Deprecate natural joins (user must specify join key).
Add kind = :cross.

This is a product of discussion in #531.

end

function addx!(d::DataFrame, x::DataFrame, times::Int, each::Int)
for c in x
Contributor

Could we explicitly iterate over columns? This kind of iteration is something I'd like to remove from DataFrames.

Contributor Author

Definitely, but I'm still fuzzy on what functions should and shouldn't access. Like this?

        for n in names(x)
            d[n] = rep(x[n], times, each)
        end
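
For context, a rough sketch of how repeating columns could assemble a cross join. It assumes `rep(v, times, each)` behaves like R's `rep`, which the thread does not spell out, and stands in for it with Base's `repeat`. `toy_rep`, `left`, `right`, and `crossed` are hypothetical names, and plain `Dict`s replace DataFrames so the snippet runs without the package.

```julia
# Assumed semantics (modeled on R's rep): repeat every element `each` times,
# then repeat the whole result `times` times.
toy_rep(v::AbstractVector, times::Int, each::Int) = repeat(v, inner=each, outer=times)

left  = Dict(:id => [1, 2])             # 2 rows
right = Dict(:tag => ["a", "b", "c"])   # 3 rows

crossed = Dict{Symbol,Vector}()

# Left columns: hold each row while the right table cycles through once.
for n in keys(left)
    crossed[n] = toy_rep(left[n], 1, 3)
end

# Right columns: cycle the whole column once per left row.
for n in keys(right)
    crossed[n] = toy_rep(right[n], 2, 1)
end

crossed[:id]    # [1, 1, 1, 2, 2, 2]
crossed[:tag]   # ["a", "b", "c", "a", "b", "c"]
```

Iterating over names explicitly, as in the snippet above, keeps the loop independent of whatever the default DataFrame iterator yields.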

Contributor

I was going to suggest using eachcol, but I just realized that it doesn't give the column names.

What about giving eachcol the same behavior as the default iterator (i.e., returning a tuple of (colname, coldata)), and deprecating the default iterator?
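
A minimal sketch of what that proposal could look like, using a toy type rather than the real DataFrame. `ToyFrame` and `toy_eachcol` are hypothetical names chosen for illustration; this is not the implementation merged in the PR.

```julia
# Hypothetical stand-in for a DataFrame: parallel vectors of names and columns.
struct ToyFrame
    colnames::Vector{Symbol}
    columns::Vector{Any}
end

# Sketch of the proposed behavior: yield (colname, coldata) tuples.
toy_eachcol(df::ToyFrame) = zip(df.colnames, df.columns)

df = ToyFrame([:a, :b], Any[[1, 2], ["x", "y"]])

seen = Symbol[]
for (name, col) in toy_eachcol(df)
    push!(seen, name)                          # the name travels with the column
    println(name, " has ", length(col), " rows")
end
```

With this shape, callers that need both the name and the data no longer have to rely on the default iterator.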

Contributor

I like Kevin's suggestion.

@garborg garborg mentioned this pull request Feb 8, 2014
@garborg
Contributor Author

garborg commented Feb 9, 2014

I implemented @kmsquire's suggestion for eachcol.

@johnmyleswhite Natural joins are now being deprecated. join is now a wrapper here, to avoid having a union-typed argument to the main function. It's 1% slower for big data sets and 20% slower for small ones, but I left it in for now so you could see what I did (I don't know of any other way, because you can't have two methods that vary only by their kwargs).
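
The wrapper pattern described here might look roughly like the sketch below. Keyword arguments take no part in Julia's method dispatch, so a keyword-accepting outer function forwards to inner methods that dispatch on a positional argument. `wrapped_join` and `_join_impl` are hypothetical names, and the string return values are placeholders for real join work.

```julia
# Inner methods: dispatch happens on the positional argument's type.
_join_impl(on::Symbol) = "one key: $on"
_join_impl(on::Vector{Symbol}) = "several keys: " * join(on, ", ")

# Outer wrapper: takes `on` as a required keyword and forwards it positionally.
wrapped_join(; on) = _join_impl(on)

single = wrapped_join(on = :id)
multi  = wrapped_join(on = [:id, :date])
```

The extra call is the indirection whose cost is quoted in the comment above.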

@garborg
Contributor Author

garborg commented Feb 9, 2014

I switched join from being a wrapper back to the original form to make it ready to merge. Seems good to me.

@johnmyleswhite
Contributor

Let's merge this.

One way we could handle the dispatch issue that prompted your wrapping is to make `on` a positional argument, which is reasonable now that it's required.
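
As a sketch of that idea, under the assumption that the join key is either a single `Symbol` or a `Vector{Symbol}`: once the key is positional, each key type can get its own method and no wrapper is needed. `toyjoin` is a hypothetical name, and `Dict`s of columns stand in for DataFrames.

```julia
# Single key: normalize to a vector and forward to the general method.
toyjoin(left, right, on::Symbol) = toyjoin(left, right, [on])

# Several keys: the method that would do the actual work.
function toyjoin(left, right, on::Vector{Symbol})
    for k in on
        (haskey(left, k) && haskey(right, k)) || error("key $k missing from a table")
    end
    return on   # a real join would match rows here; this only validates the keys
end

left  = Dict(:id => [1, 2], :x => [10, 20])
right = Dict(:id => [2, 3], :y => ["b", "c"])

keys_used = toyjoin(left, right, :id)
```

Both call forms resolve at dispatch time, so the main method never sees a union-typed argument.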

johnmyleswhite added a commit that referenced this pull request Feb 9, 2014
@johnmyleswhite johnmyleswhite merged commit fba8b12 into JuliaData:master Feb 9, 2014
@johnmyleswhite
Contributor

Thanks for doing this, Sean! And thanks for slugging through all my comments.

@garborg
Contributor Author

garborg commented Feb 9, 2014

Of course!

@nalimilan
Member

Nice addition indeed!

4 participants