Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: Fix vcat to handle type promotion and preserve PDAs #733

Closed
wants to merge 1 commit into from

Conversation

garborg
Copy link
Contributor

@garborg garborg commented Nov 30, 2014

Compared to the alternatives outlined in the gist in #732, this is looking 4-80x faster (contingent on a forthcoming edit to copy! in DataArrays).

@tshort
Copy link
Contributor

tshort commented Dec 1, 2014

Thanks, Sean! Stefan used to jokingly suggest submitting half-baked solutions given that might spur Jeff to "implement it right". Now I feel like that guy:)

This should take care of the type promotion. I was also hoping that vcat for columns would take care of promotion of column containers. For example, if you vcat columns that are both PooledDataArrays, the result will be a DataArray. The same is true for Arrays and other AbstractArray types. I don't see any mechanism for that in the DataArray package, however, so that seems like a bigger challenge.

@garborg
Copy link
Contributor Author

garborg commented Dec 1, 2014

Oh yes, I had thought about that for combining Nullable/DataArrays with other AbstractArrays, but had totally forgotten about PooledDataArrays, which seems more immediate. I'm interested to see if container promotion lands in base soon -- I'm not sure what we should do with corner cases (e.g. promote_type(DataArray, PooledDataArray), promote_type(NullableArray, DataArray)) -- ideas? Maybe for now, just calling similar on the first AbstractDataArray container type if found, otherwise, on the first container type?

@nalimilan
Copy link
Member

Maybe for now, just calling similar on the first AbstractDataArray container type if found, otherwise, on the first container type?

I think it would be more reliable to define promotion rules. DataArray and PooledDataArray should give a DataArray, as you can assume that if it made sense to store the DataArray as a PooledDataArray, the user who have converted it (e.g. it contains too many different values).

As for DataArray and NullableArray, shouldn't one replace the other at some point anyway?

@garborg
Copy link
Contributor Author

garborg commented Dec 1, 2014

Makes sense re: DataArray and PooledDataArray, makes sense -- I was actually thinking of the hypothetical case where what I called PooledDataArray was something holding cardinal or ordinal data.

Re: DataArray & NullableArray, maybe, but generally, there will be more container types that don't exist yet, and I suppose I just can't picture a world where they'll all work seamlessly together. I wanted to say we should throw an error in cases where promote_type falls back to an abstract type, but that would be wrong given two different AbstractArray subtypes for which similar returns a Array.

@johnmyleswhite
Copy link
Contributor

Yes, DataArray needs to be completely removed once NullableArray arrives. There's no way that NA and Nullable{T}() can safely co-exist.

@garborg
Copy link
Contributor Author

garborg commented Dec 1, 2014

Updated to preserve PDAs. As with the other proposed solutions, but unlike before, now behaves like vcat for Base and DataArrays -- i.e. vcat([1, 2], @data([NA, 4]) fails.

@coveralls
Copy link

Coverage Status

Coverage increased (+0.06%) when pulling 2293996 on vcat into 23e090d on master.

@garborg garborg changed the title Fix vcat to handle type promotion Fix vcat to handle type promotion and preserve PDAs Dec 1, 2014
@nalimilan
Copy link
Member

@garborg Why should it fail? Rather, I think that combining Array and DataArray should always give a DataArray, just like promoting a float and a complex gives a complex. The more general type should win, as it's the only one able to hold values from both types.

@garborg
Copy link
Contributor Author

garborg commented Dec 1, 2014

Agreed that it probably should work, but everywhere. We've always gone with parity with Base methods -- it should not fail on DataFrames if it should not fail on vcat(::Array, ::DataArray). Tom has opened a DataArrays issue for that, though it may be a Base issue. Holding up the the ability to promote eltypes or accept DataFrames that have PooledDataArrays (which work in other vcat implementations) seems like putting the burden in the wrong place.

@garborg
Copy link
Contributor Author

garborg commented Dec 1, 2014

(P.S. Tests have already been added, commented out, for if/when it works universally, in anticipation of someone pushing that through.)

@garborg garborg changed the title Fix vcat to handle type promotion and preserve PDAs WIP: Fix vcat to handle type promotion and preserve PDAs Dec 2, 2014
@garborg garborg closed this in #747 Dec 23, 2014
@garborg garborg deleted the vcat branch January 1, 2015 03:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants