Add defaultarray function to figure out optimal array type based on the element type #31601

Open
piever opened this issue Apr 3, 2019 · 4 comments
Labels
arrays

Comments

@piever
Contributor

piever commented Apr 3, 2019

When collecting an iterable into a column-based table, it is not obvious which array type to use for each column's elements. For example, CategoricalValue from CategoricalArrays should clearly be collected into a CategoricalArray, which is optimized for that type; WeakRefString from WeakRefStrings belongs in a StringArray (which is optimized to store those); DataValue naturally belongs in a DataValueArray; and so on.

This makes it very hard to write code that collects an iterable into its "optimized container" without depending on all of the above packages (or using Requires like here), a design that in my view does not scale. I feel this could be solved by adding a function defaultarray(T, sz) = Array{T}(undef, sz) to Base that the various packages (CategoricalArrays, WeakRefStrings, DataValueArrays) could then overload. That way, one could write a collect that is optimized for the element type without taking on any of those dependencies.
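A minimal sketch of how the proposal might look in practice. Everything here is an assumption for illustration: `defaultarray` is the hypothetical function proposed above (it does not exist in Base), and `Tagged`, `TaggedVector`, and `collect_optimized` are made-up stand-ins for a package's element type, its optimized container, and a generic collector.

```julia
# Hypothetical fallback, as proposed in this issue; not part of Base.
defaultarray(::Type{T}, sz) where {T} = Array{T}(undef, sz)

# Stand-in element type and "optimized" container (illustration only).
struct Tagged
    code::Int
end

struct TaggedVector <: AbstractVector{Tagged}
    codes::Vector{Int}   # stores only the integer codes
end
Base.size(v::TaggedVector) = size(v.codes)
Base.getindex(v::TaggedVector, i::Int) = Tagged(v.codes[i])
Base.setindex!(v::TaggedVector, x::Tagged, i::Int) = (v.codes[i] = x.code; v)

# The package that owns `Tagged` would add this method.
defaultarray(::Type{Tagged}, sz) = TaggedVector(Vector{Int}(undef, sz))

# A generic collector that knows nothing about `Tagged`.
# Assumes `itr` has a concrete `eltype` and a known `length`.
function collect_optimized(itr)
    dest = defaultarray(eltype(itr), length(itr))
    for (i, x) in enumerate(itr)
        dest[i] = x
    end
    return dest
end

v = collect_optimized((Tagged(1), Tagged(2)))   # a TaggedVector
w = collect_optimized([1.0, 2.0])               # a plain Vector{Float64}
```

The collector only depends on the fallback, so the dispatch on the element type is what selects the container.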

@nalimilan
Member

In Base itself, it could make sense to have defaultarray(Bool, dims) return a BitArray.
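A sketch of that suggestion, assuming the hypothetical `defaultarray` fallback from the proposal (it is not an existing Base function). `BitArray(undef, dims)` is the existing Base constructor.

```julia
# Hypothetical fallback from the proposal; not part of Base.
defaultarray(::Type{T}, dims) where {T} = Array{T}(undef, dims)

# The specialization suggested here: pack Bool elements into a BitArray.
defaultarray(::Type{Bool}, dims) = BitArray(undef, dims)

flags  = defaultarray(Bool, 4)       # BitVector, contents uninitialized
counts = defaultarray(Int, (2, 2))   # Matrix{Int}, contents uninitialized
```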

@piever
Contributor Author

piever commented Apr 4, 2019

I'm unfamiliar with the broadcasting machinery, but if, as I imagine, it already has something similar to defaultarray to decide whether to collect results into an Array or a BitArray, we could just expose that interface and allow packages to overload it for their custom types (so that broadcasting a function that returns a CategoricalValue would return a CategoricalArray).

@davidanthoff
Contributor

This could go further: [a, b, c] array construction could also use this.
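A rough sketch of what that could mean. It does not change how `[a, b, c]` is actually lowered; `myvect` is a made-up stand-in for the literal constructor, and `defaultarray` is the hypothetical function from the proposal.

```julia
# Hypothetical functions from this thread; not part of Base.
defaultarray(::Type{T}, dims) where {T} = Array{T}(undef, dims)
defaultarray(::Type{Bool}, dims) = BitArray(undef, dims)

# Made-up stand-in for `[a, b, c]`: promote the element types,
# then let `defaultarray` choose the container.
function myvect(xs...)
    T = promote_type(map(typeof, xs)...)
    dest = defaultarray(T, length(xs))
    for (i, x) in enumerate(xs)
        dest[i] = x   # converts to T on assignment
    end
    return dest
end

bits = myvect(true, false, true)   # BitVector under this sketch
nums = myvect(1, 2.0)              # Vector{Float64}
```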

@nalimilan
Member

> I'm unfamiliar with the broadcasting machinery, but if as I imagine there is something similar to defaultarray there to determine whether to collect things as Array or as BitArray, we could just expose that interface and allow packages to overload it for their custom type (so that broadcasting a function that returns a CategoricalValue would return a CategoricalArray).

Good idea. Currently broadcast relies on similar(::Broadcasted{DefaultArrayStyle{N}}, ::Type) for this, but that's really equivalent to using a custom function since it's completely different from similar(::AbstractArray, ::Type).
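For illustration, the existing behavior can be observed directly. This sketch only uses `Base.Broadcast.broadcasted`, `Base.Broadcast.instantiate`, and `similar`; the variable names are arbitrary, and it assumes a recent Julia 1.x where these methods behave as described above.

```julia
using Base.Broadcast: broadcasted, instantiate

# Lazy broadcast objects with the default array style.
bc_cmp = instantiate(broadcasted(>, [1, 2, 3], 1))
bc_add = instantiate(broadcasted(+, [1, 2, 3], 1))

# `similar` on a Broadcasted picks the output container from the element type:
out_bool = similar(bc_cmp, Bool)   # BitVector
out_int  = similar(bc_add, Int)    # Vector{Int}

# The same choice shows up in ordinary dot-broadcasting.
mask    = [1, 2, 3] .> 1           # BitVector
shifted = [1, 2, 3] .+ 1           # Vector{Int}
```

The element type drives the choice here only for `Bool`; a package cannot currently hook into it for its own element types without defining a custom broadcast style for its array arguments.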


4 participants